# Collecting Information and Audio Features into a Dataframe

## Importing Libraries

First, we import the libraries needed to work with the Spotify Web API. Fortunately, there is an open-source Python wrapper called `spotipy` that lets us interact with the API at a high level.

We have two options for authenticating with the API: the Authorization Code flow and the Client Credentials flow. The Authorization Code flow is used when we need a Spotify user's personal information, such as their playlists or liked songs. The Client Credentials flow is used to pull general information about Spotify's tracks, playlists, and so on. We will use the _Client Credentials_ flow.
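To make the Client Credentials flow more concrete, the sketch below builds (but does not send) the kind of token request this flow is based on. `spotipy` takes care of this step for us, so this is for illustration only: the helper name `build_token_request` and the placeholder credentials are assumptions, and the endpoint and header format follow Spotify's published description of the flow as we understand it.

```python
import base64
from urllib.parse import urlencode

# Token endpoint for Spotify's OAuth flows (taken from Spotify's documentation).
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return (url, headers, body) for a Client Credentials token request.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    # The app identifies itself with HTTP Basic auth: base64("id:secret").
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(raw).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # No user is involved, so the only form field is the grant type.
    body = urlencode({"grant_type": "client_credentials"})
    return TOKEN_URL, headers, body


url, headers, body = build_token_request("your_client_id_here", "your_client_secret_here")
print(body)
```

Because no user logs in, this flow needs only the client id and secret; a redirect URL only matters for the user-facing Authorization Code flow.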

In [1]:
import os
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from dotenv import load_dotenv, dotenv_values

The `dotenv` library (installed as `python-dotenv`) loads environment variables from a `.env` file.

We have created a `.env` file on our local machine that holds our API credentials: `CLIENT_ID`, `CLIENT_SECRET` and `REDIRECT_URL`. You can create a similar file in the same directory as this notebook, using the template below. Save the file as `.env`.

```
CLIENT_ID="your_client_id_here"
CLIENT_SECRET="your_client_secret_here"
REDIRECT_URL="your_redirect_url_here"
```

## Loading environment variables and storing them

In [2]:
load_dotenv()

True

In [4]:
client_id = os.getenv("CLIENT_ID")
client_secret = os.getenv("CLIENT_SECRET")
redirect_url = os.getenv("REDIRECT_URL")
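Note that `os.getenv` returns `None` without complaint when a variable is missing, so a typo in the `.env` file only surfaces later as an authentication error. A minimal sketch of failing early is given below; `require_env` is a hypothetical helper, and a plain dictionary stands in for the real environment so the example runs anywhere.

```python
import os


def require_env(names, env=None):
    """Return the requested variables as a dict, raising if any is missing or empty.

    Hypothetical helper; `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Demonstration with a plain dict standing in for os.environ (made-up values).
fake_env = {"CLIENT_ID": "abc", "CLIENT_SECRET": "123"}
creds = require_env(["CLIENT_ID", "CLIENT_SECRET"], env=fake_env)
print(sorted(creds))
```

In the notebook, calling `require_env(["CLIENT_ID", "CLIENT_SECRET"])` right after `load_dotenv()` would check the real environment instead.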

## Creating an API session

In [5]:
auth_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(auth_manager=auth_manager)

## Extracting information from the API

In [6]:
# We use playlists from two categories: mood and toplists
toplists = sp.category_playlists(category_id="toplists", limit=30)
mood = sp.category_playlists(category_id="mood", limit=30)

# Collect the id of every playlist in both categories.
# The ids are used later to fetch each playlist's songs and their audio features.
def get_playlist_info():
    playlist_lists = []
    for playlist in toplists['playlists']['items']:
        playlist_id = playlist['id']
        playlist_lists.append(playlist_id)

    for playlist in mood['playlists']['items']:
        playlist_id = playlist['id']
        playlist_lists.append(playlist_id)
    
    return playlist_lists
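The function above reads the module-level `toplists` and `mood` variables directly. One possible variation, sketched here under assumptions, passes the responses in as arguments so the same logic works for any number of categories and can be tried without network access. `collect_playlist_ids` is a hypothetical name, the mock responses only mimic the nesting used above, and the `None` check is a defensive assumption rather than documented behavior.

```python
def collect_playlist_ids(*category_responses):
    """Gather playlist ids from any number of category responses."""
    playlist_ids = []
    for response in category_responses:
        for playlist in response["playlists"]["items"]:
            # Skip empty entries defensively (assumption: an item may be None).
            if playlist is not None:
                playlist_ids.append(playlist["id"])
    return playlist_ids


# Mock responses with the same nesting as the real ones; the ids are made up.
mock_toplists = {"playlists": {"items": [{"id": "p1"}, {"id": "p2"}]}}
mock_mood = {"playlists": {"items": [None, {"id": "p3"}]}}

print(collect_playlist_ids(mock_toplists, mock_mood))  # ['p1', 'p2', 'p3']
```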


Since the Spotify Web API limits the number of requests we can make, we are better off first collecting all the track ids in a single list.
We can then partition that list into sublists of 100 track ids each (we are allowed to fetch features for at most 100 ids per request). This lets us fetch the features of many tracks with a single request.

In [24]:
# Extract the songs in a playlist and return three parallel lists: track ids, names and artists
def get_playlist_tracks(playlist_id):
    track_ids = []
    track_names = []
    track_artists = []
    tracks = sp.playlist_tracks(playlist_id)['items']
    for track in tracks:
        track_ids.append(track['track']['id'])
        track_names.append(track['track']['name'])
        track_artists.append(track['track']['artists'])
    return track_ids, track_names, track_artists
        
# Extract songs from all playlists
def extract_track_ids(playlist_lists):
    all_track_ids = []
    all_track_names = []
    all_track_artists = []
    for playlist in playlist_lists:
        track_ids, track_names, track_artists = get_playlist_tracks(playlist)
        all_track_ids.extend(track_ids)
        all_track_names.extend(track_names)
        all_track_artists.extend(track_artists)
        
    return all_track_ids, all_track_names, all_track_artists

# Split a list into consecutive sublists of at most `size` items
def partition(lst, size):
    return [lst[i:i+size] for i in range(0, len(lst), size)]
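As a quick worked example of the partitioning step, the snippet below repeats the same slicing logic on a small made-up list and then counts how many batches of 100 a list of 3127 ids (the size collected in this notebook) produces.

```python
def partition(lst, size):
    """Split lst into consecutive sublists of at most `size` items."""
    return [lst[i:i + size] for i in range(0, len(lst), size)]


# Seven made-up ids split into chunks of three: the last chunk is shorter.
ids = [f"track_{n}" for n in range(7)]
chunks = partition(ids, 3)
print([len(chunk) for chunk in chunks])  # [3, 3, 1]

# 3127 ids in batches of 100 give 31 full batches plus one batch of 27.
requests_needed = len(partition(list(range(3127)), 100))
print(requests_needed)  # 32
```

So batching reduces 3127 individual lookups to 32 requests.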

   


In [39]:
#getting necessary columns
playlist_ids = get_playlist_info() #list of playlist ids
track_ids, track_names, track_artists = extract_track_ids(playlist_ids)
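One caveat: `get_playlist_tracks` reads only the `items` of the first response, so a playlist longer than one page would be truncated (as far as we know, `playlist_tracks` returns at most 100 items per call by default). A minimal sketch of following the `next` links is shown below. The helper `collect_all_items` and the fake pages are hypothetical; with `spotipy`, one would presumably pass `sp.playlist_tracks(playlist_id)` as the first page and `sp.next` as `fetch_next`.

```python
def collect_all_items(first_page, fetch_next):
    """Collect `items` from a paged response by following its `next` links.

    `fetch_next(page)` must return the following page for a page that has one.
    """
    items = []
    page = first_page
    while page is not None:
        items.extend(page["items"])
        # Stop when the current page does not point to another one.
        page = fetch_next(page) if page.get("next") else None
    return items


# Fake two-page response standing in for the API (made-up data).
page_two = {"items": [{"track": {"id": "t3"}}], "next": None}
page_one = {"items": [{"track": {"id": "t1"}}, {"track": {"id": "t2"}}], "next": "page-2"}
pages_by_link = {"page-2": page_two}

all_items = collect_all_items(page_one, lambda page: pages_by_link[page["next"]])
print([item["track"]["id"] for item in all_items])  # ['t1', 't2', 't3']
```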

In [52]:
# Keep only the first listed artist of each track
artists = [track_artist[0]['name'] for track_artist in track_artists]
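The cell above keeps only the first listed artist, so collaborators on a track are dropped. If all names are wanted, one option is to join them into a single string, as in this sketch. `artist_names` is a hypothetical helper and the nested lists are made-up data shaped like `track_artists`.

```python
def artist_names(artist_entries, separator=", "):
    """Join the `name` field of every artist entry into one string."""
    return separator.join(entry["name"] for entry in artist_entries)


# Made-up data: one list of artist dictionaries per track.
mock_track_artists = [
    [{"name": "Artist A"}],
    [{"name": "Artist B"}, {"name": "Artist C"}],
]

all_artists = [artist_names(entries) for entries in mock_track_artists]
print(all_artists)  # ['Artist A', 'Artist B, Artist C']
```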

In [21]:
part = partition(track_ids,100) #list of partitioned tracks
audio_features = []
for p in part:
    audio_features.extend(sp.audio_features(p))

In [53]:
# Check that all lists have the same length
print("No. of artists: ", len(artists))
print("No. of track names: ", len(track_names))
print("No. of track ids: ", len(track_ids))
print("No. of tracks with audio features: ", len(audio_features))

No. of artists:  3127
No. of track names:  3127
No. of track ids:  3127
No. of tracks with audio features:  3127
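Equal lengths do not guarantee that every entry is usable. To our understanding, the audio features endpoint can return an empty (`None`) entry for a track it has no features for, and such an entry still counts toward the length. The sketch below drops those positions from all lists at once so that they stay aligned. `drop_missing_features` and the sample data are hypothetical.

```python
def drop_missing_features(features, *parallel_lists):
    """Remove positions where `features` is None from every given list."""
    keep = [i for i, feature in enumerate(features) if feature is not None]
    filtered_features = [features[i] for i in keep]
    filtered_lists = [[lst[i] for i in keep] for lst in parallel_lists]
    return (filtered_features, *filtered_lists)


# Made-up data: the second track has no audio features.
sample_features = [{"id": "t1", "tempo": 120.0}, None, {"id": "t3", "tempo": 98.5}]
sample_ids = ["t1", "t2", "t3"]
sample_names = ["First", "Second", "Third"]

clean_features, clean_ids, clean_names = drop_missing_features(
    sample_features, sample_ids, sample_names
)
print(clean_ids)  # ['t1', 't3']
```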


### All seems well. We can now collect everything into a pandas DataFrame.

## Collecting extracted information in a Pandas dataframe

In [56]:
import pandas as pd
import numpy as np

df = pd.DataFrame(audio_features)
df.head()


Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.928,0.458,11,-6.751,1,0.187,0.0175,0.000446,0.118,0.242,144.995,audio_features,1a73gcEg6h6Re6hHXoVltJ,spotify:track:1a73gcEg6h6Re6hHXoVltJ,https://api.spotify.com/v1/tracks/1a73gcEg6h6R...,https://api.spotify.com/v1/audio-analysis/1a73...,172138,4
1,0.943,0.558,2,-4.911,1,0.0568,0.0026,2e-06,0.0937,0.606,104.983,audio_features,4xhsWYTOGcal8zt0J161CU,spotify:track:4xhsWYTOGcal8zt0J161CU,https://api.spotify.com/v1/tracks/4xhsWYTOGcal...,https://api.spotify.com/v1/audio-analysis/4xhs...,138411,4
2,0.75,0.733,6,-3.18,0,0.0319,0.256,0.0,0.114,0.844,111.018,audio_features,3rUGC1vUpkDG9CZFHMur1t,spotify:track:3rUGC1vUpkDG9CZFHMur1t,https://api.spotify.com/v1/tracks/3rUGC1vUpkDG...,https://api.spotify.com/v1/audio-analysis/3rUG...,131872,1
3,0.559,0.551,5,-7.231,1,0.132,0.141,0.0,0.11,0.392,143.008,audio_features,4iZ4pt7kvcaH6Yo8UoZ4s2,spotify:track:4iZ4pt7kvcaH6Yo8UoZ4s2,https://api.spotify.com/v1/tracks/4iZ4pt7kvcaH...,https://api.spotify.com/v1/audio-analysis/4iZ4...,201800,4
4,0.628,0.523,11,-8.307,0,0.0946,0.701,0.00274,0.219,0.416,169.982,audio_features,5mjYQaktjmjcMKcUIcqz4s,spotify:track:5mjYQaktjmjcMKcUIcqz4s,https://api.spotify.com/v1/tracks/5mjYQaktjmjc...,https://api.spotify.com/v1/audio-analysis/5mjY...,172965,4


In [57]:
#adding other columns
df.insert(loc=0, column='track_ids', value=track_ids)
df.insert(loc=1, column='track_name', value=track_names)
df.insert(loc=2, column='artists', value=artists)
df.head()

Unnamed: 0,track_ids,track_name,artists,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,1a73gcEg6h6Re6hHXoVltJ,FTCU,Nicki Minaj,0.928,0.458,11,-6.751,1,0.187,0.0175,...,0.118,0.242,144.995,audio_features,1a73gcEg6h6Re6hHXoVltJ,spotify:track:1a73gcEg6h6Re6hHXoVltJ,https://api.spotify.com/v1/tracks/1a73gcEg6h6R...,https://api.spotify.com/v1/audio-analysis/1a73...,172138,4
1,4xhsWYTOGcal8zt0J161CU,Lovin On Me,Jack Harlow,0.943,0.558,2,-4.911,1,0.0568,0.0026,...,0.0937,0.606,104.983,audio_features,4xhsWYTOGcal8zt0J161CU,spotify:track:4xhsWYTOGcal8zt0J161CU,https://api.spotify.com/v1/tracks/4xhsWYTOGcal...,https://api.spotify.com/v1/audio-analysis/4xhs...,138411,4
2,3rUGC1vUpkDG9CZFHMur1t,greedy,Tate McRae,0.75,0.733,6,-3.18,0,0.0319,0.256,...,0.114,0.844,111.018,audio_features,3rUGC1vUpkDG9CZFHMur1t,spotify:track:3rUGC1vUpkDG9CZFHMur1t,https://api.spotify.com/v1/tracks/3rUGC1vUpkDG...,https://api.spotify.com/v1/audio-analysis/3rUG...,131872,1
3,4iZ4pt7kvcaH6Yo8UoZ4s2,Snooze,SZA,0.559,0.551,5,-7.231,1,0.132,0.141,...,0.11,0.392,143.008,audio_features,4iZ4pt7kvcaH6Yo8UoZ4s2,spotify:track:4iZ4pt7kvcaH6Yo8UoZ4s2,https://api.spotify.com/v1/tracks/4iZ4pt7kvcaH...,https://api.spotify.com/v1/audio-analysis/4iZ4...,201800,4
4,5mjYQaktjmjcMKcUIcqz4s,Strangers,Kenya Grace,0.628,0.523,11,-8.307,0,0.0946,0.701,...,0.219,0.416,169.982,audio_features,5mjYQaktjmjcMKcUIcqz4s,spotify:track:5mjYQaktjmjcMKcUIcqz4s,https://api.spotify.com/v1/tracks/5mjYQaktjmjc...,https://api.spotify.com/v1/audio-analysis/5mjY...,172965,4


### Dropping irrelevant columns

In [59]:
df.columns

Index(['track_ids', 'track_name', 'artists', 'danceability', 'energy', 'key',
       'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness',
       'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href',
       'analysis_url', 'duration_ms', 'time_signature'],
      dtype='object')

In [64]:
df.drop(['id','track_href','analysis_url'],axis=1,inplace=True)
df.head()

Unnamed: 0,track_ids,track_name,artists,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,duration_ms,time_signature
0,1a73gcEg6h6Re6hHXoVltJ,FTCU,Nicki Minaj,0.928,0.458,11,-6.751,1,0.187,0.0175,0.000446,0.118,0.242,144.995,audio_features,spotify:track:1a73gcEg6h6Re6hHXoVltJ,172138,4
1,4xhsWYTOGcal8zt0J161CU,Lovin On Me,Jack Harlow,0.943,0.558,2,-4.911,1,0.0568,0.0026,2e-06,0.0937,0.606,104.983,audio_features,spotify:track:4xhsWYTOGcal8zt0J161CU,138411,4
2,3rUGC1vUpkDG9CZFHMur1t,greedy,Tate McRae,0.75,0.733,6,-3.18,0,0.0319,0.256,0.0,0.114,0.844,111.018,audio_features,spotify:track:3rUGC1vUpkDG9CZFHMur1t,131872,1
3,4iZ4pt7kvcaH6Yo8UoZ4s2,Snooze,SZA,0.559,0.551,5,-7.231,1,0.132,0.141,0.0,0.11,0.392,143.008,audio_features,spotify:track:4iZ4pt7kvcaH6Yo8UoZ4s2,201800,4
4,5mjYQaktjmjcMKcUIcqz4s,Strangers,Kenya Grace,0.628,0.523,11,-8.307,0,0.0946,0.701,0.00274,0.219,0.416,169.982,audio_features,spotify:track:5mjYQaktjmjcMKcUIcqz4s,172965,4


In [65]:
df.drop(['type'],inplace=True,axis=1)
df.head()

Unnamed: 0,track_ids,track_name,artists,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri,duration_ms,time_signature
0,1a73gcEg6h6Re6hHXoVltJ,FTCU,Nicki Minaj,0.928,0.458,11,-6.751,1,0.187,0.0175,0.000446,0.118,0.242,144.995,spotify:track:1a73gcEg6h6Re6hHXoVltJ,172138,4
1,4xhsWYTOGcal8zt0J161CU,Lovin On Me,Jack Harlow,0.943,0.558,2,-4.911,1,0.0568,0.0026,2e-06,0.0937,0.606,104.983,spotify:track:4xhsWYTOGcal8zt0J161CU,138411,4
2,3rUGC1vUpkDG9CZFHMur1t,greedy,Tate McRae,0.75,0.733,6,-3.18,0,0.0319,0.256,0.0,0.114,0.844,111.018,spotify:track:3rUGC1vUpkDG9CZFHMur1t,131872,1
3,4iZ4pt7kvcaH6Yo8UoZ4s2,Snooze,SZA,0.559,0.551,5,-7.231,1,0.132,0.141,0.0,0.11,0.392,143.008,spotify:track:4iZ4pt7kvcaH6Yo8UoZ4s2,201800,4
4,5mjYQaktjmjcMKcUIcqz4s,Strangers,Kenya Grace,0.628,0.523,11,-8.307,0,0.0946,0.701,0.00274,0.219,0.416,169.982,spotify:track:5mjYQaktjmjcMKcUIcqz4s,172965,4


## Saving the DataFrame to a CSV file

In [67]:
df.to_csv('spotifytrackinfo.csv')
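By default, `to_csv` also writes the DataFrame's index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back with `read_csv`. If the index carries no information, passing `index=False` avoids that. The small frame below is made-up data, and an in-memory buffer stands in for the file.

```python
import io

import pandas as pd

# Made-up two-row frame standing in for the real track data.
frame = pd.DataFrame({"track_ids": ["t1", "t2"], "tempo": [120.0, 98.5]})

# Write without the index, then read the text back.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
restored = pd.read_csv(io.StringIO(buffer.getvalue()))

print(list(restored.columns))  # ['track_ids', 'tempo']
```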