# Spotify API

Goal of the assignment: create and save a dataset containing information about every song in a given playlist by requesting data from Spotify's API. You will then use this dataset during the Artificial Intelligence I course to train a predictive model.

## Getting client credentials

Spotify's API uses OAuth as its authentication scheme. Hence, before making any requests, we need to obtain client credentials for the Spotify API. To do so, we need a Spotify account and then have to "Create an app". Once the app has been created, we can see a “Client ID” and a “Client Secret” on the left-hand side. These values are our client credentials.

The following variables, <i>client_id</i> and <i>client_secret</i>, store your client ID and client secret, respectively.

In [80]:
client_id= "901214bb5c8f4d7f8c9f353d0e69d912"
client_secret="ec5aa5b8630044dbacbc4b2c0fcd9be1"
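Since the client secret works like a password, one option is to keep it out of the notebook and read it from the environment instead. The following is a minimal sketch of that idea; the function name and the environment variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` are hypothetical choices, not something Spotify prescribes.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    try:
        return env["SPOTIFY_CLIENT_ID"], env["SPOTIFY_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Example with a stand-in mapping instead of the real environment:
demo_env = {"SPOTIFY_CLIENT_ID": "my-id", "SPOTIFY_CLIENT_SECRET": "my-secret"}
client_id_demo, client_secret_demo = load_credentials(demo_env)
```

In the notebook itself, `client_id, client_secret = load_credentials()` would then replace the two hard-coded assignments.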

## Getting an access token

To access the various endpoints of the Spotify API, we need to pass an access token. To obtain one, we send a ```POST``` request with our client credentials. This request creates a token resource on the server, which responds with it.

In [81]:
import requests

# URL for token resource
auth_url = 'https://accounts.spotify.com/api/token'

# request body
params = {'grant_type': 'client_credentials',
          'client_id': client_id,
          'client_secret': client_secret}

# POST the request
auth_response = requests.post(auth_url, data=params)

In [82]:
# convert the response to JSON
auth_response_data = auth_response.json()

# save the access token to a new variable
access_token = auth_response_data['access_token']
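If the credentials are wrong, the lookup of `'access_token'` above fails with a bare `KeyError`. A small helper can give a clearer message and also record when the token stops being valid. This is only a sketch: it assumes the token response carries an `expires_in` field (in seconds) and that failed responses carry an `error` field; the helper name is hypothetical.

```python
import time


def parse_token_response(payload, now=None):
    """Return (access_token, expires_at) from a decoded token response.

    Assumes 'expires_in' is given in seconds and that failures carry an
    'error' field; both are assumptions made for this sketch.
    """
    if "access_token" not in payload:
        raise RuntimeError(f"Token request failed: {payload.get('error', payload)}")
    now = time.time() if now is None else now
    return payload["access_token"], now + payload.get("expires_in", 0)


# Hypothetical payload, shaped like a successful response:
sample_payload = {"access_token": "abc123", "token_type": "Bearer", "expires_in": 3600}
demo_token, expires_at = parse_token_response(sample_payload, now=1000.0)
```

With the real response, `access_token, expires_at = parse_token_response(auth_response.json())` would do the same job as the cell above.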

## Accessing the API

The API provides numerous endpoints to access things like album listings, artist information, playlists, and even Spotify-generated audio analysis of individual tracks, which includes their time signature and measurements such as their “danceability” or "loudness". All the information is available in the [Docs](https://developer.spotify.com/documentation/web-api/reference/).

Spotify's API expects us to include our access token in every request, using the 'Authorization' header.

In [84]:
headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}

To get a feel for how the API works, we begin by making a ```GET``` request to the ```audio-features``` endpoint to extract data for a specific track. For instance, let's retrieve all the information for Radiohead's **Creep**.

The first thing you need is to identify the appropriate URL or path to direct your request to. The URLs for all Spotify API endpoints follow the same structure: each one is the concatenation ```base_url + endpoint```.

The ```base_url``` is defined below:

In [85]:
base_url = 'https://api.spotify.com/v1/'

And the endpoint for this case is defined as:

In [86]:
endpoint = 'audio-features'

In [87]:
url = base_url + endpoint
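Because every call needs the same two ingredients, the URL and the 'Authorization' header, they can be produced together. The helper below is a hypothetical convenience, not part of the Spotify API or of `requests`; it only packages the concatenation and header format described above.

```python
BASE_URL = "https://api.spotify.com/v1/"


def build_request(endpoint, token):
    """Return the (url, headers) pair for a call to the given endpoint."""
    url = BASE_URL + endpoint.lstrip("/")  # tolerate a leading slash
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


# "abc123" is a placeholder token used only for illustration.
demo_url, demo_headers = build_request("audio-features", "abc123")
```

The pair can then be passed on as `requests.get(demo_url, headers=demo_headers, params=...)`.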

Checking the documentation we see that the ```audio-features``` endpoint takes the following query parameters.

<img src="https://www.dropbox.com/s/s4zs6wlue0u16cu/body.png?raw=1" width="500">

To extract data about Radiohead's Creep song, we need to locate its ```id```. This is its unique identifier.

![Creep](https://www.dropbox.com/s/kufj6ww2yn069gb/creep.png?raw=1)

We can get the ```id``` for any song by going to Spotify, looking for the song, clicking the “…” by the song name, then “Share” and then “Copy Spotify URI”. The ```id``` is the last part of the URI, after ```spotify:track:```.

<i>track_id</i> stores the ID for Radiohead's song Creep.

In [88]:
track_id="6b2oQwSGFkzsMtQruIWm2p"
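Copying the ID by hand is error-prone, so it can also be extracted from whatever was copied. The sketch below handles the two forms that appear in this notebook, a `spotify:track:...` URI and an `open.spotify.com` link; the function name is hypothetical and other link formats are not covered.

```python
from urllib.parse import urlparse


def spotify_id(reference):
    """Extract the bare ID from a Spotify URI or an open.spotify.com link."""
    if reference.startswith("spotify:"):
        # 'spotify:track:<id>' -> last colon-separated field
        return reference.split(":")[-1]
    if reference.startswith("http"):
        # 'https://open.spotify.com/playlist/<id>?si=...' -> last path segment
        return urlparse(reference).path.rstrip("/").split("/")[-1]
    return reference  # assume it is already a bare ID


creep_id = spotify_id("spotify:track:6b2oQwSGFkzsMtQruIWm2p")
```

The query string of a share link (the `?si=...` part) is dropped automatically because only the path of the URL is inspected.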

We need to provide the query parameters in dictionary form using a variable called *params*.

In [89]:
params = {'ids': [track_id]}

Running the GET request to retrieve the data for the "Creep" song. Storing the response in a new variable called <i>creep</i>.

In [90]:
creep=requests.get(url, params=params, headers=headers)
creep

<Response [200]>

Converting the response to JSON format to be able to manipulate it with greater ease.

In [91]:
creep=creep.json()
creep

{'audio_features': [{'acousticness': 0.0102,
   'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6b2oQwSGFkzsMtQruIWm2p',
   'danceability': 0.515,
   'duration_ms': 238640,
   'energy': 0.43,
   'id': '6b2oQwSGFkzsMtQruIWm2p',
   'instrumentalness': 0.000141,
   'key': 7,
   'liveness': 0.129,
   'loudness': -9.935,
   'mode': 1,
   'speechiness': 0.0369,
   'tempo': 91.841,
   'time_signature': 4,
   'track_href': 'https://api.spotify.com/v1/tracks/6b2oQwSGFkzsMtQruIWm2p',
   'type': 'audio_features',
   'uri': 'spotify:track:6b2oQwSGFkzsMtQruIWm2p',
   'valence': 0.104}]}

In [92]:
creep["audio_features"][0]["danceability"]

0.515

## Getting data from a playlist

Here, we build a dataset containing data about different songs. We use the playlist proposed by the professor for this purpose. This is the [link](https://open.spotify.com/playlist/4NVeFUEHBybfh3ITNG1b8n?si=js9BKt5aTOiCWMm_Cx4Vvg) to the playlist.

<i>playlist_id</i> stores the id of your playlist of choice.

In [94]:
playlist_id="4NVeFUEHBybfh3ITNG1b8n"

Retrieving all the information about the chosen playlist in JSON form. Storing the response in a variable called <i>playlist</i>.

In [95]:
url= base_url+"playlists/"+playlist_id+"/tracks"
playlist=requests.get(url, headers=headers).json()

In [96]:
playlist.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])
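The keys `limit`, `next`, `offset` and `total` suggest that the response is paginated: presumably, when a playlist holds more tracks than `limit`, `next` contains the URL of the following page and is `None` on the last one. Under that assumption, all items could be gathered as sketched below. The function name is hypothetical, and stand-in pages are used so the example runs without network access.

```python
def fetch_all_items(first_page, fetch_page):
    """Collect 'items' from every page by following the 'next' links.

    fetch_page is any callable that takes a URL and returns the decoded
    JSON page, e.g. lambda u: requests.get(u, headers=headers).json().
    """
    items = []
    page = first_page
    while page is not None:
        items.extend(page["items"])
        next_url = page.get("next")
        page = fetch_page(next_url) if next_url else None
    return items


# Stand-in pages (hypothetical data) keyed by a fake "URL":
fake_pages = {"page-2": {"items": ["c"], "next": None}}
first = {"items": ["a", "b"], "next": "page-2"}
all_items = fetch_all_items(first, fake_pages.__getitem__)
```

For the real playlist, `fetch_all_items(playlist, lambda u: requests.get(u, headers=headers).json())` would return a list that can be used in place of `playlist["items"]`.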

In [97]:
playlist["items"][0]

{'added_at': '2020-10-11T08:39:57Z',
 'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/niakha'},
  'href': 'https://api.spotify.com/v1/users/niakha',
  'id': 'niakha',
  'type': 'user',
  'uri': 'spotify:user:niakha'},
 'is_local': False,
 'primary_color': None,
 'track': {'album': {'album_type': 'album',
   'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/4Z8W4fKeB5YxbusRsdQVPb'},
     'href': 'https://api.spotify.com/v1/artists/4Z8W4fKeB5YxbusRsdQVPb',
     'id': '4Z8W4fKeB5YxbusRsdQVPb',
     'name': 'Radiohead',
     'type': 'artist',
     'uri': 'spotify:artist:4Z8W4fKeB5YxbusRsdQVPb'}],
   'available_markets': ['AD',
    'AE',
    'AL',
    'AR',
    'AT',
    'AU',
    'BA',
    'BE',
    'BG',
    'BH',
    'BO',
    'BR',
    'BY',
    'CA',
    'CH',
    'CL',
    'CO',
    'CR',
    'CY',
    'CZ',
    'DE',
    'DK',
    'DO',
    'DZ',
    'EC',
    'EE',
    'EG',
    'ES',
    'FI',
    'FR',
    'GB',
    'GR',
    'GT',

In [98]:
playlist["items"][0]["track"].keys()

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'episode', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track', 'track_number', 'type', 'uri'])

<div class="alert alert-info"><b>Task </b>Write the code to extract data about all the tracks included in your chosen playlist and save them into a pandas DataFrame object under the name <i>df</i>. The DataFrame should include the <i>album</i>, <i>artists</i>, <i>disc_number</i>, <i>duration_ms</i>, <i>explicit</i>, <i>name</i>, <i>popularity</i>, <i>release_date</i>, <i>track_number</i>, <i>uri</i>, <i>danceability</i>, <i>energy</i>, <i>key</i>, <i>loudness</i>, <i>mode</i>, <i>speechiness</i>, <i>acousticness</i>, <i>instrumentalness</i>, <i>liveness</i>, <i>valence</i> and <i>tempo</i> of every song. Use these same names as column names. In addition, your DataFrame should also include the total number of <i>followers</i>, the first listed <i>genre</i> and the <i>popularity</i> for the artists of each of the tracks. Store these data in columns called 'followers', 'genres' and 'artist_popularity'. The columns of your DataFrame should be ordered alphabetically. Use default index values.</div>

In [101]:
name=[]
album=[]
popularity=[]
artists=[]
disc_number=[]
duration_ms=[]
explicit=[]
track_number=[]
uri=[]
release_date=[]
for i in playlist["items"]:
    release_date.append(i["track"]["album"]["release_date"])
    uri.append(i["track"]["uri"])
    track_number.append(i["track"]["track_number"])
    explicit.append(i["track"]["explicit"])
    duration_ms.append(i["track"]["duration_ms"])
    disc_number.append(i["track"]["disc_number"])
    artists.append(i["track"]["artists"][0]["name"])
    popularity.append(i["track"]["popularity"])
    album.append(i["track"]["album"]["name"])
    name.append(i["track"]["name"])
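An alternative to keeping ten parallel lists in step is to build one dictionary per track and collect those. The sketch below extracts the same fields as the loop above; the function name is hypothetical, and the sample item is a trimmed-down stand-in that reuses the values shown for "Creep" in the final table.

```python
def track_record(item):
    """Flatten one entry of playlist['items'] into a dict of the wanted fields."""
    track = item["track"]
    return {
        "album": track["album"]["name"],
        "artists": track["artists"][0]["name"],  # first listed artist only
        "disc_number": track["disc_number"],
        "duration_ms": track["duration_ms"],
        "explicit": track["explicit"],
        "name": track["name"],
        "popularity": track["popularity"],
        "release_date": track["album"]["release_date"],
        "track_number": track["track_number"],
        "uri": track["uri"],
    }


# Trimmed-down stand-in for one playlist item:
sample_item = {"track": {
    "album": {"name": "Pablo Honey", "release_date": "1993-02-22"},
    "artists": [{"name": "Radiohead"}],
    "disc_number": 1, "duration_ms": 238640, "explicit": True,
    "name": "Creep", "popularity": 82, "track_number": 2,
    "uri": "spotify:track:6b2oQwSGFkzsMtQruIWm2p",
}}
record = track_record(sample_item)
```

A list such as `[track_record(i) for i in playlist["items"]]` can be passed directly to `pd.DataFrame`, which uses the dictionary keys as column names.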

**Retrieving the information for the artists**

In [102]:
# ex: playlist["items"][0]["track"]["artists"][0]["id"]
id_artist=[]
for i in playlist["items"]:
    id_artist.append(i["track"]["artists"][0]["id"]) #obtaining the id of each one of the artists

In [103]:
url_artist=base_url+"artists/"
artist_id={'ids': id_artist[0]}
artist_data=requests.get(url_artist,params=artist_id, headers=headers).json()
print(artist_data)
print(artist_data["artists"][0].keys())
print(artist_data["artists"][0]["followers"]["total"])
print(artist_data["artists"][0]["genres"][0])
print(artist_data["artists"][0]["popularity"])
#just to know how to find the right elements

{'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/4Z8W4fKeB5YxbusRsdQVPb'}, 'followers': {'href': None, 'total': 5599419}, 'genres': ['alternative rock', 'art rock', 'melancholia', 'oxford indie', 'permanent wave', 'rock'], 'href': 'https://api.spotify.com/v1/artists/4Z8W4fKeB5YxbusRsdQVPb', 'id': '4Z8W4fKeB5YxbusRsdQVPb', 'images': [{'height': 640, 'url': 'https://i.scdn.co/image/afcd616e1ef2d2786f47b3b4a8a6aeea24a72adc', 'width': 640}, {'height': 320, 'url': 'https://i.scdn.co/image/563754af10b3d9f9f62a3458e699f58c4a02870f', 'width': 320}, {'height': 160, 'url': 'https://i.scdn.co/image/4067ea225d8b42fa6951857d3af27dd07d60f3c6', 'width': 160}], 'name': 'Radiohead', 'popularity': 82, 'type': 'artist', 'uri': 'spotify:artist:4Z8W4fKeB5YxbusRsdQVPb'}]}
dict_keys(['external_urls', 'followers', 'genres', 'href', 'id', 'images', 'name', 'popularity', 'type', 'uri'])
5599419
alternative rock
82


In [106]:
followers=[]
genres=[]
artist_popularity=[]
url_artist=base_url+"artists/"
for i in id_artist:
    art_temp=requests.get(url_artist,params={"ids":i}, headers=headers).json()
    followers.append(art_temp["artists"][0]["followers"]["total"])
    genres.append(art_temp["artists"][0]["genres"][0])
    artist_popularity.append(art_temp["artists"][0]["popularity"])
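The loop above sends one request per track, even when the same artist appears several times, and `["genres"][0]` would raise an `IndexError` for an artist with an empty genre list. The helpers below sketch one way around both points. They are hypothetical, and they assume that the `ids` parameter accepts several comma-separated IDs per call, up to some maximum that should be checked in the documentation (a batch size of 2 is used here purely for illustration).

```python
def unique_in_order(values):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(values))


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def first_genre(artist):
    """Return the first listed genre, or None when the list is empty."""
    genres = artist.get("genres") or []
    return genres[0] if genres else None


artist_ids = ["a", "b", "a", "c"]                       # hypothetical artist IDs
batches = chunked(unique_in_order(artist_ids), 2)       # at most 2 IDs per batch
query_values = [",".join(batch) for batch in batches]   # one 'ids' value per request
```

Each element of `query_values` would be sent as `params={"ids": value}`; the returned artists can then be stored in a dictionary keyed by artist ID and looked up once per track.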

**Retrieving the information for each track**

In [107]:
id_song=[]
for i in playlist["items"]:
    id_song.append(i["track"]["id"]) #Obtaining the id of each one of the songs

In [108]:
#Personal notes
url_track=base_url+endpoint
trial={'ids': id_song} #It did not work as expected: it only retrieved the information of the first song, presumably because a list is sent as repeated 'ids' parameters instead of one comma-separated value
track_data=requests.get(url_track,params=trial, headers=headers).json()
track_data

{'audio_features': [{'acousticness': 0.0102,
   'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6b2oQwSGFkzsMtQruIWm2p',
   'danceability': 0.515,
   'duration_ms': 238640,
   'energy': 0.43,
   'id': '6b2oQwSGFkzsMtQruIWm2p',
   'instrumentalness': 0.000141,
   'key': 7,
   'liveness': 0.129,
   'loudness': -9.935,
   'mode': 1,
   'speechiness': 0.0369,
   'tempo': 91.841,
   'time_signature': 4,
   'track_href': 'https://api.spotify.com/v1/tracks/6b2oQwSGFkzsMtQruIWm2p',
   'type': 'audio_features',
   'uri': 'spotify:track:6b2oQwSGFkzsMtQruIWm2p',
   'valence': 0.104}]}
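A likely explanation for getting only one song back lies in how a list is turned into a query string. To my understanding, `requests` encodes a list value as one `ids=...` pair per element, which is the same behaviour as `urlencode(..., doseq=True)` from the standard library, while the endpoint appears to expect a single comma-separated value. The sketch below only shows the two encodings side by side; it makes no request.

```python
from urllib.parse import urlencode

ids = ["6b2oQwSGFkzsMtQruIWm2p", "4VbV8Zyjuu1qz0QteX1wVC"]

# A list value is sent as one 'ids' parameter per element...
as_list = urlencode({"ids": ids}, doseq=True)

# ...whereas a joined string is sent as a single comma-separated parameter
# (the comma is percent-encoded as %2C).
as_string = urlencode({"ids": ",".join(ids)})
```

Under that assumption, `params={'ids': ','.join(id_song)}` would be the value to try, subject to whatever cap on IDs per call the documentation states.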

In [109]:
#danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence and tempo of every song.
acousticness=[] 
danceability=[]
energy=[]
key=[]
loudness=[]
mode=[]
speechiness=[]
instrumentalness=[]
liveness=[]
valence=[]
tempo=[]
url_track=base_url+endpoint
for i in id_song:
    track_temp=requests.get(url_track,params={'ids': i}, headers=headers).json()
    acousticness.append(track_temp["audio_features"][0]["acousticness"])
    danceability.append(track_temp["audio_features"][0]["danceability"])
    energy.append(track_temp["audio_features"][0]["energy"])
    key.append(track_temp["audio_features"][0]["key"])
    loudness.append(track_temp["audio_features"][0]["loudness"])
    mode.append(track_temp["audio_features"][0]["mode"])
    speechiness.append(track_temp["audio_features"][0]["speechiness"])
    instrumentalness.append(track_temp["audio_features"][0]["instrumentalness"])
    liveness.append(track_temp["audio_features"][0]["liveness"])
    valence.append(track_temp["audio_features"][0]["valence"])
    tempo.append(track_temp["audio_features"][0]["tempo"])
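The eleven `append` calls can be condensed by listing the wanted feature names once and picking them out of each entry. The sketch below does that for a whole `audio_features` list. It also assumes, as a precaution, that an entry may be `None` when no features are available for a track; that assumption and the function name are not taken from the original notebook.

```python
FEATURES = ["acousticness", "danceability", "energy", "key", "loudness", "mode",
            "speechiness", "instrumentalness", "liveness", "valence", "tempo"]


def feature_rows(audio_features):
    """Keep only the wanted features from each entry of 'audio_features'.

    Entries that are None (assumed to mean "no features available") become
    rows filled with None, so the row count still matches the track count.
    """
    rows = []
    for entry in audio_features:
        entry = entry or {}
        rows.append({name: entry.get(name) for name in FEATURES})
    return rows


# Hypothetical response: one partial entry and one missing entry.
sample_features = [{"acousticness": 0.0102, "danceability": 0.515,
                    "tempo": 91.841, "type": "audio_features"}, None]
rows = feature_rows(sample_features)
```

Like the per-track dictionaries, `rows` can be handed to `pd.DataFrame` and joined column-wise with the other data.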

**Creating the table**

In [110]:
import pandas as pd

In [111]:
df = pd.DataFrame(data={"acousticness":acousticness,
                        "danceability":danceability,
                        "energy":energy,
                        "key":key,
                        "loudness":loudness,
                        "mode":mode,
                        "speechiness":speechiness,
                        "instrumentalness":instrumentalness,
                        "liveness":liveness,
                        "valence":valence,
                        "tempo":tempo,
                        "followers":followers,
                        "genres":genres,
                        "artist_popularity":artist_popularity,
                        "album":album,
                        "artists":artists,
                        "disc_number":disc_number,
                        "duration_ms": duration_ms,
                        "explicit": explicit,
                        "popularity":popularity,
                        "release_date":release_date,
                        "track_number":track_number,
                        "uri":uri,
                        "name":name
                        })
df=df[sorted(df.columns)]
df

| | acousticness | album | artist_popularity | artists | danceability | disc_number | duration_ms | energy | explicit | followers | genres | instrumentalness | key | liveness | loudness | mode | name | popularity | release_date | speechiness | tempo | track_number | uri | valence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.010200 | Pablo Honey | 82 | Radiohead | 0.515 | 1 | 238640 | 0.430 | True | 5599419 | alternative rock | 0.000141 | 7 | 0.1290 | -9.935 | 1 | Creep | 82 | 1993-02-22 | 0.0369 | 91.841 | 2 | spotify:track:6b2oQwSGFkzsMtQruIWm2p | 0.104 |
| 1 | 0.046900 | The Eraser | 64 | Thom Yorke | 0.613 | 1 | 289826 | 0.791 | True | 628104 | art pop | 0.708000 | 0 | 0.1280 | -7.293 | 1 | Black Swan | 57 | 2006-07-10 | 0.0336 | 101.066 | 4 | spotify:track:4VbV8Zyjuu1qz0QteX1wVC | 0.509 |
| 2 | 0.327000 | New Energy | 66 | Four Tet | 0.551 | 1 | 252255 | 0.469 | False | 478960 | alternative dance | 0.161000 | 2 | 0.0939 | -8.393 | 1 | Two Thousand and Seventeen | 63 | 2017-09-29 | 0.0296 | 75.495 | 2 | spotify:track:2ZIaH69kaz55RM4Pjx6KXl | 0.498 |
| 3 | 0.071800 | The Bends | 82 | Radiohead | 0.418 | 1 | 257480 | 0.383 | False | 5599419 | alternative rock | 0.017700 | 4 | 0.0896 | -11.782 | 1 | High And Dry | 73 | 1995-03-28 | 0.0257 | 87.773 | 3 | spotify:track:5jafMI8FLibnjkYTZ33m0c | 0.352 |
| 4 | 0.062600 | OK Computer | 82 | Radiohead | 0.360 | 1 | 264066 | 0.505 | False | 5599419 | alternative rock | 0.000092 | 7 | 0.1720 | -9.129 | 1 | Karma Police | 74 | 1997-05-28 | 0.0260 | 74.807 | 6 | spotify:track:3SVAN3BRByDmHOhKyIDxfC | 0.317 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 0.792000 | Conor Oberst | 57 | Conor Oberst | 0.631 | 1 | 244173 | 0.566 | False | 129888 | indie folk | 0.000023 | 9 | 0.1080 | -10.194 | 1 | Cape Canaveral | 25 | 2008-01-01 | 0.0473 | 126.166 | 1 | spotify:track:5RlSc9SkmmV1Ma4Hd7TQYZ | 0.631 |
| 96 | 0.808000 | Around The Well | 71 | Iron & Wine | 0.610 | 1 | 251280 | 0.275 | False | 915127 | acoustic pop | 0.913000 | 6 | 0.4030 | -10.644 | 1 | Such Great Heights | 0 | 2009-05-19 | 0.0280 | 94.088 | 11 | spotify:track:7vcuTZAFyu0Z5dgMRLR0h0 | 0.552 |
| 97 | 0.000226 | Places | 49 | Lou Doillon | 0.610 | 1 | 244280 | 0.645 | False | 59347 | french indie pop | 0.000068 | 7 | 0.1100 | -8.321 | 0 | Devil Or Angel | 37 | 2012-01-01 | 0.0280 | 95.891 | 2 | spotify:track:2B0vzJSvp2Y0Bx8HUN4jyT | 0.284 |
| 98 | 0.367000 | Moondance | 78 | Van Morrison | 0.608 | 1 | 205613 | 0.524 | False | 1835548 | classic rock | 0.002540 | 8 | 0.1150 | -10.266 | 1 | Into the Mystic - 2013 Remaster | 74 | 1970-02 | 0.0309 | 86.204 | 5 | spotify:track:3lh3iiiJeiBXHSZw6u0kh6 | 0.797 |


## Saving the data


In [112]:
df.to_csv('spotify.csv', sep=',')
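By default, `to_csv` also writes the index as a first column without a header, which shows up as an extra column (pandas names it `Unnamed: 0`) when the file is read back with `pd.read_csv`. Whether that matters depends on how the file is used later. The sketch below compares both variants on a small stand-in DataFrame, writing to in-memory buffers instead of to disk.

```python
import io

import pandas as pd

# Small stand-in for df, reusing two rows' worth of values from the table.
demo = pd.DataFrame({"name": ["Creep", "Black Swan"], "popularity": [82, 57]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
reloaded = pd.read_csv(without_index)
```

Applied to the real data, this would read `df.to_csv('spotify.csv', index=False)`; alternatively, the file can be kept as it is and loaded with `pd.read_csv('spotify.csv', index_col=0)`.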