# Assignment 1

This notebook contains a set of exercises that will guide you through the different steps of this assignment. The aim of this assignment is to create and save a dataset containing information about every song in a given playlist by requesting data from Spotify's API. You will then use this dataset during the Artifical Intelligence I course to train a predictive model.

<div class="alert alert-danger"><b>Submission deadline:</b> Thursday, November 12, 20:00</div>

Read carefully the following instructions before starting the exercises.

### Instructions

- Write your code in the dedicated cells. You can use as many cells as you like. Just make sure to include all the necessary code before the corresponding test.

- This notebook is automatically graded. This means that there are several cells embedded into the notebook that take care of checking your code and grading it. It also means that it is important **to follow the instructions for each of the exercises** to make sure that yo do everything right.

- Checks use the ```nose``` library to assert different conditions.

- The tests for the introductory exercises will be open for you to see. This will help you understand how the pipeline works and check that you got the basics right. You can run these checks as many times as you want, **as long as you don't modify them**.  

- The tests for the remaining exercises will remain hidden. It is important that you **do not write any code, nor do you remove the cells left in blank** for this purpose. 

Before moving on, please run the following cell. You only need run it once in order to install the ```nose``` library.

In [1]:
pip install nose



## Getting client credentials

Spotify's API uses OAuth as an Authentication scheme. Hence, before starting to make requests, you need to get your client credentials to the Spotify API. 

![Dashboard](https://www.dropbox.com/s/cpfepk5fbq6ic5a/dashboard.png?raw=1)


To do so, you need to have a Spotify account (free or paid). If you don't have one yet, please create a free account before moving on. Once you do, head over to Spotify for Developers and open your [Dashboard](https://developer.spotify.com/dashboard/) and log in with your account. 

<img src="https://www.dropbox.com/s/afubgs4ar99uh80/app.png?raw=1" width="300">

Click on “CREATE AN APP”, choose a name and description for your project and work your way through the checkboxes. Don't worry about the actual name and description. The only thing we are interested in is getting the credentials.

![Credentials](https://www.dropbox.com/s/3mmxxeet61nha4l/credentials.png?raw=1)

Once your App has been created, you should see a “Client ID” and “Client Secret” on the left-hand side. These numbers correspond to your client credentials.

<div class="alert alert-info"><b>Exercise </b>Create two new variables, <i>client_id</i> and <i>client_secret</i>, that store your ID and Key, respectively</div>

In [2]:
client_id = 'bdc731dd8ead4573b05e46dc4557c4d7'
client_secret = '31d2248f78ab48e3a5a210873d206a25'

Great! We are good to go. Next step is getting an access token.

## Getting an access token

In order to access the various endpoints of the Spotify API, we need to pass an access token. 

To get one, we need to pass a ```POST``` request with our client credentials. This request will create a token resource in the server and respond back with it. We can build this ```POST``` request using ```requests``` library. remember that this library contains all the different methods available when interacting with an API. You can build the ```POST``` request by running the following cell:

In [3]:
import requests

# URL for token resource
auth_url = 'https://accounts.spotify.com/api/token'

# request body
params = {'grant_type': 'client_credentials',
          'client_id': client_id,
          'client_secret': client_secret}

# POST the request
auth_response = requests.post(auth_url, params)

Retrieve the body of the response in JSON format and store your token in a new variable called *access_token* by running the cell below. Take note of the different steps.

In [4]:
# convert the response to JSON
auth_response_data = auth_response.json()

# save the access token to a new variable
access_token = auth_response_data['access_token']

access_token

'BQA7wA7IvSzaZ6ggPe0oIXc8Q_zMAeuIDjRAa36QS55s_wBcO5ot0G1KpSlkVLwzJdo2vVpO6I87LT2MuiA'

Once you do, you can take a look at how your token looks like. You'll see that it is just a long alphanumeric string, like the client credentials you obtained in the previous step. This is your golden ticket to access Spotify's API. A copy of this string is now stored in the server, so that everytime to the API the server will check that the token you provide and the one it has in store match.

## Poking around

This API provides numerous endpoints to access things like album listings, artist information, playlists, even Spotify-generated audio analysis of individual tracks, which include their time signature or measurements such as their “danceability” or "loudness". You can take a look at all the information available by reading the [Docs](https://developer.spotify.com/documentation/web-api/reference/). In this assignment you will have to use different endpoints.

Independently of the specific data you want to retrieve, you need to send a properly formed GET request to the API server. As you should know by now, this request is composed of different elements. 

<img src="https://www.dropbox.com/s/hgb02k4h1mtdv22/header.png?raw=1" width="500">

As opposed to NASA's API, where we provided our API Key as part of the request body, Spotify's API expects you to include your access token in the requests header. There is a specific header called 'Authorization' for this purpose. Providing this information is sometimes tricky. Hence, the header has already been formatted for you. Just run the following cell to save save the header in a new variable so that you can use it later on.

In [5]:
headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}

In order to get a feel of how the API works, we will begin by making a ```GET``` requests to the ```audio-features``` endpoint to extract data for a specific track. In particular, let's retrieve all the information for Radiohead's **Creep**. 

The first thing you need is to identify the appropriate URL or path to direct your request to. The urls for all Spotify API endpoints follow the same structure. They all use the base URL for the API and are then defined as a concatenication of ```base_url + endpoint```. Sometimes, you will also need to provide some additional information as part of the path. In the case of ```audio-features```, it is enough with just the ```base_url``` and the ```endpoint``` name.

The ```base_url``` is defined below:

In [6]:
base_url = 'https://api.spotify.com/v1/'

And the endpoint for this case is, as we said, defined as:

In [7]:
endpoint = 'audio-features'

Hence, we can define the url as follows:

In [8]:
url = base_url + endpoint
url

'https://api.spotify.com/v1/audio-features'

Next thing we need is to fill in the request body. If you check the documentation you'll see that the ```audio-features``` endpoint takes the following query parameters.

<img src="https://www.dropbox.com/s/s4zs6wlue0u16cu/body.png?raw=1" width="500">

Hence, the final thing you need to extract data about Radiohead's Creep song is to locate its ```id```. This is its unique identifier. Spotify has unique ids for tracks, for artists, for albums, for playlists, etc.

![Creep](https://www.dropbox.com/s/kufj6ww2yn069gb/creep.png?raw=1)

You can get the ```id``` for any song by going to Spotify, looking for the song, clicking the “…” by the song name, then “Share” and then “Copy Spotify URI”. 

<div class="alert alert-warning">Note that this procedure also works for retrieving ids for artists, albums or any other resource type.</div>

This URI should be a string that includes something like **spotify:track:**, followed by an alphanumeric sequence. This sequence is the ID you are looking for.

<div class="alert alert-info"><b>Exercise </b>Create a new variable called <i>track_id</i> that stores the ID for Radiohead's song Creep.</div>

In [9]:
track_id = '6b2oQwSGFkzsMtQruIWm2p'

Now that we have the id, let's format the body of our request. As we did for NASA's API, we'll provide the body in dictionary form using a variable called *params*. Remember that the keys of this dictionary should correspond to the different query parameters defined in the documentation.

In [10]:
params = {'ids': [track_id]}

Now that everything is ready, you can run the actual GET request to retrieve the data.

<div class="alert alert-info"><b>Exercise </b>Write the code to make your get request using the requests library. When doing so, remember to pass the <i>url</i>, the <i>headers</i> and the <i>params</i> dictionary as arguments to the <i>get</i> functions. Store the response in a new variable called <i>creep</i>.</div>

In [11]:
creep = requests.get (url, headers = headers, params = params)
creep

<Response [200]>

<div class="alert alert-warning">If you leave your notebook open for too long, your token might expire. When this happens, you will get an error {'error': {'status': 401, 'message': 'The access token expired'}} when making your request to the server. No worries. Just renew your token by executing the corresponding cell again and you should be good to go</div>

Finally, let's convert the response to JSON format to be able to manipulate it with greater ease.

<div class="alert alert-info"><b>Exercise </b>Write the code to convert the response to JSON format. Keep the name <i>creep</i>.</div>

In [12]:
creep = creep.json()

You can run the following cell to check if you obtained the right answer. If you get no error when running the cell it means that you did right. Otherwise, revise your code to ensure you get no error. You can run this cell as many times as you want, just **remember not to modify it**

In [13]:
from nose.tools import assert_equal
assert_equal(creep, {'audio_features': [{'danceability': 0.515,
                                         'energy': 0.43,
                                         'key': 7,
                                         'loudness': -9.935,
                                         'mode': 1,
                                         'speechiness': 0.0369,
                                         'acousticness': 0.0102,
                                         'instrumentalness': 0.000141,
                                         'liveness': 0.129,
                                         'valence': 0.104,
                                         'tempo': 91.841,
                                         'type': 'audio_features',
                                         'id': '6b2oQwSGFkzsMtQruIWm2p',
                                         'uri': 'spotify:track:6b2oQwSGFkzsMtQruIWm2p',
                                         'track_href': 'https://api.spotify.com/v1/tracks/6b2oQwSGFkzsMtQruIWm2p',
                                         'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6b2oQwSGFkzsMtQruIWm2p',
                                         'duration_ms': 238640,
                                         'time_signature': 4}]})

Congrats! You just made your first successful request to Spotify's API! 

Feel free to take a look at the information stored in the response. Pay special attention to the way in which the different information is stored. Remember that the JSON format works basically as a dictionary, so may want to check the slides on dictionaries before goign further, to make sure you have them under control.

Once you are done, let's move on to some actual work!

## Getting data from a playlist

In the following exercise you will build a dataset containing data about different songs. You can either use a playlist of your own, or use the one we have created for this purpose. You can find our playlist in the following [link](https://open.spotify.com/playlist/4NVeFUEHBybfh3ITNG1b8n?si=js9BKt5aTOiCWMm_Cx4Vvg). If you prefer not to use this, feel free to complete the exercise with a playlist of your choosing. 

<div class="alert alert-info"><b>Exercise </b>Create a variable called <i>playlist_id</i> that stores the id of your playlist of choice.</div>

In [14]:
playlist_id = '33poPcRzf9sSmNJO2qECov'

Next step will be making a request to extract all the data bout your chosen playlist. Remember that you can take a look at all the information available at the different endpoints in Spotify's API by reading the [Docs](https://developer.spotify.com/documentation/web-api/reference/).

Locate the right endpoint for your query and read the Docs to find out how you should build your request. Don't use any of the optional arguments.

<div class="alert alert-info"><b>Exercise </b>Write the code to retrieve all the information about the chosen playlist in JSON form. Store the response in a new variable called <i>playlist</i>. When building your request, don't provide any information in the body. </div>

In [15]:
endpoint2 = 'playlists/33poPcRzf9sSmNJO2qECov/tracks'
url2 = base_url + endpoint2
params = {'ids': [playlist_id]}
playlist = requests.get(url2,headers=headers,params=params)
playlist = playlist.json()

<div class="alert alert-warning">By default, Spotify's API only returns information about a maximum of 100 tracks in a playlist. There are ways to overcome this and I'll be happy to explain if you are interested. For now, though, let's keep it just like that. So, if your playlist of choice has more that 100 tracks, you'll get info about the first 100 of them.</div>

The following cells check whether your request is correct. Please **don't write any code here**. Just leave them blank

In [16]:
# LEAVE BLANK

In [17]:
# LEAVE BLANK

Once again, take your time to familiarize yourself with the data and how they are presented.

<div class="alert alert-info"><b>Exercise </b>Write the code to extract data about all the tracks included in your chosen playlist and save them into a pandas DataFrame object under the name <i>df</i>. The DataFrame should include the <i>album</i>, <i>artists</i>, <i>disc_number</i>, <i>duration_ms</i>, <i>explicit</i>, <i>name</i>, <i>popularity</i>, <i>release_date</i>, <i>track_number</i>, <i>uri</i>, <i>danceability</i>, <i>energy</i>, <i>key</i>, <i>loudness</i>, <i>mode</i>, <i>speechness</i>, <i>acousticness</i>, <i>instrulmentallness</i>, <i>liveness</i>, <i>valence</i> and <i>tempo</i> of every song. Use these same names as column names. In addition, your DataFrame should also include the total number of <i>followers</i>, the first listed <i>genre</i> and the <i>popularity</i> for the artists of each of the tracks. Store these data in columns called 'followers', 'genres' and 'artist_popularity'. The columns of your DataFrame should be ordered alphabetically. Use default index values.</div>

You can take a look at the following image for reference. 

![Dataframe](https://www.dropbox.com/s/42exa8hn43f9nyp/dataframe.png?raw=1)

In [18]:
# Track ID: playlist['items'][0]['track']['album']['id']
track_id = []
artist_id = []
track_list = playlist['items']
iterations = range(len(playlist['items'])) 

for i in iterations:
  track_id.append(track_list[i]['track']['id'])
  artist_id.append(track_list[i]['track']['artists'][0]['id'])

track_id_css = ",".join(track_id) # Comma-separated string for track info

artist_id1 = artist_id[0:50]
artist_id2 = artist_id[50:101]

artist_id_css1 = ",".join(artist_id1) # Comma-separated string for artists info - first 50
artist_id_css2 = ",".join(artist_id2) # Comma-separated string for artists info - second 50

In [19]:
# Create lists
album = []
artist = []
disc_number = []
duration_ms = []
explicit = []
name = []
popularity = []
release_date = []
track_number = []
uri = []

for i in iterations:
  album.append(track_list[i]['track']['album']['name'])
  artist.append(track_list[i]['track']['artists'][0]['name'])
  disc_number.append(track_list[i]['track']['disc_number'])
  duration_ms.append(track_list[i]['track']['duration_ms'])
  explicit.append(track_list[i]['track']['explicit'])
  name.append(track_list[i]['track']['artists'][0]['name'])
  popularity.append(track_list[i]['track']['popularity'])
  release_date.append(track_list[i]['track']['album']['release_date'])
  track_number.append(track_list[i]['track']['track_number'])
  uri.append(track_list[i]['track']['uri'])


In [20]:
# Artists first part
followers = []
genres = []
artist_popularity = []

url3 = base_url + 'artists'
params = {'ids':[artist_id_css1]}
artist_info = requests.get(url3,headers=headers,params=params).json()
iterations = range(len(artist_info['artists']))

for i in iterations:
  followers.append(artist_info ['artists'][i]['followers']['total'])
  genres.append(artist_info['artists'][i]['genres'][0])
  artist_popularity.append(artist_info['artists'][i]['popularity'])

In [21]:
# Artists second part
followers2 = []
genres2=[]
artist_popularity2=[]

params = {'ids':[artist_id_css2]}
artist_info2 = requests.get(url3,headers=headers,params=params).json()
iterations = range(len(artist_info2['artists']))

for i in iterations:
  followers2.append(artist_info2['artists'][i]['followers']['total'])
  if artist_info2['artists'][i]['genres']:
    genres2.append(artist_info2['artists'][i]['genres'][0])
  else:
    genres2.append('N/A')
  artist_popularity2.append(artist_info2['artists'][i]['popularity'])

In [22]:
# Artists summary
followers = followers + followers2
genres = genres + genres2
artist_popularity = artist_popularity + artist_popularity2

In [23]:
# Audio features
endpoint4 = 'audio-features'
url4 = base_url + endpoint4
params = {'ids': [track_id_css]}
audio_features = requests.get(url4,headers=headers,params=params).json()

danceability = []
energy = []
key = []
loudness = []
mode = []
speechiness = []
acousticness = []
instrumentalness = []
liveness = []
valence = []
tempo = []

iterations = range(len(playlist['items'])) 
for i in iterations:
  danceability.append(audio_features ['audio_features'][i]['danceability'])
  energy.append(audio_features ['audio_features'][i]['energy'])
  key.append(audio_features ['audio_features'][i]['key'])
  loudness.append(audio_features ['audio_features'][i]['loudness'])
  mode.append(audio_features ['audio_features'][i]['mode'])
  speechiness.append(audio_features ['audio_features'][i]['speechiness'])
  acousticness.append(audio_features ['audio_features'][i]['acousticness'])
  instrumentalness.append(audio_features ['audio_features'][i]['instrumentalness'])
  liveness.append(audio_features ['audio_features'][i]['liveness'])
  valence.append(audio_features ['audio_features'][i]['valence'])
  tempo.append(audio_features ['audio_features'][i]['tempo'])

In [24]:
import pandas as pd
df = pd.DataFrame(data={'album':album,'artists':artist,'disc_number':disc_number,'duration_ms':duration_ms,'explicit':explicit,'name':name,'popularity':popularity,'release_date':release_date,'track_number':track_number,'uri':uri,'danceability':danceability,'energy':energy,'key':key,'loudness':loudness,'mode':mode,'speechiness':speechiness,'acousticness':acousticness,'instrumentalness':instrumentalness,'liveness':liveness,'valence':valence,'tempo':tempo,'followers':followers,'genres':genres,'artist_popularity':popularity})

df = df.reindex(sorted(df.columns), axis=1)
df

Unnamed: 0,acousticness,album,artist_popularity,artists,danceability,disc_number,duration_ms,energy,explicit,followers,genres,instrumentalness,key,liveness,loudness,mode,name,popularity,release_date,speechiness,tempo,track_number,uri,valence
0,0.657,Leave My Home,29,FKJ,0.778,1,221890,0.449,False,818111,filter house,0.686000,5,0.0875,-10.435,0,FKJ,29,2019-03-08,0.0792,78.974,1,spotify:track:2pCuxeqnAYxOEigrCWIwKs,0.896
1,0.393,Portraits,66,Maribou State,0.697,1,218357,0.578,False,226681,electronica,0.379000,9,0.1060,-5.213,0,Maribou State,66,2015-06-01,0.0950,114.995,8,spotify:track:6VNooTY5w9A9wg1YUsEbKB,0.300
2,0.774,French Kiwi Juice,52,FKJ,0.651,1,252016,0.414,False,818111,filter house,0.000478,2,0.1220,-5.754,0,FKJ,52,2017-03-02,0.0310,88.983,8,spotify:track:00xjXIoyDIQYSYve03VsXf,0.230
3,0.391,Movie,50,Tom Misch,0.785,1,357357,0.379,False,765436,indie soul,0.016100,7,0.2670,-11.446,1,Tom Misch,50,2017-11-02,0.0732,122.043,1,spotify:track:6pcED19DVN2Coh4tmmgHH6,0.182
4,0.676,Leftovers,65,Dennis Lloyd,0.724,1,192628,0.304,False,487838,israeli pop,0.000000,10,0.1040,-11.462,0,Dennis Lloyd,65,2017-01-15,0.2300,153.906,1,spotify:track:0dcuGoAIPAT7OrP5CGSVBA,0.305
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0.725,Small Crimes / Keep on Calling,18,Nilüfer Yanya,0.764,1,230310,0.442,False,83623,art pop,0.000660,6,0.0807,-10.548,0,Nilüfer Yanya,18,2016-11-18,0.0355,123.069,2,spotify:track:4FUrVPC15KK8OcABXprp3Z,0.106
96,0.646,South,42,Galimatias,0.694,1,215526,0.333,False,106480,alternative r&b,0.025300,0,0.1080,-11.108,0,Galimatias,42,2018-04-26,0.0651,76.013,1,spotify:track:7L5P30Eg8eta8Q9em2VteI,0.433
97,0.313,Kind of Purple,31,Tertia May,0.491,1,248137,0.494,False,14390,uk contemporary r&b,0.000957,10,0.1500,-8.653,1,Tertia May,31,2018-06-08,0.2330,75.894,1,spotify:track:45GKP2X1Vyher9AdCR2dW7,0.202
98,0.632,Work,24,The Septembers,0.786,1,226000,0.264,False,411,neo r&b,0.026000,5,0.1170,-11.784,1,The Septembers,24,2017-01-20,0.0490,80.002,1,spotify:track:7KuuWuv2vJPonC12dHtATG,0.298


The following cells check whether your code is correct. Please **don't write any code here**. Just leave them as they are.

In [25]:
# LEAVE BLANK

In [26]:
# LEAVE BLANK

In [27]:
# LEAVE BLANK

In [28]:
# LEAVE BLANK

In [29]:
# LEAVE BLANK

## Saving the data

The final step is to save the DataFrame to a csv file in order to be able to load it into your BigML account. You can do so using DataFrame's ```to_csv``` method. This method takes your DataFrame and saves it to a file of your choosing in the same directoy where your notebook is located.

Run the following cell to save your DataFrame to a .csv file called 'spotify.csv'.

In [30]:
df.to_csv('spotify.csv', sep=',')

<div class="alert alert-danger"><b>Disclaimer: </b>This is a graded assignment. You can share your doubts in the course Forum and give a hand to your classmates with theirs. Yet, remember that posting explicit solutions to any of the exercises is strictly forbidden.</div>