# Final Project - Data in Hand
### Kaitlin Cochran

For my final project, I want to explore whether the audio features Spotify computes for each track (e.g., danceability and key) have an impact on how popular the track is. To do this, I pulled the songs from the Top Songs USA playlist on Spotify. I may need to pull in other playlists to get a larger, more reliable sample, but I decided to start small. I will also need to pick a relatively new song to compare against this data. I will choose it at the Analysis Plan or Preliminary Draft stage so that the data is as current as possible, and for the same reason I will probably rerun the code below then to capture the playlist's current songs. Below is the code I used to query the Tracks API (including its audio features endpoint) and the Playlist API from Spotify.

Import statements for the libraries I will be using.

In [1]:
import pandas as pd
import requests
import base64
import json

Reading in the client ID and client secret for my new application on the Spotify developer website.

In [2]:
tokens = pd.read_csv('keys.txt', sep = '\t', header = None).T
tokens.columns = tokens.iloc[0]

Dropping the first row, which now duplicates the column names, so that only the credential values remain for the functions below.

In [5]:
tokens = tokens[1:]
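The transpose-and-drop steps above work, but assuming keys.txt holds one `name<TAB>value` pair per line (which is what `sep = '\t'`, the transpose, and the `tokens['clientID']` lookups imply), the file could also be read straight into a dictionary. `read_keys` is a hypothetical helper, not something from class:

```python
import csv


def read_keys(path):
    """Return a dict like {'clientID': ..., 'clientSecret': ...}.

    Assumes a tab-separated file with one "name<TAB>value" pair per line
    (hypothetical helper, inferred from how keys.txt is read above).
    """
    with open(path, newline="") as f:
        # Skip blank lines (e.g. a trailing newline) so every row has two fields
        return {row[0]: row[1] for row in csv.reader(f, delimiter="\t") if row}


# Usage, with the same lookups as above but no transpose:
# keys = read_keys("keys.txt")
# keys["clientID"], keys["clientSecret"]
```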

A function we made together in class that builds the Basic authorization header (the base64-encoded client_id:client_secret pair) used to request an access token.

In [7]:
def start_session_headers(client_id, client_secret):
    client_cred = base64.b64encode(str(client_id + ":" + client_secret).encode('ascii'))
    header = {'Authorization': 'Basic %s' % (client_cred.decode('ascii'))}
    return header
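To make the encoding in `start_session_headers` concrete, here is the same header built for a pair of dummy credentials (`abc` and `xyz` are placeholders, not real keys). It also shows that base64 is an encoding rather than encryption, so the header is as sensitive as the client secret itself.

```python
import base64

# Dummy credentials for illustration only (not real Spotify keys)
client_id, client_secret = "abc", "xyz"

# Same steps as start_session_headers: join with ":", base64-encode, prefix "Basic "
encoded = base64.b64encode(f"{client_id}:{client_secret}".encode("ascii")).decode("ascii")
header = {"Authorization": f"Basic {encoded}"}
print(header)  # {'Authorization': 'Basic YWJjOnh5eg=='}

# Decoding recovers the original pair, so the header must be kept private
print(base64.b64decode(encoded).decode("ascii"))  # abc:xyz
```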

A function we made together in class that sends a POST request with this header to the token URL, using the client credentials grant, and returns the JSON response.

In [8]:
def connect_to_client_cred(url, header):
    payload = {'grant_type' : 'client_credentials'}
    response = requests.request("POST", url, headers = header, data = payload)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

A function we made together in class that combines the two functions above to get an access token, which is needed to query specific Spotify APIs.

In [9]:
def get_access_token(client_id, client_secret):
    url = "https://accounts.spotify.com/api/token"
    header = start_session_headers(client_id, client_secret)
    json_response = connect_to_client_cred(url, header)
    return json_response

Acquiring the session token.

In [10]:
session_token = get_access_token(tokens['clientID'].iloc[0], tokens['clientSecret'].iloc[0])

200
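Besides `access_token`, the JSON in `session_token` carries `expires_in`, which Spotify's client credentials guide documents as 3600 seconds (one hour). Since I plan to rerun this notebook later, a token that refreshes itself could avoid a failed request partway through. The class below is a hypothetical sketch; the 60-second safety margin and the 3600 fallback are my own assumptions.

```python
import time


class CachedToken:
    """Hypothetical helper: cache a client-credentials token and refresh it
    shortly before it expires.

    `fetch` is any zero-argument callable returning the token JSON, e.g.
    lambda: get_access_token(tokens['clientID'].iloc[0], tokens['clientSecret'].iloc[0])
    """

    def __init__(self, fetch, margin=60, clock=time.monotonic):
        self.fetch = fetch
        self.margin = margin   # refresh this many seconds before expiry
        self.clock = clock     # injectable so the logic can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self.clock() >= self._expires_at - self.margin:
            response = self.fetch()
            self._token = response["access_token"]
            # expires_in is in seconds; assume 3600 if the field is missing
            self._expires_at = self.clock() + response.get("expires_in", 3600)
        return self._token


# Usage: token = CachedToken(lambda: get_access_token(...)); create_header(token.get())
```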


A function we made in class that creates a header to query Spotify APIs.

In [17]:
def create_header(session_token):
    headers = {"Authorization": "Bearer {}".format(session_token)}
    return headers

A function we made in class that sends a GET request to a given Spotify endpoint and returns the JSON response.

In [18]:
def connect_to_endpoint(url, headers):
    response = requests.request("GET", url, headers = headers)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()
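`connect_to_endpoint` raises on any non-200 status. When many requests are sent in a row, as in the audio features loop later in this notebook, Spotify may answer 429 (Too Many Requests) with a `Retry-After` header giving the number of seconds to wait. A sketch of a retrying variant, assuming `send` behaves like `requests.get`; `get_with_retry` and its default retry count are hypothetical:

```python
import time


def get_with_retry(send, url, headers, max_retries=3, sleep=time.sleep):
    """Sketch: GET a Spotify endpoint, waiting and retrying on 429 responses.

    `send` is assumed to behave like requests.get(url, headers=...), returning
    an object with .status_code, .headers, .text and .json().
    """
    for attempt in range(max_retries + 1):
        response = send(url, headers=headers)
        if response.status_code == 429 and attempt < max_retries:
            # Retry-After is the wait in seconds; assume 1 if it is absent
            sleep(int(response.headers.get("Retry-After", 1)))
            continue
        if response.status_code != 200:
            raise Exception(response.status_code, response.text)
        return response.json()
```

In this notebook it might be called as `get_with_retry(requests.get, url, create_header(session_token['access_token']))`.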

A function I created to get the tracks on a Spotify playlist from its playlist id, which can be found on Spotify.

In [26]:
def get_a_playlist(playlist_id):
    url = 'https://api.spotify.com/v1/playlists/{}/tracks'.format(playlist_id)
    header = create_header(session_token['access_token'])
    json_response = connect_to_endpoint(url, header)
    return json_response

A function we created in class to get information about a track on Spotify through the track id, found on Spotify.

In [53]:
def get_a_track(track_id):
    url = 'https://api.spotify.com/v1/tracks/{}'.format(track_id)
    header = create_header(session_token['access_token'])
    json_response = connect_to_endpoint(url, header)
    return json_response

A function we created in class to get the audio features Spotify computes for a track (danceability, energy, etc.) from its track id, which can be found on Spotify.

In [54]:
def track_audio_features(track_id):
    url = 'https://api.spotify.com/v1/audio-features/{}'.format(track_id)
    header = create_header(session_token['access_token'])
    json_response = connect_to_endpoint(url, header)
    return json_response

Querying the Playlist API using the playlist id for the Top Songs USA playlist.

In [27]:
playlist = get_a_playlist('37i9dQZEVXbLp5XoPON0wI')

200


Looking at the top-level keys of the response to decide which fields to use for a helpful Data Frame.

In [48]:
playlist.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])
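The `limit`, `next`, and `total` keys show that this response is one page of results: a single request returns at most `limit` tracks, and `next` holds the URL of the following page (or None on the last one). That does not matter if the whole playlist fits in one page, but it could if larger playlists are added later. A sketch that follows `next` until it runs out; `get_all_playlist_items` and the `fetch` callable are hypothetical:

```python
def get_all_playlist_items(first_url, fetch):
    """Follow the `next` links of a paged Spotify response and collect every item.

    `fetch` is any callable that takes a URL and returns the parsed JSON page, e.g.
    lambda u: connect_to_endpoint(u, create_header(session_token['access_token']))
    """
    items = []
    url = first_url
    while url:  # `next` is None on the last page
        page = fetch(url)
        items.extend(page["items"])
        url = page.get("next")
    return items
```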

Getting the first item in the response.

In [None]:
playlist['items'][0]

Getting the track id for the first item in the response.

In [78]:
playlist['items'][0]['track']['id']

'46IZ0fSY2mpAiktS3KOqds'

Getting the track name for the first item in the response.

In [61]:
playlist['items'][0]['track']['name']

'Easy On Me'

Getting all track ids from the response.

In [51]:
tracks = [x['track']['id'] for x in playlist['items']]

Getting all track names from the response.

In [60]:
trackNames = [x['track']['name'] for x in playlist['items']]
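Since the project asks how audio features relate to popularity, it may be worth keeping the `popularity` field (a 0 to 100 score on Spotify's track object) next to each id and name, collected in the same pass. The sketch also skips items whose `track` is None, which I assume can occur for tracks that are no longer available. `extract_track_rows` is a hypothetical helper:

```python
def extract_track_rows(items):
    """Collect id, name and popularity from playlist items in a single pass.

    Skips entries whose `track` is missing or None (assumed to happen for
    unavailable tracks).
    """
    rows = []
    for item in items:
        track = item.get("track")
        if not track:
            continue
        rows.append({
            "id": track["id"],
            "name": track["name"],
            # 0-100 score on the track object; None if the field is absent
            "popularity": track.get("popularity"),
        })
    return rows
```

`pd.DataFrame(extract_track_rows(playlist['items']))` would then give id, name, and popularity columns ready for the merge.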

Getting audio feature information for all songs in the Top Songs USA playlist.

In [None]:
trackInfo = [track_audio_features(x) for x in tracks]
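The list comprehension above sends one request per track. Spotify also documents a batch form of the audio features endpoint, `https://api.spotify.com/v1/audio-features?ids=...`, which takes a comma-separated list of up to 100 track ids and returns them under an `audio_features` key. A sketch using it, with the chunk size and the `fetch` callable as assumptions:

```python
def chunked(ids, size=100):
    """Split a list of track ids into chunks of at most `size` ids."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def batch_audio_features(track_ids, fetch, size=100):
    """Fetch audio features for many tracks with one request per chunk of ids.

    `fetch` takes a URL and returns parsed JSON, e.g.
    lambda u: connect_to_endpoint(u, create_header(session_token['access_token']))
    """
    features = []
    for chunk in chunked(track_ids, size):
        url = "https://api.spotify.com/v1/audio-features?ids=" + ",".join(chunk)
        # The batch response wraps the list of feature objects under "audio_features"
        features.extend(fetch(url)["audio_features"])
    return features
```

If that works as documented, `pd.DataFrame(batch_audio_features(tracks, fetch))` should hold the same columns as `tracksDf`.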

Creating a Data Frame from the response from the Audio Features API.

In [57]:
tracksDf = pd.DataFrame(trackInfo)

The Data Frame from the Audio Features API.

In [58]:
tracksDf.head()

|   | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.604 | 0.366 | 5 | -7.519 | 1 | 0.0282 | 0.578 | 0.0 | 0.133 | 0.13 | 141.981 | audio_features | 46IZ0fSY2mpAiktS3KOqds | spotify:track:46IZ0fSY2mpAiktS3KOqds | https://api.spotify.com/v1/tracks/46IZ0fSY2mpA... | https://api.spotify.com/v1/audio-analysis/46IZ... | 224695 | 4 |
| 1 | 0.631 | 0.518 | 0 | -8.771 | 1 | 0.0303 | 0.274 | 0.0 | 0.088 | 0.205 | 93.023 | audio_features | 5enxwA8aAbwZbf5qCHORXi | spotify:track:5enxwA8aAbwZbf5qCHORXi | https://api.spotify.com/v1/tracks/5enxwA8aAbwZ... | https://api.spotify.com/v1/audio-analysis/5enx... | 613027 | 4 |
| 2 | 0.534 | 0.725 | 1 | -3.906 | 1 | 0.046 | 0.091 | 0.0 | 0.0264 | 0.55 | 88.098 | audio_features | 3Kkjo3cT83cw09VJyrLNwX | spotify:track:3Kkjo3cT83cw09VJyrLNwX | https://api.spotify.com/v1/tracks/3Kkjo3cT83cw... | https://api.spotify.com/v1/audio-analysis/3Kkj... | 225149 | 4 |
| 3 | 0.741 | 0.691 | 10 | -7.395 | 0 | 0.0672 | 0.0221 | 0.0 | 0.0476 | 0.892 | 150.087 | audio_features | 5Z9KJZvQzH6PFmb8SNkxuk | spotify:track:5Z9KJZvQzH6PFmb8SNkxuk | https://api.spotify.com/v1/tracks/5Z9KJZvQzH6P... | https://api.spotify.com/v1/audio-analysis/5Z9K... | 212353 | 4 |
| 4 | 0.703 | 0.68 | 9 | -6.118 | 0 | 0.0431 | 0.0171 | 0.0 | 0.178 | 0.536 | 97.985 | audio_features | 6w8ZPYdnGajyfPddTWdthN | spotify:track:6w8ZPYdnGajyfPddTWdthN | https://api.spotify.com/v1/tracks/6w8ZPYdnGajy... | https://api.spotify.com/v1/audio-analysis/6w8Z... | 210384 | 4 |


Creating a Data Frame using the track ids and names obtained earlier.

In [68]:
trackNamesDf = pd.DataFrame({'id': tracks, 'name': trackNames})

The resulting Data Frame.

In [69]:
trackNamesDf.head()

|   | id | name |
|---|---|---|
| 0 | 46IZ0fSY2mpAiktS3KOqds | Easy On Me |
| 1 | 5enxwA8aAbwZbf5qCHORXi | All Too Well (10 Minute Version) (Taylor's Ver... |
| 2 | 3Kkjo3cT83cw09VJyrLNwX | Oh My God |
| 3 | 5Z9KJZvQzH6PFmb8SNkxuk | INDUSTRY BABY (feat. Jack Harlow) |
| 4 | 6w8ZPYdnGajyfPddTWdthN | Can I Get It |


Creating the final Data Frame for this playlist by merging the Data Frame of ids and names above with the original Data Frame of audio features.

In [74]:
finalDf = pd.merge(tracksDf, trackNamesDf, on = 'id', how = 'left')

The resulting Data Frame.

In [75]:
finalDf.head()

|   | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.604 | 0.366 | 5 | -7.519 | 1 | 0.0282 | 0.578 | 0.0 | 0.133 | 0.13 | 141.981 | audio_features | 46IZ0fSY2mpAiktS3KOqds | spotify:track:46IZ0fSY2mpAiktS3KOqds | https://api.spotify.com/v1/tracks/46IZ0fSY2mpA... | https://api.spotify.com/v1/audio-analysis/46IZ... | 224695 | 4 | Easy On Me |
| 1 | 0.631 | 0.518 | 0 | -8.771 | 1 | 0.0303 | 0.274 | 0.0 | 0.088 | 0.205 | 93.023 | audio_features | 5enxwA8aAbwZbf5qCHORXi | spotify:track:5enxwA8aAbwZbf5qCHORXi | https://api.spotify.com/v1/tracks/5enxwA8aAbwZ... | https://api.spotify.com/v1/audio-analysis/5enx... | 613027 | 4 | All Too Well (10 Minute Version) (Taylor's Ver... |
| 2 | 0.534 | 0.725 | 1 | -3.906 | 1 | 0.046 | 0.091 | 0.0 | 0.0264 | 0.55 | 88.098 | audio_features | 3Kkjo3cT83cw09VJyrLNwX | spotify:track:3Kkjo3cT83cw09VJyrLNwX | https://api.spotify.com/v1/tracks/3Kkjo3cT83cw... | https://api.spotify.com/v1/audio-analysis/3Kkj... | 225149 | 4 | Oh My God |
| 3 | 0.741 | 0.691 | 10 | -7.395 | 0 | 0.0672 | 0.0221 | 0.0 | 0.0476 | 0.892 | 150.087 | audio_features | 5Z9KJZvQzH6PFmb8SNkxuk | spotify:track:5Z9KJZvQzH6PFmb8SNkxuk | https://api.spotify.com/v1/tracks/5Z9KJZvQzH6P... | https://api.spotify.com/v1/audio-analysis/5Z9K... | 212353 | 4 | INDUSTRY BABY (feat. Jack Harlow) |
| 4 | 0.703 | 0.68 | 9 | -6.118 | 0 | 0.0431 | 0.0171 | 0.0 | 0.178 | 0.536 | 97.985 | audio_features | 6w8ZPYdnGajyfPddTWdthN | spotify:track:6w8ZPYdnGajyfPddTWdthN | https://api.spotify.com/v1/tracks/6w8ZPYdnGajy... | https://api.spotify.com/v1/audio-analysis/6w8Z... | 210384 | 4 | Can I Get It |

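A left merge on `id` silently duplicates rows if the same id appears more than once on either side (for example, a track listed twice in a playlist). pandas can check for that with the `validate` argument, and `indicator=True` flags rows that found no match. A small sketch on toy frames standing in for `tracksDf` and `trackNamesDf`; the values are made up for illustration:

```python
import pandas as pd

# Toy stand-ins for tracksDf and trackNamesDf (illustrative values only)
features = pd.DataFrame({"id": ["a", "b"], "danceability": [0.6, 0.7]})
names = pd.DataFrame({"id": ["a", "b"], "name": ["Song A", "Song B"]})

# validate="one_to_one" raises pandas.errors.MergeError if either side repeats an id;
# indicator=True adds a _merge column saying whether each row found a match
checked = pd.merge(features, names, on="id", how="left",
                   validate="one_to_one", indicator=True)
unmatched = checked[checked["_merge"] != "both"]
print(len(unmatched), "tracks without a name")

# Drop the helper column once the check is done
checked = checked.drop(columns="_merge")
```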

Removing the columns I do not plan to analyze (type, uri, track_href, and analysis_url) so that only the information I intend to analyze is present.

In [79]:
finalDf = finalDf.drop(['type', 'uri', 'track_href', 'analysis_url'], axis = 1)  # drop returns a copy, so assign it back
finalDf.head()

|   | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | id | duration_ms | time_signature | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.604 | 0.366 | 5 | -7.519 | 1 | 0.0282 | 0.578 | 0.0 | 0.133 | 0.13 | 141.981 | 46IZ0fSY2mpAiktS3KOqds | 224695 | 4 | Easy On Me |
| 1 | 0.631 | 0.518 | 0 | -8.771 | 1 | 0.0303 | 0.274 | 0.0 | 0.088 | 0.205 | 93.023 | 5enxwA8aAbwZbf5qCHORXi | 613027 | 4 | All Too Well (10 Minute Version) (Taylor's Ver... |
| 2 | 0.534 | 0.725 | 1 | -3.906 | 1 | 0.046 | 0.091 | 0.0 | 0.0264 | 0.55 | 88.098 | 3Kkjo3cT83cw09VJyrLNwX | 225149 | 4 | Oh My God |
| 3 | 0.741 | 0.691 | 10 | -7.395 | 0 | 0.0672 | 0.0221 | 0.0 | 0.0476 | 0.892 | 150.087 | 5Z9KJZvQzH6PFmb8SNkxuk | 212353 | 4 | INDUSTRY BABY (feat. Jack Harlow) |
| 4 | 0.703 | 0.68 | 9 | -6.118 | 0 | 0.0431 | 0.0171 | 0.0 | 0.178 | 0.536 | 97.985 | 6w8ZPYdnGajyfPddTWdthN | 210384 | 4 | Can I Get It |


At this point, I could export this Data Frame to a CSV file. However, I would like to wait until the analysis portion of this project so that I can use the most recently released songs and the most recent version of the Top Songs USA playlist.
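When the time comes to export, a date-stamped file name would keep each pull of the playlist separate, and `index=False` keeps the row index out of the file so it does not come back as an `Unnamed: 0` column when the CSV is read in again. `export_snapshot` and its naming scheme are hypothetical:

```python
import datetime
import os


def export_snapshot(df, prefix="top_songs_usa", out_dir=".", today=None):
    """Write df to <out_dir>/<prefix>_YYYY-MM-DD.csv and return the path.

    The naming scheme is an assumption; `today` can be passed in for testing.
    """
    today = today or datetime.date.today()
    path = os.path.join(out_dir, f"{prefix}_{today.isoformat()}.csv")
    # index=False avoids an extra "Unnamed: 0" column when the file is re-read
    df.to_csv(path, index=False)
    return path
```

`export_snapshot(finalDf)` would write a file named like top_songs_usa_YYYY-MM-DD.csv to the current folder.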