![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-science-and-artificial-intelligence&branch=main&subPath=analysis/06b-getting-spotify-data.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Getting Spotify Data

This is an optional notebook to demonstrate how we can query the [Spotify API](https://developer.spotify.com/documentation/web-api) to get information about songs and store that as a [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) file for use in the [Spotify Data Analysis notebook](06-data-analysis-spotify.ipynb).

An [application programming interface](https://en.wikipedia.org/wiki/API) (API) is a set of commands that one program can use to request data or services from another system.

[![what is an API?](https://img.youtube.com/vi/s7wmiS2mSXY/0.jpg)](https://www.youtube.com/watch?v=s7wmiS2mSXY)

The [Spotify Web API](https://developer.spotify.com/documentation/web-api/) allows us to get information about songs, albums, and artists.

If you want to retrieve more data and have a [Spotify account](https://www.spotify.com/us/signup), you can:

1. Log in to the [Developers Dashboard](https://developer.spotify.com/dashboard/login).
2. From the Dashboard, click the `CREATE AN APP` button, type a name and description, and then click `CREATE`.
3. Click on your new app in the Dashboard to find the `Client ID` and `Client Secret`.
4. Paste those strings of characters into the code cell below, replacing the `PASTE_YOUR_CLIENT_ID_HERE` and `PASTE_YOUR_CLIENT_SECRET_HERE` strings. Make sure there are still quotation marks (`'`) around them.

In [None]:
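CLIENT_ID = 'PASTE_YOUR_CLIENT_ID_HERE'
CLIENT_SECRET = 'PASTE_YOUR_CLIENT_SECRET_HERE'

if CLIENT_ID != 'PASTE_YOUR_CLIENT_ID_HERE': # make sure you've pasted in your client ID and client secret
    import requests
    import pandas as pd
    from math import ceil

    def get_track_info(track_id):
        '''Return a dictionary of information about one track, or None if the request fails.'''
        try:
            r = requests.get('https://api.spotify.com/v1/tracks/' + track_id, headers=headers)
            r.raise_for_status()  # treat error responses (e.g. 401, 404) as failures
            info = r.json()
        except (requests.exceptions.RequestException, ValueError):
            print('Error with track id:', track_id)
            info = None
        return info

    def get_track_features(track_id):
        '''Return a dictionary of audio features for one track, or None if the request fails.'''
        try:
            r = requests.get('https://api.spotify.com/v1/audio-features/' + track_id, headers=headers)
            r.raise_for_status()  # treat error responses as failures
            info = r.json()
        except (requests.exceptions.RequestException, ValueError):
            print('Error with track id:', track_id)
            info = None
        return info

    try:
        # Client credentials flow: exchange the client ID and secret for an access token
        auth_response = requests.post('https://accounts.spotify.com/api/token',
                                      data={'grant_type': 'client_credentials',
                                            'client_id': CLIENT_ID,
                                            'client_secret': CLIENT_SECRET})
        auth_response.raise_for_status()
        access_token = auth_response.json()['access_token']
        # every later request sends the token in this header
        headers = {'Authorization': f'Bearer {access_token}'}
        print('Spotify API setup complete')
    except (requests.exceptions.RequestException, ValueError, KeyError):
        print('Authorization failed: check the client ID and secret you pasted into the code cell')
else:
    print('Remember to paste your client ID and secret into the code cell')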
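To see what the authorization step actually sends, without contacting Spotify, the same POST request can be built and inspected with `requests.Request(...).prepare()`. This is a minimal sketch: the client ID, secret, and token below are made-up placeholders, and nothing is sent over the network.

```python
import requests

# Made-up placeholder credentials, for illustration only
example_id = 'my-client-id'
example_secret = 'my-client-secret'

# Build (but do not send) the same POST request the authorization step makes
token_request = requests.Request(
    'POST',
    'https://accounts.spotify.com/api/token',
    data={'grant_type': 'client_credentials',
          'client_id': example_id,
          'client_secret': example_secret},
).prepare()

print(token_request.method)                   # the HTTP method
print(token_request.headers['Content-Type'])  # how the form data is encoded
print(token_request.body)                     # the form data that would be sent

# If Spotify accepts the request, it replies with JSON containing an access token,
# which each later request sends in an Authorization header like this one
example_token = 'abc123'  # made-up placeholder token
example_headers = {'Authorization': f'Bearer {example_token}'}
print(example_headers)
```

The body is ordinary form data (`grant_type=client_credentials&client_id=...`), which is why the real cell passes a dictionary as `data`.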

Now we can get information and audio features for the songs in a playlist using the `playlist_id`.

For example, the Billions Club playlist URL is https://open.spotify.com/playlist/37i9dQZF1DX7iB3RCnBnN4, so the `playlist_id` (the part of the URL after `/playlist/`) is `37i9dQZF1DX7iB3RCnBnN4`.
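If you'd rather paste a whole playlist link than pick out the ID by hand, a small helper can do it. `playlist_id_from_url` is a hypothetical function (not part of the Spotify API), and it assumes links of the form `https://open.spotify.com/playlist/<id>`. Links copied from the Spotify app often add a query string such as `?si=...`, which `urlparse` keeps separate from the path.

```python
from urllib.parse import urlparse

def playlist_id_from_url(url):
    '''Return the playlist ID from a Spotify playlist URL (hypothetical helper).'''
    path = urlparse(url).path               # e.g. '/playlist/37i9dQZF1DX7iB3RCnBnN4'
    parts = [p for p in path.split('/') if p]
    if len(parts) < 2 or parts[-2] != 'playlist':
        raise ValueError('Not a Spotify playlist URL: ' + url)
    return parts[-1]                        # the last path segment is the ID

print(playlist_id_from_url('https://open.spotify.com/playlist/37i9dQZF1DX7iB3RCnBnN4'))
# 37i9dQZF1DX7iB3RCnBnN4
```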

In [None]:
playlist_id = '37i9dQZF1DX7iB3RCnBnN4'

if CLIENT_ID != 'PASTE_YOUR_CLIENT_ID_HERE':
    r = requests.get('https://api.spotify.com/v1/playlists/' + playlist_id, headers=headers)
    r.raise_for_status()  # stop with a clear error if the playlist can't be retrieved
    playlist_info = r.json()
    playlist_name = playlist_info['name']
    playlist_length = playlist_info['tracks']['total']
    print(f'Getting {playlist_length} tracks from {playlist_name}')

    # Playlist items come back one page at a time, so keep following the 'next' link
    tracks = []
    url = 'https://api.spotify.com/v1/playlists/' + playlist_id + '/tracks'
    while url:
        r = requests.get(url, headers=headers)
        r.raise_for_status()
        page = r.json()
        for item in page['items']:
            track = item['track']
            if track is None or track['id'] is None:  # skip unavailable or local tracks
                continue
            artist = track['artists'][0]['name']  # first credited artist
            name = track['name']
            track_id = track['id']
            # release dates may be only a year ('1975') or year-month, so let pandas infer the format
            release_date = pd.to_datetime(track['album']['release_date'], errors='coerce')
            popularity = track['popularity']
            tracks.append([artist, name, track_id, release_date, popularity])
        url = page['next']  # None after the last page
    pl = pd.DataFrame(tracks, columns=['artist', 'track', 'id', 'release_date', 'popularity'])

    features_list = []
    for n, row in enumerate(pl.itertuples(), start=1):
        print(f'{n}/{len(pl)}', row.artist, row.track)  # progress: artist and track
        features = get_track_features(row.id)
        if features and 'id' in features:  # skip tracks whose features couldn't be retrieved
            features_list.append(features)
    tf = pd.DataFrame(features_list)

    if tf.empty:
        print('No audio features could be retrieved, so only the track information is kept')
        playlist = pl
    else:
        playlist = pd.merge(pl, tf, on='id')  # merge the dataframes on the track ID
        playlist = playlist.drop(columns='type', errors='ignore')  # drop the 'type' column

    from IPython.display import display, clear_output
    clear_output()
    display(playlist)
    print('Your data is stored in the dataframe called "playlist"')
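Spotify returns the items in a long playlist in pages rather than all at once. Each page is a paging object whose `next` field holds the URL of the following page, or `null` (`None` in Python) on the last page. Here is a minimal sketch of collecting every page by following those links. `get_all_items` is a hypothetical helper, and `fake_pages` stands in for real API responses so the example runs offline.

```python
def get_all_items(first_url, fetch):
    '''Collect the items from every page of a paged response (hypothetical helper).

    fetch is any function that takes a URL and returns the decoded JSON page.
    '''
    items = []
    url = first_url
    while url:  # the last page has 'next' set to None
        page = fetch(url)
        items.extend(page['items'])
        url = page['next']
    return items

# Fake pages standing in for Spotify responses, so this runs without network access
fake_pages = {
    'page1': {'items': ['track A', 'track B'], 'next': 'page2'},
    'page2': {'items': ['track C'], 'next': None},
}
print(get_all_items('page1', fake_pages.get))  # ['track A', 'track B', 'track C']
```

With real data, `fetch` could be `lambda url: requests.get(url, headers=headers).json()` and `first_url` the playlist's `/tracks` endpoint. Following `next` avoids having to know in advance how many items each page holds.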

Now that we have a data set stored in `playlist`, we can save it to a CSV file. The `index=False` argument means the DataFrame's index won't be saved as a column in the file.

In [None]:
playlist.to_csv('spotify.csv', index=False)
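When the CSV file is loaded again (for example in the analysis notebook), `read_csv` reads `release_date` as plain text unless it is asked to parse it. A minimal sketch of the round trip, using a small made-up DataFrame written to a temporary folder rather than the real `spotify.csv`:

```python
import os
import tempfile
import pandas as pd

# A tiny made-up DataFrame standing in for the real playlist data
example = pd.DataFrame({
    'artist': ['Artist 1', 'Artist 2'],
    'track': ['Song 1', 'Song 2'],
    'release_date': pd.to_datetime(['2019-11-29', '1975-10-31']),
    'popularity': [88, 80],
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.csv')
    example.to_csv(path, index=False)  # no index column in the file
    # parse_dates turns the release_date text back into datetime values
    loaded = pd.read_csv(path, parse_dates=['release_date'])

print(loaded.dtypes)
```

For the real file this would be `pd.read_csv('spotify.csv', parse_dates=['release_date'])`. Leaving out `index=False` when saving would add an extra unnamed index column when the file is read back.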

### Column Descriptions

From https://developer.spotify.com/documentation/web-api/reference/

|Value|Description|
|-|-|
|track|The name of the track.|
|artist|The person or group the track is credited to.|
|id|The [Spotify ID](https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids) of the track.|
|danceability|How suitable a track is for dancing. A value of 0.0 is least danceable and 1.0 is most danceable.|
|energy|A perceptual measure of intensity and activity that ranges from 0.0 to 1.0. Typically, energetic tracks feel fast, loud, and noisy.|
|key|The key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D. If no key was detected, the value is -1.|
|loudness|The average loudness of a track in decibels (dB). Values typically range between -60 and 0 dB.|
|mode|The modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.|
|speechiness|Indicates the presence of spoken words in a track. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech while below 0.33 most likely represent music and other non-speech-like tracks.|
|acousticness|A confidence measure from 0.0 to 1.0 of whether the track is acoustic. A value of 1.0 represents high confidence that the track is acoustic.|
|instrumentalness|Predicts whether a track contains no vocals. The closer the value is to 1.0, the greater the likelihood that the track contains no vocal content.|
|liveness|Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.|
|valence|A measure to describe the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).|
|tempo|The overall estimated tempo (speed or pace) of a track in beats per minute (BPM).|
|duration_ms|The duration of the track in [milliseconds](https://en.wikipedia.org/wiki/Millisecond).|
|time_signature|An estimated overall time signature of a track. The time signature is a notational convention to specify how many beats are in each bar (or measure).|
|chorus_hit|The approximate start time of the chorus, in seconds.|
|sections|The number of sections in the track. Sections are defined by large variations in rhythm or timbre.|
|popularity|A value between 0 and 100, with 100 being the most popular.|
|release_date|The release date of the album the track appears on.|
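Some of these columns are stored as numbers that are easier to interpret once decoded. Here is a minimal sketch of three hypothetical helper functions (`key_name`, `speech_category`, and `minutes_seconds` are not part of the Spotify API or pandas) that apply the descriptions in the table: the pitch class numbering for `key`, 1/0 for major/minor `mode`, the 0.33 and 0.66 `speechiness` thresholds, and milliseconds for `duration_ms`.

```python
# Pitch classes 0-11 in standard Pitch Class notation (0 = C, 1 = C♯/D♭, ...)
PITCH_CLASSES = ['C', 'C♯/D♭', 'D', 'D♯/E♭', 'E', 'F',
                 'F♯/G♭', 'G', 'G♯/A♭', 'A', 'A♯/B♭', 'B']

def key_name(key, mode):
    '''Turn the numeric key and mode values into a readable name (hypothetical helper).'''
    if not 0 <= key <= 11:
        return 'unknown'  # -1 means no key was detected
    return PITCH_CLASSES[int(key)] + (' major' if mode == 1 else ' minor')

def speech_category(speechiness):
    '''Apply the speechiness thresholds described in the table (hypothetical helper).'''
    if speechiness > 0.66:
        return 'spoken words'
    if speechiness >= 0.33:
        return 'music and speech'
    return 'music'

def minutes_seconds(duration_ms):
    '''Convert a duration in milliseconds to a minutes:seconds string (hypothetical helper).'''
    total_seconds = round(duration_ms / 1000)
    return f'{total_seconds // 60}:{total_seconds % 60:02d}'

print(key_name(1, 1))           # C♯/D♭ major
print(speech_category(0.05))    # music
print(minutes_seconds(200040))  # 3:20
```

Applied to the DataFrame, a new column could be added with something like `playlist['key_name'] = [key_name(k, m) for k, m in zip(playlist['key'], playlist['mode'])]`.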

You can now use these data sets in the [Spotify Data Analysis notebook](06-data-analysis-spotify.ipynb), or go on to the [next notebook](07-primary-data.ipynb) that will introduce you to recording and using your own data.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)