# Step 1: Request Data
Request a copy of your data from Spotify [here](https://www.spotify.com/us/account/privacy/). The export takes a few days to arrive, so be patient. There is probably a way to request this data directly using Spotify’s API, but that’s a project for another day!

In [1]:
import pandas as pd
import numpy as np
import requests

# Step 2: Prep Streaming/Library Data
Using the files Spotify has given us, we will now create one dataframe that includes all our streaming data PLUS whether each song is in our Library PLUS each song’s Spotify ‘URI’ (a unique identifier that will come in handy later).

## Create df_stream:

In [2]:
# read a StreamingHistory file into a pandas dataframe
# (a long streaming history is split across several numbered files; this reads the first one)
df_stream = pd.read_json('StreamingHistory0.json')

# create a 'UniqueID' for each song by combining the fields 'artistName' and 'trackName'
df_stream['UniqueID'] = df_stream['artistName'] + ":" + df_stream['trackName']

df_stream.head()

|   | endTime | artistName | trackName | msPlayed | UniqueID |
|---|---|---|---|---|---|
| 0 | 2020-09-22 00:02 | halberd | coffee on the beach. | 178559 | halberd:coffee on the beach. |
| 1 | 2020-09-22 00:05 | Kota the Friend | Chicago Diner | 175037 | Kota the Friend:Chicago Diner |
| 2 | 2020-09-22 00:08 | cupcakKe | Single While Taken | 169343 | cupcakKe:Single While Taken |
| 3 | 2020-09-22 00:12 | Smino | Father Son Holy Smoke | 240768 | Smino:Father Son Holy Smoke |
| 4 | 2020-09-22 00:15 | Princess Nokia | Felicity Island | 188693 | Princess Nokia:Felicity Island |
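The cell above reads a single file. If your export contains several `StreamingHistory` files, something like the following could gather them all. This is a minimal sketch using only the standard library; the function name `load_streaming_history` and the glob pattern are assumptions for illustration, and it presumes each file holds a JSON list of play records like the one shown above.

```python
import glob
import json

def load_streaming_history(pattern="StreamingHistory*.json"):
    """Return the play records from every file matching `pattern`, in filename order."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))  # assumed: each file is a JSON list of plays
    return records
```

The combined list can then be turned into the dataframe with `df_stream = pd.DataFrame(load_streaming_history())`, after which the `UniqueID` step works as before.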


## Create df_library:

Next, I edited my ‘YourLibrary’ file from Spotify by hand so that it contained only the “tracks” entries, surrounded by brackets [ ], and saved it as a new file, ‘YourLibrary1’. Someone better at cleaning up JSON files could likely automate this step and use the original file.
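One way the manual edit might be automated is sketched below. It assumes the original export is a JSON object with a top-level `"tracks"` key holding a list of track objects (with `artist`, `album`, `track` and `uri` fields); the function name and the source filename `YourLibrary.json` are assumptions, so check them against your own export.

```python
import json

def extract_library_tracks(src="YourLibrary.json", dst="YourLibrary1.json"):
    """Copy the "tracks" list from the original export into its own JSON file."""
    with open(src, encoding="utf-8") as f:
        library = json.load(f)
    tracks = library["tracks"]  # assumed layout: a list of track dictionaries
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(tracks, f, ensure_ascii=False, indent=2)
    return tracks
```

After running `extract_library_tracks()`, the `pd.read_json('YourLibrary1.json')` call in the next cell should work unchanged.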

In [3]:
# read your edited Library json file into a pandas dataframe
df_library = pd.read_json('YourLibrary1.json')

# add UniqueID column (same as above)
df_library['UniqueID'] = df_library['artist'] + ":" + df_library['track']

# add column with track URI stripped of 'spotify:track:'
new = df_library["uri"].str.split(":", expand = True)
df_library['track_uri'] = new[2]

df_library.head()

|   | artist | album | track | uri | UniqueID | track_uri |
|---|---|---|---|---|---|---|
| 0 | Noname | Telefone | All I Need (feat. Xavier Omär) | spotify:track:5SBPdm1dAz7WhgmSQVfOew | Noname:All I Need (feat. Xavier Omär) | 5SBPdm1dAz7WhgmSQVfOew |
| 1 | Chance the Rapper | Acid Rap | Chain Smoker | spotify:track:4Jh8aypoHtCqv5GPzZxPsz | Chance the Rapper:Chain Smoker | 4Jh8aypoHtCqv5GPzZxPsz |
| 2 | Dua Lipa | Future Nostalgia | Good In Bed | spotify:track:6uAFJ75WDAoAPyCWJAtvks | Dua Lipa:Good In Bed | 6uAFJ75WDAoAPyCWJAtvks |
| 3 | cupcakKe | Eden | PetSmart | spotify:track:0Wahp1YAWzaTwt1hYBNefQ | cupcakKe:PetSmart | 0Wahp1YAWzaTwt1hYBNefQ |
| 4 | Jhené Aiko | Chilombo | Surrender (feat. Dr. Chill) | spotify:track:1vPw8XDPJLMmIaQGHYQ7Pp | Jhené Aiko:Surrender (feat. Dr. Chill) | 1vPw8XDPJLMmIaQGHYQ7Pp |


## Create our final dataframe, df_tableau:

In [4]:
# create the final dataframe as a copy of df_stream
df_tableau = df_stream.copy()

# add column checking if streamed song is in library
# not used in this project but could be helpful for cool visualizations
df_tableau['In Library'] = np.where(df_tableau['UniqueID'].isin(df_library['UniqueID'].tolist()),1,0)

# left join with df_library on UniqueID to bring in album and track_uri
df_tableau = pd.merge(df_tableau, df_library[['album','UniqueID','track_uri']],how='left',on=['UniqueID'])

df_tableau.head()

|   | endTime | artistName | trackName | msPlayed | UniqueID | In Library | album | track_uri |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-09-22 00:02 | halberd | coffee on the beach. | 178559 | halberd:coffee on the beach. | 0 |  |  |
| 1 | 2020-09-22 00:05 | Kota the Friend | Chicago Diner | 175037 | Kota the Friend:Chicago Diner | 1 | Chicago Diner | 1rpQJ5vH0wjL8EtSA7ZITQ |
| 2 | 2020-09-22 00:05 | Kota the Friend | Chicago Diner | 175037 | Kota the Friend:Chicago Diner | 1 | FOTO | 4HzltjBAqEhAayGEBX2ZlR |
| 3 | 2020-09-22 00:08 | cupcakKe | Single While Taken | 169343 | cupcakKe:Single While Taken | 1 | Ephorize | 0v5tTD8cCbNsuSPdZq4ppU |
| 4 | 2020-09-22 00:12 | Smino | Father Son Holy Smoke | 240768 | Smino:Father Son Holy Smoke | 1 | blkswn | 3lWatHvGLP3GU8qvcO1tIu |
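Note that rows 1 and 2 of this output are the same play (identical `endTime`) listed twice: “Chicago Diner” is saved in the library under two albums, so the left join returns one row per match and would inflate play counts. The toy example below, with made-up rows, shows the effect and one possible fix: keep a single library row per `UniqueID` before merging. Which album to keep is a judgment call; `keep="first"` here is arbitrary.

```python
import pandas as pd

# Toy data (hypothetical rows): two plays, one track saved under two albums.
streams = pd.DataFrame({"UniqueID": ["A:x", "B:y"], "msPlayed": [1000, 2000]})
library = pd.DataFrame({
    "UniqueID": ["A:x", "A:x"],
    "album": ["Single", "LP"],
    "track_uri": ["uri1", "uri2"],
})

# A plain left join repeats the play once per matching library row.
naive = streams.merge(library, how="left", on="UniqueID")

# Keeping one library row per UniqueID preserves the number of plays.
library_unique = library.drop_duplicates(subset="UniqueID", keep="first")
deduped = streams.merge(library_unique, how="left", on="UniqueID",
                        validate="many_to_one")  # raises if the right side has repeats

print(len(streams), len(naive), len(deduped))  # 2 3 2
```

Applied to this notebook, that would mean calling `drop_duplicates(subset='UniqueID')` on `df_library` before the merge into `df_tableau`.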


# Step 3: Create New Spotify Project
Log into your developer account [here](https://developer.spotify.com/dashboard). In your dashboard, create a new project. Once created, you can retrieve your ‘Client ID’ and ‘Client Secret.’ We’ll use these in Step 4.

# Step 4: Create Genre Dataframe using Spotify’s API
First we’ll use our Client ID and Client Secret to generate an access token so we can pull data from Spotify’s API. Note: this token expires after one hour and then has to be regenerated. I figured out how to do this with the help of [this post](https://stmorse.github.io/journal/spotify-api.html).

In [5]:
from env import spotify_client_id, spotify_client_secret

# save your IDs from new project in Spotify Developer Dashboard
CLIENT_ID = spotify_client_id
CLIENT_SECRET = spotify_client_secret

In [6]:
# generate access token

# authentication URL
AUTH_URL = 'https://accounts.spotify.com/api/token'

# POST
auth_response = requests.post(AUTH_URL, {
    'grant_type': 'client_credentials',
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
})

# convert the response to JSON
auth_response_data = auth_response.json()

# save the access token
access_token = auth_response_data['access_token']

In [7]:
# used for authenticating all API calls
headers = {'Authorization': 'Bearer {token}'.format(token=access_token)}
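Because the token expires after an hour, a long-running loop can start failing partway through. The sketch below shows one way to rebuild the headers on demand. `TokenCache`, `request_token`, `margin` and `clock` are hypothetical names introduced for illustration, and the code assumes the token response contains an `expires_in` field (in seconds) alongside `access_token`.

```python
import time

class TokenCache:
    """Cache an access token and request a new one shortly before it expires."""

    def __init__(self, request_token, margin=60, clock=time.monotonic):
        self._request_token = request_token  # callable returning the token response as a dict
        self._margin = margin                # refresh this many seconds early
        self._clock = clock                  # injectable so the logic can be tested
        self._token = None
        self._expires_at = 0.0

    def headers(self):
        """Return Authorization headers, fetching a fresh token if needed."""
        if self._token is None or self._clock() >= self._expires_at:
            data = self._request_token()
            self._token = data["access_token"]
            # assumed field: 'expires_in' gives the token lifetime in seconds
            self._expires_at = self._clock() + data["expires_in"] - self._margin
        return {"Authorization": f"Bearer {self._token}"}
```

With the cells above, `request_token` could be a small function that repeats the `requests.post(AUTH_URL, ...)` call and returns its `.json()`; each API call would then pass `headers=cache.headers()` instead of a fixed `headers` dictionary.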

In [8]:
# base URL of all Spotify API endpoints
BASE_URL = 'https://api.spotify.com/v1/'

Now we’ll pull the artist and genres associated with each track_uri in our library and add them to a dictionary *(check out Spotify’s [console](https://developer.spotify.com/console/) to find out how to pull the data points you’re interested in).*

In [None]:
# create blank dictionary to store track URI, artist URI, and genres
dict_genre = {}

# convert track_uri column to an iterable list
track_uris = df_library['track_uri'].to_list()

# loop through track URIs and pull artist URI using the API,
# then use artist URI to pull genres associated with that artist
# store all these in a dictionary
for t_uri in track_uris:
    
    dict_genre[t_uri] = {'artist_uri': "", "genres":[]}
    
    r = requests.get(BASE_URL + 'tracks/' + t_uri, headers=headers)
    r = r.json()
    a_uri = r['artists'][0]['uri'].split(':')[2]
    dict_genre[t_uri]['artist_uri'] = a_uri
    
    s = requests.get(BASE_URL + 'artists/' + a_uri, headers=headers)
    s = s.json()
    dict_genre[t_uri]['genres'] = s['genres']
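The loop above makes two requests per track, which gets slow for a large library. Spotify’s API also has endpoints that accept several IDs at once through an `ids` query parameter (`tracks?ids=...` and `artists?ids=...`). The sketch below only builds the batched URLs; the helper names are hypothetical, and the batch size of 50 is an assumption to check against the current API reference.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_urls(base_url, resource, ids, size=50):
    """Build one URL per batch, e.g. <base_url>tracks?ids=a,b,c."""
    unique_ids = list(dict.fromkeys(ids))  # drop repeats, keep first-seen order
    return [f"{base_url}{resource}?ids={','.join(batch)}"
            for batch in chunked(unique_ids, size)]

urls = batch_urls("https://api.spotify.com/v1/", "tracks", ["a", "b", "a", "c"], size=2)
print(urls)
# ['https://api.spotify.com/v1/tracks?ids=a,b', 'https://api.spotify.com/v1/tracks?ids=c']
```

Dropping repeated IDs matters most for the artist lookups, since many tracks in a library share an artist and their genres only need to be fetched once.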

We’ll convert this dictionary to a dataframe `df_genre` and expand it so that each genre for each track/artist is on its own row (I used this solution). This will create `df_genre_expanded`.

In [None]:
# convert dictionary into dataframe with track_uri as the first column
df_genre = pd.DataFrame.from_dict(dict_genre, orient='index')
df_genre.insert(0, 'track_uri', df_genre.index)
df_genre.reset_index(inplace=True, drop=True)

df_genre.head()

In [None]:
df_genre_expanded = df_genre.explode('genres')
df_genre_expanded.head()
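To make the effect of `explode` concrete, here is a small example on made-up data (the URIs and genres are placeholders, not real results). Each list element gets its own row, the other columns are repeated, and an artist with an empty genre list is kept as a single row with a missing value.

```python
import pandas as pd

# Toy frame (hypothetical values) shaped like df_genre
toy = pd.DataFrame({
    "track_uri": ["t1", "t2", "t3"],
    "artist_uri": ["a1", "a2", "a3"],
    "genres": [["rap", "hip hop"], ["pop"], []],
})

expanded = toy.explode("genres")
print(expanded)
```

The exploded rows keep their original index labels, so the two rows for `t1` both have index 0; calling `reset_index(drop=True)` afterwards gives a clean running index if you need one.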

We’ll then save `df_tableau` and `df_genre_expanded` as CSV files that we can load into Tableau.

In [None]:
# save df_tableau and df_genre_expanded as csv files that we can load into Tableau
df_tableau.to_csv('MySpotifyDataTable.csv')
df_genre_expanded.to_csv('GenresExpandedTable.csv')

print('done')
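One detail worth knowing: by default `to_csv` also writes the dataframe index as an unnamed first column, which shows up as an extra field in Tableau (and as `Unnamed: 0` if the file is read back with pandas). The small example below, on made-up data, compares the header lines with and without `index=False`.

```python
import io
import pandas as pd

# Toy frame (hypothetical values)
df = pd.DataFrame({"track_uri": ["t1", "t2"], "genres": ["rap", "pop"]})

with_index = io.StringIO()
df.to_csv(with_index)                  # default: index written as an unnamed first column

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # index column omitted

print(with_index.getvalue().splitlines()[0])     # ,track_uri,genres
print(without_index.getvalue().splitlines()[0])  # track_uri,genres
```

Whether to drop the index is a matter of preference; for `df_genre_expanded` the index repeats after `explode`, so it carries little information on its own.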