# Spotipy: Get My Tracks and Audio Features
___

## Introduction
This project evolved several times over the 5 weeks we had to work on it. This introduction is a high-level overview of how my plans changed and how the final product came together. Hopefully these sections help the reader navigate the notebooks.<br>
`Version 1`:
- My original idea centered on collaborative filtering and building a network from different users' data. I planned to authenticate third-party users via Spotify OAuth and request access only to their library, their playlists, who they followed, who followed them, and the top tracks of each user.
- With this information, create a network with users as the origin points, branching into nodes for Artists, Albums, Tracks, and Genres, each carrying the metadata and descriptive data that Spotify makes available.
- With this network, calculate song suggestions based on the mean vector of a given set of tracks, depending on how the network was filtered.
- Unfortunately, after spending several weeks exploring articles and walkthroughs on collaborative filtering and working on Spotify OAuth protocols, I found out that other users' data is not actually available via a user's session. <br>
`Version 2`:
- Having learned that gathering data from multiple users was not possible, I attempted to pivot to a similar form of recommendation: content-based filtering.
- While I couldn't get other users' information, I could get a lot of user and track information from a given profile. This suits content-based filtering, which recommends items by comparing the content of items to a user profile. For track data, interactions could be split into explicit, implicit, and general.
- Explicit interactions would be any tracks that a user actively saved or that appeared in their top tracks (playing a song more often than others could be interpreted as liking it).
- Implicit interactions would be any tracks that a user listened to but didn't explicitly like, drawn from playlists the user owned or followed plus the top tracks of any artist they followed.
- The general pool was based on songs available to a given user in their current locale/market ('US', for example).
- This would create our user profile on top of the library of tracks that existed in a given user's profile.
- However, after researching a couple of articles such as https://towardsdatascience.com/introduction-to-two-approaches-of-content-based-recommendation-system-fc797460c18c and reading their notebooks, I realized:
    - A: This wasn't exactly what I was trying to do, which was to identify a mean vector of audio features to produce a recommendation signal.
    - B: The techniques described weren't using any machine learning.
- Separately but in parallel, after trying to push my application code to Render for hosting, I quickly ran into constraints around user authentication, redirect URIs, and what Streamlit/Render allow within their libraries. This required me to pivot again, but in a way that I can plan for the future.<br>
`Version 3`:
- Create an application that doesn't authenticate third-party users, but instead gets an access token to interact server to server. This still allows me to interact with certain Spotify API endpoints that return tracks, audio features, artists, and genres.
- This application will serve as a first version that can ultimately be built upon to create the original idea down the line. Instead of pulling a user's profile, we give the user 3 options to get recommendations by:
    - Track Names, Artists, Release Year
    - Artist Names
    - Genres
- With each option, the application calls the required endpoint to pull the specific tracks provided by the user, pull the top tracks of the provided artists, or pull 20 tracks per listed genre.
- From there, find the mean vector of the pulled tracks' audio features, and use that mean vector to find the tracks with the smallest cosine distances in a scaled track library of 400k songs.
- There were ultimately some complications with this final version, but we will get into those as the notebook progresses.
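
The mean-vector and cosine-distance idea from Version 3 can be illustrated with a small, library-free sketch. The feature values, track ids, and function names below are hypothetical and only stand in for scaled audio features; this is not the application's actual code.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_tracks(seed_vectors, library, top_n=2):
    """Rank library tracks by cosine similarity to the mean of the seeds."""
    target = mean_vector(seed_vectors)
    scored = [(cosine_similarity(target, vec), track_id)
              for track_id, vec in library.items()]
    scored.sort(reverse=True)  # highest similarity = smallest cosine distance
    return [track_id for _, track_id in scored[:top_n]]


# Hypothetical, already-scaled features: [danceability, energy, valence]
seeds = [[0.8, 0.9, 0.7], [0.6, 0.7, 0.5]]
library = {
    "track_a": [0.7, 0.8, 0.6],
    "track_b": [-0.9, -0.2, -0.8],
    "track_c": [0.1, 0.9, 0.1],
}
recommended = closest_tracks(seeds, library)
```

Here `recommended` lists the two library tracks that point in the most similar direction to the seed mean. Cosine distance is one minus this similarity, so ranking by highest similarity is the same as ranking by smallest distance.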

___
# Version 2&3 Data Collection: Unsupervised K-Means Clustering

- After pulling some of my own user data (tracks, playlists, etc.), I was able to do some EDA to see how my music taste is distributed across the different audio features.
- I was even able to pull some initial scores from classification models with different tuned hyperparameters and different preprocessing methods (Standard Scaler and Polynomial Features).
- None of the models came close to beating the baseline score of my liked songs and songs from my profile (~75%).
- As such, I am going back to pull as much track data as I can from one user.
- I am going to represent a user's explicit and implicit taste through:
    - Explicit: Liked Songs, user took action on each track
    - Explicit: Songs from playlists they created themselves or follow; the user took action to make something with these songs, or liked enough of them to follow the playlist
    - Implicit: Songs from artists they follow. This resembles following a playlist, but the user is rarely the artist, and an artist's catalog is more expansive than an individual playlist.
    - Implicit: Songs from artists they may not follow but whose songs they have liked before
    - General Pool: Featured Playlists and their songs which are made available by the profile
    - General Pool: Category playlists and their songs which are made available by the profile
- Combined with one-hot encoded genres per track, this will give us not only our signal but the noise as well.
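
As a rough illustration of what one-hot encoded genres per track look like, here is a minimal sketch in plain Python. It assumes each track has already been given a list of genre labels (Spotify reports genres on artist objects, so inheriting them from the track's artist is an assumption here); the track ids, genres, and the `one_hot_genres` helper are made up for the example.

```python
def one_hot_genres(track_genres):
    """Return (columns, rows): one 0/1 flag per genre for every track."""
    # Collect every distinct genre and fix a column order.
    columns = sorted({g for genres in track_genres.values() for g in genres})
    rows = {
        track_id: [1 if genre in genres else 0 for genre in columns]
        for track_id, genres in track_genres.items()
    }
    return columns, rows


# Hypothetical genre labels per track id
track_genres = {
    "track_1": ["grunge", "rock"],
    "track_2": ["electro house"],
    "track_3": ["rock"],
}
columns, rows = one_hot_genres(track_genres)
```

Each row then has one flag per genre column, which can sit alongside the numeric audio features.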

### Handler Flow

- Because I want to scale this application in the future and add features that might require different authentication scopes, I created a handler function.
- Throughout this notebook and other notebooks where we interact with Spotify/Spotipy, we will call the handler to pull the required data.
- The first call in any handler call is to our authenticate function that:
    - creates a SpotifyOAuth session
    - checks if we have a cached token within that session
    - and if not, generates a new token for us to use
    - this handler function will be the basis of interactions on the webapp, for simplicity and scale
- Code can be found in get_tracks_methods
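
The flow described above (authenticate first, reuse what is cached, then dispatch to the requested action) can be sketched as follows. This is not the code in get_tracks_methods: `make_authenticator`, `make_handler`, and `fake_session` are hypothetical names, and the session factory is a stand-in for whatever builds the real SpotifyOAuth-backed client.

```python
def make_authenticator(create_session):
    """Wrap a session factory so each scope is authenticated only once."""
    cache = {}

    def authenticate(scope):
        if scope not in cache:  # nothing cached for this scope yet
            cache[scope] = create_session(scope)
        return cache[scope]

    return authenticate


def make_handler(authenticate, actions):
    """actions maps an action name to (required scope, function of a session)."""

    def handler(action_name):
        if action_name not in actions:
            raise ValueError(f"Unknown action: {action_name}")
        scope, func = actions[action_name]
        session = authenticate(scope)  # always authenticate before the call
        return func(session)

    return handler


# Stand-in session factory (hypothetical); records how often it is called.
created = []


def fake_session(scope):
    created.append(scope)
    return {"scope": scope}


handler = make_handler(
    make_authenticator(fake_session),
    {"get_saved_tracks": ("user-library-read",
                          lambda s: f"saved tracks via {s['scope']}")},
)
first = handler("get_saved_tracks")
second = handler("get_saved_tracks")  # reuses the cached session
```

Keeping the scope next to each action is one way to let new features declare the permissions they need without touching the dispatch logic.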

In [199]:
# !pip3 install more_itertools

In [204]:
import pandas as pd
from get_tracks_methods import handler, authenticate, flatten_tracks,\
get_track_audio_features, merger
import time
from concurrent.futures import ThreadPoolExecutor
from more_itertools import chunked

Let's use the handler to pull my Liked Songs and turn them into a flattened dataframe:

___
### Explicit Tracks: Tracks a user has saved

In [2]:
%%time
user_liked_songs = handler("get_saved_tracks")

CPU times: user 427 ms, sys: 74 ms, total: 501 ms
Wall time: 10.6 s


In [3]:
user_liked_songs.head()

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked
0,1nFtiJxYdhtFfFtfXBv06s,Something In The Way,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,73,False,1
1,5Ddlk6C2JVxb1SReZ6O1wk,Drain You,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,63,False,1
2,4l012k8ZcAdVbUvZ4kae5Q,Stay Away,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,56,False,1
3,2YodwKJnbPyNKe8XXSE9V7,Lithium,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,72,False,1
4,4P5KoWXOxwuobLmHXLMobV,Come As You Are,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,78,False,1


- Having retrieved our saved tracks, let's set the user_liked column to 1 to signify our user's explicit interaction with these records.

In [4]:
user_liked_songs['user_liked'] = 1

In [5]:
user_liked_songs.head()

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked
0,1nFtiJxYdhtFfFtfXBv06s,Something In The Way,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,73,False,1
1,5Ddlk6C2JVxb1SReZ6O1wk,Drain You,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,63,False,1
2,4l012k8ZcAdVbUvZ4kae5Q,Stay Away,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,56,False,1
3,2YodwKJnbPyNKe8XXSE9V7,Lithium,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,72,False,1
4,4P5KoWXOxwuobLmHXLMobV,Come As You Are,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,78,False,1


- Only one more group of songs for our explicit interactions and we can move on to implicit and general pool tracks. Let's grab the current user's playlists, owned and/or followed, including their associated tracks:

In [6]:
%%time
user_playlist_songs = handler("get_user_playlist_tracks")

Please authorize the app by visiting this URL: https://accounts.spotify.com/authorize?client_id=fd5183b4728840a989108098987ef843&response_type=code&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Fcallback&scope=user-library-read+playlist-read-private


Enter the authorization code:  AQB9AujTu5kCTjg4M3AAAA_CdE4VclNw_4MqK2mZqX3Ph589ta31AqgtUXonumHO8L8MidM6bzN6jl8fd_3eg4f9uySu76sEAPdt4MRrlf-hly89iLblfsZmoq4m0BeHgVjxeLV2Tqefhq_iSsV67STqYio9IrGWtQm5rwYD5ZB1L1-hGAg-Rzm29RbO5PEAPqBXJ7FIiwHtMIc02PQrqU70R8KtNqGRFF7mYg


CPU times: user 773 ms, sys: 158 ms, total: 931 ms
Wall time: 27.3 s


In [7]:
user_playlist_songs.head()

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked
0,1pUsdir2xhSxP0RyBe9lLH,Icarus,Madeon,spotify:artist:4pb4rqWSoGUgxm63xmJ8xc,spotify:album:3uKLwDjku2Us0c81LEmftR,Adventure (Deluxe),2015-03-30,49,False,1
1,3H3cOQ6LBLSvmcaV7QkZEu,Aerodynamic,Daft Punk,spotify:artist:4tZwfgrHOc3mvqYlEYSvVi,spotify:album:2noRn2Aes5aoNVsU6iWThc,Discovery,2001-03-12,62,False,1
2,7sLnn2UOttrEPBZSFyIsYh,Wolfgang's 5th Symphony,Wolfgang Gartner,spotify:artist:3534yWWzmxx8NbKVoNolsK,spotify:album:3kDILp9v0rDVPMAJUQYbZx,Back Story,2012-09-04,43,False,1
3,3ezkJgagRPZ39KCTrKcSI7,Ghosts 'n' Stuff (feat. Rob Swire),deadmau5,spotify:artist:2CIMQHirSU0MQqyYHq0eOx,spotify:album:3eNZDL2rqTVvmiC1f0yFwF,For Lack of a Better Name (The Extended Mixes),2009,62,False,1
4,0VffaI2jwQknRrxpECYHsF,Greyhound,Swedish House Mafia,spotify:artist:1h6Cn3P4NGzXbaXidqURXs,spotify:album:4ljisoNarj0BpQSMIEv88L,Until Now,2012-01-01,64,False,1


In [8]:
user_playlist_songs.shape, user_liked_songs.shape

((2396, 10), (643, 10))

- Let's combine into one df to have our explicit interactions df

In [9]:
user_exp_tracks = pd.concat([user_liked_songs, user_playlist_songs])

In [10]:
user_exp_tracks.shape

(3039, 10)

In [11]:
user_exp_tracks.drop_duplicates(subset=['id'],inplace=True)

In [13]:
user_exp_tracks.to_csv('../data/explicit_tracks.csv')

- Successfully captured the tracks in which the user has explicitly shown interest, whether by saving them individually, adding them to a playlist they created themselves/own, or following a playlist that contains them.

Now to collect implicit:
- Implicit: Songs from artists they follow. This resembles following a playlist, but the user is rarely the artist, and an artist's catalog is more expansive than an individual playlist.
- Implicit: Songs from artists they may not follow but whose songs they have liked before

___
### Implicit Tracks

- Collect all available Artists from our liked songs

In [14]:
artists = user_liked_songs['artist_uri']

In [15]:
sp = authenticate(scope='user-library-read user-follow-read')

Please authorize the app by visiting this URL: https://accounts.spotify.com/authorize?client_id=fd5183b4728840a989108098987ef843&response_type=code&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Fcallback&scope=user-library-read+user-follow-read


Enter the authorization code:  AQBMCeoR_mkm_GhCS5tLfQJsHt4FSuL9V_VbDGufNS74Xmnka8fcLqOFuzVmubAqnHrLGs6OGEbmrPdujloTJs5qD52Pm84Tcr3dccxm1QF0XMkGbyqKywbEWeyXZ7Yatm9tNYe119jygKQckOqQZ_qi8Y2PFLCWDt3faIHeKaET8DEzhThT4Z3dxe6i5GgMU-8IwLf9N-TXqmefn8AeGcRgZ8QHhpU


- Collect all Artists a given user follows

In [16]:
followed_artists = sp.current_user_followed_artists(limit=50)

In [17]:
temp_artists = []
for artist in followed_artists['artists']['items']:
    temp_artists.append(artist['uri'])
    
temp = pd.Series(temp_artists)
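
A single call with `limit=50` returns at most 50 artists, which is enough for this profile. For a user who follows more, the result pages would need to be walked. A minimal sketch, assuming `sp` is an authenticated Spotipy client and that each page carries a `next` link under the `artists` key; `all_followed_artist_uris` and `FakeClient` are hypothetical and exist only for the example.

```python
def all_followed_artist_uris(sp, page_size=50):
    """Collect the URIs of every followed artist by following `next` links."""
    uris = []
    page = sp.current_user_followed_artists(limit=page_size)["artists"]
    while True:
        uris.extend(artist["uri"] for artist in page["items"])
        if not page.get("next"):
            break
        # sp.next fetches the page referenced by page["next"]
        page = sp.next(page)["artists"]
    return uris


class FakeClient:
    """Stand-in for an authenticated client, serving two made-up pages."""

    def __init__(self):
        self.pages = {
            "start": {"artists": {"items": [{"uri": "spotify:artist:a"}],
                                  "next": "page2"}},
            "page2": {"artists": {"items": [{"uri": "spotify:artist:b"}],
                                  "next": None}},
        }

    def current_user_followed_artists(self, limit=20, after=None):
        return self.pages["start"]

    def next(self, result):
        return self.pages[result["next"]]


uris = all_followed_artist_uris(FakeClient())
```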

In [18]:
imp_artists = pd.concat([artists,temp])

In [19]:
imp_artists

0     spotify:artist:6olE6TJLqED3rqDCT0FyPh
1     spotify:artist:6olE6TJLqED3rqDCT0FyPh
2     spotify:artist:6olE6TJLqED3rqDCT0FyPh
3     spotify:artist:6olE6TJLqED3rqDCT0FyPh
4     spotify:artist:6olE6TJLqED3rqDCT0FyPh
                      ...                  
27    spotify:artist:65c0gzsw9JsPUxm09QPjQj
28    spotify:artist:6P7H3ai06vU1sGvdpBwDmE
29    spotify:artist:6a1696QGUkDJgGuJHwS3zS
30    spotify:artist:70cRZdQywnSFp9pnc2WTCE
31    spotify:artist:73sIBHcqh3Z3NyqHKZ7FOL
Length: 675, dtype: object

In [20]:
imp_artists.shape

(675,)

In [21]:
imp_artists.drop_duplicates(inplace=True)

In [22]:
imp_artists

0     spotify:artist:6olE6TJLqED3rqDCT0FyPh
7     spotify:artist:3hyGGjxu73JuzBa757H6R5
8     spotify:artist:6V70yeZQCoSR2M3fyW8qiA
9     spotify:artist:5KeQyt1QJBjcutJ2AuLNO2
10    spotify:artist:4gzpq5DPGxSnKTe4SA8HAU
                      ...                  
24    spotify:artist:5d4LM8c0ZhuhXFwZKs6lXR
25    spotify:artist:5h0EnezM11vvxMuGuGd7wJ
28    spotify:artist:6P7H3ai06vU1sGvdpBwDmE
29    spotify:artist:6a1696QGUkDJgGuJHwS3zS
30    spotify:artist:70cRZdQywnSFp9pnc2WTCE
Length: 332, dtype: object

In [23]:
imp_artists.shape

(332,)

- Loop over the list of implicit artists and pull each artist's top tracks

In [None]:
tracks = []
for artist_uri in imp_artists:
    
    artist_tracks = sp.artist_top_tracks(artist_id=artist_uri)
    tracks.extend(artist_tracks['tracks'])

- First Item of Artist Top Tracks Collection

In [None]:
tracks[0]

{'album': {'album_group': 'album',
  'album_type': 'album',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh'},
    'href': 'https://api.spotify.com/v1/artists/6olE6TJLqED3rqDCT0FyPh',
    'id': '6olE6TJLqED3rqDCT0FyPh',
    'name': 'Nirvana',
    'type': 'artist',
    'uri': 'spotify:artist:6olE6TJLqED3rqDCT0FyPh'}],
  'external_urls': {'spotify': 'https://open.spotify.com/album/2guirTSEqLizK7j9i1MTTZ'},
  'href': 'https://api.spotify.com/v1/albums/2guirTSEqLizK7j9i1MTTZ',
  'id': '2guirTSEqLizK7j9i1MTTZ',
  'images': [{'height': 640,
    'url': 'https://i.scdn.co/image/ab67616d0000b273e175a19e530c898d167d39bf',
    'width': 640},
   {'height': 300,
    'url': 'https://i.scdn.co/image/ab67616d00001e02e175a19e530c898d167d39bf',
    'width': 300},
   {'height': 64,
    'url': 'https://i.scdn.co/image/ab67616d00004851e175a19e530c898d167d39bf',
    'width': 64}],
  'is_playable': True,
  'name': 'Nevermind (Remastered)',
  'release_date':

- Before this became a function in get_tracks_methods.py, it was written inline, as below
- It builds one dictionary per track, pulling out the metadata fields needed for further data collection such as audio features

In [31]:
flattened_tracks = []

for track in tracks:
    
    try:

        flattened_track = {
                "id": track["id"],
                "track_name": track["name"],
                "artist": track["artists"][0]["name"],
                "artist_uri": track["artists"][0]["uri"],
                "album_uri": track["album"]["uri"],
                "album": track["album"]["name"],
                "release_date": track["album"]["release_date"],
                "popularity": track["popularity"],
                "explicit": track["explicit"],
                "user_liked": 0
            }

        flattened_tracks.append(flattened_track)
    except (KeyError, IndexError, TypeError):
        # skip tracks with missing or malformed fields
        continue
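
The same flattening step can be written as a small function that returns `None` for tracks it cannot parse. This is a sketch, not the `flatten_tracks` function imported earlier; `flatten_track_sketch` and the sample track are hypothetical, and the field names mirror the inline loop above.

```python
def flatten_track_sketch(track, user_liked=0):
    """Return a flat dict for one track, or None if fields are missing."""
    try:
        first_artist = track["artists"][0]
        album = track["album"]
        return {
            "id": track["id"],
            "track_name": track["name"],
            "artist": first_artist["name"],
            "artist_uri": first_artist["uri"],
            "album_uri": album["uri"],
            "album": album["name"],
            "release_date": album["release_date"],
            "popularity": track["popularity"],
            "explicit": track["explicit"],
            "user_liked": user_liked,
        }
    except (KeyError, IndexError, TypeError):
        # Missing key, empty artist list, or a None track
        return None


# Made-up track shaped like the API response shown above
sample = {
    "id": "abc123",
    "name": "Example Song",
    "artists": [{"name": "Example Artist", "uri": "spotify:artist:xyz"}],
    "album": {"uri": "spotify:album:qrs", "name": "Example Album",
              "release_date": "2020-01-01"},
    "popularity": 50,
    "explicit": False,
}
flat = flatten_track_sketch(sample)
skipped = flatten_track_sketch(None)
```

Returning `None` instead of silently continuing lets the caller count how many tracks were dropped.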


In [33]:
len(flattened_tracks)

3260

In [37]:
implicit_tracks = pd.DataFrame(flattened_tracks)

In [38]:
user_exp_tracks.shape, implicit_tracks.shape

((2518, 10), (3260, 10))

- After collecting examples of both explicit and implicit tracks, combine to create a new collection of tracks for the mean vector

In [39]:
exp_imp_tracks = pd.concat([user_exp_tracks, implicit_tracks])

In [40]:
exp_imp_tracks.shape

(5778, 10)

- Sort values by the user liked column, in descending order
- user_liked is a feature used to designate whether the user explicitly (1) or implicitly (0) interacted with a given track

In [46]:
exp_imp_tracks.sort_values(by=['user_liked'],ascending=False)

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked
0,1nFtiJxYdhtFfFtfXBv06s,Something In The Way,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,73,False,1
1464,3BCe21LqWVOWWDfHjuoeeR,"Nightcall - From ""Drive""",Thibault Cauvin,spotify:artist:6d81rjlV6r9u8qPMAjavRV,spotify:album:3j4Luk2uwaiahexJJUIioz,"Nightcall (From ""Drive"")",2021-04-23,45,False,1
1457,19L0jgD9mr8QOy9kxr7LZS,Total Eclipse of the Heart (feat. Simply Three),Vitamin String Quartet,spotify:artist:6MERXsiRbur2oJZFgYRDKz,spotify:album:0sZwPaM40mIDElKRmnYvAb,Total Eclipse of the Heart (feat. Simply Three),2021-05-21,46,False,1
1458,6Pk7XpaiMsXAM6dXN3kQEo,Firework,Jeremy Green,spotify:artist:32jiRxDN9Nb9QbXh88uo42,spotify:album:7wBmPhdZlYiDa4h6rGDSbG,Firework,2022-01-21,43,False,1
1459,0zOI6bMCkjd10vCrDy3voc,Wellerman (arr. piano),Music Lab Collective,spotify:artist:1ylcY77FWeSVQKh5et1VGp,spotify:album:1f9nDMa1wlNsPxDtygkg4B,Wellerman (arr. piano),2021-07-30,48,False,1
...,...,...,...,...,...,...,...,...,...,...
1089,3C41w4YPDCfGuxzQNjvk8O,Quality Control,Jurassic 5,spotify:artist:6wFId9Jhuf9AKVzWboOj2B,spotify:album:4ePjDIN5keSHlmc93FKN0r,Quality Control,2000-01-01,44,False,0
1090,2IdUqeF20MzmtyetmQCRU6,High Fidelity,Jurassic 5,spotify:artist:6wFId9Jhuf9AKVzWboOj2B,spotify:album:7HvVxQBTNKoxY2aLexcEQH,Power In Numbers,2002-01-01,42,False,0
1091,4zVtyyuXiLAzSs9I5meIyO,Jayou,Jurassic 5,spotify:artist:6wFId9Jhuf9AKVzWboOj2B,spotify:album:5Lj4qWG2xATA2XNz3l6BZT,J5 (Deluxe Edition),2008-02-06,49,True,0
1092,0XtcvtK0d2dbBZSBQIP7XU,Thin Line,Jurassic 5,spotify:artist:6wFId9Jhuf9AKVzWboOj2B,spotify:album:7HvVxQBTNKoxY2aLexcEQH,Power In Numbers,2002-01-01,42,False,0


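- We sort by user_liked so that, when dropping duplicates, any track that appears in both the explicit and implicit sets keeps its explicit record.
- If there are duplicates, we want to preserve the explicit track, as it is the stronger signal. Note that `sort_values` returns a new DataFrame and the call above does not assign it, so the explicit-first order used below actually comes from the `pd.concat` order, which already lists explicit tracks first.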

In [47]:
exp_imp_tracks.drop_duplicates(subset=['id'],keep='first',inplace=True)
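
The sort-then-deduplicate logic can be sketched without pandas to make the ordering requirement explicit. The records and the `dedupe_prefer_explicit` helper are hypothetical.

```python
def dedupe_prefer_explicit(records):
    """Keep one record per track id, preferring user_liked == 1."""
    # sorted() is stable, so records with equal user_liked keep their order.
    ordered = sorted(records, key=lambda r: r["user_liked"], reverse=True)
    seen = set()
    unique = []
    for record in ordered:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique


# Made-up records: t1 appears as both implicit (0) and explicit (1)
records = [
    {"id": "t1", "user_liked": 0},
    {"id": "t2", "user_liked": 0},
    {"id": "t1", "user_liked": 1},
]
deduped = dedupe_prefer_explicit(records)
```

Because the explicit copy of `t1` is moved ahead of the implicit one before duplicates are dropped, the explicit record is the one that survives.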

In [50]:
exp_imp_tracks[exp_imp_tracks['user_liked']==1].shape

(2518, 10)

- Save explicit implicit track library

In [51]:
exp_imp_tracks.to_csv('../data/explicit_implicit_tracks.csv')

Successfully pulled 3k additional songs for the implicit library. Finally, let's pull from featured playlists and category playlists.

### General Tracks
- General Pool: Featured Playlists and their songs which are made available by the profile
    - Country/Market
    - limit 50
- General Pool: Category playlists and their songs which are made available by the profile
    - Get Several Browse Categories
    - For each category, pull category playlist
        - both need Country/Market
        - Each have a limit of 50

___
### Featured Playlists

In [53]:
sp = authenticate(scope='user-read-private')

Please authorize the app by visiting this URL: https://accounts.spotify.com/authorize?client_id=fd5183b4728840a989108098987ef843&response_type=code&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Fcallback&scope=user-read-private


Enter the authorization code:  AQA1cVjLFLfRy-yy4pcfkieKFOhszfAOw-TUsGm2LL_vLbpaX-z2OClFIMTAwOxZnh2fadOFXOdHArL2Ql3vhFwwNwPX-Z3i4eW8qGu8ocxabdPz6MTS3OJOQEmZIuq7r8qfxYe3Eh5LJVQYKxP3lGDVpwabWcw95NZM9RXvUvT8y5fyrjsDboRqWJmEwjVyPlQNN8g


In [56]:
current_market = sp.current_user()['country']

In [57]:
current_market

'US'

- Gather featured playlists within a given market, my given market is the US

In [58]:
featured_playlists = sp.featured_playlists(country=current_market, limit=50)

In [91]:
featured_playlists = featured_playlists['playlists']['items']

In [92]:
featured_playlists

[{'collaborative': False,
  'description': 'Relax with these timeless tunes. Cover: John Denver',
  'external_urls': {'spotify': 'https://open.spotify.com/playlist/37i9dQZF1DWTQwRw56TKNc'},
  'href': 'https://api.spotify.com/v1/playlists/37i9dQZF1DWTQwRw56TKNc',
  'id': '37i9dQZF1DWTQwRw56TKNc',
  'images': [{'height': None,
    'url': 'https://i.scdn.co/image/ab67706f00000003a83b6c9332a62267ed6d4f5f',
    'width': None}],
  'name': 'Mellow Classics',
  'owner': {'display_name': 'Spotify',
   'external_urls': {'spotify': 'https://open.spotify.com/user/spotify'},
   'href': 'https://api.spotify.com/v1/users/spotify',
   'id': 'spotify',
   'type': 'user',
   'uri': 'spotify:user:spotify'},
  'primary_color': None,
  'public': None,
  'snapshot_id': 'MTY3NTEwMjE2NSwwMDAwMDAwMDdkYWI0MzAxYTEzYTY0NjNkZWJmNzAxNGQ2Zjg0MDg1',
  'tracks': {'href': 'https://api.spotify.com/v1/playlists/37i9dQZF1DWTQwRw56TKNc/tracks',
   'total': 70},
  'type': 'playlist',
  'uri': 'spotify:playlist:37i9dQZF1DWTQ

___
### Category Playlists

- Similar to featured playlists, pull the categories that are available in a user's market

In [61]:
current_categories = sp.categories(country=current_market, limit=50)

In [67]:
current_categories

{'categories': {'href': 'https://api.spotify.com/v1/browse/categories?country=US&offset=0&limit=50',
  'items': [{'href': 'https://api.spotify.com/v1/browse/categories/toplists',
    'icons': [{'height': 275,
      'url': 'https://t.scdn.co/media/derived/toplists_11160599e6a04ac5d6f2757f5511778f_0_0_275_275.jpg',
      'width': 275}],
    'id': 'toplists',
    'name': 'Top Lists'},
   {'href': 'https://api.spotify.com/v1/browse/categories/0JQ5DAqbMKFQ00XGBls6ym',
    'icons': [{'height': 274,
      'url': 'https://t.scdn.co/media/original/hip-274_0a661854d61e29eace5fe63f73495e68_274x274.jpg',
      'width': 274}],
    'id': '0JQ5DAqbMKFQ00XGBls6ym',
    'name': 'Hip-Hop'},
   {'href': 'https://api.spotify.com/v1/browse/categories/0JQ5DAqbMKFEC4WFtoNRpw',
    'icons': [{'height': 274,
      'url': 'https://t.scdn.co/media/derived/pop-274x274_447148649685019f5e2a03a39e78ba52_0_0_274_274.jpg',
      'width': 274}],
    'id': '0JQ5DAqbMKFEC4WFtoNRpw',
    'name': 'Pop'},
   {'href': 'https

In [68]:
list_of_cats = [category['id'] for category in current_categories['categories']['items']]

In [70]:
sp = authenticate(scope='user-library-read')

Please authorize the app by visiting this URL: https://accounts.spotify.com/authorize?client_id=fd5183b4728840a989108098987ef843&response_type=code&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Fcallback&scope=user-library-read


Enter the authorization code:  AQB8R6TZahq6nl8q4wAtBPZUVc5m3uFyPWU8r6Ro8Rd6qTT9xo-oA0r0r3b5Nkv58s_0lC9Z-RUqDs6ao7m57Tn_KHL6hYu-1ywxjWIwD0fGFLbmxgd0wQ4iPj87FkNsuWKp-Vih2HT9_8BQQBlMy6_RlO-tSH7Ox6OMJMygtXzh1I4PJj-twFyVn_Nfd7yJob5R1_w


In [74]:
sp.playlist(playlist_id='37i9dQZF1DWTQwRw56TKNc')

{'collaborative': False,
 'description': 'Relax with these timeless tunes. Cover: John Denver',
 'external_urls': {'spotify': 'https://open.spotify.com/playlist/37i9dQZF1DWTQwRw56TKNc'},
 'followers': {'href': None, 'total': 637521},
 'href': 'https://api.spotify.com/v1/playlists/37i9dQZF1DWTQwRw56TKNc?additional_types=track',
 'id': '37i9dQZF1DWTQwRw56TKNc',
 'images': [{'height': None,
   'url': 'https://i.scdn.co/image/ab67706f00000003a83b6c9332a62267ed6d4f5f',
   'width': None}],
 'name': 'Mellow Classics',
 'owner': {'display_name': 'Spotify',
  'external_urls': {'spotify': 'https://open.spotify.com/user/spotify'},
  'href': 'https://api.spotify.com/v1/users/spotify',
  'id': 'spotify',
  'type': 'user',
  'uri': 'spotify:user:spotify'},
 'primary_color': '#ffffff',
 'public': True,
 'snapshot_id': 'MCwzYjk0ODlkNmNjMDk4Mzc3YzVjYTUzN2ZmZTBhYjE5ODZhZGNlNDE3',
 'tracks': {'href': 'https://api.spotify.com/v1/playlists/37i9dQZF1DWTQwRw56TKNc/tracks?offset=0&limit=100&additional_types=t

- iterate through each available category and pull a set of playlists

In [87]:
cat_playlists = []

for cat_id in list_of_cats:
    try:
        cats = sp.category_playlists(category_id=cat_id,country=current_market)
        cat_playlists.extend(cats['playlists']['items'])
    except Exception:  # skip categories the API cannot resolve (e.g. 404)
        continue
    time.sleep(.1)

HTTP Error for GET to https://api.spotify.com/v1/browse/categories/0JQ5DAqbMKFF1br7dZcRtK/playlists with Params: {'country': 'US', 'limit': 20, 'offset': 0} returned 404 due to Not found.
HTTP Error for GET to https://api.spotify.com/v1/browse/categories/0JQ5DAqbMKFRNXsIvgZF9A/playlists with Params: {'country': 'US', 'limit': 20, 'offset': 0} returned 404 due to Not found.


In [88]:
len(cat_playlists)

911

- After pulling category playlists and featured playlists, we combine them into one list to pull their tracks in one go

In [96]:
general_pool_playlists = cat_playlists + featured_playlists

In [105]:
general_pool_playlists[0]['id']

'37i9dQZF1DXcBWIGoYBM5M'

___
### Getting Tracks from Featured and Category Playlists

- We now have a list of featured playlists available in the current user's country/market, as well as category playlists based on the categories the current user has access to in that market.
- We were able to get 922 playlists. We will:
    - use a list comprehension to get the ids for track scrubbing
    - remove duplicates
    - get tracks, append to our explicit and implicit tracks to create our giant pool of tracks
    - remove duplicates
    - get track audio features on the whole pool
    - get genres
    - try a clustering model and then some EDA

In [110]:
gen_pool_ids = [playlist['id'] for playlist in general_pool_playlists if playlist is not None]

- Pull playlist items per playlist id from the general pool

In [114]:
%%time
gen_pool_tracks = []
for pid in gen_pool_ids:
    
    results = sp.playlist_items(pid)
    
    tracks = results['items']
    
    while results['next']:
        
        results = sp.next(results)
        
        tracks.extend(results['items'])
        
    gen_pool_tracks.extend(tracks)
    

CPU times: user 31.1 s, sys: 5.39 s, total: 36.5 s
Wall time: 8min 14s


- Note potential runtime problems with the application as this data collection took 8 minutes
- Might have to find other methods of data collection for current timeline
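
One possible way to cut the wall time is to fetch several playlists concurrently with `ThreadPoolExecutor`, which is already imported at the top of this notebook. The sketch below is untested against the real API: `fetch_all_playlists` and `fake_fetch` are hypothetical, it assumes the fetch function is safe to call from several threads, and Spotify's rate limits may cap any speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_playlists(fetch_playlist_tracks, playlist_ids, max_workers=4):
    """Fetch tracks for many playlists concurrently, preserving input order."""
    all_tracks = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as playlist_ids
        for tracks in pool.map(fetch_playlist_tracks, playlist_ids):
            all_tracks.extend(tracks)
    return all_tracks


def fake_fetch(playlist_id):
    """Stand-in for a function that pages through one playlist's items."""
    return [f"{playlist_id}-track-{i}" for i in range(2)]


pooled = fetch_all_playlists(fake_fetch, ["p1", "p2"])
```

A small `max_workers` keeps the request rate modest; in real use the fetch function would wrap the per-playlist pagination loop shown above.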

In [116]:
len(gen_pool_tracks)

94189

In [117]:
gen_pool_df = pd.DataFrame(flatten_tracks(gen_pool_tracks))

- Set user_liked to 0

In [118]:
gen_pool_df['user_liked'] = 0

In [119]:
gen_pool_df.head()

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked
0,6ZZf5a8oiInHDkBe9zXfLP,Curtains,Ed Sheeran,spotify:artist:6eUKZXaKkcviH0Ku9w2n3V,spotify:album:2WFFcvzM0CgLaSq4MSkyZk,- (Deluxe),2023-05-05,76,False,0
1,0WtM2NBVQNNJLh6scP13H8,Calm Down (with Selena Gomez),Rema,spotify:artist:46pWGuE3dSwY3bMMXGBvVS,spotify:album:2b2GHWESCWEuHiCZ2Skedp,Calm Down (with Selena Gomez),2022-08-25,95,False,0
2,0yLdNVWF3Srea0uzk55zFn,Flowers,Miley Cyrus,spotify:artist:5YGY8feqx7naU7z4HrwZM6,spotify:album:7I0tjwFtxUwBC1vgyeMAax,Flowers,2023-01-13,98,False,0
3,1Qrg8KqiBpW07V7PNxwwwL,Kill Bill,SZA,spotify:artist:7tYKF4w9nC0nq9CsPZTHyP,spotify:album:1nrVofqDRs7cpWXJ49qTnP,SOS,2022-12-08,94,False,0
4,5w40ZYhbBMAlHYNDaVJIUu,Chemical,Post Malone,spotify:artist:246dkjvS1zLTtiykXe5h60,spotify:album:7qcSUc5Af63mhfTF60KTEA,Chemical,2023-04-14,89,True,0


In [120]:
gen_pool_df.to_csv('../data/general_pool_tracks_without_features.csv')

____
### Pull Track Audio Features and Genres
- Let's combine our explicit tracks, implicit tracks, and our general pool tracks into one large dataframe to try clustering as well as pulling genres

In [122]:
user_track_library = pd.concat([exp_imp_tracks, gen_pool_df])

In [124]:
user_track_library.shape

(99474, 10)

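- re-sort values to have explicit tracks on top (note that `sort_values` returns a new DataFrame, so without assigning the result the library keeps its `pd.concat` order, which already lists explicit tracks first)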

In [140]:
user_track_library.sort_values(by='user_liked', ascending=False)

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked
0,1nFtiJxYdhtFfFtfXBv06s,Something In The Way,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,73,False,1
1683,3BCe21LqWVOWWDfHjuoeeR,"Nightcall - From ""Drive""",Thibault Cauvin,spotify:artist:6d81rjlV6r9u8qPMAjavRV,spotify:album:3j4Luk2uwaiahexJJUIioz,"Nightcall (From ""Drive"")",2021-04-23,45,False,1
1676,19L0jgD9mr8QOy9kxr7LZS,Total Eclipse of the Heart (feat. Simply Three),Vitamin String Quartet,spotify:artist:6MERXsiRbur2oJZFgYRDKz,spotify:album:0sZwPaM40mIDElKRmnYvAb,Total Eclipse of the Heart (feat. Simply Three),2021-05-21,46,False,1
1677,6Pk7XpaiMsXAM6dXN3kQEo,Firework,Jeremy Green,spotify:artist:32jiRxDN9Nb9QbXh88uo42,spotify:album:7wBmPhdZlYiDa4h6rGDSbG,Firework,2022-01-21,43,False,1
1678,0zOI6bMCkjd10vCrDy3voc,Wellerman (arr. piano),Music Lab Collective,spotify:artist:1ylcY77FWeSVQKh5et1VGp,spotify:album:1f9nDMa1wlNsPxDtygkg4B,Wellerman (arr. piano),2021-07-30,48,False,1
...,...,...,...,...,...,...,...,...,...,...
34834,40WDnUnzQL4XTo81vUJlKt,All Of Me,Matt Hammitt,spotify:artist:0o77vi5tCsW348tzvdjNPw,spotify:album:57KfddAY3ffu3A3F7M5b0h,Every Falling Tear,2011-01-01,39,False,0
34833,5aIWSE31yrW7hbrjMpeJVl,you.,outr.cty,spotify:artist:44p6xbyBk8khm2UotlfH2w,spotify:album:6B7etnFvfuyy0H0cY7DsrD,you.,2022-02-14,19,False,0
34832,1LCiZ4QclxrAcuT1JhFM1D,Your Love,Blessing Offor,spotify:artist:55qfDfgj4Qi3JGe6KpqGtC,spotify:album:6BiS5wWmp1iWnRsxkq4BzD,My Tribe,2023-01-13,39,False,0
34831,1t3koreZmmgvTYvxATwqfz,Pilots,Andrew Ripp,spotify:artist:7oAskcd3mX9ZzxMPFHYqoN,spotify:album:3ziyaxQziaPPYaUYKl4uR0,The Soul,2019-11-29,35,False,0


- reset index

In [141]:
user_track_library.reset_index(inplace=True,drop=True)

In [142]:
user_track_library.head()

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked
0,1nFtiJxYdhtFfFtfXBv06s,Something In The Way,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,73,False,1
1,5Ddlk6C2JVxb1SReZ6O1wk,Drain You,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,63,False,1
2,4l012k8ZcAdVbUvZ4kae5Q,Stay Away,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,56,False,1
3,2YodwKJnbPyNKe8XXSE9V7,Lithium,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,72,False,1
4,4P5KoWXOxwuobLmHXLMobV,Come As You Are,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,78,False,1


- drop potential track duplicates, keeping the first record, which is the explicit one because explicit tracks come first in the library

In [143]:
user_track_library.drop_duplicates(subset=['id'],keep='first',inplace=True)

In [145]:
user_track_library.shape

(61854, 10)

In [146]:
user_track_library.to_csv('../data/user_track_library.csv')

In [185]:
sp = authenticate(scope='user-library-read')

- Pull Audio Track Features for each Track

In [148]:
%%time
user_track_library = get_track_audio_features(sp, user_track_library)

AttributeError: 'NoneType' object has no attribute 'keys'

In [149]:
%%time
# collect all track ids as a plain list
track_ids = user_track_library['id'].tolist()
# accumulator for the audio features
audio_features = []
# the audio-features endpoint accepts at most 100 ids per request, so step through in chunks of 100
for i in range(0, len(track_ids), 100):
    # get the audio features for this chunk of up to 100 ids
    audio_features_batch = sp.audio_features(tracks=track_ids[i:i+100])
    # append the batch to the running list
    audio_features += audio_features_batch

CPU times: user 11.7 s, sys: 1.11 s, total: 12.8 s
Wall time: 2min 53s


In [154]:
len(audio_features)

61854

In [162]:
audio_features[0]

{'danceability': 0.427,
 'energy': 0.201,
 'key': 8,
 'loudness': -13.044,
 'mode': 1,
 'speechiness': 0.0317,
 'acousticness': 0.745,
 'instrumentalness': 0.263,
 'liveness': 0.109,
 'valence': 0.0668,
 'tempo': 105.218,
 'type': 'audio_features',
 'id': '1nFtiJxYdhtFfFtfXBv06s',
 'uri': 'spotify:track:1nFtiJxYdhtFfFtfXBv06s',
 'track_href': 'https://api.spotify.com/v1/tracks/1nFtiJxYdhtFfFtfXBv06s',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/1nFtiJxYdhtFfFtfXBv06s',
 'duration_ms': 232147,
 'time_signature': 4}

In [165]:
audio_feats = [af for af in audio_features if af is not None]

In [171]:
len(audio_feats)

61580
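
The counts above show that 274 of the 61,854 entries came back as `None`. That is consistent with the `AttributeError` raised by the helper earlier, although I have not confirmed that was the cause. Below is a hedged sketch of a batching helper that also records which ids had no features. `fetch_audio_features` and `FakeClient` are hypothetical names; the fake client only exists so the example runs offline, and in the notebook the real `sp` object would be passed instead.

```python
def fetch_audio_features(client, track_ids, batch_size=100):
    """Fetch audio features in batches; return (features, ids_without_features)."""
    features, missing = [], []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # As seen in the notebook, the response has one entry per requested id,
        # with None where no features are available.
        for track_id, feats in zip(batch, client.audio_features(tracks=batch)):
            if feats is None:
                missing.append(track_id)
            else:
                features.append(feats)
    return features, missing


class FakeClient:
    """Hypothetical offline stand-in for the authenticated `sp` client."""

    def audio_features(self, tracks):
        # Pretend that ids ending in 'x' have no audio features.
        return [None if t.endswith("x") else {"id": t, "tempo": 120.0} for t in tracks]


features, missing = fetch_audio_features(FakeClient(), ["t1", "t2x", "t3"], batch_size=2)
print(len(features), missing)
```

Keeping the list of missing ids makes it easy to see later which tracks dropped out of the merged dataframe.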

In [166]:
# turn audio features into a df
audio_features_df = pd.DataFrame(audio_feats)
# drop columns we don't need
audio_features_df = audio_features_df.drop(columns=['type', 'uri', 'track_href', 'analysis_url'])
# merge original df and audio features df
tracks_w_features = merger(user_track_library, audio_features_df)



In [167]:
tracks_w_features.shape

(61580, 23)
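
`merger` is a helper defined elsewhere in the project, so its body is not shown here. The shapes are consistent with an inner join on `id`: the 61,580 rows match the number of non-null feature records, and the 23 columns are the 10 library columns plus 13 feature columns. A minimal sketch under that assumption follows; `merge_on_id` is a hypothetical name, not the project's function.

```python
import pandas as pd


def merge_on_id(tracks_df, features_df):
    """Assumed equivalent of `merger`: inner join on the shared 'id' column."""
    return pd.merge(tracks_df, features_df, on="id", how="inner")


# Toy data: track 'b' has no audio features, so it drops out of the result.
tracks = pd.DataFrame({"id": ["a", "b", "c"], "track_name": ["A", "B", "C"]})
feats = pd.DataFrame({"id": ["a", "c"], "tempo": [105.2, 120.1]})

merged = merge_on_id(tracks, feats)
print(merged.shape)
```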

In [168]:
tracks_w_features.head()

Unnamed: 0,id,track_name,artist,artist_uri,album_uri,album,release_date,popularity,explicit,user_liked,...,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,1nFtiJxYdhtFfFtfXBv06s,Something In The Way,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,73,False,1,...,-13.044,1,0.0317,0.745,0.263,0.109,0.0668,105.218,232147,4
1,5Ddlk6C2JVxb1SReZ6O1wk,Drain You,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,63,False,1,...,-5.625,0,0.0695,0.000129,2e-06,0.184,0.198,133.358,223880,4
2,4l012k8ZcAdVbUvZ4kae5Q,Stay Away,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,56,False,1,...,-5.1,0,0.128,5e-06,0.0461,0.0822,0.205,165.21,211440,4
3,2YodwKJnbPyNKe8XXSE9V7,Lithium,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,72,False,1,...,-6.41,1,0.0381,0.00174,0.0,0.0631,0.485,123.207,257053,4
4,4P5KoWXOxwuobLmHXLMobV,Come As You Are,Nirvana,spotify:artist:6olE6TJLqED3rqDCT0FyPh,spotify:album:2guirTSEqLizK7j9i1MTTZ,Nevermind (Remastered),1991-09-26,78,False,1,...,-5.846,0,0.0388,0.00016,0.00161,0.0916,0.539,120.125,218920,4


In [172]:
tracks_w_features.to_csv('../data/user_lib_tracks_feats.csv')

### User Profile Track Library Complete!

___
## Version 2's attempt at filling in Genres: Ultimately Unsuccessful

- The final step in the get phase of this project is capturing genres and splitting each artist's list of genres into one-hot-encoded-style columns, since each artist can bring back more than one genre.
- Minor update: retrieving genres has been difficult and slow with ~61k records, so I am going to try my hand at ThreadPoolExecutor to call the artist API concurrently and speed up processing.
    - I was successful in creating a ThreadPoolExecutor, but I am pretty sure I am now being throttled by the Spotify service after trying to run 8 simultaneous threads over the 61k records.
    - I learned the limit was 180 calls per 30 seconds...

- In both of the sections below (ThreadPoolExecutor and Pull Genres without TPE), we were ultimately timed out of the Web API service.
- They are kept here for context on the different attempts at pulling genres.
- The code that successfully pulls genres is in the Web App as well as in item 03_Model_User_Data.
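
For context on the encoding goal described above, here is a small sketch of turning per-artist genre lists into indicator columns with pandas. The artist URIs and genres are placeholders, and this is only one possible approach, not necessarily the one used in 03_Model_User_Data.

```python
import pandas as pd

# Placeholder artists; each one can carry zero, one, or several genres.
artists = pd.DataFrame({
    "artist_uri": ["spotify:artist:1", "spotify:artist:2", "spotify:artist:3"],
    "genres": [["grunge", "rock"], ["edm"], []],
})

# Join each list into a '|'-separated string, then expand into 0/1 columns.
genre_flags = artists["genres"].str.join("|").str.get_dummies(sep="|")

# Attach the indicator columns back to the artist key for merging later.
artist_genres = pd.concat([artists[["artist_uri"]], genre_flags], axis=1)
print(artist_genres)
```

An artist with an empty genre list simply gets zeros in every genre column. This assumes no genre name contains the `|` separator.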

___
### ThreadPoolExecutor

In [174]:
df = tracks_w_features.copy()

In [206]:
# Define a function to call the API on each chunk of artist URIs and create a df
def get_artist_genres(uri_list):
    # call artists api on uri_chunk
    results = sp.artists(uri_list)['artists']
    # extract genres and create a dataframe with the matching artist uri for merge later
    genres_df = pd.DataFrame({'artist_uri': [artist['uri'] for artist in results],
                                  'genres': [artist['genres'] for artist in results]})
    return genres_df

In [207]:
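# https://more-itertools.readthedocs.io/en/stable/
# note: the several-artists endpoint accepts at most 50 artists per call, so chunks of 1000 exceed that limit
chunk_size = 1000
chunks = list(chunked(df['artist_uri'], chunk_size))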

In [212]:
%%time
# Create a ThreadPoolExecutor with the specified number of workers
# https://www.digitalocean.com/community/tutorials/how-to-use-threadpoolexecutor-in-python-3
# https://superfastpython.com/threadpoolexecutor-in-python/
# https://www.geeksforgeeks.org/how-to-use-threadpoolexecutor-in-python3/

with ThreadPoolExecutor(max_workers=2) as executor:
   
    # Submit tasks to executor
    futures = [executor.submit(get_artist_genres, chunk) for chunk in chunks]
    
    # Collect results as they become available
    results = []
    for future in futures:
        results.extend(future.result())


Max Retries reached


KeyboardInterrupt: 



___
### Pull Genres without TPE

In [222]:
sp = authenticate(scope='user-library-read')

In [217]:
# split the artist URIs into batches of 50
batches = []
for i in range(0, df.shape[0], 50):
    batches.append(df['artist_uri'][i:i+50])

In [219]:
batches[0]

0     spotify:artist:6olE6TJLqED3rqDCT0FyPh
1     spotify:artist:6olE6TJLqED3rqDCT0FyPh
2     spotify:artist:6olE6TJLqED3rqDCT0FyPh
3     spotify:artist:6olE6TJLqED3rqDCT0FyPh
4     spotify:artist:6olE6TJLqED3rqDCT0FyPh
5     spotify:artist:6olE6TJLqED3rqDCT0FyPh
6     spotify:artist:6olE6TJLqED3rqDCT0FyPh
7     spotify:artist:3hyGGjxu73JuzBa757H6R5
8     spotify:artist:6V70yeZQCoSR2M3fyW8qiA
9     spotify:artist:5KeQyt1QJBjcutJ2AuLNO2
10    spotify:artist:4gzpq5DPGxSnKTe4SA8HAU
11    spotify:artist:1h6Cn3P4NGzXbaXidqURXs
12    spotify:artist:67tgMwUfnmqzYsNAtnP6YJ
13    spotify:artist:3C8RpaI3Go0yFF9whvKoED
14    spotify:artist:3534yWWzmxx8NbKVoNolsK
15    spotify:artist:0AkpPlFLnr0VQwZQeMGht0
16    spotify:artist:1gR0gsQYfi6joyO1dlp76N
17    spotify:artist:7gjAu1qr5C2grXeQFFOGeh
18    spotify:artist:0qudezVgvl4Chd9BgNFB83
19    spotify:artist:57anmI1X2hXWPrNagFdzZr
20    spotify:artist:6veTV9sF06FBf2KN0xAdvo
21    spotify:artist:57dN52uHvrHOxijzpIgu3E
22    spotify:artist:6vWDO969PvN

In [None]:
artists = sp.artists(batches[0])['artists']

In [215]:
%%time

genres = []

for i in range(0, df.shape[0], 50):
    start = time.time()
    print(i)
    batch_uris = df['artist_uri'][i:i+50]
    
    artists = sp.artists(batch_uris)['artists']
    
    for artist in artists:
        
        print(artist['genres'])
        genres.append(artist['genres'])
    
    end = time.time()
    print('loop time', end-start)
        
        

0


KeyboardInterrupt: 

In [213]:
%%time
df[['genres']] = df['artist_uri'].apply(lambda uri: pd.Series(sp.artist(uri)['genres']))

HTTP Error for GET to https://api.spotify.com/v1/artists/6olE6TJLqED3rqDCT0FyPh with Params: {} returned 401 due to The access token expired


SpotifyException: http status: 401, code:-1 - https://api.spotify.com/v1/artists/6olE6TJLqED3rqDCT0FyPh:
 The access token expired, reason: None
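
Both attempts above walk the artist column row by row, even though many tracks share an artist (the first batch repeats the same URI seven times). A hedged sketch of one way to cut the number of calls: request each distinct artist once, in batches of 50, with a short pause between calls. `fetch_artist_genres` and `FakeArtistClient` are hypothetical names, the pause length is an assumption to tune against the observed limits, and the fake client only exists so the example runs offline.

```python
import time


def fetch_artist_genres(client, artist_uris, batch_size=50, pause_seconds=0.5):
    """Return {artist_uri: genres}, requesting each distinct artist only once."""
    unique_uris = list(dict.fromkeys(artist_uris))  # de-duplicate, keep order
    genres_by_uri = {}
    for start in range(0, len(unique_uris), batch_size):
        batch = unique_uris[start:start + batch_size]
        for artist in client.artists(batch)["artists"]:
            if artist is not None:  # skip artists the API could not resolve
                genres_by_uri[artist["uri"]] = artist["genres"]
        time.sleep(pause_seconds)  # assumed pause between calls
    return genres_by_uri


class FakeArtistClient:
    """Hypothetical offline stand-in for `sp`; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def artists(self, uris):
        self.calls += 1
        return {"artists": [{"uri": u, "genres": ["demo-genre"]} for u in uris]}


fake = FakeArtistClient()
uris = ["spotify:artist:A"] * 7 + ["spotify:artist:B", "spotify:artist:C"]
genres_by_uri = fetch_artist_genres(fake, uris, batch_size=2, pause_seconds=0)
print(fake.calls, len(genres_by_uri))
```

The resulting dictionary could then be mapped back onto the tracks with something like `df['artist_uri'].map(genres_by_uri)`.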

- Below is Version 1, which was my initial attempt at extracting data.
- I kept this section in case you are curious how I started working with this data and how it ended up.

___
___
# Version 1

Now, using the songs we've retrieved, we can pull the audio features for our newly generated df and merge them in, giving my saved tracks with audio features:

In [4]:
liked_songs_features = handler("get_track_audio_features", my_liked_songs)

In [7]:
liked_songs_features.head()

Unnamed: 0,id,track_name,artist,artist_id,album_id,album,release_date,playlist_name,popularity,explicit,...,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,2Nw6tjb0euV6LApzN4fU0a,Good for You,Spacey Jane,6V70yeZQCoSR2M3fyW8qiA,3zZi1vy6CnNZX7lbcRJtXo,Sunlight,2020-06-12,Liked_Songs,60,True,...,-4.741,1,0.0385,5.6e-05,0.0106,0.165,0.931,174.962,174760,4
1,5RHY7WkAjAhpxuPN0CTd4F,True Lovers,Holy Holy,5KeQyt1QJBjcutJ2AuLNO2,6J8EzIkd1LeP07kIF77RNz,Paint,2017-02-24,Liked_Songs,54,False,...,-5.092,0,0.0467,0.433,0.0534,0.0903,0.378,139.031,259964,4
2,5J8m6w5VmswbMBYUAFf44t,Every Teardrop Is A Waterfall (Coldplay Vs. Sw...,Coldplay,4gzpq5DPGxSnKTe4SA8HAU,4ljisoNarj0BpQSMIEv88L,Until Now,2012-01-01,Liked_Songs,51,False,...,-5.703,1,0.0585,0.000635,0.00775,0.545,0.293,125.022,408829,4
3,0VffaI2jwQknRrxpECYHsF,Greyhound,Swedish House Mafia,1h6Cn3P4NGzXbaXidqURXs,4ljisoNarj0BpQSMIEv88L,Until Now,2012-01-01,Liked_Songs,64,False,...,-4.989,1,0.0521,0.00666,0.876,0.276,0.539,124.976,410097,4
4,2yWyFT6bW1Rd9cjVvYi4v8,Superstylin',Groove Armada,67tgMwUfnmqzYsNAtnP6YJ,1bS1J4OVGrpu6e2U2pHge6,Goodbye Country (Hello Nightclub),2001-07-11,Liked_Songs,63,False,...,-8.263,1,0.0558,0.00346,0.00131,0.259,0.928,128.967,360427,4


Success! We've also added a column to signify that the user actively liked these select tracks. Let's save that as a CSV for evaluating music taste using only my liked songs:

In [8]:
liked_songs_features.to_csv('../data/my_savedtracks_with_features.csv')

### Retrieving songs from playlists in my library

Depending on the user who logs in to the application, they might have more songs in their playlists than in their liked songs. Either way, more observations give the model more to train on when predicting which songs they listen to.

In [50]:
liked_songs_features = pd.read_csv('../data/my_savedtracks_with_features.csv')

In [51]:
%%time
users_tracks_features = handler("get_playlist_tracks_df", liked_songs_features)

CPU times: user 1.42 s, sys: 215 ms, total: 1.63 s
Wall time: 21.3 s


- Above we called our get_playlist_tracks_df function, which:
    - pulls a user's playlists and provides a list of playlist ids and a list of playlist names
    - pulls all tracks from each playlist and extends them into a single list
    - creates an observation for each track as a dictionary that mirrors our get liked tracks function
    - turns the observations into a dataframe and then pulls the playlist tracks' audio features
    - finally concatenates our liked songs with our playlist songs into the final df, with the index reset.
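
The first two steps in the list above can be sketched as follows. This is not the project's helper: `get_all_playlist_items`, `collect_playlist_tracks` and `FakePlaylistClient` are hypothetical names, and the fake client only mimics the paged response shape so the example runs offline. With the real client, `sp.playlist_items` and `sp.next` would do the paging.

```python
def get_all_playlist_items(client, playlist_id):
    """Follow pagination until every item in the playlist has been collected."""
    page = client.playlist_items(playlist_id, limit=100)
    items = list(page["items"])
    while page.get("next"):
        page = client.next(page)
        items.extend(page["items"])
    return items


def collect_playlist_tracks(client, playlist_ids, playlist_names):
    """Gather items from all playlists, tagging each with its own playlist name."""
    tagged = []
    for playlist_id, playlist_name in zip(playlist_ids, playlist_names):
        for item in get_all_playlist_items(client, playlist_id):
            tagged.append({"playlist_name": playlist_name, "item": item})
    return tagged


class FakePlaylistClient:
    """Hypothetical offline stand-in for `sp` that serves two pages per playlist."""

    def playlist_items(self, playlist_id, limit=100):
        return {"items": [{"track": {"id": f"{playlist_id}-1"}}], "next": playlist_id}

    def next(self, page):
        return {"items": [{"track": {"id": f"{page['next']}-2"}}], "next": None}


tagged = collect_playlist_tracks(FakePlaylistClient(), ["p1", "p2"], ["Road Trip", "Focus"])
print(len(tagged), tagged[0]["playlist_name"])
```

Tagging each item while the playlist name is still in scope keeps the name attached to the right tracks when they are flattened later.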

In [52]:
users_tracks_features.drop(columns='Unnamed: 0', inplace=True)

- We remove the Unnamed: 0 column from the returned total tracks

In [53]:
users_tracks_features.isnull().sum()

id                  0
track_name          1
artist              1
artist_id           0
album_id            0
album               1
release_date        0
playlist_name       0
popularity          0
explicit            0
user_liked          0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
duration_ms         0
time_signature      0
dtype: int64

- There appear to be a few null values (one each in track_name, artist, and album), potentially all within the same row. Let's check it out.
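
A quick way to tell whether the nulls share a row is to compare the number of null cells with the number of rows that contain any null. A minimal sketch on made-up data:

```python
import pandas as pd

# Toy frame where one row is missing three fields at once.
toy = pd.DataFrame({
    "id": ["a", "b", "c"],
    "track_name": ["A", None, "C"],
    "artist": ["X", None, "Z"],
    "album": ["L", None, "N"],
})

null_cells = int(toy.isnull().sum().sum())       # total missing cells
null_rows = int(toy.isnull().any(axis=1).sum())  # rows with at least one missing cell
print(null_cells, null_rows)
```

Three null cells but only one affected row means all the nulls sit in the same record.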

In [54]:
users_tracks_features[users_tracks_features['track_name'].isnull()]

Unnamed: 0,id,track_name,artist,artist_id,album_id,album,release_date,playlist_name,popularity,explicit,...,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
104,4LLvxxkWtt818FNO3cbsdo,,,0LyfQWJT6nXafLPZqxe9Of,6HBa9wXaZG8WIkhBY8p4aT,,0,Liked_Songs,0,False,...,-3.28,1,0.0635,0.0939,4e-06,0.105,0.877,145.037,211034,4


- Let's see what we get when we run the track id through the get-single-track API endpoint.
- Update: when I followed the link to open the song in the Spotify web app, the song was no longer available, so I will drop this record from my observations.

In [55]:
sp = authenticate(scope='user-library-read')

In [56]:
sp.track(track_id='4LLvxxkWtt818FNO3cbsdo')

{'album': {'album_group': 'single',
  'album_type': 'single',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0LyfQWJT6nXafLPZqxe9Of'},
    'href': 'https://api.spotify.com/v1/artists/0LyfQWJT6nXafLPZqxe9Of',
    'id': '0LyfQWJT6nXafLPZqxe9Of',
    'name': 'Various Artists',
    'type': 'artist',
    'uri': 'spotify:artist:0LyfQWJT6nXafLPZqxe9Of'}],
  'available_markets': [],
  'external_urls': {'spotify': 'https://open.spotify.com/album/6HBa9wXaZG8WIkhBY8p4aT'},
  'href': 'https://api.spotify.com/v1/albums/6HBa9wXaZG8WIkhBY8p4aT',
  'id': '6HBa9wXaZG8WIkhBY8p4aT',
  'images': [],
  'name': '',
  'release_date': '0000',
  'release_date_precision': 'year',
  'total_tracks': 1,
  'type': 'album',
  'uri': 'spotify:album:6HBa9wXaZG8WIkhBY8p4aT'},
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0LyfQWJT6nXafLPZqxe9Of'},
   'href': 'https://api.spotify.com/v1/artists/0LyfQWJT6nXafLPZqxe9Of',
   'id': '0LyfQWJT6nXafLPZqxe9Of',
   'name

In [57]:
users_tracks_features.shape

(3618, 24)

In [58]:
users_tracks_features.dropna().shape

(3617, 24)

In [59]:
users_tracks_features.dropna(inplace=True)

In [60]:
users_tracks_features.isnull().sum().sum()

0

- Now that null values have been taken care of, we need to handle duplicates. Liked Songs are most likely picked from playlists the user has listened to, and the same song can appear in several playlists, so there will no doubt be duplicate values.
- Let's identify the number of duplicates and then drop the rows we don't need while keeping the records we want.
    - In this case, we want to keep the records that are under our Liked Songs playlist.

In [61]:
users_tracks_features['id'].duplicated().value_counts()

False    2541
True     1076
Name: id, dtype: int64

- As expected, we have duplicate tracks (1076). Let's create a copy of our tracks df sorted on the 'user_liked' feature we added, which denotes that a user actively selected an item.
- We will sort in DESC order so that when we run drop_duplicates we can keep the first record, which, if the track is duplicated, will be the one in the liked playlist.

In [62]:
cleaned_of_dupes = users_tracks_features.sort_values(by='user_liked', ascending=False).copy()

In [63]:
cleaned_of_dupes = cleaned_of_dupes.drop_duplicates(subset='id', keep='first')

In [64]:
cleaned_of_dupes.shape

(2541, 24)

In [65]:
cleaned_of_dupes[cleaned_of_dupes['user_liked']==1].shape

(634, 24)

- We have successfully retained all of our liked songs (635 minus the one null record, so 634) and removed all other duplicates, for a total of ~2500 tracks.
- Using this cleaned df, let's save to CSV and jump into EDA!
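
The claim that no liked song was lost can also be checked directly by comparing the set of liked ids before and after de-duplication. A small sketch with made-up frames; `liked_ids_preserved` is a hypothetical helper name.

```python
import pandas as pd


def liked_ids_preserved(before_df, after_df):
    """True if every liked track id in before_df is still liked in after_df."""
    liked_before = set(before_df.loc[before_df["user_liked"] == 1, "id"])
    liked_after = set(after_df.loc[after_df["user_liked"] == 1, "id"])
    return liked_before == liked_after


# Toy data: track 'a' appears both as a liked song and as a playlist song.
before = pd.DataFrame({"id": ["a", "a", "b", "c"], "user_liked": [1, 0, 0, 1]})
after = pd.DataFrame({"id": ["a", "c", "b"], "user_liked": [1, 1, 0]})

print(liked_ids_preserved(before, after))
```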

In [48]:
cleaned_of_dupes.to_csv('../data/allmy_tracks_with_features.csv')

In [1]:
import get_tracks_methods
import json

In [48]:
sp = get_tracks_methods.authenticate(scope='user-library-read')

Please authorize the app by visiting this URL: https://accounts.spotify.com/authorize?client_id=fd5183b4728840a989108098987ef843&response_type=code&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Fcallback&scope=user-library-read


Enter the authorization code:  [redacted]


In [49]:
# get lists of playlist ids and names
playlist_ids, playlist_names = get_tracks_methods.get_playlists(sp)

In [50]:
len(playlist_ids), len(playlist_names)

(28, 28)

In [51]:
playlist_tracks = []
# loop through zipped playlist ids and names
for playlist_id, playlist_name in zip(playlist_ids, playlist_names):
    # get tracks from playlist
    playlist_tracks.extend(get_tracks_methods.get_all_playlist_tracks(sp, playlist_id))

In [52]:
len(playlist_tracks)

2396

In [53]:
type(playlist_tracks)

list

In [54]:
type(playlist_tracks[0])

dict

In [55]:
playlist_tracks[0]['track']['id']

'1pUsdir2xhSxP0RyBe9lLH'

In [56]:
flattened_tracks = []
# iterate over the items directly (the original range ran one index past the end)
for item in playlist_tracks:
    try:
        track = item["track"]
        flattened_tracks.append({
            "id": track["id"],
            "track_name": track["name"],
            "artist": track["artists"][0]["name"],
            "artist_id": track["artists"][0]["id"],
            "album_id": track["album"]["id"],
            "album": track["album"]["name"],
            "release_date": track["album"]["release_date"],
            # note: playlist_name is left over from the loop in the previous cell,
            # so every row is labelled with the last playlist's name
            "playlist_name": playlist_name,
            "popularity": track["popularity"],
            "explicit": track["explicit"],
            "user_liked": 0
        })
    except (TypeError, KeyError, IndexError):
        # skip entries with a missing or malformed track payload
        continue

In [57]:
len(flattened_tracks)

2391

In [58]:

# turn flattened tracks into pandas df and return
playlist_tracks_df = pd.DataFrame(flattened_tracks)

playlist_tracks_df.head()


# # track_features = handler('get_track_audio_features',playlist_tracks_df)

# # final_df = merger(df, track_features)

Unnamed: 0,id,track_name,artist,artist_id,album_id,album,release_date,playlist_name,popularity,explicit,user_liked
0,1pUsdir2xhSxP0RyBe9lLH,Icarus,Madeon,4pb4rqWSoGUgxm63xmJ8xc,3uKLwDjku2Us0c81LEmftR,Adventure (Deluxe),2015-03-30,Fleet Foxes,49,False,0
1,3H3cOQ6LBLSvmcaV7QkZEu,Aerodynamic,Daft Punk,4tZwfgrHOc3mvqYlEYSvVi,2noRn2Aes5aoNVsU6iWThc,Discovery,2001-03-12,Fleet Foxes,62,False,0
2,7sLnn2UOttrEPBZSFyIsYh,Wolfgang's 5th Symphony,Wolfgang Gartner,3534yWWzmxx8NbKVoNolsK,3kDILp9v0rDVPMAJUQYbZx,Back Story,2012-09-04,Fleet Foxes,43,False,0
3,3ezkJgagRPZ39KCTrKcSI7,Ghosts 'n' Stuff (feat. Rob Swire),deadmau5,2CIMQHirSU0MQqyYHq0eOx,3eNZDL2rqTVvmiC1f0yFwF,For Lack of a Better Name (The Extended Mixes),2009,Fleet Foxes,61,False,0
4,0VffaI2jwQknRrxpECYHsF,Greyhound,Swedish House Mafia,1h6Cn3P4NGzXbaXidqURXs,4ljisoNarj0BpQSMIEv88L,Until Now,2012-01-01,Fleet Foxes,63,False,0


In [59]:
track_ids = [track for track in playlist_tracks_df['id']]

In [60]:
track_ids

['1pUsdir2xhSxP0RyBe9lLH',
 '3H3cOQ6LBLSvmcaV7QkZEu',
 '7sLnn2UOttrEPBZSFyIsYh',
 '3ezkJgagRPZ39KCTrKcSI7',
 '0VffaI2jwQknRrxpECYHsF',
 '3EfOzBsC8in9ZYuD3NR2v9',
 '2JUNwx3qKeKiNntIKK2rBg',
 '3ERVrhNx8p2I3xY9RomH9t',
 '4QwMCXtsoCdIrK7oNJkhGW',
 '5PtEpuVX03k9bOUwilL5EO',
 '4Xw9TXNrLhzp2JBx3X8j9l',
 '64y85LeHY8Z6OlTOM9cpKD',
 '2xeqCSNWOe59rGppUKggeF',
 '7bKkIsUZ6sb5Lt73oNTLUw',
 '11IWX2lE69q4Nt8B0ttt1L',
 '27Es1J9GNKf0pZLy3sdzjl',
 '5ZHts4IatJgG1ZIaIN3qIL',
 '7cMFjxhbXBpOlais7KMF3j',
 '49X0LAl6faAusYq02PRAY6',
 '0DiWol3AO6WpXZgp0goxAV',
 '33yAEqzKXexYM3WlOYtTfQ',
 '6MpRH2AODF7OM2Md1zzaEV',
 '4OlWYCYNtAkWdQUiQRA97f',
 '5noQJkpVfHt2D4df2GXieV',
 '4wSmqFg31t6LsQWtzYAJob',
 '0wP9okoDWmbeC2w9E8ZzPu',
 '72iqZG1zy55rXQPBBB4a21',
 '2JGIjJEohXvvphtxTqStha',
 '6fLGuyj9ITWTp4TolAeCaR',
 '4ZVanZJ1HJdxYd8ABLXGUM',
 '1ol7LFeUEsEvG1s1n07wGn',
 '1ScXXDRjg8Xxx2VqWs9Kus',
 '42pAisqMtyzuWjMzts9jRH',
 '39CKNNJskTBGnLGOTmgERN',
 '2YKm2APgaXNsseFrizTsKM',
 '5CF0XiE4EnI37za2l3WZmH',
 '0DC6XJuyJIotOK74ahqHEo',
 

In [26]:
playlist_tracks[0]['track']['id']
playlist_tracks[0]["track"]["name"]
playlist_tracks[0]["track"]["artists"][0]["name"]
playlist_tracks[0]["track"]["artists"][0]["id"]

'4EVpmkEwrLYEg6jIsiPMIb'