<a id="home"></a>
# Alexaurus Recs Recommenders

### Recommenders explored in this notebook. 

[Track Based](#track_based)

[Artist Based](#artist_based)

[One-Click](#one_click)

[Playlist Based - work in progress](#playlist_based)

[Deprecated Artist Based](#depricated_artist_based)

# Log-In
Client ID and Client Secret are application identification codes provided by the [Spotify Developers Program](https://developer.spotify.com/). These codes are removed from my code before pushing to GitHub.

In [10]:
## REMOVE CLIENT_ID AND CLIENT_SECRET BEFORE PUSHING ##
#CLIENT_ID
#CLIENT_SECRET
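One way to avoid having to strip the keys by hand is to read them from environment variables. The sketch below is only an illustration, not what this notebook actually does: the helper name `load_spotify_credentials` is hypothetical, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are assumed to be set in your shell beforehand (they are the names Spotipy itself looks for).

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    `env` defaults to the real environment; passing a plain dict makes
    the helper easy to try out without touching your shell.
    """
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not found
        raise RuntimeError(f"Set {missing.args[0]} before running the notebook") from None
```

With the variables set, `CLIENT_ID, CLIENT_SECRET = load_spotify_credentials()` would replace the two commented-out lines above.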

In [11]:
# This code is all of the necessary imports and a quick way to access my personal data
# using Spotipy. 
import spotipy
from spotipy import util
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth
import pandas as pd
import matplotlib.pyplot as plt
import urllib

USERNAME = 'elw86ve5g5t944wwlef6qyzu3' # User id
TEST_USERNAME = '4mxn9ir0m2w6bggldqxpte1v2' # Test User Name

SCOPE = 'playlist-modify-public user-top-read'
LOCAL_REDIRECT_URI = 'http://127.0.0.1:8080'
REDIRECT_URI = 'https://alexaurusrecs.herokuapp.com/'

# Requesting access token
token = util.prompt_for_user_token(username=USERNAME,
                                   scope=SCOPE,
                                   client_id=CLIENT_ID,
                                   client_secret=CLIENT_SECRET,
                                   redirect_uri=LOCAL_REDIRECT_URI) 
# Instantiating OAuth object
spotify = spotipy.Spotify(auth=token)

<a id='track_based'></a>
# Track Based Recommender
[Back to Top](#home)

### Overview: 

The track-based recommender retrieves base recommendations from Spotify using the seeds the user supplies as parameters. I then request the audio features for each of the recommended tracks. Learn more about the audio features [HERE](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/). 

From there, I will pull the user's top listened-to tracks. I will pull the audio features from them as well.

Lastly, I will use scikit-learn's `cosine_similarity` and `pairwise_distances` to compare the audio features between the two sets of tracks and return a list of the tracks that are most similar.
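For reference, the cosine similarity of two feature vectors is their dot product divided by the product of their lengths, and cosine distance is 1 minus that value. Here is a minimal pure-Python sketch of the idea for a single pair of vectors. The function name `cosine_similarity_1d` and the toy vectors are mine, for illustration only; scikit-learn computes the same quantity for whole matrices at once.

```python
import math


def cosine_similarity_1d(a, b):
    """Cosine similarity of two equal-length, non-zero numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score (approximately) 1, whatever their length
same_direction = cosine_similarity_1d([1.0, 2.0], [2.0, 4.0])

# Vectors at right angles score 0
orthogonal = cosine_similarity_1d([1.0, 0.0], [0.0, 1.0])

print(same_direction, orthogonal)
```

Note that only the direction of the vectors matters, not their magnitude.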

### Let's get started

In [5]:
# Imports
from sklearn.metrics.pairwise import pairwise_distances, cosine_distances, cosine_similarity
from sklearn.preprocessing import StandardScaler
import pandas as pd
pd.set_option("display.precision", 14)

Below is a Spotify [URI](https://community.spotify.com/t5/Spotify-Answers/What-s-a-Spotify-URI/ta-p/919201). Essentially, it's a code used by Spotify to identify artists, playlists, tracks and more. 

I will use the track Blockbuster Night, Pt. 1 by Run the Jewels as my test track. Listen to it [here](https://open.spotify.com/album/4Loc7NtCAo9mypHO6kbviD?highlight=spotify:track:5jQYkYhoOlBW4vJ2l4TCxl).

**The scenario is that I LOVE this song. I want to hear more just like it. Let's find some good recommendations!**

In [6]:
# Test track from Run the Jewels.
blockbuster = 'spotify:track:5jQYkYhoOlBW4vJ2l4TCxl' 
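The URI above has three colon-separated parts: the literal `spotify`, the resource type, and the ID. As a small illustration, the sketch below splits one apart. `parse_spotify_uri` is a hypothetical helper of mine, and it assumes the three-part form used throughout this notebook.

```python
def parse_spotify_uri(uri):
    """Split 'spotify:<type>:<id>' into a (type, id) tuple."""
    parts = uri.split(":")
    if len(parts) != 3 or parts[0] != "spotify":
        raise ValueError(f"Not a three-part Spotify URI: {uri!r}")
    return parts[1], parts[2]


kind, track_id = parse_spotify_uri("spotify:track:5jQYkYhoOlBW4vJ2l4TCxl")
print(kind, track_id)  # track 5jQYkYhoOlBW4vJ2l4TCxl
```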

In [14]:
# Returns a dictionary of recommended tracks. These are the base recommendations.
recs = spotify.recommendations(seed_tracks = [blockbuster], limit=20)
type(recs)

dict

In [15]:
# Track names and URIs of base recommendations
rec_track_names = [track['name'] for track in recs['tracks']]
uris = [track['uri'] for track in recs['tracks']]
rec_track_names

['Millions',
 'The Space Program',
 'Trouble in Paradise',
 'Blue Suede',
 'Connect Four',
 'Reagan',
 'Put Jewels on It',
 'Father Sister Berzerker',
 'Easy Rider',
 'Kill Jill (feat. Killer Mike & Jeezy)',
 'Acid Raindrops',
 'Rings',
 'Lie, Cheat, Steal',
 'Alphabet Aerobics',
 'Untouchable',
 'Wolf Like Me',
 'Drowning',
 'War Ready',
 'Above The Clouds',
 'Bar Breaker']

At this point in the recommender there is a user **decision-point**. The user can designate that we not use their listening history. If this is the case, we return the URIs of the base recommendations. But where's the fun in that!!🎺🎶🎵

Below, you can see a dataframe that includes the audio features: danceability, energy, etc. These features are what we are going to compare against the user's top tracks.

In [17]:
# Creating audio features dataframe of recommended tracks
rec_df = pd.DataFrame.from_dict(spotify.audio_features(uris))
rec_df['track_name'] = rec_track_names
rec_df['uri'] = uris
rec_df.head(3)

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | track_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.667 | 0.747 | 4 | -4.918 | 0 | 0.277 | 0.0484 | 0.0 | 0.108 | 0.281 | 136.931 | audio_features | 7clUOYJb0WsAgFQEgDxSIT | spotify:track:7clUOYJb0WsAgFQEgDxSIT | https://api.spotify.com/v1/tracks/7clUOYJb0WsA... | https://api.spotify.com/v1/audio-analysis/7clU... | 309947 | 4 | Millions |
| 1 | 0.612 | 0.705 | 1 | -5.898 | 1 | 0.352 | 0.427 | 0.0 | 0.658 | 0.533 | 101.876 | audio_features | 203xmWRHAyqwW6AkGkhhVM | spotify:track:203xmWRHAyqwW6AkGkhhVM | https://api.spotify.com/v1/tracks/203xmWRHAyqw... | https://api.spotify.com/v1/audio-analysis/203x... | 341040 | 4 | The Space Program |
| 2 | 0.814 | 0.851 | 7 | -6.725 | 1 | 0.212 | 0.155 | 0.00251 | 0.232 | 0.245 | 110.013 | audio_features | 4AHo9T0MVpNTPybmv0ITfJ | spotify:track:4AHo9T0MVpNTPybmv0ITfJ | https://api.spotify.com/v1/tracks/4AHo9T0MVpNT... | https://api.spotify.com/v1/audio-analysis/4AHo... | 126945 | 4 | Trouble in Paradise |


In [18]:
# Fetching a dictionary of user's top tracks
user_tracks = spotify.current_user_top_tracks(limit=15, time_range='medium_term')
type(user_tracks)

dict

In [19]:
# Saving lists of track names and URIs of user top tracks
user_track_names = [track['name'] for track in user_tracks['items']]
uris = [track['uri'] for track in user_tracks['items']]
user_track_names # How embarrassing 😳

['Blockbuster Night, Pt. 1',
 'When I Grow Up',
 'The Dark',
 'No Excuses',
 'Nate',
 'Returns',
 'PAID MY DUES',
 'The Search',
 'Leave Me Alone',
 'Blood // Water',
 'The Visitor',
 'Let Me Go',
 'Venom - Music From The Motion Picture',
 'Only',
 'Talk to Me']

In [20]:
# Creating audio features dataframe of user top_tracks
user_df = pd.DataFrame.from_dict(spotify.audio_features(uris))
user_df['track_name'] = user_track_names
user_df['uri'] = uris
user_df.head(1)

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | track_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.701 | 0.525 | 1 | -7.938 | 0 | 0.382 | 0.0353 | 2.55e-06 | 0.0615 | 0.385 | 75.693 | audio_features | 5jQYkYhoOlBW4vJ2l4TCxl | spotify:track:5jQYkYhoOlBW4vJ2l4TCxl | https://api.spotify.com/v1/tracks/5jQYkYhoOlBW... | https://api.spotify.com/v1/audio-analysis/5jQY... | 152253 | 4 | Blockbuster Night, Pt. 1 |


<a id='artist_reference'></a>
We slice both dataframes down to their first 11 columns, which hold the numeric audio features to be compared.

In [23]:
# Setting new dataframes of only numeric features to be compared
rec_df_numeric = rec_df.loc[:, rec_df.columns[:11]]
user_df_numeric = user_df.loc[:, user_df.columns[:11]]
rec_df_numeric.head(1)

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.667 | 0.747 | 4 | -4.918 | 0 | 0.277 | 0.0484 | 0.0 | 0.108 | 0.281 | 136.931 |


Here is another **decision-point** for the user. At this point a user may determine specifically what they like about the song. If they only like the "beat", we may only include danceability, energy and tempo. If they like the message behind the song, we may only include features such as valence, mode and speechiness. 

Unfortunately, this user choice is not implemented in the app yet due to my lack of front-end web development skills. Coming soon! 
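Until the front end exists, here is a rough sketch of how such a choice could be turned into a list of columns to drop. The preset names and the `columns_to_drop` helper are hypothetical; the feature groupings simply follow the two examples given above.

```python
# The 11 numeric audio features compared in this notebook
AUDIO_FEATURES = ["danceability", "energy", "key", "loudness", "mode",
                  "speechiness", "acousticness", "instrumentalness",
                  "liveness", "valence", "tempo"]

# Hypothetical presets, mirroring the "beat" and "message" examples above
FEATURE_PRESETS = {
    "beat": ["danceability", "energy", "tempo"],
    "message": ["valence", "mode", "speechiness"],
}


def columns_to_drop(preset):
    """Return every audio feature that is NOT part of the chosen preset."""
    keep = set(FEATURE_PRESETS[preset])
    return [name for name in AUDIO_FEATURES if name not in keep]


print(columns_to_drop("beat"))
```

The resulting list is in the shape expected by the `drop_cols` parameter of the `recommend_songs` function defined later in this notebook.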

### Now we compare the two dataframes

In [25]:
# Comparing the two dataframes using pairwise_distances and setting metric to cosine
comps = pairwise_distances(rec_df_numeric, user_df_numeric, metric='cosine')

# Creating dataframe of these distances and setting the columns to the user_tracks and index to recommendation tracks
comps_df = pd.DataFrame(comps, columns=user_track_names, index=rec_track_names)

# pairwise_distances calculates cosine distance, but we want similarity.
comps_df = 1 - comps_df

# Dropping any duplicate tracks
comps_df.drop_duplicates(inplace=True)

comps_df.head(3)

| | Blockbuster Night, Pt. 1 | When I Grow Up | The Dark | No Excuses | Nate | Returns | PAID MY DUES | The Search | Leave Me Alone | Blood // Water | The Visitor | Let Me Go | Venom - Music From The Motion Picture | Only | Talk to Me |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Millions | 0.99750199236698 | 0.99988936622154 | 0.99928910934323 | 0.99969668028341 | 0.99490343489182 | 0.99898916838312 | 0.99983087569199 | 0.99986462571067 | 0.99963036013202 | 0.9999483253942 | 0.99777219905774 | 0.99954564228239 | 0.99953972564343 | 0.99935245754999 | 0.99947921314963 |
| The Space Program | 0.99882923579216 | 0.99971043624963 | 0.99948685227837 | 0.99956354001272 | 0.99281667873934 | 0.99834751832921 | 0.99953952809482 | 0.99979926361571 | 0.99935042684243 | 0.99927006893044 | 0.99898588436804 | 0.99906472091026 | 0.99971712573011 | 0.99813693400984 | 0.99895211104388 |
| Trouble in Paradise | 0.99774200287826 | 0.99866710443145 | 0.99942652038497 | 0.99786476290929 | 0.99770140663709 | 0.99991324570446 | 0.99823742792039 | 0.99867898557355 | 0.9976224357366 | 0.99934551862418 | 0.99815421778617 | 0.9998757155409 | 0.99769662672427 | 0.99961334431696 | 0.99992206406963 |


Now we have a cosine similarity score for each pair of base recommendation and user top track. This model assumes that you like the songs you listen to the most. 

Next, for each user track, we will find the highest similarity score and add that to our list of recommendations.

In [26]:
song_list = []

for user_track in comps_df.columns:
    max_cosine = comps_df[user_track].max()
    song_name = comps_df[comps_df[user_track] == max_cosine].index[0]
    song_uri = list(rec_df.loc[rec_df['track_name'] == song_name, 'uri'])[0]
    #print(f'{user_track}------ closest to ---------> {song_name}')
    song_list.append((song_name, song_uri))

In [27]:
# This adds unique URIs. Drops songs if they are duplicates.

uri_list= []

[uri_list.append(track[1]) for track in song_list if track[1] not in uri_list]

[None, None, None, None, None, None, None]
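The two cells above boil down to "pick the best-scoring candidate for each user track, then keep each pick once, in order". Here is a toy sketch of that logic in plain Python. The track labels and scores are made up for illustration, and `dict.fromkeys` is swapped in for the list comprehension because it de-duplicates while preserving order and does not print a list of `None`s.

```python
# Toy similarity scores: scores[user_track][candidate] (made-up numbers)
scores = {
    "User Track A": {"Rings": 0.99, "Acid Raindrops": 0.97},
    "User Track B": {"Rings": 0.95, "Acid Raindrops": 0.98},
    "User Track C": {"Rings": 0.96, "Acid Raindrops": 0.91},
}

# Best candidate for each user track (the key with the highest score)
best_matches = [max(candidates, key=candidates.get) for candidates in scores.values()]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_matches = list(dict.fromkeys(best_matches))

print(best_matches)    # ['Rings', 'Acid Raindrops', 'Rings']
print(unique_matches)  # ['Rings', 'Acid Raindrops']
```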

In [30]:
# List of songs that have the highest scores to my top tracks and their associated URIs
song_list

[('Acid Raindrops', 'spotify:track:4MbV8zrWudQflnbiIzp29t'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG'),
 ('Acid Raindrops', 'spotify:track:4MbV8zrWudQflnbiIzp29t'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG'),
 ('Bar Breaker', 'spotify:track:6LyzqxbVyIUoDtr4dYshIl'),
 ('Connect Four', 'spotify:track:7uer04FLTB7mt2ImHP6XjE'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG'),
 ('Put Jewels on It', 'spotify:track:01zffoSenvp9JnSbl0UgMa'),
 ('Acid Raindrops', 'spotify:track:4MbV8zrWudQflnbiIzp29t'),
 ('Kill Jill (feat. Killer Mike & Jeezy)',
  'spotify:track:6npcJbgDhtWeNT5iOGsjZI'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG'),
 ('Father Sister Berzerker', 'spotify:track:3ROmYjCVAJeoqi5ozYFi6J'),
 ('Connect Four', 'spotify:track:7uer04FLTB7mt2ImHP6XjE')]

At this point, you can copy and paste the URIs into a blank Spotify playlist. My app will create the playlist for the user so that step is eliminated. Please see "Create Playlist" function in app_utilities.ipynb for more information.
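The app's actual "Create Playlist" function is not shown in this notebook, so the following is only a sketch of one piece of that step, under my own assumptions. Spotify accepts at most 100 items per add-to-playlist request, so a long URI list has to be sent in batches. `add_tracks_in_batches` is a hypothetical helper; `client` is assumed to be an authenticated `spotipy.Spotify` instance (or anything with a compatible `playlist_add_items` method).

```python
def add_tracks_in_batches(client, playlist_id, uris, batch_size=100):
    """Add URIs to a playlist batch_size at a time; return the number of requests made."""
    requests_made = 0
    for start in range(0, len(uris), batch_size):
        # Each call sends at most batch_size URIs
        client.playlist_add_items(playlist_id, uris[start:start + batch_size])
        requests_made += 1
    return requests_made
```

With the objects from this notebook, usage might look like `playlist = spotify.user_playlist_create(USERNAME, "Alexaurus Recs")` followed by `add_tracks_in_batches(spotify, playlist["id"], uri_list)`, where the playlist name is just a placeholder.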

Below is code to perform the same calculations as above, but this time it includes a standard scaler. I hypothesized that the scaled data would give better recommendations, but I have found that not to be the case. I may explore this more in the future.

In [33]:
# Scaled data
ss = StandardScaler()
rec_df_scaled = pd.DataFrame(ss.fit_transform(rec_df_numeric.values), columns=rec_df_numeric.columns, index=rec_df_numeric.index)
user_df_scaled = pd.DataFrame(ss.transform(user_df_numeric.values), columns=user_df_numeric.columns, index=user_df_numeric.index)
comps_scaled = pairwise_distances(rec_df_scaled, user_df_scaled, metric='cosine')
comps_df_scaled = pd.DataFrame(comps_scaled, columns=user_track_names, index=rec_track_names)
comps_df_scaled = 1 - comps_df_scaled
comps_df_scaled.drop_duplicates(inplace=True)
song_list_scaled = []

for user_track in comps_df_scaled.columns:
    max_cosine = comps_df_scaled[user_track].max()
    song_name = comps_df_scaled[comps_df_scaled[user_track] == max_cosine].index[0]
    song_uri = list(rec_df.loc[rec_df['track_name'] == song_name, 'uri'])[0]
    #print(f'{user_track}------ closest to ---------> {song_name}')
    song_list_scaled.append((song_name, song_uri))
    
uri_list_scaled = []
[uri_list_scaled.append(track[1]) for track in song_list_scaled if track[1] not in uri_list_scaled]
song_list_scaled

[('Drowning', 'spotify:track:70bx1d7tQbVm768tZKbRhn'),
 ('Acid Raindrops', 'spotify:track:4MbV8zrWudQflnbiIzp29t'),
 ('War Ready', 'spotify:track:2HwqshjCe5Tuvq2s6Fx3K5'),
 ('Drowning', 'spotify:track:70bx1d7tQbVm768tZKbRhn'),
 ('Untouchable', 'spotify:track:4AHZRMJCpscmxygCNtC2Qq'),
 ('Bar Breaker', 'spotify:track:6LyzqxbVyIUoDtr4dYshIl'),
 ('Millions', 'spotify:track:7clUOYJb0WsAgFQEgDxSIT'),
 ('Acid Raindrops', 'spotify:track:4MbV8zrWudQflnbiIzp29t'),
 ('Above The Clouds', 'spotify:track:3ZBSXNYdTZVaBUQI3E2rF6'),
 ('Millions', 'spotify:track:7clUOYJb0WsAgFQEgDxSIT'),
 ('War Ready', 'spotify:track:2HwqshjCe5Tuvq2s6Fx3K5'),
 ('Kill Jill (feat. Killer Mike & Jeezy)',
  'spotify:track:6npcJbgDhtWeNT5iOGsjZI'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG'),
 ('Kill Jill (feat. Killer Mike & Jeezy)',
  'spotify:track:6npcJbgDhtWeNT5iOGsjZI'),
 ('Rings', 'spotify:track:3SVqCyJ7ahjO2kAa4XhGUG')]
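One likely reason the scaled and unscaled lists differ so much is that, without scaling, tempo (tens to hundreds of BPM) dwarfs the features that live between 0 and 1. The sketch below checks this for a single track, using the "Millions" row from the `rec_df` output earlier in this section. It is a back-of-the-envelope illustration, not a full analysis.

```python
# Audio features for "Millions", copied from the rec_df output above
millions = {
    "danceability": 0.667, "energy": 0.747, "key": 4, "loudness": -4.918,
    "mode": 0, "speechiness": 0.277, "acousticness": 0.0484,
    "instrumentalness": 0.0, "liveness": 0.108, "valence": 0.281,
    "tempo": 136.931,
}

# Each feature's contribution to the squared length of the vector
squared = {name: value ** 2 for name, value in millions.items()}

# Fraction of the squared length that comes from tempo alone
tempo_share = squared["tempo"] / sum(squared.values())

print(f"tempo accounts for {tempo_share:.1%} of the squared vector length")
```

For this track tempo contributes well over 99% of the squared length, which may explain why every unscaled similarity in the table above sits above 0.99: the raw vectors all point almost entirely along the tempo axis. Standardizing each column removes that imbalance, although, as noted, it did not give me better recommendations in practice.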

Below is a function that includes all of the above code with additional parameters for the base recommendations.

In [3]:
def recommend_songs(spotify, artists=None, genres=None, tracks=None, limit=100, n_tracks=10, listener_based=True, user_time_range='medium_term', drop_cols=None):
    '''
    This function will recommend songs based on a seed artist, seed genres and/or seed tracks.
    The searched songs will be compared to user top tracks and return only those songs with the 
    highest cosine similarity.
   
    Parameters:
    
    spotify: OAuth user instance
    
    artists: list of Spotify artist URIs
    
    genres: list of seed genres
    
    tracks: list of Spotify track URIs
    
    limit: number of basic Spotify recommendations for comparison
    
    n_tracks: number of tracks to pull from user listening history
    
    listener_based: boolean, if True, function will pull user data and compare to base recommendations. If False, function will return n_tracks of base recommendations
    
    user_time_range: the listening history time range to pull from user listening data
    
    drop_cols: columns to drop from numeric feature comparison. 
    Choose from:    danceability
                    energy
                    key
                    loudness
                    mode
                    speechiness
                    acousticness
                    instrumentalness
                    liveness
                    valence
                    tempo
    
    '''
    # Instantiating history
    history = {'no_listening_history': False, 'insufficient_history':False, 'listener_based':listener_based}

    
    # Fetching Spotify recommendations 
    recs = spotify.recommendations(seed_artists=artists, seed_genres=genres, seed_tracks=tracks, limit=limit)
    
    # Saving lists of track names and URIs of recommended tracks
    rec_track_names = [track['name'] for track in recs['tracks']]
    uris = [track['uri'] for track in recs['tracks']]
    
    # Creating audio features dataframe of recommended tracks
    rec_df = pd.DataFrame.from_dict(spotify.audio_features(uris))
    rec_df['track_name'] = rec_track_names
    rec_df['uri'] = uris
    
    # If not listener based, returning list of URIs for Spotify recommended songs
    if not listener_based:
        return (list(rec_df[:n_tracks]['uri']), history)
    
    else: # If listener based
        time_ranges = ['short_term', 'medium_term', 'long_term']
        time_ranges.remove(user_time_range)

        items = spotify.current_user_top_tracks(limit=n_tracks, time_range=user_time_range)


        user_uris = [item['uri'] for item in items['items']]
        user_track_names = [item['name'] for item in items['items']]


        for time_range in time_ranges:

            if len(user_uris) >= n_tracks:
                break
            items = spotify.current_user_top_tracks(limit=n_tracks, time_range=time_range)

            user_uris += [item['uri'] for item in items['items']]
            extra_names = [item['name'] for item in items['items']]
            user_track_names += extra_names
            history['insufficient_history'] = True


        if len(user_uris) == 0:
            history['no_listening_history'] = True
            return (list(rec_df[:n_tracks]['uri']), history)


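        elif len(user_uris) < n_tracks:
            pass  # Fewer tracks than requested; use what we have

        else:
            # Trimming to n_tracks (a slice must be assigned to take effect)
            user_uris = user_uris[:n_tracks]
            user_track_names = user_track_names[:n_tracks]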

        
        # Creating audio features dataframe of user top_tracks
        user_df = pd.DataFrame.from_dict(spotify.audio_features(user_uris))
        user_df['track_name'] = user_track_names
        user_df['uri'] = user_uris
        
        # Setting new dataframes of only numeric features to be compared
        rec_df_numeric = rec_df.loc[:, rec_df.columns[:11]]
        user_df_numeric = user_df.loc[:, user_df.columns[:11]]
        
        # If drop_cols, dropping appropriate columns from each numeric dataframe
        if drop_cols:
            # Reassigning instead of inplace=True avoids modifying a slice of the original dataframe
            rec_df_numeric = rec_df_numeric.drop(columns=drop_cols)
            user_df_numeric = user_df_numeric.drop(columns=drop_cols)
              
        
        comps = pairwise_distances(rec_df_numeric, user_df_numeric, metric='cosine')
        comps_df = pd.DataFrame(comps, columns=user_track_names, index=rec_track_names)
        comps_df = 1 - comps_df
        comps_df.drop_duplicates(inplace=True)
        
        song_list = []
    
        for user_track in comps_df.columns:
            max_cosine = comps_df[user_track].max()
            song_name = comps_df[comps_df[user_track] == max_cosine].index[0]
            song_uri = list(rec_df.loc[rec_df['track_name'] == song_name, 'uri'])[0]
            #print(f'{user_track}------ closest to ---------> {song_name}')
            song_list.append((song_name, song_uri))

        uri_list = []
        [uri_list.append(track[1]) for track in song_list if track[1] not in uri_list]
    
    
        return uri_list, history

### Let's test `recommend_songs`!

In [7]:
recs, hist = recommend_songs(spotify,
                             tracks=[blockbuster],
                             n_tracks=12,
                             )
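The second return value, `hist`, carries the flags set inside the function. As a sketch of how an app might surface them to the user, here is a hypothetical `describe_history` helper. The wording of the messages is mine; the dictionary keys are the ones `recommend_songs` sets.

```python
def describe_history(history):
    """Turn the history flags from recommend_songs into a short user-facing message."""
    if not history.get("listener_based", True):
        return "Listening history was not used; these are Spotify's base recommendations."
    if history["no_listening_history"]:
        return "No listening history found; these are Spotify's base recommendations."
    if history["insufficient_history"]:
        return "Not enough history in the chosen time range; other time ranges were used too."
    return "Recommendations were matched against your top tracks."


example = {"no_listening_history": False, "insufficient_history": True, "listener_based": True}
print(describe_history(example))
```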

<a id="artist_based"></a>
# Artist Based Recommender
[Back to Top](#home)

**Overview**: 

The goal with this recommender is to return songs from a target artist that closely match your listening taste. 

In [4]:
# Test Artists URIs
logslaught = 'spotify:artist:1I471vwcRhqQl6QonGZlen'
grandson = 'spotify:artist:4ZgQDCtRqZlhLswVS6MHN4'
run_the_jewels = 'spotify:artist:4RnBFZRiMLRyZy0AzzTg2C'
hatebreed = 'spotify:artist:17Mb968quDHpjCkIyq30QV'
jxdn = 'spotify:artist:6Y64EaNqpqcZYTgs4c76gF'
killer_mike = 'spotify:artist:2N4EYkIlG1kv25g6Wv8LGI'
glass_animals = 'spotify:artist:4yvcSjfu4PC0CYQyLy4wSq'

Our function will have the capabilities to handle multiple artists, but for this demonstration we will only use one. Please refer to the function to see how this is implemented.

First we need to ping the Spotify API for the maximum allowed number of albums per request (50) for a given artist. In this case, we use our favorite, [Run the Jewels](https://open.spotify.com/artist/4RnBFZRiMLRyZy0AzzTg2C).

In [30]:
albums = spotify.artist_albums(artist_id=run_the_jewels, limit=50)
artist = spotify.artist(run_the_jewels)['name']

In [31]:
# List of all album URIs
album_uris = []
for album in albums['items']:
    album_uris.append(album['uri'])

# Display the first four URIs
album_uris[:4]

['spotify:album:6cx4GVNs03Pu4ZczRnWiLd',
 'spotify:album:74LenWKjMdGa2qJiI1sxwT',
 'spotify:album:0kx0GhFSzqxYDeUiyyWl8i',
 'spotify:album:0DLkBrivkPhPGeSowPUyde']
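A single request returns at most one page of albums, so an artist with more releases than the limit would be cut short. Spotify's paged responses carry an `items` list and a `next` link, and Spotipy exposes a `next()` method to follow it. The sketch below shows the general pattern; `all_items` is a hypothetical helper, and `client` is assumed to be a `spotipy.Spotify` instance.

```python
def all_items(client, first_page):
    """Collect the items from a paged response and every page that follows it."""
    items = list(first_page["items"])
    page = first_page
    # A page's "next" field is empty (None) on the last page
    while page.get("next"):
        page = client.next(page)
        items.extend(page["items"])
    return items
```

With the objects above, `all_items(spotify, albums)` would return every album rather than only the first 50. I have not needed this for Run the Jewels, so treat it as optional.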

The code snippet below iterates through all of the albums. For each album, it pulls the album's artists (some albums are collaborative and list more than one artist; we will address this later), pulls all of the tracks, grabs their audio features and renders them in a pandas dataframe. From there, we append each album dataframe to a list called `df_list`. 

In [32]:
df_list = []
# Iterate through the album URIs
for uri in album_uris:
    # Fetching album JSON
    album = spotify.album(uri)
    
    # Fetching album name
    album_name = album['name']
    
    # Fetching all artists on album
    album_artist = ', '.join([artist['name'] for artist in album['artists']])
    
    # Fetching all of the track URIs on the album
    tracks = spotify.album_tracks(uri)
    
    # Listing track URIs and names
    track_uris = [track['uri'] for track in tracks['items']]
    track_names = [track['name'] for track in tracks['items']]

    # Fetching audio features from the track URIs and assigning track name, album name and album artist
    audio_features_df = pd.DataFrame.from_dict(spotify.audio_features(track_uris))
    audio_features_df['track_name'] = track_names
    audio_features_df['album_name'] = album_name
    audio_features_df['artist_name'] = album_artist

    # Appending dataframe to list to concatenate
    df_list.append(audio_features_df)

Below we concatenate all of the album dataframes in `df_list`. The resulting dataframe, `artist_df`, includes the audio features for every track on those albums.

In [33]:
# Concatenating dataframes, resetting index and dropping duplicates
artist_df = pd.concat(df_list)
artist_df.reset_index(inplace=True, drop=True)
artist_df.drop_duplicates(inplace=True)

Below we are filtering the dataframe to only include tracks from artists that are listed in the initial artist list. Some albums are collaborative or are a "Best Of" etc. We want to filter out this noise. 

In [36]:
artist_df = artist_df[artist_df['artist_name'].str.contains(artist)]
artist_df.head(3)

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | ... | type | id | uri | track_href | analysis_url | duration_ms | time_signature | track_name | album_name | artist_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.595 | 0.878 | 10 | -5.63 | 0 | 0.423 | 0.172 | 0.0 | 0.303 | 0.746 | ... | audio_features | 1EjzcBTVLV7ATtdsQwyV31 | spotify:track:1EjzcBTVLV7ATtdsQwyV31 | https://api.spotify.com/v1/tracks/1EjzcBTVLV7A... | https://api.spotify.com/v1/audio-analysis/1Ejz... | 146573 | 4 | yankee and the brave (ep. 4) | RTJ4 | Run The Jewels |
| 1 | 0.686 | 0.824 | 11 | -5.413 | 1 | 0.34 | 0.157 | 0.0 | 0.242 | 0.405 | ... | audio_features | 5taqLrLouA4vCjM7ZQpEtW | spotify:track:5taqLrLouA4vCjM7ZQpEtW | https://api.spotify.com/v1/tracks/5taqLrLouA4v... | https://api.spotify.com/v1/audio-analysis/5taq... | 183827 | 4 | ooh la la (feat. Greg Nice & DJ Premier) | RTJ4 | Run The Jewels |
| 2 | 0.584 | 0.722 | 9 | -7.733 | 1 | 0.28 | 0.0183 | 1.41e-05 | 0.143 | 0.685 | ... | audio_features | 2uxudaBcJamtfgvUjSDdkZ | spotify:track:2uxudaBcJamtfgvUjSDdkZ | https://api.spotify.com/v1/tracks/2uxudaBcJamt... | https://api.spotify.com/v1/audio-analysis/2uxu... | 201133 | 4 | out of sight (feat. 2 Chainz) | RTJ4 | Run The Jewels |
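One caveat with this filter: `str.contains` treats its argument as a regular expression by default, so an artist name containing characters such as `$`, `+` or `(` may not match the way you expect. The sketch below shows the effect with Python's `re` module on a toy list of album-artist strings. The list is made up, and `$uicideboy$` is included only as an example of a name with regex metacharacters.

```python
import re

# Names to keep; the second one contains regex metacharacters
artist_names = ["Run The Jewels", "$uicideboy$"]

# Escaping each name makes it match literally; "|" still means "or"
pattern = "|".join(re.escape(name) for name in artist_names)

# Toy album-artist strings, in the comma-joined style used above
album_artists = ["Run The Jewels", "Killer Mike", "El-P, Run The Jewels", "$uicideboy$"]
kept = [name for name in album_artists if re.search(pattern, name)]

print(kept)  # ['Run The Jewels', 'El-P, Run The Jewels', '$uicideboy$']
```

The escaped pattern can be passed to `str.contains` in the same way as the plain name.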


### Now let's get a user's tracks!


In our main function we will parameterize `user_time_range`. The user will be given the opportunity to select the time range of their listening history to pull. These songs will provide the foundation of the recommendations. I have included provisions for the case where the user has insufficient listening history. For now, we will assume the user chose `medium_term` as their time range.

In [37]:
# User selected time range
user_time_range = 'medium_term'
n_tracks = 15

# List of all possible time ranges
time_ranges = ['short_term', 'medium_term', 'long_term']

# We are removing the user selected time range from the list
time_ranges.remove(user_time_range)

# Fetching user data
items = spotify.current_user_top_tracks(limit=n_tracks, time_range=user_time_range)

# Listing the URIs and names of tracks
user_uris = [item['uri'] for item in items['items']]
names = [item['name'] for item in items['items']]

In the case we have insufficient listening history in the user designated time frame, we will loop through other time ranges to see if we can pull more tracks.

In [38]:
# Setting a dictionary called history. This will help us relay any issues we had when pulling the data to the user.
history = {'no_listening_history': False, 'insufficient_history':False}

# Checks to see if we have pulled sufficient data
for time_range in time_ranges: # Looping through additional time ranges other than user selected
    
    # If we have enough data, cancel this loop.
    if len(user_uris) >= n_tracks:
        break
    # If we don't have enough data try another time range
    items = spotify.current_user_top_tracks(limit=n_tracks, time_range=time_range)
    
    # Add the new URIs and track names to our list and update dictionary
    user_uris += [item['uri'] for item in items['items']]
    extra_names = [item['name'] for item in items['items']]
    names += extra_names
    history['insufficient_history'] = True
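One thing to watch for in the loop above: a track that appears in more than one time range will be appended twice. Below is a sketch of a variant that skips tracks it has already seen. It is a variation on the cell above, not the code this notebook runs; `collect_top_uris` is a hypothetical helper, and `fetch` stands in for a call that returns the list of URIs for one time range.

```python
def collect_top_uris(fetch, preferred, n_tracks):
    """Gather up to n_tracks unique URIs, trying the preferred time range first.

    fetch(time_range) must return a list of URIs.
    Returns (uris, used_fallback).
    """
    all_ranges = ("short_term", "medium_term", "long_term")
    order = [preferred] + [r for r in all_ranges if r != preferred]

    uris, used_fallback = [], False
    for position, time_range in enumerate(order):
        if len(uris) >= n_tracks:
            break
        # Any range after the first one is a fallback
        used_fallback = used_fallback or position > 0
        for uri in fetch(time_range):
            if uri not in uris:  # skip tracks already seen in another range
                uris.append(uri)
    return uris[:n_tracks], used_fallback
```

With Spotipy, `fetch` could be something like `lambda r: [t['uri'] for t in spotify.current_user_top_tracks(limit=n_tracks, time_range=r)['items']]`.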

In the case the user has no listening history, we will simply return the top tracks from the user-selected artist and inform the user they have no listening history.

In [39]:
# In the case the user has no listening history
if len(user_uris) == 0:
    uris = []
    # Updating dictionary to inform user
    history['no_listening_history'] = True
    
    # Only one artist in this walkthrough; the full function accepts a list of artist URIs
    for artist_uri in [run_the_jewels]:
        # Adding artist's top tracks to URIs list 
        for track in spotify.artist_top_tracks(artist_uri)['tracks']:
            uris.append(track['uri'])
    print("User does not have any listening history, so check out the current URIs for recommendations!")


# In the case we have more URIs than requested, we limit to n_tracks
else:
    print('User has enough history. Keep going!')
    # A slice must be assigned to take effect
    user_uris = user_uris[:n_tracks]
    names = names[:n_tracks]

User has enough history. Keep going!


Now that we have fetched the user's top tracks, let's get their audio features and create a dataframe, along with the track names, to compare against `artist_df`.

In [40]:
# Pulling audio features
audio_features =  spotify.audio_features(user_uris)

user_df = pd.DataFrame.from_dict(audio_features)

user_df['track_name'] = names

user_df.head(2)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,track_name
0,0.701,0.525,1,-7.938,0,0.382,0.0353,2.55e-06,0.0615,0.385,75.693,audio_features,5jQYkYhoOlBW4vJ2l4TCxl,spotify:track:5jQYkYhoOlBW4vJ2l4TCxl,https://api.spotify.com/v1/tracks/5jQYkYhoOlBW...,https://api.spotify.com/v1/audio-analysis/5jQY...,152253,4,"Blockbuster Night, Pt. 1"
1,0.817,0.814,2,-3.985,1,0.185,0.32,0.0,0.152,0.247,109.994,audio_features,5rLyYxZNzca00ENADO9m54,spotify:track:5rLyYxZNzca00ENADO9m54,https://api.spotify.com/v1/tracks/5rLyYxZNzca0...,https://api.spotify.com/v1/audio-analysis/5rLy...,196800,4,When I Grow Up


### We've got an artist's tracks and a user's tracks. Let's compare!

This comparison is the same as it was in the earlier function. [Refer here for a detailed explanation of what is happening](#artist_reference)

In [41]:
from sklearn.metrics.pairwise import pairwise_distances, cosine_distances, cosine_similarity
pd.set_option("display.precision", 14)

artist_track_names = artist_df['track_name']
user_track_names = user_df['track_name']

artist_df_numeric = artist_df.loc[:, artist_df.columns[:11]]
user_df_numeric = user_df.loc[:, user_df.columns[:11]]

recs = pairwise_distances(artist_df_numeric, user_df_numeric, metric='cosine')
rec_df = pd.DataFrame(recs, columns=user_track_names, index=artist_track_names)    
rec_df = 1 - rec_df
rec_df.drop_duplicates(inplace=True)

song_list = []

for user_track in rec_df.columns:
    max_cosine = rec_df[user_track].max()
    song_name = rec_df[rec_df[user_track] == max_cosine].index[0]
    song_uri = list(artist_df.loc[artist_df['track_name'] == song_name, 'uri'])[0]
    # Uncomment this line to see which songs are highly similar to the recs
    #print(f'{user_track}------ closest to ---------> {song_name}')
    song_list.append((song_name, song_uri))

uri_list = []
[uri_list.append(track[1]) for track in song_list if track[1] not in uri_list]

song_list

[('Blockbuster Night, Pt. 1', 'spotify:track:6zqYITGsQw58qXz31AtQah'),
 ('Pew Pew Pew - Live From SXSW / 2015',
  'spotify:track:5D8iavYbqpalfnXmUXtFf8'),
 ('All My Life', 'spotify:track:4oc6T2mb67AAVxI7b4E12u'),
 ('Get It', 'spotify:track:63i0xlsKuhhHqt9fmpRrMJ'),
 ('yankee and the brave (ep. 4)', 'spotify:track:1EjzcBTVLV7ATtdsQwyV31'),
 ('Hey Kids', 'spotify:track:2ZPHQZkhWNNjKQ32d8aimv'),
 ('Get It', 'spotify:track:63i0xlsKuhhHqt9fmpRrMJ'),
 ('Pew Pew Pew - Bonus Track', 'spotify:track:1UwINkY4CL6d0WXgmgdGtv'),
 ('Get It', 'spotify:track:63i0xlsKuhhHqt9fmpRrMJ'),
 ('Twin Hype Back', 'spotify:track:6oGoJ9RJynv7gLaJB7sffR'),
 ('Run the Jewels', 'spotify:track:3mdo1znA61tI2RVALup1ft'),
 ('Down - Z Dot UK Remix', 'spotify:track:0C5sm79K5QIvJkOgreMpjp'),
 ('walking in the snow', 'spotify:track:2pVvB487ZwqdzTxEvEEors'),
 ('ooh la la (feat. Greg Nice & DJ Premier)',
  'spotify:track:5taqLrLouA4vCjM7ZQpEtW'),
 ('Talk to Me', 'spotify:track:0QCymIumn4YPTlKOuvDsS1')]

### Above is a list of track names and URIs!
### Below is a list of the URIs. You can copy and paste into a blank Spotify playlist if you want!

In [84]:
for uri in uri_list:
    print(uri)

spotify:track:1FhEnu3jv0C8xmmRa5NaDO
spotify:track:6l3A9Ima9k7CJcYW9blLHr
spotify:track:5z4wkMNcAfuP1UepmQqnTS
spotify:track:5fz3Tk9DSeU6bZ9DGOU90s
spotify:track:6AZCWuUUX2sENiv3LQx9Hv
spotify:track:1puA4Pm3IjZXmbNQuQCpcx
spotify:track:7CkNqgLg1bqHx3VDuYKajn
spotify:track:5NZk1lwVltQgTjAUMeyPji
spotify:track:5TAvKTuQJWUKlUB28rIMBV
spotify:track:5XHMYz8T6NMU6HMY6Uuwlv
spotify:track:2DAtzxIPYBS0VufIHQf4Sn
spotify:track:5ZyrhE0HVHpqsPv5tIPJwm
spotify:track:4qqo3We57x7IVNbIDIqi8r
spotify:track:1nxJCRe6QHzqpVG6s2curN
spotify:track:3GDoiGO9UYGpfJ3COskb1r
spotify:track:1KiXrqjasckljFNL1uvCNy
spotify:track:6aBBj5d0cXWvrytgSSpXqp
spotify:track:2leqPZQH0lXdRpQWIVLGlG
spotify:track:779IcCL3iyv7WmeR1KX1Kw


**AWESOME!** We now have some cool songs to listen to by [Run the Jewels](https://open.spotify.com/artist/4RnBFZRiMLRyZy0AzzTg2C)

### Below is all of the above code compiled into a function.

In [46]:
def artist_based_recs(spotify, artists, n_tracks=15, user_time_range='medium_term'):
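    '''
    Combines the artist-based steps above into one function: collects the tracks of the
    given artist(s) and compares their audio features to the user's top tracks.
    
    Parameters:
    
    spotify: OAuth user instance
    
    artists: Spotify artist URI, or list of Spotify artist URIs
    
    n_tracks: number of tracks to pull from user listening history
    
    user_time_range: the listening history time range to pull from user listening data
    '''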
    
    ########### ARTIST TRACKS ###########

    uris = []
    album_uris = []
    df_list = []
    artist_names = []
    
    # Checking if user input artists as a list. 
    if not isinstance(artists, list):
        # Rectifying if user did not input list
        artists = [artists]
    
    # Iterate through artists
    for artist in artists:
        artist_names.append(spotify.artist(artist)['name'])
        # Fetches the albums JSONs
        albums = spotify.artist_albums(artist, limit=25)
        # Iterating through the albums and appending the album URIs
        for album in albums['items']:
            album_uris.append(album['uri'])
    
    # Iterate through the album URIs
    for uri in album_uris:
        # Fetching album JSON
        album = spotify.album(uri)
        # Fetching album name
        album_name = album['name']
        # Fetching all artists on album
        album_artist = ', '.join([artist['name'] for artist in album['artists']])
            
        # Fetching all of the track URIs on the album
        tracks = spotify.album_tracks(uri)
        # Listing track URIs and names
        track_uris = [track['uri'] for track in tracks['items']]
        track_names = [track['name'] for track in tracks['items']]
        
        # Fetching audio features from the track URIs and assigning track name, album name and album artist
        try:
            audio_features_df = pd.DataFrame.from_dict(spotify.audio_features(track_uris))
            audio_features_df['track_name'] = track_names
            audio_features_df['album_name'] = album_name
            audio_features_df['artist_name'] = album_artist
        except Exception:
            # Skipping this album; otherwise a stale or undefined dataframe would be appended below
            continue
        
        # Appending dataframe to list to concatenate
        df_list.append(audio_features_df)
    
    # Concatenating dataframes, resetting index and dropping duplicates
    artist_df = pd.concat(df_list)
    artist_df.reset_index(inplace=True, drop=True)
    # drop_duplicates returns a new dataframe, so the result has to be assigned
    artist_df = artist_df.drop_duplicates()
    
    # This is a string that will be used in our filter that includes only rows that have 1 or more of the artists in the listed artists
    filter_artists = '|'.join(artist_names)
    # Filtering our dataframe using our string
    artist_df = artist_df[artist_df['artist_name'].str.contains(filter_artists)]
    
    ########### USER TRACKS ###########
    
    time_ranges = ['short_term', 'medium_term', 'long_term']
    time_ranges.remove(user_time_range)
    
    items = spotify.current_user_top_tracks(limit=n_tracks, time_range=user_time_range)

    
    user_uris = [item['uri'] for item in items['items']]
    names = [item['name'] for item in items['items']]

    history = {'no_listening_history': False, 'insufficient_history':False}
    
    for time_range in time_ranges:
        
        if len(user_uris) >= n_tracks:
            break
        items = spotify.current_user_top_tracks(limit=n_tracks, time_range=time_range)
        
        user_uris += [item['uri'] for item in items['items']]
        extra_names = [item['name'] for item in items['items']]
        names += extra_names
        history['insufficient_history'] = True
   
    
    if len(user_uris) == 0:
        uris = []
        history['no_listening_history'] = True
        for artist in artists:
            for track in spotify.artist_top_tracks(artist)['tracks']:
                uris.append(track['uri'])
        return (uris, history)
    
    # Trimming to the requested number of tracks (no effect when fewer were found)
    user_uris = user_uris[:n_tracks]
    names = names[:n_tracks]
        
    
    audio_features =  spotify.audio_features(user_uris)

    user_df = pd.DataFrame.from_dict(audio_features)
    
    # Setting track names
    user_df['track_name'] = names
    
    ########### COMPARE TRACKS ###########

    artist_track_names = artist_df['track_name']
    user_track_names = user_df['track_name']
    
    artist_df_numeric = artist_df.loc[:, artist_df.columns[:11]]
    user_df_numeric = user_df.loc[:, user_df.columns[:11]]
    
    recs = pairwise_distances(artist_df_numeric, user_df_numeric, metric='cosine')
    rec_df = pd.DataFrame(recs, columns=user_track_names, index=artist_track_names)    
    rec_df = 1 - rec_df
    rec_df.drop_duplicates(inplace=True)
    
    song_list = []
    
    for user_track in rec_df.columns:
        max_cosine = rec_df[user_track].max()
        song_name = rec_df[rec_df[user_track] == max_cosine].index[0]
        song_uri = list(artist_df.loc[artist_df['track_name'] == song_name, 'uri'])[0]
        #print(f'{user_track}------ closest to ---------> {song_name}')
        song_list.append((song_name, song_uri))
        
    # Removing repeated URIs while keeping first-seen order
    uri_list = list(dict.fromkeys(track[1] for track in song_list))
    
    return uri_list, history
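
The user-track section of `artist_based_recs` falls back to other time ranges when the preferred one has too few tracks. Below is a minimal, offline sketch of that fallback. The names `collect_top_tracks`, `fake_fetch` and `fake_history` are hypothetical, and `fetch` stands in for the call to `current_user_top_tracks`. Unlike the function above, this sketch also skips tracks that were already seen in another time range.

```python
from typing import Callable, Dict, List, Tuple

TIME_RANGES = ('short_term', 'medium_term', 'long_term')


def collect_top_tracks(
    fetch: Callable[[str, int], List[str]],
    preferred: str,
    n_tracks: int,
) -> Tuple[List[str], Dict[str, bool]]:
    """Gather up to n_tracks URIs, starting with the preferred time range."""
    if preferred not in TIME_RANGES:
        raise ValueError(f'unknown time range: {preferred!r}')

    history = {'no_listening_history': False, 'insufficient_history': False}
    # Preferred range first, then the remaining ranges in their usual order
    order = [preferred] + [r for r in TIME_RANGES if r != preferred]

    uris: List[str] = []
    for position, time_range in enumerate(order):
        if len(uris) >= n_tracks:
            break
        if position > 0:
            # We had to look beyond the preferred range
            history['insufficient_history'] = True
        for uri in fetch(time_range, n_tracks):
            if uri not in uris:  # skip tracks already seen in another range
                uris.append(uri)

    history['no_listening_history'] = not uris
    return uris[:n_tracks], history


# Hypothetical listening data standing in for the Spotify API
fake_history = {
    'short_term': ['spotify:track:a', 'spotify:track:b'],
    'medium_term': ['spotify:track:b', 'spotify:track:c'],
    'long_term': ['spotify:track:d', 'spotify:track:e'],
}


def fake_fetch(time_range: str, limit: int) -> List[str]:
    """Return the made-up top tracks for one time range."""
    return fake_history[time_range][:limit]


uris, history = collect_top_tracks(fake_fetch, 'short_term', n_tracks=4)
print(uris)
print(history)
```

Passing the fetch step in as a callable keeps the fallback logic testable without an access token.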


### Let's test `artist_based_recs`!

In [47]:
%%time

# Let's time it to see how long it takes
recs, hist = artist_based_recs(spotify, 
                               artists=glass_animals, 
                               n_tracks=25, 
                               user_time_range='short_term')

CPU times: user 437 ms, sys: 36.7 ms, total: 473 ms
Wall time: 7.93 s


In [48]:
for rec in recs[:5]:
    print(rec)

spotify:track:0rRjGruFonCGOt0S5zAJNQ
spotify:track:3mRBPZbnwsmddN0JH5BNVl
spotify:track:7GHqPV9RCRofSXh68htAkg
spotify:track:37adYGaYaAWTGhBaOzX4Fh
spotify:track:09ml9rVHNtfvr6jJZPrcPX


In [49]:
hist

{'no_listening_history': False, 'insufficient_history': False}

**AWESOME!** We now have some cool songs to listen to by [Glass Animals](https://open.spotify.com/artist/4yvcSjfu4PC0CYQyLy4wSq)

<a id="one_click"></a>
# One Click Recommender
[Back to Top](#home)

I received some feedback that it would be nice to be able to create a playlist with one click. There is no data science here, but here is a fun little function that will do this for you!

In [155]:
def one_click_rec(spotify):
    '''
    Create a basic recommended Spotify playlist based on your listening history.
    If there is no listening history, base the playlist off of the first of Spotify's current featured playlists.
    
    Parameters:
        
        spotify: OAuth user authentication object
        
    Returns:
        
        Tuple of (list of Spotify URI codes of recommended songs, listening_history flag).
        The flag is False when the featured playlist fallback was used.
        
    '''
    time_ranges = ['short_term', 'medium_term', 'long_term']
    listening_history=True
    uris = []
    for time_range in time_ranges:
        uris += [uri['uri'] for uri in spotify.current_user_top_tracks(limit=5, time_range=time_range)['items']]
    
    # In the case of no user listening history, we will seed from the first featured playlist instead
    if not uris:
        playlist_id = spotify.featured_playlists(limit=1)['playlists']['items'][0]['id']
        playlist = spotify.playlist(playlist_id, additional_types=('track', ))
        uris = [uri['track']['uri'] for uri in playlist['tracks']['items'][:5]]
        listening_history=False

    if len(uris) > 5:
        uris = uris[:5]
        
    # Request 20 recommended songs for the URIs designated above
    recs = spotify.recommendations(seed_tracks=uris, limit=20)
    rec_uris = [uri['uri'] for uri in recs['tracks']]
    return (rec_uris, listening_history)
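
The seed selection inside `one_click_rec` can be sketched on its own, without calling Spotify. This is an illustration under assumptions: `pick_seed_uris` and `MAX_SEEDS` are hypothetical names, and the inputs are plain lists of made-up URIs. The recommendations endpoint accepts at most five seeds in total, which is why the list is capped. Unlike the function above, the sketch removes repeated URIs before capping.

```python
from typing import Iterable, List, Tuple

MAX_SEEDS = 5  # the recommendations endpoint accepts at most five seeds in total


def pick_seed_uris(
    top_tracks: Iterable[str],
    fallback_tracks: Iterable[str],
) -> Tuple[List[str], bool]:
    """Return (seed URIs, listening_history flag)."""
    # dict.fromkeys removes repeats while keeping first-seen order
    seeds = list(dict.fromkeys(top_tracks))
    listening_history = bool(seeds)
    if not seeds:
        # No listening history: seed from the fallback playlist instead
        seeds = list(dict.fromkeys(fallback_tracks))
    return seeds[:MAX_SEEDS], listening_history


# Made-up URIs: a user with history, then a user without any
top = ['t1', 't2', 't1', 't3', 't4', 't5', 't6']
seeds, has_history = pick_seed_uris(top, ['f1', 'f2'])
print(seeds, has_history)
print(pick_seed_uris([], ['f1', 'f2', 'f1']))
```

Removing repeats before capping matters because the same track can appear in more than one time range.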


In [156]:
recs, hist = one_click_rec(spotify)

In [29]:
for rec in recs:
    print(rec)

spotify:track:1Os2VQpm9aPGEY7yJXlMIP
spotify:track:3oFsFnNkVP9SU8K8KKVUTm
spotify:track:6Ndu96RYi1fvFBMNCYLIbT
spotify:track:3yxtsBzVlDBzRoBhHS3TpK
spotify:track:23GWbiEWeZ1IteR9ijD3hn
spotify:track:2bEwQoKOQ0iGc0beqAmUhP
spotify:track:2DuWtGyGAhc9GIhLCkerZH
spotify:track:6nebUmkUoc7fITIKPjS47K
spotify:track:1D8vZWYS4MaZzHiGgHT7xA
spotify:track:3r0mb1tzAcCNoA8YRXuIhZ
spotify:track:3JyvSSU0VnlMUsQckyEVfX
spotify:track:3pShTDa5E1bPBkrc1mxxGY
spotify:track:0WaWNS2ePQ79aARfDi6cQ2
spotify:track:2dmPVqFpmhNuggEB3niC9S
spotify:track:7noHzxuNmeIwYftylZfVRT
spotify:track:3Uf1MYX9CNC5RO8IEfG8ME
spotify:track:5ZUHZsPJtq5ymmcdB3g3Vb
spotify:track:0uBNqy6taLbUqTy605rCXV


In [158]:
hist

False

<a id="playlist_based"></a>
# Playlist Recommender
[Back to Top](#home)
### Work in Progress

In [75]:
playlist_id = spotify.featured_playlists(limit=1)['playlists']['items'][0]['id']
playlist = spotify.playlist(playlist_id, additional_types=('track', ))
uris = [uri['track']['uri'] for uri in playlist['tracks']['items'][:5]]
uris

['spotify:track:4R59wt5nnhYo88PIu3cUIt',
 'spotify:track:1MKT2Vf6qowCN48KlH0aCA',
 'spotify:track:1fYouLdK3PkOMPnx4CPxTY',
 'spotify:track:0NeJjNlprGfZpeX2LQuN6c',
 'spotify:track:2f6pqUyFcs3NUSoz49H9nw']

In [97]:
spotify.current_playback()

{'device': {'id': '61132c7ae806abd04502be3994fcc2408fb6fd88',
  'is_active': True,
  'is_private_session': False,
  'is_restricted': False,
  'name': 'Alex’s MacBook Pro',
  'type': 'Computer',
  'volume_percent': 100},
 'shuffle_state': False,
 'repeat_state': 'off',
 'timestamp': 1609623733109,
 'context': {'external_urls': {'spotify': 'https://open.spotify.com/playlist/4qOSWuAVcnM4lgsLAzg2Pm'},
  'href': 'https://api.spotify.com/v1/playlists/4qOSWuAVcnM4lgsLAzg2Pm',
  'type': 'playlist',
  'uri': 'spotify:playlist:4qOSWuAVcnM4lgsLAzg2Pm'},
 'progress_ms': 8574,
 'item': {'album': {'album_type': 'album',
   'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3Uobr6LgQpBbk6k4QGAb3V'},
     'href': 'https://api.spotify.com/v1/artists/3Uobr6LgQpBbk6k4QGAb3V',
     'id': '3Uobr6LgQpBbk6k4QGAb3V',
     'name': 'I Prevail',
     'type': 'artist',
     'uri': 'spotify:artist:3Uobr6LgQpBbk6k4QGAb3V'}],
   'available_markets': ['AD',
    'AE',
    'AL',
    'AR',
    'A

In [87]:
spotify.search(q="killer+mike", type='artist')

{'artists': {'href': 'https://api.spotify.com/v1/search?query=killer%2Bmike&type=artist&offset=0&limit=10',
  'items': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2N4EYkIlG1kv25g6Wv8LGI'},
    'followers': {'href': None, 'total': 216904},
    'genres': ['alternative hip hop',
     'atl hip hop',
     'conscious hip hop',
     'political hip hop',
     'southern hip hop'],
    'href': 'https://api.spotify.com/v1/artists/2N4EYkIlG1kv25g6Wv8LGI',
    'id': '2N4EYkIlG1kv25g6Wv8LGI',
    'images': [{'height': 640,
      'url': 'https://i.scdn.co/image/9deec28b59f385038d2af72a5d04f5e89a6be0e7',
      'width': 640},
     {'height': 320,
      'url': 'https://i.scdn.co/image/3d2d43da6add82929dcc3abd92992a513da96567',
      'width': 320},
     {'height': 160,
      'url': 'https://i.scdn.co/image/a42cb10deb80a9ffffdbfb6a25a3df338aa7c3a9',
      'width': 160}],
    'name': 'Killer Mike',
    'popularity': 57,
    'type': 'artist',
    'uri': 'spotify:artist:2N4EYkIlG1kv25g6Wv

In [98]:
uri = ['a'] * 4
n_tracks = 15
if len(uri) > 0 and len(uri) < n_tracks:
    print(len(uri))
else:
    print(len(uri))
    print(len(uri[:n_tracks]))
    


4


In [96]:
test = {'artist_name': ['killer mike, grandson, blah', 
                        'killer mike', 
                        'blah', 
                        'no good', 
                        'grandson'
                        ]}
artist_names = ['blah', 'killer mike', 'grandson']

In [75]:
test_df = pd.DataFrame(test)

In [76]:
artist_names = '|'.join(artist_names)
test_df = test_df[test_df['artist_name'].str.contains(artist_names)]
test_df

Unnamed: 0,artist_name
0,"killer mike, grandson, blah"
1,killer mike
2,blah
4,grandson
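
One caveat with this filter: `str.contains` treats its pattern as a regular expression by default, so an artist name containing a regex metacharacter can silently fail to match. The sketch below shows the effect with the standard `re` module and made-up rows; escaping each name with `re.escape` before joining is one possible fix, not something this notebook currently does.

```python
import re

artist_names = ['A$AP Rocky', 'Killer Mike']
rows = ['A$AP Rocky, Skepta', 'Killer Mike', 'no good']

# Naive pattern: '$' is read as an end-of-string anchor, so 'A$AP Rocky' never matches
naive = re.compile('|'.join(artist_names))
# Escaped pattern: every name is matched literally
escaped = re.compile('|'.join(re.escape(name) for name in artist_names))

naive_hits = [row for row in rows if naive.search(row)]
escaped_hits = [row for row in rows if escaped.search(row)]
print(naive_hits)
print(escaped_hits)
```

The same escaped string could be passed to `str.contains` in place of the plain `'|'.join(artist_names)`.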


<a id="depricated_artist_based"></a>
# Deprecated Artist Based Recommender
[Back to Top](#home)

**Note**: This deprecated version is essentially the same as the current recommender in production, but less efficient. It does not include provisions for a lack of user history.

**Overview**: 

The goal with this recommender is to return songs from a target artist that closely match your listening taste. I've broken this recommender into 3 functions:

- `get_artist_tracks`: pulls all of the songs from an artist and returns the audio features
- `get_user_tracks`: pulls a designated number of tracks from a user's listening history and returns their audio features
- `compare_songs`: runs a pairwise comparison of the outputs of the two previous functions and returns a list of URIs

In [35]:
# Test Artists URIs
logslaught = 'spotify:artist:1I471vwcRhqQl6QonGZlen'
grandson = 'spotify:artist:4ZgQDCtRqZlhLswVS6MHN4'
run_the_jewels = 'spotify:artist:4RnBFZRiMLRyZy0AzzTg2C'
hatebreed = 'spotify:artist:17Mb968quDHpjCkIyq30QV'
jxdn = 'spotify:artist:6Y64EaNqpqcZYTgs4c76gF'
killer_mike = 'spotify:artist:2N4EYkIlG1kv25g6Wv8LGI'

First we need to ping the Spotify API for a given artist's albums (the API allows at most 50 per request). In this case the artist is [Killer Mike](https://open.spotify.com/artist/2N4EYkIlG1kv25g6Wv8LGI), since the cell below uses the `killer_mike` URI.

In [36]:
albums = spotify.artist_albums(artist_id=killer_mike, limit=25)
artist = spotify.artist(killer_mike)['name']

**Note**: The album limit in the above cell significantly affects runtime. 50 albums (the maximum allowed) may take up to 15 seconds, so I have set it to 25.
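
A rough way to see why the album limit matters is to count requests. The per-album loop used in this notebook makes two calls per artist (`artist` and `artist_albums`) and three per album (`album`, `album_tracks` and `audio_features`). The sketch below does that arithmetic, assuming every artist returns the full number of albums. It also shows a hypothetical `chunked` helper that could batch track URIs, assuming the API's cap of 100 IDs per audio-features request; the notebook does not batch this way.

```python
from typing import Iterator, List, Sequence


def estimate_requests(n_artists: int, n_albums: int) -> int:
    """Requests made by the per-album loop: 2 per artist plus 3 per album."""
    per_artist = 2   # artist() and artist_albums()
    per_album = 3    # album(), album_tracks() and audio_features()
    return n_artists * (per_artist + per_album * n_albums)


def chunked(items: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


print(estimate_requests(1, 25))
print(estimate_requests(1, 50))

# Made-up URIs, only to show the batch sizes
track_uris = [f'spotify:track:{i}' for i in range(230)]
print([len(batch) for batch in chunked(track_uris)])
```

With batching, the audio-features step would need one request per 100 tracks rather than one per album.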

In [37]:
# List of all album URIs
album_uris = []
for album in albums['items']:
    album_uris.append(album['uri'])

# Display the first four URIs
album_uris[:4]

['spotify:album:6FtqZFgeAT84TJDpN2qbGk',
 'spotify:album:1gvkFTPF1rKpNnbfzitx5T',
 'spotify:album:5EAhUoAz1G3WTvIfGZvmrh',
 'spotify:album:0I2VFkosuGRfni7LnmlxYB']

The code snippet below iterates through all of the albums. For each album it pulls the album's artists (some albums are collaborative and list more than one), pulls all of the tracks, fetches their audio features and renders them in a pandas dataframe. From there, we append each album dataframe to a list called `df_list`.

In [38]:
df_list = []
# Iterate through the album URIs
for uri in album_uris:
    # Fetching album JSON
    album = spotify.album(uri)
    # Fetching album name
    album_name = album['name']
    # Fetching all artists on album
    album_artist = ', '.join([artist['name'] for artist in album['artists']])
 
    # Fetching all of the track URIs on the album
    tracks = spotify.album_tracks(uri)
    # Listing track URIs and names
    track_uris = [track['uri'] for track in tracks['items']]
    track_names = [track['name'] for track in tracks['items']]

    # Fetching audio features from the track URIs and assigning track name, album name and album artist
    audio_features_df = pd.DataFrame.from_dict(spotify.audio_features(track_uris))
    audio_features_df['track_name'] = track_names
    audio_features_df['album_name'] = album_name
    audio_features_df['artist_name'] = album_artist

    # Appending dataframe to list to concatenate
    df_list.append(audio_features_df)

Below we concatenate all of the album dataframes in `df_list`. The result is a dataframe named `df_artist` that holds the audio features for every one of the artist's tracks.

In [39]:
# Concatenating dataframes, resetting index and dropping duplicates
df_artist = pd.concat(df_list)
df_artist.reset_index(inplace=True, drop=True)
df_artist = df_artist.drop_duplicates()
df_artist.head(3)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,...,type,id,uri,track_href,analysis_url,duration_ms,time_signature,track_name,album_name,artist_name
0,0.499,0.746,7,-4.755,1,0.242,0.00725,1.58e-05,0.354,0.331,...,audio_features,4qqo3We57x7IVNbIDIqi8r,spotify:track:4qqo3We57x7IVNbIDIqi8r,https://api.spotify.com/v1/tracks/4qqo3We57x7I...,https://api.spotify.com/v1/audio-analysis/4qqo...,150720,4,Intro,Killer Mike Chronicles,Killer Mike
1,0.626,0.707,1,-4.195,1,0.151,0.00115,0.0,0.414,0.36,...,audio_features,1zQ6gVnBgAL4mxonTyeW3h,spotify:track:1zQ6gVnBgAL4mxonTyeW3h,https://api.spotify.com/v1/tracks/1zQ6gVnBgAL4...,https://api.spotify.com/v1/audio-analysis/1zQ6...,206960,4,10 G's,Killer Mike Chronicles,Killer Mike
2,0.671,0.725,2,-4.294,0,0.076,0.234,0.0,0.248,0.711,...,audio_features,1c009df7ymU5ZAXViGNgdh,spotify:track:1c009df7ymU5ZAXViGNgdh,https://api.spotify.com/v1/tracks/1c009df7ymU5...,https://api.spotify.com/v1/audio-analysis/1c00...,204360,4,Can You Hear Me,Killer Mike Chronicles,Killer Mike


### This is all of the code above compiled in a function

In [86]:
def get_artist_tracks(sp, artists, n_albums=50):
    '''
    Takes in the Spotify URI of one or more artists and returns a pandas dataframe of Spotify's proprietary audio features.
    
    Parameters:
    sp: Spotipy user authentication object. 
    artists: a single artist URI or a list of artist URIs
    n_albums: number of albums to fetch per artist. 1-50 acceptable range
    
    Returns:
    Dataframe with one row per track, including the track, album and artist names
    '''
    # Empty list to hold the track uris, album uris and dataframes to concatenate
    uris = []
    album_uris = []
    df_list = []
    artist_names = []
    
    # Checking if user input a single artist URI instead of a list
    if isinstance(artists, str):
        # Wrapping the single URI in a list
        artists = [artists]
    
    # Iterate through artists
    for artist in artists:
        # Using the client passed in as `sp` rather than the global `spotify`
        artist_names.append(sp.artist(artist)['name'])
        # Fetches the albums JSONs
        albums = sp.artist_albums(artist, limit=n_albums)
        # Iterating through the albums and appending the album URIs
        for album in albums['items']:
            album_uris.append(album['uri'])
    
    # Iterate through the album URIs
    for uri in album_uris:
        # Fetching album JSON
        album = sp.album(uri)
        # Fetching album name
        album_name = album['name']
        # Fetching all artists on album
        album_artist = ', '.join([artist['name'] for artist in album['artists']])
        
        # Fetching all of the track URIs on the album
        tracks = sp.album_tracks(uri)
        # Listing track URIs and names
        track_uris = [track['uri'] for track in tracks['items']]
        track_names = [track['name'] for track in tracks['items']]
        
        # Fetching audio features from the track URIs and assigning track name, album name and album artist
        audio_features_df = pd.DataFrame.from_dict(sp.audio_features(track_uris))
        audio_features_df['track_name'] = track_names
        audio_features_df['album_name'] = album_name
        audio_features_df['artist_name'] = album_artist
        
        # Appending dataframe to list to concatenate
        df_list.append(audio_features_df)
    
    # Concatenating dataframes, resetting index and dropping duplicates
    df = pd.concat(df_list)
    df.reset_index(inplace=True, drop=True)
    # drop_duplicates returns a new dataframe, so the result has to be assigned
    df = df.drop_duplicates()
    
    # This is a string that will be used in our filter that includes only rows that have 1 or more of the artists in the listed artists
    filter_artists = '|'.join(artist_names)
    # Filtering our dataframe using our string
    df = df[df['artist_name'].str.contains(filter_artists)]
    return df

### Let's test `get_artist_tracks`!

In [87]:
artist_df = get_artist_tracks(spotify, killer_mike)
artist_df.head(3)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,...,type,id,uri,track_href,analysis_url,duration_ms,time_signature,track_name,album_name,artist_name
0,0.499,0.746,7,-4.755,1,0.242,0.00725,1.58e-05,0.354,0.331,...,audio_features,4qqo3We57x7IVNbIDIqi8r,spotify:track:4qqo3We57x7IVNbIDIqi8r,https://api.spotify.com/v1/tracks/4qqo3We57x7I...,https://api.spotify.com/v1/audio-analysis/4qqo...,150720,4,Intro,Killer Mike Chronicles,Killer Mike
1,0.626,0.707,1,-4.195,1,0.151,0.00115,0.0,0.414,0.36,...,audio_features,1zQ6gVnBgAL4mxonTyeW3h,spotify:track:1zQ6gVnBgAL4mxonTyeW3h,https://api.spotify.com/v1/tracks/1zQ6gVnBgAL4...,https://api.spotify.com/v1/audio-analysis/1zQ6...,206960,4,10 G's,Killer Mike Chronicles,Killer Mike
2,0.671,0.725,2,-4.294,0,0.076,0.234,0.0,0.248,0.711,...,audio_features,1c009df7ymU5ZAXViGNgdh,spotify:track:1c009df7ymU5ZAXViGNgdh,https://api.spotify.com/v1/tracks/1c009df7ymU5...,https://api.spotify.com/v1/audio-analysis/1c00...,204360,4,Can You Hear Me,Killer Mike Chronicles,Killer Mike


### Now let's get a user's tracks!
We are going to go through the same process, but with a few fewer steps. Looking forward, we are going to parameterize `limit` in the `get_user_tracks` function so the user can choose how many songs to pull and compare to the artist's tracks, but for now we will leave it at 20 tracks. We will do the same for `time_range`; for now we will leave it at `'medium_term'`.
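
Once `limit` and `time_range` become parameters, it may help to check them before calling the API. The sketch below is one way to do that. `check_top_track_params` is a hypothetical helper, and the bounds of 1 to 50 tracks per request are an assumption taken from the Web API's limits for top tracks.

```python
VALID_TIME_RANGES = ('short_term', 'medium_term', 'long_term')
MAX_LIMIT = 50  # assumption: upper bound for a single top-tracks request


def check_top_track_params(limit: int, time_range: str) -> None:
    """Raise ValueError when a parameter falls outside the assumed bounds."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f'limit must be between 1 and {MAX_LIMIT}, got {limit}')
    if time_range not in VALID_TIME_RANGES:
        raise ValueError(
            f'time_range must be one of {VALID_TIME_RANGES}, got {time_range!r}'
        )


check_top_track_params(20, 'medium_term')  # valid, so nothing happens

try:
    check_top_track_params(20, 'last_week')
except ValueError as error:
    print(error)
```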

In [42]:
# Fetching user's top tracks
user_tracks = spotify.current_user_top_tracks(limit=20, time_range='medium_term')

In [43]:
# Fetching audio features from user's top tracks
audio_features =  spotify.audio_features([item['uri'] for item in user_tracks['items']])

Now that we have fetched the user's top tracks and their audio features, we need to create a dataframe that also holds the track names so we can compare them.

In [44]:
# Listing names of tracks
names = [item['name'] for item in user_tracks['items']]

# Creating dataframe of audio features
df_user = pd.DataFrame.from_dict(audio_features)

# Setting track names
df_user['track_name'] = names

df_user.head(2)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,track_name
0,0.701,0.525,1,-7.938,0,0.382,0.0353,2.55e-06,0.0615,0.385,75.693,audio_features,5jQYkYhoOlBW4vJ2l4TCxl,spotify:track:5jQYkYhoOlBW4vJ2l4TCxl,https://api.spotify.com/v1/tracks/5jQYkYhoOlBW...,https://api.spotify.com/v1/audio-analysis/5jQY...,152253,4,"Blockbuster Night, Pt. 1"
1,0.817,0.814,2,-3.985,1,0.185,0.32,0.0,0.152,0.247,109.994,audio_features,5rLyYxZNzca00ENADO9m54,spotify:track:5rLyYxZNzca00ENADO9m54,https://api.spotify.com/v1/tracks/5rLyYxZNzca0...,https://api.spotify.com/v1/audio-analysis/5rLy...,196800,4,When I Grow Up


### This is all of the above code in a function

In [45]:
def get_user_tracks(sp, limit=10, time_range='medium_term'):
    '''
    Fetches a Spotify user's top tracks and returns a dataframe of 
    Spotify audio features
    
    Parameters:
    sp: Spotipy user authentication object  
    limit: number of user tracks to fetch
    time_range: choose from 'short_term', 'medium_term' and 'long_term'
    '''
    
    # Fetching user's top tracks
    user_tracks = sp.current_user_top_tracks(limit=limit, time_range=time_range)
    # Fetching audio features from user's top tracks
    audio_features =  sp.audio_features([item['uri'] for item in user_tracks['items']])
    # Listing names of tracks
    names = [item['name'] for item in user_tracks['items']]
    # Creating dataframe of audio features
    df = pd.DataFrame.from_dict(audio_features)
    # Setting track names
    df['track_name'] = names
    return df

### Let's test `get_user_tracks`!

In [46]:
df_user = get_user_tracks(spotify, limit=20, time_range='long_term')
df_user.head(2)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,track_name
0,0.701,0.525,1,-7.938,0,0.382,0.0353,2.55e-06,0.0615,0.385,75.693,audio_features,5jQYkYhoOlBW4vJ2l4TCxl,spotify:track:5jQYkYhoOlBW4vJ2l4TCxl,https://api.spotify.com/v1/tracks/5jQYkYhoOlBW...,https://api.spotify.com/v1/audio-analysis/5jQY...,152253,4,"Blockbuster Night, Pt. 1"
1,0.817,0.814,2,-3.985,1,0.185,0.32,0.0,0.152,0.247,109.994,audio_features,5rLyYxZNzca00ENADO9m54,spotify:track:5rLyYxZNzca00ENADO9m54,https://api.spotify.com/v1/tracks/5rLyYxZNzca0...,https://api.spotify.com/v1/audio-analysis/5rLy...,196800,4,When I Grow Up


## We've got an artist's tracks and a user's tracks. Let's compare!

This comparison is the same as it was in the earlier function. [Refer here for a detailed explanation of what is happening](#artist_reference)

In [47]:
from sklearn.metrics.pairwise import pairwise_distances, cosine_distances, cosine_similarity
pd.set_option("display.precision", 14)

artist_track_names = df_artist['track_name']
user_track_names = df_user['track_name']

df_numeric = df_artist.loc[:, df_artist.columns[:11]]
user_df_numeric = df_user.loc[:, df_user.columns[:11]]

recs = pairwise_distances(df_numeric, user_df_numeric, metric='cosine')
rec_df = pd.DataFrame(recs, columns=user_track_names, index=artist_track_names)    
rec_df = 1 - rec_df
rec_df.drop_duplicates(inplace=True)

song_list = []

for user_track in rec_df.columns:
    max_cosine = rec_df[user_track].max()
    song_name = rec_df[rec_df[user_track] == max_cosine].index[0]
    song_uri = list(df_artist.loc[df_artist['track_name'] == song_name, 'uri'])[0]
    #print(f'{user_track}------ closest to ---------> {song_name}')
    song_list.append((song_name, song_uri))

# Removing repeated URIs while keeping first-seen order
uri_list = list(dict.fromkeys(track[1] for track in song_list))

song_list

[('Monster (Intro)', 'spotify:track:1FhEnu3jv0C8xmmRa5NaDO'),
 ('Rap Is Dead', 'spotify:track:6l3A9Ima9k7CJcYW9blLHr'),
 ('ArrowThroughSkinOutOfBlueSky', 'spotify:track:5z4wkMNcAfuP1UepmQqnTS'),
 ("Stuntin' Like A…", 'spotify:track:5fz3Tk9DSeU6bZ9DGOU90s'),
 ('All 4 U', 'spotify:track:6AZCWuUUX2sENiv3LQx9Hv'),
 ('Greatness', 'spotify:track:6MmtmsmhXj6feevl2Kch2U'),
 ('In My City', 'spotify:track:7CkNqgLg1bqHx3VDuYKajn'),
 ('Ready Set Go Remix', 'spotify:track:5NZk1lwVltQgTjAUMeyPji'),
 ('Swimming', 'spotify:track:5TAvKTuQJWUKlUB28rIMBV'),
 ('My Money', 'spotify:track:5XHMYz8T6NMU6HMY6Uuwlv'),
 ('So Fly', 'spotify:track:2DAtzxIPYBS0VufIHQf4Sn'),
 ('Players Lullaby (feat. Twista & Rock D The Legend)',
  'spotify:track:5ZyrhE0HVHpqsPv5tIPJwm'),
 ('Intro', 'spotify:track:4qqo3We57x7IVNbIDIqi8r'),
 ("I'm the Shit!!!", 'spotify:track:1nxJCRe6QHzqpVG6s2curN'),
 ('Players Lullaby', 'spotify:track:3GDoiGO9UYGpfJ3COskb1r'),
 ("I'mma Fool Wit It", 'spotify:track:6aBBj5d0cXWvrytgSSpXqp'),
 ('Willi

In [48]:
for uri in uri_list:
    print(uri)

spotify:track:1FhEnu3jv0C8xmmRa5NaDO
spotify:track:6l3A9Ima9k7CJcYW9blLHr
spotify:track:5z4wkMNcAfuP1UepmQqnTS
spotify:track:5fz3Tk9DSeU6bZ9DGOU90s
spotify:track:6AZCWuUUX2sENiv3LQx9Hv
spotify:track:6MmtmsmhXj6feevl2Kch2U
spotify:track:7CkNqgLg1bqHx3VDuYKajn
spotify:track:5NZk1lwVltQgTjAUMeyPji
spotify:track:5TAvKTuQJWUKlUB28rIMBV
spotify:track:5XHMYz8T6NMU6HMY6Uuwlv
spotify:track:2DAtzxIPYBS0VufIHQf4Sn
spotify:track:5ZyrhE0HVHpqsPv5tIPJwm
spotify:track:4qqo3We57x7IVNbIDIqi8r
spotify:track:1nxJCRe6QHzqpVG6s2curN
spotify:track:3GDoiGO9UYGpfJ3COskb1r
spotify:track:6aBBj5d0cXWvrytgSSpXqp
spotify:track:1KiXrqjasckljFNL1uvCNy
spotify:track:2leqPZQH0lXdRpQWIVLGlG
spotify:track:779IcCL3iyv7WmeR1KX1Kw


### This is all of the code above compiled in a function

In [49]:
def compare_songs(artist_df, user_df):
    '''
    Takes in a dataframe of artist tracks and audio features, and a dataframe of user top tracks. 
    Returns tracks in artist_df that are most aligned with user tracks.
    '''
    
    band_track_names = artist_df['track_name']
    user_track_names = user_df['track_name']
    
    df_numeric = artist_df.loc[:, artist_df.columns[:11]]
    user_df_numeric = user_df.loc[:, user_df.columns[:11]]
    
    recs = pairwise_distances(df_numeric, user_df_numeric, metric='cosine')
    rec_df = pd.DataFrame(recs, columns=user_track_names, index=band_track_names)    
    rec_df = 1 - rec_df
    rec_df.drop_duplicates(inplace=True)
    
    song_list = []
    
    for user_track in rec_df.columns:
        max_cosine = rec_df[user_track].max()
        song_name = rec_df[rec_df[user_track] == max_cosine].index[0]
        song_uri = list(artist_df.loc[artist_df['track_name'] == song_name, 'uri'])[0]
        #print(f'{user_track}------ closest to ---------> {song_name}')
        song_list.append((song_name, song_uri))
        
    # Removing repeated URIs while keeping first-seen order
    uri_list = list(dict.fromkeys(track[1] for track in song_list))
    
    return uri_list
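
To make the matching step concrete, here is a small pure-Python sketch of the same idea: for every user track, pick the artist track with the highest cosine similarity, then drop repeats. The feature vectors and track names are made up, and `closest_artist_tracks` is a hypothetical helper, not the notebook's `compare_songs`.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_artist_tracks(
    artist_tracks: Dict[str, Sequence[float]],
    user_tracks: Dict[str, Sequence[float]],
) -> List[str]:
    """For each user track, keep the most similar artist track (no repeats)."""
    picks: List[str] = []
    for user_vector in user_tracks.values():
        best = max(
            artist_tracks,
            key=lambda name: cosine_similarity(artist_tracks[name], user_vector),
        )
        if best not in picks:
            picks.append(best)
    return picks


# Made-up vectors, e.g. (danceability, energy, valence)
artist_tracks = {
    'Song A': [0.9, 0.2, 0.1],
    'Song B': [0.1, 0.9, 0.8],
    'Song C': [0.5, 0.5, 0.5],
}
user_tracks = {
    'Fav 1': [0.8, 0.3, 0.2],
    'Fav 2': [0.2, 0.8, 0.9],
    'Fav 3': [0.85, 0.25, 0.15],
}

picks = closest_artist_tracks(artist_tracks, user_tracks)
print(picks)
```

One thing to keep in mind: the notebook passes the first eleven columns (danceability through tempo) unscaled, so features with larger magnitudes, such as tempo and loudness, carry more weight in the cosine. Scaling the features first is an option, though it would change the recommendations.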

### Let's test out all three functions!
- `get_artist_tracks`
- `get_user_tracks`
- `compare_songs`

In [50]:
glass_animals = 'spotify:artist:4yvcSjfu4PC0CYQyLy4wSq'

In [51]:
artist_tracks = get_artist_tracks(spotify, artists = glass_animals )
user_tracks = get_user_tracks(spotify, limit=12, time_range='long_term')
song_list = compare_songs(artist_tracks, user_tracks)
song_list

['spotify:track:3mRBPZbnwsmddN0JH5BNVl',
 'spotify:track:29hdc5G2JCbWMfuESdrlkK',
 'spotify:track:3ZMXLTcves58BnHyLytiTp',
 'spotify:track:2zZ9AeS6VQzXQsfzs1HW36',
 'spotify:track:2jwETvcqjt93Csfrz84qtF',
 'spotify:track:6XADhsXM5S9Ck89bbMMTGw',
 'spotify:track:5MwthEwpoFUVayAlhjoBs2',
 'spotify:track:2Svizy2uKSW0YyuffAX78U',
 'spotify:track:7t3MxxnVUOEkmdvEWxXSaH',
 'spotify:track:05rgpw5CeSege0heFDGSqA',
 'spotify:track:65PsOtxOpgaGvjztMGzoj2',
 'spotify:track:67QuYzEv1pkYrP6QyugCDZ']

**AWESOME!** We now have some cool songs to listen to by [Glass Animals](https://open.spotify.com/artist/4yvcSjfu4PC0CYQyLy4wSq)

In [5]:
def artist_based_recs(spotify, artists, n_tracks=15, user_time_range='medium_term'):
    '''
    Combo function
    '''
    
    ########### ARTIST TRACKS ###########

    uris = []
    album_uris = []
    df_list = []
    artist_names = []
    
    # Checking if user input artists as a list. 
    if type(artists) != list:
        # Rectifying if user did not input list
        artists = [artists]
    
    # Iterate through artists
    for artist in artists:
        artist_names.append(spotify.artist(artist)['name'])
        # Fetches the albums JSONs
        albums = spotify.artist_albums(artist, limit=25)
        # Iterating through the albums and appending the artist URIs
        for album in albums['items']:
            album_uris.append(album['uri'])
    
    # Iterate through the album URIs
    for uri in album_uris:
        # Fetching album JSON
        album = spotify.album(uri)
        # Fetching album name
        album_name = album['name']
        # Fetching all artists on album
        album_artist = ', '.join([artist['name'] for artist in album['artists']])
            
        # Fetching all of the track URIs on the album
        tracks = spotify.album_tracks(uri)
        # Listing track URIs and names
        track_uris = [track['uri'] for track in tracks['items']]
        track_names = [track['name'] for track in tracks['items']]
        
        # Fetching audio features from the track URIs and assigning track name, album name and album artist
        audio_features_df = pd.DataFrame.from_dict(spotify.audio_features(track_uris))
        audio_features_df['track_name'] = track_names
        audio_features_df['album_name'] = album_name
        audio_features_df['artist_name'] = album_artist
        
        # Appending dataframe to list to concatenate
        df_list.append(audio_features_df)
    
    # Concatenating dataframes, resetting index and dropping duplicates
    artist_df = pd.concat(df_list)
    artist_df.reset_index(inplace=True, drop=True)
    # drop_duplicates returns a new dataframe, so the result has to be assigned
    artist_df = artist_df.drop_duplicates()
    
    # This is a string that will be used in our filter that includes only rows that have 1 or more of the artists in the listed artists
    filter_artists = '|'.join(artist_names)
    # Filtering our dataframe using our string
    artist_df = artist_df[artist_df['artist_name'].str.contains(filter_artists)]
    
    ########### USER TRACKS ###########
    
    time_ranges = ['short_term', 'medium_term', 'long_term']
    time_ranges.remove(user_time_range)
    
    items = spotify.current_user_top_tracks(limit=n_tracks, time_range=user_time_range)

    
    user_uris = [item['uri'] for item in items['items']]
    names = [item['name'] for item in items['items']]

    history = {'no_listening_history': False, 'insufficient_history':False}
    
    for time_range in time_ranges:
        
        if len(user_uris) >= n_tracks:
            break
        items = spotify.current_user_top_tracks(limit=n_tracks, time_range=time_range)
        
        user_uris += [item['uri'] for item in items['items']]
        extra_names = [item['name'] for item in items['items']]
        names += extra_names
        history['insufficient_history'] = True
   
    
    if len(user_uris) == 0:
        uris = []
        history['no_listening_history'] = True
        for artist in artists:
            for track in spotify.artist_top_tracks(artist)['tracks']:
                uris.append(track['uri'])
        return (uris, history)
    
    # Trimming to the requested number of tracks (no effect when fewer were found)
    user_uris = user_uris[:n_tracks]
    names = names[:n_tracks]
        
    
    audio_features =  spotify.audio_features(user_uris)

    user_df = pd.DataFrame.from_dict(audio_features)
    
    # Setting track names
    user_df['track_name'] = names
    
    ########### COMPARE TRACKS ###########

    artist_track_names = artist_df['track_name']
    user_track_names = user_df['track_name']
    
    artist_df_numeric = artist_df.loc[:, artist_df.columns[:11]]
    user_df_numeric = user_df.loc[:, user_df.columns[:11]]
    
    recs = pairwise_distances(artist_df_numeric, user_df_numeric, metric='cosine')
    rec_df = pd.DataFrame(recs, columns=user_track_names, index=artist_track_names)    
    rec_df = 1 - rec_df
    rec_df.drop_duplicates(inplace=True)
    
    song_list = []
    
    for user_track in rec_df.columns:
        max_cosine = rec_df[user_track].max()
        song_name = rec_df[rec_df[user_track] == max_cosine].index[0]
        song_uri = list(artist_df.loc[artist_df['track_name'] == song_name, 'uri'])[0]
        #print(f'{user_track}------ closest to ---------> {song_name}')
        song_list.append((song_name, song_uri))
        
    # Removing repeated URIs while keeping first-seen order
    uri_list = list(dict.fromkeys(track[1] for track in song_list))
    
    return uri_list, history

In [7]:
artist_based_recs(spotify, glass_animals, n_tracks=45)

AttributeError: 'NoneType' object has no attribute 'keys'
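
The `AttributeError` above is consistent with a `None` entry reaching `pd.DataFrame.from_dict`: the Web API returns a null in place of any track it has no audio features for. That is an assumption about the cause, not something the truncated traceback confirms. Under that assumption, one possible guard is to drop the missing entries, together with their track names, before building the dataframe. `drop_missing_features` is a hypothetical helper and the feature dicts below are made up.

```python
from typing import Dict, List, Optional, Tuple


def drop_missing_features(
    features: List[Optional[Dict[str, float]]],
    names: List[str],
) -> Tuple[List[Dict[str, float]], List[str]]:
    """Keep only the tracks whose audio features are present."""
    kept_features: List[Dict[str, float]] = []
    kept_names: List[str] = []
    for feature, name in zip(features, names):
        if feature is not None:
            kept_features.append(feature)
            kept_names.append(name)
    return kept_features, kept_names


# Made-up response with one missing entry
features = [{'danceability': 0.7}, None, {'danceability': 0.5}]
names = ['Track X', 'Track Y', 'Track Z']

clean_features, clean_names = drop_missing_features(features, names)
print(clean_features)
print(clean_names)
```

Filtering names and features together keeps the two lists aligned, which the later `track_name` assignment relies on.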