# Recommender with Spotify API Integration

## 1: Create Spotify Authorization Token

Due to Spotify API restrictions, you need a Spotify for Developers account linked to your Spotify account to run the code in this notebook. Because you can't log in from the notebook itself, you generate an OAuth token instead; the notebook uses it to authenticate and retrieve information from your Spotify account.

### Steps

1. Go to https://developer.spotify.com/dashboard/ and log in (or sign up if you don't already have an account).  
2. Go to https://developer.spotify.com/console/post-playlists/ and press "Get Token".
3. Select the following checkboxes:  
    a. playlist-modify-public  
    b. playlist-modify-private  
    c. user-read-recently-played  
4. Press "Request Token" and "Agree". Now you should see a field value under **OAuth Token**. Copy this, as we'll need it later.  
5. Next, open Spotify. Open the drop-down next to your name in the top-right corner, and select **Account**.  
6. Copy the **Username** as we'll need it later.  
7. Open a terminal on your system. 
8. Enter the following two commands one at a time in your terminal, replacing the placeholders in `<>` with the values you copied earlier. This lets the notebook read your Spotify credentials from environment variables instead of hard-coding them in its code cells. The commands below are for Windows; on macOS or Linux, use `export NAME=value` instead (for example, `export SPOTIFY_USER_ID=<yourUsername>`).  
**setx SPOTIFY_AUTHORIZATION_TOKEN \<yourOAuthToken\>  
setx SPOTIFY_USER_ID \<yourUsername\>**  

<img src="./terminal.JPG">

Restart your system (or at least the terminal and the notebook server) for the changes to take effect.

**Note: Spotify Access Tokens expire every hour, so you may need to go through the previous steps again if any of the code cells fail.**

In [1]:
#!pip install spotipy
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import json
import os
import requests

# instantiate global variables
auth_token = os.environ.get("SPOTIFY_AUTHORIZATION_TOKEN") 
user_id = os.getenv("SPOTIFY_USER_ID")

# if either of the lines below prints None, please restart your system
print(auth_token)
print(user_id)

BQCsxBFMRAnS3vO6pMgE_71eNtIQCpSAiSasYTe6KWVo34Xe69dMbitJYTyB00SJ-W4yLaNpai_henGiI7OWmZ6T6ogyj8ZkFPZQHk3iAmmR4cVsv5kIiHuXhhFUz3PmrWUuQ5AvTaIgn0FbQGkHvTZTh8IYPjbKSEeuwz1NYNADPLecK6XUJT6pkRcOr1DoBJgnpottGOvr7w
krazyboy2571
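
Printing the token itself leaves a credential in the saved notebook output. The following is a minimal sketch of an alternative check, assuming only the two environment variable names used above. The helper names `mask` and `describe_credentials` and the masking format are hypothetical and not part of the original notebook.

```python
import os

def mask(value, visible=4):
    """Return a masked form of a secret, showing only the first few characters."""
    if value is None:
        return None
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "..." + f"({len(value)} chars)"

def describe_credentials(environ=os.environ):
    """Report whether each variable is set without revealing the full token."""
    token = environ.get("SPOTIFY_AUTHORIZATION_TOKEN")
    user = environ.get("SPOTIFY_USER_ID")
    return {
        "SPOTIFY_AUTHORIZATION_TOKEN": mask(token),  # None means the variable is not set
        "SPOTIFY_USER_ID": user,
    }

print(describe_credentials())
```

A value of `None` for either key signals the same problem as the `print` calls above, but the full token never reaches the cell output.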


In [2]:
def getAPIrequest(auth_token, url):
    """
    Function to place GET requests to the Spotify API.
    """
    response = requests.get(
            url,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {auth_token}"
            }
        )
    return response

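def postAPIrequest(auth_token, url, data):
    """
    Function to place POST requests to the Spotify API.
    data is the request body, already serialized as a JSON string.
    """
    response = requests.post(
            url,
            data=data,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {auth_token}"
            }
        )
    return response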
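
Both helpers return the raw response without checking whether the request succeeded, so an expired token only surfaces later as a `KeyError`. Below is a minimal sketch of a status check, assuming the usual HTTP meanings of 401, 403, and 429. The function names `explain_status` and `checked` are hypothetical, and the hint texts are illustrative rather than Spotify's own error messages.

```python
def explain_status(status_code, retry_after=None):
    """Map an HTTP status code to a short hint for the notebook user."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 401:
        return "token missing, invalid, or expired: request a new OAuth token"
    if status_code == 403:
        return "request refused, for example because the token lacks a required scope"
    if status_code == 429:
        wait = f" after {retry_after} seconds" if retry_after else ""
        return f"rate limited: retry{wait}"
    return f"unexpected status {status_code}"

def checked(response):
    """Return the response unchanged if it succeeded, else raise with a hint."""
    hint = explain_status(response.status_code, response.headers.get("Retry-After"))
    if hint != "ok":
        raise RuntimeError(hint)
    return response
```

With this in place, a call such as `checked(getAPIrequest(auth_token, url))` would fail early with a readable message instead of further down the notebook.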

In [3]:
def getLastPlayedSongs(numOfTracks):
    """
    Function to retrieve the last numOfTracks songs played by the user.
    """
    url = f"https://api.spotify.com/v1/me/player/recently-played?limit={numOfTracks}"
    response = getAPIrequest(auth_token, url)
    response_json = response.json()
    # each item wraps a track object plus playback metadata
    songs = list(response_json["items"])
    return songs

## 2: Retrieve User's Recently Played Songs
The JSON format of the response is specified at https://developer.spotify.com/console/get-recently-played/.
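
To make the response format concrete, here is a small sketch that walks a hand-written payload containing only the fields this notebook reads (`items`, `track`, `name`, `artists`, `album`, `release_date`). The values are made-up placeholders, not real API output, and `summarize` is a hypothetical helper.

```python
# Hand-written stand-in for the API response; real responses carry many more fields.
sample_response = {
    "items": [
        {
            "track": {
                "name": "Example Track",
                "artists": [{"name": "Example Artist"}],
                "album": {"release_date": "2020-01-31"},
            }
        }
    ]
}

def summarize(item):
    """Format one recently-played item as 'name, first artist (year)'."""
    track = item["track"]
    year = track["album"]["release_date"][:4]  # release_date starts with the year
    return f"{track['name']}, {track['artists'][0]['name']} ({year})"

lines = [summarize(item) for item in sample_response["items"]]
print(lines)
```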

In [4]:
num = int(input("How many tracks would you like to visualize? "))
lastPlayed = getLastPlayedSongs(num)
print(f"\nHere are the last {num} tracks you listened to on Spotify:")
for index, track in enumerate(lastPlayed):
    print(f"\n {index+1}: {track['track']['name']}, {track['track']['artists'][0]['name']} ({track['track']['album']['release_date'][:4]})")

How many tracks would you like to visualize?  20



Here are the last 20 tracks you listened to on Spotify:

 1: Taro, alt-J (2012)

 2: Come and See Me (feat. Drake), PARTYNEXTDOOR (2016)

 3: Nice For What, Drake (2018)

 4: The Motto, Drake (2011)

 5: HIGHEST IN THE ROOM, Travis Scott (2019)

 6: 90210 (feat. Kacy Hill), Travis Scott (2015)

 7: DNA., Kendrick Lamar (2017)

 8: LOVE. FEAT. ZACARI., Kendrick Lamar (2017)

 9: All The Stars (with SZA), Kendrick Lamar (2018)

 10: 911 / Mr. Lonely (feat. Frank Ocean & Steve Lacy), Tyler, The Creator (2017)

 11: Where'd You Go (feat. Holly Brook & Jonah Matranga), Fort Minor (2005)

 12: Sapphire, Alcest (2019)

 13: Flying Whales, Gojira (2005)

 14: Needled 24 / 7, Children Of Bodom (2012)

 15: Ghost Shaped People, Lamb of God (2021)

 16: Memento Mori, Lamb of God (2020)

 17: Get You (feat. Kali Uchis), Daniel Caesar (2017)

 18: See You Again (feat. Kali Uchis), Tyler, The Creator (2017)

 19: After The Storm (feat. Tyler, The Creator & Bootsy Collins), Kali Uchis (2018)

 20: 1

## 3: Get User's Preferences

In [5]:
# enter the numbers of the chosen tracks from the list above, separated by spaces (e.g. "3 7 13")
ref_tracks = input("\nEnter a list of up to 5 tracks to be used as seed tracks: ")
ref_tracks = ref_tracks.split()
seed_tracks = [lastPlayed[int(i)-1] for i in ref_tracks]
# print(seed_tracks)


Enter a list of up to 5 tracks to be used as seed tracks:  13
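
The cell above trusts the input as typed, so a stray word or an out-of-range number raises an error inside the list comprehension. The following is a sketch of a stricter parser, assuming the same space-separated, one-based format. The function name `parse_seed_selection` and its error messages are hypothetical.

```python
def parse_seed_selection(text, n_tracks, max_seeds=5):
    """Turn a string such as '3 7 13' into zero-based indices, rejecting bad input."""
    tokens = text.split()
    if not tokens:
        raise ValueError("enter at least one track number")
    if len(tokens) > max_seeds:
        raise ValueError(f"enter at most {max_seeds} track numbers")
    indices = []
    for token in tokens:
        if not token.isdecimal():
            raise ValueError(f"not a track number: {token!r}")
        number = int(token)
        if not 1 <= number <= n_tracks:
            raise ValueError(f"track number out of range: {number}")
        if number - 1 not in indices:  # ignore repeated selections
            indices.append(number - 1)
    return indices

print(parse_seed_selection("13", 20))
```

The returned indices could then be used directly, for example `[lastPlayed[i] for i in parse_seed_selection(ref_tracks, len(lastPlayed))]` with `ref_tracks` still holding the raw input string.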


## 4: Data Preprocessing
This step converts the seed tracks into the input format the model expects.

In [6]:
def get_song_info(song_list):
    """
    Function to get the name and first artist of each seed track.
    The artist is stored as the string form of a one-element list (e.g. "['Gojira']")
    so that it can be compared with the 'artists' column of the dataset.
    """
    seeds = []
    for item in song_list:
        track = item['track']
        song = {'name': track['name'], 'artists': str([track['artists'][0]['name']])}
        seeds.append(song)
    return seeds

get_song_info(seed_tracks)

[{'name': 'Flying Whales', 'artists': "['Gojira']"}]

## 5: Model Building and Training

In [7]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import plotly.express as px
%matplotlib inline

In [8]:
song_data = pd.read_csv('./data/data.csv')

In [9]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# n_jobs is not passed to KMeans: recent scikit-learn releases no longer accept it
song_cluster_pipeline = Pipeline([('scaler', StandardScaler()), 
                                  ('kmeans', KMeans(n_clusters=20, 
                                   verbose=2))], verbose=True)
X = song_data.select_dtypes(np.number)
number_cols = list(X.columns)
song_cluster_pipeline.fit(X)
song_cluster_labels = song_cluster_pipeline.predict(X)
song_data['cluster_label'] = song_cluster_labels

[Pipeline] ............ (step 1 of 2) Processing scaler, total=   0.1s




Initialization complete
Iteration 0, inertia 1579386.3350916202
Iteration 1, inertia 1184230.6483199457
Iteration 2, inertia 1151739.162521511
Iteration 3, inertia 1141284.1412710713
Iteration 4, inertia 1135781.7978405622
Iteration 5, inertia 1131489.3429062297
Iteration 6, inertia 1127466.6412947373
Iteration 7, inertia 1123182.3929168885
Iteration 8, inertia 1119311.3082392218
Iteration 9, inertia 1116385.4499427048
Iteration 10, inertia 1114352.822523716
Iteration 11, inertia 1112904.3435049758
Iteration 12, inertia 1111791.262796243
Iteration 13, inertia 1110741.929731599
Iteration 14, inertia 1109420.3821113186
Iteration 15, inertia 1107311.1184089135
Iteration 16, inertia 1104685.8541962982
Iteration 17, inertia 1103008.834250387
Iteration 18, inertia 1102066.8933539442
Iteration 19, inertia 1101459.7055911
Iteration 20, inertia 1101030.426651451
Iteration 21, inertia 1100876.5419930762
Iteration 22, inertia 1100800.8165147726
Iteration 23, inertia 1100749.836137156
Iteration 24
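
The `scaler` step matters because k-means relies on Euclidean distance, so a feature with a large numeric range would otherwise dominate the clustering. Below is a minimal pure-Python sketch of z-score standardization, the transformation `StandardScaler` applies with its default settings. The feature names and values are made up for illustration, and `standardize` is a hypothetical helper.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(value - mean) / std for value in column]

tempo = [60.0, 120.0, 180.0]   # made-up values on a large scale
energy = [0.2, 0.5, 0.8]       # made-up values between 0 and 1

print(standardize(tempo))
print(standardize(energy))
```

After scaling, both columns span the same range even though their raw values differ by two orders of magnitude.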

## 6: Making Recommendations

In [10]:
from collections import defaultdict
from scipy.spatial.distance import cdist
import difflib
    
def get_song_data(song, song_data):
    
    """
    Gets the row of song_data for a specific song, or None if the song is not found.
    The song argument takes the form of a dictionary with key-value pairs for the
    name and artists of the song.
    """
    
    try:
        song_info = song_data[(song_data['name'] == song['name']) 
                            & (song_data['artists'] == song['artists'])].iloc[0]
        return song_info
    except IndexError:
        return None
        

def get_mean_vector(song_list, song_data):
  
    """
    Gets the mean vector for a list of songs.
    """
    
    song_vectors = []
    for song in song_list:
        song_info = get_song_data(song, song_data)
        if song_info is None:
            print('Warning: {} does not exist in database'.format(song['name']))
            continue
        song_vector = song_info[number_cols].values
        song_vectors.append(song_vector)  
    song_matrix = np.array(list(song_vectors))
    return np.mean(song_matrix, axis=0)

def flatten_dict_list(dict_list):
   
    """
    Utility function for flattening a list of dictionaries.
    """
    flattened_dict = defaultdict()
    for key in dict_list[0].keys():
        flattened_dict[key] = []
    for dictionary in dict_list:
        for key, value in dictionary.items():
            flattened_dict[key].append(value)
    return flattened_dict

In [11]:
def recommend_songs(song_list, song_data, n_songs=12):
  
    """
    Recommends songs based on a list of previous songs that a user has listened to.
    """
    
    metadata_cols = ['name', 'year', 'artists', 'id']
    song_dict = flatten_dict_list(song_list)
    
    song_center = get_mean_vector(song_list, song_data)
    scaler = song_cluster_pipeline.steps[0][1]
    scaled_data = scaler.transform(song_data[number_cols])
    scaled_song_center = scaler.transform(song_center.reshape(1, -1))
    distances = cdist(scaled_song_center, scaled_data, 'cosine')
    index = list(np.argsort(distances)[:, :n_songs][0])
    
    rec_songs = song_data.iloc[index]
    rec_songs = rec_songs[~rec_songs['name'].isin(song_dict['name'])]
    return rec_songs[metadata_cols].to_dict(orient='records')[1:]
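
To illustrate the ranking step, here is a pure-Python sketch of cosine distance (one minus the cosine similarity of two vectors) and a nearest-neighbour lookup over a tiny catalog. The vectors and song names are hypothetical stand-ins for rows of `scaled_data`, and `nearest` is not a function from the notebook.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest(center, catalog, n):
    """Return the names of the n catalog entries closest to center."""
    ranked = sorted(catalog, key=lambda name: cosine_distance(center, catalog[name]))
    return ranked[:n]

catalog = {  # hypothetical scaled feature vectors
    "song_a": [1.0, 0.0],
    "song_b": [0.9, 0.1],
    "song_c": [-1.0, 0.2],
    "song_d": [0.0, 1.0],
}
center = [1.0, 0.05]  # hypothetical mean vector of the seed tracks
print(nearest(center, catalog, 2))
```

Cosine distance compares only the direction of the feature vectors, so two songs whose scaled features are proportional count as identical regardless of magnitude.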

In [12]:
recommended = recommend_songs(get_song_info(seed_tracks), song_data)
print(recommended)

[{'name': 'Spirit Crusher', 'year': 1998, 'artists': "['Death']", 'id': '3sSonVXqDeoEFj2lM7mpYT'}, {'name': 'Lay Your Hands On Me', 'year': 1988, 'artists': "['Bon Jovi']", 'id': '4oNktvBDV8PNZqZh6MgwZ3'}, {'name': 'Seed', 'year': 1998, 'artists': "['Korn']", 'id': '5U1lKuwbhpAC7N5JLHx0H0'}, {'name': 'Lay Your Hands On Me', 'year': 1988, 'artists': "['Bon Jovi']", 'id': '4jJRjBWL3tH50LRFA8jckV'}, {'name': 'The Well Of Souls', 'year': 1987, 'artists': "['Candlemass']", 'id': '1Z1GPw6RpIN7HvRysZ6RjC'}, {'name': 'Breakdown', 'year': 1991, 'artists': '["Guns N\' Roses"]', 'id': '77bCgbwirSDVeBGHxg00A7'}, {'name': 'Without Judgement', 'year': 1995, 'artists': "['Death']", 'id': '0QHAeWDMHjkCb1VVHAkXxY'}, {'name': 'Time', 'year': 1994, 'artists': "['Hootie & The Blowfish']", 'id': '1N0UflnZhn7xOkrN2l8zxx'}, {'name': 'Bleeding Me', 'year': 1996, 'artists': "['Metallica']", 'id': '2IenDzi8pYn8c4WRteZai0'}, {'name': 'Flesh and the Power It Holds', 'year': 1998, 'artists': "['Death']", 'id': '0G

## 7: Creating a Playlist

In [14]:
playlist_name = input("Enter a playlist name:")
playlist_description = "We hope you enjoy the music we curated for you!"

Enter a playlist name: test


In [15]:
def createPlaylist(name=playlist_name, description=playlist_description, user_id=user_id):
    """
    Function to create a public playlist for the user and return its ID.
    """
    data = json.dumps({
            "name": name,
            "description": description,
            "public": True
        })
    url = f"https://api.spotify.com/v1/users/{user_id}/playlists"
    response = postAPIrequest(auth_token, url, data)
    response_json = response.json()
    playlist_id = response_json["id"]
    return playlist_id

In [16]:
def searchForTrack(track):
    """
    Function to look up a track by its Spotify ID and return its URI.
    """
    url = f"https://api.spotify.com/v1/tracks/{track['id']}"
    response = getAPIrequest(auth_token, url)
    response_json = response.json()
    track_uri = response_json["uri"]
    return track_uri
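
Because the dataset's `id` values are Spotify track IDs (the function above passes them straight to `/v1/tracks/{id}`), the URI can also be assembled locally: Spotify track URIs have the form `spotify:track:<id>`. The sketch below skips the per-track request under that assumption. `track_uri_from_id` is a hypothetical helper, and unlike the API lookup it does not confirm that the track still exists.

```python
def track_uri_from_id(track_id):
    """Build a Spotify track URI from a bare track ID without calling the API."""
    track_id = track_id.strip()
    if not track_id.isalnum():
        raise ValueError(f"not a valid track ID: {track_id!r}")
    return f"spotify:track:{track_id}"

# one record taken from the recommendation output above
recommended_sample = [{"name": "Spirit Crusher", "id": "3sSonVXqDeoEFj2lM7mpYT"}]
print([track_uri_from_id(track["id"]) for track in recommended_sample])
```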
        

In [17]:
def addSongsToPlaylist(playlist_id, tracks):
    """
    Function to add the given tracks to a playlist; returns the API's JSON response.
    """
    track_uris = [searchForTrack(track) for track in tracks]
    data = json.dumps(track_uris)
    url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    response = postAPIrequest(auth_token, url, data)
    response_json = response.json()
    return response_json
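
The function above sends every URI in a single request, which is fine for the default of 12 recommendations. For longer lists, a sketch like the following could split the URIs into batches, assuming Spotify's documented limit of 100 items per add-to-playlist request. `chunked` is a hypothetical helper and the URIs are placeholders.

```python
def chunked(items, size=100):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

uris = [f"spotify:track:placeholder{i}" for i in range(250)]  # placeholder URIs
batches = list(chunked(uris))
print([len(batch) for batch in batches])
```

Each batch would then be serialized with `json.dumps` and passed to `postAPIrequest` in turn.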

In [18]:
addSongsToPlaylist(createPlaylist(), recommended)

{'snapshot_id': 'Myw0ZjUyYjdjYjBhNDliOWUxYjY5NTcxNjUyMWViM2IxYWQ5YjQ2YTc2'}