Instructions

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. Later, when the user inputs a song, we will find the cluster to which that song belongs and recommend another song from the same cluster. The more songs you have, the more accurate and diverse your recommendations can be. That said, you may want to make sure the collected songs are "curated" in some way: try to find playlists that are diverse but that still meet certain standards.
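The recommendation step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the songs have already been clustered, and the centroids, labels, song names, and function names below are made-up placeholders. In practice the features would typically be scaled first and the clusters would come from a clustering algorithm of your choice.

```python
import math
import random

def nearest_cluster(features, centroids):
    """Return the index of the centroid closest to `features` (Euclidean distance)."""
    distances = [math.dist(features, centroid) for centroid in centroids]
    return distances.index(min(distances))

def recommend(features, centroids, songs, labels, rng=random):
    """Pick a random song from the cluster that the input song falls into."""
    cluster = nearest_cluster(features, centroids)
    candidates = [song for song, label in zip(songs, labels) if label == cluster]
    return rng.choice(candidates) if candidates else None

# Hypothetical data: two clusters described by (danceability, energy)
centroids = [(0.8, 0.8), (0.3, 0.2)]
songs = ["song_a", "song_b", "song_c"]
labels = [0, 0, 1]  # cluster assigned to each song

# The input song is closest to cluster 0, so song_a or song_b is returned
print(recommend((0.75, 0.9), centroids, songs, labels))
```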

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.
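One way to shorten the wait is to send fewer requests. As a sketch: spotipy's `audio_features` accepts a list of up to 100 track IDs per call, so IDs can be grouped into batches instead of being requested one by one. The helper names `chunked` and `fetch_audio_features` are hypothetical, the pause between batches is an arbitrary choice, and a stand-in function replaces the real API so the example runs offline.

```python
import time

def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_audio_features(track_ids, fetch, pause=0.0):
    """Call `fetch` once per batch of up to 100 IDs and collect the results.

    `fetch` is any callable that takes a list of IDs and returns a list of
    feature dicts, for example `sp.audio_features` (assumed here).
    """
    features = []
    for batch in chunked(track_ids, 100):
        features.extend(fetch(batch))
        time.sleep(pause)  # optional pause between requests
    return features

def fake_fetch(ids):
    """Stand-in for the API: returns one placeholder dict per ID."""
    return [{"id": track_id} for track_id in ids]

result = fetch_audio_features([str(n) for n in range(250)], fake_fetch)
print(len(result))  # 250, collected in 3 calls instead of 250
```

With a real client this would be called as `fetch_audio_features(ids, sp.audio_features)`. Individual entries may come back as `None` for tracks without audio features, so it is worth filtering those out before building a dataframe.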

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will grow very quickly, since each artist can contribute many albums and each album many tracks!
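The expansion idea above could look roughly like this. It is a sketch under assumptions: it takes a spotipy client as `sp`, relies on its `artist_albums`, `album_tracks`, and `next` methods for paging, and the function names `all_pages` and `expand_by_artist` are hypothetical. The same track can appear on several releases, so IDs are collected in a set to drop exact duplicates.

```python
def all_pages(sp, page):
    """Yield every item of a paged Spotify result, following the `next` links."""
    while page:
        yield from page["items"]
        page = sp.next(page) if page.get("next") else None

def expand_by_artist(sp, playlist_items):
    """Collect the IDs of every track on every album of each playlist artist."""
    # Unique artist IDs across all playlist entries that still have a track
    artist_ids = {
        artist["id"]
        for item in playlist_items
        if item.get("track")
        for artist in item["track"]["artists"]
        if artist.get("id")
    }
    track_ids = set()
    for artist_id in artist_ids:
        albums = sp.artist_albums(artist_id, limit=50)
        for album in all_pages(sp, albums):
            tracks = sp.album_tracks(album["id"], limit=50)
            for track in all_pages(sp, tracks):
                track_ids.add(track["id"])
    return track_ids
```

Here `playlist_items` would be the list of playlist entries (the `items` of the playlist results), and the returned IDs can then be passed on to the audio-features step. Expect this to issue one request per artist plus at least one per album, which is where most of the waiting time comes from.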

In [1]:
# import libraries
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

In [2]:
# Credentials are read from the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET
# environment variables instead of being hard-coded in the notebook
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

In [41]:
# adapted from a function found online; changed so that it pages through playlists with more than 100 tracks

def playlist_df(playlist_id):

    # Column order of the result: metadata first, then the audio features
    playlist_features_list = ["artist","album","track_name",  "track_id","danceability","energy","key","loudness","mode", "speechiness","instrumentalness","liveness","valence","tempo", "duration_ms","time_signature"]

    # Fetch the first page (at most 100 items), then the remaining pages by offset
    results = sp.user_playlist_tracks("spotify", playlist_id)
    tracks = results['items']

    for oset in range(100, results['total'], 100):
        results = sp.user_playlist_tracks("spotify", playlist_id, offset=oset)
        tracks += results['items']

    # Build one row (dict) per track
    rows = []
    for track in tracks:
        # Skip entries without a usable track ID (e.g. removed or local tracks)
        if track["track"] is None or track["track"]["id"] is None:
            continue

        playlist_features = {}
        # Get metadata
        playlist_features["artist"] = track["track"]["album"]["artists"][0]["name"]
        playlist_features["album"] = track["track"]["album"]["name"]
        playlist_features["track_name"] = track["track"]["name"]
        playlist_features["track_id"] = track["track"]["id"]

        # Get audio features; skip the track if none are available
        audio_features = sp.audio_features(playlist_features["track_id"])[0]
        if audio_features is None:
            continue
        for feature in playlist_features_list[4:]:
            playlist_features[feature] = audio_features[feature]

        rows.append(playlist_features)

    # Build the dataframe once instead of concatenating inside the loop
    return pd.DataFrame(rows, columns=playlist_features_list)

In [43]:
# this playlist contains more than 5,000 songs

playlist_df("4rnleEAOdmFAbRcNCgZMpY")

,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,Hozier,Hozier (Deluxe),Take Me To Church,7dS5EaCoMnN7DzlpT6aRn2,0.566,0.664,4,-5.303,0,0.0464,0,0.116,0.437,128.945,241688,4
1,Mike Posner,31 Minutes to Takeoff,Cooler Than Me - Single Mix,2V4bv1fNWfTcyRJKmej6Sj,0.768,0.820,7,-4.630,0,0.0474,0,0.689,0.625,129.965,213293,4
2,"Tyler, The Creator",Flower Boy,See You Again (feat. Kali Uchis),7KA4W4McWYRpgf0fWsJZWB,0.558,0.559,6,-9.222,1,0.0959,0.000007,0.109,0.620,78.558,180387,4
3,Bastille,Bad Blood,Pompeii,3gbBpTdY8lnQwqxNCcf795,0.679,0.715,9,-6.383,1,0.0407,0,0.271,0.571,127.435,214148,4
4,Shakira,"Oral Fixation, Vol. 2 (Expanded Edition)",Hips Don't Lie (feat. Wyclef Jean),3ZFTkvIE7kyPt6Nu3PEa7V,0.778,0.824,10,-5.892,0,0.0707,0,0.405,0.758,100.024,218093,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5290,MARINA,The Family Jewels,Hermit the Frog,4Zcz6saEkOII3PlXd9gN3o,0.609,0.679,0,-4.545,1,0.0312,0,0.199,0.487,122.034,215960,4
5291,Olivia Rodrigo,deja vu,deja vu,61KpQadow081I2AsbeLcsb,0.439,0.610,9,-7.236,1,0.1160,0.000011,0.341,0.172,181.088,215508,4
5292,BIA,FOR CERTAIN,WHOLE LOTTA MONEY,5yorXJWdBan1Vlh116ZtQ7,0.897,0.371,1,-5.019,1,0.3680,0,0.325,0.441,81.008,156005,4
5293,Ashnikko,DEMIDEVIL,Slumber Party (feat. Princess Nokia),11ZulcYY4lowvcQm4oe3VJ,0.964,0.398,11,-8.981,0,0.0795,0.000039,0.101,0.563,105.012,178405,4
