# API wrappers - Create your collection of songs & audio features
## Instructions

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. Later, when the user inputs a song, we will find the cluster that song belongs to and recommend another song from the same cluster. The more songs you have, the more accurate and diverse your recommendations can be. That said, you may want the collection to be "curated" in some way: look for playlists that are diverse but that also meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will multiply very quickly!
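
The expansion described above (playlist, then artists, then albums, then tracks) could be sketched roughly as follows. This is only an illustration: `collect_artist_track_ids` and `max_artists` are hypothetical names, and the client `sp` is passed in as a parameter so the function can be tried with a stub. It assumes the Spotipy methods `artist_albums`, `album_tracks` and `next`, and it adds no delays between requests, which a real run would probably need.

```python
def collect_artist_track_ids(sp, playlist_items, max_artists=None):
    """Return the IDs of every track on every album of every artist in the playlist."""
    # Unique artist IDs, in order of first appearance
    artist_ids = []
    for item in playlist_items:
        track = item.get("track") or {}  # some playlist items have no track
        for artist in track.get("artists", []):
            if artist.get("id") and artist["id"] not in artist_ids:
                artist_ids.append(artist["id"])
    if max_artists is not None:
        artist_ids = artist_ids[:max_artists]

    track_ids = set()  # a set, so tracks that appear twice are stored once
    for artist_id in artist_ids:
        albums_page = sp.artist_albums(artist_id, limit=50)
        while albums_page:
            for album in albums_page["items"]:
                tracks_page = sp.album_tracks(album["id"], limit=50)
                while tracks_page:
                    track_ids.update(t["id"] for t in tracks_page["items"] if t.get("id"))
                    tracks_page = sp.next(tracks_page) if tracks_page.get("next") else None
            albums_page = sp.next(albums_page) if albums_page.get("next") else None
    return track_ids
```

Calling it with `max_artists=5` first is a cheap way to estimate how long a full run would take before committing to it.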

In [170]:
import pandas as pd
from pandas import json_normalize

In [171]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [172]:
# Read the credentials file; the with-block closes it automatically
with open("secrets.txt", "r") as secrets_file:
    string = secrets_file.read()

In [173]:
secrets_dict = {}
for line in string.split('\n'):
    if len(line) > 0:
        # Split only on the first ':' so that values containing ':' stay intact
        key, value = line.split(':', 1)
        secrets_dict[key.strip()] = value.strip()
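
For reference, the parsing above expects `secrets.txt` to hold one `key: value` pair per line, with the keys `clientid` and `clientsecret` that are used further down. A minimal, self-contained sketch of the same idea on a made-up string (the helper name `parse_secrets` and the example values are hypothetical):

```python
def parse_secrets(text):
    """Turn 'key: value' lines into a dictionary, ignoring blank lines."""
    secrets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # partition splits on the first ':' only
        key, _, value = line.partition(":")
        secrets[key.strip()] = value.strip()
    return secrets

example = "clientid: my-client-id\nclientsecret: my-client-secret\n"
print(parse_secrets(example))
```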

In [174]:
# Initialize Spotipy with the app's client credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['clientid'],
                                                           client_secret=secrets_dict['clientsecret']))

In [175]:
# playlist_tracks needs only the playlist ID (user_playlist_tracks is deprecated in Spotipy)
playlist = sp.playlist_tracks("1hMzceeWw7QiI6vaBkcEJO")

In [176]:
# this one is biiiig!
playlist["total"] 

10000

In [177]:
len(playlist["items"])

100
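
The two outputs above differ because the response is paged: `total` reports the size of the whole playlist, while `items` holds only one page of at most 100 entries. One possible way to walk through all pages is to follow the `next` link of each response. In this sketch `iter_playlist_items` is a hypothetical helper, and it assumes the Spotipy methods `playlist_items` and `next`; the client is passed in as `sp`.

```python
def iter_playlist_items(sp, playlist_id, page_size=100):
    """Yield every item of a playlist, one page at a time."""
    page = sp.playlist_items(playlist_id, limit=page_size)
    while page:
        yield from page["items"]
        # Fetch the following page, or stop when there is none
        page = sp.next(page) if page.get("next") else None
```

With the playlist above, that would mean roughly 10000 / 100 = 100 requests.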

In [178]:
def get_categories():
    return sp.categories(limit = 50)['categories']['items'] #getting the categories

In [179]:
categories = get_categories() # store categories in a list of dictionaries
len(categories)

47

In [180]:
for i in range(len(categories)):
    print(f"{i}: {categories[i]['id']}")

0: toplists
1: 0JQ5DAqbMKFEC4WFtoNRpw
2: 0JQ5DAqbMKFQ00XGBls6ym
3: 0JQ5DAqbMKFzHmL4tf05da
4: 0JQ5DAqbMKFHOzuVTgTizF
5: 0JQ5DAqbMKFDXXwE9BDJAr
6: 0JQ5DAqbMKFFzDl7qN9Apr
7: 0JQ5DAqbMKFOOxftoKZxod
8: 0JQ5DAqbMKFImHYGo3eTSg
9: 0JQ5DAqbMKFPw634sFwguI
10: 0JQ5DAqbMKFA6SOHvT3gck
11: 0JQ5DAqbMKFIRybaNTYXXy
12: 0JQ5DAqbMKFCuoRTxhYWow
13: 0JQ5DAqbMKFAXlCG6QvYQ4
14: 0JQ5DAqbMKFCWjUTdzaG0e
15: 0JQ5DAqbMKFFtlLYUHv8bT
16: 0JQ5DAqbMKFIVNxQgRNSg0
17: 0JQ5DAqbMKFDkd668ypn6O
18: 0JQ5DAqbMKFx0uLQR2okcc
19: 0JQ5DAqbMKFFoimhOqWzLB
20: 0JQ5DAqbMKFEZPnFQSFB1T
21: 0JQ5DAqbMKFJKoGyUMo2hE
22: 0JQ5DAqbMKFCbimwdOYlsl
23: 0JQ5DAqbMKFF9bY76LXmfI
24: 0JQ5DAqbMKFAUsdyVjCQuL
25: 0JQ5DAqbMKFPrEiAOxgac3
26: 0JQ5DAqbMKFRY5ok2pxXJ0
27: 0JQ5DAqbMKFKLfwjuJMoNC
28: 0JQ5DAqbMKFLb2EqgLtpjC
29: 0JQ5DAqbMKFGvOw3O4nLAf
30: 0JQ5DAqbMKFEOEBCABAxo9
31: 0JQ5DAqbMKFAJ5xb0fwo9m
32: 0JQ5DAqbMKFIpEuaCnimBj
33: 0JQ5DAqbMKFAjfauKLOZiv
34: 0JQ5DAqbMKFObNLOHydSW8
35: 0JQ5DAqbMKFxXaXKP7zcDp
36: 0JQ5DAqbMKFCfObibaOZbv
37: 0JQ5DAqbMKFAQy4HL4XU2

In [189]:
def get_top_playlists_by_genre(genre_id): 
    return sp.category_playlists(category_id=genre_id, limit=15)['playlists']['items']

In [190]:
from tqdm import tqdm
import time 

top_playlists_by_genre = {}
for category in tqdm(categories, desc = 'Fetching top playlists for each category.'):
    try:
        top_playlists_by_genre[category['name']] = get_top_playlists_by_genre(category['id'])
        time.sleep(0.5)  # Introducing a half-second delay between each call
    except spotipy.SpotifyException as e:
        print(f"Error fetching playlists for category {category['name']}: {e}")

Fetching top playlists for each category.:  91%|███████████████████████████████████▋   | 43/47 [00:27<00:02,  1.61it/s]HTTP Error for GET to https://api.spotify.com/v1/browse/categories/funk/playlists with Params: {'country': None, 'limit': 15, 'offset': 0} returned 404 due to Specified id doesn't exist


Error fetching playlists for category Funk: http status: 404, code:-1 - https://api.spotify.com/v1/browse/categories/funk/playlists?limit=15&offset=0:
 Specified id doesn't exist, reason: None


Fetching top playlists for each category.: 100%|███████████████████████████████████████| 47/47 [00:29<00:00,  1.58it/s]


In [191]:
len(top_playlists_by_genre)

# Named genre_names rather than `list`, which would shadow the built-in type
genre_names = top_playlists_by_genre.keys()
genre_names

dict_keys(['Top Lists', 'Pop', 'Hip-Hop', 'Mood', 'Dance/Electronic', 'Rock', 'Chill', 'RADAR', 'Fresh Finds', 'EQUAL', 'Party', 'In the car', 'Sleep', 'Workout', 'Indie', 'Alternative', 'Decades', 'Metal', 'At Home', 'Kids & Family', 'R&B', 'Reggae', 'Focus', 'Frequency', 'Romance', 'Classical', 'Cooking & Dining', 'Country', 'Wellness', 'K-Pop', 'Netflix', 'Jazz', 'Soul', 'Punk', 'Caribbean', 'Latin', 'Gaming', 'Travel', 'Instrumental', 'Ambient', 'Tastemakers', 'Trending', 'Arab', 'Folk & Acoustic', 'Blues', 'Word'])

In [192]:
# Print each category with its number of playlists, followed by the playlist names
for genre_name in genre_names:
    genre_playlists = top_playlists_by_genre[genre_name]
    print(f"{genre_name} ({len(genre_playlists)})")
    for entry in genre_playlists:
        if entry is not None:  # skip empty entries
            print(f" -{entry['name']}")


Top Lists (13)
 -Hot Hits Deutschland
 -Main Stage
 -Modus Mio
 -New Music Friday Deutschland
 -mint
 -RapCaviar
 -Rock This
 -Are & Be
 -Fresh Finds
 -Top 50 – Deutschland
 -Top 50 – Global
 -Viral 50 – Global
 -Viral 50 – Deutschland
Pop (15)
 -Hot Pink
 -Barbie Official Playlist
 -Throwback Jams
 -Pop Remix
 -Chill Pop
 -Romcom
 -after school club 🪄
 -y2k
 -Summer Pop
 -Retro Prom
 -Disco Decadence
 -2000s Pop
 -hot girl walk
 -neon cowgirl
 -DISCOLAND
Hip-Hop (15)
 -Modus Mio
 -Deutschrap Brandneu
 -Shisha Club
 -Deutschrap Royal
 -DREAMCHASERS
 -Rap in deep
 -Rap oder Liebe
 -Deutschrap Untergrund
 -Deutschrap Party
 -Deutschrap Asphalt
 -Deutschrap: Die Klassiker
 -Deutschrap Primetime
 -Straßenrap Classics
 -Deutschrap: Die Anfänge
 -word!
Mood (15)
 -me right now
 -Stimmungsmacher
 -Herbst Chillout
 -Happy Dance
 -Herbstgefühle
 -Top of the Morning
 -Feelgood Indie
 -Happy Hits!
 -Kaffeehausmusik
 -Feelgood Pop
 -Happy Beats
 -Zuhause
 -Jazz Vibes
 -Sanfter Pop
 -Van Life
Dance

In [185]:
# list_of_playlist_id = []
# for playlist in top_playlists_by_genre:
#     try: 
#         list_of_playlist_id = list_of_playlist_id + playlist[]

In [193]:
test = sp.playlist_tracks('37i9dQZF1DX7ZUug1ANKRP')

In [187]:
# all_playlist_ids = []
# for genre in top_playlists_by_genre.values():
#     for playlist in genre:
#         all_playlist_ids.append(playlist['id'])

In [194]:
all_playlist_ids = [playlist['id'] for genre in top_playlists_by_genre.values() for playlist in genre]
len(all_playlist_ids)

676
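
Note that the same playlist can be listed under more than one category: in the output above, "Modus Mio" appears under both "Top Lists" and "Hip-Hop". A small sketch of removing repeated IDs while keeping their order, so that no playlist is downloaded twice (`unique_in_order` and the sample IDs are hypothetical):

```python
def unique_in_order(ids):
    """Drop repeated IDs, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(ids))

sample_ids = ["id_modus_mio", "id_hot_hits", "id_modus_mio"]
print(unique_in_order(sample_ids))  # ['id_modus_mio', 'id_hot_hits']
```

Applied here, that would be `all_playlist_ids = unique_in_order(all_playlist_ids)` before fetching the tracks.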

In [None]:
len(test["items"])  # number of tracks on this page; len(test) would count the keys of the response

In [None]:
test["items"][0]["track"].keys()

In [195]:
from tqdm import tqdm
from time import sleep

def get_tracks_from_playlist(playlist_id):
    """Return every item of a playlist, requesting pages of up to 100 items."""
    offset = 0
    tracks = []
    while True:
        response = sp.playlist_tracks(playlist_id, limit=100, offset=offset)
        tracks.extend(response['items'])
        if response['next'] is None:  # no further pages
            break
        offset += 100
        sleep(0.5)  # short pause between requests
    return tracks

def get_all_tracks(playlist_ids):
    """Collect the items of every playlist in playlist_ids into a single list."""
    all_tracks = []
    for playlist_id in tqdm(playlist_ids):
        all_tracks.extend(get_tracks_from_playlist(playlist_id))
    return all_tracks

In [196]:
all_tracks = get_all_tracks(all_playlist_ids)

100%|████████████████████████████████████████████████████████████████████████████████| 676/676 [06:31<00:00,  1.73it/s]


In [198]:
len(all_tracks)

73251

In [197]:
all_tracks

KeyboardInterrupt: 

In [208]:
def nullTable(dataset):
    """Display the percentage and the count of null values in each column."""
    null_counts = dataset.isna().sum()
    nulls_df = pd.DataFrame(round(null_counts / len(dataset), 4) * 100).reset_index()
    nulls_df.columns = ['header_name', 'percent_nulls']
    display(nulls_df)
    display(null_counts)

In [212]:
tracks = json_normalize(all_tracks)
len(tracks)

73251

In [226]:
tracks.to_pickle('all_tracks_from_top_playlists.pkl')

In [None]:
nullTable(tracks)

In [214]:
# Drop rows without artist information; passing a list to subset works across pandas versions
tracks.dropna(subset=['track.artists'], inplace=True)

In [211]:
# tracks = tracks.loc(pd.notna(tracks['track.artists']))

TypeError: unhashable type: 'Series'

In [215]:
nullTable(tracks)


Unnamed: 0,header_name,percent_nulls
0,added_at,0.0
1,is_local,0.0
2,primary_color,99.89
3,added_by.external_urls.spotify,0.0
4,added_by.href,0.0
5,added_by.id,0.0
6,added_by.type,0.0
7,added_by.uri,0.0
8,track.album.album_type,0.0
9,track.album.artists,0.0


added_at                                  0
is_local                                  0
primary_color                         73147
added_by.external_urls.spotify            0
added_by.href                             0
added_by.id                               0
added_by.type                             0
added_by.uri                              0
track.album.album_type                    0
track.album.artists                       0
track.album.available_markets             0
track.album.external_urls.spotify         0
track.album.href                          0
track.album.id                            0
track.album.images                        0
track.album.name                          0
track.album.release_date                  0
track.album.release_date_precision        0
track.album.total_tracks                  0
track.album.type                          0
track.album.uri                           0
track.artists                             0
track.available_markets         

In [216]:
def list_to_dict(x):
    return {i: x[i] for i in range(len(x))}

In [217]:
tracks['artist_dict'] = tracks['track.artists'].apply(list_to_dict)
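
To make the step above concrete: `list_to_dict` turns a list into a dictionary keyed by position. The sketch below repeats that function on made-up data and adds a hypothetical helper, `artist_names`, which assumes each artist entry is a dictionary with a `name` key.

```python
def list_to_dict(x):
    """Map each position in the list to its element."""
    return {i: x[i] for i in range(len(x))}

def artist_names(artists):
    """Return only the names from a list of artist dictionaries."""
    return [artist["name"] for artist in artists]

# Made-up data in the assumed shape of one 'track.artists' cell
sample_artists = [
    {"id": "hypothetical-id-1", "name": "Artist A"},
    {"id": "hypothetical-id-2", "name": "Artist B"},
]
print(list_to_dict(sample_artists)[0]["name"])  # Artist A
print(artist_names(sample_artists))             # ['Artist A', 'Artist B']
```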

In [218]:
tracks['track.id']

0        31nfdEooLEq7dn3UMcIeB5
1        0gMTEHzNIyvxikxyUFFJxO
2        2IGMVunIBsBLtEQyoI1Mu7
3        5YWu6dt50Nieq8qi572RF8
4        59NraMJsLaMCVtwXTSia8i
                  ...          
73246    1XQ8bXmDNU7VD4IvAsq6my
73247    5O15B1osF9aPtpU2sQvkqB
73248    4KL1ztWsn27X4yFWmpaS9t
73249    5ymvXhdhvCpu1c6vIoKrBr
73250    14RTzCzFizaOdDgxfUsPs2
Name: track.id, Length: 73224, dtype: object

In [219]:
tracks.columns

Index(['added_at', 'is_local', 'primary_color',
       'added_by.external_urls.spotify', 'added_by.href', 'added_by.id',
       'added_by.type', 'added_by.uri', 'track.album.album_type',
       'track.album.artists', 'track.album.available_markets',
       'track.album.external_urls.spotify', 'track.album.href',
       'track.album.id', 'track.album.images', 'track.album.name',
       'track.album.release_date', 'track.album.release_date_precision',
       'track.album.total_tracks', 'track.album.type', 'track.album.uri',
       'track.artists', 'track.available_markets', 'track.disc_number',
       'track.duration_ms', 'track.episode', 'track.explicit',
       'track.external_ids.isrc', 'track.external_urls.spotify', 'track.href',
       'track.id', 'track.is_local', 'track.name', 'track.popularity',
       'track.preview_url', 'track.track', 'track.track_number', 'track.type',
       'track.uri', 'video_thumbnail.url', 'track', 'video_thumbnail',
       'artist_dict'],
      dtype='o

In [221]:
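from random import randint
from time import sleep

# Request the audio features in batches of 100 track IDs
chunks = [(i, i+100) for i in range(0, len(tracks), 100)]
audio_features_list = []
for start, end in tqdm(chunks):
    # .iloc slices by position, which stays correct after rows were dropped
    id_list100 = tracks['track.id'].iloc[start:end].tolist()
    audio_features_list.extend(sp.audio_features(id_list100))
    sleep(randint(1, 3000) / 1000)  # random pause of up to 3 seconds between requests
len(audio_features_list)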

100%|████████████████████████████████████████████████████████████████████████████████| 733/733 [20:42<00:00,  1.69s/it]


73224
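
The batching above can also be written as a small reusable helper. With 73,224 IDs and batches of 100, it produces 733 batches, which matches the progress bar. The names `chunked` and `fetch_audio_features` are hypothetical, and the `None` filter rests on the assumption that the API may return `None` for a track without audio features; dropping those entries means the result can be shorter than the list of IDs.

```python
def chunked(items, size=100):
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_audio_features(sp, track_ids, size=100):
    """Request audio features batch by batch, skipping missing entries."""
    features = []
    for batch in chunked(track_ids, size):
        # Assumption: entries can be None when a track has no audio features
        features.extend(f for f in sp.audio_features(batch) if f is not None)
    return features

print(len(list(chunked(list(range(73224))))))  # 733
```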

In [222]:
audio_feature_df = json_normalize(audio_features_list )

In [223]:
audio_feature_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.634,0.824,2.0,-3.394,0.0,0.0470,0.0908,0.071100,0.1190,0.371,137.959,audio_features,31nfdEooLEq7dn3UMcIeB5,spotify:track:31nfdEooLEq7dn3UMcIeB5,https://api.spotify.com/v1/tracks/31nfdEooLEq7...,https://api.spotify.com/v1/audio-analysis/31nf...,178156.0,4.0
1,0.756,0.647,6.0,-5.488,0.0,0.1590,0.2790,0.000005,0.1020,0.464,134.995,audio_features,0gMTEHzNIyvxikxyUFFJxO,spotify:track:0gMTEHzNIyvxikxyUFFJxO,https://api.spotify.com/v1/tracks/0gMTEHzNIyvx...,https://api.spotify.com/v1/audio-analysis/0gMT...,195466.0,4.0
2,0.868,0.538,5.0,-8.603,1.0,0.1740,0.2690,0.000003,0.0901,0.732,99.968,audio_features,2IGMVunIBsBLtEQyoI1Mu7,spotify:track:2IGMVunIBsBLtEQyoI1Mu7,https://api.spotify.com/v1/tracks/2IGMVunIBsBL...,https://api.spotify.com/v1/audio-analysis/2IGM...,231750.0,4.0
3,0.767,0.539,4.0,-8.518,0.0,0.0629,0.3640,0.039700,0.0772,0.368,128.055,audio_features,5YWu6dt50Nieq8qi572RF8,spotify:track:5YWu6dt50Nieq8qi572RF8,https://api.spotify.com/v1/tracks/5YWu6dt50Nie...,https://api.spotify.com/v1/audio-analysis/5YWu...,131524.0,4.0
4,0.638,0.717,8.0,-5.804,1.0,0.0375,0.0010,0.000002,0.1130,0.422,141.904,audio_features,59NraMJsLaMCVtwXTSia8i,spotify:track:59NraMJsLaMCVtwXTSia8i,https://api.spotify.com/v1/tracks/59NraMJsLaMC...,https://api.spotify.com/v1/audio-analysis/59Nr...,132359.0,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73219,0.633,0.759,0.0,-11.833,1.0,0.0368,0.1220,0.424000,0.1320,0.926,110.034,audio_features,1XQ8bXmDNU7VD4IvAsq6my,spotify:track:1XQ8bXmDNU7VD4IvAsq6my,https://api.spotify.com/v1/tracks/1XQ8bXmDNU7V...,https://api.spotify.com/v1/audio-analysis/1XQ8...,163533.0,4.0
73220,0.830,0.759,0.0,-11.880,1.0,0.0315,0.2140,0.876000,0.1150,0.955,125.009,audio_features,5O15B1osF9aPtpU2sQvkqB,spotify:track:5O15B1osF9aPtpU2sQvkqB,https://api.spotify.com/v1/tracks/5O15B1osF9aP...,https://api.spotify.com/v1/audio-analysis/5O15...,177133.0,4.0
73221,0.811,0.603,7.0,-12.997,1.0,0.0649,0.5420,0.925000,0.0911,0.725,110.014,audio_features,4KL1ztWsn27X4yFWmpaS9t,spotify:track:4KL1ztWsn27X4yFWmpaS9t,https://api.spotify.com/v1/tracks/4KL1ztWsn27X...,https://api.spotify.com/v1/audio-analysis/4KL1...,153453.0,4.0
73222,0.819,0.222,9.0,-15.183,0.0,0.0397,0.3490,0.112000,0.1080,0.791,137.286,audio_features,5ymvXhdhvCpu1c6vIoKrBr,spotify:track:5ymvXhdhvCpu1c6vIoKrBr,https://api.spotify.com/v1/tracks/5ymvXhdhvCpu...,https://api.spotify.com/v1/audio-analysis/5ymv...,134200.0,4.0


In [225]:
audio_feature_df.to_pickle('audio_feature_df.pkl')