# Lab | API wrappers - Create your collection of songs & audio features


#### Instructions 


To move forward with the project, you need to create a collection of songs with their audio features - as large as possible! 

These are the songs that we will cluster. Later, when the user inputs a song, we will find the cluster to which that song belongs and recommend another song from the same cluster.
The more songs you have, the more accurate and diverse your recommendations can be. That said, you might want to make sure the collected songs are "curated" in some way: try to find playlists that are diverse but that also meet certain standards.
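
The recommendation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: `cluster_of` is a hypothetical dictionary mapping song ids to cluster labels (in the project it would come from a clustering model fitted on the audio features), and the song ids are made up.

```python
import random

def recommend_from_cluster(cluster_label, cluster_of, exclude=None, rng=random):
    """Pick a random song from the given cluster.

    cluster_of: hypothetical dict mapping song id -> cluster label.
    exclude:    optional song id that must not be recommended
                (for example the song the user typed in).
    Returns None when the cluster has no other song.
    """
    candidates = [song for song, label in cluster_of.items()
                  if label == cluster_label and song != exclude]
    return rng.choice(candidates) if candidates else None

# Toy labels, made up for illustration only
cluster_of = {"song_a": 0, "song_b": 0, "song_c": 1, "song_d": 0}

# The user's song "song_a" falls in cluster 0, so we recommend from cluster 0
print(recommend_from_cluster(cluster_of["song_a"], cluster_of, exclude="song_a"))
```

With more songs per cluster there are more candidates to choose from, which is why a larger collection tends to give more varied recommendations.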

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

One idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will grow very quickly!
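
The "every album of every artist" idea could be sketched roughly as follows. It assumes `sp` is an authenticated `spotipy.Spotify` client and relies only on its `artist_albums`, `album_tracks` and `next` methods; the function name and the `pause` parameter are hypothetical and not part of this lab.

```python
import time

def collect_artist_track_ids(sp, artist_id, pause=0.0):
    """Collect the ids of every track on every album of one artist.

    sp:    assumed to be an authenticated spotipy.Spotify client.
    pause: hypothetical parameter, seconds to sleep after each album.
    """
    track_ids = set()
    albums = sp.artist_albums(artist_id, album_type="album", limit=50)
    while albums:
        for album in albums["items"]:
            page = sp.album_tracks(album["id"])
            while page:
                # keep only tracks that actually carry an id
                track_ids.update(t["id"] for t in page["items"] if t.get("id"))
                page = sp.next(page) if page.get("next") else None
            time.sleep(pause)  # optional nap between albums
        albums = sp.next(albums) if albums.get("next") else None
    return track_ids
```

For example, `collect_artist_track_ids(sp, "74ASZWbe4lXaubB36ztrGX", pause=1.0)` would walk through the albums of the artist whose id appears later in this notebook. Calling it for every artist of a 10,000-track playlist means many thousands of requests, so expect long running times.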

In [2]:
secrets_file = open("secrets.txt","r")

In [3]:
string = secrets_file.read()

In [4]:
string

'clientid:806b30c89127425d9de916ba36c86ea8\nclientsecret:0f6eaf58bc364cd598cac727e6da88b4'

In [5]:
string.split('\n')

['clientid:806b30c89127425d9de916ba36c86ea8',
 'clientsecret:0f6eaf58bc364cd598cac727e6da88b4']

In [6]:
secrets_dict = {}
for line in string.split('\n'):
    if len(line) > 0:
        # split only on the first ':' so that values containing ':' stay intact
        key, value = line.split(':', 1)
        secrets_dict[key] = value.strip()

In [7]:
secrets_dict

{'clientid': '806b30c89127425d9de916ba36c86ea8',
 'clientsecret': '0f6eaf58bc364cd598cac727e6da88b4'}

Note: the same `key:value` file format can also be used to store program configuration parameters.
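
As a sketch of that idea, the parsing steps above can be wrapped in a small reusable function. The function name is hypothetical, and skipping lines that start with `#` is an added assumption, not something the secrets file above requires.

```python
def read_key_value_file(path):
    """Parse a text file of `key:value` lines into a dictionary."""
    settings = {}
    with open(path, "r", encoding="utf-8") as handle:  # the file is closed automatically
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):  # assumption: '#' marks a comment line
                continue
            # partition splits on the first ':' only, so values may contain ':'
            key, _, value = line.partition(":")
            settings[key.strip()] = value.strip()
    return settings

# Possible usage, assuming the secrets.txt file read above:
# secrets_dict = read_key_value_file("secrets.txt")
```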

#### Authentication with secrets

In [9]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Initialize Spotipy with the app's client credentials (client ID and secret)
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['clientid'],
                                                           client_secret=secrets_dict['clientsecret']))

### Playlists

We will need to collect a "database" of songs. Playlists are a good way to access relatively large numbers of songs.

### Playlists | Playlist #1

In [10]:
# we will need more songs for our clustering
playlist = sp.user_playlist_tracks("spotify", "1G8IpkZKobrIlXcVPoSIuf")

In [11]:
# this one is biiiig!
playlist["total"] 

10000

In [1]:
# playlist['items']

In [13]:
# Each page is limited to 100 tracks, so we will have to page through the results:
len(playlist["items"])

100

In [14]:
# we could use the url to the next page which is provided...
playlist['next']

'https://api.spotify.com/v1/playlists/1G8IpkZKobrIlXcVPoSIuf/tracks?offset=100&limit=100&additional_types=track'
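
To see what the `next` URL encodes, its query string can be taken apart with the standard library. This is only an illustrative sketch using the URL printed above; in practice `sp.next(...)` follows the URL for you.

```python
from urllib.parse import urlparse, parse_qs

next_url = ("https://api.spotify.com/v1/playlists/1G8IpkZKobrIlXcVPoSIuf/tracks"
            "?offset=100&limit=100&additional_types=track")

# parse_qs returns a dict of lists, e.g. {"offset": ["100"], ...}
query = parse_qs(urlparse(next_url).query)
offset = int(query["offset"][0])  # index of the first track on the next page
limit = int(query["limit"][0])    # number of tracks per page

print(offset, limit)  # 100 100
```

Each page therefore starts where the previous one ended: the second page covers tracks 100 to 199, the third 200 to 299, and so on.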

In [15]:
from random import randint
from time import sleep

def get_playlist_tracks(playlist_id):
    """Fetch every item of a playlist by following the paginated results."""
    results = sp.user_playlist_tracks("spotify", playlist_id)
    tracks = results['items']
    while results['next'] is not None:
        results = sp.next(results)
        tracks.extend(results['items'])
        sleep(randint(1, 3000) / 1000)  # respectful nap of up to 3 seconds between requests
    return tracks

In [16]:
# this will take roughly num_pages_in_playlist * (avg_sleep_time + processing_time) = 100 * (1.5 + 0.1) = 160 seconds
all_tracks = get_playlist_tracks("1G8IpkZKobrIlXcVPoSIuf")
len(all_tracks)

10000
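
The running-time estimate can be worked out explicitly. The sketch below assumes the same numbers as the cell above: pages of 100 tracks, a random nap between 0.001 and 3 seconds after every page except the first, and a guessed 0.1 seconds of processing per request. The function name is hypothetical.

```python
import math

def estimate_collection_seconds(total_tracks, page_size=100,
                                min_sleep=0.001, max_sleep=3.0,
                                processing_time=0.1):
    """Rough expected duration of a paginated download, in seconds."""
    pages = math.ceil(total_tracks / page_size)
    extra_pages = max(pages - 1, 0)          # the first page is fetched without a nap
    avg_sleep = (min_sleep + max_sleep) / 2  # mean of a uniform random nap
    return pages * processing_time + extra_pages * avg_sleep

print(round(estimate_collection_seconds(10000)))  # 159
```

So a 10,000-track playlist should take a little under three minutes, mostly spent sleeping.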

In [17]:
import pandas as pd
from pandas import json_normalize
tracks = json_normalize(all_tracks)

In [18]:
tracks.columns

Index(['added_at', 'is_local', 'primary_color',
       'added_by.external_urls.spotify', 'added_by.href', 'added_by.id',
       'added_by.type', 'added_by.uri', 'track.album.album_type',
       'track.album.artists', 'track.album.available_markets',
       'track.album.external_urls.spotify', 'track.album.href',
       'track.album.id', 'track.album.images', 'track.album.name',
       'track.album.release_date', 'track.album.release_date_precision',
       'track.album.total_tracks', 'track.album.type', 'track.album.uri',
       'track.artists', 'track.available_markets', 'track.disc_number',
       'track.duration_ms', 'track.episode', 'track.explicit',
       'track.external_ids.isrc', 'track.external_urls.spotify', 'track.href',
       'track.id', 'track.is_local', 'track.name', 'track.popularity',
       'track.preview_url', 'track.track', 'track.track_number', 'track.type',
       'track.uri', 'video_thumbnail.url', 'track'],
      dtype='object')

In [19]:
tracks['track.artists'][0]

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/74ASZWbe4lXaubB36ztrGX'},
  'href': 'https://api.spotify.com/v1/artists/74ASZWbe4lXaubB36ztrGX',
  'id': '74ASZWbe4lXaubB36ztrGX',
  'name': 'Bob Dylan',
  'type': 'artist',
  'uri': 'spotify:artist:74ASZWbe4lXaubB36ztrGX'}]

In [20]:
tracks['track.id'][0]

'3AhXZa8sUQht0UEdBJgpGc'

In [21]:
# data = tracks[['track.artists','track.id']]

# def expand_list_dict2(row):
#     df = json_normalize(row['track.artists'])
#     df['song_id'] = tracks.loc[row.name, 'track.id']
#     return df

# tracks['artists_dfs'] = data.apply(expand_list_dict2, axis=1)
# print(tracks['artists_dfs'].head())

In [22]:
# def expand_list_dict2(row):
#     df = json_normalize(row['track.artists'])
#     df['song_id'] = row['track.id']
#     return df

# tracks2['artists_dfs'] = tracks2.apply(expand_list_dict2, axis=1)
# tracks2['artists_dfs'][0]

In [23]:
def expand_list_dict2(row):
    """Build a small DataFrame of a track's artists, tagged with the track id."""
    artists_data = row['track.artists']
    if isinstance(artists_data, list):  # Check if it's a list
        artists_data = [artist for artist in artists_data if isinstance(artist, dict)]  # Filter out non-dictionaries
        if artists_data:  # Check if there's any artist data present
            df = json_normalize(artists_data)
            df['song_id'] = row['track.id']
            return df
    return None  # Return None if it's not a list or has no valid artist data

In [24]:
tracks['artists_dfs'] = tracks.apply(expand_list_dict2, axis=1)
tracks['artists_dfs'][0]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/74ASZWbe4lX...,74ASZWbe4lXaubB36ztrGX,Bob Dylan,artist,spotify:artist:74ASZWbe4lXaubB36ztrGX,https://open.spotify.com/artist/74ASZWbe4lXaub...,3AhXZa8sUQht0UEdBJgpGc


In [25]:
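# now we create a new dataframe with all these artists
# (one concat call instead of growing the frame inside a loop, which is much slower)
artist_template = pd.DataFrame(columns=['href', 'id', 'name', 'type', 'uri'])
mini_dfs = [df for df in tracks['artists_dfs'] if isinstance(df, pd.DataFrame)]
artist_df = pd.concat([artist_template] + mini_dfs, axis=0)
artist_df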

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/74ASZWbe4lX...,74ASZWbe4lXaubB36ztrGX,Bob Dylan,artist,spotify:artist:74ASZWbe4lXaubB36ztrGX,https://open.spotify.com/artist/74ASZWbe4lXaub...,3AhXZa8sUQht0UEdBJgpGc
0,https://api.spotify.com/v1/artists/6olE6TJLqED...,6olE6TJLqED3rqDCT0FyPh,Nirvana,artist,spotify:artist:6olE6TJLqED3rqDCT0FyPh,https://open.spotify.com/artist/6olE6TJLqED3rq...,3oTlkzk1OtrhH8wBAduVEi
0,https://api.spotify.com/v1/artists/3WrFJ7ztbog...,3WrFJ7ztbogyGnTHbHJFl2,The Beatles,artist,spotify:artist:3WrFJ7ztbogyGnTHbHJFl2,https://open.spotify.com/artist/3WrFJ7ztbogyGn...,3ZFBeIyP41HhnALjxWy1pR
0,https://api.spotify.com/v1/artists/3oDbviiivRW...,3oDbviiivRWhXwIE8hxkVV,The Beach Boys,artist,spotify:artist:3oDbviiivRWhXwIE8hxkVV,https://open.spotify.com/artist/3oDbviiivRWhXw...,5Qt4Cc66g24QWwGP3YYV9y
0,https://api.spotify.com/v1/artists/293zczrfYaf...,293zczrfYafIItmnmM3coR,Chuck Berry,artist,spotify:artist:293zczrfYafIItmnmM3coR,https://open.spotify.com/artist/293zczrfYafIIt...,7MH2ZclofPlTrZOkPzZKhK
...,...,...,...,...,...,...,...
0,https://api.spotify.com/v1/artists/2vwI9jlKSgJ...,2vwI9jlKSgJbne3dlTzaLO,Skids,artist,spotify:artist:2vwI9jlKSgJbne3dlTzaLO,https://open.spotify.com/artist/2vwI9jlKSgJbne...,2QSD3K3b3BJ8DPhGhQfDPW
0,https://api.spotify.com/v1/artists/7xTKLpo7UCz...,7xTKLpo7UCzXSnlH7fOIoM,Redman,artist,spotify:artist:7xTKLpo7UCzXSnlH7fOIoM,https://open.spotify.com/artist/7xTKLpo7UCzXSn...,49XnDVsYOHgV4gFZeCojKj
0,https://api.spotify.com/v1/artists/6nB0iY1cjSY...,6nB0iY1cjSY1KyhYyuIIKH,FKA twigs,artist,spotify:artist:6nB0iY1cjSY1KyhYyuIIKH,https://open.spotify.com/artist/6nB0iY1cjSY1Ky...,5Y9IIH8Xmo1nuk0gfFjc4Q
0,https://api.spotify.com/v1/artists/3LrsctPHK5w...,3LrsctPHK5wMdvEqvFN8BW,The Mighty Lemon Drops,artist,spotify:artist:3LrsctPHK5wMdvEqvFN8BW,https://open.spotify.com/artist/3LrsctPHK5wMdv...,0ya0JYEFoXNviB8RMeHDtW


In [26]:
df_merged = pd.merge(left=tracks,
                    right=artist_df,
                    how='inner',
                    left_on='track.id',
                    right_on='song_id')
df_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,video_thumbnail.url,track,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2020-11-29T15:02:07Z,False,,https://open.spotify.com/user/acclaimedmusic,https://api.spotify.com/v1/users/acclaimedmusic,acclaimedmusic,user,spotify:user:acclaimedmusic,album,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/74ASZWbe4lX...,74ASZWbe4lXaubB36ztrGX,Bob Dylan,artist,spotify:artist:74ASZWbe4lXaubB36ztrGX,https://open.spotify.com/artist/74ASZWbe4lXaub...,3AhXZa8sUQht0UEdBJgpGc
1,2020-11-29T15:02:07Z,False,,https://open.spotify.com/user/acclaimedmusic,https://api.spotify.com/v1/users/acclaimedmusic,acclaimedmusic,user,spotify:user:acclaimedmusic,album,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/6olE6TJLqED...,6olE6TJLqED3rqDCT0FyPh,Nirvana,artist,spotify:artist:6olE6TJLqED3rqDCT0FyPh,https://open.spotify.com/artist/6olE6TJLqED3rq...,3oTlkzk1OtrhH8wBAduVEi
2,2020-11-29T15:02:07Z,False,,https://open.spotify.com/user/acclaimedmusic,https://api.spotify.com/v1/users/acclaimedmusic,acclaimedmusic,user,spotify:user:acclaimedmusic,album,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/3WrFJ7ztbog...,3WrFJ7ztbogyGnTHbHJFl2,The Beatles,artist,spotify:artist:3WrFJ7ztbogyGnTHbHJFl2,https://open.spotify.com/artist/3WrFJ7ztbogyGn...,3ZFBeIyP41HhnALjxWy1pR
3,2020-11-29T15:02:07Z,False,,https://open.spotify.com/user/acclaimedmusic,https://api.spotify.com/v1/users/acclaimedmusic,acclaimedmusic,user,spotify:user:acclaimedmusic,album,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/3oDbviiivRW...,3oDbviiivRWhXwIE8hxkVV,The Beach Boys,artist,spotify:artist:3oDbviiivRWhXwIE8hxkVV,https://open.spotify.com/artist/3oDbviiivRWhXw...,5Qt4Cc66g24QWwGP3YYV9y
4,2020-11-29T15:02:07Z,False,,https://open.spotify.com/user/acclaimedmusic,https://api.spotify.com/v1/users/acclaimedmusic,acclaimedmusic,user,spotify:user:acclaimedmusic,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/293zczrfYaf...,293zczrfYafIItmnmM3coR,Chuck Berry,artist,spotify:artist:293zczrfYafIItmnmM3coR,https://open.spotify.com/artist/293zczrfYafIIt...,7MH2ZclofPlTrZOkPzZKhK


In [27]:
df_final = df_merged[['track.name', 'name', 'song_id']]
df_final

Unnamed: 0,track.name,name,song_id
0,Like a Rolling Stone,Bob Dylan,3AhXZa8sUQht0UEdBJgpGc
1,Smells Like Teen Spirit,Nirvana,3oTlkzk1OtrhH8wBAduVEi
2,A Day In The Life - Remastered,The Beatles,3ZFBeIyP41HhnALjxWy1pR
3,Good Vibrations (Mono),The Beach Boys,5Qt4Cc66g24QWwGP3YYV9y
4,Johnny B Goode,Chuck Berry,7MH2ZclofPlTrZOkPzZKhK
...,...,...,...
13624,Into The Valley,Skids,2QSD3K3b3BJ8DPhGhQfDPW
13625,Tonight's Da Night,Redman,49XnDVsYOHgV4gFZeCojKj
13626,Figure 8,FKA twigs,5Y9IIH8Xmo1nuk0gfFjc4Q
13627,Like An Angel,The Mighty Lemon Drops,0ya0JYEFoXNviB8RMeHDtW


In [72]:
df_final.to_csv('df_final.csv', index=False)
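
The same three columns can also be produced without intermediate DataFrames. The sketch below is an alternative, not the method used in this notebook: it walks the raw playlist items and emits one row per (track, artist) pair, which is why the final table has more rows than the playlist has tracks. The function name and the sample items are made up; only the key names follow the structure shown above.

```python
def flatten_track_artists(items):
    """Turn playlist items into one row per (track, artist) pair."""
    rows = []
    for item in items:
        track = item.get("track") or {}            # items without a track are skipped
        for artist in track.get("artists") or []:  # so are tracks without artists
            rows.append({
                "track.name": track.get("name"),
                "name": artist.get("name"),
                "song_id": track.get("id"),
            })
    return rows

# Hand-written items that mimic the structure of the API response
sample_items = [
    {"track": {"id": "t1", "name": "Song One",
               "artists": [{"name": "Artist A"}, {"name": "Artist B"}]}},
    {"track": None},
]
print(flatten_track_artists(sample_items))
```

Passing the result to `pd.DataFrame(flatten_track_artists(all_tracks))` should give a table with the same columns as `df_final`. Because no merge is involved, a track that appears twice in the playlist is not multiplied further, so row counts may differ slightly.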

### Playlists | Playlist #2

In [28]:
playlist = sp.user_playlist_tracks("spotify", "5xqpyfZyS1DVysoevdVyEn")

In [29]:
playlist["total"] 

10991

In [30]:
len(playlist["items"])

100

In [31]:
all_tracks = get_playlist_tracks("5xqpyfZyS1DVysoevdVyEn")
len(all_tracks)

10991

In [32]:
tracks = json_normalize(all_tracks)

In [33]:
tracks['track.artists'][0]

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/1xU878Z1QtBldR7ru9owdU'},
  'href': 'https://api.spotify.com/v1/artists/1xU878Z1QtBldR7ru9owdU',
  'id': '1xU878Z1QtBldR7ru9owdU',
  'name': 'Phoenix',
  'type': 'artist',
  'uri': 'spotify:artist:1xU878Z1QtBldR7ru9owdU'}]

In [34]:
tracks['track.id'][0]

'3AA8xNhDC0MpqwkGX3EP5V'

In [35]:
tracks['artists_dfs'] = tracks.apply(expand_list_dict2, axis=1)
tracks['artists_dfs'][0]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/1xU878Z1QtB...,1xU878Z1QtBldR7ru9owdU,Phoenix,artist,spotify:artist:1xU878Z1QtBldR7ru9owdU,https://open.spotify.com/artist/1xU878Z1QtBldR...,3AA8xNhDC0MpqwkGX3EP5V


In [36]:
artist_template = pd.DataFrame(columns=['href', 'id', 'name', 'type', 'uri'])
mini_dfs = [df for df in tracks['artists_dfs'] if isinstance(df, pd.DataFrame)]
artist_df = pd.concat([artist_template] + mini_dfs, axis=0)  # single concat instead of a loop
artist_df.head()

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/1xU878Z1QtB...,1xU878Z1QtBldR7ru9owdU,Phoenix,artist,spotify:artist:1xU878Z1QtBldR7ru9owdU,https://open.spotify.com/artist/1xU878Z1QtBldR...,3AA8xNhDC0MpqwkGX3EP5V
0,https://api.spotify.com/v1/artists/2oWyxHNcxMc...,2oWyxHNcxMcrvNIHMX2apR,Radio 4,artist,spotify:artist:2oWyxHNcxMcrvNIHMX2apR,https://open.spotify.com/artist/2oWyxHNcxMcrvN...,2i8lRhmZNKaM1ypHwRybuD
0,https://api.spotify.com/v1/artists/2UyhhfLOvfL...,2UyhhfLOvfLs7ZhzsAaNC3,Mount Sims,artist,spotify:artist:2UyhhfLOvfLs7ZhzsAaNC3,https://open.spotify.com/artist/2UyhhfLOvfLs7Z...,4flxgPC0426CEeSrzQIic0
0,https://api.spotify.com/v1/artists/1gkSl4XpHIH...,1gkSl4XpHIHI4I1WQbfXOE,Peaches,artist,spotify:artist:1gkSl4XpHIHI4I1WQbfXOE,https://open.spotify.com/artist/1gkSl4XpHIHI4I...,1XHFob24QklIXtLRopKirJ
0,https://api.spotify.com/v1/artists/3GpAL7oEFD3...,3GpAL7oEFD37IJDOOiirqy,Zongamin,artist,spotify:artist:3GpAL7oEFD37IJDOOiirqy,https://open.spotify.com/artist/3GpAL7oEFD37IJ...,4JB847zlgViLq8tJIzRsZy


In [37]:
df_merged = pd.merge(left=tracks,
                    right=artist_df,
                    how='inner',
                    left_on='track.id',
                    right_on='song_id')
df_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.uri,video_thumbnail.url,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2016-04-13T17:09:41Z,False,,https://open.spotify.com/user/secretagentfrog,https://api.spotify.com/v1/users/secretagentfrog,secretagentfrog,user,spotify:user:secretagentfrog,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:3AA8xNhDC0MpqwkGX3EP5V,,...,https://api.spotify.com/v1/artists/1xU878Z1QtB...,1xU878Z1QtBldR7ru9owdU,Phoenix,artist,spotify:artist:1xU878Z1QtBldR7ru9owdU,https://open.spotify.com/artist/1xU878Z1QtBldR...,3AA8xNhDC0MpqwkGX3EP5V
1,2016-04-13T17:09:41Z,False,,https://open.spotify.com/user/secretagentfrog,https://api.spotify.com/v1/users/secretagentfrog,secretagentfrog,user,spotify:user:secretagentfrog,single,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:2i8lRhmZNKaM1ypHwRybuD,,...,https://api.spotify.com/v1/artists/2oWyxHNcxMc...,2oWyxHNcxMcrvNIHMX2apR,Radio 4,artist,spotify:artist:2oWyxHNcxMcrvNIHMX2apR,https://open.spotify.com/artist/2oWyxHNcxMcrvN...,2i8lRhmZNKaM1ypHwRybuD
2,2016-04-13T17:09:41Z,False,,https://open.spotify.com/user/secretagentfrog,https://api.spotify.com/v1/users/secretagentfrog,secretagentfrog,user,spotify:user:secretagentfrog,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:4flxgPC0426CEeSrzQIic0,,...,https://api.spotify.com/v1/artists/2UyhhfLOvfL...,2UyhhfLOvfLs7ZhzsAaNC3,Mount Sims,artist,spotify:artist:2UyhhfLOvfLs7ZhzsAaNC3,https://open.spotify.com/artist/2UyhhfLOvfLs7Z...,4flxgPC0426CEeSrzQIic0
3,2016-04-13T17:09:41Z,False,,https://open.spotify.com/user/secretagentfrog,https://api.spotify.com/v1/users/secretagentfrog,secretagentfrog,user,spotify:user:secretagentfrog,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1XHFob24QklIXtLRopKirJ,,...,https://api.spotify.com/v1/artists/1gkSl4XpHIH...,1gkSl4XpHIHI4I1WQbfXOE,Peaches,artist,spotify:artist:1gkSl4XpHIHI4I1WQbfXOE,https://open.spotify.com/artist/1gkSl4XpHIHI4I...,1XHFob24QklIXtLRopKirJ
4,2016-04-13T17:09:41Z,False,,https://open.spotify.com/user/secretagentfrog,https://api.spotify.com/v1/users/secretagentfrog,secretagentfrog,user,spotify:user:secretagentfrog,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:4JB847zlgViLq8tJIzRsZy,,...,https://api.spotify.com/v1/artists/3GpAL7oEFD3...,3GpAL7oEFD37IJDOOiirqy,Zongamin,artist,spotify:artist:3GpAL7oEFD37IJDOOiirqy,https://open.spotify.com/artist/3GpAL7oEFD37IJ...,4JB847zlgViLq8tJIzRsZy


In [38]:
df_final2 = df_merged[['track.name', 'name', 'song_id']]
df_final2

Unnamed: 0,track.name,name,song_id
0,If I Ever Feel Better,Phoenix,3AA8xNhDC0MpqwkGX3EP5V
1,Dance To The Underground,Radio 4,2i8lRhmZNKaM1ypHwRybuD
2,How We Do,Mount Sims,4flxgPC0426CEeSrzQIic0
3,Fuck the Pain Away,Peaches,1XHFob24QklIXtLRopKirJ
4,Street Surgery 2,Zongamin,4JB847zlgViLq8tJIzRsZy
...,...,...,...
12676,Hate Is an Attractive Force,SLEEP RADIO,1v6vHkDHBjm4OHUzeOKber
12677,Promises,Dorian Concept,5TN8n4LQS8Or2WEnU56pXs
12678,16 Beat,Metronomy,6MEzr1cmAJi3FXBjXVXMnm
12679,Amplified In The Silence,Manchester Orchestra,6ovSjbBKatyx9KvWsxkAZo


In [73]:
df_final2.to_csv('df_final2.csv', index=False)

### Playlists | Playlist #3

In [39]:
playlist = sp.user_playlist_tracks("spotify", "1YL4XoegERoragv0RK2RC9")

In [40]:
playlist["total"] 

9994

In [41]:
len(playlist["items"])

100

In [42]:
all_tracks = get_playlist_tracks("1YL4XoegERoragv0RK2RC9")
len(all_tracks)

9994

In [43]:
tracks = json_normalize(all_tracks)

In [44]:
tracks['track.artists'][0]

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/4qwGe91Bz9K2T8jXTZ815W'},
  'href': 'https://api.spotify.com/v1/artists/4qwGe91Bz9K2T8jXTZ815W',
  'id': '4qwGe91Bz9K2T8jXTZ815W',
  'name': 'Janet Jackson',
  'type': 'artist',
  'uri': 'spotify:artist:4qwGe91Bz9K2T8jXTZ815W'}]

In [45]:
tracks['track.id'][0]

'1IQu8MIytsL3io4PvubwtW'

In [46]:
tracks['artists_dfs'] = tracks.apply(expand_list_dict2, axis=1)
tracks['artists_dfs'][0]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/4qwGe91Bz9K...,4qwGe91Bz9K2T8jXTZ815W,Janet Jackson,artist,spotify:artist:4qwGe91Bz9K2T8jXTZ815W,https://open.spotify.com/artist/4qwGe91Bz9K2T8...,1IQu8MIytsL3io4PvubwtW


In [47]:
artist_template = pd.DataFrame(columns=['href', 'id', 'name', 'type', 'uri'])
mini_dfs = [df for df in tracks['artists_dfs'] if isinstance(df, pd.DataFrame)]
artist_df = pd.concat([artist_template] + mini_dfs, axis=0)  # single concat instead of a loop
artist_df.head()

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/4qwGe91Bz9K...,4qwGe91Bz9K2T8jXTZ815W,Janet Jackson,artist,spotify:artist:4qwGe91Bz9K2T8jXTZ815W,https://open.spotify.com/artist/4qwGe91Bz9K2T8...,1IQu8MIytsL3io4PvubwtW
0,https://api.spotify.com/v1/artists/2HcwFjNelS4...,2HcwFjNelS49kFbfvMxQYw,Robbie Williams,artist,spotify:artist:2HcwFjNelS49kFbfvMxQYw,https://open.spotify.com/artist/2HcwFjNelS49kF...,1YW369EbVyjpeLE3YbsjKQ
0,https://api.spotify.com/v1/artists/7pUbbv62ajr...,7pUbbv62ajr1JDVFaftZJT,Marcia Hines,artist,spotify:artist:7pUbbv62ajr1JDVFaftZJT,https://open.spotify.com/artist/7pUbbv62ajr1JD...,5r4a8aHraKhaK1u7OKvHwW
0,https://api.spotify.com/v1/artists/75L9s8KVrhC...,75L9s8KVrhCNtBUkZFnDFW,Vanessa Williams,artist,spotify:artist:75L9s8KVrhCNtBUkZFnDFW,https://open.spotify.com/artist/75L9s8KVrhCNtB...,4qjrCkcVbsYlitCqbBkeKe
0,https://api.spotify.com/v1/artists/74XFHRwlV6O...,74XFHRwlV6OrjEM0A2NCMF,Paramore,artist,spotify:artist:74XFHRwlV6OrjEM0A2NCMF,https://open.spotify.com/artist/74XFHRwlV6OrjE...,5u8F359esydVGN4thHmSRz


In [48]:
df_merged = pd.merge(left=tracks,
                    right=artist_df,
                    how='inner',
                    left_on='track.id',
                    right_on='song_id')
df_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,video_thumbnail.url,track,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2021-08-08T09:26:31Z,False,,https://open.spotify.com/user/bradnumber1,https://api.spotify.com/v1/users/bradnumber1,bradnumber1,user,spotify:user:bradnumber1,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/4qwGe91Bz9K...,4qwGe91Bz9K2T8jXTZ815W,Janet Jackson,artist,spotify:artist:4qwGe91Bz9K2T8jXTZ815W,https://open.spotify.com/artist/4qwGe91Bz9K2T8...,1IQu8MIytsL3io4PvubwtW
1,2021-08-08T09:26:31Z,False,,https://open.spotify.com/user/bradnumber1,https://api.spotify.com/v1/users/bradnumber1,bradnumber1,user,spotify:user:bradnumber1,album,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/2HcwFjNelS4...,2HcwFjNelS49kFbfvMxQYw,Robbie Williams,artist,spotify:artist:2HcwFjNelS49kFbfvMxQYw,https://open.spotify.com/artist/2HcwFjNelS49kF...,1YW369EbVyjpeLE3YbsjKQ
2,2021-08-08T09:26:31Z,False,,https://open.spotify.com/user/bradnumber1,https://api.spotify.com/v1/users/bradnumber1,bradnumber1,user,spotify:user:bradnumber1,single,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/7pUbbv62ajr...,7pUbbv62ajr1JDVFaftZJT,Marcia Hines,artist,spotify:artist:7pUbbv62ajr1JDVFaftZJT,https://open.spotify.com/artist/7pUbbv62ajr1JD...,5r4a8aHraKhaK1u7OKvHwW
3,2021-08-08T09:26:31Z,False,,https://open.spotify.com/user/bradnumber1,https://api.spotify.com/v1/users/bradnumber1,bradnumber1,user,spotify:user:bradnumber1,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/75L9s8KVrhC...,75L9s8KVrhCNtBUkZFnDFW,Vanessa Williams,artist,spotify:artist:75L9s8KVrhCNtBUkZFnDFW,https://open.spotify.com/artist/75L9s8KVrhCNtB...,4qjrCkcVbsYlitCqbBkeKe
4,2021-08-08T09:26:31Z,False,,https://open.spotify.com/user/bradnumber1,https://api.spotify.com/v1/users/bradnumber1,bradnumber1,user,spotify:user:bradnumber1,album,[{'external_urls': {'spotify': 'https://open.s...,...,,,...,https://api.spotify.com/v1/artists/74XFHRwlV6O...,74XFHRwlV6OrjEM0A2NCMF,Paramore,artist,spotify:artist:74XFHRwlV6OrjEM0A2NCMF,https://open.spotify.com/artist/74XFHRwlV6OrjE...,5u8F359esydVGN4thHmSRz


In [49]:
df_final3 = df_merged[['track.name', 'name', 'song_id']]
df_final3

Unnamed: 0,track.name,name,song_id
0,Runaway,Janet Jackson,1IQu8MIytsL3io4PvubwtW
1,Love My Life,Robbie Williams,1YW369EbVyjpeLE3YbsjKQ
2,You - Remastered,Marcia Hines,5r4a8aHraKhaK1u7OKvHwW
3,Colors Of The Wind - End Title,Vanessa Williams,4qjrCkcVbsYlitCqbBkeKe
4,Careful,Paramore,5u8F359esydVGN4thHmSRz
...,...,...,...
12153,Good as Hell (feat. Ariana Grande) - Remix,Lizzo,07Oz5StQ7GRoygNLaXs2pd
12154,Good as Hell (feat. Ariana Grande) - Remix,Ariana Grande,07Oz5StQ7GRoygNLaXs2pd
12155,Who's Laughing Now,Ava Max,73h4oe03sZy8bXfQLfnqMv
12156,Stop Callin' Me,Shakaya,0uaB8avevQNDd32qEqh6mx


In [74]:
df_final3.to_csv('df_final3.csv', index=False)

### Playlists | Playlist #4

In [93]:
playlist = sp.user_playlist_tracks("spotify", "6sx8rpqIE3R9sfGSb8Udxy")

In [94]:
playlist["total"] 

9999

In [95]:
len(playlist["items"])

100

In [96]:
all_tracks = get_playlist_tracks("6sx8rpqIE3R9sfGSb8Udxy")
len(all_tracks)

9999

In [101]:
tracks = json_normalize(all_tracks)

In [102]:
tracks['track.artists'][0]

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/6i6WlGzQtXtz7GcC5H5st5'},
  'href': 'https://api.spotify.com/v1/artists/6i6WlGzQtXtz7GcC5H5st5',
  'id': '6i6WlGzQtXtz7GcC5H5st5',
  'name': '10cc',
  'type': 'artist',
  'uri': 'spotify:artist:6i6WlGzQtXtz7GcC5H5st5'}]

In [103]:
tracks['track.id'][0]

'3wSaVY4rEIAwl5RcHfSUrz'

In [104]:
tracks['artists_dfs'] = tracks.apply(expand_list_dict2, axis=1)
tracks['artists_dfs'][0]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,3wSaVY4rEIAwl5RcHfSUrz


In [105]:
artist_template = pd.DataFrame(columns=['href', 'id', 'name', 'type', 'uri'])
mini_dfs = [df for df in tracks['artists_dfs'] if isinstance(df, pd.DataFrame)]
artist_df = pd.concat([artist_template] + mini_dfs, axis=0)  # single concat instead of a loop
artist_df.head()

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,3wSaVY4rEIAwl5RcHfSUrz
0,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,1LOZMYF5s8qhW7Rv4w2gun
0,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,15vb7UNhJ5c5K4exsUk3B3
0,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,1fMGRxKRtIKNyaMMGrzInM
0,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,4mmWv18Tz155oy6x7PEe3q


In [106]:
df_merged = pd.merge(left=tracks,
                    right=artist_df,
                    how='inner',
                    left_on='track.id',
                    right_on='song_id')
df_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.uri,video_thumbnail.url,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2022-12-27T12:42:24Z,False,,https://open.spotify.com/user/paterbobo,https://api.spotify.com/v1/users/paterbobo,paterbobo,user,spotify:user:paterbobo,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:3wSaVY4rEIAwl5RcHfSUrz,,...,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,3wSaVY4rEIAwl5RcHfSUrz
1,2022-12-27T12:42:36Z,False,,https://open.spotify.com/user/paterbobo,https://api.spotify.com/v1/users/paterbobo,paterbobo,user,spotify:user:paterbobo,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1LOZMYF5s8qhW7Rv4w2gun,,...,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,1LOZMYF5s8qhW7Rv4w2gun
2,2022-12-27T19:08:09Z,False,,https://open.spotify.com/user/paterbobo,https://api.spotify.com/v1/users/paterbobo,paterbobo,user,spotify:user:paterbobo,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:15vb7UNhJ5c5K4exsUk3B3,,...,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,15vb7UNhJ5c5K4exsUk3B3
3,2022-12-27T19:08:21Z,False,,https://open.spotify.com/user/paterbobo,https://api.spotify.com/v1/users/paterbobo,paterbobo,user,spotify:user:paterbobo,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1fMGRxKRtIKNyaMMGrzInM,,...,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,1fMGRxKRtIKNyaMMGrzInM
4,2022-12-27T19:08:38Z,False,,https://open.spotify.com/user/paterbobo,https://api.spotify.com/v1/users/paterbobo,paterbobo,user,spotify:user:paterbobo,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:4mmWv18Tz155oy6x7PEe3q,,...,https://api.spotify.com/v1/artists/6i6WlGzQtXt...,6i6WlGzQtXtz7GcC5H5st5,10cc,artist,spotify:artist:6i6WlGzQtXtz7GcC5H5st5,https://open.spotify.com/artist/6i6WlGzQtXtz7G...,4mmWv18Tz155oy6x7PEe3q


In [107]:
df_final4 = df_merged[['track.name', 'name', 'song_id']]
df_final4

Unnamed: 0,track.name,name,song_id
0,Donna,10cc,3wSaVY4rEIAwl5RcHfSUrz
1,Dreadlock Holiday,10cc,1LOZMYF5s8qhW7Rv4w2gun
2,Oomachasaooma (Feel The Love),10cc,15vb7UNhJ5c5K4exsUk3B3
3,Good Morning Judge,10cc,1fMGRxKRtIKNyaMMGrzInM
4,I'm Mandy Fly Me,10cc,4mmWv18Tz155oy6x7PEe3q
...,...,...,...
9646,Blue feat. Ilse de Lange,Zucchero,583qzYY2ARvSDchyq8PkTF
9647,Blue feat. Ilse de Lange,Ilse DeLange,583qzYY2ARvSDchyq8PkTF
9648,"All For Love - From ""The Three Musketeers"" Sou...",Bryan Adams,3sDWjYgYzAPUcwzuo6OGs8
9649,"All For Love - From ""The Three Musketeers"" Sou...",Sting,3sDWjYgYzAPUcwzuo6OGs8


In [108]:
df_final4.to_csv('df_final4.csv', index=False)

### Playlists | Playlist #5

In [10]:
playlist = sp.user_playlist_tracks("spotify", "32twOqGf8gIswTgzG3IKxP")

In [11]:
playlist["total"] 

9985

In [12]:
# playlist['items']

In [13]:
# Each page is limited to 100 tracks, so we will have to page through the results:
len(playlist["items"])

100

In [14]:
# we could use the url to the next page which is provided...
playlist['next']

'https://api.spotify.com/v1/playlists/32twOqGf8gIswTgzG3IKxP/tracks?offset=100&limit=100&additional_types=track'

In [15]:
from random import randint
from time import sleep

def get_playlist_tracks(playlist_id):
    """Fetch every item of a playlist by following the paginated results."""
    results = sp.user_playlist_tracks("spotify", playlist_id)
    tracks = results['items']
    while results['next'] is not None:
        results = sp.next(results)
        tracks.extend(results['items'])
        sleep(randint(1, 3000) / 1000)  # respectful nap of up to 3 seconds between requests
    return tracks

In [16]:
# this will take roughly num_pages_in_playlist * (avg_sleep_time + processing_time) = 100 * (1.5 + 0.1) = 160 seconds
all_tracks = get_playlist_tracks("32twOqGf8gIswTgzG3IKxP")
len(all_tracks)

9985

In [17]:
import pandas as pd
from pandas import json_normalize
tracks = json_normalize(all_tracks)

In [18]:
tracks.columns

Index(['added_at', 'is_local', 'primary_color',
       'added_by.external_urls.spotify', 'added_by.href', 'added_by.id',
       'added_by.type', 'added_by.uri', 'track.album.album_type',
       'track.album.artists', 'track.album.available_markets',
       'track.album.external_urls.spotify', 'track.album.href',
       'track.album.id', 'track.album.images', 'track.album.name',
       'track.album.release_date', 'track.album.release_date_precision',
       'track.album.total_tracks', 'track.album.type', 'track.album.uri',
       'track.artists', 'track.available_markets', 'track.disc_number',
       'track.duration_ms', 'track.episode', 'track.explicit',
       'track.external_ids.isrc', 'track.external_urls.spotify', 'track.href',
       'track.id', 'track.is_local', 'track.name', 'track.popularity',
       'track.preview_url', 'track.track', 'track.track_number', 'track.type',
       'track.uri', 'video_thumbnail.url'],
      dtype='object')

In [19]:
tracks['track.artists'][0]

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/0rvjqX7ttXeg3mTy8Xscbt'},
  'href': 'https://api.spotify.com/v1/artists/0rvjqX7ttXeg3mTy8Xscbt',
  'id': '0rvjqX7ttXeg3mTy8Xscbt',
  'name': 'Journey',
  'type': 'artist',
  'uri': 'spotify:artist:0rvjqX7ttXeg3mTy8Xscbt'}]

In [20]:
tracks['track.id'][0]

'77NNZQSqzLNqh2A9JhLRkg'

In [21]:
# Expand one track's list of artist dicts into a small DataFrame tagged with the track id
def expand_list_dict2(row):
    artists_data = row['track.artists']
    if isinstance(artists_data, list):  # Check if it's a list
        artists_data = [artist for artist in artists_data if isinstance(artist, dict)]  # Filter out non-dictionaries
        if artists_data:  # Check if there's any artist data present
            df = json_normalize(artists_data)
            df['song_id'] = row['track.id']
            return df
    return None  # Return None if it's not a list or has no valid artist data

In [22]:
tracks['artists_dfs'] = tracks.apply(expand_list_dict2, axis=1)
tracks['artists_dfs'][0]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/0rvjqX7ttXe...,0rvjqX7ttXeg3mTy8Xscbt,Journey,artist,spotify:artist:0rvjqX7ttXeg3mTy8Xscbt,https://open.spotify.com/artist/0rvjqX7ttXeg3m...,77NNZQSqzLNqh2A9JhLRkg


In [23]:
# now we create a new dataframe with all these artists
# (concatenate once instead of growing a DataFrame inside a loop; rows without artist data are skipped)
artist_df = pd.concat([mini_df for mini_df in tracks['artists_dfs'] if mini_df is not None], axis=0)
artist_df

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/0rvjqX7ttXe...,0rvjqX7ttXeg3mTy8Xscbt,Journey,artist,spotify:artist:0rvjqX7ttXeg3mTy8Xscbt,https://open.spotify.com/artist/0rvjqX7ttXeg3m...,77NNZQSqzLNqh2A9JhLRkg
0,https://api.spotify.com/v1/artists/4gzpq5DPGxS...,4gzpq5DPGxSnKTe4SA8HAU,Coldplay,artist,spotify:artist:4gzpq5DPGxSnKTe4SA8HAU,https://open.spotify.com/artist/4gzpq5DPGxSnKT...,7LVHVU3tWfcxj5aiPFEW4Q
0,https://api.spotify.com/v1/artists/4gzpq5DPGxS...,4gzpq5DPGxSnKTe4SA8HAU,Coldplay,artist,spotify:artist:4gzpq5DPGxSnKTe4SA8HAU,https://open.spotify.com/artist/4gzpq5DPGxSnKT...,75JFxkI2RXiU7L9VXzMkle
0,https://api.spotify.com/v1/artists/3qm84nBOXUE...,3qm84nBOXUEQ2vnTfUTTFC,Guns N' Roses,artist,spotify:artist:3qm84nBOXUEQ2vnTfUTTFC,https://open.spotify.com/artist/3qm84nBOXUEQ2v...,1kDYCn9VIDmOIbLZ2Xc7OO
0,https://api.spotify.com/v1/artists/4gzpq5DPGxS...,4gzpq5DPGxSnKTe4SA8HAU,Coldplay,artist,spotify:artist:4gzpq5DPGxSnKTe4SA8HAU,https://open.spotify.com/artist/4gzpq5DPGxSnKT...,1mea3bSkSGXuIRvnydlB5b
...,...,...,...,...,...,...,...
0,https://api.spotify.com/v1/artists/0YrtvWJMgSd...,0YrtvWJMgSdVrk3SfNjTbx,Death Cab for Cutie,artist,spotify:artist:0YrtvWJMgSdVrk3SfNjTbx,https://open.spotify.com/artist/0YrtvWJMgSdVrk...,5vFKQF10Jhyysg3JUbfBUd
0,https://api.spotify.com/v1/artists/5lPsVvHVDr6...,5lPsVvHVDr6R5mDxRUXdOs,A1,artist,spotify:artist:5lPsVvHVDr6R5mDxRUXdOs,https://open.spotify.com/artist/5lPsVvHVDr6R5m...,6IVJFZI4fejlJ9XTa3CG5G
0,https://api.spotify.com/v1/artists/04gDigrS5kc...,04gDigrS5kc9YWfZHwBETP,Maroon 5,artist,spotify:artist:04gDigrS5kc9YWfZHwBETP,https://open.spotify.com/artist/04gDigrS5kc9YW...,0Yb0L5qlyMDyAZor9d57iS
0,https://api.spotify.com/v1/artists/1RJkmcXtvDd...,1RJkmcXtvDdyhfn2PbrZzQ,Egotrippi,artist,spotify:artist:1RJkmcXtvDdyhfn2PbrZzQ,https://open.spotify.com/artist/1RJkmcXtvDdyhf...,5esJb63Nbi1LeG0v4NyKTt
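
A possible alternative to building one small DataFrame per row is `DataFrame.explode`, which produces one row per (track, artist) pair directly. This is a sketch on two made-up rows, not the notebook's data, and it assumes every track has at least one artist dict (rows with empty or missing artist lists would need to be filtered out first).

```python
import pandas as pd

# Two made-up rows shaped like the normalized playlist items above.
sample = pd.DataFrame({
    "track.id": ["t1", "t2"],
    "track.name": ["Song A", "Song B"],
    "track.artists": [
        [{"id": "a1", "name": "Artist One"}],
        [{"id": "a1", "name": "Artist One"}, {"id": "a2", "name": "Artist Two"}],
    ],
})

# One row per (track, artist) pair; the artist is still a dict at this point.
exploded = sample.explode("track.artists", ignore_index=True)

# Turn each artist dict into columns and attach them to the track columns.
artist_cols = pd.json_normalize(exploded["track.artists"].tolist()).add_prefix("artist.")
flat = pd.concat([exploded.drop(columns="track.artists"), artist_cols], axis=1)
print(flat)
```

Because the track columns stay attached to each artist row, this route would not need the separate merge on `song_id`.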


In [24]:
df_merged = pd.merge(left=tracks,
                    right=artist_df,
                    how='inner',
                    left_on='track.id',
                    right_on='song_id')
df_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.uri,video_thumbnail.url,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2010-11-14T19:56:54Z,False,,https://open.spotify.com/user/,https://api.spotify.com/v1/users/,,user,spotify:user:,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:77NNZQSqzLNqh2A9JhLRkg,,...,https://api.spotify.com/v1/artists/0rvjqX7ttXe...,0rvjqX7ttXeg3mTy8Xscbt,Journey,artist,spotify:artist:0rvjqX7ttXeg3mTy8Xscbt,https://open.spotify.com/artist/0rvjqX7ttXeg3m...,77NNZQSqzLNqh2A9JhLRkg
1,2010-11-14T19:56:54Z,False,,https://open.spotify.com/user/,https://api.spotify.com/v1/users/,,user,spotify:user:,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:7LVHVU3tWfcxj5aiPFEW4Q,,...,https://api.spotify.com/v1/artists/4gzpq5DPGxS...,4gzpq5DPGxSnKTe4SA8HAU,Coldplay,artist,spotify:artist:4gzpq5DPGxSnKTe4SA8HAU,https://open.spotify.com/artist/4gzpq5DPGxSnKT...,7LVHVU3tWfcxj5aiPFEW4Q
2,2010-11-14T19:56:54Z,False,,https://open.spotify.com/user/,https://api.spotify.com/v1/users/,,user,spotify:user:,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:75JFxkI2RXiU7L9VXzMkle,,...,https://api.spotify.com/v1/artists/4gzpq5DPGxS...,4gzpq5DPGxSnKTe4SA8HAU,Coldplay,artist,spotify:artist:4gzpq5DPGxSnKTe4SA8HAU,https://open.spotify.com/artist/4gzpq5DPGxSnKT...,75JFxkI2RXiU7L9VXzMkle
3,2010-11-14T19:56:54Z,False,,https://open.spotify.com/user/,https://api.spotify.com/v1/users/,,user,spotify:user:,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1kDYCn9VIDmOIbLZ2Xc7OO,,...,https://api.spotify.com/v1/artists/3qm84nBOXUE...,3qm84nBOXUEQ2vnTfUTTFC,Guns N' Roses,artist,spotify:artist:3qm84nBOXUEQ2vnTfUTTFC,https://open.spotify.com/artist/3qm84nBOXUEQ2v...,1kDYCn9VIDmOIbLZ2Xc7OO
4,2010-11-14T19:56:54Z,False,,https://open.spotify.com/user/,https://api.spotify.com/v1/users/,,user,spotify:user:,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1mea3bSkSGXuIRvnydlB5b,,...,https://api.spotify.com/v1/artists/4gzpq5DPGxS...,4gzpq5DPGxSnKTe4SA8HAU,Coldplay,artist,spotify:artist:4gzpq5DPGxSnKTe4SA8HAU,https://open.spotify.com/artist/4gzpq5DPGxSnKT...,1mea3bSkSGXuIRvnydlB5b


In [25]:
df_final5 = df_merged[['track.name', 'name', 'song_id']]
df_final5

Unnamed: 0,track.name,name,song_id
0,Don't Stop Believin',Journey,77NNZQSqzLNqh2A9JhLRkg
1,Fix You,Coldplay,7LVHVU3tWfcxj5aiPFEW4Q
2,The Scientist,Coldplay,75JFxkI2RXiU7L9VXzMkle
3,Sweet Child O' Mine,Guns N' Roses,1kDYCn9VIDmOIbLZ2Xc7OO
4,Viva La Vida,Coldplay,1mea3bSkSGXuIRvnydlB5b
...,...,...,...
10544,Idiot Prayer,Nick Cave & The Bad Seeds,2UGFmJOznwWfqjlCeswzoR
10545,The Human Stain,Kamelot,5skbmePF1sdCRQGchir7wi
10546,Dance On A Volcano - 2007 - Remaster,Genesis,3ZSsvthF4bBHngSe90ZHTG
10547,Jimmy Jazz,The Clash,6hDiRvumjEujaRqiV6E83s


In [26]:
df_final5.to_csv('df_final5.csv', index=False)

### Playlists | Concatenate all playlists

In [38]:
file_path = 'df_final.csv'
df_final = pd.read_csv(file_path)
df_final.shape

(13629, 3)

In [39]:
file_path2 = 'df_final2.csv'
df_final2 = pd.read_csv(file_path2)
df_final2.shape

(12681, 3)

In [40]:
file_path3 = 'df_final3.csv'
df_final3 = pd.read_csv(file_path3)
df_final3.shape

(12158, 3)

In [41]:
file_path4 = 'df_final4.csv'
df_final4 = pd.read_csv(file_path4)
df_final4.shape

(9651, 3)

In [42]:
file_path5 = 'df_final5.csv'
df_final5 = pd.read_csv(file_path5)
df_final5.shape

(10549, 3)

In [44]:
dfs_to_concat = [df_final, df_final2, df_final3, df_final4, df_final5]
result_df = pd.concat(dfs_to_concat, ignore_index=True)
print(result_df.head())
result_df.to_csv('concatenated_output.csv', index=False)

                       track.name            name                 song_id
0            Like a Rolling Stone       Bob Dylan  3AhXZa8sUQht0UEdBJgpGc
1         Smells Like Teen Spirit         Nirvana  3oTlkzk1OtrhH8wBAduVEi
2  A Day In The Life - Remastered     The Beatles  3ZFBeIyP41HhnALjxWy1pR
3          Good Vibrations (Mono)  The Beach Boys  5Qt4Cc66g24QWwGP3YYV9y
4                  Johnny B Goode     Chuck Berry  7MH2ZclofPlTrZOkPzZKhK


In [45]:
result_df.shape

(58668, 3)
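
The five near-identical read cells above could also be written as one loop over file paths. The sketch below uses a temporary folder and two tiny stand-in files; the `part*.csv` names and their contents are hypothetical, chosen only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Write two tiny stand-in files (hypothetical names and contents).
    pd.DataFrame(
        {"track.name": ["A"], "name": ["X"], "song_id": ["id1"]}
    ).to_csv(folder / "part1.csv", index=False)
    pd.DataFrame(
        {"track.name": ["B", "C"], "name": ["Y", "Z"], "song_id": ["id2", "id3"]}
    ).to_csv(folder / "part2.csv", index=False)

    # Read every matching file and stack them in a single concat call.
    paths = sorted(folder.glob("part*.csv"))
    combined = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

print(combined.shape)
```

With the notebook's files, the glob pattern would have to match `df_final.csv` through `df_final5.csv` and nothing else in the folder.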

### Playlists | Clean Duplicates

In [46]:
result_df_no_duplicates = result_df.drop_duplicates()
print(result_df_no_duplicates.head())
result_df_no_duplicates.to_csv('output_no_duplicates.csv', index=False)
result_df_no_duplicates.shape

                       track.name            name                 song_id
0            Like a Rolling Stone       Bob Dylan  3AhXZa8sUQht0UEdBJgpGc
1         Smells Like Teen Spirit         Nirvana  3oTlkzk1OtrhH8wBAduVEi
2  A Day In The Life - Remastered     The Beatles  3ZFBeIyP41HhnALjxWy1pR
3          Good Vibrations (Mono)  The Beach Boys  5Qt4Cc66g24QWwGP3YYV9y
4                  Johnny B Goode     Chuck Berry  7MH2ZclofPlTrZOkPzZKhK


(52773, 3)

### Playlists | Clean Nulls

In [47]:
result_df_no_nulls = result_df_no_duplicates.dropna()
result_df_no_nulls.shape

(50609, 3)
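
One thing worth checking at this point: the frame holds one row per (track, artist) pair, so `drop_duplicates()` on whole rows can still leave the same `song_id` several times. The made-up example below shows the difference between de-duplicating full rows and de-duplicating on `song_id` only; which one is appropriate depends on whether the artist rows are still needed later.

```python
import pandas as pd

# Made-up rows: "t2" has two artists, and the first row is repeated verbatim.
songs = pd.DataFrame({
    "track.name": ["Song A", "Song B", "Song B", "Song A"],
    "name": ["Artist One", "Artist One", "Artist Two", "Artist One"],
    "song_id": ["t1", "t2", "t2", "t1"],
})

full_row_dedup = songs.drop_duplicates()               # removes only exact copies
by_id_dedup = songs.drop_duplicates(subset="song_id")  # keeps the first row per track

print(len(full_row_dedup), len(by_id_dedup))
```

Requesting audio features for the unique ids only would avoid asking the API for the same track more than once.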

### Audio features

In [54]:
df_final=result_df_no_nulls
df_final.head()

Unnamed: 0,track.name,name,song_id
0,Like a Rolling Stone,Bob Dylan,3AhXZa8sUQht0UEdBJgpGc
1,Smells Like Teen Spirit,Nirvana,3oTlkzk1OtrhH8wBAduVEi
2,A Day In The Life - Remastered,The Beatles,3ZFBeIyP41HhnALjxWy1pR
3,Good Vibrations (Mono),The Beach Boys,5Qt4Cc66g24QWwGP3YYV9y
4,Johnny B Goode,Chuck Berry,7MH2ZclofPlTrZOkPzZKhK


In [55]:
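# audio features are requested in batches of 100 track ids
chunks = [(i, i+100) for i in range(0, len(df_final), 100)]
audio_features_list = []
for chunk in chunks:
    # .iloc slices by position, which matters because rows were dropped and the index has gaps
    id_list100 = df_final['song_id'].iloc[chunk[0]:chunk[1]].tolist()
    audio_features_list = audio_features_list + sp.audio_features(id_list100)
    sleep(randint(1,3000)/1000)
len(audio_features_list)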

50609

In [66]:
# !pip install requests spotipy

In [56]:
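from spotipy.exceptions import SpotifyException

# Second pass with error handling. audio_features_list still holds the results of the
# previous cell, so this loop appends to it (hence the doubled length below);
# the duplicates are dropped a few cells further down.
for chunk in chunks:
    id_list100 = df_final['song_id'][chunk[0]:chunk[1]]
    try:
        audio_features = sp.audio_features(id_list100)
        audio_features_list += audio_features
    except SpotifyException as e:
        if e.http_status == 429:
            # Rate limited: wait as long as Spotify asks, when it says so
            if e.headers and 'Retry-After' in e.headers:
                sleep(int(e.headers['Retry-After']))
            else:
                # Otherwise, use a fixed delay
                sleep(10)  # Adjust the delay as needed
            continue  # moves on to the next chunk; the failed chunk is not retried
        else:
            print(f"Error: {e}")
    sleep(randint(1, 3))  # Add a delay to avoid rate limiting
len(audio_features_list)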

101218
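
The loop above moves on to the next chunk after a 429 response, so a rate-limited chunk is never fetched. One way to retry it is sketched below. Everything here is illustrative: `chunked`, `fetch_with_retry`, `RateLimited` and `flaky_fetch` are hypothetical names, and `RateLimited` merely stands in for the exception raised by the real client so that the example runs without network access.

```python
import time


def chunked(seq, size):
    """Yield consecutive slices of seq with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


class RateLimited(Exception):
    """Stand-in for a 'too many requests' error carrying a wait time."""

    def __init__(self, retry_after):
        super().__init__("rate limited")
        self.retry_after = retry_after


def fetch_with_retry(fetch, batch, max_attempts=3, sleep=time.sleep):
    """Call fetch(batch); on RateLimited wait and try the same batch again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(batch)
        except RateLimited as err:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(err.retry_after)


# Fake fetcher that fails once, on its second call.
calls = {"n": 0}


def flaky_fetch(batch):
    calls["n"] += 1
    if calls["n"] == 2:
        raise RateLimited(retry_after=0)
    return [{"id": track_id} for track_id in batch]


ids = [f"track{i}" for i in range(250)]
features = []
for batch in chunked(ids, 100):
    features.extend(fetch_with_retry(flaky_fetch, batch))

print(len(features), calls["n"])
```

Starting from an empty `features` list for each run also avoids the doubled result count seen above.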

In [57]:
audio_features_df = json_normalize(audio_features_list)

In [58]:
audio_features_df.drop_duplicates(inplace=True)

In [67]:
df_w_audio_ft = pd.merge(left=result_df_no_nulls,
                        right=audio_features_df,
                        how='inner',
                        left_on='song_id',
                        right_on='id')
df_w_audio_ft.head()

Unnamed: 0,track.name,name,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Like a Rolling Stone,Bob Dylan,3AhXZa8sUQht0UEdBJgpGc,0.482,0.721,0,-6.839,1,0.0321,0.731,...,0.189,0.557,95.263,audio_features,3AhXZa8sUQht0UEdBJgpGc,spotify:track:3AhXZa8sUQht0UEdBJgpGc,https://api.spotify.com/v1/tracks/3AhXZa8sUQht...,https://api.spotify.com/v1/audio-analysis/3AhX...,369600,4
1,Smells Like Teen Spirit,Nirvana,3oTlkzk1OtrhH8wBAduVEi,0.485,0.863,1,-9.027,1,0.0495,1.2e-05,...,0.138,0.767,116.835,audio_features,3oTlkzk1OtrhH8wBAduVEi,spotify:track:3oTlkzk1OtrhH8wBAduVEi,https://api.spotify.com/v1/tracks/3oTlkzk1Otrh...,https://api.spotify.com/v1/audio-analysis/3oTl...,300977,4
2,A Day In The Life - Remastered,The Beatles,3ZFBeIyP41HhnALjxWy1pR,0.364,0.457,4,-14.162,0,0.0675,0.29,...,0.922,0.175,163.219,audio_features,3ZFBeIyP41HhnALjxWy1pR,spotify:track:3ZFBeIyP41HhnALjxWy1pR,https://api.spotify.com/v1/tracks/3ZFBeIyP41Hh...,https://api.spotify.com/v1/audio-analysis/3ZFB...,337413,4
3,Good Vibrations (Mono),The Beach Boys,5Qt4Cc66g24QWwGP3YYV9y,0.398,0.413,1,-10.934,1,0.0388,0.0822,...,0.0891,0.331,133.574,audio_features,5Qt4Cc66g24QWwGP3YYV9y,spotify:track:5Qt4Cc66g24QWwGP3YYV9y,https://api.spotify.com/v1/tracks/5Qt4Cc66g24Q...,https://api.spotify.com/v1/audio-analysis/5Qt4...,219147,4
4,Johnny B Goode,Chuck Berry,7MH2ZclofPlTrZOkPzZKhK,0.518,0.756,10,-10.851,1,0.0915,0.735,...,0.317,0.968,166.429,audio_features,7MH2ZclofPlTrZOkPzZKhK,spotify:track:7MH2ZclofPlTrZOkPzZKhK,https://api.spotify.com/v1/tracks/7MH2ZclofPlT...,https://api.spotify.com/v1/audio-analysis/7MH2...,160893,4
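
An inner merge silently drops songs for which no audio features came back. A quick way to see which ones are affected is a left merge with `indicator=True`, sketched here on made-up data (the ids and tempo values are invented for the example).

```python
import pandas as pd

songs = pd.DataFrame({"song_id": ["t1", "t2", "t3"], "track.name": ["A", "B", "C"]})
features = pd.DataFrame({"id": ["t1", "t3"], "tempo": [95.3, 120.0]})

# indicator=True adds a '_merge' column saying where each row was found.
checked = pd.merge(
    songs, features, how="left", left_on="song_id", right_on="id", indicator=True
)
missing = checked.loc[checked["_merge"] == "left_only", "song_id"].tolist()
print(missing)
```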


In [63]:
df_w_audio_ft.to_csv('df_w_audio_ft2.csv', index=False)