<div style='text-align: right;'>
    bojan.gjokjevski
</div>

---

# Lab | API wrappers
### Create your collection of songs & audio features

#### Instructions

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will multiply very quickly (roughly tracks × artists × albums per artist)!
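
The artist-expansion idea above could look roughly like the sketch below. It assumes `client` is a SpotiPy client (or any object exposing `artist_albums`, `album_tracks` and `next`); the helper name `collect_artist_track_ids` and the `pause_seconds` parameter are made up for illustration and are not part of the lab.

```python
from time import sleep


def collect_artist_track_ids(client, artist_ids, pause_seconds=0.5):
    """Return the set of track IDs found on every album of the given artists.

    Hypothetical helper: `client` is assumed to behave like spotipy.Spotify.
    """
    track_ids = set()
    for artist_id in artist_ids:
        albums = client.artist_albums(artist_id, limit=50)
        while albums is not None:
            for album in albums["items"]:
                tracks = client.album_tracks(album["id"], limit=50)
                while tracks is not None:
                    # keep only tracks that actually carry an ID
                    track_ids.update(t["id"] for t in tracks["items"] if t.get("id"))
                    tracks = client.next(tracks) if tracks.get("next") else None
                sleep(pause_seconds)  # small pause between albums
            albums = client.next(albums) if albums.get("next") else None
    return track_ids


# Possible usage once `sp` is initialised further down in the notebook:
# extra_ids = collect_artist_track_ids(sp, ["0UK6JkgUMa28b4t8eCtg6P"])
```

Collecting into a set means a track that appears on several albums is requested for audio features only once.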

---

In [1]:
import pandas as pd
import spotipy

In [2]:
from pandas import json_normalize
from spotipy.oauth2 import SpotifyClientCredentials
from random import randint
from time import sleep

In [3]:
secrets_file = open('secrets.txt','r')

In [4]:
string = secrets_file.read()
secrets_file.close()  # release the file handle once the contents are read

In [5]:
# string

In [6]:
# string.split('\n')

In [7]:
secrets_dict = {}

for line in string.split('\n'):
    if len(line) > 0:
        # split on the first ':' only, so values that contain ':' stay intact
        key, value = line.split(':', 1)
        secrets_dict[key.strip()] = value.strip()
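
The parsing step can also be wrapped in a small function, which makes it easy to check on a sample string before pointing it at the real file. This is only a sketch: `parse_secrets` is a hypothetical name, and the `key: value` layout of `secrets.txt` is inferred from the loop above.

```python
def parse_secrets(text):
    """Parse 'key: value' lines into a dict, skipping blank or malformed lines."""
    secrets = {}
    for line in text.splitlines():
        # partition splits on the first ':' only
        key, sep, value = line.partition(":")
        if sep and key.strip():
            secrets[key.strip()] = value.strip()
    return secrets


# Made-up sample content, not real credentials
example = "clientid: abc123\n\nclientsecret: s3cr:et\n"
parsed = parse_secrets(example)
```

Here `parsed` maps `clientid` to `abc123` and `clientsecret` to `s3cr:et`; the blank line is ignored.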

In [8]:
# secrets_dict

In [9]:
# initialize SpotiPy with user credentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['clientid'],
                                                           client_secret=secrets_dict['clientsecret']))

### Import songs and artists from playlists

In [10]:
# function to extract all tracks from a Spotify playlist, following pagination

def get_playlist_tracks(playlist_id):

    # playlist_items replaces the deprecated user_playlist_tracks
    results = sp.playlist_items(playlist_id, additional_types=('track',))
    tracks = results['items']
    
    while results['next'] is not None:
        
        results = sp.next(results)
        tracks.extend(results['items'])
        sleep(randint(1,3000)/1000) # respectful nap
    
    return tracks
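
The pagination pattern used here (keep requesting the `next` page until it is empty) can be tried out without any network access. In the sketch below, `collect_all_items` and the stub `pages` dict are invented for illustration; only the `items` / `next` keys mirror what the Spotify responses contain.

```python
def collect_all_items(first_page, fetch_next):
    """Gather 'items' from a paged response by following 'next' until it is None."""
    items = list(first_page["items"])
    page = first_page
    while page.get("next") is not None:
        page = fetch_next(page)
        items.extend(page["items"])
    return items


# Stub pages standing in for API responses
pages = {
    "p2": {"items": [3, 4], "next": "p3"},
    "p3": {"items": [5], "next": None},
}
first = {"items": [1, 2], "next": "p2"}
all_items = collect_all_items(first, lambda page: pages[page["next"]])
```

With a real client, the second argument would be `sp.next`.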

In [11]:
# first playlist
tracks1 = get_playlist_tracks("6yPiKpy7evrwvZodByKvM9")

In [12]:
# second playlist
tracks2 = get_playlist_tracks("4vBaGi0msaCBxhAq9lWXk3")

In [13]:
type(tracks1)

list

In [14]:
# playlists length and total number of tracks
print(len(tracks1))
print(len(tracks2))
print('total:',len(tracks1)+len(tracks2))

10000
5633
total: 15633


In [15]:
# json_normalizing the data, from list to df
tracks1_df = json_normalize(tracks1)
tracks2_df = json_normalize(tracks2)

In [16]:
# df for the first playlist
print(tracks1_df.shape)
tracks1_df.head()

(10000, 40)


Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.id,track.is_local,track.name,track.popularity,track.preview_url,track.track,track.track_number,track.type,track.uri,video_thumbnail.url
0,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,33xMbeHzmWd6Od0BmLZEUs,False,2K,0,,True,17,track,spotify:track:33xMbeHzmWd6Od0BmLZEUs,
1,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,3UnyplmZaq547hwsfOR5yy,False,4 Billion Souls,24,https://p.scdn.co/mp3-preview/d6645e0eeb0f6849...,True,2,track,spotify:track:3UnyplmZaq547hwsfOR5yy,
2,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,1w8QCSDH4QobcQeT4uMKLm,False,4 Minute Warning,0,,True,8,track,spotify:track:1w8QCSDH4QobcQeT4uMKLm,
3,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,7J9mBHG4J2eIfDAv5BehKA,False,7 Element,0,,True,2,track,spotify:track:7J9mBHG4J2eIfDAv5BehKA,
4,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,1VZedwJj1gyi88WFRhfThb,False,#9 Dream,5,https://p.scdn.co/mp3-preview/1902adb4d960b913...,True,47,track,spotify:track:1VZedwJj1gyi88WFRhfThb,


In [17]:
# df for the second playlist
print(tracks2_df.shape)
tracks2_df.head()

(5633, 40)


Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.id,track.is_local,track.name,track.popularity,track.preview_url,track.track,track.track_number,track.type,track.uri,video_thumbnail.url
0,2014-07-28T00:20:28Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,7dmLg6dYLmjlJXNwFEPQqf,False,You're The One - Un mal pour un bien,0,,True,5,track,spotify:track:7dmLg6dYLmjlJXNwFEPQqf,
1,2014-09-19T22:14:40Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,3QcuZo6WLcFkqqLmDs0d95,False,Doctor My Eyes,12,,True,4,track,spotify:track:3QcuZo6WLcFkqqLmDs0d95,
2,2018-10-18T05:20:13Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,1KHdq8NK9QxnGjdXb55NiG,False,Falling in Love at a Coffee Shop,53,https://p.scdn.co/mp3-preview/760f873c73d8cacd...,True,4,track,spotify:track:1KHdq8NK9QxnGjdXb55NiG,
3,2018-10-18T05:20:48Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,40h65HAR8COEoqkMwUUQHu,False,Peaceful Easy Feeling - 2013 Remaster,70,https://p.scdn.co/mp3-preview/2b5b9400d354e0c4...,True,9,track,spotify:track:40h65HAR8COEoqkMwUUQHu,
4,2018-10-18T05:20:50Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,3oAWTk92mZBxKBOKf8mR5v,False,Summertime Blues,63,,True,7,track,spotify:track:3oAWTk92mZBxKBOKf8mR5v,


In [18]:
# function to expand the list of artist dicts in 'track.artists' into a small df, tagging every artist row with the track's id as 'song_id'

def expand_list_dict2(row):

    df = json_normalize(row['track.artists'])
    df['song_id'] = row['track.id']
    
    return df
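
As an alternative to building one small DataFrame per row, `json_normalize` can flatten the nested artist lists in a single call through its `record_path` and `meta` arguments. The sketch below runs on a made-up two-track list shaped like the playlist items; it assumes every item has a non-null `track` with an `artists` list, which may not hold for every real playlist.

```python
import pandas as pd

# Tiny stand-in for the list returned by get_playlist_tracks (values are invented)
items = [
    {"track": {"id": "t1", "name": "Song A",
               "artists": [{"id": "a1", "name": "Artist X"},
                           {"id": "a2", "name": "Artist Y"}]}},
    {"track": {"id": "t2", "name": "Song B",
               "artists": [{"id": "a1", "name": "Artist X"}]}},
]

# One row per (track, artist) pair; the track fields are repeated as 'track.id' / 'track.name'
artists_df = pd.json_normalize(
    items,
    record_path=["track", "artists"],
    meta=[["track", "id"], ["track", "name"]],
)
```

The result has three rows (two artists for `t1`, one for `t2`) and already pairs each artist with its track, so a later merge is not needed.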

In [19]:
type(tracks1_df)

pandas.core.frame.DataFrame

In [20]:
# applying the function to the first df, to extract the data from the cells in column track.artists and store it in a new column at the end
tracks1_df['artists_dfs'] = tracks1_df.apply(expand_list_dict2, axis=1)

In [21]:
tracks1_df['artists_dfs'][3]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/0UK6JkgUMa2...,0UK6JkgUMa28b4t8eCtg6P,Vitas,artist,spotify:artist:0UK6JkgUMa28b4t8eCtg6P,https://open.spotify.com/artist/0UK6JkgUMa28b4...,7J9mBHG4J2eIfDAv5BehKA


In [22]:
# applying the function to the second df, to extract the data from the cells in column track.artists and store it in a new column at the end
tracks2_df['artists_dfs'] = tracks2_df.apply(expand_list_dict2, axis=1)
tracks2_df['artists_dfs'][7]

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/09C0xjtosNA...,09C0xjtosNAIXP36wTnWxd,Fats Domino,artist,spotify:artist:09C0xjtosNAIXP36wTnWxd,https://open.spotify.com/artist/09C0xjtosNAIXP...,4ZfQwNx3FlCN07cnUvekh3


In [23]:
artist1_df = pd.DataFrame(columns=['href', 'id', 'name', 'type', 'uri'])
artist2_df = pd.DataFrame(columns=['href', 'id', 'name', 'type', 'uri'])

In [24]:
# concatenate all the mini dfs in one call instead of growing the df inside a loop
artist1_df = pd.concat([artist1_df, *tracks1_df['artists_dfs']], axis=0)
    
artist1_df

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/0IVapwlnM3d...,0IVapwlnM3dEOiMsHXsghT,Nosaj Thing,artist,spotify:artist:0IVapwlnM3dEOiMsHXsghT,https://open.spotify.com/artist/0IVapwlnM3dEOi...,33xMbeHzmWd6Od0BmLZEUs
0,https://api.spotify.com/v1/artists/22WZ7M8sxp5...,22WZ7M8sxp5THdruNY3gXt,The Doors,artist,spotify:artist:22WZ7M8sxp5THdruNY3gXt,https://open.spotify.com/artist/22WZ7M8sxp5THd...,3UnyplmZaq547hwsfOR5yy
0,https://api.spotify.com/v1/artists/4Z8W4fKeB5Y...,4Z8W4fKeB5YxbusRsdQVPb,Radiohead,artist,spotify:artist:4Z8W4fKeB5YxbusRsdQVPb,https://open.spotify.com/artist/4Z8W4fKeB5Yxbu...,1w8QCSDH4QobcQeT4uMKLm
0,https://api.spotify.com/v1/artists/0UK6JkgUMa2...,0UK6JkgUMa28b4t8eCtg6P,Vitas,artist,spotify:artist:0UK6JkgUMa28b4t8eCtg6P,https://open.spotify.com/artist/0UK6JkgUMa28b4...,7J9mBHG4J2eIfDAv5BehKA
0,https://api.spotify.com/v1/artists/4KWTAlx2Rvb...,4KWTAlx2RvbpseOGMEmROg,R.E.M.,artist,spotify:artist:4KWTAlx2RvbpseOGMEmROg,https://open.spotify.com/artist/4KWTAlx2Rvbpse...,1VZedwJj1gyi88WFRhfThb
...,...,...,...,...,...,...,...
0,https://api.spotify.com/v1/artists/3RGLhK1IP9j...,3RGLhK1IP9jnYFH4BRFJBS,The Clash,artist,spotify:artist:3RGLhK1IP9jnYFH4BRFJBS,https://open.spotify.com/artist/3RGLhK1IP9jnYF...,5jzma6gCzYtKB1DbEwFZKH
0,https://api.spotify.com/v1/artists/3ICyfoySNDZ...,3ICyfoySNDZqtBVmaBT84I,War,artist,spotify:artist:3ICyfoySNDZqtBVmaBT84I,https://open.spotify.com/artist/3ICyfoySNDZqtB...,2fmMPJb5EzZCx8BcNJvVk4
0,https://api.spotify.com/v1/artists/3OsRAKCvk37...,3OsRAKCvk37zwYcnzRf5XF,Moby,artist,spotify:artist:3OsRAKCvk37zwYcnzRf5XF,https://open.spotify.com/artist/3OsRAKCvk37zwY...,60rIdEPDrzyLiLC0icp3xz
0,https://api.spotify.com/v1/artists/023YMawCG3O...,023YMawCG3OvACmRjWxLWC,The Cat Empire,artist,spotify:artist:023YMawCG3OvACmRjWxLWC,https://open.spotify.com/artist/023YMawCG3OvAC...,0sEm1ld0V8YTCPcjPVfIsc


In [25]:
# concatenate all the mini dfs in one call instead of growing the df inside a loop
artist2_df = pd.concat([artist2_df, *tracks2_df['artists_dfs']], axis=0)
    
artist2_df

Unnamed: 0,href,id,name,type,uri,external_urls.spotify,song_id
0,https://api.spotify.com/v1/artists/6nKqt1nbSBE...,6nKqt1nbSBEq3iUXD1Xgz8,Petula Clark,artist,spotify:artist:6nKqt1nbSBEq3iUXD1Xgz8,https://open.spotify.com/artist/6nKqt1nbSBEq3i...,7dmLg6dYLmjlJXNwFEPQqf
0,https://api.spotify.com/v1/artists/5lkiCO9UQ8B...,5lkiCO9UQ8B23dZ1o0UV4m,Jackson Browne,artist,spotify:artist:5lkiCO9UQ8B23dZ1o0UV4m,https://open.spotify.com/artist/5lkiCO9UQ8B23d...,3QcuZo6WLcFkqqLmDs0d95
0,https://api.spotify.com/v1/artists/1whjlG0NSaQ...,1whjlG0NSaQytgDIWz10GS,Landon Pigg,artist,spotify:artist:1whjlG0NSaQytgDIWz10GS,https://open.spotify.com/artist/1whjlG0NSaQytg...,1KHdq8NK9QxnGjdXb55NiG
0,https://api.spotify.com/v1/artists/0ECwFtbIWEV...,0ECwFtbIWEVNwjlrfc6xoL,Eagles,artist,spotify:artist:0ECwFtbIWEVNwjlrfc6xoL,https://open.spotify.com/artist/0ECwFtbIWEVNwj...,40h65HAR8COEoqkMwUUQHu
0,https://api.spotify.com/v1/artists/1p0t3JtUTay...,1p0t3JtUTayV2wb1RGN9mO,Eddie Cochran,artist,spotify:artist:1p0t3JtUTayV2wb1RGN9mO,https://open.spotify.com/artist/1p0t3JtUTayV2w...,3oAWTk92mZBxKBOKf8mR5v
...,...,...,...,...,...,...,...
0,https://api.spotify.com/v1/artists/69g2TelswPN...,69g2TelswPN1IiFDKvaoSL,Chairmen Of The Board,artist,spotify:artist:69g2TelswPN1IiFDKvaoSL,https://open.spotify.com/artist/69g2TelswPN1Ii...,6qbtTps7ZebLrWtqtu1joj
0,https://api.spotify.com/v1/artists/36eDX6PQlJk...,36eDX6PQlJkjxXUhIINO5w,Jay & The Techniques,artist,spotify:artist:36eDX6PQlJkjxXUhIINO5w,https://open.spotify.com/artist/36eDX6PQlJkjxX...,1m1UTLYGCzEciEdSfpK1Yu
0,https://api.spotify.com/v1/artists/3ahRanZQnsZ...,3ahRanZQnsZmAVMNqRsimM,The Ad Libs,artist,spotify:artist:3ahRanZQnsZmAVMNqRsimM,https://open.spotify.com/artist/3ahRanZQnsZmAV...,3kOrjPYYNLlWp7RJlzyg66
0,https://api.spotify.com/v1/artists/1eEfMU2AhEo...,1eEfMU2AhEo7XnKgL7c304,Carpenters,artist,spotify:artist:1eEfMU2AhEo7XnKgL7c304,https://open.spotify.com/artist/1eEfMU2AhEo7Xn...,0OFUUZdNXJ6KP8DvhN6WVN


In [26]:
df1_merged = pd.merge(left=tracks1_df,
                      right=artist1_df,
                      how='inner',
                      left_on='track.id',
                      right_on='song_id')
df1_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.uri,video_thumbnail.url,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:33xMbeHzmWd6Od0BmLZEUs,,...,https://api.spotify.com/v1/artists/0IVapwlnM3d...,0IVapwlnM3dEOiMsHXsghT,Nosaj Thing,artist,spotify:artist:0IVapwlnM3dEOiMsHXsghT,https://open.spotify.com/artist/0IVapwlnM3dEOi...,33xMbeHzmWd6Od0BmLZEUs
1,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:3UnyplmZaq547hwsfOR5yy,,...,https://api.spotify.com/v1/artists/22WZ7M8sxp5...,22WZ7M8sxp5THdruNY3gXt,The Doors,artist,spotify:artist:22WZ7M8sxp5THdruNY3gXt,https://open.spotify.com/artist/22WZ7M8sxp5THd...,3UnyplmZaq547hwsfOR5yy
2,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1w8QCSDH4QobcQeT4uMKLm,,...,https://api.spotify.com/v1/artists/4Z8W4fKeB5Y...,4Z8W4fKeB5YxbusRsdQVPb,Radiohead,artist,spotify:artist:4Z8W4fKeB5YxbusRsdQVPb,https://open.spotify.com/artist/4Z8W4fKeB5Yxbu...,1w8QCSDH4QobcQeT4uMKLm
3,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:7J9mBHG4J2eIfDAv5BehKA,,...,https://api.spotify.com/v1/artists/0UK6JkgUMa2...,0UK6JkgUMa28b4t8eCtg6P,Vitas,artist,spotify:artist:0UK6JkgUMa28b4t8eCtg6P,https://open.spotify.com/artist/0UK6JkgUMa28b4...,7J9mBHG4J2eIfDAv5BehKA
4,2017-02-27T01:38:09Z,False,,https://open.spotify.com/user/12160726861,https://api.spotify.com/v1/users/12160726861,12160726861,user,spotify:user:12160726861,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1VZedwJj1gyi88WFRhfThb,,...,https://api.spotify.com/v1/artists/4KWTAlx2Rvb...,4KWTAlx2RvbpseOGMEmROg,R.E.M.,artist,spotify:artist:4KWTAlx2RvbpseOGMEmROg,https://open.spotify.com/artist/4KWTAlx2Rvbpse...,1VZedwJj1gyi88WFRhfThb


In [27]:
df2_merged = pd.merge(left=tracks2_df,
                      right=artist2_df,
                      how='inner',
                      left_on='track.id',
                      right_on='song_id')
df2_merged.head()

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.uri,video_thumbnail.url,artists_dfs,href,id,name,type,uri,external_urls.spotify,song_id
0,2014-07-28T00:20:28Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:7dmLg6dYLmjlJXNwFEPQqf,,...,https://api.spotify.com/v1/artists/6nKqt1nbSBE...,6nKqt1nbSBEq3iUXD1Xgz8,Petula Clark,artist,spotify:artist:6nKqt1nbSBEq3iUXD1Xgz8,https://open.spotify.com/artist/6nKqt1nbSBEq3i...,7dmLg6dYLmjlJXNwFEPQqf
1,2014-09-19T22:14:40Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:3QcuZo6WLcFkqqLmDs0d95,,...,https://api.spotify.com/v1/artists/5lkiCO9UQ8B...,5lkiCO9UQ8B23dZ1o0UV4m,Jackson Browne,artist,spotify:artist:5lkiCO9UQ8B23dZ1o0UV4m,https://open.spotify.com/artist/5lkiCO9UQ8B23d...,3QcuZo6WLcFkqqLmDs0d95
2,2018-10-18T05:20:13Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:1KHdq8NK9QxnGjdXb55NiG,,...,https://api.spotify.com/v1/artists/1whjlG0NSaQ...,1whjlG0NSaQytgDIWz10GS,Landon Pigg,artist,spotify:artist:1whjlG0NSaQytgDIWz10GS,https://open.spotify.com/artist/1whjlG0NSaQytg...,1KHdq8NK9QxnGjdXb55NiG
3,2018-10-18T05:20:48Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:40h65HAR8COEoqkMwUUQHu,,...,https://api.spotify.com/v1/artists/0ECwFtbIWEV...,0ECwFtbIWEVNwjlrfc6xoL,Eagles,artist,spotify:artist:0ECwFtbIWEVNwjlrfc6xoL,https://open.spotify.com/artist/0ECwFtbIWEVNwj...,40h65HAR8COEoqkMwUUQHu
4,2018-10-18T05:20:50Z,False,,https://open.spotify.com/user/larrypeay,https://api.spotify.com/v1/users/larrypeay,larrypeay,user,spotify:user:larrypeay,album,[{'external_urls': {'spotify': 'https://open.s...,...,spotify:track:3oAWTk92mZBxKBOKf8mR5v,,...,https://api.spotify.com/v1/artists/1p0t3JtUTay...,1p0t3JtUTayV2wb1RGN9mO,Eddie Cochran,artist,spotify:artist:1p0t3JtUTayV2wb1RGN9mO,https://open.spotify.com/artist/1p0t3JtUTayV2w...,3oAWTk92mZBxKBOKf8mR5v


In [28]:
df = pd.concat([df1_merged, df2_merged], axis=0)

In [29]:
type(df)

pandas.core.frame.DataFrame

In [30]:
df_final = df[['track.name', 'name', 'song_id']]

In [31]:
df_final.duplicated().value_counts()

False    17739
True      6748
Name: count, dtype: int64

In [32]:
df_final = df_final.drop_duplicates(keep='first', ignore_index=True)

In [33]:
df_final

Unnamed: 0,track.name,name,song_id
0,2K,Nosaj Thing,33xMbeHzmWd6Od0BmLZEUs
1,4 Billion Souls,The Doors,3UnyplmZaq547hwsfOR5yy
2,4 Minute Warning,Radiohead,1w8QCSDH4QobcQeT4uMKLm
3,7 Element,Vitas,7J9mBHG4J2eIfDAv5BehKA
4,#9 Dream,R.E.M.,1VZedwJj1gyi88WFRhfThb
...,...,...,...
17734,Give Me Just A Little More Time,Chairmen Of The Board,6qbtTps7ZebLrWtqtu1joj
17735,"Apples, Peaches, Pumpkin Pie",Jay & The Techniques,1m1UTLYGCzEciEdSfpK1Yu
17736,The Boy From New York City,The Ad Libs,3kOrjPYYNLlWp7RJlzyg66
17737,Intermission,Carpenters,0OFUUZdNXJ6KP8DvhN6WVN


In [34]:
df_final['track.name'].duplicated().value_counts()

track.name
False    13363
True      4376
Name: count, dtype: int64

In [35]:
df_final['name'].duplicated().value_counts()

name
True     12580
False     5159
Name: count, dtype: int64

In [36]:
df_final['song_id'].duplicated().value_counts()

song_id
False    15078
True      2661
Name: count, dtype: int64
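
The repeated `song_id` values come from songs with more than one artist, which get one row per artist. If one row per song is preferred, the artist names can be collapsed into a single string. This is a sketch on a made-up three-row frame with the same column names as `df_final`, not a step from the notebook.

```python
import pandas as pd

songs = pd.DataFrame({
    "track.name": ["Song A", "Song A", "Song B"],
    "name": ["Artist X", "Artist Y", "Artist X"],
    "song_id": ["t1", "t1", "t2"],
})

# Keep the first title per song and join all artist names with a comma
one_row_per_song = (
    songs.groupby("song_id", sort=False)
         .agg({"track.name": "first", "name": ", ".join})
         .reset_index()
)
```

After this step `song_id` is unique, so each song would be counted once when clustering.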

In [37]:
# drop rows without a Spotify track id (they cannot be looked up for audio features)
df_final = df_final.dropna(subset=['song_id']).reset_index(drop=True)

In [38]:
# checking audio_features for a certain song 
sp.audio_features('33xMbeHzmWd6Od0BmLZEUs')

[{'danceability': 0.31,
  'energy': 0.445,
  'key': 7,
  'loudness': -13.355,
  'mode': 0,
  'speechiness': 0.0863,
  'acousticness': 0.094,
  'instrumentalness': 0.0678,
  'liveness': 0.113,
  'valence': 0.122,
  'tempo': 95.36,
  'type': 'audio_features',
  'id': '33xMbeHzmWd6Od0BmLZEUs',
  'uri': 'spotify:track:33xMbeHzmWd6Od0BmLZEUs',
  'track_href': 'https://api.spotify.com/v1/tracks/33xMbeHzmWd6Od0BmLZEUs',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/33xMbeHzmWd6Od0BmLZEUs',
  'duration_ms': 152560,
  'time_signature': 3}]

In [39]:
chunks = [(i, i+100) for i in range(0, len(df_final), 100)]
# chunks

In [40]:
audio_features_list = []

In [41]:
# extracting audio features from Spotify in chunks of up to 100 ids per request, for all songs in total

for chunk in chunks:
    
    id_list100 = df_final['song_id'].iloc[chunk[0]:chunk[1]].tolist()
    audio_features_list.extend(sp.audio_features(id_list100))
    sleep(randint(1,3000)/1000) # respectful nap
    
len(audio_features_list)

17172
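
The batching loop above can be packaged as a function that also removes repeated ids before requesting them and skips empty entries in the response. This is a sketch under assumptions: `fetch_audio_features` is a hypothetical helper, `client` is assumed to expose `audio_features` like the SpotiPy client, and the `None` filtering assumes the API may return an empty entry for a track it has no features for.

```python
def fetch_audio_features(client, track_ids, batch_size=100):
    """Request audio features in batches and return the non-empty results."""
    unique_ids = list(dict.fromkeys(track_ids))  # de-duplicate, keep first-seen order
    features = []
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        response = client.audio_features(batch) or []
        # skip entries that came back empty
        features.extend(f for f in response if f is not None)
    return features


# Possible usage with the notebook's objects:
# audio_features_list = fetch_audio_features(sp, df_final['song_id'])
```

A pause between batches, like the one in the loop above, can be added inside the `for` loop if the request volume is large.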

In [42]:
audio_features_df = json_normalize(audio_features_list)

In [43]:
audio_features_df.drop_duplicates(inplace=True) # duplicates because some songs have more than one artist

In [44]:
songs_audiof_df = pd.merge(left=df_final,
                           right=audio_features_df,
                           how='inner',
                           left_on='song_id',
                           right_on='id')

In [45]:
# combined df with songs, artists and audio features for every song
songs_audiof_df

Unnamed: 0,track.name,name,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,2K,Nosaj Thing,33xMbeHzmWd6Od0BmLZEUs,0.310,0.4450,7,-13.355,0,0.0863,0.0940,...,0.1130,0.122,95.360,audio_features,33xMbeHzmWd6Od0BmLZEUs,spotify:track:33xMbeHzmWd6Od0BmLZEUs,https://api.spotify.com/v1/tracks/33xMbeHzmWd6...,https://api.spotify.com/v1/audio-analysis/33xM...,152560,3
1,4 Billion Souls,The Doors,3UnyplmZaq547hwsfOR5yy,0.419,0.5650,5,-11.565,1,0.0347,0.1370,...,0.1280,0.648,151.277,audio_features,3UnyplmZaq547hwsfOR5yy,spotify:track:3UnyplmZaq547hwsfOR5yy,https://api.spotify.com/v1/tracks/3UnyplmZaq54...,https://api.spotify.com/v1/audio-analysis/3Uny...,197707,4
2,4 Minute Warning,Radiohead,1w8QCSDH4QobcQeT4uMKLm,0.354,0.3020,9,-13.078,1,0.0326,0.5900,...,0.1110,0.223,123.753,audio_features,1w8QCSDH4QobcQeT4uMKLm,spotify:track:1w8QCSDH4QobcQeT4uMKLm,https://api.spotify.com/v1/tracks/1w8QCSDH4Qob...,https://api.spotify.com/v1/audio-analysis/1w8Q...,244285,4
3,7 Element,Vitas,7J9mBHG4J2eIfDAv5BehKA,0.727,0.7850,5,-6.707,0,0.0603,0.3250,...,0.3100,0.960,129.649,audio_features,7J9mBHG4J2eIfDAv5BehKA,spotify:track:7J9mBHG4J2eIfDAv5BehKA,https://api.spotify.com/v1/tracks/7J9mBHG4J2eI...,https://api.spotify.com/v1/audio-analysis/7J9m...,249940,4
4,#9 Dream,R.E.M.,1VZedwJj1gyi88WFRhfThb,0.571,0.7240,0,-5.967,1,0.0260,0.0231,...,0.0919,0.385,116.755,audio_features,1VZedwJj1gyi88WFRhfThb,spotify:track:1VZedwJj1gyi88WFRhfThb,https://api.spotify.com/v1/tracks/1VZedwJj1gyi...,https://api.spotify.com/v1/audio-analysis/1VZe...,278320,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17167,Give Me Just A Little More Time,Chairmen Of The Board,6qbtTps7ZebLrWtqtu1joj,0.583,0.4570,9,-11.166,0,0.0419,0.0635,...,0.0716,0.612,126.555,audio_features,6qbtTps7ZebLrWtqtu1joj,spotify:track:6qbtTps7ZebLrWtqtu1joj,https://api.spotify.com/v1/tracks/6qbtTps7ZebL...,https://api.spotify.com/v1/audio-analysis/6qbt...,160240,4
17168,"Apples, Peaches, Pumpkin Pie",Jay & The Techniques,1m1UTLYGCzEciEdSfpK1Yu,0.637,0.7150,10,-8.670,1,0.0404,0.4240,...,0.1130,0.951,139.500,audio_features,1m1UTLYGCzEciEdSfpK1Yu,spotify:track:1m1UTLYGCzEciEdSfpK1Yu,https://api.spotify.com/v1/tracks/1m1UTLYGCzEc...,https://api.spotify.com/v1/audio-analysis/1m1U...,148493,4
17169,The Boy From New York City,The Ad Libs,3kOrjPYYNLlWp7RJlzyg66,0.474,0.8950,11,-5.105,1,0.0826,0.6880,...,0.1940,0.845,149.192,audio_features,3kOrjPYYNLlWp7RJlzyg66,spotify:track:3kOrjPYYNLlWp7RJlzyg66,https://api.spotify.com/v1/tracks/3kOrjPYYNLlW...,https://api.spotify.com/v1/audio-analysis/3kOr...,181000,4
17170,Intermission,Carpenters,0OFUUZdNXJ6KP8DvhN6WVN,0.340,0.0605,3,-17.755,1,0.0395,0.9750,...,0.1920,0.265,117.927,audio_features,0OFUUZdNXJ6KP8DvhN6WVN,spotify:track:0OFUUZdNXJ6KP8DvhN6WVN,https://api.spotify.com/v1/tracks/0OFUUZdNXJ6K...,https://api.spotify.com/v1/audio-analysis/0OFU...,26960,4


In [46]:
# saving the df as .csv in the folder with the notebook 
songs_audiof_df.to_csv('spotify_songs_b.csv', index=False)

In [47]:
pd.read_csv('spotify_songs_b.csv')

Unnamed: 0,track.name,name,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,2K,Nosaj Thing,33xMbeHzmWd6Od0BmLZEUs,0.310,0.4450,7,-13.355,0,0.0863,0.0940,...,0.1130,0.122,95.360,audio_features,33xMbeHzmWd6Od0BmLZEUs,spotify:track:33xMbeHzmWd6Od0BmLZEUs,https://api.spotify.com/v1/tracks/33xMbeHzmWd6...,https://api.spotify.com/v1/audio-analysis/33xM...,152560,3
1,4 Billion Souls,The Doors,3UnyplmZaq547hwsfOR5yy,0.419,0.5650,5,-11.565,1,0.0347,0.1370,...,0.1280,0.648,151.277,audio_features,3UnyplmZaq547hwsfOR5yy,spotify:track:3UnyplmZaq547hwsfOR5yy,https://api.spotify.com/v1/tracks/3UnyplmZaq54...,https://api.spotify.com/v1/audio-analysis/3Uny...,197707,4
2,4 Minute Warning,Radiohead,1w8QCSDH4QobcQeT4uMKLm,0.354,0.3020,9,-13.078,1,0.0326,0.5900,...,0.1110,0.223,123.753,audio_features,1w8QCSDH4QobcQeT4uMKLm,spotify:track:1w8QCSDH4QobcQeT4uMKLm,https://api.spotify.com/v1/tracks/1w8QCSDH4Qob...,https://api.spotify.com/v1/audio-analysis/1w8Q...,244285,4
3,7 Element,Vitas,7J9mBHG4J2eIfDAv5BehKA,0.727,0.7850,5,-6.707,0,0.0603,0.3250,...,0.3100,0.960,129.649,audio_features,7J9mBHG4J2eIfDAv5BehKA,spotify:track:7J9mBHG4J2eIfDAv5BehKA,https://api.spotify.com/v1/tracks/7J9mBHG4J2eI...,https://api.spotify.com/v1/audio-analysis/7J9m...,249940,4
4,#9 Dream,R.E.M.,1VZedwJj1gyi88WFRhfThb,0.571,0.7240,0,-5.967,1,0.0260,0.0231,...,0.0919,0.385,116.755,audio_features,1VZedwJj1gyi88WFRhfThb,spotify:track:1VZedwJj1gyi88WFRhfThb,https://api.spotify.com/v1/tracks/1VZedwJj1gyi...,https://api.spotify.com/v1/audio-analysis/1VZe...,278320,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17167,Give Me Just A Little More Time,Chairmen Of The Board,6qbtTps7ZebLrWtqtu1joj,0.583,0.4570,9,-11.166,0,0.0419,0.0635,...,0.0716,0.612,126.555,audio_features,6qbtTps7ZebLrWtqtu1joj,spotify:track:6qbtTps7ZebLrWtqtu1joj,https://api.spotify.com/v1/tracks/6qbtTps7ZebL...,https://api.spotify.com/v1/audio-analysis/6qbt...,160240,4
17168,"Apples, Peaches, Pumpkin Pie",Jay & The Techniques,1m1UTLYGCzEciEdSfpK1Yu,0.637,0.7150,10,-8.670,1,0.0404,0.4240,...,0.1130,0.951,139.500,audio_features,1m1UTLYGCzEciEdSfpK1Yu,spotify:track:1m1UTLYGCzEciEdSfpK1Yu,https://api.spotify.com/v1/tracks/1m1UTLYGCzEc...,https://api.spotify.com/v1/audio-analysis/1m1U...,148493,4
17169,The Boy From New York City,The Ad Libs,3kOrjPYYNLlWp7RJlzyg66,0.474,0.8950,11,-5.105,1,0.0826,0.6880,...,0.1940,0.845,149.192,audio_features,3kOrjPYYNLlWp7RJlzyg66,spotify:track:3kOrjPYYNLlWp7RJlzyg66,https://api.spotify.com/v1/tracks/3kOrjPYYNLlW...,https://api.spotify.com/v1/audio-analysis/3kOr...,181000,4
17170,Intermission,Carpenters,0OFUUZdNXJ6KP8DvhN6WVN,0.340,0.0605,3,-17.755,1,0.0395,0.9750,...,0.1920,0.265,117.927,audio_features,0OFUUZdNXJ6KP8DvhN6WVN,spotify:track:0OFUUZdNXJ6KP8DvhN6WVN,https://api.spotify.com/v1/tracks/0OFUUZdNXJ6K...,https://api.spotify.com/v1/audio-analysis/0OFU...,26960,4
