## Spotify API notebook
This Notebook is used to extract songs and artits information using Python APIs

In [1]:
# libraries import

from dotenv import load_dotenv
import os
import base64
from requests import post,get
import json 
import pandas as pd
import numpy as np

Here we load the keys necessary to access the Spotify API

In [2]:
load_dotenv()
client_id=os.getenv('CLIENT_ID')
client_service=os.getenv('CLIENT_SECRET')

Functions used to access the Spotify API endpoint

In [3]:
def get_token():
    auth_string= client_id+':'+client_service
    auth_bytes= auth_string.encode('utf-8')
    auth_base64= str(base64.b64encode(auth_bytes),'utf-8')

    url='https://accounts.spotify.com/api/token'
    headers={
        'Authorization': 'Basic '+auth_base64,
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    data={'grant_type': 'client_credentials'}
    result= post(url, headers=headers, data=data)
    json_result=json.loads(result.content)
    token=json_result['access_token']
    return token

def get_auth_header(token):
    return {'Authorization': 'Bearer '+token}

This function is used to retrieve information of the given Spotify artist

In [4]:
def search_for_artist(token, artist_name):
    url= 'https://api.spotify.com/v1/search'
    headers= get_auth_header(token)
    query= f'?q={artist_name}&type=artist&limit=1'

    query_url= url+query
    result=get(query_url,headers=headers)
    if result.status_code == 200:
         # Request was successful, proceed with parsing the response
        json_result=json.loads(result.content)['artists']['items']
        if len(json_result)==0:
            print('No artists')
            return None
        return json_result[0]
    else:
        # Handle error cases
        print(f"Error: {result.status_code}")
        return None

This function aims to extract informations of the specified Spotify song

In [5]:
def search_for_song(token,song_id):
    url= f'https://api.spotify.com/v1/tracks/?ids={song_id}'
    headers= get_auth_header(token)
    

    result=get(url,headers=headers)
    if result.status_code == 200:
        # Request was successful, proceed with parsing the response
        json_result = json.loads(result.content)['tracks']
        return json_result
    else:
        # Handle error cases
        print(f"Error: {result.status_code}")
        return None

Data pre-processing of the dataset

In [6]:
token=get_token()
df = pd.read_csv('/Users/gianluca/Downloads/Spotify_Youtube-1.csv',usecols=['Artist','Uri'])
data=df['Artist'].drop_duplicates()
data=pd.DataFrame(data,columns =['Artist'])
data['Followers']=0
data['Genres']=""
data['Popularity']=0

song_uris=df[['Uri']]
song_ids= song_uris['Uri'].apply(lambda x: x.split(':')[-1])
song_ids=pd.DataFrame(song_ids,columns =['Uri'])

album=pd.DataFrame(columns=['Id','Album','Total_tracks','Release_date','Available_market'])


Here we perform the gathering information of the songs present in the dataset. 

Since the API has a time limit requests, we subdivide the songs into batches of ~40 songs, in order to perform less request to the Spotify endpoint.

In [7]:
# 500 is the number of batches
df_batches = np.array_split(song_ids, 500)
request=""
index=0
for i, batch in enumerate(df_batches):
    print(f"start processing batch {i+1} \n size:{len(batch)}")
    for j,album_info in batch.iterrows():
        request+=album_info[0]+','
    request=request[:-1]
    res=search_for_song(token,request)
    res=pd.DataFrame(res)
    for info in res.iterrows():
        info=info[1]
        album.at[index,'Id']=info[0]['uri']
        album.at[index,'Album']=info[0]['name'] #track name
        album.at[index,'Total_tracks']=info[0]['total_tracks']
        album.at[index,'Release_date']=info[0]['release_date']
        album.at[index,"Available_market"]=info['available_markets']
        index+=1

    print(f"end processing batch {i+1} \n")
    request=""

start processing batch 1 
 size:42
end processing batch 1 

start processing batch 2 
 size:42
end processing batch 2 

start processing batch 3 
 size:42
end processing batch 3 

start processing batch 4 
 size:42
end processing batch 4 

start processing batch 5 
 size:42
end processing batch 5 

start processing batch 6 
 size:42
end processing batch 6 

start processing batch 7 
 size:42
end processing batch 7 

start processing batch 8 
 size:42
end processing batch 8 

start processing batch 9 
 size:42
end processing batch 9 

start processing batch 10 
 size:42
end processing batch 10 

start processing batch 11 
 size:42
end processing batch 11 

start processing batch 12 
 size:42
end processing batch 12 

start processing batch 13 
 size:42
end processing batch 13 

start processing batch 14 
 size:42
end processing batch 14 

start processing batch 15 
 size:42
end processing batch 15 

start processing batch 16 
 size:42
end processing batch 16 

start processing batch 17 

In this section, the gathering of artists info information is performed

In [8]:
for index, artist in data.iterrows():
    info=search_for_artist(token,artist['Artist'])
    data.at[index,'Followers']=info['followers']['total']
    data.at[index,'Genres']=info['genres']
    data.at[index,'Popularity']=info['popularity']
    

Gorillaz
Red Hot Chili Peppers
50 Cent
Metallica
Coldplay
Daft Punk
Linkin Park
Radiohead
AC/DC
Black Eyed Peas
Michael Jackson
P!nk
Eminem
Pharrell Williams
Khalid
Shakira
Machine Gun Kelly
Nicky Jam
The Beatles
Pitbull
Sean Paul
Elvis Presley
2Pac
Snoop Dogg
Lil Wayne
Sia
Fleetwood Mac
ABBA
Frank Sinatra
Tiësto
Elton John
JAY-Z
21 Savage
Kanye West
Luis Miguel
Beyoncé
Daddy Yankee
Don Toliver
Camilo
Britney Spears
Farruko
Wisin & Yandel
Disney
Chris Brown
Tory Lanez
Mariah Carey
A.R. Rahman
Die drei ???
Pritam
Shreya Ghoshal
Pink Floyd
System Of A Down
Green Day
Ludovico Einaudi
Muse
Aerosmith
Slipknot
Oasis
The Rolling Stones
Hans Zimmer
Deftones
blink-182
Bob Marley & The Wailers
David Bowie
Bon Jovi
Dr. Dre
Creedence Clearwater Revival
U2
Nate Dogg
Juanes
Maná
Bee Gees
Billy Joel
The Notorious B.I.G.
Enrique Iglesias
Timbaland
Alejandro Sanz
Whitney Houston
Madonna
Alicia Keys
John Mayer
Christina Aguilera
Marc Anthony
Ricardo Arjona
Stevie Wonder
Ludacris
Plan B
Aventura
Usher
Vi

Here are specified where the output files are saved

In [9]:
data.reset_index(drop=True, inplace=True)
print(data)
data.to_csv('../data/Artist_info.csv',index=False,encoding='utf-8')
album.drop_duplicates()
album.to_csv('../data/Album_info.csv',index=False,encoding='utf-8')

                     Artist  Followers  \
0                  Gorillaz   11029273   
1     Red Hot Chili Peppers   20336411   
2                   50 Cent   12956918   
3                 Metallica   26352973   
4                  Coldplay   48226976   
...                     ...        ...   
2074         Grupo Frontera    2355149   
2075              Jung Kook   11298008   
2076            LE SSERAFIM    3536979   
2077               ThxSoMch     308065   
2078            SICK LEGEND     179969   

                                                 Genres  Popularity  
0              [alternative hip hop, modern rock, rock]          76  
1     [alternative rock, funk metal, funk rock, perm...          81  
2     [east coast hip hop, gangster rap, hip hop, po...          80  
3     [hard rock, metal, old school thrash, rock, th...          79  
4                                 [permanent wave, pop]          85  
...                                                 ...         ...  
2074 

TypeError: unhashable type: 'list'