# Lab | Extending the internal databases with audio features

At this point, you have the **hot_songs** and the **not_hot_songs** databases. However, you don't have any acoustic information about the songs. 
The purpose of this lab is to use Spotify's API to extend both databases with this information to use it later.

# Instructions

* Create a function to search a given **single** song in the Spotify API: **search_song(title, artist, limit)**. 

Later, you will use this function in the song recommender to get the audio features of each song in the database. Consider only the first match, even though it might not be the best one, because your time is limited and you can't spend it determining the best match for each song. Keep in mind that a given song might not be available on Spotify's API (make sure to use both the song's title and artist when searching). If the song is not found, the function must return an empty string as the href/id/uri. In that case, you should also remove the song from the database. Consider using a `try`/`except` clause like:

```python
def search_song(title, artist, limit):
  ...

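list_of_ids = []
# "title" and "artist" are placeholder column names; use the ones in your database.
for title, artist in zip(df["title"], df["artist"]):
  try:
    song_id = search_song(title, artist, limit=1)
    list_of_ids.append(song_id)
  except Exception:
    print("Song not found!")
    list_of_ids.append("")

df["id"] = list_of_ids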

# Code to remove songs without IDs from the databases.
```
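
The removal step mentioned in the last comment could look like the following minimal sketch. It assumes that songs without a match carry an empty string in the `id` column, as described above; the column names and sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample: the second song was not found, so its ID is an empty string.
df = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "id": ["abc123", "", "def456"],
})

# Keep only the rows that have a non-empty ID and renumber the index.
found = df[df["id"] != ""].reset_index(drop=True)
print(found)
```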

You can also use this function in the song recommender to search for the **user's song ID** in the Spotify API. This time, however, you want to make sure that you get the right match. Therefore, create a dataframe with a list of five matches, present them to the user, and let them select the right one, for example:

|   | Title | Artist |
|---|--------|-------|
| 0 | Georgia on My Mind | Carmichaels |
| 1 | Georgia on My Mind | Ray Charles |

Once the desired song is located, **the function should return the href/id/uri of the song to the code** (not to the user) to get the audio features.
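
A minimal sketch of this selection step, assuming the search response has the shape of Spotify's search result (`tracks` → `items`, each item with `name`, `artists`, and `id`). The helper names `matches_to_frame` and `pick_match` are hypothetical, and the sample response and its IDs are placeholders rather than real Spotify data.

```python
import pandas as pd

def matches_to_frame(result: dict) -> pd.DataFrame:
    """Turn a Spotify search response into a table of candidate matches."""
    rows = [
        {
            "Title": item["name"],
            "Artist": item["artists"][0]["name"],  # first credited artist only
            "id": item["id"],
        }
        for item in result["tracks"]["items"]
    ]
    return pd.DataFrame(rows, columns=["Title", "Artist", "id"])

def pick_match(matches: pd.DataFrame, choice: int) -> str:
    """Return the ID of the chosen row, or "" if the choice is out of range."""
    if 0 <= choice < len(matches):
        return matches.loc[choice, "id"]
    return ""

# Placeholder response; a real one would come from sp.search(..., limit=5).
sample_result = {"tracks": {"items": [
    {"name": "Georgia on My Mind", "artists": [{"name": "Carmichaels"}], "id": "placeholder_id_0"},
    {"name": "Georgia on My Mind", "artists": [{"name": "Ray Charles"}], "id": "placeholder_id_1"},
]}}

matches = matches_to_frame(sample_result)
print(matches[["Title", "Artist"]])   # show the user only what is needed to choose
selected_id = pick_match(matches, 1)  # in the recommender, the index would come from input()
```

The ID stays inside the code: only the title and artist columns are printed, and `selected_id` is what would be handed to the audio features function.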

* Create a function **get_audio_features(list_of_song_ids)** to obtain the audio features of a given list of songs (the content of list_of_song_ids can be hrefs/ids/uris, and the list may contain a single song ID). 

Be careful not to exceed the API's rate limit; otherwise, you will be banned and will have to wait several hours before launching a new request ([see here](https://developer.spotify.com/documentation/web-api/guides/rate-limits/)).

A good strategy to prevent this problem is to split the list of song IDs into "chunks" of 50 song IDs and wait 20 seconds before asking for the audio features of the next "chunk" (for your own peace of mind, add a `print("Collecting IDs for chunk...")` message to show the progress). To create chunks of song IDs, consider using [np.split](https://numpy.org/doc/stable/reference/generated/numpy.split.html). Note that `np.split` requires the list to divide evenly into chunks, whereas `np.array_split` accepts uneven splits.
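
A minimal sketch of the chunk-and-wait pattern, using `np.array_split` so that list lengths that are not a multiple of 50 also work. The names `split_into_chunks`, `process_in_chunks`, `fetch`, and `wait_seconds` are hypothetical; the real Spotify call is replaced by a callback so that the block runs offline.

```python
import math
from time import sleep

import numpy as np

def split_into_chunks(song_ids, chunk_size=50):
    """Split a list of IDs into chunks of at most `chunk_size` items."""
    if not song_ids:
        return []
    n_chunks = math.ceil(len(song_ids) / chunk_size)
    # array_split accepts lengths that do not divide evenly into n_chunks.
    return [chunk.tolist() for chunk in np.array_split(np.array(song_ids), n_chunks)]

def process_in_chunks(song_ids, fetch, chunk_size=50, wait_seconds=20):
    """Call `fetch` on each chunk, pausing between chunks but not after the last one."""
    results = []
    id_chunks = split_into_chunks(song_ids, chunk_size)
    for index, chunk in enumerate(id_chunks):
        print(f"Collecting IDs for chunk {index}...")
        results.extend(fetch(chunk))
        if index < len(id_chunks) - 1:
            sleep(wait_seconds)
    return results

ids = [f"id_{i}" for i in range(120)]  # placeholder IDs
sizes = [len(chunk) for chunk in split_into_chunks(ids)]
print(sizes)  # [40, 40, 40]
```

`np.array_split` balances the chunks, so 120 IDs become three chunks of 40 rather than 50 + 50 + 20; no chunk exceeds `chunk_size`.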

Then, use this function to create a Pandas Dataframe with the audio features of all the songs in the databases. Hint: create a dictionary with the song's audio features as keys and an **empty list as values**. Then, fill in the lists with the corresponding audio features of each song. Finally, create a data frame with the audio features from the dictionary.
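
The hint above can be sketched as follows. The feature values are taken from the first two rows of the audio features table shown later in this notebook; the `None` entry illustrates the assumption that the API can return an empty entry for a track without audio features. `FEATURE_KEYS` and `features_to_frame` are hypothetical names, and only a subset of the features is used for brevity.

```python
import pandas as pd

FEATURE_KEYS = ["danceability", "energy", "tempo", "id"]  # subset, for brevity

def features_to_frame(features: list) -> pd.DataFrame:
    """Collect per-song feature dicts into one DataFrame, skipping missing entries."""
    collected = {key: [] for key in FEATURE_KEYS}  # feature name -> empty list
    for song in features:
        if song is None:  # no features available for this track
            continue
        for key in FEATURE_KEYS:
            collected[key].append(song[key])
    return pd.DataFrame(collected)

sample = [
    {"danceability": 0.734, "energy": 0.422, "tempo": 88.387, "id": "3ZDS2we70XCOEndkITg8Ft"},
    None,
    {"danceability": 0.484, "energy": 0.995, "tempo": 109.556, "id": "77v5yDwMLiJlIsHNJQxXrY"},
]

features_df = features_to_frame(sample)
print(features_df)
```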

* Once the previous function has been created, create another function **add_audio_features(df, audio_features_df)** to concat a given dataframe with the audio features dataframe and return the extended data frame.

* Finally, replace the old internal files of songs (hot and not hot) with the extended data frames with the audio features and save them into separate files on the disk.

* Remember to store your functions inside a "functions.py" library in order to be used by your final song recommender.
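
For the `add_audio_features` step, a plain column-wise concat only lines up when both dataframes hold the same songs in the same order. If some songs have no audio features, joining on the song ID is a safer alternative. The following is a minimal sketch with made-up data; the column names `song_id` and `id` are assumptions.

```python
import pandas as pd

songs = pd.DataFrame({
    "Song_title": ["A", "B", "C"],
    "song_id": ["id_a", "id_b", "id_c"],
})
# Features arrive in a different order, and "id_b" is missing.
audio = pd.DataFrame({"id": ["id_c", "id_a"], "tempo": [120.0, 90.0]})

# Join on the ID; validate raises an error if an ID appears more than once on either side.
extended = songs.merge(
    audio, left_on="song_id", right_on="id", how="inner", validate="one_to_one"
)
print(extended)
```

With `how="inner"`, songs without audio features drop out of the result instead of being paired with another song's values.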


In [7]:
# import libraries
import requests
import pandas as pd
import numpy as np 

In [8]:
# import not_hot_songs and hot_songs df:

not_hot_songs = pd.read_csv('./not_hot_song.csv')
display(not_hot_songs.head())

hot_songs = pd.read_csv('./billboard_top_100.csv')
display(hot_songs.head())


Unnamed: 0,artist_name,track_name
0,glenn miller,the little man who wasn't there
1,misfits,american psycho
2,elliott smith,somebody that i used to know
3,june carter cash,juke box blues
4,"emerson, lake & palmer","karn evil 9 1st impression, pt. 1"


Unnamed: 0,Song_title,Artist
0,Rockin' Around The Christmas Tree,Brenda Lee
1,All I Want For Christmas Is You,Mariah Carey
2,Jingle Bell Rock,Bobby Helms
3,Last Christmas,Wham!
4,A Holly Jolly Christmas,Burl Ives


In [413]:
# merge not_hot_song_3000 and hot_songs to find overlapping songs
merge = pd.merge(not_hot_songs,hot_songs,how='left', left_on='track_name', right_on='Song_title')
merge

Unnamed: 0,artist_name,track_name,Song_title,Artist
0,glenn miller,the little man who wasn't there,,
1,misfits,american psycho,,
2,elliott smith,somebody that i used to know,,
3,june carter cash,juke box blues,,
4,"emerson, lake & palmer","karn evil 9 1st impression, pt. 1",,
...,...,...,...,...
2995,guns n' roses,reckless life,,
2996,t. rex,pain and love,,
2997,greta van fleet,brave new world,,
2998,g. love & special sauce,i-76,,


In [414]:
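merge.isnull().sum() # no exact title match between not_hot_song_3000 and hot_songs; the comparison is case-sensitive (track_name is lowercase, Song_title is title case), so overlaps can be missed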

artist_name       0
track_name        0
Song_title     3000
Artist         3000
dtype: int64
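
Because `track_name` is lowercase and `Song_title` is title case, an exact merge cannot match them. A minimal sketch of a case-insensitive check on both title and artist follows. The function name `find_overlap` is hypothetical, and the sample rows are made up so that one overlap exists; they are not taken from the real files.

```python
import pandas as pd

def find_overlap(not_hot: pd.DataFrame, hot: pd.DataFrame) -> pd.DataFrame:
    """Return songs present in both frames, ignoring case and surrounding spaces."""
    left = not_hot.assign(
        _title=not_hot["track_name"].str.lower().str.strip(),
        _artist=not_hot["artist_name"].str.lower().str.strip(),
    )
    right = hot.assign(
        _title=hot["Song_title"].str.lower().str.strip(),
        _artist=hot["Artist"].str.lower().str.strip(),
    )
    merged = left.merge(right, on=["_title", "_artist"], how="inner")
    return merged.drop(columns=["_title", "_artist"])

# Made-up rows for illustration only.
not_hot_sample = pd.DataFrame({
    "artist_name": ["wham!", "june carter cash"],
    "track_name": ["last christmas", "juke box blues"],
})
hot_sample = pd.DataFrame({
    "Song_title": ["Last Christmas", "Jingle Bell Rock"],
    "Artist": ["Wham!", "Bobby Helms"],
})

overlap = find_overlap(not_hot_sample, hot_sample)
print(overlap)
```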

### START: BUILD FUNCTIONS FOR SONG RECOMMENDER


In [10]:
# import path and credentials config.py file

import sys
sys.path.append('/Users/minhnguyen/IronHack2023-2024/Bootcamp/')
from config_2 import *

# import libraries
import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials
from time import sleep

In [11]:
# Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=Client_ID, client_secret=Client_Secret))

In [1]:
%%writefile get_feature.py

#import libraries
import requests
import pandas as pd
import numpy as np 

import sys
sys.path.append('/Users/minhnguyen/IronHack2023-2024/Bootcamp/')
from config_2 import *

import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials
from time import sleep

# Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=Client_ID, client_secret=Client_Secret))


# BLOCK A: FUNCTION TO SEARCH SPOTIFY ID FOR A SONG WITH SONG TITLE AND ARTIST
# function search song:
# results = sp.search(q="track:'+Great Gatsby+' artist:'+Rod Wave'", limit=1)
def search_song(title:str, artist:str = None, limit:int = 1) ->str:
    """
    Searches for a song on Spotify based on the given title and optional artist.
    
    Parameters:
    - title (str): The title of the song to search for.
    - artist (str, optional): The artist of the song. If provided, the search is refined
      to match both the title and artist.
    - limit (int, optional): The maximum number of search results to retrieve. Default is 1.

    Returns:
    - str: The Spotify ID of the first matching song found in the search results.

    Note:
    - The function uses the Spotify API to perform the search.
    - If no match is found, an IndexError may occur. It is advisable to handle such cases
      when using this function.
    """
    # Build the query; add the artist filter only when an artist is given.
    query = f"track:{title}" if artist is None else f"track:{title} artist:{artist}"
    result = sp.search(q=query, limit=limit)
    # Raises IndexError when the search returns no matching track.
    song_id = result['tracks']['items'][0]['id']
    return song_id

# BLOCK B: FUNCTION TO SPLIT A LIST OR A DATAFRAME INTO SUBSETS OF ~50 ITEMS
# split list of song ids:
def chunks (song_ids, n:int =50)-> list:
    """
    Divides a sequence of song IDs into chunks of a specified size.

    Parameters:
    - song_ids (list or pandas.DataFrame): The sequence of song IDs to be divided into chunks.
      It can be either a list or a pandas DataFrame.
    - n (int, optional): The desired size of each chunk. Default is 50.

    Returns:
    - list: A list containing chunks of song IDs, where each chunk has a maximum size of 'n'.

    Note:
    - If 'song_ids' is a list, the chunks are created using list slicing.
    - If 'song_ids' is a pandas DataFrame, the chunks are created using DataFrame row slicing.
    - If 'song_ids' is smaller than 'n', a single chunk containing all elements is returned.
    """
    # A sequence that already fits in one chunk is returned as a single chunk.
    if len(song_ids) <= n:
        return [song_ids]
    if isinstance(song_ids, pd.DataFrame):
        return [song_ids.iloc[x:x+n] for x in range(0, len(song_ids), n)]
    # Lists and other sliceable sequences are split with plain slicing.
    return [song_ids[x:x+n] for x in range(0, len(song_ids), n)]


# BLOCK C: FUNCTION TO GET THE LIST OF SONG IDS
# getting list of spotify song ids 
def get_list_song_ids(df,col_1:str="Song_title", col_2:str="Artist" ):
    """
    Collects Spotify song IDs for a DataFrame containing song titles and optional artist information.

    Parameters:
    - df (pandas.DataFrame): The DataFrame containing song information, including titles and artists.
    - col_1 (str, optional): The column name for song titles. Default is "Song_title".
    - col_2 (str, optional): The column name for artist information. Default is "Artist".

    Returns:
    - tuple: A tuple containing two elements:
        - list: A list of the Spotify song IDs that were found for the provided DataFrame.
                Songs that were not found are left out of this list.
        - pandas.DataFrame: A cleaned DataFrame with added 'song_id' column.

    Note:
    - The function uses the 'search_song' function to retrieve Spotify song IDs.
    - It divides the DataFrame into chunks, performs Spotify searches for each chunk, and
      includes a sleep interval to avoid exceeding rate limits.
    - The resulting 'song_ids' list contains only non-None values (successfully found song IDs).
    - The 'clean_hot_song' DataFrame is the original DataFrame with rows containing
      None in the 'song_id' column dropped.
    """
    df2 = df.copy()
    hot_songs_dfs = chunks(df2)
    song_ids = []    
    for index, hot_songs_df in enumerate(hot_songs_dfs):        
        print(f'Collecting spotify song_id for chunk {index}')        
        for i,row in hot_songs_df.iterrows():        
            try:
                # search for spotify id
                song_id = search_song (row[col_1], row[col_2])
                song_ids.append(song_id)                
            except Exception:  # e.g. IndexError when the search returns no match
                print("Song not found")
                song_ids.append(None)                
        print("sleep a bit before getting the next chunk")  
        sleep(30)
    df2['song_id'] = song_ids
    song_ids_final = [value for value in song_ids if value is not None]
    clean_hot_song = df2.dropna(subset=['song_id'])  # drop only the rows without a Spotify ID
    return song_ids_final, clean_hot_song    

### BLOCK D: AUDIO feature: combine BLOCK 1 and BLOCK 2 in one FUNCTION
# getting audio features function from a list of 50 or 100 song ids:
def get_audio_features (song_ids:list):
    """
    Retrieves audio features from Spotify for a list of song IDs.

    Parameters:
    - song_ids (list): A list of Spotify song IDs for which audio features will be retrieved.

    Returns:
    - tuple: A tuple containing two elements:
        - dict: A dictionary containing audio features for each song ID. Keys include:
                'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
                'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type',
                'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature'.
        - pandas.DataFrame: A DataFrame representing the same audio features in a tabular format.

    Note:
    - The function uses the Spotify API to retrieve audio features for each song ID in the provided list.
    - It divides the list into chunks of 100 IDs to avoid exceeding rate limits.
    - Tracks for which no audio features are returned are skipped.
    - The resulting 'audio_features_dict' contains lists of values for each audio feature.
    - 'audio_features_df' is a DataFrame created from 'audio_features_dict' for tabular representation.
    - A rate-limited chunk (HTTP 429) is retried once, and the function waits between chunks.
    """
    sublists = chunks(song_ids, 100)
    audio_features_dict ={'danceability':[], 'energy':[], 'key':[], 'loudness':[], 'mode':[], 'speechiness':[], 'acousticness':[],'instrumentalness':[], 'liveness':[], 'valence':[], 'tempo':[], 'type':[], 'id':[], 'uri':[], 'track_href':[], 'analysis_url':[], 'duration_ms':[], 'time_signature':[]}
    for index, sublist in enumerate(sublists):
        print(f"Retrieving audio_features from chunk {index}")
        for attempt in range(2):  # the second attempt only runs after a rate-limit response
            try:
                audio_features = sp.audio_features(sublist)
                for feature in audio_features:
                    if feature is None:  # no audio features available for this track
                        continue
                    for key in audio_features_dict:
                        audio_features_dict[key].append(feature[key])
                break
            except spotipy.SpotifyException as e:
                # spotipy reports HTTP errors as SpotifyException; 429 means rate limited
                if e.http_status != 429:
                    raise
                headers = getattr(e, 'headers', None) or {}
                retry_after = int(headers.get('Retry-After', 1))
                print(f"Rate limited. Retrying after {retry_after} seconds.")
                sleep(retry_after + 1)
            except Exception as e:
                print(f"Failed to get audio features for some track IDs: {e}")
                break

        print("sleep a bit before getting the next chunk")  
        sleep(60)

    audio_features_df = pd.DataFrame(audio_features_dict)
    return audio_features_dict, audio_features_df

### BLOCK E: FUNCTION TO COMBINE DATAFRAME HOT SONG WITH DATAFRAME OF AUDIO FEATURES
# function to add audio features to song_name, artist df:
def add_audio_features (df1, df2, left_col, right_col, how = 'inner' ):
    """
    Adds audio features from one DataFrame to another based on specified columns.

    Parameters:
    - df1 (pandas.DataFrame): The left DataFrame to which audio features will be added.
    - df2 (pandas.DataFrame): The right DataFrame containing audio features to be added.
    - left_col (str): The column in df1 used for merging.
    - right_col (str): The column in df2 used for merging.
    - how (str, optional): The type of merge to perform. Default is 'inner'.

    Returns:
    - pandas.DataFrame: A new DataFrame resulting from the merge of df1 and df2 based on the specified columns.

    Note:
    - The function uses pandas' merge function to combine the two DataFrames.
    - 'left_col' and 'right_col' are used as the merging keys.
    - The resulting 'extended_df' DataFrame contains all columns from both DataFrames.
    """
    extended_df = pd.merge(df1, df2, left_on=left_col, right_on=right_col, how = how)
    return extended_df

Writing get_feature.py


In [12]:
### BLOCK A: FUNCTION TO SEARCH SPOTIFY ID FOR A SONG WITH SONG TITLE AND ARTIST
# function search song:
# results = sp.search(q="track:'+Great Gatsby+' artist:'+Rod Wave'", limit=1)
def search_song(title:str, artist:str = None, limit:int = 1) ->str:
    """
    Searches for a song on Spotify based on the given title and optional artist.
    
    Parameters:
    - title (str): The title of the song to search for.
    - artist (str, optional): The artist of the song. If provided, the search is refined
      to match both the title and artist.
    - limit (int, optional): The maximum number of search results to retrieve. Default is 1.

    Returns:
    - str: The Spotify ID of the first matching song found in the search results.

    Note:
    - The function uses the Spotify API to perform the search.
    - If no match is found, an IndexError may occur. It is advisable to handle such cases
      when using this function.
    """
    # Build the query; add the artist filter only when an artist is given.
    query = f"track:{title}" if artist is None else f"track:{title} artist:{artist}"
    result = sp.search(q=query, limit=limit)
    # Raises IndexError when the search returns no matching track.
    song_id = result['tracks']['items'][0]['id']
    return song_id
        

In [13]:
### BLOCK B: FUNCTION TO SPLIT A LIST OR A DATAFRAME INTO SUBSETS OF ~50 ITEMS
# split list of song ids:
def chunks (song_ids, n:int =50)-> list:
    """
    Divides a sequence of song IDs into chunks of a specified size.

    Parameters:
    - song_ids (list or pandas.DataFrame): The sequence of song IDs to be divided into chunks.
      It can be either a list or a pandas DataFrame.
    - n (int, optional): The desired size of each chunk. Default is 50.

    Returns:
    - list: A list containing chunks of song IDs, where each chunk has a maximum size of 'n'.

    Note:
    - If 'song_ids' is a list, the chunks are created using list slicing.
    - If 'song_ids' is a pandas DataFrame, the chunks are created using DataFrame row slicing.
    - If 'song_ids' is smaller than 'n', a single chunk containing all elements is returned.
    """
    # A sequence that already fits in one chunk is returned as a single chunk.
    if len(song_ids) <= n:
        return [song_ids]
    if isinstance(song_ids, pd.DataFrame):
        return [song_ids.iloc[x:x+n] for x in range(0, len(song_ids), n)]
    # Lists and other sliceable sequences are split with plain slicing.
    return [song_ids[x:x+n] for x in range(0, len(song_ids), n)]
    

In [14]:
### BLOCK C: FUNCTION TO GET THE LIST OF SONG IDS
# getting list of spotify song ids 
def get_list_song_ids(df,col_1:str="Song_title", col_2:str="Artist" ):
    """
    Collects Spotify song IDs for a DataFrame containing song titles and optional artist information.

    Parameters:
    - df (pandas.DataFrame): The DataFrame containing song information, including titles and artists.
    - col_1 (str, optional): The column name for song titles. Default is "Song_title".
    - col_2 (str, optional): The column name for artist information. Default is "Artist".

    Returns:
    - tuple: A tuple containing two elements:
        - list: A list of the Spotify song IDs that were found for the provided DataFrame.
                Songs that were not found are left out of this list.
        - pandas.DataFrame: A cleaned DataFrame with added 'song_id' column.

    Note:
    - The function uses the 'search_song' function to retrieve Spotify song IDs.
    - It divides the DataFrame into chunks, performs Spotify searches for each chunk, and
      includes a sleep interval to avoid exceeding rate limits.
    - The resulting 'song_ids' list contains only non-None values (successfully found song IDs).
    - The 'clean_hot_song' DataFrame is the original DataFrame with rows containing
      None in the 'song_id' column dropped.
    """
    df2 = df.copy()
    hot_songs_dfs = chunks(df2)
    song_ids = []    
    for index, hot_songs_df in enumerate(hot_songs_dfs):        
        print(f'Collecting spotify song_id for chunk {index}')        
        for i,row in hot_songs_df.iterrows():        
            try:
                # search for spotify id
                song_id = search_song (row[col_1], row[col_2])
                song_ids.append(song_id)                
            except Exception:  # e.g. IndexError when the search returns no match
                print("Song not found")
                song_ids.append(None)                
        print("sleep a bit before getting the next chunk")  
        sleep(30)
    df2['song_id'] = song_ids
    song_ids_final = [value for value in song_ids if value is not None]
    clean_hot_song = df2.dropna(subset=['song_id'])  # drop only the rows without a Spotify ID
    return song_ids_final, clean_hot_song    

In [15]:
### BLOCK D: AUDIO feature: combine BLOCK 1 and BLOCK 2 in one FUNCTION
# getting audio features function from a list of 50 or 100 song ids:
def get_audio_features (song_ids:list):
    """
    Retrieves audio features from Spotify for a list of song IDs.

    Parameters:
    - song_ids (list): A list of Spotify song IDs for which audio features will be retrieved.

    Returns:
    - tuple: A tuple containing two elements:
        - dict: A dictionary containing audio features for each song ID. Keys include:
                'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
                'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type',
                'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature'.
        - pandas.DataFrame: A DataFrame representing the same audio features in a tabular format.

    Note:
    - The function uses the Spotify API to retrieve audio features for each song ID in the provided list.
    - It divides the list into chunks of 100 IDs to avoid exceeding rate limits.
    - Tracks for which no audio features are returned are skipped.
    - The resulting 'audio_features_dict' contains lists of values for each audio feature.
    - 'audio_features_df' is a DataFrame created from 'audio_features_dict' for tabular representation.
    - A rate-limited chunk (HTTP 429) is retried once, and the function waits between chunks.
    """
    sublists = chunks(song_ids, 100)
    audio_features_dict ={'danceability':[], 'energy':[], 'key':[], 'loudness':[], 'mode':[], 'speechiness':[], 'acousticness':[],'instrumentalness':[], 'liveness':[], 'valence':[], 'tempo':[], 'type':[], 'id':[], 'uri':[], 'track_href':[], 'analysis_url':[], 'duration_ms':[], 'time_signature':[]}
    for index, sublist in enumerate(sublists):
        print(f"Retrieving audio_features from chunk {index}")
        for attempt in range(2):  # the second attempt only runs after a rate-limit response
            try:
                audio_features = sp.audio_features(sublist)
                for feature in audio_features:
                    if feature is None:  # no audio features available for this track
                        continue
                    for key in audio_features_dict:
                        audio_features_dict[key].append(feature[key])
                break
            except spotipy.SpotifyException as e:
                # spotipy reports HTTP errors as SpotifyException; 429 means rate limited
                if e.http_status != 429:
                    raise
                headers = getattr(e, 'headers', None) or {}
                retry_after = int(headers.get('Retry-After', 1))
                print(f"Rate limited. Retrying after {retry_after} seconds.")
                sleep(retry_after + 1)
            except Exception as e:
                print(f"Failed to get audio features for some track IDs: {e}")
                break

        print("sleep a bit before getting the next chunk")  
        sleep(60)

    audio_features_df = pd.DataFrame(audio_features_dict)
    return audio_features_dict, audio_features_df


In [16]:
### BLOCK E: FUNCTION TO COMBINE DATAFRAME HOT SONG WITH DATAFRAME OF AUDIO FEATURES
# function to add audio features to song_name, artist df:
def add_audio_features (df1, df2, left_col, right_col, how = 'inner' ):
    """
    Adds audio features from one DataFrame to another based on specified columns.

    Parameters:
    - df1 (pandas.DataFrame): The left DataFrame to which audio features will be added.
    - df2 (pandas.DataFrame): The right DataFrame containing audio features to be added.
    - left_col (str): The column in df1 used for merging.
    - right_col (str): The column in df2 used for merging.
    - how (str, optional): The type of merge to perform. Default is 'inner'.

    Returns:
    - pandas.DataFrame: A new DataFrame resulting from the merge of df1 and df2 based on the specified columns.

    Note:
    - The function uses pandas' merge function to combine the two DataFrames.
    - 'left_col' and 'right_col' are used as the merging keys.
    - The resulting 'extended_df' DataFrame contains all columns from both DataFrames.
    """
    extended_df = pd.merge(df1, df2, left_on=left_col, right_on=right_col, how = how)
    return extended_df

In [17]:
# getting lists of song IDs and clean dataframes (with columns: song name, artist, song id) for hot_songs and not_hot_songs
hot_song_ids_list, clean_hot_song = get_list_song_ids(hot_songs)
not_hot_song_ids_list, clean_not_hot_song = get_list_song_ids(not_hot_songs, 'track_name', 'artist_name')

Collecting spotify song_id for chunk 0
Song not found
Song not found
Song not found
Song not found
Song not found
sleep a bit before getting the next chunk
Collecting spotify song_id for chunk 1
Song not found
Song not found
Song not found
Song not found
Song not found
Song not found
Song not found
Song not found
Song not found
Song not found
Song not found
sleep a bit before getting the next chunk


In [19]:
# check output clean_hot_song dataframe
clean_hot_song


Unnamed: 0,Song_title,Artist,song_id
0,Rockin' Around The Christmas Tree,Brenda Lee,2EjXfH91m7f8HiJN1yQg97
1,All I Want For Christmas Is You,Mariah Carey,0bYg9bo50gSsH3LtXe2SQn
2,Jingle Bell Rock,Bobby Helms,7vQbuQcyTflfCIOu3Uzzya
3,Last Christmas,Wham!,2FRnf9qhLbvw8fu4IBXx78
4,A Holly Jolly Christmas,Burl Ives,77khP2fIVhSW23NwxrRluh
...,...,...,...
93,Feather,Sabrina Carpenter,2Zo1PcszsT9WQ0ANntJbID
94,Can't Catch Me Now,Olivia Rodrigo,56xHMIfQPoe0prrSi3BGhf
95,El Amor de Su Vida,Grupo Frontera & Grupo Firme,0O3U5iwTbiXCREMkvotJuN
96,Standing Next To You,Jung Kook,2KslE17cAJNHTsI2MI0jb2


In [None]:
# saving outputs: list of ids and dataframe for not_hot_song and hot_song

import pickle

with open('not_hot_list.pkl', 'wb') as file:
    pickle.dump(not_hot_song_ids_list,file)

with open('hot_list.pkl', 'wb') as file:
    pickle.dump(hot_song_ids_list,file)

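# index=False avoids writing the row index as an extra "Unnamed: 0" column
clean_hot_song.to_csv('clean_hot_song.csv', index=False)
clean_not_hot_song.to_csv('clean_not_hot_song.csv', index=False)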


In [424]:
# check how many song ids retrieved for the hot_song and not_hot_song
display(len(not_hot_song_ids_list))
len(hot_song_ids_list)

2823

84

In [426]:
# getting audio features list and dataframes for hot_songs and not_hot_songs
audio_features_list_hot, audio_features_df_hot = get_audio_features(hot_song_ids_list)
audio_features_list_not_hot, audio_features_df_not_hot = get_audio_features(not_hot_song_ids_list)

Retrieving audio_features from chunk 0
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 0
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 1
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 2
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 3
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 4
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 5
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 6
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 7
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 8
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 9
sleep a bit before getting the next chunk
Retrieving audio_features from chunk 10
sleep a bit before getting the next chunk
Retrieving audio_features f

In [428]:
# One chunk failed while getting audio features because the connection was aborted, so I check how audio_features_df_not_hot looks and how long it is.
# The shape is 2723 rows x 18 columns: 2823 requested IDs minus the one failed chunk of 100 IDs.
audio_features_df_not_hot

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.734,0.422,10,-10.772,1,0.1750,0.868000,0.037100,0.1140,0.5770,88.387,audio_features,3ZDS2we70XCOEndkITg8Ft,spotify:track:3ZDS2we70XCOEndkITg8Ft,https://api.spotify.com/v1/tracks/3ZDS2we70XCO...,https://api.spotify.com/v1/audio-analysis/3ZDS...,182827,4
1,0.484,0.995,7,-5.300,1,0.1020,0.000248,0.001130,0.3810,0.0771,109.556,audio_features,77v5yDwMLiJlIsHNJQxXrY,spotify:track:77v5yDwMLiJlIsHNJQxXrY,https://api.spotify.com/v1/tracks/77v5yDwMLiJl...,https://api.spotify.com/v1/audio-analysis/77v5...,126907,4
2,0.576,0.367,1,-14.785,1,0.0300,0.432000,0.000002,0.1240,0.7700,108.446,audio_features,4xfAVJL8R7mVYbDk8a9xOY,spotify:track:4xfAVJL8R7mVYbDk8a9xOY,https://api.spotify.com/v1/tracks/4xfAVJL8R7mV...,https://api.spotify.com/v1/audio-analysis/4xfA...,129333,4
3,0.854,0.356,6,-8.780,1,0.0609,0.863000,0.000675,0.1310,0.8450,113.283,audio_features,5x46pk7OZUv1WWiP6iEdMy,spotify:track:5x46pk7OZUv1WWiP6iEdMy,https://api.spotify.com/v1/tracks/5x46pk7OZUv1...,https://api.spotify.com/v1/audio-analysis/5x46...,138600,4
4,0.384,0.817,1,-9.079,1,0.0661,0.267000,0.001910,0.0756,0.6850,128.490,audio_features,6WfjV745TiUcD8KdUKp241,spotify:track:6WfjV745TiUcD8KdUKp241,https://api.spotify.com/v1/tracks/6WfjV745TiUc...,https://api.spotify.com/v1/audio-analysis/6Wfj...,285760,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2718,0.442,0.888,10,-8.820,0,0.0663,0.000054,0.853000,0.9330,0.4620,108.024,audio_features,7I2Q7AQqVInZYwOoDHZGOf,spotify:track:7I2Q7AQqVInZYwOoDHZGOf,https://api.spotify.com/v1/tracks/7I2Q7AQqVInZ...,https://api.spotify.com/v1/audio-analysis/7I2Q...,200600,4
2719,0.588,0.629,0,-10.226,0,0.0318,0.055500,0.000000,0.3110,0.7630,136.091,audio_features,5Qe7x9x8mfl2YcbYTP5yRM,spotify:track:5Qe7x9x8mfl2YcbYTP5yRM,https://api.spotify.com/v1/tracks/5Qe7x9x8mfl2...,https://api.spotify.com/v1/audio-analysis/5Qe7...,221440,4
2720,0.341,0.848,7,-4.475,1,0.0371,0.058300,0.129000,0.1360,0.5890,136.568,audio_features,0HwdiT4MVRzY0OALcjq0Oh,spotify:track:0HwdiT4MVRzY0OALcjq0Oh,https://api.spotify.com/v1/tracks/0HwdiT4MVRzY...,https://api.spotify.com/v1/audio-analysis/0Hwd...,300880,4
2721,0.643,0.713,11,-6.716,0,0.2150,0.138000,0.000000,0.5650,0.4920,103.942,audio_features,40c4Zo0r2HH9aKDgEsbsVA,spotify:track:40c4Zo0r2HH9aKDgEsbsVA,https://api.spotify.com/v1/tracks/40c4Zo0r2HH9...,https://api.spotify.com/v1/audio-analysis/40c4...,225640,4
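
A minimal sketch for finding which IDs were lost in the failed chunk, so that only those need to be requested again. The function name `missing_ids` is hypothetical, and the sample IDs and values are placeholders.

```python
import pandas as pd

def missing_ids(requested_ids, audio_features_df):
    """Return the requested IDs that have no row in audio_features_df, in original order."""
    found = set(audio_features_df["id"])
    return [song_id for song_id in requested_ids if song_id not in found]

# Placeholder data: four IDs were requested, features came back for two of them.
requested = ["id_0", "id_1", "id_2", "id_3"]
fetched = pd.DataFrame({"id": ["id_0", "id_3"], "tempo": [88.4, 103.9]})

to_retry = missing_ids(requested, fetched)
print(to_retry)  # ['id_1', 'id_2']
```

In this notebook, the arguments would be `not_hot_song_ids_list` and `audio_features_df_not_hot`, and the result could be passed to `get_audio_features` again.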


In [429]:
# getting extended dataframes for hot_songs and not_hot_songs
extended_df_hot = add_audio_features(clean_hot_song,audio_features_df_hot, 'song_id', 'id' )
extended_df_not_hot = add_audio_features(clean_not_hot_song,audio_features_df_not_hot, 'song_id', 'id' )

In [452]:
# check dataframe extended_df_hot
extended_df_hot

Unnamed: 0,Song_title,Artist,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Rockin' Around The Christmas Tree,Brenda Lee,2EjXfH91m7f8HiJN1yQg97,0.589,0.472,8,-8.749,1,0.0502,0.6140,...,0.5050,0.898,67.196,audio_features,2EjXfH91m7f8HiJN1yQg97,spotify:track:2EjXfH91m7f8HiJN1yQg97,https://api.spotify.com/v1/tracks/2EjXfH91m7f8...,https://api.spotify.com/v1/audio-analysis/2EjX...,126267,4
1,All I Want For Christmas Is You,Mariah Carey,0bYg9bo50gSsH3LtXe2SQn,0.336,0.627,7,-7.463,1,0.0384,0.1640,...,0.0708,0.350,150.273,audio_features,0bYg9bo50gSsH3LtXe2SQn,spotify:track:0bYg9bo50gSsH3LtXe2SQn,https://api.spotify.com/v1/tracks/0bYg9bo50gSs...,https://api.spotify.com/v1/audio-analysis/0bYg...,241107,4
2,Jingle Bell Rock,Bobby Helms,7vQbuQcyTflfCIOu3Uzzya,0.754,0.424,2,-8.463,1,0.0363,0.6430,...,0.0652,0.806,119.705,audio_features,7vQbuQcyTflfCIOu3Uzzya,spotify:track:7vQbuQcyTflfCIOu3Uzzya,https://api.spotify.com/v1/tracks/7vQbuQcyTflf...,https://api.spotify.com/v1/audio-analysis/7vQb...,130973,4
3,Last Christmas,Wham!,2FRnf9qhLbvw8fu4IBXx78,0.735,0.478,2,-12.472,1,0.0293,0.1890,...,0.3550,0.947,107.682,audio_features,2FRnf9qhLbvw8fu4IBXx78,spotify:track:2FRnf9qhLbvw8fu4IBXx78,https://api.spotify.com/v1/tracks/2FRnf9qhLbvw...,https://api.spotify.com/v1/audio-analysis/2FRn...,262960,4
4,A Holly Jolly Christmas,Burl Ives,77khP2fIVhSW23NwxrRluh,0.683,0.375,0,-13.056,1,0.0303,0.5790,...,0.0760,0.888,140.467,audio_features,77khP2fIVhSW23NwxrRluh,spotify:track:77khP2fIVhSW23NwxrRluh,https://api.spotify.com/v1/tracks/77khP2fIVhSW...,https://api.spotify.com/v1/audio-analysis/77kh...,135533,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,Feather,Sabrina Carpenter,2Zo1PcszsT9WQ0ANntJbID,0.787,0.686,6,-4.370,0,0.0339,0.0893,...,0.0927,0.836,123.510,audio_features,2Zo1PcszsT9WQ0ANntJbID,spotify:track:2Zo1PcszsT9WQ0ANntJbID,https://api.spotify.com/v1/tracks/2Zo1PcszsT9W...,https://api.spotify.com/v1/audio-analysis/2Zo1...,185553,4
80,Can't Catch Me Now,Olivia Rodrigo,56xHMIfQPoe0prrSi3BGhf,0.521,0.366,11,-7.698,0,0.0340,0.8380,...,0.1200,0.225,141.965,audio_features,56xHMIfQPoe0prrSi3BGhf,spotify:track:56xHMIfQPoe0prrSi3BGhf,https://api.spotify.com/v1/tracks/56xHMIfQPoe0...,https://api.spotify.com/v1/audio-analysis/56xH...,205484,4
81,El Amor de Su Vida,Grupo Frontera & Grupo Firme,0O3U5iwTbiXCREMkvotJuN,0.616,0.834,9,-3.069,1,0.0664,0.0600,...,0.3380,0.746,151.701,audio_features,0O3U5iwTbiXCREMkvotJuN,spotify:track:0O3U5iwTbiXCREMkvotJuN,https://api.spotify.com/v1/tracks/0O3U5iwTbiXC...,https://api.spotify.com/v1/audio-analysis/0O3U...,165848,4
82,Standing Next To You,Jung Kook,2KslE17cAJNHTsI2MI0jb2,0.711,0.809,2,-4.389,0,0.0955,0.0447,...,0.3390,0.816,106.017,audio_features,2KslE17cAJNHTsI2MI0jb2,spotify:track:2KslE17cAJNHTsI2MI0jb2,https://api.spotify.com/v1/tracks/2KslE17cAJNH...,https://api.spotify.com/v1/audio-analysis/2Ksl...,206020,4


In [436]:
# check dataframe extended_df_not_hot
extended_df_not_hot

Unnamed: 0,artist_name,track_name,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,glenn miller,the little man who wasn't there,3ZDS2we70XCOEndkITg8Ft,0.734,0.422,10.0,-10.772,1.0,0.1750,0.868000,...,0.1140,0.5770,88.387,audio_features,3ZDS2we70XCOEndkITg8Ft,spotify:track:3ZDS2we70XCOEndkITg8Ft,https://api.spotify.com/v1/tracks/3ZDS2we70XCO...,https://api.spotify.com/v1/audio-analysis/3ZDS...,182827.0,4.0
1,misfits,american psycho,77v5yDwMLiJlIsHNJQxXrY,0.484,0.995,7.0,-5.300,1.0,0.1020,0.000248,...,0.3810,0.0771,109.556,audio_features,77v5yDwMLiJlIsHNJQxXrY,spotify:track:77v5yDwMLiJlIsHNJQxXrY,https://api.spotify.com/v1/tracks/77v5yDwMLiJl...,https://api.spotify.com/v1/audio-analysis/77v5...,126907.0,4.0
2,elliott smith,somebody that i used to know,4xfAVJL8R7mVYbDk8a9xOY,0.576,0.367,1.0,-14.785,1.0,0.0300,0.432000,...,0.1240,0.7700,108.446,audio_features,4xfAVJL8R7mVYbDk8a9xOY,spotify:track:4xfAVJL8R7mVYbDk8a9xOY,https://api.spotify.com/v1/tracks/4xfAVJL8R7mV...,https://api.spotify.com/v1/audio-analysis/4xfA...,129333.0,4.0
3,june carter cash,juke box blues,5x46pk7OZUv1WWiP6iEdMy,0.854,0.356,6.0,-8.780,1.0,0.0609,0.863000,...,0.1310,0.8450,113.283,audio_features,5x46pk7OZUv1WWiP6iEdMy,spotify:track:5x46pk7OZUv1WWiP6iEdMy,https://api.spotify.com/v1/tracks/5x46pk7OZUv1...,https://api.spotify.com/v1/audio-analysis/5x46...,138600.0,4.0
4,"emerson, lake & palmer","karn evil 9 1st impression, pt. 1",6WfjV745TiUcD8KdUKp241,0.384,0.817,1.0,-9.079,1.0,0.0661,0.267000,...,0.0756,0.6850,128.490,audio_features,6WfjV745TiUcD8KdUKp241,spotify:track:6WfjV745TiUcD8KdUKp241,https://api.spotify.com/v1/tracks/6WfjV745TiUc...,https://api.spotify.com/v1/audio-analysis/6Wfj...,285760.0,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2822,guns n' roses,reckless life,7I2Q7AQqVInZYwOoDHZGOf,0.442,0.888,10.0,-8.820,0.0,0.0663,0.000054,...,0.9330,0.4620,108.024,audio_features,7I2Q7AQqVInZYwOoDHZGOf,spotify:track:7I2Q7AQqVInZYwOoDHZGOf,https://api.spotify.com/v1/tracks/7I2Q7AQqVInZ...,https://api.spotify.com/v1/audio-analysis/7I2Q...,200600.0,4.0
2823,t. rex,pain and love,5Qe7x9x8mfl2YcbYTP5yRM,0.588,0.629,0.0,-10.226,0.0,0.0318,0.055500,...,0.3110,0.7630,136.091,audio_features,5Qe7x9x8mfl2YcbYTP5yRM,spotify:track:5Qe7x9x8mfl2YcbYTP5yRM,https://api.spotify.com/v1/tracks/5Qe7x9x8mfl2...,https://api.spotify.com/v1/audio-analysis/5Qe7...,221440.0,4.0
2824,greta van fleet,brave new world,0HwdiT4MVRzY0OALcjq0Oh,0.341,0.848,7.0,-4.475,1.0,0.0371,0.058300,...,0.1360,0.5890,136.568,audio_features,0HwdiT4MVRzY0OALcjq0Oh,spotify:track:0HwdiT4MVRzY0OALcjq0Oh,https://api.spotify.com/v1/tracks/0HwdiT4MVRzY...,https://api.spotify.com/v1/audio-analysis/0Hwd...,300880.0,4.0
2825,g. love & special sauce,i-76,40c4Zo0r2HH9aKDgEsbsVA,0.643,0.713,11.0,-6.716,0.0,0.2150,0.138000,...,0.5650,0.4920,103.942,audio_features,40c4Zo0r2HH9aKDgEsbsVA,spotify:track:40c4Zo0r2HH9aKDgEsbsVA,https://api.spotify.com/v1/tracks/40c4Zo0r2HH9...,https://api.spotify.com/v1/audio-analysis/40c4...,225640.0,4.0
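
In `extended_df_not_hot` the columns `key`, `mode`, `duration_ms` and `time_signature` show up as floats (`10.0`, `182827.0`), and the table has more rows (index up to 2825) than the audio features table (index up to 2721). pandas upcasts integer columns to float when they contain missing values, so this is probably the trace of songs that got no audio features. A minimal sketch of removing such rows and restoring the integer types, using a small made-up dataframe (`demo` and its second row are assumptions for illustration, not the real data):

```python
import pandas as pd

# Hypothetical miniature of extended_df_not_hot: the second row has no match.
demo = pd.DataFrame({
    "track_name": ["american psycho", "unknown song"],
    "song_id": ["77v5yDwMLiJlIsHNJQxXrY", ""],
    "key": [7.0, None],
    "mode": [1.0, None],
    "duration_ms": [126907.0, None],
    "time_signature": [4.0, None],
})

int_cols = ["key", "mode", "duration_ms", "time_signature"]

# Drop songs without audio features, then cast the columns back to int.
cleaned = demo.dropna(subset=int_cols)
cleaned = cleaned.astype({col: int for col in int_cols})
```

The same two steps could be applied to `extended_df_not_hot` before saving it, if the missing rows are indeed unmatched songs.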


In [None]:
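# save the hot_songs and not_hot_songs dataframes with audio features
# index=False avoids an extra unnamed index column when the files are read back
extended_df_hot.to_csv('extended_hot_song.csv', index=False)
extended_df_not_hot.to_csv('extended_not_hot_song.csv', index=False)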

# Exploratory code

In [None]:
# test Spotify search (field filters are written as track:<title> artist:<name>)
results = sp.search(q="track:Great Gatsby artist:Rod Wave", limit=1)
results
# type(results)

In [None]:
results.keys()

In [None]:
results['tracks']['items'][0]['id']
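
Indexing `['items'][0]` raises an `IndexError` when the search finds nothing. A minimal sketch of a helper that returns an empty string in that case, as the lab instructions ask. The name `first_track_id` and the two sample responses are assumptions for illustration; only the `tracks` / `items` / `id` nesting is taken from the cell above.

```python
def first_track_id(results):
    """Return the ID of the first track in a search response, or "" if none."""
    items = results.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else ""


# Hand-written stand-ins for sp.search(...) responses.
found = {"tracks": {"items": [{"id": "4M68xjcc42oxyphhzpOWXS"}]}}
empty = {"tracks": {"items": []}}

print(first_track_id(found))
print(repr(first_track_id(empty)))
```

Returning `""` instead of raising keeps the list of IDs aligned with the rows of the dataframe, so the empty entries can be filtered out afterwards.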

In [None]:
# audio_features returns a list with one dict per track ID; keep the first one
audio = sp.audio_features("4M68xjcc42oxyphhzpOWXS")[0]
# add an extra key to check that it shows up as a column
audio['test'] = 'minh'
# a list of dicts becomes one dataframe row per dict
data = [audio, audio]
pd.DataFrame(data)

In [325]:
audio

[{'danceability': 0.67,
  'energy': 0.555,
  'key': 1,
  'loudness': -9.12,
  'mode': 0,
  'speechiness': 0.0908,
  'acousticness': 0.0382,
  'instrumentalness': 6.36e-06,
  'liveness': 0.414,
  'valence': 0.16,
  'tempo': 156.937,
  'type': 'audio_features',
  'id': '4M68xjcc42oxyphhzpOWXS',
  'uri': 'spotify:track:4M68xjcc42oxyphhzpOWXS',
  'track_href': 'https://api.spotify.com/v1/tracks/4M68xjcc42oxyphhzpOWXS',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/4M68xjcc42oxyphhzpOWXS',
  'duration_ms': 146808,
  'time_signature': 4},
 {'danceability': 0.529,
  'energy': 0.851,
  'key': 6,
  'loudness': -7.005,
  'mode': 1,
  'speechiness': 0.0315,
  'acousticness': 0.000241,
  'instrumentalness': 0.000256,
  'liveness': 0.114,
  'valence': 0.534,
  'tempo': 84.015,
  'type': 'audio_features',
  'id': '3CqdEcYe3xWopkzxQxiyIK',
  'uri': 'spotify:track:3CqdEcYe3xWopkzxQxiyIK',
  'track_href': 'https://api.spotify.com/v1/tracks/3CqdEcYe3xWopkzxQxiyIK',
  'analysis_url': 'http

In [337]:
l

{'danceability': [0.67, 0.529],
 'energy': [0.555, 0.851],
 'key': [1, 6],
 'loudness': [-9.12, -7.005],
 'mode': [0, 1],
 'speechiness': [0.0908, 0.0315],
 'acousticness': [0.0382, 0.000241],
 'instrumentalness': [6.36e-06, 0.000256],
 'liveness': [0.414, 0.114],
 'valence': [0.16, 0.534],
 'tempo': [156.937, 84.015],
 'type': ['audio_features', 'audio_features'],
 'id': ['4M68xjcc42oxyphhzpOWXS', '3CqdEcYe3xWopkzxQxiyIK'],
 'uri': ['spotify:track:4M68xjcc42oxyphhzpOWXS',
  'spotify:track:3CqdEcYe3xWopkzxQxiyIK'],
 'track_href': ['https://api.spotify.com/v1/tracks/4M68xjcc42oxyphhzpOWXS',
  'https://api.spotify.com/v1/tracks/3CqdEcYe3xWopkzxQxiyIK'],
 'analysis_url': ['https://api.spotify.com/v1/audio-analysis/4M68xjcc42oxyphhzpOWXS',
  'https://api.spotify.com/v1/audio-analysis/3CqdEcYe3xWopkzxQxiyIK'],
 'duration_ms': [146808, 235957],
 'time_signature': [4, 4]}

In [338]:
pd.DataFrame(l)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.67,0.555,1,-9.12,0,0.0908,0.0382,6e-06,0.414,0.16,156.937,audio_features,4M68xjcc42oxyphhzpOWXS,spotify:track:4M68xjcc42oxyphhzpOWXS,https://api.spotify.com/v1/tracks/4M68xjcc42ox...,https://api.spotify.com/v1/audio-analysis/4M68...,146808,4
1,0.529,0.851,6,-7.005,1,0.0315,0.000241,0.000256,0.114,0.534,84.015,audio_features,3CqdEcYe3xWopkzxQxiyIK,spotify:track:3CqdEcYe3xWopkzxQxiyIK,https://api.spotify.com/v1/tracks/3CqdEcYe3xWo...,https://api.spotify.com/v1/audio-analysis/3Cqd...,235957,4
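
One way to get from a list of per-track dictionaries (like `audio` above) to a column-wise dictionary (like `l`) is a dict comprehension over the keys of the first entry. This is only a sketch with three of the keys copied from the output above; the variable names are assumptions. It also shows that `pd.DataFrame` accepts the list of dictionaries directly, so the intermediate step is optional.

```python
import pandas as pd

# Two shortened feature dictionaries, values copied from the output above.
features = [
    {"danceability": 0.67, "energy": 0.555, "id": "4M68xjcc42oxyphhzpOWXS"},
    {"danceability": 0.529, "energy": 0.851, "id": "3CqdEcYe3xWopkzxQxiyIK"},
]

# Column-wise view: one list of values per feature name.
columns = {key: [row[key] for row in features] for key in features[0]}

from_columns = pd.DataFrame(columns)   # dict of lists -> dataframe
from_rows = pd.DataFrame(features)     # list of dicts -> same dataframe
```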


In [None]:
# the track: filter matches song titles, not IDs, so look the ID up directly
sp.track("4M68xjcc42oxyphhzpOWXS")

In [None]:
df_test_1 = hot_songs.sample(n=5, random_state=1)
df_test_2 = hot_songs.sample(n=5, random_state=2)

In [None]:
song_ids, song_ids_df, hot_song_ids_df = get_list_song_ids(df_test_1)

In [None]:
song_ids

In [None]:
hot_song_ids_df

In [None]:
x = get_audio_features(song_ids)

In [None]:
x
x_df = pd.DataFrame(x)
x_df

In [None]:
last = add_audio_features(hot_song_ids_df, x_df,'song_id')
last

In [None]:
# test code:
from time import sleep

test = [df_test_1, df_test_2]

song_ids = []
for hot_songs_df in test:
    for index, row in hot_songs_df.iterrows():
        try:
            song_id = search_song(row['Song_title'], row['Artist'])
            song_ids.append(song_id)
        except Exception:
            print("Song not found")
            song_ids.append("")
    print("sleep a bit before getting the next chunk")
    sleep(20)
song_ids
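
The same chunk-and-pause idea can be written as a small reusable helper. The sketch below assumes a limit of 100 IDs per audio-features request and uses a fake fetch function in place of `sp.audio_features`, so it runs without API credentials. `chunked`, `fetch_in_chunks` and `fake_fetch` are hypothetical names, not part of spotipy.

```python
from time import sleep


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_chunks(ids, fetch, size=100, pause=0.0):
    """Call `fetch` once per chunk of IDs and collect all returned rows."""
    collected = []
    for chunk in chunked(ids, size):
        collected.extend(fetch(chunk))
        sleep(pause)  # pause between requests to stay under the rate limit
    return collected


def fake_fetch(chunk):
    """Stand-in for sp.audio_features: one dict per ID."""
    return [{"id": track_id} for track_id in chunk]


rows = fetch_in_chunks([str(n) for n in range(250)], fake_fetch, size=100)
```

With a real client, `fetch` would be `sp.audio_features` and `pause` a few seconds, after removing the empty IDs of songs that were not found.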