# Song Recommender: Web scraping & Spotify API

#### Ironhack August 2022

This project builds a song recommender based on the audio features that Spotify provides for each track. In this notebook, our main goal is to build two song datasets (one from the weekly Billboard Hot 100, the other from a Kaggle dataset), get the audio features for each song and create clusters to categorize the songs.

The following code is divided into two parts:

- Web scraping the Billboard Hot 100 and cleaning the Kaggle dataset
- Retrieving song IDs and audio features from Spotify

#### Libraries:

In [20]:
from functions import *
import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np
import yaml

from config import *
import spotipy
import time
from spotipy.oauth2 import SpotifyClientCredentials

Getting parameters from YAML file:

In [21]:
try:
    with open("params.yaml", "r") as file:
        config = yaml.safe_load(file)
except Exception as e:
    # Show the cause: the cells below need `config`, so this must succeed first
    print('Error reading the config file:', e)
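
The later cells read file paths from `config['data']`. A minimal sketch of the layout this implies is shown below. The key names are the ones used in this notebook, while every file path is a placeholder assumed for illustration and not the project's real value.

```python
import yaml

# Hypothetical contents of params.yaml: the keys match those read later in
# this notebook, but every path below is an assumed placeholder.
example_params = """
data:
  hot_songs_file: data/hot_songs.csv
  hot_songs_features: data/hot_songs_features.csv
  not_hot_songs_file: data/not_hot_songs.csv
  not_hot_songs_features: data/not_hot_songs_features.csv
  combined_data: data/combined_songs.csv
"""

# safe_load parses the YAML text into nested dictionaries
example_config = yaml.safe_load(example_params)
print(example_config['data']['hot_songs_file'])
```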

## Web Scraping and Data Cleaning

We start by scraping the Billboard website to get the Hot 100 chart. The function scrap_hot100 takes the chart URL as a parameter, writes the result to a .csv file and returns the resulting dataframe.

In [3]:
hot_songs = scrap_hot100('https://www.billboard.com/charts/hot-100/')
hot_songs.head()

Unnamed: 0,title,artist
0,Super Freaky Girl,Nicki Minaj
1,As It Was,Harry Styles
2,About Damn Time,Lizzo
3,Break My Soul,Beyonce
4,Running Up That Hill (A Deal With God),Kate Bush
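
The parsing step behind a scraper like this can be sketched as follows. This is not the project's `scrap_hot100`: the HTML snippet, its tag and class names, and the `parse_chart` helper are all assumptions made for illustration, and Billboard's real markup is different and changes over time.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in HTML for illustration only; the tag and class names are assumed
# and do not reflect Billboard's actual page structure.
SAMPLE_HTML = """
<ul>
  <li class="chart-row"><h3 class="title">As It Was</h3><span class="artist">Harry Styles</span></li>
  <li class="chart-row"><h3 class="title">About Damn Time</h3><span class="artist">Lizzo</span></li>
</ul>
"""

def parse_chart(html):
    """Return a DataFrame with one title/artist row per chart entry in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for entry in soup.select("li.chart-row"):
        title = entry.select_one("h3.title").get_text(strip=True)
        artist = entry.select_one("span.artist").get_text(strip=True)
        rows.append({"title": title, "artist": artist})
    return pd.DataFrame(rows, columns=["title", "artist"])

chart = parse_chart(SAMPLE_HTML)
print(chart)
```

With a live page, the `html` argument would come from something like `requests.get(url).text`, and the selectors would need to be adapted to the markup actually served.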


Now we clean the Kaggle dataset. We selected a sample of 20,000 songs and expect about 6,000 songs to remain after cleaning.

In [4]:
not_hot_songs = NotHotSongs()

(5946, 2)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  nothot.drop(['user_id', 'song_id', 'listen_count', 'song'], axis = 1, inplace = True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  nothot.drop_duplicates(inplace = True)
  nothot['title']= nothot['title'].str.replace('\\', '')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  nothot['title']= nothot['title'].str.replace('\\', '')
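
The pandas warnings above ("A value is trying to be set on a copy of a slice from a DataFrame", i.e. `SettingWithCopyWarning`) are raised when a dataframe derived from another one is modified in place. One common way to avoid them is to take an explicit copy and chain calls that return new dataframes. The sketch below uses the column names visible in the warning text; the sample rows and the `clean_sample` helper are hypothetical and are not the project's `NotHotSongs`.

```python
import pandas as pd

# Tiny made-up table with the same columns as the Kaggle data
raw = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "song_id": ["s1", "s1", "s2"],
    "listen_count": [3, 1, 5],
    "title": ["Rio", "Rio", "Charmless\\ Man"],
    "artist": ["Duran Duran", "Duran Duran", "Blur"],
    "song": ["Rio - Duran Duran", "Rio - Duran Duran", "Charmless Man - Blur"],
})

def clean_sample(df, n_rows):
    """Keep title and artist for the first n_rows rows, without duplicates."""
    # .copy() makes the sample independent of df, so later edits do not warn
    sample = df.head(n_rows).copy()
    sample = (
        sample
        .drop(columns=["user_id", "song_id", "listen_count", "song"])
        .drop_duplicates()
        .reset_index(drop=True)
    )
    # regex=False treats the backslash as a literal character to remove
    sample["title"] = sample["title"].str.replace("\\", "", regex=False)
    return sample

cleaned = clean_sample(raw, 3)
print(cleaned.shape)
```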

In [None]:
print('Shape of the Kaggle dataframe is:', not_hot_songs.shape)
not_hot_songs.head()

# ----------------------------------------------------------------------------------------------------- #

## Spotify API request

Next, we connect to the Spotify API using the Spotipy library. 
The API enforces a rate limit, so we split the data into chunks and run the queries at 30-second intervals. 
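
The split-and-wait idea can be sketched as below. The helpers `split_into_chunks` and `run_in_batches` are hypothetical stand-ins written for this illustration; they are not the functions from the project's `functions` module, whose exact behaviour is not shown in this notebook.

```python
import time
import pandas as pd

def split_into_chunks(df, n_chunks):
    """Split df by row position into at most n_chunks pieces of similar size."""
    size = max(1, -(-len(df) // n_chunks))  # ceiling division, at least 1 row
    return [df.iloc[start:start + size] for start in range(0, len(df), size)]

def run_in_batches(chunks, process_chunk, pause_seconds=30):
    """Apply process_chunk to each chunk, sleeping between consecutive chunks."""
    results = []
    for position, chunk in enumerate(chunks):
        if position > 0:
            time.sleep(pause_seconds)  # wait before every chunk except the first
        results.append(process_chunk(chunk))
    return pd.concat(results, ignore_index=True)

songs = pd.DataFrame({"title": list("abcde")})
chunks = split_into_chunks(songs, 2)
# pause_seconds=0 keeps the demo fast; the notebook describes a 30-second wait
combined = run_in_batches(chunks, lambda chunk: chunk.assign(seen=True), pause_seconds=0)
print(len(chunks), len(combined))
```

Pausing between chunks rather than between single requests keeps the total run time manageable while still spreading the calls out over time.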

Connecting to the Spotify API with our credentials:

In [5]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id, client_secret=client_secret))

#### Popular songs dataset (Billboard Hot 100)

Getting IDs and audio features of popular songs:

In [6]:
hot_songs = pd.read_csv(config['data']['hot_songs_file'])

Split data and get song IDs.

In [7]:
hot_songs_chunks = split_dataframe_by_position(hot_songs, 2)
hot_songs_ids = search_song(hot_songs_chunks)
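
The error messages logged further down in this notebook show the shape of each request: a query of the form `track:<title> artist:<artist>` with `limit=1`, `type='track'` and `market='ES'`. A single lookup built on Spotipy's `search` method might look like the sketch below. `find_track_id` and the `FakeClient` used to run it offline are hypothetical and are not the project's `search_song`.

```python
def find_track_id(client, title, artist, market="ES"):
    """Return the Spotify ID of the first match, or None if nothing is found."""
    query = f"track:{title} artist:{artist}"
    response = client.search(q=query, limit=1, type="track", market=market)
    items = response["tracks"]["items"]
    return items[0]["id"] if items else None

class FakeClient:
    """Offline stand-in that mimics the shape of a Spotify search response."""

    def search(self, q, limit=1, type="track", market=None):
        if "As It Was" in q:
            return {"tracks": {"items": [{"id": "4LRPiXqCikLlN15c3yImP7"}]}}
        return {"tracks": {"items": []}}

client = FakeClient()
print(find_track_id(client, "As It Was", "Harry Styles"))
print(find_track_id(client, "Unknown Song", "Nobody"))
```

With real credentials, the `sp` object created above would be passed in place of `FakeClient()`. Returning `None` for a miss lets the caller log the song as not found and continue.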

Split data again and get song features.

In [8]:
hot_songs_ids_chunks = split_dataframe_by_position(hot_songs_ids, 2)
hot_songs_features = audio_features(hot_songs_ids_chunks, 'hot')
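
As a rough sketch of the feature step, Spotipy's `audio_features` method accepts a list of track IDs and returns one dictionary per track, with `None` for IDs it cannot resolve. The helper `fetch_audio_features`, its `batch_size` parameter and the offline `FakeClient` below are assumptions for illustration, not the project's `audio_features` function.

```python
import pandas as pd

def fetch_audio_features(client, track_ids, label, batch_size=100):
    """Collect audio features for track_ids in batches and tag them with label."""
    rows = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        for features in client.audio_features(tracks=batch):
            if features is not None:  # skip IDs with no features
                rows.append(features)
    frame = pd.DataFrame(rows)
    frame["label"] = label
    return frame

class FakeClient:
    """Offline stand-in returning features for one known track only."""

    def audio_features(self, tracks):
        known = {
            "4C6Uex2ILwJi9sZXRdmqXp": {
                "id": "4C6Uex2ILwJi9sZXRdmqXp",
                "danceability": 0.95,
                "energy": 0.891,
            }
        }
        return [known.get(track_id) for track_id in tracks]

features = fetch_audio_features(FakeClient(), ["4C6Uex2ILwJi9sZXRdmqXp", "missing-id"], "hot")
print(features)
```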

Create .csv file with resulting dataframe.

In [9]:
hot_songs_features.head()
hot_songs_features.to_csv(config['data']['hot_songs_features'])

#### Less popular songs (Kaggle)

Getting IDs and audio features of the less popular songs:

In [10]:
not_hot_songs = pd.read_csv(config['data']['not_hot_songs_file'])

Split data and get song IDs.

In [11]:
not_hot_songs_chunks = split_dataframe_by_position(not_hot_songs, 100)
not_hot_songs_ids = search_song(not_hot_songs_chunks)

19, Song Television Rules The Nation, Crescendolls from Dafeat.Punk not found 
26, Song Electric Avenue from Eddy Grant not found 
35, Song I CANT GET STARTED from Ron Carter not found 


HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:Horn Concerto No. 4 in E flat K495: II. Romance (Andante cantabile) artist:Barry Tuckwell,Academy of St Martin-in-the-Fields,Sir Neville Marriner', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


52, Song Horn Concerto No. 4 in E flat K495: II. Romance (Andante cantabile) from Barry Tuckwell,Academy of St Martin-in-the-Fields,Sir Neville Marriner not found 
63, Song Short Circuit from Dafeat.Punk not found 
87, Song The Way She Dances from NERD not found 
90, Song Sing, Sing, Sing (Key-E-Premiere Performance Plus w,o Background Vocals) from Chris Tomlin not found 
92, Song The Message, Outro from Dr Dre, Thomas Chong, Mary J Blige, Rell not found 
119, Song Teach Me How To Dougie from California Swag District not found 
162, Song The Queen of Nothing from +, - {Plus,Minus} not found 
175, Song Bossy (Edited) (Feat. Too $hort) from Kelis, Too short not found 
198, Song Brother Against Brother from L'âme Immortelle not found 
202, Song Day N Nite from Kid Cudi Vs Crookers not found 
207, Song Harder Better Faster Stronger from Dafeat.Punk not found 
229, Song Hoy Te Deje De Amar (copia) from Paulina Rubio not found 
246, Song Terre Promise from O'Rosko Raricim not found 
269, Son

HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:Dysfunctional (feat. Big Scoob & Krizz Kaliko) artist:Tech N9NE Collabos, Big Scoob, Krizz Kaliko', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


482, Song Dysfunctional (feat. Big Scoob & Krizz Kaliko) from Tech N9NE Collabos, Big Scoob, Krizz Kaliko not found 
504, Song Make You Smile from +44 not found 
543, Song Bom, Bom - Suenan from Freddy Fader meets Locana not found 
545, Song One More Time, Aerodynamic from Dafeat.Punk not found 
549, Song Demon from London Afeat.r Midnight not found 
551, Song Lycanthrope from +44 not found 
572, Song Circus from Tristania not found 
646, Song Hellbound from J-Black, Masta Ace not found 
710, Song Downfall Of Christ (originally By Merauder) from Heaven Shall Burn not found 
714, Song Clampdown from The Strokes not found 
728, Song Take A Minute from K'Naan not found 
730, Song Heaven from DJ Sammy, Yanou, Do not found 
761, Song Its Time I Go (Jazz) from Joyce Cooling not found 
781, Song Bonkers from Dizzee Rascal and Armand Van Helden not found 
787, Song Taking Over Me (Live in Europe) from Evanescence not found 
802, Song Fuck Tha Police from NWA not found 
829, Song Shadow Journal

HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:White & Nerdy (Parody of "Ridin" by Chamillionaire featuring Krayzie Bone) artist:Weird Al" Yankovic', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


1673, Song White & Nerdy (Parody of "Ridin" by Chamillionaire featuring Krayzie Bone) from Weird Al" Yankovic not found 
1721, Song Mykel And Carli from Weezer not found 
1766, Song Samba De Una Nota So´ from Joa~o Gilberto not found 
1769, Song Cherry Red (Groundhogs Cover) from Earthless not found 
1776, Song Who Wants To Live Forever (With Commentary) from Queen not found 
1777, Song Well Be A Dream (featuring Demi Lovato) from We The Kings, Demi Lovato not found 
1779, Song Over And Over (Maurice Fulton Dub) from Hot Chip not found 
1781, Song Dejame entrar (En vivo ) from Maná not found 
1808, Song Quality Of Mercy from Michelle Shocked not found 


HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:Love Dont Let Me Go (Walking Away) artist:David Guetta - The Egg - Joachim Garraud - Chris Willis', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


1842, Song Love Dont Let Me Go (Walking Away) from David Guetta - The Egg - Joachim Garraud - Chris Willis not found 
1848, Song Breathe from Taylor Swifeat. Colbie Caillat not found 
1853, Song Baby Its You from JoJo, Lil' Bow Wow not found 
1854, Song Dont Let The Sun Go Down On Me from George Michael;George Michael Duet with Elton John not found 
1872, Song Robot Rock from Dafeat.Punk not found 
1893, Song Human After All from Dafeat.Punk not found 
1897, Song Incentive (Bonus Track) from Epica not found 
1899, Song Landlocked Blues from Bright Eyes not found 
1940, Song Rap Game from D-12 not found 
1949, Song Whats The Difference from Dr Dre, Eminem, Alvin Joiner not found 
1973, Song Did You Get My Message? (Live From Montalvo) from Jason Mraz not found 
1976, Song Illan kaunein nainen (Tava-live) from Sir Elwoodin Hiljaiset Varit not found 
1994, Song Im In The House from Steve Aoki Feat [[[Zuper Blahq]]] not found 
2008, Song What They Call Him (Skit) from Cocoa Brovas not foun

HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:Eton Boating Song,Wyoming Lullaby,The Wiffenpoof Song (Baa Baa Baa) (Medley) artist:Reginald Dixon', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


2204, Song Eton Boating Song,Wyoming Lullaby,The Wiffenpoof Song (Baa Baa Baa) (Medley) from Reginald Dixon not found 
2208, Song Happy Ending from Caribou (formerly Dan Snaith's Manitoba) not found 


HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:That Box artist:TECH N9NE feat Kutt Calhoun, Big Krizz Kaliko, Snug Brim, Greed, and Skatterman', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


2220, Song That Box from TECH N9NE feat Kutt Calhoun, Big Krizz Kaliko, Snug Brim, Greed, and Skatterman not found 
2231, Song Now Im High, Really High from Triple Six Mafia not found 
2238, Song Every Lasting Light from The Black Keys not found 
2239, Song Emotion from Dafeat.Punk not found 
2271, Song Im Not The One from The Black Keys not found 
2272, Song Hey There Mr. Brooks (feat. Feat. Shawn Mike of Alesana) from Asking Alexandria not found 
2281, Song Rec & Play from I'm From Barcelona not found 
2283, Song Running Away from Kevin Blechdom not found 
2305, Song Auditorium from Mos Def, Slick Rick not found 
2316, Song Illegal (featuring Carlos Santana) from Shakira, Carlos Santana not found 
2318, Song Ill Try Anything Once from The Strokes not found 
2325, Song Faith Works from Bobby Lee not found 
2343, Song How Come from D-12 not found 
2379, Song Young Folks from Peter, Bjorn and John, Victoria Bergsman not found 
2390, Song Wavin  Flag from K'Naan not found 
2394, Song So

HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:To All The Girls Ive Loved Before (With Julio Iglesias) artist:Julio Iglesias duet with Willie Nelson', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


3302, Song To All The Girls Ive Loved Before (With Julio Iglesias) from Julio Iglesias duet with Willie Nelson not found 
3334, Song Rollin & Scratchin from Dafeat.Punk not found 
3336, Song Time (featuring Corey Harris & Ranking Joe) from Easy Star All-Stars not found 
3343, Song Da Funk from Dafeat.Punk not found 
3349, Song Everybodys Fool (Live in Europe) from Evanescence not found 
3369, Song Holding Out For A Hero from Frou Frou not found 
3388, Song Bust A Move from Various Artists - Delicious Vinyl not found 
3391, Song Packt Like Sardines In A Crushed Tin Box from Radiohead not found 
3401, Song Moonshine from Jack Johnson not found 
3409, Song Crimewave (Crystal Castles vs Health) from Crystal Castles not found 
3412, Song A Little Less Sixteen Candles, A Little More "Touch Me from Fall Out Boy not found 
3416, Song One More Time (Romanthonys Unplugged) from Dafeat.Punk not found 
3420, Song Lights Out from Santogold not found 
3423, Song Monday Morning Cold (band) from Erin 

HTTP Error for GET to https://api.spotify.com/v1/search with Params: {'q': 'track:Theres A Good Reason These Tables Are Numbered Honey, You Just Havent Thought of It Yet [Live In Chicago] artist:Panic At The Disco', 'limit': 1, 'offset': 0, 'type': 'track', 'market': 'ES'} returned 404 due to Not found.


4253, Song Theres A Good Reason These Tables Are Numbered Honey, You Just Havent Thought of It Yet [Live In Chicago] from Panic At The Disco not found 
4256, Song Its Tricky from RUN-DMC not found 
4271, Song Sex Out South from TECH N9NE feat Kutt Calhoun, Big Krizz Kaliko not found 
4284, Song Shifty from Flying Lotus not found 
4309, Song Succubus from Five Finger Death Punch not found 
4334, Song Making Money Off God feat. Bus Driver from 2Mex not found 
4342, Song Deutschland 04 feat. Joerilla from Pyranja not found 
4399, Song I Wanna Love You (Akon Cover) ( Compilation) from The Maine not found 
4410, Song Plumber In Progress #1 from Son Of A Plumber not found 
4416, Song Bang Bang from K'naan, Adam Levine not found 
4455, Song Salt Shakers from Ying Yang Twins feat.Lil Jon, The East Side Boyz not found 
4458, Song Brazil from Arcade Fire not found 
4459, Song MASTERPLAN (ORIGINAL) from Electric Envoy not found 
4469, Song Music Kills Me from rinôçérôse not found 
4470, Song John
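
Several misses in the log above, such as `Dafeat.Punk`, `London Afeat.r Midnight` and `Taylor Swifeat. Colbie Caillat`, appear to come from a text replacement that rewrites `ft` inside ordinary words during cleaning. A sketch of a stricter rule, which only touches `ft` or `ft.` when it stands alone as a word, is given below; the pattern and the `normalise_featuring` helper are suggestions and not part of the original cleaning code.

```python
import re

# Match "ft" or "ft." only as a standalone word followed by a space or the end
FT_PATTERN = re.compile(r"\bft\.?(?=\s|$)", flags=re.IGNORECASE)

def normalise_featuring(name):
    """Rewrite a standalone 'ft' or 'ft.' as 'feat.' and leave other words alone."""
    return FT_PATTERN.sub("feat.", name)

for raw_name in ["Daft Punk", "London After Midnight", "Taylor Swift ft. Colbie Caillat"]:
    print(raw_name, "->", normalise_featuring(raw_name))
```

The same pattern can be applied to a whole column with `Series.str.replace(pattern, 'feat.', regex=True)`.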

Split data again and get song features.

In [12]:
not_hot_songs_ids_chunks = split_dataframe_by_position(not_hot_songs_ids, 100)
not_hot_songs_features = audio_features(not_hot_songs_ids_chunks, 'nothot')
not_hot_songs_features.head()

Unnamed: 0.1,Unnamed: 0,title,artist,id,link,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri,track_href,duration_ms,label
0,542811,Genie In A Bottle,Christina Aguilera,11mwFrKvLXCbcVGNxffGyP,https://open.spotify.com/artist/1l7ZsJRRS8wlW3...,0.633,0.8,1,-6.945,1,0.166,0.209,0.000123,0.137,0.913,175.716,spotify:track:11mwFrKvLXCbcVGNxffGyP,https://api.spotify.com/v1/tracks/11mwFrKvLXCb...,217573,nothot
1,1015427,Swallowed In The Sea,Coldplay,2u2WL5N3KnQnykOZi3fxL6,https://open.spotify.com/artist/4gzpq5DPGxSnKT...,0.265,0.378,11,-10.823,1,0.0316,0.0557,1e-06,0.119,0.153,142.1,spotify:track:2u2WL5N3KnQnykOZi3fxL6,https://api.spotify.com/v1/tracks/2u2WL5N3KnQn...,239002,nothot
2,20039,We Ride,Rihanna,0EIsxWGPSte4cAHZw5aXr4,https://open.spotify.com/artist/5pKCCKE2ajJHZ9...,0.394,0.744,5,-5.465,0,0.276,0.00175,0.0,0.308,0.772,78.696,spotify:track:0EIsxWGPSte4cAHZw5aXr4,https://api.spotify.com/v1/tracks/0EIsxWGPSte4...,236680,nothot
3,96184,The Diary,Hollywood Undead,3D8fT4ExPoHvW6clW9nYLp,https://open.spotify.com/artist/0CEFCo8288kQU7...,0.612,0.666,0,-6.276,1,0.0363,0.0237,0.0,0.662,0.788,148.064,spotify:track:3D8fT4ExPoHvW6clW9nYLp,https://api.spotify.com/v1/tracks/3D8fT4ExPoHv...,275293,nothot
4,675997,Your Star,The All-American Rejects,1FiAliBdsh2bTWn4xHnbUG,https://open.spotify.com/artist/3vAaWhdBR38Q02...,0.49,0.857,2,-3.641,1,0.0471,0.000785,0.00255,0.0953,0.483,148.072,spotify:track:1FiAliBdsh2bTWn4xHnbUG,https://api.spotify.com/v1/tracks/1FiAliBdsh2b...,260827,nothot


Create .csv file with resulting dataframe.

In [17]:
not_hot_songs_features = not_hot_songs_features.drop('Unnamed: 0', axis=1)
not_hot_songs_features.to_csv(config['data']['not_hot_songs_features'])

### Combine dataframes

In [24]:
combined_songs = pd.concat([hot_songs_features, not_hot_songs_features])
combined_songs = combined_songs.drop(['Unnamed: 0', 'link'], axis=1)
combined_songs.reset_index(drop=True).tail()

Unnamed: 0,title,artist,id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri,track_href,duration_ms,label
5617,Incident At Gate 7,Thievery Corporation,7dBAgPJ9krTupzldvpVnef,0.735,0.454,6,-13.911,0,0.0627,0.00487,0.898,0.051,0.4,99.999,spotify:track:7dBAgPJ9krTupzldvpVnef,https://api.spotify.com/v1/tracks/7dBAgPJ9krTu...,389413,nothot
5618,Fire Eyed Boy,Broken Social Scene,3lAz9oK9c8BNeLQMK3mnlY,0.431,0.94,9,-7.574,1,0.0425,0.005,0.862,0.231,0.752,151.267,spotify:track:3lAz9oK9c8BNeLQMK3mnlY,https://api.spotify.com/v1/tracks/3lAz9oK9c8BN...,238920,nothot
5619,Charmless Man,Blur,7lJ9MeQgqHlBrE69omD4rN,0.535,0.876,9,-6.523,1,0.0418,0.00759,0.0,0.035,0.846,116.766,spotify:track:7lJ9MeQgqHlBrE69omD4rN,https://api.spotify.com/v1/tracks/7lJ9MeQgqHlB...,213987,nothot
5620,Communication Part 3,Armin van Buuren,0neva1N08tpWLLVrrXZm2Y,0.567,0.702,1,-10.701,1,0.0393,0.00302,0.898,0.0978,0.439,135.988,spotify:track:0neva1N08tpWLLVrrXZm2Y,https://api.spotify.com/v1/tracks/0neva1N08tpW...,507907,nothot
5621,Rio,Duran Duran,43eBgYRTmu5BJnCJDBU5Hb,0.544,0.873,9,-7.425,1,0.0525,0.0415,1.8e-05,0.0925,0.676,140.902,spotify:track:43eBgYRTmu5BJnCJDBU5Hb,https://api.spotify.com/v1/tracks/43eBgYRTmu5B...,337333,nothot
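
One detail worth keeping in mind here: `pd.concat` keeps each input's original row labels, and `reset_index(drop=True)` returns a new dataframe instead of changing the existing one. The toy example below, with made-up rows, shows the difference and how `ignore_index=True` renumbers the rows during the concatenation itself.

```python
import pandas as pd

hot = pd.DataFrame({"title": ["As It Was"], "label": ["hot"]})
nothot = pd.DataFrame({"title": ["Rio", "Charmless Man"], "label": ["nothot", "nothot"]})

# Default behaviour: the row labels of both inputs are kept, so 0 repeats
stacked = pd.concat([hot, nothot])
# ignore_index=True assigns a fresh 0..n-1 index to the combined rows
renumbered = pd.concat([hot, nothot], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```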


Create links to listen to the songs.

In [28]:
# Build a listening link for each track from its Spotify ID
combined_songs['links'] = ['https://open.spotify.com/track/' + track_id for track_id in combined_songs['id']]

Unnamed: 0,title,artist,id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri,track_href,duration_ms,label,links
0,Super Freaky Girl,Nicki Minaj,4C6Uex2ILwJi9sZXRdmqXp,0.950,0.891,2,-2.653,1,0.2410,0.06450,0.000018,0.3090,0.912,133.010,spotify:track:4C6Uex2ILwJi9sZXRdmqXp,https://api.spotify.com/v1/tracks/4C6Uex2ILwJi...,170977,hot,https://open.spotify.com/track/4C6Uex2ILwJi9sZ...
1,As It Was,Harry Styles,4LRPiXqCikLlN15c3yImP7,0.520,0.731,6,-5.338,0,0.0557,0.34200,0.001010,0.3110,0.662,173.930,spotify:track:4LRPiXqCikLlN15c3yImP7,https://api.spotify.com/v1/tracks/4LRPiXqCikLl...,167303,hot,https://open.spotify.com/track/4LRPiXqCikLlN15...
2,About Damn Time,Lizzo,1PckUlxKqWQs3RlWXVBLw3,0.836,0.743,10,-6.305,0,0.0656,0.09950,0.000000,0.3350,0.722,108.966,spotify:track:1PckUlxKqWQs3RlWXVBLw3,https://api.spotify.com/v1/tracks/1PckUlxKqWQs...,191822,hot,https://open.spotify.com/track/1PckUlxKqWQs3Rl...
3,Break My Soul,Beyonce,2KukL7UlQ8TdvpaA7bY3ZJ,0.687,0.887,1,-5.040,0,0.0826,0.05750,0.000002,0.2700,0.853,114.941,spotify:track:2KukL7UlQ8TdvpaA7bY3ZJ,https://api.spotify.com/v1/tracks/2KukL7UlQ8Td...,278282,hot,https://open.spotify.com/track/2KukL7UlQ8Tdvpa...
4,Running Up That Hill (A Deal With God),Kate Bush,75FEaRjZTKLhTrFGsfMUXR,0.629,0.547,10,-13.123,0,0.0550,0.72000,0.003140,0.0604,0.197,108.375,spotify:track:75FEaRjZTKLhTrFGsfMUXR,https://api.spotify.com/v1/tracks/75FEaRjZTKLh...,298933,hot,https://open.spotify.com/track/75FEaRjZTKLhTrF...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5517,Incident At Gate 7,Thievery Corporation,7dBAgPJ9krTupzldvpVnef,0.735,0.454,6,-13.911,0,0.0627,0.00487,0.898000,0.0510,0.400,99.999,spotify:track:7dBAgPJ9krTupzldvpVnef,https://api.spotify.com/v1/tracks/7dBAgPJ9krTu...,389413,nothot,https://open.spotify.com/track/7dBAgPJ9krTupzl...
5518,Fire Eyed Boy,Broken Social Scene,3lAz9oK9c8BNeLQMK3mnlY,0.431,0.940,9,-7.574,1,0.0425,0.00500,0.862000,0.2310,0.752,151.267,spotify:track:3lAz9oK9c8BNeLQMK3mnlY,https://api.spotify.com/v1/tracks/3lAz9oK9c8BN...,238920,nothot,https://open.spotify.com/track/3lAz9oK9c8BNeLQ...
5519,Charmless Man,Blur,7lJ9MeQgqHlBrE69omD4rN,0.535,0.876,9,-6.523,1,0.0418,0.00759,0.000000,0.0350,0.846,116.766,spotify:track:7lJ9MeQgqHlBrE69omD4rN,https://api.spotify.com/v1/tracks/7lJ9MeQgqHlB...,213987,nothot,https://open.spotify.com/track/7lJ9MeQgqHlBrE6...
5520,Communication Part 3,Armin van Buuren,0neva1N08tpWLLVrrXZm2Y,0.567,0.702,1,-10.701,1,0.0393,0.00302,0.898000,0.0978,0.439,135.988,spotify:track:0neva1N08tpWLLVrrXZm2Y,https://api.spotify.com/v1/tracks/0neva1N08tpW...,507907,nothot,https://open.spotify.com/track/0neva1N08tpWLLV...


In [29]:
combined_songs.to_csv(config['data']['combined_data'], index=False)