# Lab | API wrappers - Create your collection of songs & audio features

### Instructions
To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

One way to collect as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist will multiply quickly!
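
The expansion idea above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `expand_playlist` and `max_albums_per_artist` are hypothetical names, and `sp` is assumed to be an authenticated `spotipy.Spotify` client whose `playlist_items`, `artist_albums`, `album_tracks`, and `next` methods are used.

```python
def expand_playlist(sp, playlist_id, max_albums_per_artist=5):
    """Collect track IDs from a playlist plus tracks from its artists' albums.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    `max_albums_per_artist` is a hypothetical cap to limit the request count.
    """
    track_ids = set()
    artist_ids = set()

    # 1. Walk the playlist, following pagination.
    results = sp.playlist_items(playlist_id)
    while results:
        for item in results['items']:
            track = item.get('track')
            if not track or not track.get('id'):
                continue  # skip removed or local tracks
            track_ids.add(track['id'])
            artist_ids.update(a['id'] for a in track['artists'] if a.get('id'))
        results = sp.next(results) if results['next'] else None

    # 2. For each artist, take some albums and every track on them.
    for artist_id in artist_ids:
        albums = sp.artist_albums(artist_id, album_type='album',
                                  limit=max_albums_per_artist)
        for album in albums['items']:
            album_tracks = sp.album_tracks(album['id'])
            while album_tracks:
                track_ids.update(t['id'] for t in album_tracks['items'] if t.get('id'))
                album_tracks = sp.next(album_tracks) if album_tracks['next'] else None

    return track_ids
```

Collecting the IDs in a set avoids duplicates when the same track shows up on several albums, and the cap on albums per artist keeps the number of requests manageable.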

In [None]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from time import sleep
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)


In [39]:
# Lab 1

url = "http://www.popvortex.com/music/charts/top-100-songs.php"
r = requests.get(url)
r.status_code

200

In [40]:
html = r.content
soup = BeautifulSoup(html, 'html.parser')
soup.head()

[<meta charset="utf-8"/>,
 <title>iTunes Top 100 Songs Chart 2022</title>,
 <meta content="width=device-width, initial-scale=1" name="viewport"/>,
 <meta content="iTunes top 100 songs chart list. The most popular hit music and trending songs of 2022. Chart of today's current iTunes top 100 songs is updated daily." name="description"/>,
 <meta content="iTunes Top 100 Songs Chart 2022" property="og:title"><meta content="Chart of the top 100 songs on iTunes. Chart list of the top 100 song downloads of 2022 is updated daily." property="og:description"><meta content="article" property="og:type"><meta content="http://www.popvortex.com/images/logo-facebook.png" property="og:image"/><meta content="PopVortex" property="og:site_name"/><meta content="http://www.popvortex.com/music/charts/top-100-songs.php" property="og:url"/><meta content="100000239962942" property="fb:admins"/><meta content="178831188827052" property="fb:app_id"/><link href="/favicon.png" rel="shortcut icon"/><link href="/apple-

In [41]:
songs = soup.find_all(class_="title")
songs

[<cite class="title">Running Up That Hill (A Deal with God)</cite>,
 <cite class="title">Yet To Come</cite>,
 <cite class="title">You Might Not Like Her</cite>,
 <cite class="title">As It Was</cite>,
 <cite class="title">About Damn Time</cite>,
 <cite class="title">Hold My Hand</cite>,
 <cite class="title">Rock and A Hard Place</cite>,
 <cite class="title">Run BTS</cite>,
 <cite class="title">You Proof</cite>,
 <cite class="title">First Class</cite>,
 <cite class="title">Fuck You</cite>,
 <cite class="title">Wasted On You</cite>,
 <cite class="title">For Youth</cite>,
 <cite class="title">She Had Me At Heads Carolina</cite>,
 <cite class="title">AA</cite>,
 <cite class="title">No Diggity (feat. Dr. Dre &amp; Queen Pen)</cite>,
 <cite class="title">Can't Feel My Face</cite>,
 <cite class="title">Unstoppable</cite>,
 <cite class="title">Holy Water</cite>,
 <cite class="title">So Good</cite>,
 <cite class="title">Danger Zone</cite>,
 <cite class="title">About Damn Time</cite>,
 <cite clas

In [42]:
artists = soup.find_all(class_="artist")
artists

[<em class="artist">Kate Bush</em>,
 <em class="artist">BTS</em>,
 <em class="artist">Maddie Zahm</em>,
 <em class="artist">Harry Styles</em>,
 <em class="artist">Lizzo</em>,
 <em class="artist">Lady Gaga</em>,
 <em class="artist">Bailey Zimmerman</em>,
 <em class="artist">BTS</em>,
 <em class="artist">Morgan Wallen</em>,
 <em class="artist">Jack Harlow</em>,
 <em class="artist">CeeLo Green</em>,
 <em class="artist">Morgan Wallen</em>,
 <em class="artist">BTS</em>,
 <em class="artist">Cole Swindell</em>,
 <em class="artist">Walker Hayes</em>,
 <em class="artist">Blackstreet</em>,
 <em class="artist">The Weeknd</em>,
 <em class="artist">Sia</em>,
 <em class="artist">Noah Davis</em>,
 <em class="artist">Halsey</em>,
 <em class="artist">Kenny Loggins</em>,
 <em class="artist">Lizzo</em>,
 <em class="artist">BTS</em>,
 <em class="artist">Russell Dickerson &amp; Jake Scott</em>,
 <em class="artist">Post Malone</em>,
 <em class="artist">Kane Brown</em>,
 <em class="artist">Camila Cabello</em

In [43]:
tracks = []
for track in soup.find_all(class_="title"):
    tracks.append(track.get_text().strip())
    
artists = []
for artist in soup.find_all(class_="artist"):
    artists.append(artist.get_text().strip())
    
df1 = pd.DataFrame({'artist':artists,'track':tracks})

In [44]:
df1.head(100)

Unnamed: 0,artist,track
0,Kate Bush,Running Up That Hill (A Deal with God)
1,BTS,Yet To Come
2,Maddie Zahm,You Might Not Like Her
3,Harry Styles,As It Was
4,Lizzo,About Damn Time
5,Lady Gaga,Hold My Hand
6,Bailey Zimmerman,Rock and A Hard Place
7,BTS,Run BTS
8,Morgan Wallen,You Proof
9,Jack Harlow,First Class


In [45]:
# Lab 2

url ='https://musicbrainz.org/series/b3484a66-a4de-444d-93d3-c99a73656905'
response = requests.get(url)
response.status_code


200

In [46]:
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <link href="/static/images/favicons/apple-touch-icon-57x57.png" rel="apple-touch-icon" sizes="57x57"/>
  <link href="/static/images/favicons/apple-touch-icon-60x60.png" rel="apple-touch-icon" sizes="60x60"/>
  <link href="/static/images/favicons/apple-touch-icon-72x72.png" rel="apple-touch-icon" sizes="72x72"/>
  <link href="/static/images/favicons/apple-touch-icon-76x76.png" rel="apple-touch-icon" sizes="76x76"/>
  <link href="/static/images/favicons/apple-touch-icon-114x114.png" rel="apple-touch-icon" sizes="114x114"/>
  <link href="/static/images/favicons/apple-touch-icon-120x120.png" rel="apple-touch-icon" sizes="120x120"/>
  <link href="/static/images/favicons/apple-touch-icon-144x144.png" rel="apple-touch-icon" sizes="144x144"/>
  <link href="/static/images/favicons/apple-touch-icon-15

In [47]:
song_name = []
artist_name = []

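for page in range(1, 6):
    r = requests.get(f'https://musicbrainz.org/series/b3484a66-a4de-444d-93d3-c99a73656905?page={page}')
    soup = BeautifulSoup(r.content, 'html.parser')

    # Song titles are the links whose href contains "recording"
    for song in soup.select("a[href*=recording]"):
        song_name.append(song.get_text(strip=True))

    # Artist names sit in the third column of the table
    for artist in soup.select("td:nth-of-type(3)"):
        artist_name.append(artist.get_text(strip=True))

    sleep(1)  # pause between pages so the server is not hit with back-to-back requests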

print(song_name)
print(artist_name)

['Like a Rolling Stone', 'Strawberry Fields Forever', '(I Can’t Get No) Satisfaction', 'Imagine', 'What’s Going On', 'Respect', 'Good Vibrations', 'Johnny B. Goode', 'Hey Jude', 'Smells Like Teen Spirit', 'What’d I Say', 'My Generation', 'A Change Is Gonna Come', 'Yesterday', 'Blowin’ in the Wind', 'London Calling', 'I Want to Hold Your Hand', 'Purple Haze', 'Maybellene', 'Hound Dog', 'Let It Be', 'Born to Run', 'Be My Baby', 'In My Life', 'People Get Ready', 'God Only Knows', '(Sittin’ on) The Dock of the Bay', 'Layla', 'A Day in the Life', 'Help!', 'I Walk the Line', 'Stairway to Heaven', 'Sympathy for the Devil', 'River Deep—Mountain High', 'You’ve Lost That Lovin’ Feelin’', 'Light My Fire', 'One', 'No Woman, No Cry', 'Gimme Shelter', 'That’ll Be the Day', 'Dancing in the Street', 'The Weight', 'Waterloo Sunset', 'Tutti Frutti', 'Georgia on My Mind', 'Heartbreak Hotel', '“Heroes”', 'Bridge Over Troubled Water', 'All Along the Watchtower', 'Hotel California', 'The Tracks of My Tears'
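
When a loop like the one above sends many requests, it can help to pause between pages and retry failed ones. The following is a minimal sketch of that idea, not part of the original lab: `fetch_pages`, `delay`, and `retries` are hypothetical names, and `get` is assumed to be any callable such as `requests.get` that returns an object with `status_code` and `content` attributes.

```python
import time

def fetch_pages(get, url_template, pages, delay=1.0, retries=3):
    """Fetch several pages politely, pausing between requests.

    `get` is any callable like requests.get; `url_template` must contain
    a `{page}` placeholder. Both names are assumptions for this sketch.
    """
    bodies = []
    for page in pages:
        url = url_template.format(page=page)
        for attempt in range(retries):
            response = get(url)
            if response.status_code == 200:
                bodies.append(response.content)
                break
            # Wait a little longer after each failed attempt.
            time.sleep(delay * (attempt + 1))
        else:
            # The inner loop never hit `break`, so every attempt failed.
            raise RuntimeError(f"giving up on {url} after {retries} attempts")
        time.sleep(delay)
    return bodies
```

With `requests` imported, a call could look like `fetch_pages(requests.get, 'https://musicbrainz.org/series/b3484a66-a4de-444d-93d3-c99a73656905?page={page}', range(1, 6))`, and each returned body can then be passed to `BeautifulSoup` as before.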

In [48]:
len(artist_name)

500

In [49]:
d = {'song':song_name, 'artist':artist_name}
print(d)

{'song': ['Like a Rolling Stone', 'Strawberry Fields Forever', '(I Can’t Get No) Satisfaction', 'Imagine', 'What’s Going On', 'Respect', 'Good Vibrations', 'Johnny B. Goode', 'Hey Jude', 'Smells Like Teen Spirit', 'What’d I Say', 'My Generation', 'A Change Is Gonna Come', 'Yesterday', 'Blowin’ in the Wind', 'London Calling', 'I Want to Hold Your Hand', 'Purple Haze', 'Maybellene', 'Hound Dog', 'Let It Be', 'Born to Run', 'Be My Baby', 'In My Life', 'People Get Ready', 'God Only Knows', '(Sittin’ on) The Dock of the Bay', 'Layla', 'A Day in the Life', 'Help!', 'I Walk the Line', 'Stairway to Heaven', 'Sympathy for the Devil', 'River Deep—Mountain High', 'You’ve Lost That Lovin’ Feelin’', 'Light My Fire', 'One', 'No Woman, No Cry', 'Gimme Shelter', 'That’ll Be the Day', 'Dancing in the Street', 'The Weight', 'Waterloo Sunset', 'Tutti Frutti', 'Georgia on My Mind', 'Heartbreak Hotel', '“Heroes”', 'Bridge Over Troubled Water', 'All Along the Watchtower', 'Hotel California', 'The Tracks of 

In [50]:
d = {'song':song_name, 'artist':artist_name}
df2 = pd.DataFrame(d)
df2

Unnamed: 0,song,artist
0,Like a Rolling Stone,Bob Dylan
1,Strawberry Fields Forever,The Beatles
2,(I Can’t Get No) Satisfaction,The Rolling Stones
3,Imagine,John Lennon
4,What’s Going On,Marvin Gaye
5,Respect,Aretha Franklin
6,Good Vibrations,The Beach Boys
7,Johnny B. Goode,Chuck Berry
8,Hey Jude,The Beatles
9,Smells Like Teen Spirit,Nirvana


In [51]:
# Lab 3 start

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Read the credentials file; each non-empty line is split into key and value at the first ':'
with open("secrets.txt", "r") as secrets_file:
    secrets = secrets_file.read().replace(" ", "")

secrets_dict = {}
for line in secrets.split('\n'):
    if len(line) > 0:
        key, value = line.split(':', 1)
        secrets_dict[key] = value

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['ClientID'], client_secret=secrets_dict['ClientSecret']))
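
The secrets parsing above can also be wrapped in a small helper that is easier to test. This is a sketch only: `parse_secrets` is a hypothetical name, and the file is assumed to hold one `Key: value` pair per line, as the `ClientID` and `ClientSecret` lookups suggest.

```python
def parse_secrets(text):
    """Parse 'Key: value' lines into a dict, ignoring blank or malformed lines."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ':' not in line:
            continue
        # Split on the first colon only, so values may contain ':' themselves.
        key, value = line.split(':', 1)
        secrets[key.strip()] = value.strip()
    return secrets
```

It could be used as `secrets_dict = parse_secrets(open("secrets.txt").read())`. Splitting on the first colon only keeps values intact even if they happen to contain a colon.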

In [52]:
def playlist_scraper(playlist_id):
    
    results = sp.user_playlist_tracks('spotify', playlist_id)
    tracks = results['items']

    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])


    track_ids = [track['track']['id'] for track in tracks]
    artists = [track['track']['artists'][0]['name'] for track in tracks]
    titles = [track['track']['name'] for track in tracks]
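    # Request audio features in batches (up to 100 IDs per call)
    # instead of sending one request per track.
    aud_feat = []
    for i in range(0, len(track_ids), 100):
        aud_feat.extend(sp.audio_features(track_ids[i:i + 100]))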

    df = pd.DataFrame(aud_feat)
    df['artist'] = artists
    df['title'] = titles
    df['track_ids'] = track_ids

    return df

In [53]:
def get_playlist_tracks(username, playlist_id):
    
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    
    return tracks


In [54]:
playlists = ['12Wbv8sIx84T5uh6iOoJ7V','37i9dQZF1DWXRqgorJj26U','37i9dQZF1DXcBWIGoYBM5M','37i9dQZF1DX4dyzvuaRJ0n','37i9dQZF1DX0XUsuxWHRQd','37i9dQZF1DX4SBhb3fqCJd','37i9dQZF1DWVqfgj8NZEp1', '37i9dQZF1DX1lVhptIYRda','37i9dQZF1DWWEJlAGA9gs0','37i9dQZF1DX10zKzsJ2jva','4rnleEAOdmFAbRcNCgZMpY']

# Scrape every playlist and stack the results into a single DataFrame
df = pd.concat([playlist_scraper(playlist) for playlist in playlists], ignore_index=True)
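
Since a run over many playlists can take minutes or hours, one failing playlist should not throw away everything collected so far. A minimal sketch of that idea, assuming a hypothetical helper `scrape_all` that takes the scraping function as an argument:

```python
def scrape_all(playlist_ids, scraper):
    """Run `scraper` on each playlist ID, collecting results and failures separately."""
    results, failed = [], []
    for playlist_id in playlist_ids:
        try:
            results.append(scraper(playlist_id))
        except Exception as exc:  # broad on purpose: no single failure should stop the run
            failed.append((playlist_id, repr(exc)))
    return results, failed
```

With the names used in this notebook, this could be called as `frames, failed = scrape_all(playlists, playlist_scraper)`, followed by `pd.concat(frames, ignore_index=True)`; the `failed` list shows which playlists to retry.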

In [55]:
df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,artist,title,track_ids
0,0.291,0.319,7,-10.465,1,0.0306,0.791,0.468,0.0692,0.038,103.793,audio_features,0A0eOcimSNRs2EQQlH7FFJ,spotify:track:0A0eOcimSNRs2EQQlH7FFJ,https://api.spotify.com/v1/tracks/0A0eOcimSNRs...,https://api.spotify.com/v1/audio-analysis/0A0e...,323280,3,Marcelo Camelo,Três Dias,0A0eOcimSNRs2EQQlH7FFJ
1,0.171,0.626,8,-8.677,1,0.0486,0.873,0.0252,0.0681,0.457,180.098,audio_features,0MtVmhAx6CxNuxFIUc6Mj9,spotify:track:0MtVmhAx6CxNuxFIUc6Mj9,https://api.spotify.com/v1/tracks/0MtVmhAx6CxN...,https://api.spotify.com/v1/audio-analysis/0MtV...,348893,3,Beirut,Elephant Gun,0MtVmhAx6CxNuxFIUc6Mj9
2,0.324,0.776,0,-6.784,1,0.0346,0.151,0.917,0.0728,0.317,101.964,audio_features,6aUAF8JOd8zEl41B6I18xL,spotify:track:6aUAF8JOd8zEl41B6I18xL,https://api.spotify.com/v1/tracks/6aUAF8JOd8zE...,https://api.spotify.com/v1/audio-analysis/6aUA...,205040,3,The National,Fake Empire,6aUAF8JOd8zEl41B6I18xL
3,0.627,0.342,2,-12.833,1,0.0394,0.7,0.166,0.082,0.513,80.03,audio_features,2vGvPQNnyybJmiqpr1HiKX,spotify:track:2vGvPQNnyybJmiqpr1HiKX,https://api.spotify.com/v1/tracks/2vGvPQNnyybJ...,https://api.spotify.com/v1/audio-analysis/2vGv...,248480,4,Phill Veras,Sorriso ao Sono,2vGvPQNnyybJmiqpr1HiKX
4,0.486,0.769,1,-5.14,1,0.0341,0.0119,0.00832,0.197,0.393,131.286,audio_features,1PuLHwFZoh5qYK89I5YBdZ,spotify:track:1PuLHwFZoh5qYK89I5YBdZ,https://api.spotify.com/v1/tracks/1PuLHwFZoh5q...,https://api.spotify.com/v1/audio-analysis/1PuL...,276720,4,Mumford & Sons,I Will Wait,1PuLHwFZoh5qYK89I5YBdZ


In [56]:
# Discarding unnecessary columns
spotify_tracks = df[['track_ids','title','artist','danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 
                                 'instrumentalness', 'liveness', 'valence', 'tempo', 'duration_ms', 'time_signature']]

spotify_tracks.head()

Unnamed: 0,track_ids,title,artist,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,0A0eOcimSNRs2EQQlH7FFJ,Três Dias,Marcelo Camelo,0.291,0.319,7,-10.465,1,0.0306,0.791,0.468,0.0692,0.038,103.793,323280,3
1,0MtVmhAx6CxNuxFIUc6Mj9,Elephant Gun,Beirut,0.171,0.626,8,-8.677,1,0.0486,0.873,0.0252,0.0681,0.457,180.098,348893,3
2,6aUAF8JOd8zEl41B6I18xL,Fake Empire,The National,0.324,0.776,0,-6.784,1,0.0346,0.151,0.917,0.0728,0.317,101.964,205040,3
3,2vGvPQNnyybJmiqpr1HiKX,Sorriso ao Sono,Phill Veras,0.627,0.342,2,-12.833,1,0.0394,0.7,0.166,0.082,0.513,80.03,248480,4
4,1PuLHwFZoh5qYK89I5YBdZ,I Will Wait,Mumford & Sons,0.486,0.769,1,-5.14,1,0.0341,0.0119,0.00832,0.197,0.393,131.286,276720,4


In [57]:
spotify_tracks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6356 entries, 0 to 6355
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   track_ids         6356 non-null   object 
 1   title             6356 non-null   object 
 2   artist            6356 non-null   object 
 3   danceability      6356 non-null   float64
 4   energy            6356 non-null   float64
 5   key               6356 non-null   int64  
 6   loudness          6356 non-null   float64
 7   mode              6356 non-null   int64  
 8   speechiness       6356 non-null   float64
 9   acousticness      6356 non-null   float64
 10  instrumentalness  6356 non-null   float64
 11  liveness          6356 non-null   float64
 12  valence           6356 non-null   float64
 13  tempo             6356 non-null   float64
 14  duration_ms       6356 non-null   int64  
 15  time_signature    6356 non-null   int64  
dtypes: float64(9), int64(4), object(3)
memory 

In [60]:
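# index=False keeps the row index out of the files, so no "Unnamed: 0" column appears on reload
spotify_tracks.to_csv("spotify_tracks.csv", index=False)
df1.to_csv("Hot_100.csv", index=False)
df2.to_csv("500_songs.csv", index=False)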