# Lab 1
## Web Scraping a single page
I want to focus on this list: https://playback.fm/charts/rock/2021, since I prefer rock music over the alternatives.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

### Create the request and fetch data

In [2]:
url = 'https://playback.fm/charts/rock/2021'
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [92]:
soup = BeautifulSoup(response.content, "html.parser")
# soup

### Select the information
First I need to inspect the webpage to find the selector:  
``#myTable > tbody > tr:nth-child(1) > td.mobile-hide > a > span.song``

This selector finds the name of the first song on the list. I need to change the selection to something I can iterate over:  
``#myTable > tbody > tr:nth-child``

![example of rows](example_rows.png "Example of fetched rows")

In [4]:
soup.select("#myTable > tbody > tr:nth-child(1) > td.mobile-hide > a > span.song")

[]

Apparently my selector is wrong (it returns an empty list), so I will build the selector from the class "song" instead, which is shared by all the song names.

In [93]:
# soup.select(".song > a")


If I also want to fetch the name of the artist, I can do two things:  
1- Use the class "artist"  
2- Go one level above these classes and fetch the whole row (as I tried to do in the example)  

In [94]:
# 1
# soup.select(".artist")

In [95]:
# 2
# soup.select(".chartTbl")

I prefer option one as it is much easier to work with, and if I want to get the rankings, the results are already ordered so I just have to look at the indexes.
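As a minimal sketch of option one, the snippet below parses a small hand-written HTML fragment (an assumption standing in for the real page, whose markup may differ) and pairs each `.song` entry with its `.artist`, using the position in the list as the rank.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the chart table; the real page markup may differ.
SAMPLE_HTML = """
<table>
  <tr><td><span class="song"><a>Waiting On A War</a></span>
          <span class="artist">Foo Fighters</span></td></tr>
  <tr><td><span class="song"><a>Monsters</a></span>
          <span class="artist">All Time Low featuring blackbear</span></td></tr>
</table>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Run each selector once, then walk both result lists in parallel.
song_tags = sample_soup.select(".song > a")
artist_tags = sample_soup.select(".artist")

# The results come back in page order, so the position doubles as the rank.
ranking = [
    (rank, s.get_text(strip=True), a.get_text(strip=True))
    for rank, (s, a) in enumerate(zip(song_tags, artist_tags), start=1)
]
print(ranking)
```

Starting `enumerate` at 1 gives chart positions directly instead of zero-based indexes.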

### Creating the dataframe

In [8]:
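# Empty lists
song = []
artist = []

# Run each selector once instead of re-querying the page on every iteration
song_tags = soup.select('.song > a')
artist_tags = soup.select('.artist')

for song_tag, artist_tag in zip(song_tags, artist_tags):
    song.append(song_tag.get_text())
    artist.append(artist_tag.get_text())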

In [9]:
# Create a DataFrame from the two lists
songs = pd.DataFrame({'song':song,
                      'artist':artist
                    })

In [10]:
songs

Unnamed: 0,song,artist
0,\nWaiting On A War\n,\nFoo Fighters\n
1,\nMonsters\n,\nAll Time Low featuring blackbear\n
2,\nMood\n,\n24kGoldn featuring iann Dior\n
3,\nFollow You\n,\nImagine Dragons\n
4,\nHeat Waves\n,\nGlass Animals\n
...,...,...
95,\nKnow That I Know\n,\nLake Street Dive\n
96,\nSugarCrash!\n,\nElyOtto\n
97,"\nI Am Not A Woman, I'm A God\n",\nHalsey\n
98,\nTrouble’s Coming\n,\nRoyal Blood\n


In [11]:
#Cleanup
songs['song'] = songs['song'].str.replace('\n', '')
songs['artist'] = songs['artist'].str.replace('\n', '')
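An alternative cleanup, sketched on a tiny hand-made frame (the rows are copied from the output above): `str.strip()` removes leading and trailing whitespace, including the newlines, and leaves any whitespace inside a title alone.

```python
import pandas as pd

# Two rows shaped like the scraped data, with the surrounding newlines.
raw = pd.DataFrame({
    "song": ["\nMood\n", "\nHeat Waves\n"],
    "artist": ["\n24kGoldn featuring iann Dior\n", "\nGlass Animals\n"],
})

# Apply str.strip() to every column; only the ends of each string are trimmed.
cleaned = raw.apply(lambda col: col.str.strip())
print(cleaned)
```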

In [12]:
songs

Unnamed: 0,song,artist
0,Waiting On A War,Foo Fighters
1,Monsters,All Time Low featuring blackbear
2,Mood,24kGoldn featuring iann Dior
3,Follow You,Imagine Dragons
4,Heat Waves,Glass Animals
...,...,...
95,Know That I Know,Lake Street Dive
96,SugarCrash!,ElyOtto
97,"I Am Not A Woman, I'm A God",Halsey
98,Trouble’s Coming,Royal Blood


# Lab 2
## Multipage scraping
One year's worth of songs is not enough; I need to add more songs to the list.
Luckily, the URLs for different years within the Rock genre share the same structure; only the year changes:  
https://playback.fm/charts/rock/2020  
https://playback.fm/charts/rock/2019

I will create a function that takes a year and returns a dataframe like the one above, and then I will concatenate those dataframes.
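The plan can be tried offline first. In this sketch, `fake_song_scraping` is a hypothetical stand-in that returns fixed rows instead of fetching a page, and the extra `year` column is my own addition so that each row can be traced back to its chart.

```python
import pandas as pd

def fake_song_scraping(year):
    """Hypothetical stand-in for a per-year scraper; returns fixed rows."""
    return pd.DataFrame({
        "song": [f"Song A {year}", f"Song B {year}"],
        "artist": ["Artist A", "Artist B"],
    })

def scrape_years(years, scraper):
    """Call `scraper` once per year and stack the results into one frame."""
    frames = []
    for year in years:
        frame = scraper(year)
        frame["year"] = year  # remember which chart each row came from
        frames.append(frame)
    # ignore_index=True renumbers the rows 0..n-1 across all years
    return pd.concat(frames, ignore_index=True)

combined = scrape_years(range(2019, 2022), fake_song_scraping)
print(combined)
```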

In [13]:
def song_scraping(year):
    url = 'https://playback.fm/charts/rock/'+str(year)
    response = requests.get(url)

    soup = BeautifulSoup(response.content, "html.parser")

    song = []
    artist = []

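    # Run each selector once instead of re-querying the page on every iteration
    song_tags = soup.select('.song > a')
    artist_tags = soup.select('.artist')

    for song_tag, artist_tag in zip(song_tags, artist_tags):
        song.append(song_tag.get_text())
        artist.append(artist_tag.get_text())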

    songs = pd.DataFrame({'song':song,
                          'artist':artist
                        })  
    songs['song'] = songs['song'].str.replace('\n', '')
    songs['artist'] = songs['artist'].str.replace('\n', '')

    return songs  
    
    

In [14]:
songs2 = song_scraping(2020)
songs2

Unnamed: 0,song,artist
0,Level Of Concern,twenty one pilots
1,Bang!,AJR
2,Mood,24kGoldn featuring iann Dior
3,Hallucinogenics,Matt Maeson
4,Come & Go,Juice WRLD featuring Marshmello
...,...,...
95,This Forgotten Town,The Jayhawks
96,Keep My Name Outta Your Mouth,The Black Keys
97,Fire For You,Cannons
98,Angels & Demons,jxdn


The function works, but I need more songs, so it is time for a loop. To make that a bit easier, I will return the lists instead of the dataframes I was returning so far.

In [16]:
def song_scraping_bulk(year):
    url = 'https://playback.fm/charts/rock/'+str(year)
    response = requests.get(url)

    soup = BeautifulSoup(response.content, "html.parser")

    song = []
    artist = []

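    # Run each selector once instead of re-querying the page on every iteration
    song_tags = soup.select('.song > a')
    artist_tags = soup.select('.artist')

    for song_tag, artist_tag in zip(song_tags, artist_tags):
        song.append(song_tag.get_text())
        artist.append(artist_tag.get_text())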

    return song, artist

In [17]:
final_songs = []
final_artists = []


for i in range(1980, 2022):
    songs,artists = song_scraping_bulk(i)
    # print(i)

    final_songs = final_songs+songs
    final_artists = final_artists+artists



In [96]:
# final_songs


I want even more songs, so I will look at another website: http://www.popvortex.com/music/charts/2017.php

In [19]:
def itunes_scrapper(year):
    url = 'http://www.popvortex.com/music/charts/'+str(year)+'.php'

    response = requests.get(url)

    soup = BeautifulSoup(response.content, "html.parser")

    song = []
    artist = []

    # Run each selector once instead of re-querying the page on every iteration
    song_tags = soup.select('p.title-artist > em.title > a')
    artist_tags = soup.select('p.title-artist > em.artist')

    for song_tag, artist_tag in zip(song_tags, artist_tags):
        song.append(song_tag.get_text())
        artist.append(artist_tag.get_text())

    return song,artist


In [20]:
for i in range(2003, 2018):
    songs,artists = itunes_scrapper(i)
    # print(i)

    final_songs = final_songs+songs
    final_artists = final_artists+artists
    # print(songs[0])

data = pd.DataFrame({'song':final_songs,
                      'artist':final_artists
                    })  
data['song'] = data['song'].str.replace('\n', '')
data['artist'] = data['artist'].str.replace('\n', '')

And more songs (thanks to Daniel) from https://www.billboard.com/charts/billboard-global-200/:

In [21]:
url = 'https://www.billboard.com/charts/billboard-global-200/'

response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

song = []
artist = []

length = len(soup.select('#title-of-a-story.c-title.a-no-trucate'))

for i in range(length):
    song.append(soup.select('#title-of-a-story.c-title.a-no-trucate')[i].get_text())
    artist.append(soup.select('span.c-label.a-no-trucate')[i].get_text())


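# Append the lists scraped in this cell (song/artist), not the
# songs/artists variables left over from the previous loop
final_songs = final_songs+song
final_artists = final_artists+artist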

data = pd.DataFrame({'song':final_songs,
                      'artist':final_artists
                    })  
data['song'] = data['song'].str.replace('\n', '')
data['artist'] = data['artist'].str.replace('\n', '')

In [22]:
len(soup.select('#title-of-a-story.c-title.a-no-trucate'))

200

In [27]:
top_lists = data.copy()
top_lists

Unnamed: 0,song,artist
0,Keep On Loving You,REO Speedwagon
1,Don't Stand So Close to Me,The Police
2,Another Brick in the Wall,Pink Floyd
3,Love Stinks,The J. Geils Band
4,Funkytown,"Lipps, Inc"
...,...,...
4515,4 Your Eyez Only,J. Cole
4516,24K Magic,Bruno Mars
4517,Culture,Migos
4518,FUTURE,Future


In [28]:
top_lists.to_csv('top_lists.csv', index=False)

# Lab 3
## Spotify API
I want to start testing the Spotify API by fetching a specific playlist.

In [29]:
import json
import getpass


In [30]:
# base URL of all Spotify API endpoints
BASE_URL = 'https://api.spotify.com/v1/'

In [31]:
response = requests.get(BASE_URL)
response


<Response [401]>

In [32]:
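# Read "key:value" pairs (one per line) from the secrets file
with open("SpotifySecret.txt", "r") as secrets_file:
    secret_string = secrets_file.read().split('\n')

secrets_dict = {}
for line in secret_string:
    if len(line) > 0:
        # Split on the first ':' only, so values may contain colons
        key, value = line.split(':', 1)
        secrets_dict[key] = value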

In [33]:
def spotify_token():
    AUTH_URL = 'https://accounts.spotify.com/api/token'

    # POST
    auth_response = requests.post(AUTH_URL, {
        'grant_type': 'client_credentials',
        'client_id': secrets_dict['cid'],
        'client_secret': secrets_dict['cs'],
    })

    # convert the response to JSON
    auth_response_data = auth_response.json()

    # save the access token
    access_token = auth_response_data['access_token']
    return access_token

In [34]:
access_token = spotify_token()

In [35]:
headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}
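Requesting a fresh token for every call is not needed while the current one is still valid. The sketch below caches a token until shortly before it expires. It assumes the token response carries an `expires_in` field in seconds next to `access_token`. `TokenCache` and `fake_fetch` are illustrative names, not part of any library.

```python
import time

class TokenCache:
    """Reuse a token until shortly before it expires (illustrative helper)."""

    def __init__(self, fetch, margin=60, clock=time.monotonic):
        self._fetch = fetch    # callable returning {'access_token': ..., 'expires_in': ...}
        self._margin = margin  # refresh this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch()
            self._token = payload["access_token"]
            self._expires_at = now + payload["expires_in"] - self._margin
        return self._token

    def headers(self):
        return {"Authorization": f"Bearer {self.get()}"}

# Offline demonstration with a fake fetcher that counts how often it is called.
calls = []

def fake_fetch():
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}

cache = TokenCache(fake_fetch)
first = cache.headers()
second = cache.headers()  # served from the cache, no second fetch
print(first, len(calls))
```

In the notebook, `fetch` could be a function that posts to the token URL and returns the parsed JSON instead of only the token string.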

In [36]:
playlist_id = '5pQVH0EJssqt4KDXPQVwv3'
r = requests.get(BASE_URL + 'playlists/' + playlist_id + '/tracks',headers=headers)
r = r.json()

In [37]:
# r = r.json()
rdf = pd.DataFrame(r)
rdf['items']

0     {'added_at': '2022-01-20T08:46:10Z', 'added_by...
1     {'added_at': '2022-01-20T08:49:13Z', 'added_by...
2     {'added_at': '2022-01-20T08:49:51Z', 'added_by...
3     {'added_at': '2022-01-20T08:51:16Z', 'added_by...
4     {'added_at': '2022-01-20T08:52:22Z', 'added_by...
5     {'added_at': '2022-01-20T08:54:28Z', 'added_by...
6     {'added_at': '2022-01-20T08:55:49Z', 'added_by...
7     {'added_at': '2022-01-20T08:58:43Z', 'added_by...
8     {'added_at': '2022-01-20T08:59:31Z', 'added_by...
9     {'added_at': '2022-01-20T09:00:20Z', 'added_by...
10    {'added_at': '2022-01-20T09:02:52Z', 'added_by...
11    {'added_at': '2022-01-20T09:04:15Z', 'added_by...
12    {'added_at': '2022-01-20T09:06:35Z', 'added_by...
13    {'added_at': '2022-01-20T09:10:35Z', 'added_by...
14    {'added_at': '2022-01-20T09:10:56Z', 'added_by...
Name: items, dtype: object

In [38]:
flattened_data = pd.json_normalize(rdf['items'])

In [39]:
flattened_data

Unnamed: 0,added_at,is_local,primary_color,added_by.external_urls.spotify,added_by.href,added_by.id,added_by.type,added_by.uri,track.album.album_type,track.album.artists,...,track.id,track.is_local,track.name,track.popularity,track.preview_url,track.track,track.track_number,track.type,track.uri,video_thumbnail.url
0,2022-01-20T08:46:10Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,2IHaGyfxNoFPLJnaEg4GTs,False,"What Is Love - 7"" Mix",73,https://p.scdn.co/mp3-preview/659374a67e87fcdc...,True,1,track,spotify:track:2IHaGyfxNoFPLJnaEg4GTs,
1,2022-01-20T08:49:13Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,6pb5BBnIM5IM7R1cqag6rE,False,Big Me,62,https://p.scdn.co/mp3-preview/1b1f3d6280dc7b88...,True,3,track,spotify:track:6pb5BBnIM5IM7R1cqag6rE,
2,2022-01-20T08:49:51Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,3d9DChrdc6BOeFsbrZ3Is0,False,Under the Bridge,83,https://p.scdn.co/mp3-preview/90e41778392f27b6...,True,11,track,spotify:track:3d9DChrdc6BOeFsbrZ3Is0,
3,2022-01-20T08:51:16Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,4ZtqsOdBbS6GoedzzRGSo9,False,Breathe,54,https://p.scdn.co/mp3-preview/e47d0d04d2105337...,True,2,track,spotify:track:4ZtqsOdBbS6GoedzzRGSo9,
4,2022-01-20T08:52:22Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,1G391cbiT3v3Cywg8T7DM1,False,Scar Tissue,80,https://p.scdn.co/mp3-preview/8602533a3ae6da93...,True,3,track,spotify:track:1G391cbiT3v3Cywg8T7DM1,
5,2022-01-20T08:54:28Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,5lDriBxJd22IhOH9zTcFrV,False,Dirty Little Secret,73,,True,1,track,spotify:track:5lDriBxJd22IhOH9zTcFrV,
6,2022-01-20T08:55:49Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,compilation,[{'external_urls': {'spotify': 'https://open.s...,...,1ofhfV90EnYhEr7Un2fWiv,False,Changes,72,,True,5,track,spotify:track:1ofhfV90EnYhEr7Un2fWiv,
7,2022-01-20T08:58:43Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,6wpGqhRvJGNNXwWlPmkMyO,False,I Still Haven't Found What I'm Looking For,79,,True,2,track,spotify:track:6wpGqhRvJGNNXwWlPmkMyO,
8,2022-01-20T08:59:31Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,3v1dCP3hk2djfWryqfp7sx,False,Caminando por la vida,67,https://p.scdn.co/mp3-preview/0a7c5118e8f979dd...,True,1,track,spotify:track:3v1dCP3hk2djfWryqfp7sx,
9,2022-01-20T09:00:20Z,False,,https://open.spotify.com/user/31v65vzsgovbogh3...,https://api.spotify.com/v1/users/31v65vzsgovbo...,31v65vzsgovbogh3savco6gwecw4,user,spotify:user:31v65vzsgovbogh3savco6gwecw4,album,[{'external_urls': {'spotify': 'https://open.s...,...,29699gQVPOc6cLocZJMsRT,False,Adolescentes,49,https://p.scdn.co/mp3-preview/2a5bce61d392f19c...,True,3,track,spotify:track:29699gQVPOc6cLocZJMsRT,


In [40]:
songs = flattened_data['track.name'].to_list()
songs #list to add to the big table

['What Is Love - 7" Mix',
 'Big Me',
 'Under the Bridge',
 'Breathe',
 'Scar Tissue',
 'Dirty Little Secret',
 'Changes',
 "I Still Haven't Found What I'm Looking For",
 'Caminando por la vida',
 'Adolescentes',
 'Appreciate It',
 'Bohemian Rhapsody',
 'Points of Authority',
 'You Give Love A Bad Name',
 'Wishing on a Star']

In [41]:
flattened_data2 = pd.json_normalize(flattened_data['track.album.artists'])
flattened_data2

Unnamed: 0,0,1
0,{'href': 'https://api.spotify.com/v1/artists/0...,
1,{'href': 'https://api.spotify.com/v1/artists/7...,
2,{'href': 'https://api.spotify.com/v1/artists/0...,
3,{'href': 'https://api.spotify.com/v1/artists/4...,
4,{'href': 'https://api.spotify.com/v1/artists/0...,
5,{'href': 'https://api.spotify.com/v1/artists/3...,
6,{'href': 'https://api.spotify.com/v1/artists/1...,
7,{'href': 'https://api.spotify.com/v1/artists/5...,
8,{'href': 'https://api.spotify.com/v1/artists/1...,
9,{'href': 'https://api.spotify.com/v1/artists/6...,{'href': 'https://api.spotify.com/v1/artists/6...


In [42]:
flattened_data3 = pd.json_normalize(flattened_data2[0])
flattened_data3

Unnamed: 0,href,id,name,type,uri,external_urls.spotify
0,https://api.spotify.com/v1/artists/0Suv0tRrNrU...,0Suv0tRrNrUlRzAy8aXjma,Haddaway,artist,spotify:artist:0Suv0tRrNrUlRzAy8aXjma,https://open.spotify.com/artist/0Suv0tRrNrUlRz...
1,https://api.spotify.com/v1/artists/7jy3rLJdDQY...,7jy3rLJdDQY21OgRLCZ9sD,Foo Fighters,artist,spotify:artist:7jy3rLJdDQY21OgRLCZ9sD,https://open.spotify.com/artist/7jy3rLJdDQY21O...
2,https://api.spotify.com/v1/artists/0L8ExT028jH...,0L8ExT028jH3ddEcZwqJJ5,Red Hot Chili Peppers,artist,spotify:artist:0L8ExT028jH3ddEcZwqJJ5,https://open.spotify.com/artist/0L8ExT028jH3dd...
3,https://api.spotify.com/v1/artists/4k1ELeJKT1I...,4k1ELeJKT1ISyDv8JivPpB,The Prodigy,artist,spotify:artist:4k1ELeJKT1ISyDv8JivPpB,https://open.spotify.com/artist/4k1ELeJKT1ISyD...
4,https://api.spotify.com/v1/artists/0L8ExT028jH...,0L8ExT028jH3ddEcZwqJJ5,Red Hot Chili Peppers,artist,spotify:artist:0L8ExT028jH3ddEcZwqJJ5,https://open.spotify.com/artist/0L8ExT028jH3dd...
5,https://api.spotify.com/v1/artists/3vAaWhdBR38...,3vAaWhdBR38Q02ohXqaNHT,The All-American Rejects,artist,spotify:artist:3vAaWhdBR38Q02ohXqaNHT,https://open.spotify.com/artist/3vAaWhdBR38Q02...
6,https://api.spotify.com/v1/artists/1ZwdS5xdxER...,1ZwdS5xdxEREPySFridCfh,2Pac,artist,spotify:artist:1ZwdS5xdxEREPySFridCfh,https://open.spotify.com/artist/1ZwdS5xdxEREPy...
7,https://api.spotify.com/v1/artists/51Blml2LZPm...,51Blml2LZPmy7TTiAg47vQ,U2,artist,spotify:artist:51Blml2LZPmy7TTiAg47vQ,https://open.spotify.com/artist/51Blml2LZPmy7T...
8,https://api.spotify.com/v1/artists/1EXjXQpDx2p...,1EXjXQpDx2pROygh8zvHs4,Melendi,artist,spotify:artist:1EXjXQpDx2pROygh8zvHs4,https://open.spotify.com/artist/1EXjXQpDx2pROy...
9,https://api.spotify.com/v1/artists/6A9B0s7mgGz...,6A9B0s7mgGzm1fY0Vg8Skw,Kiko y Shara,artist,spotify:artist:6A9B0s7mgGzm1fY0Vg8Skw,https://open.spotify.com/artist/6A9B0s7mgGzm1f...


In [43]:
artists = flattened_data3['name'].to_list()
artists

['Haddaway',
 'Foo Fighters',
 'Red Hot Chili Peppers',
 'The Prodigy',
 'Red Hot Chili Peppers',
 'The All-American Rejects',
 '2Pac',
 'U2',
 'Melendi',
 'Kiko y Shara',
 '2Baba',
 'Queen',
 'Linkin Park',
 'Bon Jovi',
 'Rose Royce']
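The same two lists can also be read straight from the nested response without the intermediate dataframes. This is a minimal sketch on a hand-written dictionary that only mirrors the fields used above (`items`, `track`, `name`, `album`, `artists`); "Second Artist" is a placeholder, not real data.

```python
# Hand-written sample that mimics the shape of the playlist response.
sample_response = {
    "items": [
        {"track": {"name": "Big Me",
                   "album": {"artists": [{"name": "Foo Fighters"}]}}},
        {"track": {"name": "Adolescentes",
                   "album": {"artists": [{"name": "Kiko y Shara"},
                                         {"name": "Second Artist"}]}}},
    ]
}

def songs_and_first_artists(response):
    """Return track names and the first listed album artist for each item."""
    song_names = [item["track"]["name"] for item in response["items"]]
    first_artists = [item["track"]["album"]["artists"][0]["name"]
                     for item in response["items"]]
    return song_names, first_artists

sample_songs, sample_artists = songs_and_first_artists(sample_response)
print(sample_songs, sample_artists)
```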

In [44]:
final_songs = final_songs+songs
final_artists = final_artists+artists

data = pd.DataFrame({'song':final_songs,
                      'artist':final_artists
                    })  
data['song'] = data['song'].str.replace('\n', '')
data['artist'] = data['artist'].str.replace('\n', '')

data

Unnamed: 0,song,artist
0,Keep On Loving You,REO Speedwagon
1,Don't Stand So Close to Me,The Police
2,Another Brick in the Wall,Pink Floyd
3,Love Stinks,The J. Geils Band
4,Funkytown,"Lipps, Inc"
...,...,...
4530,Appreciate It,2Baba
4531,Bohemian Rhapsody,Queen
4532,Points of Authority,Linkin Park
4533,You Give Love A Bad Name,Bon Jovi


### Extra testing for the future
I want to try to find extra info from the name and the artist of a song using the Spotify API.
I will use the 'search' endpoint to find the id for a given song.

In [97]:
access_token = spotify_token()

headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}

# Both filters go inside `q`, separated by a space; requests encodes the value
r = requests.get(BASE_URL + 'search', params={'type': 'track', 'q': 'track:Heathens artist:Twenty One Pilots'}, headers=headers)
r = r.json()
# r

The test works for an explicit query but accessing the id (or the URI of the song) requires several transformations.   
Let's see if I can make the query with variables:

In [98]:
access_token = spotify_token()

headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}
song='Heathens'
artist='Twenty One Pilots'


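# Both filters go inside `q`, separated by a space; requests encodes the value
r = requests.get(BASE_URL + 'search', params={'type': 'track', 'limit': 1, 'q': 'track:'+song+' artist:'+artist}, headers=headers)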
r = r.json()
# r
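When a search URL is assembled by hand, an `&` placed before `artist:` starts a new query parameter, and special characters in titles (such as the `&` in "Come & Go") are left unescaped. A minimal sketch of building the query with the standard library follows. It assumes that the `track:` and `artist:` filters both belong inside the single `q` parameter, separated by a space. `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.spotify.com/v1/"

def build_search_url(song, artist, limit=1):
    """Build a track search URL with both filters inside one encoded `q`."""
    query = f"track:{song} artist:{artist}"
    params = urlencode({"type": "track", "limit": limit, "q": query})
    return BASE_URL + "search?" + params

url = build_search_url("Heathens", "Twenty One Pilots")
print(url)

# Characters such as '&' are escaped instead of splitting the query.
tricky = build_search_url("Come & Go", "Juice WRLD")
print(tricky)
```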

In [47]:
df = pd.DataFrame(r).transpose()
df = pd.json_normalize(df['items'])
df = pd.json_normalize(df[0])
df['uri'][0]

'spotify:track:6i0V12jOa3mr6uu4WYhUBr'
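The URI can also be read directly from the nested dictionary, which avoids the chain of transformations. This sketch uses a hand-written sample that mirrors only the `tracks` → `items` → `uri` path of the search response. `first_track_uri` is an illustrative name.

```python
def first_track_uri(search_response):
    """Return the URI of the first track in a search response, or None."""
    items = search_response.get("tracks", {}).get("items", [])
    return items[0]["uri"] if items else None

# Hand-written samples: one hit and one empty result.
sample_hit = {"tracks": {"items": [{"uri": "spotify:track:6i0V12jOa3mr6uu4WYhUBr"}]}}
sample_empty = {"tracks": {"items": []}}

print(first_track_uri(sample_hit))
print(first_track_uri(sample_empty))
```

Returning `None` for an empty result makes it easy to spot missing matches later with `isna()`.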

With both tests, I think I can add the URI to the dataframe.

In [48]:
from random import randint
from time import sleep

def gathering_uri(data):
    uris = []  # avoid shadowing the built-in id()
    for i, j in data.iterrows():
        song = j['song']
        # print(j)
        artist = j['artist']

        access_token = spotify_token()

        headers = {
            'Authorization': 'Bearer {token}'.format(token=access_token)
        }

        try:
            # Both filters go inside `q`; requests encodes the value
            r = requests.get(BASE_URL + 'search',
                             params={'type': 'track', 'limit': 1,
                                     'q': 'track:'+song+' artist:'+artist},
                             headers=headers)
            r = r.json()
            df = pd.DataFrame(r).transpose()
            df = pd.json_normalize(df['items'])

            df = pd.json_normalize(df[0])

            uris.append(df['uri'][0])

        except Exception:
            # No match or unexpected response: keep a placeholder
            uris.append(0)

        # Random pause between requests
        sleep(randint(1,3))

    data['id'] = uris
    return data



In [51]:
datasmall = data.head(2)


datasmall = gathering_uri(datasmall)

song      Keep On Loving You
artist        REO Speedwagon
Name: 0, dtype: object
song      Don't Stand So Close to Me
artist                    The Police
Name: 1, dtype: object


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['id'] = id
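The warning above appears when a column is assigned on a frame that pandas considers a slice of another frame. A minimal sketch of one common way to avoid it is to take an explicit copy before adding the column. The `uri-0` and `uri-1` values are placeholders.

```python
import pandas as pd

data_demo = pd.DataFrame({
    "song": ["Keep On Loving You", "Don't Stand So Close to Me", "Love Stinks"],
    "artist": ["REO Speedwagon", "The Police", "The J. Geils Band"],
})

# .copy() makes the small frame independent of the original one,
# so adding a column to it is unambiguous.
small = data_demo.head(2).copy()
small["id"] = ["uri-0", "uri-1"]  # placeholder values

print(small)
print(list(data_demo.columns))  # the original frame is unchanged
```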


In [52]:
datasmall

Unnamed: 0,song,artist,id
0,Keep On Loving You,REO Speedwagon,spotify:track:4rcHWl68ai6KvpXlc8vbnE
1,Don't Stand So Close to Me,The Police,spotify:track:4ONpHdSfzrrzMQZaOkYDQ0


In [53]:
# data = gathering_uri(data)
# This will take too long

In [54]:
data

Unnamed: 0,song,artist
0,Keep On Loving You,REO Speedwagon
1,Don't Stand So Close to Me,The Police
2,Another Brick in the Wall,Pink Floyd
3,Love Stinks,The J. Geils Band
4,Funkytown,"Lipps, Inc"
...,...,...
4530,Appreciate It,2Baba
4531,Bohemian Rhapsody,Queen
4532,Points of Authority,Linkin Park
4533,You Give Love A Bad Name,Bon Jovi


# Lab 4
## Spotipy and audio features
I will gather the URIs using Spotipy instead of the function I wrote above.

In [55]:
spotify_songs = data[['song', 'artist']]
spotify_songs

Unnamed: 0,song,artist
0,Keep On Loving You,REO Speedwagon
1,Don't Stand So Close to Me,The Police
2,Another Brick in the Wall,Pink Floyd
3,Love Stinks,The J. Geils Band
4,Funkytown,"Lipps, Inc"
...,...,...
4530,Appreciate It,2Baba
4531,Bohemian Rhapsody,Queen
4532,Points of Authority,Linkin Park
4533,You Give Love A Bad Name,Bon Jovi


In [56]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

from random import randint
from time import sleep


In [57]:
#Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['cid'],
                                                           client_secret=secrets_dict['cs']))

### Adding more songs to the dataframe
For this I will use the example playlist.

In [59]:
# results = sp.user_playlist_tracks("spotify",'1fKJQ8SddFYDNRetwubZAP')
results = sp.playlist_tracks('1fKJQ8SddFYDNRetwubZAP')

In [60]:
results["items"][0]["track"].keys()

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'episode', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track', 'track_number', 'type', 'uri'])

In [61]:
results["items"][0]["track"]['name']

'Crowd Chant'

In [62]:
results["items"][0]["track"]['artists'][0]['name']

'Joe Satriani'

In [63]:
results["items"][0]["track"]['uri']

'spotify:track:0bz67HYKfiuUj1xhsK5ofT'

In [64]:
# Creates a dataframe from a given playlist
def get_df_from_small_playlist(playlist_id):
    songs = []
    artists = []
    uri = []

    results = sp.playlist_tracks(playlist_id)
    for item in results["items"]:
        songs.append(item["track"]['name'])
        names = []
        for name in item["track"]['artists']:
            names.append(name['name'])
        artists.append(names)
        uri.append(item["track"]['uri'])

    df = pd.DataFrame({'song':songs,
                      'artist':artists,
                      'uri':uri
                    })

    return df

In [65]:
df1 = get_df_from_small_playlist('1fKJQ8SddFYDNRetwubZAP')

In [66]:
df1

Unnamed: 0,song,artist,uri
0,Crowd Chant,[Joe Satriani],spotify:track:0bz67HYKfiuUj1xhsK5ofT
1,Dearly Beloved (acoustic),[Bad Religion],spotify:track:12hcRD1krqVzhx19EWwH2e
2,Here We Are Juggernaut,[Coheed and Cambria],spotify:track:39Tv0jWHgfCYgWD2sqaqqK
3,When Skeletons Live,[Coheed and Cambria],spotify:track:3bXgZT3y4NbtoKXqJRkXfD
4,Black Betty,[Ram Jam],spotify:track:4FFKYMQcqGIKLp4pJRdkbm
...,...,...,...
70,Iron,[Woodkid],spotify:track:4rPCgwmCgef78nrsqwjA7G
71,Bones,[Young Guns],spotify:track:2BOfbXKOlLwVXqUrCdQAfF
72,Not Strong Enough (feat. Doug Robb),"[Apocalyptica, Doug Robb]",spotify:track:6BpU4wDm79Tx4mOzmmzrUy
73,Lateralus,[Break of Reality],spotify:track:3QNSOch1GUP0m5GVZ1F2Fp


In [69]:
def get_df_from_playlist(playlist_id):
    songs = []
    artists = []
    uri = []

    results = sp.playlist_tracks(playlist_id)
    while results['next']!=None:
        for item in results["items"]:
            songs.append(item["track"]['name'])
            names = []
            for name in item["track"]['artists']:
                names.append(name['name'])
            artists.append(names)
            uri.append(item["track"]['uri'])
        sleep(randint(1,3))
        results = sp.next(results)

    df = pd.DataFrame({'song':songs,
                      'artist':artists,
                      'uri':uri
                    })

    return df

    # I need to check the paging part of the function: the loop stops as soon as
    # results['next'] is None, so the last page is never processed.
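One possible way to keep the last page is to process the current page first and only then decide whether to fetch another one. The sketch below runs offline: `FakeClient` is a hypothetical stand-in that exposes only the two methods used here (`playlist_tracks` and `next`) and is not the real Spotipy client, and `collect_items` is an illustrative name.

```python
class FakeClient:
    """Offline stand-in that serves pre-built pages of items."""

    def __init__(self, pages):
        self._pages = pages

    def _page(self, index):
        # 'next' points at the following page, or is None on the last one.
        nxt = index + 1 if index + 1 < len(self._pages) else None
        return {"items": self._pages[index], "next": nxt}

    def playlist_tracks(self, playlist_id):
        return self._page(0)

    def next(self, result):
        return self._page(result["next"]) if result["next"] is not None else None

def collect_items(client, playlist_id):
    """Gather the items of every page, including the final one."""
    items = []
    results = client.playlist_tracks(playlist_id)
    while results:
        items.extend(results["items"])  # handle the current page first
        # Only then move on; stop after the page that has no 'next'.
        results = client.next(results) if results["next"] is not None else None
    return items

fake = FakeClient([["a", "b"], ["c", "d"], ["e"]])
all_items = collect_items(fake, "any-id")
print(all_items)
```

The same loop also handles a playlist that fits in a single page, since the first page is processed before the `next` check.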

In [70]:
df2 = get_df_from_playlist('4rnleEAOdmFAbRcNCgZMpY')

In [71]:
df2

Unnamed: 0,song,artist,uri
0,Take Me To Church,[Hozier],spotify:track:7dS5EaCoMnN7DzlpT6aRn2
1,Cooler Than Me - Single Mix,"[Mike Posner, Gigamesh]",spotify:track:2V4bv1fNWfTcyRJKmej6Sj
2,See You Again (feat. Kali Uchis),"[Tyler, The Creator, Kali Uchis]",spotify:track:7KA4W4McWYRpgf0fWsJZWB
3,Pompeii,[Bastille],spotify:track:3gbBpTdY8lnQwqxNCcf795
4,Hips Don't Lie (feat. Wyclef Jean),"[Shakira, Wyclef Jean]",spotify:track:3ZFTkvIE7kyPt6Nu3PEa7V
...,...,...,...
5195,Are You Ready - 2008 Remix Radio Edit,[ABREU],spotify:track:50Vy67K5C3MyHTG7CtDBvL
5196,How Could You Do It,[ABREU],spotify:track:2R3fHXsY6e7eRVUVcDCle0
5197,Faster,[Within Temptation],spotify:track:28HX0PibeXSL6cfAsvwSgD
5198,Ethän unoha mua,[Aaro630],spotify:track:4XR0QLl3tYshaqLKsWNhNP


Now it works with playlists of more than 100 tracks.  
Now I can fetch the Audio features.

In [72]:
features = sp.audio_features('spotify:track:3SawmGBjjq8EOYZJV11cJm')
features[0]


{'danceability': 0.47,
 'energy': 0.709,
 'key': 7,
 'loudness': -4.563,
 'mode': 1,
 'speechiness': 0.0299,
 'acousticness': 0.00124,
 'instrumentalness': 0.00121,
 'liveness': 0.302,
 'valence': 0.247,
 'tempo': 98.036,
 'type': 'audio_features',
 'id': '3SawmGBjjq8EOYZJV11cJm',
 'uri': 'spotify:track:3SawmGBjjq8EOYZJV11cJm',
 'track_href': 'https://api.spotify.com/v1/tracks/3SawmGBjjq8EOYZJV11cJm',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/3SawmGBjjq8EOYZJV11cJm',
 'duration_ms': 241360,
 'time_signature': 4}

In [73]:
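def find_features(uris):
    # Fetch the audio features one URI at a time
    features = []
    for uri in uris:
        try:
            features.append(sp.audio_features(uri)[0])
        except Exception:
            # Use None so that failed lookups can be removed later with dropna()
            features.append(None)
    return features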
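Fetching one URI per request is slow for thousands of tracks. A sketch of a batched variant follows, assuming the audio-features call accepts a list of up to 100 track identifiers per request (worth confirming in the Spotipy documentation). `chunked`, `find_features_batched` and `fake_fetch` are illustrative names, and the fake fetcher keeps the example offline.

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements."""
    seq = list(seq)
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def find_features_batched(uris, fetch_batch, batch_size=100):
    """Call `fetch_batch` once per chunk and keep the results aligned."""
    features = []
    for batch in chunked(uris, batch_size):
        try:
            features.extend(fetch_batch(batch))
        except Exception:
            # Keep one None per URI so the list still lines up with the input.
            features.extend([None] * len(batch))
    return features

# Offline demonstration: record the batch sizes the fetcher receives.
batch_sizes = []

def fake_fetch(batch):
    batch_sizes.append(len(batch))
    return [{"uri": uri} for uri in batch]

demo_uris = [f"u{i}" for i in range(250)]
demo_features = find_features_batched(demo_uris, fake_fetch)
print(batch_sizes, len(demo_features))
```

Under that assumption, passing `sp.audio_features` as `fetch_batch` would cut the number of requests by roughly a factor of 100.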
    

In [None]:
hide = find_features(df1['uri'])
# hide

I need to be able to find the URI from a song name.

In [77]:
def song_uri(song):
    q='track:'+ song# +'&artist:'+ artists
    uri = sp.search(q=q, limit=1)['tracks']['items'][0]['uri']

    return uri

In [78]:
song_uri('Bohemian Rhapsody')

'spotify:track:7tFiyTwD0nx5a1eklYtX2J'

In [79]:
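def find_uris(df):
    # Fill in missing URIs row by row by searching for the song name
    for idx, row in df.iterrows():
        if pd.isna(row['uri']):
            df.loc[idx, 'uri'] = song_uri(row['song'])

    return df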

I will use this function at a later time (if ever, as it takes too long).

### Due to issues with the old data, I need to work only with data downloaded from spotipy.

In [80]:
df3 = get_df_from_playlist('4tImrheWTyHdC4igyEKybb')
df4 = get_df_from_playlist('31ymdYCITDnZRtkKzP3Itp')
df5 = get_df_from_playlist('2NfTM2df5tHVUquNwet0yB')
df6 = get_df_from_playlist('37i9dQZF1DWXNFSTtym834')

In [81]:
spotify = pd.concat([df, df1, df2, df3, df4, df5, df6], ignore_index=True)

In [82]:
spotify

Unnamed: 0,song,artist,uri
0,Take Me To Church,[Hozier],spotify:track:7dS5EaCoMnN7DzlpT6aRn2
1,Cooler Than Me - Single Mix,"[Mike Posner, Gigamesh]",spotify:track:2V4bv1fNWfTcyRJKmej6Sj
2,See You Again (feat. Kali Uchis),"[Tyler, The Creator, Kali Uchis]",spotify:track:7KA4W4McWYRpgf0fWsJZWB
3,Pompeii,[Bastille],spotify:track:3gbBpTdY8lnQwqxNCcf795
4,Hips Don't Lie (feat. Wyclef Jean),"[Shakira, Wyclef Jean]",spotify:track:3ZFTkvIE7kyPt6Nu3PEa7V
...,...,...,...
12870,Unsafe,"[Apashe, Phace]",spotify:track:3FY2AnmOs09L5S1b6E9JMt
12871,Puttin on the Ritz,"[Apashe, Ariane Zita]",spotify:track:6YBLGysCR4l3x4IJ3J0PbT
12872,Fire Inside - Funky VIP,"[Apashe, RIOT]",spotify:track:7KCa3tur8bgVQYu043eEZ5
12873,When The Lights Go Down (feat. Cody Simpson),"[DVBBS, Galantis, Cody Simpson]",spotify:track:3J3EdH1ZgXqk1ROTzQOF0U


In [83]:
features = find_features(spotify['uri'])

Expected id of type track but found type Apologize spotify:local:OneRepublic+%26+Timbaland:Dreaming+Out+Loud:Apologize:185
Expected id of type track but found type Circles spotify:local:Hollywood+Undead:Swan+Songs+%28Collector%E2%80%99s+Edition%29:Circles:206
Expected id of type track but found type M.I.A.+-+DBT+%28XIUS+LI%D0%98K+Remix%29 spotify:local:::M.I.A.+-+DBT+%28XIUS+LI%D0%98K+Remix%29:237


In [99]:
# features

In [85]:
spotify['features'] = features

In [86]:
spotify

Unnamed: 0,song,artist,uri,features
0,Take Me To Church,[Hozier],spotify:track:7dS5EaCoMnN7DzlpT6aRn2,"{'danceability': 0.566, 'energy': 0.664, 'key'..."
1,Cooler Than Me - Single Mix,"[Mike Posner, Gigamesh]",spotify:track:2V4bv1fNWfTcyRJKmej6Sj,"{'danceability': 0.768, 'energy': 0.82, 'key':..."
2,See You Again (feat. Kali Uchis),"[Tyler, The Creator, Kali Uchis]",spotify:track:7KA4W4McWYRpgf0fWsJZWB,"{'danceability': 0.558, 'energy': 0.559, 'key'..."
3,Pompeii,[Bastille],spotify:track:3gbBpTdY8lnQwqxNCcf795,"{'danceability': 0.679, 'energy': 0.715, 'key'..."
4,Hips Don't Lie (feat. Wyclef Jean),"[Shakira, Wyclef Jean]",spotify:track:3ZFTkvIE7kyPt6Nu3PEa7V,"{'danceability': 0.778, 'energy': 0.824, 'key'..."
...,...,...,...,...
12870,Unsafe,"[Apashe, Phace]",spotify:track:3FY2AnmOs09L5S1b6E9JMt,"{'danceability': 0.647, 'energy': 0.94, 'key':..."
12871,Puttin on the Ritz,"[Apashe, Ariane Zita]",spotify:track:6YBLGysCR4l3x4IJ3J0PbT,"{'danceability': 0.849, 'energy': 0.883, 'key'..."
12872,Fire Inside - Funky VIP,"[Apashe, RIOT]",spotify:track:7KCa3tur8bgVQYu043eEZ5,"{'danceability': 0.732, 'energy': 0.945, 'key'..."
12873,When The Lights Go Down (feat. Cody Simpson),"[DVBBS, Galantis, Cody Simpson]",spotify:track:3J3EdH1ZgXqk1ROTzQOF0U,"{'danceability': 0.738, 'energy': 0.713, 'key'..."


In [87]:
spotify.features.isna().sum()

4

In [88]:
spotify.dropna(subset = ["features"], inplace=True)

In [89]:
spotify = pd.concat([spotify.drop(['features'], axis=1), spotify['features'].apply(pd.Series)], axis=1)       
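Expanding the dictionaries this way also brings in a second `uri` column, since each features dictionary carries its own `uri`. A minimal sketch of an alternative on a two-row sample (values copied from the output above, with the dictionaries trimmed to three keys) builds the feature columns from the list of dictionaries and drops the duplicate before joining.

```python
import pandas as pd

tracks = pd.DataFrame({
    "song": ["Take Me To Church", "Pompeii"],
    "uri": ["spotify:track:7dS5EaCoMnN7DzlpT6aRn2",
            "spotify:track:3gbBpTdY8lnQwqxNCcf795"],
    "features": [
        {"danceability": 0.566, "energy": 0.664,
         "uri": "spotify:track:7dS5EaCoMnN7DzlpT6aRn2"},
        {"danceability": 0.679, "energy": 0.715,
         "uri": "spotify:track:3gbBpTdY8lnQwqxNCcf795"},
    ],
})

# One column per dictionary key, aligned with the original row index.
feature_cols = pd.DataFrame(tracks["features"].tolist(), index=tracks.index)

# 'uri' already exists in `tracks`, so drop the copy from the features.
feature_cols = feature_cols.drop(columns=["uri"])

flat = pd.concat([tracks.drop(columns=["features"]), feature_cols], axis=1)
print(flat)
```

Passing `index=tracks.index` keeps the rows aligned even after earlier rows have been dropped.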

In [90]:
spotify

Unnamed: 0,song,artist,uri,danceability,energy,key,loudness,mode,speechiness,acousticness,...,liveness,valence,tempo,type,id,uri.1,track_href,analysis_url,duration_ms,time_signature
0,Take Me To Church,[Hozier],spotify:track:7dS5EaCoMnN7DzlpT6aRn2,0.566,0.664,4,-5.303,0,0.0464,0.634000,...,0.1160,0.437,128.945,audio_features,7dS5EaCoMnN7DzlpT6aRn2,spotify:track:7dS5EaCoMnN7DzlpT6aRn2,https://api.spotify.com/v1/tracks/7dS5EaCoMnN7...,https://api.spotify.com/v1/audio-analysis/7dS5...,241688,4
1,Cooler Than Me - Single Mix,"[Mike Posner, Gigamesh]",spotify:track:2V4bv1fNWfTcyRJKmej6Sj,0.768,0.820,7,-4.630,0,0.0474,0.179000,...,0.6890,0.625,129.965,audio_features,2V4bv1fNWfTcyRJKmej6Sj,spotify:track:2V4bv1fNWfTcyRJKmej6Sj,https://api.spotify.com/v1/tracks/2V4bv1fNWfTc...,https://api.spotify.com/v1/audio-analysis/2V4b...,213293,4
2,See You Again (feat. Kali Uchis),"[Tyler, The Creator, Kali Uchis]",spotify:track:7KA4W4McWYRpgf0fWsJZWB,0.558,0.559,6,-9.222,1,0.0959,0.371000,...,0.1090,0.620,78.558,audio_features,7KA4W4McWYRpgf0fWsJZWB,spotify:track:7KA4W4McWYRpgf0fWsJZWB,https://api.spotify.com/v1/tracks/7KA4W4McWYRp...,https://api.spotify.com/v1/audio-analysis/7KA4...,180387,4
3,Pompeii,[Bastille],spotify:track:3gbBpTdY8lnQwqxNCcf795,0.679,0.715,9,-6.383,1,0.0407,0.075500,...,0.2710,0.571,127.435,audio_features,3gbBpTdY8lnQwqxNCcf795,spotify:track:3gbBpTdY8lnQwqxNCcf795,https://api.spotify.com/v1/tracks/3gbBpTdY8lnQ...,https://api.spotify.com/v1/audio-analysis/3gbB...,214148,4
4,Hips Don't Lie (feat. Wyclef Jean),"[Shakira, Wyclef Jean]",spotify:track:3ZFTkvIE7kyPt6Nu3PEa7V,0.778,0.824,10,-5.892,0,0.0707,0.284000,...,0.4050,0.758,100.024,audio_features,3ZFTkvIE7kyPt6Nu3PEa7V,spotify:track:3ZFTkvIE7kyPt6Nu3PEa7V,https://api.spotify.com/v1/tracks/3ZFTkvIE7kyP...,https://api.spotify.com/v1/audio-analysis/3ZFT...,218093,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12870,Unsafe,"[Apashe, Phace]",spotify:track:3FY2AnmOs09L5S1b6E9JMt,0.647,0.940,7,-4.505,1,0.1530,0.000617,...,0.1140,0.166,172.071,audio_features,3FY2AnmOs09L5S1b6E9JMt,spotify:track:3FY2AnmOs09L5S1b6E9JMt,https://api.spotify.com/v1/tracks/3FY2AnmOs09L...,https://api.spotify.com/v1/audio-analysis/3FY2...,249500,4
12871,Puttin on the Ritz,"[Apashe, Ariane Zita]",spotify:track:6YBLGysCR4l3x4IJ3J0PbT,0.849,0.883,2,-4.464,1,0.1710,0.020400,...,0.1000,0.611,115.028,audio_features,6YBLGysCR4l3x4IJ3J0PbT,spotify:track:6YBLGysCR4l3x4IJ3J0PbT,https://api.spotify.com/v1/tracks/6YBLGysCR4l3...,https://api.spotify.com/v1/audio-analysis/6YBL...,251478,4
12872,Fire Inside - Funky VIP,"[Apashe, RIOT]",spotify:track:7KCa3tur8bgVQYu043eEZ5,0.732,0.945,1,-3.379,1,0.0331,0.000245,...,0.0940,0.557,125.982,audio_features,7KCa3tur8bgVQYu043eEZ5,spotify:track:7KCa3tur8bgVQYu043eEZ5,https://api.spotify.com/v1/tracks/7KCa3tur8bgV...,https://api.spotify.com/v1/audio-analysis/7KCa...,187619,4
12873,When The Lights Go Down (feat. Cody Simpson),"[DVBBS, Galantis, Cody Simpson]",spotify:track:3J3EdH1ZgXqk1ROTzQOF0U,0.738,0.713,1,-5.977,0,0.0444,0.099900,...,0.0542,0.490,126.077,audio_features,3J3EdH1ZgXqk1ROTzQOF0U,spotify:track:3J3EdH1ZgXqk1ROTzQOF0U,https://api.spotify.com/v1/tracks/3J3EdH1ZgXqk...,https://api.spotify.com/v1/audio-analysis/3J3E...,189266,4


In [91]:
spotify_df = spotify.copy()
spotify_df.to_csv('spotify.csv', index=False)