# <center>Spotify Audio Features Extraction</center>

Our dataset currently holds basic information about each song, such as its artists, Spotify id, and popularity, but there are many more features we can add to each song. Spotify provides audio features for every track, so we will extract them with Spotify's Web API.

More info on Spotify's audio features: https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/

# Setup

In [48]:
import pandas as pd

import spotipy # for Spotify's Web API
from spotipy.oauth2 import SpotifyClientCredentials

In [2]:
file = 'Spotify.txt'

# store client id and client secret
with open(file, 'r') as f:
    # line 1 holds the client id, line 2 the client secret
    lines = f.read().splitlines()
    cid = lines[0]
    secret = lines[1]
    
# authenticate requests to the Spotify Web API
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
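
The credentials file is read by position, so a stray blank line or trailing space would silently produce a bad id or secret. As a minimal sketch, a small helper could validate the file before authenticating. The name `read_credentials` is hypothetical, and the two-line layout is assumed from the cell above.

```python
from pathlib import Path


def read_credentials(path):
    """Return (client_id, client_secret) from a two-line text file.

    Hypothetical helper: assumes the id is on the first non-empty line
    and the secret on the second, as in 'Spotify.txt' above.
    """
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    if len(lines) < 2:
        raise ValueError(f"expected 2 non-empty lines in {path}, found {len(lines)}")
    return lines[0], lines[1]
```

With this in place, `cid, secret = read_credentials('Spotify.txt')` would replace the manual indexing.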

## Read csv file

In [51]:
# read the csv file containing each song information
music_data = pd.read_csv('spotify_ids.csv', encoding='ISO-8859-1')
music_data.drop('Unnamed: 0',axis=1,inplace=True)

In [52]:
music_data

,rank,title,artist,other artists,year,Spotify id,Popularity
0,1,Theme from A Summer Place,Percy Faith,,1960,6zwvB879PJSpTyFcg2wwnL,57
1,2,He'll Have to Go,Jim Reeves,,1960,7dDE59NX0n466e705E8Itz,18
2,3,Cathy's Clown,The Everly Brothers,,1960,1MA9StLzlFftLbuqOmoWij,52
3,4,Running Bear,Johnny Preston,,1960,1RYznli2VNO7FCbW1Hq4KM,39
4,5,Teen Angel,Mark Dinning,,1960,36NPEs4S7ik50NrlzaqoIJ,31
...,...,...,...,...,...,...,...
6093,96,More Than My Hometown,Morgan Wallen,,2020,0eBXyY4SatzpE7opnzgXvz,83
6094,97,Lovin' on You,Luke Combs,,2020,0nYvjcSlCgjcwogQAwIwNp,76
6095,98,Said Sum,Moneybagg Yo,,2020,3sKz6Sd72K0ofPWcJPPk6H,75
6096,99,Slide,H.E.R.,YG,2020,2rTnVB1bvwxHtaIl4uVu7f,77


# Extracting Audio Features

## Setup

Now that everything is set up, let's look at which audio features Spotify provides. We'll test this on the first song in our dataframe, "Theme from A Summer Place" by Percy Faith. The API looks songs up by their Spotify id, so we grab that first.

In [55]:
# grab spotify id of song
song_id = music_data.loc[0]['Spotify id']
print(song_id)

6zwvB879PJSpTyFcg2wwnL


In [57]:
# extract audio features for our chosen song
sp.audio_features(song_id)

[{'danceability': 0.326,
  'energy': 0.326,
  'key': 0,
  'loudness': -15.144,
  'mode': 1,
  'speechiness': 0.0297,
  'acousticness': 0.591,
  'instrumentalness': 0.918,
  'liveness': 0.494,
  'valence': 0.87,
  'tempo': 186.232,
  'type': 'audio_features',
  'id': '6zwvB879PJSpTyFcg2wwnL',
  'uri': 'spotify:track:6zwvB879PJSpTyFcg2wwnL',
  'track_href': 'https://api.spotify.com/v1/tracks/6zwvB879PJSpTyFcg2wwnL',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6zwvB879PJSpTyFcg2wwnL',
  'duration_ms': 142693,
  'time_signature': 3}]
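
Two of these fields are encoded as integers. Spotify documents `key` in pitch-class notation (0 = C, 1 = C♯/D♭, and so on, with -1 when no key was detected) and `mode` as 1 for major and 0 for minor. A minimal sketch of decoding them into a readable label follows. The helper name `describe_key` is an assumption for illustration.

```python
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key, mode):
    """Turn Spotify's integer `key` and `mode` into a label such as 'C major'."""
    if not 0 <= key < len(PITCH_CLASSES):
        return "unknown"  # Spotify reports -1 when no key was detected
    scale = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {scale}"


# values taken from the output above for "Theme from A Summer Place"
print(describe_key(0, 1))  # C major
```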

Next, we will grab the names of the audio features and create a dictionary that maps each name to an empty list. These lists will later hold the values of that feature for every song.

In [90]:
# create list of names of the audio features,
# using the features of our test song (the response is a one-element list)
features_keys = list(sp.audio_features(song_id)[0].keys())

In [91]:
features_keys

['danceability',
 'energy',
 'key',
 'loudness',
 'mode',
 'speechiness',
 'acousticness',
 'instrumentalness',
 'liveness',
 'valence',
 'tempo',
 'type',
 'id',
 'uri',
 'track_href',
 'analysis_url',
 'duration_ms',
 'time_signature']

In [102]:
# initialize empty dictionary 
features_dict = {} 

# add empty list to each key
for key in features_keys:
    features_dict[key] = []

In [103]:
features_dict

{'danceability': [],
 'energy': [],
 'key': [],
 'loudness': [],
 'mode': [],
 'speechiness': [],
 'acousticness': [],
 'instrumentalness': [],
 'liveness': [],
 'valence': [],
 'tempo': [],
 'type': [],
 'id': [],
 'uri': [],
 'track_href': [],
 'analysis_url': [],
 'duration_ms': [],
 'time_signature': []}
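
The same dictionary can be built in one line with a dict comprehension. The sketch below uses a shortened key list for illustration and also shows a common pitfall: `dict.fromkeys(keys, [])` looks equivalent but gives every key the same list object.

```python
features_keys = ["danceability", "energy", "key"]  # shortened list for illustration

# one fresh list per key
features_dict = {key: [] for key in features_keys}
features_dict["energy"].append(0.326)

# pitfall: dict.fromkeys reuses the SAME list object for every key
shared = dict.fromkeys(features_keys, [])
shared["energy"].append(0.326)

print(features_dict)  # only 'energy' has a value
print(shared)         # every key shows the appended value
```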

## Using Spotify Web API

We successfully retrieved the audio features for one song and initialized a dictionary that will store the values of each audio feature. Now let's get the audio features for all the songs in our data.

In [99]:
# initialize empty list of audio features of each song
features = []

# get audio features for every song in data
for id_ in music_data['Spotify id']:
    features.append(sp.audio_features(id_)[0])
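
The loop above sends one request per song. `sp.audio_features` also accepts a list of ids, so the requests can be batched. The sketch below assumes the documented limit of 100 ids per call. The helper names `chunked` and `fetch_audio_features` are hypothetical, and `client` stands for any object with an `audio_features(list_of_ids)` method, such as `sp`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(client, track_ids, batch_size=100):
    """Collect audio features for all `track_ids`, one request per batch.

    `client` is assumed to expose `audio_features(list_of_ids)` the way
    spotipy.Spotify does; batch_size=100 assumes the per-request limit.
    """
    track_ids = list(track_ids)
    features = []
    for batch in chunked(track_ids, batch_size):
        features.extend(client.audio_features(batch))
    return features
```

Calling `features = fetch_audio_features(sp, music_data['Spotify id'])` would then need 61 requests for the 6,098 songs instead of 6,098.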

In [101]:
# taking a look at the features for a song
features[0]

{'danceability': 0.326,
 'energy': 0.326,
 'key': 0,
 'loudness': -15.144,
 'mode': 1,
 'speechiness': 0.0297,
 'acousticness': 0.591,
 'instrumentalness': 0.918,
 'liveness': 0.494,
 'valence': 0.87,
 'tempo': 186.232,
 'type': 'audio_features',
 'id': '6zwvB879PJSpTyFcg2wwnL',
 'uri': 'spotify:track:6zwvB879PJSpTyFcg2wwnL',
 'track_href': 'https://api.spotify.com/v1/tracks/6zwvB879PJSpTyFcg2wwnL',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6zwvB879PJSpTyFcg2wwnL',
 'duration_ms': 142693,
 'time_signature': 3}

In [126]:
# insert the values for each song feature
for song in features:
    for key, value in song.items():
        features_dict[key].append(value)
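
The loop above calls `.items()` on every entry, which would fail if the API returned `None` for a track it has no features for. As a hedged sketch, and assuming such `None` entries can occur, a transposing helper could fill the gaps so that every column keeps one value per song. The name `rows_to_columns` is hypothetical.

```python
def rows_to_columns(rows, keys):
    """Transpose a list of feature dicts into a dict of lists, one list per key.

    A row of None (no features available) contributes None to every column,
    so all columns keep the same length as `rows`.
    """
    columns = {key: [] for key in keys}
    for row in rows:
        for key in keys:
            columns[key].append(None if row is None else row.get(key))
    return columns


# illustrative rows; the None stands in for a track without features
rows = [{"energy": 0.326, "key": 0}, None, {"energy": 0.582, "key": 7}]
print(rows_to_columns(rows, ["energy", "key"]))
# {'energy': [0.326, None, 0.582], 'key': [0, None, 7]}
```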

## Adding audio features to dataframe

We now have all the audio features, so let's add these to our dataframe.

In [131]:
for key in features_dict.keys():
    music_data[key] = features_dict[key]
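
An alternative to assigning the columns one at a time is to let pandas build a dataframe from the list of feature dictionaries and join it to the song data. This is a minimal sketch on two stand-in rows, not the notebook's actual data. `songs` and `features` here are small illustrative copies.

```python
import pandas as pd

# tiny stand-ins for `music_data` and `features` (values copied from the output above)
songs = pd.DataFrame({"title": ["Theme from A Summer Place", "He'll Have to Go"]})
features = [
    {"danceability": 0.326, "energy": 0.326},
    {"danceability": 0.469, "energy": 0.244},
]

# pandas turns a list of dicts into one row per dict, one column per key
features_df = pd.DataFrame(features)

# axis=1 places the feature columns next to the song columns, aligned on the index
combined = pd.concat([songs, features_df], axis=1)
print(combined.columns.tolist())  # ['title', 'danceability', 'energy']
```

This relies on both dataframes sharing the same default index, which holds when the features were fetched in the same order as the rows.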

In [132]:
music_data

,rank,title,artist,other artists,year,Spotify id,Popularity,danceability,energy,key,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,1,Theme from A Summer Place,Percy Faith,,1960,6zwvB879PJSpTyFcg2wwnL,57,0.326,0.3260,0,...,0.4940,0.870,186.232,audio_features,6zwvB879PJSpTyFcg2wwnL,spotify:track:6zwvB879PJSpTyFcg2wwnL,https://api.spotify.com/v1/tracks/6zwvB879PJSp...,https://api.spotify.com/v1/audio-analysis/6zwv...,142693,3
1,2,He'll Have to Go,Jim Reeves,,1960,7dDE59NX0n466e705E8Itz,18,0.469,0.2440,1,...,0.8450,0.577,88.271,audio_features,7dDE59NX0n466e705E8Itz,spotify:track:7dDE59NX0n466e705E8Itz,https://api.spotify.com/v1/tracks/7dDE59NX0n46...,https://api.spotify.com/v1/audio-analysis/7dDE...,134333,3
2,3,Cathy's Clown,The Everly Brothers,,1960,1MA9StLzlFftLbuqOmoWij,52,0.498,0.5820,7,...,0.3720,0.866,119.809,audio_features,1MA9StLzlFftLbuqOmoWij,spotify:track:1MA9StLzlFftLbuqOmoWij,https://api.spotify.com/v1/tracks/1MA9StLzlFft...,https://api.spotify.com/v1/audio-analysis/1MA9...,144013,4
3,4,Running Bear,Johnny Preston,,1960,1RYznli2VNO7FCbW1Hq4KM,39,0.772,0.2970,5,...,0.1250,0.822,119.987,audio_features,1RYznli2VNO7FCbW1Hq4KM,spotify:track:1RYznli2VNO7FCbW1Hq4KM,https://api.spotify.com/v1/tracks/1RYznli2VNO7...,https://api.spotify.com/v1/audio-analysis/1RYz...,158200,4
4,5,Teen Angel,Mark Dinning,,1960,36NPEs4S7ik50NrlzaqoIJ,31,0.570,0.0636,0,...,0.1220,0.282,101.517,audio_features,36NPEs4S7ik50NrlzaqoIJ,spotify:track:36NPEs4S7ik50NrlzaqoIJ,https://api.spotify.com/v1/tracks/36NPEs4S7ik5...,https://api.spotify.com/v1/audio-analysis/36NP...,159893,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6093,96,More Than My Hometown,Morgan Wallen,,2020,0eBXyY4SatzpE7opnzgXvz,83,0.621,0.8820,6,...,0.1320,0.574,126.014,audio_features,0eBXyY4SatzpE7opnzgXvz,spotify:track:0eBXyY4SatzpE7opnzgXvz,https://api.spotify.com/v1/tracks/0eBXyY4Satzp...,https://api.spotify.com/v1/audio-analysis/0eBX...,216573,4
6094,97,Lovin' on You,Luke Combs,,2020,0nYvjcSlCgjcwogQAwIwNp,76,0.572,0.9490,4,...,0.1630,0.530,118.974,audio_features,0nYvjcSlCgjcwogQAwIwNp,spotify:track:0nYvjcSlCgjcwogQAwIwNp,https://api.spotify.com/v1/tracks/0nYvjcSlCgjc...,https://api.spotify.com/v1/audio-analysis/0nYv...,194867,4
6095,98,Said Sum,Moneybagg Yo,,2020,3sKz6Sd72K0ofPWcJPPk6H,75,0.929,0.6670,8,...,0.1000,0.274,126.998,audio_features,3sKz6Sd72K0ofPWcJPPk6H,spotify:track:3sKz6Sd72K0ofPWcJPPk6H,https://api.spotify.com/v1/tracks/3sKz6Sd72K0o...,https://api.spotify.com/v1/audio-analysis/3sKz...,155168,4
6096,99,Slide,H.E.R.,YG,2020,2rTnVB1bvwxHtaIl4uVu7f,77,0.827,0.4690,10,...,0.2070,0.196,97.028,audio_features,2rTnVB1bvwxHtaIl4uVu7f,spotify:track:2rTnVB1bvwxHtaIl4uVu7f,https://api.spotify.com/v1/tracks/2rTnVB1bvwxH...,https://api.spotify.com/v1/audio-analysis/2rTn...,238321,4


In [135]:
music_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6098 entries, 0 to 6097
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   rank              6098 non-null   object 
 1   title             6098 non-null   object 
 2   artist            6098 non-null   object 
 3   other artists     6098 non-null   object 
 4   year              6098 non-null   int64  
 5   Spotify id        6098 non-null   object 
 6   Popularity        6098 non-null   int64  
 7   danceability      6098 non-null   float64
 8   energy            6098 non-null   float64
 9   key               6098 non-null   int64  
 10  loudness          6098 non-null   float64
 11  mode              6098 non-null   int64  
 12  speechiness       6098 non-null   float64
 13  acousticness      6098 non-null   float64
 14  instrumentalness  6098 non-null   float64
 15  liveness          6098 non-null   float64
 16  valence           6098 non-null   float64


## Cleaning Data

Some of these columns are interesting audio features, but others, such as id and uri, are identifiers and links that add nothing to our analysis. We'll drop the columns that we won't need.

### Dropping columns

In [139]:
# drop the API metadata columns; 'id' duplicates the existing 'Spotify id' column
music_data = music_data.drop(['type','id','uri','track_href','analysis_url'], axis = 1)

In [140]:
music_data

,rank,title,artist,other artists,year,Spotify id,Popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,1,Theme from A Summer Place,Percy Faith,,1960,6zwvB879PJSpTyFcg2wwnL,57,0.326,0.3260,0,-15.144,1,0.0297,0.59100,0.918000,0.4940,0.870,186.232,142693,3
1,2,He'll Have to Go,Jim Reeves,,1960,7dDE59NX0n466e705E8Itz,18,0.469,0.2440,1,-15.702,1,0.0357,0.97400,0.016200,0.8450,0.577,88.271,134333,3
2,3,Cathy's Clown,The Everly Brothers,,1960,1MA9StLzlFftLbuqOmoWij,52,0.498,0.5820,7,-8.961,1,0.0339,0.41200,0.000000,0.3720,0.866,119.809,144013,4
3,4,Running Bear,Johnny Preston,,1960,1RYznli2VNO7FCbW1Hq4KM,39,0.772,0.2970,5,-14.679,0,0.0530,0.85400,0.000008,0.1250,0.822,119.987,158200,4
4,5,Teen Angel,Mark Dinning,,1960,36NPEs4S7ik50NrlzaqoIJ,31,0.570,0.0636,0,-18.548,1,0.0459,0.93600,0.000000,0.1220,0.282,101.517,159893,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6093,96,More Than My Hometown,Morgan Wallen,,2020,0eBXyY4SatzpE7opnzgXvz,83,0.621,0.8820,6,-5.010,1,0.0459,0.60100,0.000000,0.1320,0.574,126.014,216573,4
6094,97,Lovin' on You,Luke Combs,,2020,0nYvjcSlCgjcwogQAwIwNp,76,0.572,0.9490,4,-4.865,1,0.0600,0.00165,0.000195,0.1630,0.530,118.974,194867,4
6095,98,Said Sum,Moneybagg Yo,,2020,3sKz6Sd72K0ofPWcJPPk6H,75,0.929,0.6670,8,-6.789,0,0.3530,0.01850,0.000000,0.1000,0.274,126.998,155168,4
6096,99,Slide,H.E.R.,YG,2020,2rTnVB1bvwxHtaIl4uVu7f,77,0.827,0.4690,10,-9.141,0,0.3410,0.08070,0.000008,0.2070,0.196,97.028,238321,4


### Creating and adding new column

There's one more change I'd like to make to this dataframe: converting the duration of each song from milliseconds to seconds. The column duration_ms stores the length of the song in milliseconds, and seconds are much easier to interpret.

In [151]:
# convert ms to s
music_data['duration_ms'] = music_data['duration_ms'] / 1000.0

# rename column
music_data = music_data.rename(columns={'duration_ms': 'duration'})
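
As a quick check of the arithmetic, the first song's 142,693 ms become 142.693 seconds. If a minutes-and-seconds label is ever needed for display, a small formatter like the one below would do. The name `format_duration` is an assumption for illustration and is not part of the notebook.

```python
def format_duration(duration_ms):
    """Format a duration in milliseconds as 'minutes:seconds' (fractions dropped)."""
    minutes, seconds = divmod(int(duration_ms) // 1000, 60)
    return f"{minutes}:{seconds:02d}"


print(142693 / 1000.0)          # 142.693 seconds, as in the conversion above
print(format_duration(142693))  # 2:22
```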

## Final look at dataframe

In [152]:
music_data

,rank,title,artist,other artists,year,Spotify id,Popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,time_signature
0,1,Theme from A Summer Place,Percy Faith,,1960,6zwvB879PJSpTyFcg2wwnL,57,0.326,0.3260,0,-15.144,1,0.0297,0.59100,0.918000,0.4940,0.870,186.232,142.693,3
1,2,He'll Have to Go,Jim Reeves,,1960,7dDE59NX0n466e705E8Itz,18,0.469,0.2440,1,-15.702,1,0.0357,0.97400,0.016200,0.8450,0.577,88.271,134.333,3
2,3,Cathy's Clown,The Everly Brothers,,1960,1MA9StLzlFftLbuqOmoWij,52,0.498,0.5820,7,-8.961,1,0.0339,0.41200,0.000000,0.3720,0.866,119.809,144.013,4
3,4,Running Bear,Johnny Preston,,1960,1RYznli2VNO7FCbW1Hq4KM,39,0.772,0.2970,5,-14.679,0,0.0530,0.85400,0.000008,0.1250,0.822,119.987,158.200,4
4,5,Teen Angel,Mark Dinning,,1960,36NPEs4S7ik50NrlzaqoIJ,31,0.570,0.0636,0,-18.548,1,0.0459,0.93600,0.000000,0.1220,0.282,101.517,159.893,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6093,96,More Than My Hometown,Morgan Wallen,,2020,0eBXyY4SatzpE7opnzgXvz,83,0.621,0.8820,6,-5.010,1,0.0459,0.60100,0.000000,0.1320,0.574,126.014,216.573,4
6094,97,Lovin' on You,Luke Combs,,2020,0nYvjcSlCgjcwogQAwIwNp,76,0.572,0.9490,4,-4.865,1,0.0600,0.00165,0.000195,0.1630,0.530,118.974,194.867,4
6095,98,Said Sum,Moneybagg Yo,,2020,3sKz6Sd72K0ofPWcJPPk6H,75,0.929,0.6670,8,-6.789,0,0.3530,0.01850,0.000000,0.1000,0.274,126.998,155.168,4
6096,99,Slide,H.E.R.,YG,2020,2rTnVB1bvwxHtaIl4uVu7f,77,0.827,0.4690,10,-9.141,0,0.3410,0.08070,0.000008,0.2070,0.196,97.028,238.321,4


In [153]:
music_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6098 entries, 0 to 6097
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   rank              6098 non-null   object 
 1   title             6098 non-null   object 
 2   artist            6098 non-null   object 
 3   other artists     6098 non-null   object 
 4   year              6098 non-null   int64  
 5   Spotify id        6098 non-null   object 
 6   Popularity        6098 non-null   int64  
 7   danceability      6098 non-null   float64
 8   energy            6098 non-null   float64
 9   key               6098 non-null   int64  
 10  loudness          6098 non-null   float64
 11  mode              6098 non-null   int64  
 12  speechiness       6098 non-null   float64
 13  acousticness      6098 non-null   float64
 14  instrumentalness  6098 non-null   float64
 15  liveness          6098 non-null   float64
 16  valence           6098 non-null   float64


Our data cleaning is complete, and we are ready to convert this into a csv file.

In [154]:
music_data.to_csv('spotify_complete.csv')
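
With the default settings, `to_csv` also writes the dataframe index, which comes back as an `Unnamed: 0` column when the file is read again (the column dropped right after `read_csv` at the start of this notebook). A minimal sketch of the difference, using an in-memory buffer and a one-row stand-in dataframe:

```python
import io
import pandas as pd

df = pd.DataFrame({"title": ["Slide"], "duration": [238.321]})

# default: the index is written as an extra, unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)  # ['Unnamed: 0', 'title', 'duration']

# index=False: only the real columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['title', 'duration']
```

Passing `index=False` when saving would make the later `drop('Unnamed: 0', ...)` step unnecessary.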