
# Who Do You Sound Like?
### Notebook 2: Feature Engineering & Metric Comparisons
#### Adam Zucker

---

## Contents

---
- **Section 1:** Package and data imports
- **Section 2:** Comparing Spotify metrics with Librosa metrics
- **Section 3:** Engineering conversions between Librosa and Spotify
- **Section 4:** Data exports

---
### Section 1
#### Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy  # left unaliased: the name `sp` is bound to spotipy below
import seaborn as sns

import os
import IPython.display as ipd

import spotipy as sp
import tekore as tk
import librosa as lib
import librosa.display as libd

---

**BELOW:** Importing the cleaned Spotify song dataset, sourced from [Kaggle](https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks?select=data.csv). Brief descriptions of the less self-explanatory features, as defined by [Spotify](https://developer.spotify.com/documentation/web-api/reference/):
- **Acousticness:** A confidence measure from 0.0 to 1.0 of whether the track is acoustic.
- **Danceability:** How suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. 
- **Energy:** A perceptual measure of intensity and activity, from 0.0 to 1.0. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
- **Instrumentalness:** Predicts whether a track contains vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
- **Liveness:** Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 indicates a strong likelihood that the track is live.
- **Loudness:** The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
- **Popularity:** A value between 0 and 100, with 100 being the most popular. Popularity is calculated algorithmically and is based, for the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g., the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.
- **Speechiness:** Detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g., talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
- **Valence:** A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g., happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g., sad, depressed, angry).
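
The cut-offs quoted in the definitions above (0.33 and 0.66 for speechiness, 0.5 for instrumentalness, 0.8 for liveness) can be turned into simple labels. The sketch below is only an illustration: the function names are hypothetical, and treating the 0.33 and 0.66 boundaries as inclusive of the middle band is an assumption, since the definitions do not say which side the exact boundary values fall on.

```python
def describe_speechiness(value: float) -> str:
    """Bucket a speechiness score using the 0.33 / 0.66 cut-offs quoted above."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"speechiness must be in [0.0, 1.0], got {value}")
    if value > 0.66:
        return "probably all spoken word"
    if value >= 0.33:  # assumption: boundary values belong to the middle band
        return "music and speech"
    return "music / non-speech"


def is_likely_instrumental(value: float) -> bool:
    """Values above 0.5 are intended to represent instrumental tracks."""
    return value > 0.5


def is_likely_live(value: float) -> bool:
    """A value above 0.8 suggests a strong likelihood the track is live."""
    return value > 0.8


# Values taken from rows shown elsewhere in this notebook.
print(describe_speechiness(0.0364))   # Thunderstruck
print(is_likely_instrumental(0.924))  # Rêverie, L. 68
print(is_likely_live(0.701))          # Binibi Rocha - Live
```

Any of these could be applied to a column with, for example, `df['speechiness'].apply(describe_speechiness)`. Note that "Binibi Rocha - Live" has a liveness of 0.701, which falls below the 0.8 cut-off even though the title marks it as a live recording, so the thresholds are better read as guides than as hard rules.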

---

In [2]:
df = pd.read_csv('../data_clean/spotify_kg_master.csv')
df.head()

Unnamed: 0,name,artists,tempo,key,mode,full_key,A minor,A# major,A# minor,B major,...,energy,instrumentalness,speechiness,acousticness,danceability,valence,popularity,liveness,year,id
0,Thunderstruck,['AC/DC'],133.5,4,1,E major,0,0,0,0,...,0.89,0.0117,0.0364,0.000147,0.502,0.259,83,0.217,1990,57bgtoPSgt236HzfBOd8kj
1,Blue Blooded Woman,['Alan Jackson'],164.4,4,1,E major,0,0,0,0,...,0.715,0.00169,0.0306,0.124,0.599,0.947,25,0.0855,1990,7D8gA10pibmtMDpVf1rlic
2,The Gift of Love,['Bette Midler'],157.5,8,1,G# major,0,0,0,0,...,0.467,0.0,0.0287,0.359,0.486,0.286,38,0.11,1990,7FUc1xVSKvABmVwI6kS5Y4
3,Binibi Rocha - Live,['Andrew E.'],120.1,7,1,G major,0,0,0,0,...,0.837,0.0,0.136,0.484,0.978,0.966,41,0.701,1990,7MwGWKdDGeop9D8bZN37hc
4,Thelma - Bonus Track,['Paul Simon'],94.0,5,1,F major,0,0,0,0,...,0.529,0.0845,0.077,0.872,0.71,0.882,29,0.093,1990,7pcEC5r1jVqWGRypo9D7f7
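
The `key`, `mode`, and `full_key` columns in the preview above are consistent with standard pitch-class numbering (0 = C, 1 = C#, and so on up to 11 = B), with `mode` 1 for major and 0 for minor. The cleaning notebook that actually built `full_key` is not shown here, so the following is only a sketch of how such a mapping could be written; `PITCH_CLASSES` and `full_key_name` are hypothetical names.

```python
# Pitch classes in ascending semitone order, using sharps as in the dataset.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def full_key_name(key: int, mode: int) -> str:
    """Combine a pitch-class number and a mode flag into a label like 'E major'."""
    if key not in range(12):
        raise ValueError(f"key must be an integer from 0 to 11, got {key}")
    if mode not in (0, 1):
        raise ValueError(f"mode must be 0 (minor) or 1 (major), got {mode}")
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


# Spot checks against rows shown in this notebook.
print(full_key_name(4, 1))   # Thunderstruck
print(full_key_name(10, 0))  # Way down We Go
```

The 23 one-hot key columns (`A minor` through `G# minor`) then correspond to the distinct values of this label, with `A major` apparently left out as the baseline category.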


In [3]:
df.shape

(57453, 42)

In [4]:
df.isnull().sum()

name                0
artists             0
tempo               0
key                 0
mode                0
full_key            0
A minor             0
A# major            0
A# minor            0
B major             0
B minor             0
C major             0
C minor             0
C# major            0
C# minor            0
D major             0
D minor             0
D# major            0
D# minor            0
E major             0
E minor             0
F major             0
F minor             0
F# major            0
F# minor            0
G major             0
G minor             0
G# major            0
G# minor            0
loudness            0
duration_s          0
duration_ms         0
energy              0
instrumentalness    0
speechiness         0
acousticness        0
danceability        0
valence             0
popularity          0
liveness            0
year                0
id                  0
dtype: int64

In [5]:
df['year'].value_counts()

2018    1966
2002    1923
2001    1911
2010    1904
1998    1895
2011    1891
1996    1890
2006    1889
2014    1886
1994    1881
1997    1880
1990    1880
1995    1875
2017    1874
1993    1860
2007    1857
2005    1857
1992    1851
2000    1851
2009    1849
2004    1846
1991    1842
2003    1829
2008    1828
2019    1821
2013    1815
1999    1812
2020    1807
2015    1751
2012    1741
2016    1691
Name: year, dtype: int64

---
---
### Section 2
#### Metric Comparison

In [31]:
# Creating a small dataframe of songs I know well, to test Spotify's metrics against those generated by Librosa.
# Rows are picked by position, in the order listed.
test_rows = [52475, 52069, 56517, 39083, 55789, 44530, 54740, 48862, 18115]
spotify_metrics_test_df = df.iloc[test_rows]

In [32]:
spotify_metrics_test_df

Unnamed: 0,name,artists,tempo,key,mode,full_key,A minor,A# major,A# minor,B major,...,energy,instrumentalness,speechiness,acousticness,danceability,valence,popularity,liveness,year,id
52475,NO FUN,['Joji'],97.0,5,1,F major,0,0,0,0,...,0.483,0.0,0.0487,0.8,0.809,0.715,70,0.221,2018,4sbtM9ORGwmxGkXfctXbJq
52069,SLOW DANCING IN THE DARK,['Joji'],89.0,3,1,D# major,0,0,0,0,...,0.479,0.00598,0.0261,0.544,0.515,0.284,85,0.191,2018,0rKtyWc8bvkriBthvHKY8d
56517,Levitating,['Dua Lipa'],103.0,6,0,F# minor,0,0,0,0,...,0.884,0.0,0.0753,0.0561,0.695,0.914,78,0.213,2020,39LLxExYz6ewLAcYrzQQyP
39083,Tighten Up,['The Black Keys'],109.0,6,0,F# minor,0,0,0,0,...,0.705,4e-06,0.0665,0.00121,0.504,0.567,62,0.453,2010,2MVwrvjmcdt4MsYYLCYMt8
55789,Broken Glass,"['Kygo', 'Kim Petras']",171.0,7,1,G major,0,0,0,0,...,0.633,0.0,0.134,0.372,0.526,0.272,71,0.129,2020,78ldtCaBRJVp2i91B715L0
44530,Retrograde,['James Blake'],77.5,7,0,G minor,0,0,0,0,...,0.251,0.104,0.0372,0.873,0.533,0.186,66,0.134,2013,2IqjKEBiz0CdLKdkXhxw84
54740,Distance (feat. Issa Gold & Erick The Architect),"['Beast Coast', 'Joey Bada$$', 'Flatbush Zombi...",119.0,2,1,D major,0,0,0,0,...,0.606,0.0,0.125,0.003,0.784,0.267,60,0.101,2019,5jJYJthaXUdkHKmiS6TuXe
48862,Way down We Go,['KALEO'],163.3,10,0,A# minor,0,0,1,0,...,0.505,0.000333,0.117,0.579,0.489,0.337,78,0.104,2016,0y1QJc3SJVPKJ1OvFmFqe6
18115,"Rêverie, L. 68: Rêverie","['Claude Debussy', 'Julian Lloyd Webber', 'Roy...",63.8,5,1,F major,0,0,0,0,...,0.0428,0.924,0.0483,0.988,0.135,0.0394,47,0.134,1999,0wgqbmYhQyoL2TXGzQEg4k
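
Selecting test songs by row position works only as long as the CSV keeps the same row order. A possible alternative, sketched here on a toy frame rather than the full dataset, is to look tracks up by their Spotify `id`. The helper name `select_tracks_by_id` and the toy frame are hypothetical; the ids, names, and tempos are copied from the tables in this notebook.

```python
import pandas as pd

# Toy stand-in for the full dataset: three rows copied from the tables above.
toy_df = pd.DataFrame({
    "name": ["Thunderstruck", "NO FUN", "Levitating"],
    "id": ["57bgtoPSgt236HzfBOd8kj", "4sbtM9ORGwmxGkXfctXbJq", "39LLxExYz6ewLAcYrzQQyP"],
    "tempo": [133.5, 97.0, 103.0],
})


def select_tracks_by_id(frame: pd.DataFrame, track_ids: list) -> pd.DataFrame:
    """Return the rows whose `id` is in track_ids, in the order the ids are given."""
    known_ids = set(frame["id"])
    missing = [track_id for track_id in track_ids if track_id not in known_ids]
    if missing:
        raise KeyError(f"ids not found in frame: {missing}")
    # Index by id for the lookup, then restore a plain integer index.
    indexed = frame.set_index("id", drop=False)
    return indexed.loc[track_ids].reset_index(drop=True)


test_df = select_tracks_by_id(
    toy_df, ["39LLxExYz6ewLAcYrzQQyP", "4sbtM9ORGwmxGkXfctXbJq"]
)
print(test_df[["name", "tempo"]])
```

The trade-off is that the result no longer carries the original row labels (52475, 52069, ...), so this form suits cases where the ids matter more than the positions.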


---

#### Librosa