# **"Hungry Guys Radio" Playlist Analysis**

---


***Who:*** *Hungry Guys Radio is a collaborative Spotify playlist that Jack, Conor, Ian, and I all contribute to.*

***What:*** *The goal of this analysis is to sharpen my data science skills and, more importantly, to highlight how my pals' music preferences differ from mine and to profile our (fantastic) playlist as a whole.*

***Why:*** *Whenever I listen to Hungry Guys Radio I feel I'm able to guess which of us added the currently playing song to the playlist with decent accuracy. This led me to think about our different tastes in music and how I could use my data science skills to actually quantify and visualize these differences.*

***How:*** *For this project I will use Spotipy (a Python client library for the Spotify Web API) to access the data, R for EDA, and Python for machine learning.*


---



## **Data Extraction**

Connect to the Spotify API (via Spotipy) & Authenticate

In [None]:
#!pip install spotipy --upgrade -q
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

# authorization
cid = 'INSERT CLIENT ID'
secret = 'INSERT SECRET ID'

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)
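To avoid pasting the client ID and secret into the notebook, the credentials could be read from environment variables instead. The sketch below is one possible approach, not part of the original notebook: `load_spotify_credentials` is a hypothetical helper, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are assumed to be set beforehand (they are the names Spotipy itself looks for when no credentials are passed).

```python
import os

def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping of environment variables.

    Hypothetical helper: raises if either value is missing or empty.
    """
    cid = env.get("SPOTIPY_CLIENT_ID")
    secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not cid or not secret:
        raise RuntimeError("Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first")
    return cid, secret

# Demo with a stand-in mapping so the example runs without real credentials
demo_env = {"SPOTIPY_CLIENT_ID": "demo-id", "SPOTIPY_CLIENT_SECRET": "demo-secret"}
cid, secret = load_spotify_credentials(demo_env)

# In the notebook, the pair would then be passed on as before:
# SpotifyClientCredentials(client_id=cid, client_secret=secret)
```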

We'll need a function to do a few things...

*   Call the Spotify API through Spotipy and access a given playlist URL
*   Iterate over each song within the playlist and extract the wanted song and playlist metrics
*   Work around the API's limit of 100 songs per request by accepting "limit" and "offset" parameters that shift the starting point of each call




In [None]:
# function to loop through a playlist and grab information on each song
def call_playlist(creator_id, playlist_url, limit, offset):
    # NOTE: use playlist_url instead of playlist_id. playlist_id doesn't accept limit and offset parameters for some reason?
    
    # list the wanted column names and create an empty df with those columns
    playlist_features_list = ['artist', 'album', 'track_name', 'track_id', 'danceability', 'energy', 'key', 'loudness', 'mode', 
                              'speechiness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'duration_ms', 'time_signature', 'acousticness']
    playlist_df = pd.DataFrame(columns = playlist_features_list)
    
    # loop through the specified playlist and extract wanted features
    playlist = sp.user_playlist_tracks(creator_id, playlist_url, limit=limit, offset=offset)["items"]
    for track in playlist:

        # create empty dict
        playlist_features = {}

        # get metadata
        # use the first credited track artist (the album-level artist can be "Various Artists" on compilations)
        playlist_features['artist'] = track['track']['artists'][0]['name']
        playlist_features['album'] = track['track']['album']['name']
        playlist_features["track_name"] = track["track"]["name"]
        playlist_features["track_id"] = track["track"]["id"]
        playlist_features["explicit"] = track["track"]["explicit"]
        playlist_features["popularity"] = track["track"]["popularity"]
        playlist_features["album_release_date"] = track["track"]["album"]["release_date"]
        playlist_features["duration_ms"] = track["track"]["duration_ms"]
        playlist_features['added_by'] = track["added_by"]["id"]
        playlist_features['added_at'] = track["added_at"]
        
        # get audio features
        audio_features = sp.audio_features(playlist_features["track_id"])[0]
        for feature in playlist_features_list[4:]:
            playlist_features[feature] = audio_features[feature]
        
        # concat dfs
        track_df = pd.DataFrame(playlist_features, index = [0])
        playlist_df = pd.concat([playlist_df, track_df], ignore_index = True)

    # return df
    return playlist_df
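The function above requests audio features one track at a time. A possible refinement, sketched here under assumptions, is to send the track IDs in batches. `chunked` and `fetch_audio_features` are hypothetical helpers, and `fetch_batch` stands in for `sp.audio_features`, which is assumed to accept a list of up to 100 IDs per call.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_audio_features(track_ids, fetch_batch, size=100):
    """Collect features for all track_ids, calling fetch_batch once per chunk."""
    features = []
    for batch in chunked(track_ids, size):
        features.extend(fetch_batch(batch))
    return features

# Stand-in for sp.audio_features so the sketch runs without network access
batch_sizes = []
def fake_batch(batch):
    batch_sizes.append(len(batch))
    return [{"id": track_id} for track_id in batch]

demo_ids = [f"id{i}" for i in range(250)]
result = fetch_audio_features(demo_ids, fake_batch)
print(batch_sizes)  # [100, 100, 50]
```

With 250 IDs this makes three requests instead of 250, which should noticeably shorten the extraction step for larger playlists.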

Now we can call the function as many times as needed to pull every song! (I have yet to come up with a loop that does this automatically, sorry.) Hungry Guys Radio currently has 270 songs, so we'll need to call it 3 times, each time adding 100 to the offset.

In [None]:
# songs 1-100 requires an offset of 0 since we want to start at the first song
first100 = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/4PNAiZyW8o758BtoIRwwFR', 100, 0)

# songs 101-200 requires an offset of 100
second100 = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/4PNAiZyW8o758BtoIRwwFR', 100, 100)

# songs 201-270 requires an offset of 200
third100 = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/4PNAiZyW8o758BtoIRwwFR', 100, 200)
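One possible way to automate the repeated calls above is to keep requesting pages until a page comes back shorter than the page size. This is only a sketch: `fetch_all_pages` is a hypothetical helper, and `fetch_page` is assumed to be any callable taking `(limit, offset)` and returning something with a length (a list here, a DataFrame in the notebook).

```python
def fetch_all_pages(fetch_page, page_size=100):
    """Call fetch_page(limit, offset) until an empty or short page signals the end."""
    pages = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        if len(page) == 0:
            break  # nothing left to fetch
        pages.append(page)
        if len(page) < page_size:
            break  # a short page means this was the last one
        offset += page_size
    return pages

# Stand-in for the API: a fake playlist with 270 songs
fake_playlist = list(range(270))

def fake_fetch(limit, offset):
    return fake_playlist[offset:offset + limit]

pages = fetch_all_pages(fake_fetch)
print([len(p) for p in pages])  # [100, 100, 70]

# In the notebook this might look like (untested against the live API):
# pages = fetch_all_pages(lambda limit, offset: call_playlist('nik-linnane', url, limit, offset))
# hungryGuysRadio = pd.concat(pages, ignore_index=True)
```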

Now that we have all songs within the playlist stored in 3 dataframes, we can concatenate them to create our complete dataset.

In [None]:
hungryGuysRadio = pd.concat([first100, second100, third100], ignore_index = True)
hungryGuysRadio.describe()

| | danceability | energy | loudness | speechiness | liveness | valence | tempo | acousticness | popularity |
|---|---|---|---|---|---|---|---|---|---|
| count | 283.0 | 283.0 | 283.0 | 283.0 | 283.0 | 283.0 | 283.0 | 283.0 | 283.0 |
| mean | 0.630357 | 0.7123 | -6.503004 | 0.071042 | 0.169195 | 0.638849 | 117.999527 | 0.199993 | 45.222615 |
| std | 0.126841 | 0.139701 | 2.227613 | 0.073769 | 0.138684 | 0.221271 | 27.549994 | 0.218542 | 24.100482 |
| min | 0.221 | 0.33 | -17.032 | 0.0228 | 0.0222 | 0.0758 | 67.055 | 1.5e-05 | 0.0 |
| 25% | 0.5495 | 0.626 | -7.667 | 0.03475 | 0.08895 | 0.482 | 98.045 | 0.02745 | 37.0 |
| 50% | 0.635 | 0.718 | -6.254 | 0.0452 | 0.113 | 0.668 | 114.033 | 0.109 | 53.0 |
| 75% | 0.712 | 0.816 | -4.858 | 0.06975 | 0.2005 | 0.834 | 128.036 | 0.3385 | 62.0 |
| max | 0.974 | 0.982 | -1.866 | 0.567 | 0.912 | 0.978 | 210.164 | 0.891 | 93.0 |


## **Data Cleaning**


In [None]:
# convert existing columns to datetime
hungryGuysRadio['added_at'] = pd.to_datetime(hungryGuysRadio['added_at'], errors='coerce')
hungryGuysRadio['album_release_date'] = pd.to_datetime(hungryGuysRadio['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
hungryGuysRadio['date_added'] = hungryGuysRadio['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
hungryGuysRadio['duration_min'] = hungryGuysRadio['duration_ms']/60000

# create new user column by converting ids to names (any id not listed is Conor's)
user_names = {'murphycfp': 'Jack', 'nik-linnane': 'Nik', '65jhgcs8ihtvs5pl09xqrgul5': 'Ian'}
hungryGuysRadio['user'] = hungryGuysRadio['added_by'].map(user_names).fillna('Conor')

# convert explicit to binary
hungryGuysRadio['explicit'] = hungryGuysRadio['explicit'].apply(lambda x : 1 if x == True else 0)

In [None]:
hungryGuysRadio.head(3)

| | artist | album | track_name | track_id | danceability | energy | key | loudness | mode | speechiness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature | acousticness | explicit | popularity | album_release_date | added_by | added_at | date_added | duration_min | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BØRNS | Dopamine | Electric Love | 2GiJYvgVaD2HtM8GqD9EgQ | 0.611 | 0.797 | 6 | -7.627 | 0 | 0.0533 | 0.00137 | 0.26 | 0.518 | 120.041 | 218107 | 4 | 0.00543 | 0 | 83.0 | 2015-10-16 | murphycfp | 2019-09-19 03:46:17+00:00 | 2019-09-19 | 3.63512 | Jack |
| 1 | Edward Sharpe & The Magnetic Zeros | Up from Below | Janglin | 3TgopD2FOwjZBq0YrhbEN4 | 0.578 | 0.688 | 6 | -6.551 | 0 | 0.054 | 5.98e-05 | 0.305 | 0.36 | 123.91 | 229600 | 4 | 0.632 | 0 | 37.0 | 2009-07-14 | murphycfp | 2019-09-19 03:50:02+00:00 | 2019-09-19 | 3.82667 | Jack |
| 2 | Goth Babe | Weekend Friend | Weekend Friend | 2d3QlXE6FXFDeodiS66yjM | 0.409 | 0.594 | 1 | -9.725 | 1 | 0.031 | 0.0391 | 0.0904 | 0.212 | 110.187 | 209455 | 4 | 0.032 | 0 | 67.0 | 2019-01-19 | murphycfp | 2019-09-19 04:01:40+00:00 | 2019-09-19 | 3.49092 | Jack |


In [None]:
# export file to local
from google.colab import files
hungryGuysRadio.to_csv('hungryGuysRadio.csv')
files.download('hungryGuysRadio.csv')
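One thing to watch for when reloading an exported CSV with pandas: `to_csv` writes the row index by default, and it comes back as an extra `Unnamed: 0` column. The sketch below uses a small made-up frame and an in-memory buffer (both assumptions for illustration) to show the difference `index=False` makes.

```python
import io
import pandas as pd

# Small illustrative frame, not the real playlist data
df = pd.DataFrame({'track_name': ['Electric Love', 'Janglin'], 'popularity': [83, 37]})

# Default export: the index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)  # ['Unnamed: 0', 'track_name', 'popularity']

# index=False leaves the index out of the file
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['track_name', 'popularity']
```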


## **Pull Additional Playlists**

In [None]:
# Spotify's Today's Top Hits
topHits = call_playlist('spotify', 'https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M?si=e00cef0ea9b9434f&nd=1', 100, 0)

Now we can clean the data the same way we did earlier...

In [None]:
# convert existing columns to datetime
topHits['added_at'] = pd.to_datetime(topHits['added_at'], errors='coerce')
topHits['album_release_date'] = pd.to_datetime(topHits['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
topHits['date_added'] = topHits['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
topHits['duration_min'] = topHits['duration_ms']/60000

# make Spotify the User
topHits['added_by'] = 'Spotify'
topHits['user'] = 'Spotify'

# convert explicit to binary
topHits['explicit'] = topHits['explicit'].apply(lambda x : 1 if x == True else 0)
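Since the same cleaning steps apply to every playlist, they could be wrapped in a single helper. The sketch below is an assumption about how that might look rather than code from the original notebook: `clean_playlist` and its `added_by`, `user`, and `playlist` parameters are hypothetical, and the demo row is a stand-in for real API output.

```python
import pandas as pd

def clean_playlist(df, added_by=None, user=None, playlist=None):
    """Apply the notebook's cleaning steps to a copy of a raw playlist frame."""
    df = df.copy()
    # convert existing columns to datetime
    df['added_at'] = pd.to_datetime(df['added_at'], errors='coerce', utc=True)
    df['album_release_date'] = pd.to_datetime(df['album_release_date'], errors='coerce')
    # keep only the calendar date of when the song was added
    df['date_added'] = df['added_at'].dt.date
    # convert ms to minutes
    df['duration_min'] = df['duration_ms'] / 60000
    # convert explicit to binary
    df['explicit'] = df['explicit'].apply(lambda x: 1 if x == True else 0)
    # optional labels, only overwritten when provided
    if added_by is not None:
        df['added_by'] = added_by
    if user is not None:
        df['user'] = user
    if playlist is not None:
        df['PLAYLIST'] = playlist
    return df

# Stand-in for one raw row as returned by call_playlist
demo = pd.DataFrame({
    'added_at': ['2019-09-19T03:46:17Z'],
    'album_release_date': ['2015-10-16'],
    'duration_ms': [218107],
    'explicit': [False],
})
cleaned = clean_playlist(demo, added_by='Spotify', user='Spotify', playlist='Demo')
print(cleaned[['date_added', 'duration_min', 'explicit', 'user', 'PLAYLIST']])
```

Each playlist would then need one call, for example `clean_playlist(topHits, added_by='Spotify', user='Spotify')`.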

In [None]:
# export file to local
from google.colab import files
topHits.to_csv('topHits.csv')
files.download('topHits.csv')


Pull more playlists for ML...

In [None]:
# Spotify's It's ALT Good playlist
itsAltGood = call_playlist('Spotify', 'https://open.spotify.com/playlist/37i9dQZF1DX2SK4ytI2KAZ?si=bee50e8ecc5547a9&nd=1', 100, 0)

# convert existing columns to datetime
itsAltGood['added_at'] = pd.to_datetime(itsAltGood['added_at'], errors='coerce')
itsAltGood['album_release_date'] = pd.to_datetime(itsAltGood['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
itsAltGood['date_added'] = itsAltGood['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
itsAltGood['duration_min'] = itsAltGood['duration_ms']/60000

# make Spotify the User
itsAltGood['added_by'] = 'Spotify'
itsAltGood['user'] = 'Spotify'
itsAltGood['PLAYLIST'] = "It's ALT Good!"

# convert explicit to binary
itsAltGood['explicit'] = itsAltGood['explicit'].apply(lambda x : 1 if x == True else 0)

In [None]:
# Spotify's All New Jazz playlist
allNewJazz = call_playlist('Spotify', 'https://open.spotify.com/playlist/37i9dQZF1DWW2c0C8Vb2IR?si=afe2bab37df64629', 100, 0)

# convert existing columns to datetime
allNewJazz['added_at'] = pd.to_datetime(allNewJazz['added_at'], errors='coerce')
allNewJazz['album_release_date'] = pd.to_datetime(allNewJazz['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
allNewJazz['date_added'] = allNewJazz['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
allNewJazz['duration_min'] = allNewJazz['duration_ms']/60000

# make Spotify the User
allNewJazz['added_by'] = 'Spotify'
allNewJazz['user'] = 'Spotify'
allNewJazz['PLAYLIST'] = "All New Jazz"

# convert explicit to binary
allNewJazz['explicit'] = allNewJazz['explicit'].apply(lambda x : 1 if x == True else 0)

In [None]:
morganFreeman = call_playlist('murphycfp', 'https://open.spotify.com/playlist/2BOfQ6vSaZn288Cyjo3uZa?si=a239c6f3ae814b03&nd=1', 100, 0)

# convert existing columns to datetime
morganFreeman['added_at'] = pd.to_datetime(morganFreeman['added_at'], errors='coerce')
morganFreeman['album_release_date'] = pd.to_datetime(morganFreeman['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
morganFreeman['date_added'] = morganFreeman['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
morganFreeman['duration_min'] = morganFreeman['duration_ms']/60000

# make Jack the user
morganFreeman['added_by'] = 'murphycfp'
morganFreeman['user'] = 'Jack'
morganFreeman['PLAYLIST'] = "Morgan Freeman"

# convert explicit to binary
morganFreeman['explicit'] = morganFreeman['explicit'].apply(lambda x : 1 if x == True else 0)

In [None]:
# Spotify's Hot Country playlist
hotCountry = call_playlist('Spotify', 'https://open.spotify.com/playlist/37i9dQZF1DX1lVhptIYRda?si=202199d6ef494637&nd=1', 100, 0)

# convert existing columns to datetime
hotCountry['added_at'] = pd.to_datetime(hotCountry['added_at'], errors='coerce')
hotCountry['album_release_date'] = pd.to_datetime(hotCountry['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
hotCountry['date_added'] = hotCountry['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
hotCountry['duration_min'] = hotCountry['duration_ms']/60000

# make Spotify the User
hotCountry['added_by'] = 'Spotify'
hotCountry['user'] = 'Spotify'
hotCountry['PLAYLIST'] = "Hot Country"

# convert explicit to binary
hotCountry['explicit'] = hotCountry['explicit'].apply(lambda x : 1 if x == True else 0)

In [None]:
# export file
from google.colab import files
hotCountry.to_csv('hotCountry.csv')
files.download('hotCountry.csv')

In [None]:
cantNotSing = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/1zAeTuJcED6BBCQnqXwobB?si=06db335c69234672', 100, 0)

# convert existing columns to datetime
cantNotSing['added_at'] = pd.to_datetime(cantNotSing['added_at'], errors='coerce')
cantNotSing['album_release_date'] = pd.to_datetime(cantNotSing['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
cantNotSing['date_added'] = cantNotSing['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
cantNotSing['duration_min'] = cantNotSing['duration_ms']/60000

# make Nik the user
cantNotSing['added_by'] = 'nik-linnane'
cantNotSing['user'] = 'Nik'
cantNotSing['PLAYLIST'] = "Can't Not Sing Along"

# convert explicit to binary
cantNotSing['explicit'] = cantNotSing['explicit'].apply(lambda x : 1 if x == True else 0)

In [None]:
# export file
from google.colab import files
cantNotSing.to_csv('cantNotSing.csv')
files.download('cantNotSing.csv')


In [None]:
testingData = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/63dFknrgW7U477XsboNCNM?si=9738b8044bf44b3e', 100, 0)

# convert existing columns to datetime
testingData['added_at'] = pd.to_datetime(testingData['added_at'], errors='coerce')
testingData['album_release_date'] = pd.to_datetime(testingData['album_release_date'], errors='coerce')

# create new date_added column by removing time zone and time info
testingData['date_added'] = testingData['added_at'].dt.tz_localize(None).dt.date

# create new duration_min column by converting ms to minutes
testingData['duration_min'] = testingData['duration_ms']/60000

# make Nik the user
testingData['added_by'] = 'nik-linnane'
testingData['user'] = 'Nik'
testingData['PLAYLIST'] = "ML Test"

# convert explicit to binary
testingData['explicit'] = testingData['explicit'].apply(lambda x : 1 if x == True else 0)

In [None]:
# export file
from google.colab import files
testingData.to_csv('testingData.csv')
files.download('testingData.csv')
