# **"Hungry Guys Radio" Playlist Analysis**

---


***Who:*** *Hungry Guys Radio is a collaborative Spotify playlist that Jack Murphy, Conor Wylock, Ian Hinojosa, and I all contribute to.*

***What:*** *The goal of this analysis is to sharpen my data science skills but, more importantly, to highlight how my pals' music preferences and mine differ and to profile our (fantastic) playlist as a whole.*

***Why:*** *Whenever I listen to Hungry Guys Radio, I feel I can guess which of us added the currently playing song with close to 80% accuracy (though I've never actually tested this). This led me to think about our different tastes in music and how I could use my data science skills to quantify and visualize these differences.*

***How:*** *For this project I'll use Spotipy (a Python client library for Spotify's Web API) to access the data and Python to clean and visualize the dataset.*


---



## **Phase 1: Data Extraction**

Connect to the Spotify API (via Spotipy) & Authenticate

In [None]:
#!pip install spotipy --upgrade -q
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

# authorization
cid = '13f76c00d24446ecba764958bf07fc9e'
secret = 'c2bac37f194441efb4c607a7f1ffe20f'

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)
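
Hard-coding the client ID and secret in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the credentials have been exported as the environment variables `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` (to my knowledge these are the names Spotipy itself looks for); the helper `load_spotify_credentials` is hypothetical and not part of Spotipy:

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    `env` defaults to the real environment; passing a plain dict makes this easy to test.
    """
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # fail loudly with the name of the variable that was not set
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None


# Hypothetical usage in place of the hard-coded values:
# cid, secret = load_spotify_credentials()
```

The returned `cid` and `secret` could then be passed to `SpotifyClientCredentials` exactly as in the cell above.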

We'll need a function to do a few things...

*   Call the Spotify API (via Spotipy) and access a given playlist URL
*   Iterate over each song within the playlist and extract the wanted song and playlist metrics
*   To work around the Spotify API's limit of 100 songs per request, the function will need to accept "limit" and "offset" parameters to shift the starting point at each call




In [None]:
# function to loop through a playlist and grab information on each song
def call_playlist(creator_id, playlist_url, limit, offset):
    # NOTE: use playlist_url instead of playlist_id. playlist_id doesn't accept limit and offset parameters for some reason?
    
    # list the feature columns we want and create an empty df with those columns
    playlist_features_list = ['artist', 'album', 'track_name', 'track_id', 'danceability', 'energy', 'key', 'loudness', 'mode', 
                              'speechiness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'duration_ms', 'time_signature', 'acousticness']
    playlist_df = pd.DataFrame(columns = playlist_features_list)
    
    # loop through the specified playlist and extract wanted features
    playlist = sp.user_playlist_tracks(creator_id, playlist_url, limit=limit, offset=offset)["items"]
    for track in playlist:

        # create empty dict
        playlist_features = {}

        # get metadata
        playlist_features['artist'] = track['track']['album']['artists'][0]['name']
        playlist_features['album'] = track['track']['album']['name']
        playlist_features["track_name"] = track["track"]["name"]
        playlist_features["track_id"] = track["track"]["id"]
        playlist_features["explicit"] = track["track"]["explicit"]
        playlist_features["popularity"] = track["track"]["popularity"]
        playlist_features["album_release_date"] = track["track"]["album"]["release_date"]
        playlist_features["duration_ms"] = track["track"]["duration_ms"]
        playlist_features['added_by'] = track["added_by"]["id"]
        playlist_features['added_at'] = track["added_at"]
        
        # get audio features
        audio_features = sp.audio_features(playlist_features["track_id"])[0]
        for feature in playlist_features_list[4:]:
            playlist_features[feature] = audio_features[feature]
        
        # concat dfs
        track_df = pd.DataFrame(playlist_features, index = [0])
        playlist_df = pd.concat([playlist_df, track_df], ignore_index = True)

    # return df
    return playlist_df
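
The function above requests audio features one track at a time, which means one extra API call per song. As far as I know, `sp.audio_features` also accepts a list of up to 100 track IDs, so the IDs could be sent in batches instead. A minimal sketch of that batching, where `chunked` and `fetch_audio_features` are hypothetical helpers and `fetch_batch` stands in for `sp.audio_features`:

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(track_ids, fetch_batch, batch_size=100):
    """Collect audio features for every track id, making one request per batch.

    `fetch_batch` is any callable that takes a list of ids and returns a list
    of feature dicts in the same order (assumed here to be sp.audio_features).
    """
    features = []
    for batch in chunked(track_ids, batch_size):
        features.extend(fetch_batch(batch))
    return features


# Hypothetical usage with the real client:
# all_features = fetch_audio_features(track_ids, sp.audio_features)
```

For a 270-song playlist this would cut the audio-feature requests from 270 down to 3.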

Now we can call the function as many times as needed to pull all songs! Hungry Guys Radio currently has 270 songs, so we'll need 3 calls, each adding 100 to the offset.

In [None]:
# songs 1-100 requires an offset of 0 since we want to start at the first song
first100 = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/4PNAiZyW8o758BtoIRwwFR', 100, 0)

# songs 101-200 requires an offset of 100
second100 = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/4PNAiZyW8o758BtoIRwwFR', 100, 100)

# songs 201-270 requires an offset of 200
third100 = call_playlist('nik-linnane', 'https://open.spotify.com/playlist/4PNAiZyW8o758BtoIRwwFR', 100, 200)
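
The three manual calls above work for 270 songs, but they would need editing once the playlist passes 300. A minimal sketch of a loop that keeps requesting pages until one comes back shorter than the limit; `fetch_all_pages` is a hypothetical helper, and `fetch_page` stands in for a call to `call_playlist` with the creator and URL already filled in:

```python
def fetch_all_pages(fetch_page, limit=100):
    """Call fetch_page(limit, offset) repeatedly and return the non-empty pages.

    Stops as soon as a page has fewer than `limit` rows, which signals
    that the end of the playlist has been reached.
    """
    pages = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        if len(page) > 0:
            pages.append(page)
        if len(page) < limit:
            break
        offset += limit
    return pages


# Hypothetical usage with the function defined earlier:
# url = 'https://open.spotify.com/playlist/4PNAiZyW8o758BtoIRwwFR'
# pages = fetch_all_pages(lambda limit, offset: call_playlist('nik-linnane', url, limit, offset))
# hungryGuysRadio = pd.concat(pages, ignore_index=True)
```

If the playlist length is an exact multiple of the limit, the loop makes one extra request that returns an empty page and then stops.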

Now that we have all the songs in the playlist stored in 3 dataframes, we can concatenate them to create our complete dataset.

In [None]:
hungryGuysRadio = pd.concat([first100, second100, third100], ignore_index = True)
hungryGuysRadio.describe()

| | danceability | energy | loudness | speechiness | liveness | valence | tempo | acousticness | popularity |
|---|---|---|---|---|---|---|---|---|---|
| count | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 |
| mean | 0.630241 | 0.713848 | -6.503252 | 0.071135 | 0.16733 | 0.642364 | 117.524193 | 0.203233 | 43.696296 |
| std | 0.126242 | 0.139359 | 2.259506 | 0.074803 | 0.135131 | 0.225001 | 27.709648 | 0.220262 | 24.613525 |
| min | 0.221 | 0.33 | -17.032 | 0.0228 | 0.0222 | 0.0758 | 67.055 | 1.5e-05 | 0.0 |
| 25% | 0.55625 | 0.63375 | -7.61675 | 0.034925 | 0.086575 | 0.482 | 97.94425 | 0.029625 | 34.25 |
| 50% | 0.6355 | 0.7205 | -6.2615 | 0.04505 | 0.111 | 0.672 | 112.9615 | 0.109 | 51.0 |
| 75% | 0.7135 | 0.81675 | -4.8525 | 0.068225 | 0.206 | 0.83875 | 126.96775 | 0.3505 | 61.0 |
| max | 0.974 | 0.982 | -1.866 | 0.567 | 0.912 | 0.978 | 210.164 | 0.891 | 90.0 |


## **Phase 2: Data Cleaning**

*   added_at: convert to datetime and derive a date-only date_added column
*   album_release_date: ensure datetime formatting
*   duration_ms: convert to minutes
*   added_by: convert id to name
*   explicit: convert to binary


In [None]:
# convert existing columns to datetime
hungryGuysRadio['added_at'] = pd.to_datetime(hungryGuysRadio['added_at'], errors='coerce')
hungryGuysRadio['album_release_date'] = pd.to_datetime(hungryGuysRadio['album_release_date'], errors='coerce')

# create new date_added column by dropping the time and time zone info
hungryGuysRadio['date_added'] = hungryGuysRadio['added_at'].dt.date

# create new duration_min column by converting ms to minutes
hungryGuysRadio['duration_min'] = hungryGuysRadio['duration_ms']/60000

# create new user column by mapping Spotify user ids to names
# (any id not listed here is treated as Conor's, as before)
user_names = {'murphycfp': 'Jack', 'nik-linnane': 'Nik', '65jhgcs8ihtvs5pl09xqrgul5': 'Ian'}
hungryGuysRadio['user'] = hungryGuysRadio['added_by'].map(user_names).fillna('Conor')

# convert explicit to binary
hungryGuysRadio['explicit'] = hungryGuysRadio['explicit'].apply(lambda x : 1 if x == True else 0)
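
One thing to watch with album_release_date: Spotify reports release dates at day, month, or year precision (the album object carries a `release_date_precision` field), so values like `1975` or `1975-06` can appear next to `2015-10-16`. Depending on the pandas version, `pd.to_datetime(..., errors='coerce')` may turn the odd ones into `NaT` without warning. A minimal sketch of a parser that pads the missing parts instead; `parse_release_date` is a hypothetical helper, and defaulting to the first month and day is my own assumption:

```python
from datetime import date


def parse_release_date(raw):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a date.

    Missing month or day parts are filled with 1 (an assumption, not Spotify's rule).
    """
    parts = [int(piece) for piece in raw.split("-")]
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)


# Hypothetical usage on the raw strings, before any datetime conversion:
# hungryGuysRadio['album_release_date'] = hungryGuysRadio['album_release_date'].apply(parse_release_date)
```

A quick check like `hungryGuysRadio['album_release_date'].isna().sum()` after the conversion above would show whether any dates were actually lost.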

In [None]:
hungryGuysRadio.head(3)

| | artist | album | track_name | track_id | danceability | energy | key | loudness | mode | speechiness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature | acousticness | explicit | popularity | album_release_date | added_by | added_at | date_added | duration_min | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BØRNS | Dopamine | Electric Love | 2GiJYvgVaD2HtM8GqD9EgQ | 0.611 | 0.797 | 6 | -7.627 | 0 | 0.0533 | 0.00137 | 0.26 | 0.518 | 120.041 | 218107 | 4 | 0.00543 | 0 | 82.0 | 2015-10-16 | murphycfp | 2019-09-19 03:46:17+00:00 | 2019-09-19 | 3.63512 | Jack |
| 1 | Edward Sharpe & The Magnetic Zeros | Up from Below | Janglin | 3TgopD2FOwjZBq0YrhbEN4 | 0.578 | 0.688 | 6 | -6.551 | 0 | 0.054 | 5.98e-05 | 0.305 | 0.36 | 123.91 | 229600 | 4 | 0.632 | 0 | 37.0 | 2009-07-14 | murphycfp | 2019-09-19 03:50:02+00:00 | 2019-09-19 | 3.82667 | Jack |
| 2 | Goth Babe | Weekend Friend | Weekend Friend | 2d3QlXE6FXFDeodiS66yjM | 0.409 | 0.594 | 1 | -9.725 | 1 | 0.031 | 0.0391 | 0.0904 | 0.212 | 110.187 | 209455 | 4 | 0.032 | 0 | 67.0 | 2019-01-19 | murphycfp | 2019-09-19 04:01:40+00:00 | 2019-09-19 | 3.49092 | Jack |


In [None]:
from google.colab import files
hungryGuysRadio.to_csv('hungryGuysRadio.csv')
files.download('hungryGuysRadio.csv')


## **Phase 3: Exploratory Data Analysis**
Now we can start digging into the data and looking for insights worth exploring further!

**Data Dictionary...**

***Artist:*** *artist of the given track.*

***Album:*** *album that the given track appears on.*

***Track_name:*** *name of the given track.*

***Track_id:*** *Spotify's identifier for the given track.*

***Danceability:*** *describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.*

***Energy:*** *represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.*

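***Key:*** *the key that the given track is played in, stored as an integer in standard pitch class notation (e.g. 0 = C, 1 = C♯/D♭, 2 = D).*

***Loudness:*** *overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track. Values typically range between -60 and 0 dB.*

***Mode:*** *the modality (major or minor) of the given track. Major is represented by 1 and minor by 0.*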

***Speechiness:*** *detects the presence of spoken words in a track. A score above 0.66 means the track is probably made entirely of spoken words, a score between 0.33 and 0.66 means it may contain both music and speech, and a score below 0.33 most likely means music and other non-speech-like content.*

***Instrumentalness:*** *predicts whether a track contains no vocals. The closer it is to 1.0, the more likely the song is instrumental.*

***Liveness:*** *describes the probability that the song was recorded with a live audience. According to the official documentation “a value above 0.8 provides strong likelihood that the track is live”.*

***Valence:*** *measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).*

***Tempo:*** *the tempo of the given track in bpm.*

***Duration_ms:*** *duration of the song in milliseconds.*

***Time_signature:*** *time signature of the given track.*

***Acousticness:*** *describes how acoustic a song is. A score of 1.0 means the song is most likely to be an acoustic one.*

***Explicit:*** *whether or not the given track contains explicit language.*

***Popularity:*** *how popular the given track is. Scores range from 0 to 100.*

***Album_release_date:*** *date the album of the given track was released.*

***Added_by:*** *Spotify user ID of who added the given track to the playlist.*

***Date_added:*** *the date the given track was added to the playlist.*

***Duration_min:*** *duration of the given track in minutes.*

***User:*** *first name of who added the given track to the playlist (mapped from Added_by).*
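
Since Key and Mode are stored as integers, a small decoder makes rows easier to read. This is a minimal sketch based on Spotify's documented encoding (pitch class notation for key, with -1 meaning no key was detected, and 1 = major, 0 = minor for mode); the helper `describe_key` and the spelling of the note names are my own choices:

```python
# pitch classes in order, 0 = C through 11 = B
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key, mode):
    """Turn Spotify's integer key and mode into a label such as 'D major'."""
    if key == -1:
        # -1 is used when no key was detected for the track
        return "unknown"
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


# Electric Love has key 6 and mode 0 in the sample rows
print(describe_key(6, 0))  # F#/Gb minor
```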











In [None]:
# who has added the most songs to the playlist?
  # makes sense it's Jack since he did initially create the playlist
  # however, why has Ian only added 7 songs? maybe he's very thoughtful about how he contributes to the vibes? how do these 7 songs compare to the playlist as a whole?
hungryGuysRadio.user.value_counts()

Jack     159
Conor     63
Nik       41
Ian        7
Name: user, dtype: int64
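
One way to start answering how a user's songs compare to the playlist as a whole is to subtract the playlist-wide mean of each audio feature from each user's mean. A minimal sketch on made-up toy numbers (not the real playlist data); `feature_gap_by_user` is a hypothetical helper that assumes a `user` column and numeric feature columns like the ones created above:

```python
import pandas as pd


def feature_gap_by_user(df, features):
    """Per-user mean of each feature minus the mean over the whole dataframe."""
    return df.groupby("user")[features].mean() - df[features].mean()


# toy data for illustration only
toy = pd.DataFrame({
    "user": ["Jack", "Jack", "Ian", "Nik"],
    "energy": [0.8, 0.6, 0.4, 0.6],
    "valence": [0.5, 0.7, 0.9, 0.3],
})

# positive values mean the user's picks sit above the playlist average
gaps = feature_gap_by_user(toy, ["energy", "valence"])
print(gaps)
```

On the real data this would be called as `feature_gap_by_user(hungryGuysRadio, ['danceability', 'energy', 'valence'])`. With only 7 songs, Ian's averages will be noisy, so any gaps should be read with caution.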

In [None]:
# how many different artists are on this playlist?
print('Unique artists: ', len(pd.unique(hungryGuysRadio.artist)))

# how many different albums?
print('Unique albums: ', len(pd.unique(hungryGuysRadio.album)))
print()

# mostly clean or explicit?
print('Explicit songs: ', hungryGuysRadio.explicit.value_counts()[1])
print('Clean songs: ', hungryGuysRadio.explicit.value_counts()[0])
print()

# on average how long is each song?
print('Average song duration: ', hungryGuysRadio.duration_min.mean().round(2), 'minutes')

# how long is the entire playlist?
print('Playlist duration: ', round(hungryGuysRadio.duration_min.sum(), 2), 'minutes')

# how many songs are on the playlist?
print('Number of songs: ', len(hungryGuysRadio))

Unique artists:  221
Unique albums:  257

Explicit songs:  30
Clean songs:  240

Average song duration:  3.78 minutes
Playlist duration:  1021.02 minutes
Number of songs:  270


In [None]:
# which artists have the most popular tracks within the playlist?
hungryGuysRadio.groupby(['artist', 'track_name'], as_index=False)['popularity'].mean().sort_values(by='popularity', ascending=False)[:5]

| | artist | track_name | popularity |
|---|---|---|---|
| 60 | Elton John | Cold Heart - PNAU Remix | 90.0 |
| 81 | Glass Animals | Heat Waves | 87.0 |
| 49 | Dua Lipa | Levitating | 85.0 |
| 27 | BØRNS | Electric Love | 82.0 |
| 179 | Ritt Momney | Put Your Records On | 80.0 |


In [None]:
# which artists have the least popular tracks within the playlist?
hungryGuysRadio.groupby(['artist', 'track_name'], as_index=False)['popularity'].mean().sort_values(by='popularity', ascending=True)[:5]

| | artist | track_name | popularity |
|---|---|---|---|
| 194 | Spoon | The Underdog | 0.0 |
| 227 | The Raconteurs | Steady, As She Goes | 0.0 |
| 75 | Future Islands | Ancient Water | 0.0 |
| 145 | Morningsiders | Somewhere In Between | 0.0 |
| 206 | Sundara Karma | Flame (Roosevelt Remix) | 0.0 |
