In [1]:
pip install spotipy pandas


Note: you may need to restart the kernel to use updated packages.


## Project Description:

Extract data from my own Spotify playlists (only playlists with 100+ songs will be used), analyze the data, and create an interactive dashboard in Microsoft Power BI that provides insights into my music preferences.

## Theory

API (application programming interface): a set of rules that enables different pieces of software to communicate with each other.

Take the playlist data (number of songs, artists, duration) and preprocess it in Python using libraries. Then import it into Microsoft Power BI as CSV files to create visualisations, e.g. most popular genres, most popular artist, most listened-to playlist, and which genre has the longest average duration, plus a map graph showing each artist's location.

API responses are returned as (nested) dictionaries.


## Workflow


In [2]:
# Import libraries
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import pandas as pd
import webbrowser
import time

In [3]:
# Set up the credentials
client_id = 'Not Showing'
client_secret = 'Not Showing'
redirect_uri = 'http://localhost:8080'
scopes = 'user-read-recently-played playlist-read-private playlist-modify-public user-top-read'


# Create a SpotifyOAuth object
sp_oauth = SpotifyOAuth(client_id=client_id,
                        client_secret=client_secret,
                        redirect_uri=redirect_uri,
                        scope=scopes)

# Get the authorization URL and open it in a web browser
auth_url = sp_oauth.get_authorize_url()
webbrowser.open(auth_url)

True
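Hard-coding the client ID and secret (shown as 'Not Showing' above) risks leaking them if the notebook is shared. A minimal sketch, assuming the credentials are exported as the environment variables `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET` and optionally `SPOTIPY_REDIRECT_URI` (the names spotipy itself looks for when no arguments are passed), reads them with the standard library instead. The helper name `load_spotify_credentials` is hypothetical.

```python
import os


def load_spotify_credentials(default_redirect_uri="http://localhost:8080"):
    """Read Spotify app credentials from environment variables.

    Hypothetical helper: raises RuntimeError listing any missing variables
    instead of silently falling back to placeholder values.
    """
    missing = [name for name in ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
               if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return {
        "client_id": os.environ["SPOTIPY_CLIENT_ID"],
        "client_secret": os.environ["SPOTIPY_CLIENT_SECRET"],
        # Fall back to the redirect URI used in this notebook
        "redirect_uri": os.environ.get("SPOTIPY_REDIRECT_URI", default_redirect_uri),
    }
```

The returned dictionary can be unpacked straight into the constructor, e.g. `SpotifyOAuth(**load_spotify_credentials(), scope=scopes)`.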

In [4]:
from urllib.parse import urlparse, parse_qs

redirect_response = input("Enter the redirect URL: ")

# Extract the authorization code from the URL's query string
url_components = urlparse(redirect_response)
query_params = parse_qs(url_components.query)
authorization_code = query_params.get('code')[0]

# Exchange the authorization code for an access token
# as_dict=False returns the token string directly and avoids spotipy's deprecation warning
access_token = sp_oauth.get_access_token(authorization_code, as_dict=False)

# Use the access token to authenticate and create the sp object used to call the API
sp = spotipy.Spotify(auth=access_token)

Enter the redirect URL: http://localhost:8080/?code=AQCucjwszAPIORdZJtH5GJhEe0xQO2Tn_lo_rqoTt0ykIYwKvmTI_R99UBYKv6EDckCYe5nUH8ZMQtmzwWIakoA5DAa7m8x5dV1GCehsRuiyAQelegyikgqf-aamANQjDkoGnCnAgn1lHDe-aAL5nfi1zoFm1yIjSFOyUEurhzzIvai-mAwFUo5kkx2qBnsmaGy4OKF-Xnl7z89J5TYSrJA705nEoDrNm5_jUxKZwJtCpHT1YATWJVa2MxZkIewZC7Qf2DhFi0AWt4Mhgwc9rfc0

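`query_params.get('code')[0]` raises an unhelpful `TypeError` when the pasted URL has no `code` parameter, for example when access is denied and Spotify redirects with an `error` parameter instead. A minimal sketch of a more defensive parser follows; `extract_auth_code` is a hypothetical helper name, and spotipy's own `SpotifyOAuth.parse_response_code` offers similar parsing.

```python
from urllib.parse import urlparse, parse_qs


def extract_auth_code(redirect_url):
    """Return the 'code' query parameter from a Spotify redirect URL.

    Raises ValueError if Spotify reported an error (e.g. access_denied)
    or if no code is present in the URL.
    """
    query_params = parse_qs(urlparse(redirect_url).query)
    if "error" in query_params:
        raise ValueError(f"Authorization failed: {query_params['error'][0]}")
    codes = query_params.get("code")
    if not codes:
        raise ValueError("No authorization code found in the redirect URL")
    return codes[0]
```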


The methods of the Spotify object return nested dictionaries, where each top-level key maps to further keys or lists.

For example, the playlists are listed under the `items` key, and each entry there has its own `name` and `uri`.


## 1. Retrieve the names and URIs of each playlist

The URIs are needed to fetch the songs in each playlist.

In [5]:
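# Retrieve the name and URI of every playlist
# current_user_playlists returns at most 50 playlists per page, so follow the 'next' links
playlists = sp.current_user_playlists(limit=50)
# Empty list to store one dictionary per playlist
playlist_names = []

while playlists:
    # Loop through the playlist objects under 'items'
    for details in playlists['items']:
        playlist_names.append({'playlist_name': details['name'],
                               'playlist_uri': details['uri']})
    # Get the next page of results, if available
    playlists = sp.next(playlists) if playlists['next'] else None

dataframe_playlist_details = pd.DataFrame(playlist_names)

dataframe_playlist_details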

| | playlist_name | playlist_uri |
|---|---|---|
| 0 | Rainforest Sounds | spotify:playlist:37i9dQZF1DXaw68inx4UiN |
| 1 | Anime OST's | spotify:playlist:1Mv7VRDuRgNOvgoZlT1UAi |
| 2 | DJ | spotify:playlist:37i9dQZF1EYkqdzj48dyYq |
| 3 | EUROBEAT | spotify:playlist:2X9mtWVk6w6xmT8gHja23X |
| 4 | Indian | spotify:playlist:00jgAVgS37OaCJD3Ol3mPk |
| 5 | Melodic 🚀🎶 | spotify:playlist:4wTBk1GNvCREdlrdsQt00M |
| 6 | J & K-Pop | spotify:playlist:1OLkuGNwtf25e2q6odgLVR |
| 7 | Pensive | spotify:playlist:5NaW4ktSpoRnxAmJJTuGVh |
| 8 | Slowed and Reverbed | spotify:playlist:6NT2wGPeNuyrZ4yMxw15yb |
| 9 | Fantasy | spotify:playlist:4La7p5Uzh2BabrUBkeXTLD |


### Playlists to be used:


- Gym
- Sophisticated Bangers
- Barz
- Turnt
- Groovy
- Chill Tunes

## 2. Extract the tracks in each playlist and store them in a dataframe
Ideas for the dashboard:

- Number of unique artists in each playlist and the artist genre composition (how many unique genres each playlist contains)
- Average number of songs in each playlist
- Artist URIs, so the genres can be retrieved as well

Spotify only provides genres for artists, not for individual tracks or albums.

Hip Hop is the overarching genre so 
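The first dashboard idea (unique artists and unique genres per playlist) can be precomputed in pandas before exporting. A minimal sketch, assuming a DataFrame shaped like the output of `get_playlist_tracks` below, with comma-separated `Artist Names` and `Genres` strings; `summarise_playlist` is a hypothetical helper name.

```python
import pandas as pd


def summarise_playlist(df):
    """Count songs, unique artists and unique genres in one playlist DataFrame.

    Assumes 'Artist Names' and 'Genres' hold comma-separated strings.
    """
    def unique_items(column):
        items = set()
        for value in df[column].fillna(""):
            # Split the comma-separated string and drop empty entries
            items.update(part.strip() for part in value.split(",") if part.strip())
        return items

    return {
        "Songs": len(df),
        "Unique Artists": len(unique_items("Artist Names")),
        "Unique Genres": len(unique_items("Genres")),
    }
```

Splitting on commas is a simplification: an artist whose name itself contains a comma would be counted as two artists, and the `Unknown` placeholder is counted as a genre.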

In [6]:
def get_playlist_tracks(playlist_uri):
    """Return a DataFrame with the song name, artist names and artist genres
    of every track in the playlist (accepts a playlist ID, URI or URL)."""
    # List of dictionaries, one per track
    song_details = []
    # Cache of artist ID -> genres, so each artist is only requested once
    genre_cache = {}

    # Get the first page of tracks (spotipy requests up to 100 per page)
    tracks = sp.playlist_tracks(playlist_uri)

    while tracks:
        # 'items' is a list of dictionaries, one per playlist entry
        for song_detail in tracks['items']:
            track = song_detail['track']
            # Removed or unavailable entries can have their track set to None
            if track is None:
                continue
            song_name = track['name']

            # Lists to store the artist names & artist genres of this track
            artist_names = []
            artist_genres = []

            for artist in track['artists']:
                artist_names.append(artist['name'])
                artist_id = artist['id']

                # Local files have no artist ID, so no genre can be looked up
                if not artist_id:
                    artist_genres.append('Unknown')
                    continue

                if artist_id not in genre_cache:
                    # Pause between requests to avoid hitting the API rate limit
                    time.sleep(0.2)
                    try:
                        genre_cache[artist_id] = sp.artist(artist_id)['genres']
                    except Exception:
                        # If the lookup fails, record the genre as Unknown
                        genre_cache[artist_id] = ['Unknown']

                # An artist with no genres contributes nothing, which is why
                # some rows end up with an empty Genres value
                artist_genres.extend(genre_cache[artist_id])

            # Join the artist names & genres into comma-separated strings
            song_details.append({
                'Song Name': song_name,
                'Artist Names': ', '.join(artist_names),
                'Genres': ', '.join(artist_genres)
            })

        # Get the next page of results, if available
        tracks = sp.next(tracks) if tracks['next'] else None

    return pd.DataFrame(song_details)
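Looking up genres one artist at a time means one request per artist plus a 0.2 s pause each time. The Web API's "Get Several Artists" endpoint (exposed in spotipy as `sp.artists`) accepts up to 50 IDs per request. A minimal sketch, assuming all artist IDs are collected first; `fetch_artist_genres` is a hypothetical helper, and `sp` is assumed to be a spotipy client whose `artists()` returns `{'artists': [...]}`.

```python
def fetch_artist_genres(sp, artist_ids, batch_size=50):
    """Map each artist ID to its list of genres using batched requests."""
    # Drop missing IDs (local files) and duplicates while keeping the order
    unique_ids = list(dict.fromkeys(i for i in artist_ids if i))
    genres = {}
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        response = sp.artists(batch)
        for artist in response["artists"]:
            # Guard against null entries for IDs the API could not resolve
            if artist is not None:
                genres[artist["id"]] = artist["genres"]
    return genres
```

This reduces the number of genre requests from one per artist to roughly one per 50 distinct artists, at the cost of a two-pass structure (collect IDs, then look them up).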

In [18]:
indian = get_playlist_tracks('00jgAVgS37OaCJD3Ol3mPk')



[{'Songs': 'Jhoome Jo Pathaan'}, {'Songs': 'Besharam Rang (From "Pathaan")'}]


### Playlist Name: Turnt (Electronic Music)

In [6]:
# Turnt (electronic music) playlist
edm_playlist = get_playlist_tracks('6uBUYL8THnuYHd7AGXuYuA')

# Convert it to a CSV
edm_playlist.to_csv('EDM_playlist.csv', index=False)


In [1]:
edm_playlist.tail()


### Playlist Name: Groovy

In [22]:
#groovy = get_playlist_tracks('4DnwPYRiWmFVwFIT9fyzGG')

#groovy.to_csv('Groovy.csv',index = False)


### Playlist Name: Sophisticated Bangers

In [10]:
indie_alt_playlist = get_playlist_tracks('73MsTBeIWzrxYzOtlmG4zK')


indie_alt_playlist.to_csv('Sophisticated_Bangers.csv', index = False)

In [11]:
indie_alt_playlist.tail()

| | Song Name | Artist Names | Genres |
|---|---|---|---|
| 147 | Out Of Luck | Old Mervs | australian indie rock, australian surf rock |
| 148 | Fly Like A Bird | Mariah Carey | dance pop, pop, urban contemporary |
| 149 | Not Strong Enough | boygenius, Julien Baker, Phoebe Bridgers, Lucy... | indie pop, ambient folk, indie pop, indie pop,... |
| 150 | Partner In Crime | Ocean Alley | australian reggae fusion, australian surf rock |
| 151 | Hold the Line | TOTO | album rock, classic rock, hard rock, mellow go... |


### Playlist Name: Chill Tunes

In [12]:
chill_playlist = get_playlist_tracks('6Cw0MtBmHuvATdgGCAHXOF')

chill_playlist.to_csv('Chill_Playlist.csv', index = False)

In [13]:
chill_playlist.tail()

| | Song Name | Artist Names | Genres |
|---|---|---|---|
| 339 | Too Fast | Sonder | experimental r&b |
| 340 | FACE | BROCKHAMPTON | boy band, rap |
| 341 | DEAD MAN WALKING | Brent Faiyaz | r&b, rap |
| 342 | Dark Red | Steve Lacy | afrofuturism |
| 343 | Jealousy | RUBII | |


### Playlist Name: R&B

In [14]:
r_b_playlist = get_playlist_tracks('5g6OgYruke2lj4n3LQ0Zut')
r_b_playlist.to_csv('R&B_playlist.csv', index=False)

### Playlist Name: Hip Hop

In [7]:
rap_playlist = get_playlist_tracks('4RPMuohEPDgdytNG9EcD8t')
rap_playlist.to_csv('Rap_Playlist.csv', index=False)

In [8]:
rap_playlist.tail()

| | Song Name | Artist Names | Genres |
|---|---|---|---|
| 900 | Flawlëss (feat. Lil Uzi Vert) | Yeat, Lil Uzi Vert | pluggnb, rage rap, hip hop, melodic rap, phill... |
| 901 | MILLION DOLLAR BABY | Tommy Richman | chill abstract hip hop |
| 902 | Everyday Hustle | Future, Metro Boomin, Rick Ross | atl hip hop, hip hop, rap, southern hip hop, t... |
| 903 | Flooded The Face | Lil Uzi Vert | hip hop, melodic rap, philly rap, rage rap, ra... |
| 904 | Codeine Crazy | Future | atl hip hop, hip hop, rap, southern hip hop, trap |


## User Analysis

This section contains code related to my top tracks, artists and albums.



### User Analysis: Top Artists


In [5]:
# Returns a dictionary containing each artist's name, genres and stats such as popularity
# limit=10 returns only the top 10
# Important note: the default time_range is 'medium_term', which covers approximately the past 6 months
top_items = sp.current_user_top_artists(limit=10, time_range='medium_term')
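Besides `medium_term`, the top-items endpoint accepts `short_term` (roughly the last 4 weeks) and `long_term` (a longer history, about a year according to Spotify's current documentation). A minimal sketch that collects the top artists for all three ranges into one table, which could feed a time-range slicer in Power BI; `top_artists_by_range` is a hypothetical helper and `sp` is assumed to be the authenticated client created above.

```python
import pandas as pd

TIME_RANGES = ("short_term", "medium_term", "long_term")


def top_artists_by_range(sp, limit=10):
    """Return a DataFrame of the user's top artists for each time range."""
    rows = []
    for time_range in TIME_RANGES:
        result = sp.current_user_top_artists(limit=limit, time_range=time_range)
        # Rank starts at 1 for the most-listened artist in each range
        for rank, artist in enumerate(result["items"], start=1):
            rows.append({"Time Range": time_range,
                         "Rank": rank,
                         "Artist Name": artist["name"],
                         "Artist Popularity": artist["popularity"]})
    return pd.DataFrame(rows)
```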


In [6]:
# Extract my top 10 artists and their popularity scores
# Popularity is a value between 0 and 100, with 100 being the most popular.
# Spotify calculates an artist's popularity from the popularity of all the artist's tracks.
# Important note: popularity is a global metric across all listeners, not a personalised one

top_artists = []

# Loop through the artist objects listed under 'items'
for artist_info in top_items['items']:
    top_artists.append({
        'Artist Name': artist_info['name'],
        'Artist Popularity': artist_info['popularity']
    })

df_top_artists = pd.DataFrame(top_artists)
df_top_artists.to_csv('Top Artist & Popularity.csv')

In [7]:
df_top_artists.head()

| | Artist Name | Artist Popularity |
|---|---|---|
| 0 | Travis Scott | 92 |
| 1 | Future | 91 |
| 2 | $uicideboy$ | 86 |
| 3 | Fred again.. | 78 |
| 4 | Kid Cudi | 78 |


### User Analysis: Top Tracks


In [8]:
#Top 10 Tracks over the past 6 months
top_10_tracks = sp.current_user_top_tracks(limit = 10)

In [9]:
top_tracks = []

for track_info in top_10_tracks['items']:
    track_name = track_info['name']
    track_popularity = track_info['popularity']
    top_tracks.append({'Track Name' : track_name,
                      'Track Popularity' : track_popularity})
    
df_top_tracks = pd.DataFrame(top_tracks)
df_top_tracks.to_csv('Top Tracks & Pop.csv')

In [10]:
df_top_tracks.head()

| | Track Name | Track Popularity |
|---|---|---|
| 0 | Down With Me | 64 |
| 1 | The Last Cherry Blossom | 44 |
| 2 | Need Me | 55 |
| 3 | E.T. (Hardstyle) | 55 |
| 4 | Are You Going to See the Rose in the Vase, or ... | 73 |


### User Analysis: Most Popular Genres over the past 6 months



In [11]:
# To find my most popular genres, use my top 50 artists and the genres associated with them
# This approximates my most listened-to genres over the past 6 months
top_genres = sp.current_user_top_artists(limit = 50)

In [12]:
artist_and_genre_list = []

for info in top_genres['items']:
    
    artist_name = info['name']
    artist_genre = info['genres']
    
    artist_and_genre_list.append({
        'Artist Name' : artist_name,
        'Artist Genre' : artist_genre
    })

df_genre_breakdown = pd.DataFrame(artist_and_genre_list)

In [13]:
# The Artist Genre column shows square brackets because each artist's genres are stored as a list
df_genre_breakdown.head()

# Convert each list of genres into a comma-separated string
df_genre_breakdown['Artist Genre'] = df_genre_breakdown['Artist Genre'].apply(lambda x: ', '.join(x))


In [14]:
df_genre_breakdown.head()
df_genre_breakdown.to_csv('Genre Breakdown.csv')
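The heading promises the most popular genres, but `df_genre_breakdown` still lists genres per artist. One way to rank them is to count how many top artists carry each genre. A minimal sketch, assuming the `Artist Genre` column already holds the comma-separated strings produced above; `genre_counts` is a hypothetical helper, and it weights every artist equally regardless of how often they are played.

```python
import pandas as pd


def genre_counts(df, column="Artist Genre"):
    """Count how many artists carry each genre, most common first.

    Artists with no genres (empty strings) are ignored.
    """
    genres = (df[column].fillna("")
              .str.split(",")
              .explode()
              .str.strip())
    genres = genres[genres != ""]
    return genres.value_counts()
```

The result could be saved alongside the breakdown, e.g. `genre_counts(df_genre_breakdown).to_csv('Genre Counts.csv')` (a hypothetical file name).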

### User Analysis: Mood and energy 

My top tracks over the past 6 months, along with their energy, valence and danceability scores.

Energy ranges from 0.0 to 1.0 and is a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.

Valence ranges from 0.0 to 1.0 and describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
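Because energy and valence share the same 0.0 to 1.0 scale, tracks can be grouped into four mood quadrants for a dashboard scatter plot. A minimal sketch, assuming a split at 0.5 on both axes; the threshold, the quadrant labels and the helper names `mood_quadrant` and `add_mood_column` are illustrative assumptions, not Spotify categories.

```python
import pandas as pd


def mood_quadrant(energy, valence, threshold=0.5):
    """Label a track by which side of the threshold its energy and valence fall."""
    level = "High energy" if energy >= threshold else "Low energy"
    tone = "positive" if valence >= threshold else "negative"
    return f"{level}, {tone}"


def add_mood_column(df, threshold=0.5):
    """Return a copy of df with a 'Mood' column derived from 'Energy' and 'Valence'."""
    out = df.copy()
    out["Mood"] = [mood_quadrant(e, v, threshold)
                   for e, v in zip(out["Energy"], out["Valence"])]
    return out
```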




In [15]:
#Get top tracks over 6 months
top_tracks_6_months = sp.current_user_top_tracks(limit = 50)
#Then I need to get the song name + the song URI, so I can get the audio features for each track

In [16]:
songs_and_uris = []

for details in top_tracks_6_months['items']:
    song_name = details['name']
    song_id = details['id']
    
    songs_and_uris.append({'Song Name' : song_name,
                          'Song ID' : song_id})

song_and_uri_df = pd.DataFrame(songs_and_uris)
song_ids = song_and_uri_df['Song ID'].tolist()

In [17]:
# Get Several Tracks' Audio Features accepts up to 100 IDs, so the whole list of 50 IDs can be passed at once
# It returns a list of 50 dictionaries, one per track
song_features = sp.audio_features(song_ids)

In [18]:
audio_features = []

for features in song_features:
    # audio_features returns None for tracks without audio analysis, so skip them
    if features is None:
        continue
    # Use new names here so the song_ids list from the previous cell is not overwritten
    audio_features.append({'Song ID': features['id'],
                           'Danceability': features['danceability'],
                           'Energy': features['energy'],
                           'Valence': features['valence']})

audio_features_df = pd.DataFrame(audio_features)
audio_features_df.head()

| | Song ID | Danceability | Energy | Valence |
|---|---|---|---|---|
| 0 | 5ZDIwpR2JMwuRM8xumtmxn | 0.712 | 0.588 | 0.471 |
| 1 | 6jUIdbS9JKg3wRHhNSfzkC | 0.664 | 0.591 | 0.376 |
| 2 | 6hAsChsFiKALkmIishhyVT | 0.697 | 0.635 | 0.72 |
| 3 | 2ncCB2DOggeQEDOb8HVKSy | 0.432 | 0.934 | 0.541 |
| 4 | 78RdTKcI2CediltTnGLIqr | 0.749 | 0.505 | 0.258 |


In [19]:
audio_feature_analysis_df = pd.merge(song_and_uri_df, audio_features_df, on = 'Song ID')
audio_feature_analysis_df.head()

audio_feature_analysis_df.to_csv('Audio Feature Analysis.csv')

In [10]:
# Get my top tracks over the past 6 months and the album each track appears on
top_tracks_6_months_and_album = sp.current_user_top_tracks(limit=50)

# Create a list to store song names, song IDs, and album names
songs_and_albums = []

for track in top_tracks_6_months_and_album['items']:
    song_name = track['name']
    song_id = track['id']
    album_name = track['album']['name']  # Extracting album name from track object
    
    songs_and_albums.append({
        'Song Name': song_name,
        'Song ID': song_id,
        'Album Name': album_name
    })

# Create a DataFrame to store the data
songs_and_albums_df = pd.DataFrame(songs_and_albums)
songs_and_albums_df.head()
songs_and_albums_df.to_csv('Top Albums.csv')

In [11]:
songs_and_albums_df


| | Song Name | Song ID | Album Name |
|---|---|---|---|
| 0 | Down With Me | 5ZDIwpR2JMwuRM8xumtmxn | Down With Me |
| 1 | The Last Cherry Blossom | 6jUIdbS9JKg3wRHhNSfzkC | The Last Cherry Blossom |
| 2 | E.T. (Hardstyle) | 2ncCB2DOggeQEDOb8HVKSy | E.T. (Hardstyle) |
| 3 | Need Me | 6hAsChsFiKALkmIishhyVT | TEC |
| 4 | Are You Going to See the Rose in the Vase, or ... | 78RdTKcI2CediltTnGLIqr | Are You Going to See the Rose in the Vase, or ... |
| 5 | 1000 Blunts | 09riz9pAPJyYYDVynE5xxY | Sing Me a Lullaby, My Sweet Temptation |
| 6 | Lights Out | 1kDX084EZutVgxAJCaxwIA | USB |
| 7 | Antarctica | 5UGAXwbA17bUC0K9uquGY2 | Antarctica |
| 8 | So What (We Go Jim Finale Edit) | 5vbGfYkPk8JHssJX2wUWWW | So What (We Go Jim Finale Edit) |
| 9 | Firework | 1U4Mw7gjMfM4GZh9mYgw16 | Firework |
