# DATA ENGINEERING INDIVIDUAL COURSEWORK
## SPOTIFY PLAYLISTS: A Sentiment Analysis


This notebook contains the code and Markdown text for the data collection, processing, storage, exploratory analysis and machine learning stages of my individual project for MSIN0166 Data Engineering.

# 1. Spotify Data 
Spotipy - Retrieving playlist and track data from Spotify

## 1.1 Workspace Preparation

On Spotify's Developer website, a Python library called **Spotipy** is recommended, so I used this library for my data collection from Spotify.

Library Source/Documentation: https://github.com/plamere/spotipy

In [4]:
# Installs Spotipy, the Python library for using Spotify's Web API
!pip install spotipy



In [5]:
# Token initialisation
import os

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Initialises the Spotipy client with my own Spotify API credentials, read from
# environment variables so that the secret is not stored in the notebook
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=os.environ["SPOTIPY_CLIENT_ID"],
                                                           client_secret=os.environ["SPOTIPY_CLIENT_SECRET"]))


In [180]:
import sys
from pprint import pprint
import pandas as pd

## 1.2 Which Playlist?

First of all, I want to retrieve a list of featured playlists (Editor's Picks) on Spotify to see whether any playlist is worth investigating. These playlists usually contain popular tracks and are curated by Spotify's in-house team (Occhino, 2020).

In [25]:
response = sp.featured_playlists()
print(response['message'])

while response:
    playlists = response['playlists']
    for i, item in enumerate(playlists['items']):
        print(playlists['offset'] + i, item['name'])

    if playlists['next']:
        response = sp.next(playlists)
    else:
        response = None

Editor's picks
0 New Music Friday
1 Feel Good Friday
2 RapCaviar
3 Main Stage
4 I Love My '90s Hip-Hop
5 Mood Booster
6 Dance Hits
7 Today's Top Hits
8 just hits
9 Dance Party
10 Happy 80s
11 young & free


The sixth result (index 5) is **"Mood Booster"**, which is closely related to my research objective: analysing the mood of songs with sentiment analysis. I am curious about what the songs in this playlist are like.

To dig deeper into this playlist, I need its Spotify ID. Here, I followed the playlist with my own Spotify account so that it appears in my profile, and then retrieved its ID by listing my account's playlists.

In [27]:
# Shows a user's playlists (here, my own account, which now follows Mood Booster)
username = "214hpvbrd65hxnr2sq2rmwmxa"
playlists = sp.user_playlists(username)

for playlist in playlists['items']:
    print(playlist['id'], playlist['name'])


37i9dQZF1DX3rxVfibe1L0 Mood Booster


We now have the playlist's Spotify ID. Passing it as a parameter to the next methods lets us retrieve more information about the playlist.
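The simplified playlist objects in a `featured_playlists()` response also include an `id` field, so the ID can be read from that response directly as a cross-check. A minimal sketch; `playlist_ids_by_name` is a hypothetical helper, and the sample dictionary is a hand-written, trimmed stand-in for a real response.

```python
def playlist_ids_by_name(response):
    """Maps playlist name -> playlist ID for one page of a featured_playlists() response."""
    return {item['name']: item['id'] for item in response['playlists']['items']}


# With the live client (needs the authenticated `sp` object from section 1.1):
#     ids = playlist_ids_by_name(sp.featured_playlists())
#     ids['Mood Booster']

# Offline example with a hand-written response that has the same nesting
sample_response = {
    'message': "Editor's picks",
    'playlists': {
        'offset': 0,
        'next': None,
        'items': [
            {'name': 'Mood Booster', 'id': '37i9dQZF1DX3rxVfibe1L0'},
        ],
    },
}
print(playlist_ids_by_name(sample_response)['Mood Booster'])
```

Only the first page is read here; later pages would need `sp.next()` as in the featured-playlist loop above.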

## 1.3 Tracks in the Playlist

We will use Spotipy's **playlist_items** method, which returns the tracks in a playlist one page at a time, starting from a given offset.
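Because the method is paginated, a small helper can collect every page into one flat list, which avoids the nested list of pages that the cells below later have to unwrap. A minimal sketch; `fetch_all_items` and `fake_page` are hypothetical, and the `fields` string in the comment assumes Spotify's parenthesised field-filter syntax so that names and IDs come back in one request.

```python
def fetch_all_items(fetch_page):
    """Collects every item from an offset-paginated endpoint into one flat list.

    `fetch_page(offset)` must return a dict with an 'items' list and a 'total' count,
    the shape sp.playlist_items() returns when 'total' is included in `fields`.
    """
    items, offset = [], 0
    while True:
        page = fetch_page(offset)
        batch = page['items']
        if not batch:
            break
        items.extend(batch)  # extend, not append, so the result stays flat
        offset += len(batch)
        if offset >= page['total']:
            break  # skips the final, empty request
    return items


# With the live client:
#     items = fetch_all_items(lambda off: sp.playlist_items(
#         pl_id, offset=off, fields='items(track(name,id)),total',
#         additional_types=['track']))

# Offline example with a fake endpoint serving 5 tracks, 2 per page
fake_tracks = [{'track': {'name': f'Song {n}', 'id': f'id{n}'}} for n in range(5)]


def fake_page(offset, limit=2):
    return {'items': fake_tracks[offset:offset + limit], 'total': len(fake_tracks)}


all_items = fetch_all_items(fake_page)
print([item['track']['name'] for item in all_items])
```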

### 1.3.1 Track List 

In [194]:
# Retrieves the track names in the playlist

# Input the playlist ID we got in the last step
pl_id = '37i9dQZF1DX3rxVfibe1L0'
offset = 0

# Creates an empty list to store the track lists
playlist_content_list = []

while True:
    playlist_content = sp.playlist_items(pl_id,
                                 offset=offset,
                                 fields='items.track.name,total',
                                 additional_types=['track'])
    
    if len(playlist_content['items']) == 0:
        break
    
    pprint(playlist_content['items'])
    
    # Appends the content to the empty list I created earlier
    playlist_content_list.append(playlist_content['items'])
    offset = offset + len(playlist_content['items'])

    # Shows the length of the playlist
    print(offset, "/", playlist_content['total'])

[{'track': {'name': 'Little Bit of Love'}},
 {'track': {'name': 'Can I Get It'}},
 {'track': {'name': 'Dancing Feet (feat. DNCE)'}},
 {'track': {'name': 'Better Days (NEIKED x Mae Muller x Polo G)'}},
 {'track': {'name': "Let's Fall in Love for the Night"}},
 {'track': {'name': 'Meet Me At Our Spot'}},
 {'track': {'name': 'Heat Waves'}},
 {'track': {'name': 'When I’m Gone (with Katy Perry)'}},
 {'track': {'name': 'Glad You Exist'}},
 {'track': {'name': 'Wild (feat. Gary Clark Jr.)'}},
 {'track': {'name': 'Butterflies'}},
 {'track': {'name': 'Lil Bit'}},
 {'track': {'name': 'Overpass Graffiti'}},
 {'track': {'name': 'You (with Marshmello & Vance Joy)'}},
 {'track': {'name': 'The Bones - with Hozier'}},
 {'track': {'name': 'You Were Loved (with OneRepublic)'}},
 {'track': {'name': 'dancing in the kitchen'}},
 {'track': {'name': 'Love Again'}},
 {'track': {'name': 'Know Your Worth'}},
 {'track': {'name': 'Blueberry Eyes (feat. SUGA of BTS)'}},
 {'track': {'name': 'Heartbreak Anthem (with 

There are 76 songs in the playlist. I now need to clean the output into a flat list that contains only the track names (each name's list index gives its position in the playlist), which is easier to use later.

In [195]:
# The current track list 
playlist_content_list

[[{'track': {'name': 'Little Bit of Love'}},
  {'track': {'name': 'Can I Get It'}},
  {'track': {'name': 'Dancing Feet (feat. DNCE)'}},
  {'track': {'name': 'Better Days (NEIKED x Mae Muller x Polo G)'}},
  {'track': {'name': "Let's Fall in Love for the Night"}},
  {'track': {'name': 'Meet Me At Our Spot'}},
  {'track': {'name': 'Heat Waves'}},
  {'track': {'name': 'When I’m Gone (with Katy Perry)'}},
  {'track': {'name': 'Glad You Exist'}},
  {'track': {'name': 'Wild (feat. Gary Clark Jr.)'}},
  {'track': {'name': 'Butterflies'}},
  {'track': {'name': 'Lil Bit'}},
  {'track': {'name': 'Overpass Graffiti'}},
  {'track': {'name': 'You (with Marshmello & Vance Joy)'}},
  {'track': {'name': 'The Bones - with Hozier'}},
  {'track': {'name': 'You Were Loved (with OneRepublic)'}},
  {'track': {'name': 'dancing in the kitchen'}},
  {'track': {'name': 'Love Again'}},
  {'track': {'name': 'Know Your Worth'}},
  {'track': {'name': 'Blueberry Eyes (feat. SUGA of BTS)'}},
  {'track': {'name': 'Hea

In [196]:
# Removes the outer list by flattening the pages into a single list of items
playlist_content_list = [item for page in playlist_content_list for item in page]
playlist_content_list

[{'track': {'name': 'Little Bit of Love'}},
 {'track': {'name': 'Can I Get It'}},
 {'track': {'name': 'Dancing Feet (feat. DNCE)'}},
 {'track': {'name': 'Better Days (NEIKED x Mae Muller x Polo G)'}},
 {'track': {'name': "Let's Fall in Love for the Night"}},
 {'track': {'name': 'Meet Me At Our Spot'}},
 {'track': {'name': 'Heat Waves'}},
 {'track': {'name': 'When I’m Gone (with Katy Perry)'}},
 {'track': {'name': 'Glad You Exist'}},
 {'track': {'name': 'Wild (feat. Gary Clark Jr.)'}},
 {'track': {'name': 'Butterflies'}},
 {'track': {'name': 'Lil Bit'}},
 {'track': {'name': 'Overpass Graffiti'}},
 {'track': {'name': 'You (with Marshmello & Vance Joy)'}},
 {'track': {'name': 'The Bones - with Hozier'}},
 {'track': {'name': 'You Were Loved (with OneRepublic)'}},
 {'track': {'name': 'dancing in the kitchen'}},
 {'track': {'name': 'Love Again'}},
 {'track': {'name': 'Know Your Worth'}},
 {'track': {'name': 'Blueberry Eyes (feat. SUGA of BTS)'}},
 {'track': {'name': 'Heartbreak Anthem (with 

In [197]:
# Extracts the track name from each {'track': {'name': ...}} item
playlist_content_list = [item['track']['name'] for item in playlist_content_list]
playlist_content_list

['Little Bit of Love',
 'Can I Get It',
 'Dancing Feet (feat. DNCE)',
 'Better Days (NEIKED x Mae Muller x Polo G)',
 "Let's Fall in Love for the Night",
 'Meet Me At Our Spot',
 'Heat Waves',
 'When I’m Gone (with Katy Perry)',
 'Glad You Exist',
 'Wild (feat. Gary Clark Jr.)',
 'Butterflies',
 'Lil Bit',
 'Overpass Graffiti',
 'You (with Marshmello & Vance Joy)',
 'The Bones - with Hozier',
 'You Were Loved (with OneRepublic)',
 'dancing in the kitchen',
 'Love Again',
 'Know Your Worth',
 'Blueberry Eyes (feat. SUGA of BTS)',
 'Heartbreak Anthem (with David Guetta & Little Mix)',
 'My Universe',
 'Wave of You',
 'seaside_demo',
 'Acapulco',
 'WHERE WE ARE',
 "Let's go to Hell",
 'Dandelions',
 'Chasing Stars (feat. James Bay)',
 'West Coast',
 'Shivers',
 'Where Are You Now',
 'Share That Love (feat. G-Eazy)',
 'Way Less Sad',
 'Make You Mine',
 'Cloudy Day',
 'Catching Feelings (feat. Six60)',
 'Big Energy',
 'Sunshine',
 'Lost',
 'A-O-K',
 'Levitating (feat. DaBaby)',
 'I AM WOMAN

We now have a list that contains only the song names.
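As an alternative to unwrapping the nested dictionaries by hand, pandas' `json_normalize` flattens them into columns in one step. A minimal sketch on a trimmed, hand-written sample in the same shape as the `playlist_items()` output (the names and IDs are those of the playlist's first two tracks).

```python
import pandas as pd

# Hand-written sample with the same nesting as playlist_items() output
items = [
    {'track': {'name': 'Little Bit of Love', 'id': '78q4ESvMkPVJzHAV11LAGE'}},
    {'track': {'name': 'Can I Get It', 'id': '6w8ZPYdnGajyfPddTWdthN'}},
]

# Flattens nested keys into dotted column names: 'track.name' and 'track.id'
tracks_df = pd.json_normalize(items)
print(tracks_df['track.name'].tolist())
```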

### 1.3.2 ID List

In [148]:
# Retrieves the IDs of the tracks in the playlist

# Input the playlist ID we got in the last step
pl_id = '37i9dQZF1DX3rxVfibe1L0'
offset = 0

# Creates an empty list to store the id lists
playlist_id_list = []

while True:
    playlist_id = sp.playlist_items(pl_id,
                                 offset=offset,
                                 fields='items.track.id,total',
                                 additional_types=['track'])
    
    if len(playlist_id['items']) == 0:
        break
    
    pprint(playlist_id['items'])
    
    # Appends the content to the empty list I created earlier
    playlist_id_list.append(playlist_id['items'])
    offset = offset + len(playlist_id['items'])

    # Shows the length of the playlist
    print(offset, "/", playlist_id['total'])

[{'track': {'id': '78q4ESvMkPVJzHAV11LAGE'}},
 {'track': {'id': '6w8ZPYdnGajyfPddTWdthN'}},
 {'track': {'id': '4RAR8g8fZNB106ezUurnE0'}},
 {'track': {'id': '6f5ExP43esnvdKPddwKXJH'}},
 {'track': {'id': '7kQkmyoHCEqwe7QwDbkSXM'}},
 {'track': {'id': '07MDkzWARZaLEdKxo6yArG'}},
 {'track': {'id': '02MWAaffLxlfxAUY7c5dvx'}},
 {'track': {'id': '5902W4uHWzhtOff1UK7the'}},
 {'track': {'id': '472vIK1ldetTxRxG3ovaiY'}},
 {'track': {'id': '4rVW6XqAsSaf5vOwc8FREW'}},
 {'track': {'id': '7eQHxigpuDJjCG50JyzU8v'}},
 {'track': {'id': '0NmuYnjETG3u3qx0OmEJev'}},
 {'track': {'id': '4btFHqumCO31GksfuBLLv3'}},
 {'track': {'id': '1GkHyypTFkUf0QQKwYoXH4'}},
 {'track': {'id': '1yTTMcUhL7rtz08Dsgb7Qb'}},
 {'track': {'id': '4W1JavoraGzh83nluQHY6C'}},
 {'track': {'id': '0ohcCrxZkBfFbkuRPOZQZX'}},
 {'track': {'id': '1imMjt1YGNebtrtTAprKV7'}},
 {'track': {'id': '0TrPqhAMoaKUFLR7iYDokf'}},
 {'track': {'id': '5dn6QANKbf76pANGjMBida'}},
 {'track': {'id': '5K6Ssv4Z3zRvxt0P6EKUAP'}},
 {'track': {'id': '3FeVmId7tL5YN8B

In [150]:
# Same process to get a clean list that only contains the track IDs

# Flattens the list of pages into a single list of items
playlist_id_list = [item for page in playlist_id_list for item in page]

# Extracts the ID from each {'track': {'id': ...}} item
playlist_id_list = [item['track']['id'] for item in playlist_id_list]
playlist_id_list

['78q4ESvMkPVJzHAV11LAGE',
 '6w8ZPYdnGajyfPddTWdthN',
 '4RAR8g8fZNB106ezUurnE0',
 '6f5ExP43esnvdKPddwKXJH',
 '7kQkmyoHCEqwe7QwDbkSXM',
 '07MDkzWARZaLEdKxo6yArG',
 '02MWAaffLxlfxAUY7c5dvx',
 '5902W4uHWzhtOff1UK7the',
 '472vIK1ldetTxRxG3ovaiY',
 '4rVW6XqAsSaf5vOwc8FREW',
 '7eQHxigpuDJjCG50JyzU8v',
 '0NmuYnjETG3u3qx0OmEJev',
 '4btFHqumCO31GksfuBLLv3',
 '1GkHyypTFkUf0QQKwYoXH4',
 '1yTTMcUhL7rtz08Dsgb7Qb',
 '4W1JavoraGzh83nluQHY6C',
 '0ohcCrxZkBfFbkuRPOZQZX',
 '1imMjt1YGNebtrtTAprKV7',
 '0TrPqhAMoaKUFLR7iYDokf',
 '5dn6QANKbf76pANGjMBida',
 '5K6Ssv4Z3zRvxt0P6EKUAP',
 '3FeVmId7tL5YN8B7R3imoM',
 '5Ne1q9Hv3l2NHBA3Agt8WT',
 '73M0rMVx5CWE8M4uATSsto',
 '3eJH2nAjvNXdmPfBkALiPZ',
 '4MTmAFWHpvB9kPMSRgLFRp',
 '38XLUjlR84JEwK0SOvX77a',
 '2eAvDnpXP5W0cVtiI0PUxV',
 '6y6xhAgZjvxy5kR5rigpY3',
 '0sBJA2OCEECMs0HsdIQhvR',
 '6bQfNiqyCX7UaQSvVVGo4I',
 '3uUuGVFu1V7jTQL60S1r8z',
 '44l9nnCVvOQBbWG6tDViKl',
 '4jbtL4tjkqghUvJknUqU1s',
 '5iFwAOB2TFkPJk8sMlxP8g',
 '0mA7zotmg2ZFMRALljdZsS',
 '02VHspkXhhH1QCInRWWIfr',
 

### 1.3.3 Artist List

We also need to know the artist of each track to enable further investigation. Given a track ID, Spotify's API returns a number of attributes of that track, including its artist information.

In [163]:
# Uses an arbitrary track as input to see an example of the output
track = sp.track('2LwH6T39A5IODRgPv9XitR')
pprint(track)

{'album': {'album_type': 'single',
           'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2ZmXexIJAD7PgABrj0qQRb'},
                        'href': 'https://api.spotify.com/v1/artists/2ZmXexIJAD7PgABrj0qQRb',
                        'id': '2ZmXexIJAD7PgABrj0qQRb',
                        'name': 'N.Flying',
                        'type': 'artist',
                        'uri': 'spotify:artist:2ZmXexIJAD7PgABrj0qQRb'}],
           'available_markets': ['AD',
                                 'AE',
                                 'AG',
                                 'AL',
                                 'AM',
                                 'AO',
                                 'AR',
                                 'AT',
                                 'AU',
                                 'AZ',
                                 'BA',
                                 'BB',
                                 'BD',
                                 'BE'

In [179]:
# Creates a list of artists of the tracks in the playlist

# Creates an empty list to store the values
playlist_artist_list = []

# Loops over the track IDs, using the field path seen in the example output
# (note: this is the first artist of the track's album)
for track_id in playlist_id_list:
    track = sp.track(track_id)
    playlist_artist_list.append(track['album']['artists'][0]['name'])

playlist_artist_list                              

['Tom Grennan',
 'Adele',
 'Kygo',
 'NEIKED',
 'FINNEAS',
 'THE ANXIETY',
 'Glass Animals',
 'Alesso',
 'Dan + Shay',
 'John Legend',
 'MAX',
 'Nelly',
 'Ed Sheeran',
 'benny blanco',
 'Maren Morris',
 'Gryffin',
 'LANY',
 'Dua Lipa',
 'Khalid',
 'MAX',
 'Galantis',
 'Coldplay',
 'Surfaces',
 'SEB',
 'Jason Derulo',
 'The Lumineers',
 'Tai Verdes',
 'Ruth B.',
 'Alesso',
 'OneRepublic',
 'Ed Sheeran',
 'Lost Frequencies',
 'Lukas Graham',
 'AJR',
 'PUBLIC',
 'Tones And I',
 'Drax Project',
 'Latto',
 'OneRepublic',
 'Maroon 5',
 'Tai Verdes',
 'Dua Lipa',
 'Emmy Meli',
 'Gryffin',
 'Camila Cabello',
 'The Weeknd',
 'Niall Horan',
 'Marshmello',
 'Lil Nas X',
 'The Kid LAROI',
 'Vance Joy',
 'Andy Grammer',
 'P!nk',
 'MisterWives',
 'The Weeknd',
 'Tones And I',
 'Justin Bieber',
 'Quinn XCII',
 'Bazzi',
 'Kane Brown',
 'Marshmello',
 'Surfaces',
 'Walker Hayes',
 'BØRNS',
 'John Legend',
 'Harry Styles',
 'BANNERS',
 'Post Malone',
 'Joel Corry',
 'Dominic Fike',
 'Charlie Puth',
 'Pea
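The loop above records the first artist of the track's *album*. A full track object also has a top-level `artists` list with the track's own credits, which can differ from the album artist on compilations or collaborations. A minimal sketch of joining all track-level credits; `credited_artists` is a hypothetical helper and the sample track object is invented and trimmed by hand.

```python
def credited_artists(track, sep=', '):
    """Joins the artist names in a track object's top-level 'artists' list."""
    return sep.join(artist['name'] for artist in track['artists'])


# Invented, trimmed track object (real responses carry many more keys)
sample_track = {
    'name': 'Example Song',
    'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}],
    'album': {'artists': [{'name': 'Artist A'}]},
}
print(credited_artists(sample_track))
```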

### 1.3.4 Other track info
The **track** method returns much more than the artist name. I will retrieve some more attributes of the tracks in the playlist to build a richer dataset.
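Each attribute cell in this part of the notebook calls `sp.track()` once per track ID, so every attribute costs 76 API requests. Spotipy's `tracks()` method fetches several tracks per request; Spotify's Get Several Tracks endpoint is documented as accepting at most 50 IDs per request, so the sketch batches in groups of 50. `chunked` and `fetch_tracks` are hypothetical helpers, and `FakeClient` is a stand-in so the sketch runs offline.

```python
def chunked(seq, size):
    """Yields consecutive slices of `seq` with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_tracks(client, track_ids, batch_size=50):
    """Fetches full track objects with one client.tracks() call per batch of IDs."""
    tracks = []
    for batch in chunked(track_ids, batch_size):
        tracks.extend(client.tracks(batch)['tracks'])
    return tracks


# With the live client:  track_objects = fetch_tracks(sp, playlist_id_list)
# Then, for example:     [t['popularity'] for t in track_objects]

# Offline example with a stand-in client that counts the requests it receives
class FakeClient:
    def __init__(self):
        self.calls = 0

    def tracks(self, ids):
        self.calls += 1
        return {'tracks': [{'id': i} for i in ids]}


fake = FakeClient()
result = fetch_tracks(fake, [f'id{n}' for n in range(76)])
print(len(result), fake.calls)  # 76 tracks from 2 requests
```

Keeping the returned track objects in memory means every attribute below can be read without further requests.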

#### Album

In [220]:
# Creates a list of album info of the tracks in the playlist

# Creates an empty list to store the values
playlist_album_list = []

# Loops over the track IDs, using the field path seen in the example output
for track_id in playlist_id_list:
    track = sp.track(track_id)
    playlist_album_list.append(track['album']['name'])

playlist_album_list  

['Little Bit of Love (Acoustic)',
 '30',
 'Dancing Feet (feat. DNCE)',
 'Better Days (NEIKED x Mae Muller x Polo G)',
 'Let’s Fall In Love For The Night',
 'THE ANXIETY',
 'Dreamland (+ Bonus Levels)',
 'When I’m Gone (with Katy Perry)',
 'Glad You Exist',
 'Bigger Love',
 'Butterflies',
 'Lil Bit',
 '=',
 'You (with Marshmello & Vance Joy)',
 'The Bones (with Hozier)',
 'You Were Loved (with OneRepublic)',
 'dancing in the kitchen',
 'Future Nostalgia',
 'Know Your Worth',
 'Blueberry Eyes (feat. SUGA of BTS)',
 'Heartbreak Anthem (with David Guetta & Little Mix)',
 'My Universe',
 'Wave of You',
 'seaside_demo',
 'Acapulco',
 'BRIGHTSIDE',
 "Let's go to Hell",
 'Safe Haven',
 'Chasing Stars (feat. James Bay)',
 'West Coast',
 'Shivers',
 'Where Are You Now',
 'Share That Love (feat. G-Eazy)',
 'Way Less Sad',
 'Make You Mine',
 'Cloudy Day',
 'Catching Feelings (feat. Six60)',
 'Big Energy',
 'Sunshine',
 'JORDI (Deluxe)',
 'A-O-K',
 'Levitating (feat. DaBaby)',
 'I AM WOMAN',
 'Safe

#### Release Date

In [221]:
# Creates a list of release dates of the tracks in the playlist

# Creates an empty list to store the values
playlist_date_list = []

# Loops over the track IDs, using the field path seen in the example output
for track_id in playlist_id_list:
    track = sp.track(track_id)
    playlist_date_list.append(track['album']['release_date'])

playlist_date_list

['2021-01-29',
 '2021-11-19',
 '2022-02-25',
 '2021-09-24',
 '2018-10-19',
 '2020-03-13',
 '2020-08-06',
 '2021-12-29',
 '2021-02-05',
 '2020-06-19',
 '2021-06-25',
 '2020-10-23',
 '2021-10-29',
 '2021-01-29',
 '2019-10-04',
 '2022-04-01',
 '2021-06-25',
 '2020-03-27',
 '2020-02-04',
 '2020-09-15',
 '2021-05-20',
 '2021-09-24',
 '2021-04-09',
 '2021-05-17',
 '2021-09-03',
 '2022-01-14',
 '2021-11-03',
 '2017-05-05',
 '2021-08-20',
 '2022-02-25',
 '2021-09-10',
 '2021-07-30',
 '2020-08-21',
 '2021-02-17',
 '2019-08-09',
 '2021-06-10',
 '2019-09-04',
 '2021-09-24',
 '2021-11-10',
 '2021-06-11',
 '2021-05-06',
 '2020-10-01',
 '2021-11-19',
 '2020-11-19',
 '2022-03-04',
 '2021-04-23',
 '2020-03-13',
 '2021-05-21',
 '2021-09-17',
 '2021-07-09',
 '2022-04-06',
 '2022-01-28',
 '2021-05-07',
 '2020-07-24',
 '2021-08-06',
 '2020-11-13',
 '2021-01-01',
 '2020-12-04',
 '2021-07-22',
 '2020-07-10',
 '2020-09-10',
 '2021-08-20',
 '2021-06-04',
 '2015-10-16',
 '2020-02-14',
 '2019-12-13',
 '2019-10-
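Spotify's album objects also carry a `release_date_precision` field (`'year'`, `'month'` or `'day'`), so a `release_date` can be as short as `'2015'` or `'2015-10'`. The values visible above are full dates, but a parser that tolerates the shorter forms is a cheap safeguard before any date analysis. A minimal sketch; `parse_release_date` is a hypothetical helper, and pinning missing parts to the first month or day is my assumption.

```python
from datetime import date


def parse_release_date(value):
    """Converts a release_date string ('YYYY', 'YYYY-MM' or 'YYYY-MM-DD') to a date.

    Missing month/day parts are pinned to 1 (an assumption, not a Spotify rule).
    """
    parts = [int(part) for part in value.split('-')]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return date(year, month, day)


print(parse_release_date('2021-01-29'))  # day precision
print(parse_release_date('2015'))        # year precision
```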

#### Popularity

In [222]:
# Creates a list of popularity count of the tracks in the playlist provided by Spotify

# Creates an empty list to store the values
playlist_popularity_list = []

# Loops over the track IDs, using the field path seen in the example output
for track_id in playlist_id_list:
    track = sp.track(track_id)
    playlist_popularity_list.append(track['popularity'])

playlist_popularity_list

[73,
 82,
 86,
 90,
 77,
 90,
 96,
 87,
 76,
 76,
 80,
 77,
 87,
 75,
 76,
 77,
 76,
 74,
 78,
 75,
 86,
 92,
 75,
 2,
 88,
 69,
 76,
 94,
 79,
 82,
 20,
 96,
 73,
 70,
 81,
 74,
 58,
 71,
 86,
 81,
 52,
 85,
 87,
 76,
 96,
 92,
 78,
 88,
 97,
 96,
 56,
 52,
 77,
 75,
 83,
 62,
 82,
 23,
 76,
 81,
 79,
 75,
 81,
 87,
 72,
 92,
 83,
 92,
 87,
 84,
 92,
 79,
 1,
 71,
 86,
 85]

#### Duration (in milliseconds)

In [224]:
# Creates a list of duration in ms of the tracks in the playlist 

# Creates an empty list to store the values
playlist_duration_list = []

# Loops over the track IDs, using the field path seen in the example output
for track_id in playlist_id_list:
    track = sp.track(track_id)
    playlist_duration_list.append(track['duration_ms'])

playlist_duration_list

[226268,
 210384,
 215203,
 160656,
 190348,
 162680,
 238805,
 161266,
 144533,
 196906,
 191250,
 195962,
 236906,
 169632,
 197298,
 221885,
 208599,
 258004,
 181436,
 172244,
 183725,
 228000,
 213842,
 132000,
 139672,
 172800,
 152181,
 233720,
 170457,
 192947,
 207853,
 148197,
 172398,
 206108,
 232906,
 185303,
 218854,
 173182,
 163854,
 172597,
 173640,
 203064,
 232813,
 205164,
 206070,
 191013,
 193089,
 154983,
 143901,
 141805,
 227240,
 193621,
 277413,
 213546,
 220196,
 178156,
 190779,
 206045,
 158083,
 191406,
 159862,
 148846,
 161853,
 218106,
 210236,
 174000,
 219801,
 193506,
 166028,
 177666,
 185680,
 210000,
 210463,
 225960,
 184104,
 163025]
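Millisecond durations are hard to read at a glance. A minimal sketch that formats them as minutes and seconds; `ms_to_min_sec` is a hypothetical helper that rounds to the nearest second.

```python
def ms_to_min_sec(duration_ms):
    """Formats a duration in milliseconds as m:ss, rounded to the nearest second."""
    minutes, seconds = divmod(round(duration_ms / 1000), 60)
    return f'{minutes}:{seconds:02d}'


print(ms_to_min_sec(226268))  # first track in the list above -> 3:46
```

For the dataframe column as a whole, dividing by 60000 gives durations in decimal minutes, which may suit numeric analysis better than a string.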

#### Is it explicit?
Most music streaming services distinguish between music suitable for mainstream consumption and songs that carry a parental advisory or may be considered explicit. On Spotify, a track with explicit content has an "E" (Explicit) label next to its name. The Spotify API exposes this as a boolean `explicit` field, either True or False.

In [226]:
# Creates a list of explicit boolean of the tracks in the playlist 

# Creates an empty list to store the values
playlist_explicit_list = []

# Loops over the track IDs, using the field path seen in the example output
for track_id in playlist_id_list:
    track = sp.track(track_id)
    playlist_explicit_list.append(track['explicit'])

playlist_explicit_list

[False,
 False,
 False,
 False,
 True,
 True,
 False,
 False,
 False,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 True,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 True,
 True,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False]

## 1.4 Dataframe

With these lists in hand, we can now create a dataframe from them.

In [228]:
# Creates a Python dictionary first
spotify_df = {'track_name': playlist_content_list, 
              'spotify_id': playlist_id_list, 
              'artist_name': playlist_artist_list, 
              'album': playlist_album_list,
              'spotify_popularity': playlist_popularity_list,
              'release_date': playlist_date_list,
              'duration': playlist_duration_list,
              'explicit_content': playlist_explicit_list
             }
 
# Creates a DataFrame from the dictionary
spotify_df = pd.DataFrame(spotify_df)
spotify_df

|  | track_name | spotify_id | artist_name | album | spotify_popularity | release_date | duration | explicit_content |
|---|---|---|---|---|---|---|---|---|
| 0 | Little Bit of Love | 78q4ESvMkPVJzHAV11LAGE | Tom Grennan | Little Bit of Love (Acoustic) | 73 | 2021-01-29 | 226268 | False |
| 1 | Can I Get It | 6w8ZPYdnGajyfPddTWdthN | Adele | 30 | 82 | 2021-11-19 | 210384 | False |
| 2 | Dancing Feet (feat. DNCE) | 4RAR8g8fZNB106ezUurnE0 | Kygo | Dancing Feet (feat. DNCE) | 86 | 2022-02-25 | 215203 | False |
| 3 | Better Days (NEIKED x Mae Muller x Polo G) | 6f5ExP43esnvdKPddwKXJH | NEIKED | Better Days (NEIKED x Mae Muller x Polo G) | 90 | 2021-09-24 | 160656 | False |
| 4 | Let's Fall in Love for the Night | 7kQkmyoHCEqwe7QwDbkSXM | FINNEAS | Let’s Fall In Love For The Night | 77 | 2018-10-19 | 190348 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71 | Mariposa | 4ja2gzrNh9VNigzoXfmbwD | Peach Tree Rascals | Mariposa | 79 | 2019-08-28 | 210000 | False |
| 72 | Put Your Records On | 1fah1uAs7HeTYDlNftKr3K | Ritt Momney | Put Your Records On | 1 | 2020-04-24 | 210463 | False |
| 73 | Message In A Bottle (Taylor's Version) (From T... | 6PdCbJwSOeovMX7kfwiAxb | Taylor Swift | Red (Taylor's Version) | 71 | 2021-11-12 | 225960 | False |
| 74 | Summer of Love (Shawn Mendes & Tainy) | 0z8hI3OPS8ADPWtoCjjLl6 | Shawn Mendes | Summer Of Love | 86 | 2021-08-20 | 184104 | False |


I now have my first dataframe, covering the tracks in the Spotify playlist **Mood Booster**: each track's name and Spotify ID plus six further attributes (artist, album, popularity, release date, duration and explicit flag).
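If the full track objects are kept in memory, the same dataframe can be built in a single pass, which guarantees that every column stays aligned track by track. A minimal sketch; `tracks_to_frame` is a hypothetical helper, and the sample track object is trimmed by hand to the fields used, with values copied from the first row of the table above.

```python
import pandas as pd


def tracks_to_frame(tracks):
    """Builds one row per Spotify track object, mirroring the columns of spotify_df."""
    rows = [{
        'track_name': t['name'],
        'spotify_id': t['id'],
        'artist_name': t['album']['artists'][0]['name'],  # same field path as section 1.3.3
        'album': t['album']['name'],
        'spotify_popularity': t['popularity'],
        'release_date': t['album']['release_date'],
        'duration': t['duration_ms'],
        'explicit_content': t['explicit'],
    } for t in tracks]
    return pd.DataFrame(rows)


# Hand-trimmed track object containing only the fields used above
sample = [{
    'name': 'Little Bit of Love', 'id': '78q4ESvMkPVJzHAV11LAGE',
    'album': {'name': 'Little Bit of Love (Acoustic)', 'release_date': '2021-01-29',
              'artists': [{'name': 'Tom Grennan'}]},
    'popularity': 73, 'duration_ms': 226268, 'explicit': False,
}]
print(tracks_to_frame(sample).iloc[0]['artist_name'])
```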

In [229]:
# Creates a CSV file for local storage of this Dataframe
# (index=False avoids an extra unnamed index column when the file is read back)
spotify_df.to_csv('spotify_df.csv', index=False)
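Reading a stored file back is a quick check that the column types survived: CSV stores everything as text, so `release_date` returns as plain strings unless `parse_dates` is passed, and writing the index adds an extra unnamed column on reload. A minimal sketch using an in-memory buffer in place of `spotify_df.csv` and a one-row stand-in frame (both illustrative).

```python
import io

import pandas as pd

# One-row stand-in for spotify_df with representative column types
df = pd.DataFrame({
    'track_name': ['Little Bit of Love'],
    'release_date': ['2021-01-29'],
    'duration': [226268],
    'explicit_content': [False],
})

buffer = io.StringIO()          # stands in for the 'spotify_df.csv' file
df.to_csv(buffer, index=False)  # no index column, so no 'Unnamed: 0' on reload
buffer.seek(0)

reloaded = pd.read_csv(buffer, parse_dates=['release_date'])
print(reloaded.dtypes)
```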

# 2. MusixMatch Data

Musixmatch is an Italian music data company with a database of 14 million lyrics in many languages (Baydeer, 2021).

## 2.1 Workspace Preparation

In [203]:
# Imports the requests library to submit the http request in Python
import requests

In [205]:
# Initialises the base url of the MusixMatch API
url = "https://api.musixmatch.com/ws/1.1/matcher.lyrics.get"

In [206]:
# Token initialisation
import os

# Reads my own API key from Musixmatch's developer website from an environment
# variable (the variable name is my own choice), so the key is not stored in the notebook
musixmatch_key = os.environ["MUSIXMATCH_API_KEY"]

## 2.2 Getting the lyrics

I now want to get the lyrics of the 76 songs in the **Mood Booster** playlist. The request needs both the track name and the artist's name as inputs. On my current free Musixmatch API plan, my account is limited to 2,000 API calls per day and can only access 30% of each song's lyrics.

#### Random Example

I will first use an arbitrary song as input to see an example of the request's output.

In [209]:
# Uses a random song - "Drive" by Halsey as the input
req = requests.get(url,params = {
    "apikey": musixmatch_key,
    "q_track": "Drive",
    "q_artist": "Halsey"
})


# Outputs in JSON
Drive = req.json()

Drive

{'message': {'header': {'status_code': 200, 'execute_time': 0.046715974807739},
  'body': {'lyrics': {'lyrics_id': 27157087,
    'explicit': 0,
    'lyrics_body': 'My hands wrapped around a stick shift\nSwerving on the 405, I can never keep my eyes off this\n\nMy neck, the feeling of your soft lips\nIlluminated in the light, bouncing off the exit signs I missed\n\nAll we do is drive\nAll we do is think about the feelings that we hide\nAll we do is sit in silence waiting for a sign\nSick and full of pride\nAll we do is drive\n...\n\n******* This Lyrics is NOT for Commercial use *******\n(1409622496242)',
    'script_tracking_url': 'https://tracking.musixmatch.com/t1.0/m_js/e_1/sn_0/l_27157087/su_0/rs_0/tr_3vUCAE7EMPi5_RZUyWb2YHSilsfzrTou2_VWwzFHkfSlfgxLPYU8P5OjC9aWLgn4MLxU-nmmsE_hhM-PDDl2dKRXV5XGwpNoZIiyurVx2lvJMa0wmokN16UHlEyr3GphvbyOPPAqsqu3G7VXYRa4zfybrIMgALcAOx0Dbts8UkJJ94S37WMo0VfVDmrrtARCCvCBLRRaPheMqSevYKbswqb-RzTlZVwL8gFp3e-Gsgmz5NcITlnmqJq8hBHJA72aeZmtN4LrZ2aYf1yrRX3PuemLbhu67r
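The lyrics sit under `message` → `body` → `lyrics` → `lyrics_body`, and the header carries a `status_code`. Indexing straight into the body fails when a song cannot be matched, so a status-aware extractor is safer. A minimal sketch: I have not verified exactly what Musixmatch returns for an unmatched song, so treating a non-200 `status_code` or a non-dictionary body as "no lyrics" is an assumption, and `extract_lyrics` is a hypothetical helper.

```python
def extract_lyrics(payload):
    """Returns the raw lyrics_body from a matcher.lyrics.get JSON payload, or None.

    Assumes a non-200 header status means no usable lyrics were returned.
    """
    message = payload['message']
    if message['header']['status_code'] != 200:
        return None
    body = message['body']
    # Guards against an empty or unexpected body (assumed shape for failed matches)
    if not isinstance(body, dict) or 'lyrics' not in body:
        return None
    return body['lyrics']['lyrics_body']


ok = {'message': {'header': {'status_code': 200},
                  'body': {'lyrics': {'lyrics_body': 'All we do is drive'}}}}
missing = {'message': {'header': {'status_code': 404}, 'body': []}}
print(extract_lyrics(ok), extract_lyrics(missing))
```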

#### Cleaning the output

I only need the lyrics themselves, so here I go through the nested dictionary and keep only the useful part.

In [211]:
Drive['message']['body']['lyrics']['lyrics_body'].strip('\t\n\r').replace('\n',' ')\
.replace('******* This Lyrics is NOT for Commercial use *******','')\
.replace('(1409622496242)','')

'My hands wrapped around a stick shift Swerving on the 405, I can never keep my eyes off this  My neck, the feeling of your soft lips Illuminated in the light, bouncing off the exit signs I missed  All we do is drive All we do is think about the feelings that we hide All we do is sit in silence waiting for a sign Sick and full of pride All we do is drive ...   '

Now I have a clean string that contains only the lyrics (the 30% excerpt available on my plan).
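The chained `replace` calls hard-code the tracking number `(1409622496242)` and leave double spaces where blank lines were. A regular expression can strip any trailing parenthesised number and collapse the whitespace. A minimal sketch; `clean_lyrics` is a hypothetical helper, and since only one tracking number appears in this output, treating any trailing `(digits)` as a tracking tag is an assumption.

```python
import re

DISCLAIMER = '******* This Lyrics is NOT for Commercial use *******'


def clean_lyrics(raw):
    """Removes the disclaimer and a trailing '(digits)' tag, then collapses whitespace."""
    text = raw.replace(DISCLAIMER, '').strip()
    # Matches a parenthesised run of digits at the very end, whatever the number is
    text = re.sub(r'\(\d+\)\s*$', '', text)
    return ' '.join(text.split())


sample_raw = ('All we do is drive\nSick and full of pride\n...\n\n'
              '******* This Lyrics is NOT for Commercial use *******\n(1409622496242)')
print(clean_lyrics(sample_raw))
```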

#### Getting lyrics for the songs in my dataframe

Now it is time to request the lyrics for the 76 songs in my dataframe.

In [217]:
# Creates a list of lyrics strings for the tracks in the playlist

# Creates an empty list to store the values
playlist_lyrics_list = []

# Loops over (track name, artist) pairs, using the field path seen in the example output
for track_name, artist_name in zip(playlist_content_list, playlist_artist_list):
    req = requests.get(url, params={
        "apikey": musixmatch_key,
        "q_track": track_name,
        "q_artist": artist_name
    }, timeout=10)
    message = req.json()['message']
    # Keeps a placeholder when no lyrics are returned, so the list stays aligned with the tracks
    if message['header']['status_code'] != 200:
        playlist_lyrics_list.append(None)
        continue
    lyrics = message['body']['lyrics']['lyrics_body'].strip('\t\n\r').replace('\n', ' ')\
        .replace('******* This Lyrics is NOT for Commercial use *******', '')\
        .replace('(1409622496242)', '')
    playlist_lyrics_list.append(lyrics)

playlist_lyrics_list

["I've been holding onto pieces Swimming in the deep end Tryna find my way back to you 'cause I'm needin' A little bit of love A little bit of love, a little bit of love  Lately I've been counting stars And I'm sorry that I broke your heart It's something that I didn't want for you But I'm stepping on broken glass And I know this is my final chance All I'm tryna do is find my path to you  I've got voices in my head And there's a deafening silence I've got voices in my head And I can't lie  I've been holding onto pieces Swimming in the deep end Tryna find my way back to you 'cause I'm needin' A little bit of love A little bit of love, I need a little love ...   ",
 "Pave me a path to follow And I'll tread any dangerous road I will beg and I'll steal, I will borrow If I can make it, if I can make your heart my home  Throw me to the water (water) I don't care how deep or shallow (water) Because my heart can pound like thunder (water) And your love, and your love can set me free (water)  O

## References

Occhino, L., 2020. How to get your music featured on Spotify playlists. [online] Bandzoogle.com. Available at: <https://bandzoogle.com/blog/how-to-get-your-music-featured-on-spotify-playlists> [Accessed 8 April 2022].

Baydeer, J., 2021. Let the Music Speak. [online] Medium. Available at: <https://medium.com/swlh/let-the-music-speak-8c524ed45809> [Accessed 9 April 2022].