<a href="https://colab.research.google.com/github/DomMc97/Running-Playlist/blob/main/spotify_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Spotify Data
A notebook that fetches my Liked Songs and their audio features from Spotify.

## Set Up

Imports libraries.

In [2]:
# data analytics library
import pandas as pd

# sklearn functions
from sklearn.preprocessing import StandardScaler

In [3]:
# spotify library
!pip install Spotipy
import spotipy 
from spotipy.oauth2 import SpotifyClientCredentials

Collecting Spotipy
  Downloading https://files.pythonhosted.org/packages/7a/cd/e7d9a35216ea5bfb9234785f3d8fa7c96d0e33999c2cb72394128f6b4cce/spotipy-2.16.1-py3-none-any.whl
Installing collected packages: Spotipy
Successfully installed Spotipy-2.16.1


Mounts Google Drive.

In [4]:
# libraries for saving/loading data
from google.colab import drive
import pickle

# mount drive
drive.mount('/content/drive')

# root of all file paths used in notebook
root = '/content/drive/My Drive/Spotify/Data/'

Mounted at /content/drive


## Load Spotify Data


Acquires access to my Spotify account through the Spotify Web API.

In [5]:
# user credentials
cid = '<My CID>'
secret = '<My Secret ID>'
username = '<My Username>'
scope = 'user-library-read'
redirect_uri = 'http://localhost:8888/callback/'

# OAuth manager which requests and caches the access token
auth_manager = spotipy.oauth2.SpotifyOAuth(
    scope=scope,
    username=username,
    client_id=cid,
    client_secret=secret,
    redirect_uri=redirect_uri,
    open_browser=False)

# acquires access to data
sp = spotipy.Spotify(auth_manager=auth_manager)
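
As an alternative to pasting credentials into the notebook, they can be read from environment variables. The sketch below is only an illustration: the helper name `load_spotify_credentials` is hypothetical, and it assumes the variables `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET` and `SPOTIPY_REDIRECT_URI` (the names Spotipy itself looks for) have been set beforehand.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return Spotify app credentials read from environment variables.

    `load_spotify_credentials` is a hypothetical helper, not part of Spotipy.
    """
    names = {
        "client_id": "SPOTIPY_CLIENT_ID",
        "client_secret": "SPOTIPY_CLIENT_SECRET",
        "redirect_uri": "SPOTIPY_REDIRECT_URI",
    }
    # fail early with a clear message if anything is missing or empty
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# demo with a plain dict standing in for the real environment
demo_env = {
    "SPOTIPY_CLIENT_ID": "<My CID>",
    "SPOTIPY_CLIENT_SECRET": "<My Secret ID>",
    "SPOTIPY_REDIRECT_URI": "http://localhost:8888/callback/",
}
credentials = load_spotify_credentials(demo_env)
print(sorted(credentials))
```

The returned dictionary uses the same keyword names as the `SpotifyOAuth` call above, so it could be unpacked into that call with `**credentials`.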

Creates a dataframe of all my liked songs and their audio features.

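**Key:**
The estimated overall key of the track, given as an integer in standard Pitch Class notation (0 = C, 1 = C♯/D♭, 2 = D, and so on). A value of -1 means no key was detected.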

**Acousticness:**
A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

**Danceability:**
Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. 

**Energy:**
Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity.

**Loudness:**
The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks.


**Speechiness:**
Speechiness detects the presence of spoken words in a track. The more speech-like the recording, the closer the value is to 1.0.

**Tempo:**
The overall estimated tempo of a track in beats per minute (BPM). 
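
Since `key` is stored as an integer, a small lookup can make it readable. This is a minimal sketch, assuming Spotify's Pitch Class encoding (0 = C up to 11 = B, with -1 for "no key detected"); the names `PITCH_CLASSES` and `key_name` are illustrative and not part of the notebook.

```python
# pitch class names indexed by Spotify's integer key (assumed encoding)
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_name(key):
    """Translate an integer key into a pitch class name."""
    if key == -1:
        return "unknown"  # no key was detected for the track
    if not 0 <= key <= 11:
        raise ValueError(f"key must be between -1 and 11, got {key}")
    return PITCH_CLASSES[key]


print(key_name(5))  # F
```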



In [None]:
offset = 0
songs = []
names = []
ids = []

# acquires all my songs, one page of 50 at a time
while True:
    content = sp.current_user_saved_tracks(limit=50, offset=offset)
    songs += content['items']
    if content['next'] is not None:
        offset += 50
    else:
        break

# gets song names and ids 
for i in songs:
    names.append(i['track']['name'])
    ids.append(i['track']['id'])

# gets all the audio features for all songs, in batches of 50 ids
index = 0
audio_features = []
while index < len(ids):
    audio_features += sp.audio_features(ids[index:index + 50])
    index += 50

# creates a list of features for each song
features_list = []
for features in audio_features:
    features_list.append([features['key'], features['acousticness'],
                          features['danceability'], features['energy'],
                          features['loudness'], features['speechiness'],
                          features['tempo']])

# creates a dataframe of my song library
songs_df = pd.DataFrame(features_list, columns=['key', 'acousticness', 
                                            'danceability', 'energy', 
                                            'loudness', 'speechiness', 
                                            'tempo'])
songs_df.insert(0, 'id', ids)
songs_df.insert(0, 'name', names)
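
The manual `index` bookkeeping in the audio-features loop can be factored into a small generator. The sketch below is one possible way to do it; `chunked` is a hypothetical helper and the ids are placeholders, so no Spotify call is made.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# placeholder ids standing in for real Spotify track ids
track_ids = [f"id{n}" for n in range(120)]
batches = list(chunked(track_ids, 50))
print([len(batch) for batch in batches])  # [50, 50, 20]
```

With such a helper the loop would read `for batch in chunked(ids, 50): audio_features += sp.audio_features(batch)`.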

## Process Data

Removes songs with no attribute data, identified here by a tempo of 0.

In [7]:
songs_df = songs_df[songs_df['tempo'] != 0]

Displays the dataframe.

In [8]:
N, M = songs_df.shape
print('Columns: ', M, '\nRows:', N)
songs_df.head()

Columns:  9 
Rows: 2602


| | name | id | key | acousticness | danceability | energy | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Sleep Walker | 7KLikopJc6LCkohWF5QnHS | 5 | 0.00497 | 0.452 | 0.611 | -6.933 | 0.0336 | 125.018 |
| 1 | Sex Hair | 0zHDpXbBpamkXVfXkTQeUT | 4 | 0.0282 | 0.495 | 0.935 | -3.957 | 0.0495 | 169.917 |
| 2 | Caught Up | 645hr7BS1p0a2tT5A4rj1b | 1 | 8.9e-05 | 0.352 | 0.787 | -7.776 | 0.0472 | 167.896 |
| 3 | Girl Like You | 5ZsuhbdQSRS4wqggUrCz9e | 1 | 0.35 | 0.629 | 0.823 | -5.744 | 0.0597 | 169.961 |
| 4 | The Difference | 4nlvKIIetOWGIMyhjQXgOZ | 1 | 0.255 | 0.56 | 0.865 | -1.148 | 0.0909 | 159.96 |


Standardises each feature to zero mean and unit variance so that all features share a common scale.


In [10]:
# scaling function
scaler = StandardScaler()

# features 
features = songs_df.iloc[:,2:]

# fit scaler
scaler.fit(features)

# transform features according to this scale
scaled_features = scaler.transform(features)

# creates a scaled df
scaled_df = pd.DataFrame(scaled_features, columns=['key', 'acousticness', 
                                                   'danceability', 'energy', 
                                                   'loudness', 'speechiness', 
                                                   'tempo'])
scaled_df.insert(0, 'id', list(songs_df['id']))
scaled_df.insert(0, 'name', list(songs_df['name']))
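
To make the standardisation concrete, the sketch below applies the same formula, z = (x - mean) / std, in plain Python to three made-up tempo values. It uses the population standard deviation, which is what `StandardScaler` uses; the function name `standardise` and the sample values are illustrative only.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # a constant feature carries no spread, so every value maps to 0
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


tempos = [120.0, 150.0, 180.0]  # made-up tempos in BPM
scaled = standardise(tempos)
print([round(z, 4) for z in scaled])  # [-1.2247, 0.0, 1.2247]
```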

Displays scaled data.

In [11]:
N, M = scaled_df.shape
print('Columns: ', M, '\nRows:', N)
scaled_df.head()

Columns:  9 
Rows: 2602


| | name | id | key | acousticness | danceability | energy | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Sleep Walker | 7KLikopJc6LCkohWF5QnHS | -0.102865 | -0.464839 | -0.438657 | -0.984089 | -0.35554 | -0.567698 | -0.130965 |
| 1 | Sex Hair | 0zHDpXbBpamkXVfXkTQeUT | -0.377382 | -0.340965 | -0.121657 | 0.977156 | 0.835729 | -0.206493 | 1.516579 |
| 2 | Caught Up | 645hr7BS1p0a2tT5A4rj1b | -1.200935 | -0.490868 | -1.175867 | 0.081278 | -0.692986 | -0.258742 | 1.442419 |
| 3 | Girl Like You | 5ZsuhbdQSRS4wqggUrCz9e | -1.200935 | 1.37503 | 0.866205 | 0.299195 | 0.120407 | 0.025224 | 1.518193 |
| 4 | The Difference | 4nlvKIIetOWGIMyhjQXgOZ | -1.200935 | 0.868443 | 0.35753 | 0.55343 | 1.960149 | 0.734004 | 1.151212 |


## Save Data

Saves the acquired data to my Google Drive.

In [12]:
# saves dataframes
songs_df.to_csv(root + 'mysongs.csv')
scaled_df.to_csv(root + 'scaledsongs.csv')

# saves scaler
with open(root + 'Scaler.sav', 'wb') as f:
    pickle.dump(scaler, f)
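
A later notebook would need to load the pickled scaler back before applying it to new songs. The sketch below shows the round trip with a stand-in dictionary and a temporary directory so that it runs without Google Drive; `load_pickle` is a hypothetical helper.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickled object from `path` (only use files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# stand-in object used in place of the fitted scaler for this demo
stand_in = {"mean_": [1.0, 2.0], "scale_": [0.5, 0.25]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Scaler.sav")
    with open(path, "wb") as f:
        pickle.dump(stand_in, f)
    restored = load_pickle(path)

print(restored == stand_in)  # True
```

In this notebook's setting the call would be `scaler = load_pickle(root + 'Scaler.sav')`, after which `scaler.transform(...)` applies the same scaling to new feature rows.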