
# Week 3.1 Audio Features and Spotify Playlists

For the second part of this week's lecture we are going to look at:

* The audio features that ``Spotify's API`` will give us about **any track in its collection**


* How these can be used to make recommendations based on song similarity


First we need to install the ``spotipy`` library. The Spotify API is just a ``REST API`` like ones we may have seen before, so we could access it ourselves using the ``requests`` library, but ``spotipy`` wraps up some of the details (such as authentication) nicely for us!
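To make the "just a REST API" point concrete, here is a rough sketch of the two HTTP requests involved, built with Python's standard `urllib` instead of `requests`. It assumes the Client Credentials flow (a POST to `https://accounts.spotify.com/api/token` with HTTP Basic auth) and the `GET /v1/audio-features?ids=...` endpoint. The helper names are hypothetical, this is not how `spotipy` is implemented internally, and the requests are only constructed here, never sent.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"
API_BASE = "https://api.spotify.com/v1"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Hypothetical helper: Client Credentials token request (built, not sent)."""
    # HTTP Basic auth header: base64("client_id:client_secret")
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def build_audio_features_request(token: str, track_ids: list) -> urllib.request.Request:
    """Hypothetical helper: audio-features request for several track IDs."""
    # Track IDs go in a single comma-separated query parameter
    query = urllib.parse.urlencode({"ids": ",".join(track_ids)})
    return urllib.request.Request(
        f"{API_BASE}/audio-features?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


token_req = build_token_request("my-client-id", "my-client-secret")
print(token_req.get_method(), token_req.full_url)
features_req = build_audio_features_request("ACCESS_TOKEN", ["id1", "id2"])
print(features_req.get_method(), features_req.full_url)
```

Sending the first request (for example with `urllib.request.urlopen(token_req)`) would return JSON containing an access token, which then goes into the `Bearer` header of the second. Keeping track of that token is one of the chores `spotipy` handles for us.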

In [None]:
## Install library
!pip install spotipy

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
##These relate to an App on my developer account at Spotify and are used for authentication
SPOTIPY_CLIENT_ID='4547ac9120de4cbba75a19ee6dc3b908'
SPOTIPY_CLIENT_SECRET='75fa035f5a0b47b7868e69a93ef878b4'

In [None]:
##First we authenticate
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth_manager = SpotifyClientCredentials(SPOTIPY_CLIENT_ID,SPOTIPY_CLIENT_SECRET)
sp = spotipy.Spotify(auth_manager=auth_manager)

### Pick a playlist

Now we need to pick a playlist to get some audio features for.

To get the ``id``, sign into the [Spotify Web Player](https://open.spotify.com/) in your browser. Once you have chosen a playlist (one of yours or someone else's), the ``id`` is the string of letters and numbers at the end of the ``URL``, after ``/playlist/`` and before any ``?`` that follows it.
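If you paste in the whole link rather than just the ``id``, a small helper can pull the ``id`` out for you. This is a minimal sketch assuming links of the form `https://open.spotify.com/playlist/<id>?si=...`; the function name is hypothetical.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url: str) -> str:
    """Hypothetical helper: extract the playlist ID from a Spotify Web Player link."""
    # urlparse separates the path from any "?si=..." query string
    path_parts = [part for part in urlparse(url).path.split("/") if part]
    # Expect the path to end in ".../playlist/<id>"
    if len(path_parts) < 2 or path_parts[-2] != "playlist":
        raise ValueError(f"Not a playlist URL: {url}")
    return path_parts[-1]


print(playlist_id_from_url("https://open.spotify.com/playlist/3vRMCRWMVXLidpsrTJrCpe?si=abc123"))
```

The result can then be assigned to ``playlist_id`` in the next cell.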

In [None]:
#Put in the ID of a Spotify playlist of your choosing
playlist_id = '3vRMCRWMVXLidpsrTJrCpe'

In [None]:
limit = 50   #Number of tracks to keep
trim = True  #If True, keep only the last `limit` tracks of the playlist

In [None]:
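#Get playlist
playlist = sp.playlist(playlist_id)
#Extract tracks (this response holds only the first page of the playlist, up to 100 items)
tracks = playlist["tracks"]["items"]
#Skip entries with no usable track (e.g. unavailable tracks or local files, which have no ID)
tracks = [track for track in tracks if track["track"] is not None and track["track"]["id"] is not None]
if trim:
    tracks = tracks[-limit:]
#Extract track IDs
ids = [track["track"]["id"] for track in tracks]
#Get audio features for tracks
features = pd.DataFrame(sp.audio_features(ids))
#Label dataframe rows with "artist - track name"
labels = [track["track"]["artists"][0]["name"] + " - " + track["track"]["name"] for track in tracks]
features.index = labels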
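The cell above only sees the tracks included in the `sp.playlist` response, which is a single page of results. For longer playlists, a sketch that follows the paging `next` links using `spotipy`'s `playlist_items` and `next` methods might look like this (the helper name `all_playlist_items` is hypothetical):

```python
def all_playlist_items(sp, playlist_id):
    """Hypothetical helper: collect every item of a playlist by following the paging links."""
    page = sp.playlist_items(playlist_id)
    items = list(page["items"])
    # page["next"] is None once we reach the last page
    while page["next"]:
        page = sp.next(page)
        items.extend(page["items"])
    return items
```

It could replace `playlist["tracks"]["items"]` above, e.g. `tracks = all_playlist_items(sp, playlist_id)`, with the rest of the cell unchanged.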

In [None]:
features

### Audio Features

What audio features do we get? Each one is described in the [Spotify API Documentation](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features).

Some relate to standard musical properties such as ``key``, ``mode``, ``tempo`` and ``time_signature``.

Some are more related to the signal of the audio file itself (``loudness``, ``duration_ms``).

Some are more descriptive (``danceability``, ``liveness``, etc.). As these are defined by Spotify itself, how subjective do you think they might be?
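The musical features arrive as numbers. Assuming the encoding given in Spotify's documentation (``key`` is a pitch class from 0 = C to 11 = B, or -1 if no key was detected; ``mode`` is 1 for major and 0 for minor), a hypothetical helper can turn them back into readable key names:

```python
# Pitch classes in the order assumed for Spotify's `key` field (0 = C, 1 = C#/Db, ..., 11 = B)
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_name(key: int, mode: int) -> str:
    """Hypothetical helper: turn Spotify's numeric key/mode into a readable name."""
    if key == -1:  # -1 means no key was detected
        return "no key detected"
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


print(key_name(0, 1))  # C major
print(key_name(9, 0))  # A minor
```

For the playlist, something like `[key_name(k, m) for k, m in zip(features["key"], features["mode"])]` lists the key of every track. It also shows that ``key`` is really a category: B (11) and C (0) are neighbouring pitch classes but far apart as numbers, which is worth remembering when we measure similarity later.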

In [None]:
#Check out features
features.columns

In [None]:
#Keep only the audio feature columns (drops metadata such as id, uri and track_href)
my_features = ['danceability', 'energy', 'key',
            'loudness', 'mode', 'speechiness',
            'acousticness', 'instrumentalness',
            'liveness', 'valence', 'tempo', 'duration_ms',
            'time_signature']
features = features[my_features]
features

### Task 1 - Sort by Audio Features

In order to get a feel for features and what they represent, we are now going to use them to sort the playlist.

Try out some different features.

* What do low and high values for each of the features sound like?


* Given your knowledge of the songs, does this feel correct?


* If there is a feature you are not sure what it measures, are you able to get a better understanding by looking at the sorted tracks?


* Does this sorted order of the playlist flow well?


In [None]:
sort_by = 'liveness'
features.sort_values(by = sort_by, ascending = False)[:10]
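To hear what low and high values sound like, it can help to look at both ends of the ranking at once. This is a hypothetical helper built on pandas' `nsmallest` and `nlargest`; the toy values below are made up purely to show the output shape.

```python
import pandas as pd


def feature_extremes(df: pd.DataFrame, feature: str, k: int = 5) -> pd.DataFrame:
    """Hypothetical helper: the k lowest- and k highest-scoring tracks for one feature."""
    low = df.nsmallest(k, feature)[[feature]].assign(end="low")
    high = df.nlargest(k, feature)[[feature]].assign(end="high")
    return pd.concat([low, high])


# Made-up values, only to show the shape of the output
toy = pd.DataFrame(
    {"liveness": [0.10, 0.80, 0.30, 0.05]},
    index=["Artist A - Song 1", "Artist B - Song 2", "Artist C - Song 3", "Artist D - Song 4"],
)
print(feature_extremes(toy, "liveness", k=1))
```

On the real data this would be `feature_extremes(features, sort_by)`.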

### Task 2 - Song Similarity

Now let's investigate which songs in the playlist are similar to each other, based on a given set of features.

Change the features in ``my_subset_features`` and re-run the code each time.

* Do you think songs that have **high cosine similarity** (e.g. close to 1) are similar songs?


* Do some subsets of features seem to give better results in this regard?

In [None]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity as cosine
from sklearn.preprocessing import StandardScaler

In [None]:
my_subset_features = ['danceability', 'energy', 'key',
            'loudness', 'mode', 'speechiness',
            'acousticness', 'instrumentalness',
            'liveness', 'valence', 'tempo', 'duration_ms',
            'time_signature']

In [None]:
#Get subset
subset_features = features[my_subset_features]
#Standardise
scaled_features = StandardScaler().fit_transform(subset_features)
#Get pairwise cosine similarities
similarities = cosine(scaled_features)
#Visualise
similarities = pd.DataFrame(similarities, columns = features.index, index = features.index)
similarities.style.background_gradient(cmap='Greens')
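The cell above relies on two scikit-learn steps. `StandardScaler` rescales each feature to mean 0 and standard deviation 1, so features on large scales (``duration_ms`` in the hundreds of thousands, ``tempo`` in beats per minute) do not swamp features on a 0 to 1 scale. `cosine_similarity` then compares two tracks' vectors `a` and `b` by the cosine of the angle between them, `(a · b) / (|a| |b|)`: 1 when they point the same way, 0 when orthogonal, -1 when opposite. Here is a NumPy-only sketch of the same two steps, intended to mirror scikit-learn's behaviour (population standard deviation, constant columns left at 0) rather than reproduce its code:

```python
import numpy as np


def standardise(X: np.ndarray) -> np.ndarray:
    """Column-wise z-scores using the population std (ddof=0)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns (e.g. every track in 4/4) become all zeros
    return (X - mean) / std


def cosine_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # an all-zero row gets similarity 0 with everything
    unit = X / norms
    return unit @ unit.T


# Made-up rows: [danceability, tempo, duration_ms]
X = np.array([[0.8, 120.0, 200000.0],
              [0.7, 125.0, 210000.0],
              [0.2, 80.0, 400000.0]])
sims = cosine_matrix(standardise(X))
print(np.round(sims, 2))
```

Without the standardising step, ``duration_ms`` would dominate every dot product and the other features would barely matter.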

### Task 3 - Top N Recommendations

Now we're going to use the cosine similarities to make some recommendations!

Pick a track (as ``artist - title``, matching the row labels) and see which ``n`` songs are closest to it.

* Are these good recommendations based on the seed track?


* As you increase ``n``, at what point do the recommendations start to become irrelevant?


* Again, try different subsets of features (you will need to go back to Task 2 and run the code again)

In [None]:
track = "DJ Paypal - We Finally Made It"

In [None]:
#How many tracks to recommend
n = 10

In [None]:
#Skip position 0, which should be the seed track itself (similarity 1)
similarities.sort_values(by = track, ascending=False)[track].index[1:n+1]
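The cell above assumes the seed track sorts to position 0. If another track ties with it at similarity 1 (for example, the same song appearing twice), that assumption can fail. A slightly sturdier sketch drops the seed by label and keeps the scores, which also helps with judging where recommendations become irrelevant. The helper is hypothetical, assumes the row labels are unique, and uses a made-up matrix for illustration.

```python
import pandas as pd


def top_n(similarities: pd.DataFrame, seed: str, n: int = 10) -> pd.Series:
    """Hypothetical helper: the n tracks most similar to `seed`, with their scores."""
    # Drop the seed by label instead of assuming it sorts to position 0
    scores = similarities[seed].drop(index=seed)
    return scores.sort_values(ascending=False).head(n)


# Made-up 3 x 3 similarity matrix, just to show the call
labels = ["A - x", "B - y", "C - z"]
toy_sims = pd.DataFrame([[1.0, 0.2, 0.9],
                         [0.2, 1.0, -0.4],
                         [0.9, -0.4, 1.0]], index=labels, columns=labels)
print(top_n(toy_sims, "A - x", n=2))
```

On the notebook's data the call would be `top_n(similarities, track, n)`.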

### Task 4 - PCA Visualisation

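In the plot below, each track is placed according to its first two principal components. Do songs that sit near each other sound similar? Given one song, would the nearby songs make good recommendations?

If there are too many songs to read the labels properly, reduce the number of tracks (lower ``limit`` in the playlist cells) and re-run the code from there through Task 2.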

In [None]:
num_dimensions = 2

#Reduce dimensions
pca = PCA(n_components=num_dimensions)
reduced_features = pca.fit_transform(scaled_features)
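Squashing all the features in ``my_subset_features`` into two dimensions throws information away, so distances in the plot only approximate the similarities from Task 2. The fitted `pca` object reports the share of variance each component keeps in its `explained_variance_ratio_` attribute. The sketch below shows the underlying computation with NumPy's SVD instead of scikit-learn; it should agree with `PCA` up to the sign of each component. Random data stands in for ``scaled_features``.

```python
import numpy as np


def pca_svd(X: np.ndarray, k: int = 2):
    """Project rows of X onto the top-k principal components via SVD.

    Returns (scores, explained_variance_ratio).
    """
    Xc = X - X.mean(axis=0)                      # PCA works on centred data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                       # coordinates along the top-k components
    variance = S ** 2                            # proportional to the variance per component
    return scores, variance[:k] / variance.sum()


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))                # random stand-in for scaled_features
scores, ratio = pca_svd(X_demo, k=2)
print(scores.shape, ratio, ratio.sum())
```

In the notebook itself, `print(pca.explained_variance_ratio_.sum())` after the cell above gives the fraction of the standardised variation that the 2-D plot preserves; the lower it is, the less the distances in the plot can be trusted.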

In [None]:
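#Plot (a single figure; a separate plt.figure() call would only create an extra empty one)
fig, ax = plt.subplots(figsize=(15, 15))
x = reduced_features[:, 0]
y = reduced_features[:, 1]
ax.plot(x, y, "bx")
#Label each point with its "artist - title"
annotations = subset_features.index
for i, label in enumerate(annotations):
    ax.annotate(label, (x[i], y[i]), alpha=1)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
plt.show()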