# Valence-Arousal Recommendation System

In this notebook, we use a dataset of 1,239 Spotify tracks, each with a `valence` and an `energy` value (with `energy` serving as the arousal dimension), to build a simple mood-based recommendation system.

In [4]:
import pandas as pd
import random
import authorization
import numpy as np
from numpy.linalg import norm

## 1. Preparations

__Load Data__

In [17]:
df = pd.read_csv("valence_arousal_dataset.csv")
print(df.shape)
df.head()

(1239, 6)


| | id | genre | track_name | artist_name | valence | energy |
|---|---|---|---|---|---|---|
| 0 | 3ohNs5SYgwCjnxl8QSjqvR | acoustic | Only Girl (In The World) (feat. Alex Goot) | Boyce Avenue | 0.467 | 0.56 |
| 1 | 2Gb3up6s243JSVuRRjwQoF | acoustic | Nothing On You / Hey Soul Sister feat. Sam Tsui | Ahmir | 0.369 | 0.698 |
| 2 | 5Ukzlujip1Slqka5OY82YS | acoustic | U.N.I. | Ed Sheeran | 0.578 | 0.405 |
| 3 | 1nuqzCMgj2lxZCmpdCmIGv | acoustic | A Lack of Color | Various Artists | 0.357 | 0.337 |
| 4 | 12jjuxN1gxlm29cqL5M6MW | acoustic | I Got You | Jack Johnson | 0.544 | 0.399 |


To compute distances between two tracks, we combine the separate `valence` and `energy` columns into a single `mood_vec` column.
Selecting both columns and calling `.values.tolist()` yields one `[valence, energy]` list per track.

__Create Mood Vector__

In [18]:
df["mood_vec"] = df[["valence", "energy"]].values.tolist()
df["mood_vec"].head()

0     [0.467, 0.56]
1    [0.369, 0.698]
2    [0.578, 0.405]
3    [0.357, 0.337]
4    [0.544, 0.399]
Name: mood_vec, dtype: object
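
Storing one Python list per row is convenient for display, but distance computations are often simpler on a single 2-D array. The following is a minimal sketch of that alternative, assuming a small hypothetical DataFrame `toy_df` (not part of the original notebook) whose values are copied from the first three rows shown above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (first three rows of the preview)
toy_df = pd.DataFrame({
    "valence": [0.467, 0.369, 0.578],
    "energy": [0.560, 0.698, 0.405],
})

# One row per track; the columns are (valence, energy)
mood_matrix = toy_df[["valence", "energy"]].to_numpy()
print(mood_matrix.shape)

# The list-per-row column used in this notebook holds the same numbers
toy_df["mood_vec"] = mood_matrix.tolist()
print(toy_df["mood_vec"])
```

Both representations carry the same information; the list column is kept in the rest of this notebook.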

__Authorize Spotify API Access__

In [19]:
sp = authorization.authorize() 

## 2. Recommendation Algorithm

The algorithm that finds tracks similar to a given input track is simple:
1. Fetch the track's `valence` and `energy` values from the Spotify API.
2. Compute the distance from the input track to each track in the reference dataset.
3. Sort the reference tracks from lowest to highest distance.
4. Return the `n` most similar tracks.
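
Step 2 does not name a distance measure. The implementation in this notebook applies `numpy.linalg.norm` to the difference of two mood vectors, which is the Euclidean distance. For two tracks with mood vectors $(v_1, e_1)$ and $(v_2, e_2)$:

$$d = \sqrt{(v_1 - v_2)^2 + (e_1 - e_2)^2}$$

As a worked example, the first two tracks in the dataset preview, $(0.467, 0.560)$ and $(0.369, 0.698)$, give

$$d = \sqrt{0.098^2 + (-0.138)^2} = \sqrt{0.009604 + 0.019044} = \sqrt{0.028648} \approx 0.169$$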

In [21]:
def recommend(track_id, ref_df, sp, n_recs = 5):
    
    # Fetch valence and energy (arousal) of the given track from the Spotify API
    track_features = sp.track_audio_features(track_id)
    track_moodvec = np.array([track_features.valence, track_features.energy])
    print(f"mood_vec for {track_id}: {track_moodvec}")
    
    # Compute distances to all reference tracks
    ref_df["distances"] = ref_df["mood_vec"].apply(lambda x: norm(track_moodvec-np.array(x)))
    # Sort distances from lowest to highest
    ref_df_sorted = ref_df.sort_values(by = "distances", ascending = True)
    # If the input track is in the reference set, it has a distance of 0 but should not be recommended
    ref_df_sorted = ref_df_sorted[ref_df_sorted["id"] != track_id]
    
    # Return n recommendations
    return ref_df_sorted.iloc[:n_recs]
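
The function above needs Spotify API access and writes a `distances` column into the DataFrame it is given. For experimenting without credentials, the ranking step can be sketched on its own. This is only an illustrative variant, not the notebook's method: the name `recommend_offline`, its parameters, and the toy data are assumptions, and the query vector is passed in directly instead of being fetched.

```python
import numpy as np
import pandas as pd


def recommend_offline(query_vec, ref_df, n_recs=5, exclude_id=None):
    """Return the n_recs reference tracks closest to query_vec (Euclidean distance)."""
    # (n_tracks, 2) matrix of [valence, energy]
    ref_matrix = ref_df[["valence", "energy"]].to_numpy(dtype=float)
    # Row-wise Euclidean distance to the query vector
    distances = np.linalg.norm(ref_matrix - np.asarray(query_vec, dtype=float), axis=1)
    # assign() returns a new DataFrame, so the caller's DataFrame is left untouched
    ranked = ref_df.assign(distances=distances).sort_values("distances")
    # Optionally drop the query track itself from the results
    if exclude_id is not None:
        ranked = ranked[ranked["id"] != exclude_id]
    return ranked.head(n_recs)


# Hypothetical toy reference set (ids are placeholders, not Spotify ids)
toy_df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "valence": [0.20, 0.19, 0.65, 0.22],
    "energy": [0.53, 0.52, 0.90, 0.50],
})

recs = recommend_offline([0.20, 0.53], toy_df, n_recs=2, exclude_id="a")
print(recs[["id", "distances"]])
```

Computing all distances in one `np.linalg.norm(..., axis=1)` call avoids the per-row `apply`, and returning a copy keeps repeated calls from overwriting each other's `distances` column.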

Let us try it out, first with a randomly chosen track from our dataset and then with two hand-picked track IDs.

In [22]:
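# Convert to a plain list so random.choice does not depend on the DataFrame's index labels
track1 = random.choice(df["id"].tolist())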
recommend(track_id = track1, ref_df = df, sp = sp, n_recs = 5)

mood_vec for 3DkaXDmC0qaFNAxgLMuwX4: [0.2   0.526]


| | id | genre | track_name | artist_name | valence | energy | mood_vec | distances |
|---|---|---|---|---|---|---|---|---|
| 546 | 0ew8oK4Yj8IhqE5Q7vpRvW | house | Eyes - R3hab Remix | Kaskade | 0.191 | 0.525 | [0.191, 0.525] | 0.009055 |
| 447 | 2zDADV891HD9xHi6hAnjQ9 | groove | All My Life | K-Ci & JoJo | 0.192 | 0.514 | [0.192, 0.514] | 0.014422 |
| 1016 | 2RVHIJWVyGR499tMMQyffm | romance | The Story | Brandi Carlile | 0.217 | 0.504 | [0.217, 0.504] | 0.027803 |
| 561 | 7wz9sFeyf7bcRHK0xoNFKp | idm | Ants | edIT | 0.168 | 0.524 | [0.168, 0.524] | 0.032062 |
| 58 | 7MXXUCXsi4RlCVRBSvUM4u | anime | 傷跡 | Kalafina | 0.2 | 0.564 | [0.2, 0.564] | 0.038 |


In [24]:
all_my_life = "2zDADV891HD9xHi6hAnjQ9"
recommend(track_id = all_my_life, ref_df = df, sp = sp, n_recs = 5)

mood_vec for 2zDADV891HD9xHi6hAnjQ9: [0.192 0.514]


| | id | genre | track_name | artist_name | valence | energy | mood_vec | distances |
|---|---|---|---|---|---|---|---|---|
| 546 | 0ew8oK4Yj8IhqE5Q7vpRvW | house | Eyes - R3hab Remix | Kaskade | 0.191 | 0.525 | [0.191, 0.525] | 0.011045 |
| 1108 | 3DkaXDmC0qaFNAxgLMuwX4 | soul | Dontchange | Musiq Soulchild | 0.2 | 0.526 | [0.2, 0.526] | 0.014422 |
| 561 | 7wz9sFeyf7bcRHK0xoNFKp | idm | Ants | edIT | 0.168 | 0.524 | [0.168, 0.524] | 0.026 |
| 1016 | 2RVHIJWVyGR499tMMQyffm | romance | The Story | Brandi Carlile | 0.217 | 0.504 | [0.217, 0.504] | 0.026926 |
| 1191 | 1KiWN1bwgN14bzjTKSO3T2 | techno | Stampede - Original Mix | Dimitri Vegas & Like Mike | 0.158 | 0.52 | [0.158, 0.52] | 0.034525 |


In [25]:
vitamin = "7JEQSlU7K7RC12y3gubFq7"
recommend(track_id = vitamin, ref_df = df, sp = sp, n_recs = 5)

mood_vec for 7JEQSlU7K7RC12y3gubFq7: [0.651 0.898]


| | id | genre | track_name | artist_name | valence | energy | mood_vec | distances |
|---|---|---|---|---|---|---|---|---|
| 1223 | 7esvnbCJ4v9v6zvCDEJm0v | work-out | Reload (feat. Chip) - Radio Edit [Radio Edit] | Wiley | 0.643 | 0.906 | [0.643, 0.906] | 0.011314 |
| 1001 | 3p77WVkXeHl6s9DBaAtUjZ | rockabilly | Long Blonde Hair | The Top Cats | 0.645 | 0.912 | [0.645, 0.912] | 0.015232 |
| 920 | 3fttmSWGThBQTNkuHMoCTN | punk | Girl's Not Grey | AFI | 0.672 | 0.905 | [0.672, 0.905] | 0.022136 |
| 158 | 1i5ZMT3Qoe35MgKpoJYbm7 | children | Time for Your Check Up | Various Artists | 0.632 | 0.911 | [0.632, 0.911] | 0.023022 |
| 1148 | 5tXyNhNcsnn7HbcABntOSf | summer | DJ Got Us Fallin' In Love | Usher | 0.669 | 0.878 | [0.669, 0.878] | 0.026907 |
