# Project: Music Recommendation System using Spotify.

## Team:
    
#### 1. Devdutt Sharma
#### 2. Taylor Furry
#### 3. Yi Chin Tzou
#### 4. Scott Matsubara
#### 5. Sanket Mayekar

-----------------------------------

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import seaborn as sns
from sklearn.preprocessing import MultiLabelBinarizer

pd.set_option('display.max_columns', None)


#ignore warning messages 
import warnings
warnings.filterwarnings('ignore')

# Step 3: Data Pre-processing

In [2]:
spotify_new = pd.read_csv('spotify_data.csv')
spotify_new.head()

Unnamed: 0,track_id,artists,album_name,track_name,popularity,duration_ms,explicit,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature,track_genre
0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,73,230666,0,0.676,0.461,1,-6.746,0,0.143,0.0322,1e-06,0.358,0.715,87.917,4,acoustic
1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,55,149610,0,0.42,0.166,1,-17.235,1,0.0763,0.924,6e-06,0.101,0.267,77.489,4,acoustic
2,1iJBSr7s7jYXzM8EGcbK5b,Ingrid Michaelson;ZAYN,To Begin Again,To Begin Again,57,210826,0,0.438,0.359,0,-9.734,1,0.0557,0.21,0.0,0.117,0.12,76.332,4,acoustic
3,6lfxq3CG4xtTiEg7opyCyx,Kina Grannis,Crazy Rich Asians (Original Motion Picture Sou...,Can't Help Falling In Love,71,201933,0,0.266,0.0596,0,-18.515,1,0.0363,0.905,7.1e-05,0.132,0.143,181.74,3,acoustic
4,5vjLSffimiIP26QG5WcN2K,Chord Overstreet,Hold On,Hold On,82,198853,0,0.618,0.443,2,-9.681,1,0.0526,0.469,0.0,0.0829,0.167,119.949,4,acoustic


In [3]:
spotify = spotify_new

### Remove unnecessary columns

In [4]:
spotify = spotify.drop(['duration_ms', 'explicit', 'mode', 'time_signature'], axis=1)

### Genres combined

In [5]:
spotify['Genre_Combined'] = spotify.groupby(['track_id'])['track_genre'].transform(lambda x: ','.join(x))
spotify['Genre_Combined'] = spotify['Genre_Combined'].str.split(',')
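
The same track can appear once per genre in the raw data, so the cell above gathers all genres of a `track_id` into one list before the duplicate rows are dropped. A minimal sketch of that idea on made-up data follows; the toy frame and the variable names `toy`, `genres_per_track`, and `combined` are illustrative assumptions, and this variant also removes repeated genres, which the cell above does not do.

```python
import pandas as pd

# Toy frame: track "t1" appears under two genres, "t2" under one (made-up data).
toy = pd.DataFrame({
    "track_id": ["t1", "t1", "t2"],
    "track_genre": ["acoustic", "chill", "blues"],
})

# Collect each track's genres into a de-duplicated list, keeping first-seen order.
genres_per_track = toy.groupby("track_id")["track_genre"].agg(
    lambda g: list(dict.fromkeys(g))
)

# Keep one row per track and attach the combined list.
combined = toy.drop_duplicates(subset=["track_id"]).copy()
combined["Genre_Combined"] = combined["track_id"].map(genres_per_track)
print(combined)
```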

In [6]:
spotify = spotify.drop_duplicates(subset=['track_id'])

In [7]:
spotify = spotify.drop_duplicates(subset=['track_name'])

In [8]:
spotify.head()

Unnamed: 0,track_id,artists,album_name,track_name,popularity,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_genre,Genre_Combined
0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,73,0.676,0.461,1,-6.746,0.143,0.0322,1e-06,0.358,0.715,87.917,acoustic,"[acoustic, j-pop, singer-songwriter, songwriter]"
1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,55,0.42,0.166,1,-17.235,0.0763,0.924,6e-06,0.101,0.267,77.489,acoustic,"[acoustic, chill]"
2,1iJBSr7s7jYXzM8EGcbK5b,Ingrid Michaelson;ZAYN,To Begin Again,To Begin Again,57,0.438,0.359,0,-9.734,0.0557,0.21,0.0,0.117,0.12,76.332,acoustic,[acoustic]
3,6lfxq3CG4xtTiEg7opyCyx,Kina Grannis,Crazy Rich Asians (Original Motion Picture Sou...,Can't Help Falling In Love,71,0.266,0.0596,0,-18.515,0.0363,0.905,7.1e-05,0.132,0.143,181.74,acoustic,[acoustic]
4,5vjLSffimiIP26QG5WcN2K,Chord Overstreet,Hold On,Hold On,82,0.618,0.443,2,-9.681,0.0526,0.469,0.0,0.0829,0.167,119.949,acoustic,[acoustic]


In [9]:
spotify.shape

(73608, 17)

In [10]:
spotify.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 73608 entries, 0 to 113998
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   track_id          73608 non-null  object 
 1   artists           73608 non-null  object 
 2   album_name        73608 non-null  object 
 3   track_name        73608 non-null  object 
 4   popularity        73608 non-null  int64  
 5   danceability      73608 non-null  float64
 6   energy            73608 non-null  float64
 7   key               73608 non-null  int64  
 8   loudness          73608 non-null  float64
 9   speechiness       73608 non-null  float64
 10  acousticness      73608 non-null  float64
 11  instrumentalness  73608 non-null  float64
 12  liveness          73608 non-null  float64
 13  valence           73608 non-null  float64
 14  tempo             73608 non-null  float64
 15  track_genre       73608 non-null  object 
 16  Genre_Combined    73608 non-null  object 

### Scaling

In [11]:
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
spotify_num = spotify.select_dtypes(include=numerics)
spotify_num.columns

Index(['popularity', 'danceability', 'energy', 'key', 'loudness',
       'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo'],
      dtype='object')

####  Scale numerical features

In [12]:
from sklearn.preprocessing import MinMaxScaler

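# create a scaler object
scaler = MinMaxScaler()
# fit and transform the data; assign the resulting array directly so values stay in row order
# (wrapping it in a new DataFrame resets the index, which misaligns rows after drop_duplicates)
spotify[spotify_num.columns] = scaler.fit_transform(spotify_num)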
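
Min-max scaling maps each column to the range [0, 1] using x' = (x - min) / (max - min). The sketch below checks that formula against `MinMaxScaler` on made-up popularity values; the array `x` is an assumption for illustration, not data from the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up popularity values spanning 0..100.
x = np.array([[0.0], [55.0], [73.0], [100.0]])

# Library result versus the formula written out by hand.
scaled = MinMaxScaler().fit_transform(x)
manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

print(scaled.ravel())
```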

In [13]:
spotify.head()

Unnamed: 0,track_id,artists,album_name,track_name,popularity,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_genre,Genre_Combined
0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,0.73,0.686294,0.461,0.090909,0.791392,0.148187,0.032329,1e-06,0.358,0.718593,0.361245,acoustic,"[acoustic, j-pop, singer-songwriter, songwriter]"
1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,0.55,0.426396,0.166,0.090909,0.597377,0.079067,0.927711,6e-06,0.101,0.268342,0.318397,acoustic,"[acoustic, chill]"
2,1iJBSr7s7jYXzM8EGcbK5b,Ingrid Michaelson;ZAYN,To Begin Again,To Begin Again,0.57,0.44467,0.359,0.0,0.736123,0.05772,0.210843,0.0,0.117,0.120603,0.313643,acoustic,[acoustic]
3,6lfxq3CG4xtTiEg7opyCyx,Kina Grannis,Crazy Rich Asians (Original Motion Picture Sou...,Can't Help Falling In Love,0.71,0.270051,0.0596,0.0,0.573701,0.037617,0.908635,7.1e-05,0.132,0.143719,0.746758,acoustic,[acoustic]
4,5vjLSffimiIP26QG5WcN2K,Chord Overstreet,Hold On,Hold On,0.82,0.627411,0.443,0.181818,0.737103,0.054508,0.470884,0.0,0.0829,0.167839,0.492863,acoustic,[acoustic]


In [14]:
spotify.shape

(73608, 17)

In [15]:
scaled_spotify = spotify

------------------------

### Saving the output to csv

In [16]:
scaled_spotify.to_csv('scaled_spotify_data_model.csv', index=False)

------------------

## One Hot Encoding

In [17]:
spotify_ohe = pd.read_csv('scaled_spotify_data_model.csv')

In [18]:
mlb = MultiLabelBinarizer(sparse_output=True)
spotify_ohe = spotify_ohe.join(pd.DataFrame.sparse.from_spmatrix(mlb.fit_transform(spotify.pop('Genre_Combined')),
                                                           index=spotify.index,
                                                           columns=mlb.classes_))

In [19]:
spotify_ohe = spotify_ohe.drop(columns = ['track_genre'])
spotify_ohe.dropna(subset=['acoustic'], inplace = True)

In [20]:
spotify_ohe.head()

Unnamed: 0,track_id,artists,album_name,track_name,popularity,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,Genre_Combined,acoustic,afrobeat,alt-rock,alternative,ambient,anime,black-metal,bluegrass,blues,brazil,breakbeat,british,cantopop,chicago-house,children,chill,classical,club,comedy,country,dance,dancehall,death-metal,deep-house,detroit-techno,disco,disney,drum-and-bass,dub,dubstep,edm,electro,electronic,emo,folk,forro,french,funk,garage,german,gospel,goth,grindcore,groove,grunge,guitar,happy,hard-rock,hardcore,hardstyle,heavy-metal,hip-hop,honky-tonk,house,idm,indian,indie,indie-pop,industrial,iranian,j-dance,j-idol,j-pop,j-rock,jazz,k-pop,kids,latin,latino,malay,mandopop,metal,metalcore,minimal-techno,mpb,new-age,opera,pagode,party,piano,pop,pop-film,power-pop,progressive-house,psych-rock,punk,punk-rock,r-n-b,reggae,reggaeton,rock,rock-n-roll,rockabilly,romance,sad,salsa,samba,sertanejo,show-tunes,singer-songwriter,ska,sleep,songwriter,soul,spanish,study,swedish,synth-pop,tango,techno,trance,trip-hop,turkish,world-music
0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,0.73,0.686294,0.461,0.090909,0.791392,0.148187,0.032329,1e-06,0.358,0.718593,0.361245,"['acoustic', 'j-pop', 'singer-songwriter', 'so...",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,0.55,0.426396,0.166,0.090909,0.597377,0.079067,0.927711,6e-06,0.101,0.268342,0.318397,"['acoustic', 'chill']",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1iJBSr7s7jYXzM8EGcbK5b,Ingrid Michaelson;ZAYN,To Begin Again,To Begin Again,0.57,0.44467,0.359,0.0,0.736123,0.05772,0.210843,0.0,0.117,0.120603,0.313643,['acoustic'],1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,6lfxq3CG4xtTiEg7opyCyx,Kina Grannis,Crazy Rich Asians (Original Motion Picture Sou...,Can't Help Falling In Love,0.71,0.270051,0.0596,0.0,0.573701,0.037617,0.908635,7.1e-05,0.132,0.143719,0.746758,['acoustic'],1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5vjLSffimiIP26QG5WcN2K,Chord Overstreet,Hold On,Hold On,0.82,0.627411,0.443,0.181818,0.737103,0.054508,0.470884,0.0,0.0829,0.167839,0.492863,['acoustic'],1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [21]:
spotify_ohe.shape

(50698, 130)
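
The row count falls from 73608 to 50698 in this step. A likely explanation is index alignment: `spotify_ohe` gets a fresh 0..N-1 index from `read_csv`, while the encoded genre frame is built on `spotify.index`, which has gaps after `drop_duplicates`. `join` matches rows by index label, so unmatched rows receive NaN and are removed by `dropna`. The sketch below shows one way to avoid this, assuming `Genre_Combined` is stored in the CSV as a string such as `"['acoustic', 'chill']"`. The toy data and names are illustrative.

```python
import ast
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up stand-in for the frame read back from CSV: list columns come back as strings.
toy = pd.DataFrame({
    "track_id": ["t1", "t2", "t3"],
    "Genre_Combined": ["['acoustic', 'chill']", "['blues']", "['acoustic']"],
})

# Parse the string representation back into real Python lists.
genre_lists = toy["Genre_Combined"].apply(ast.literal_eval)

# Encode and reuse toy.index so the join matches row for row.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(mlb.fit_transform(genre_lists), index=toy.index, columns=mlb.classes_)
toy_ohe = toy.join(encoded)
print(toy_ohe.shape)
```

Because both frames share the same index, no rows are lost and `dropna` is not needed.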

### Saving the output to csv

In [22]:
spotify_ohe.to_csv('ohe_scaled_spotify_data_model.csv', index=False)

------------

# Step 4: Building Model

In [23]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import seaborn as sns
from sklearn.preprocessing import MultiLabelBinarizer

pd.set_option('display.max_columns', None)


#ignore warning messages 
import warnings
warnings.filterwarnings('ignore')

In [24]:
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
from scipy import sparse

In [25]:
spotify_model = pd.read_csv('scaled_spotify_data_model.csv')
spotify_model_ohe = pd.read_csv('ohe_scaled_spotify_data_model.csv')

In [26]:
spotify_model.head(2)

Unnamed: 0,track_id,artists,album_name,track_name,popularity,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_genre,Genre_Combined
0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,0.73,0.686294,0.461,0.090909,0.791392,0.148187,0.032329,1e-06,0.358,0.718593,0.361245,acoustic,"['acoustic', 'j-pop', 'singer-songwriter', 'so..."
1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,0.55,0.426396,0.166,0.090909,0.597377,0.079067,0.927711,6e-06,0.101,0.268342,0.318397,acoustic,"['acoustic', 'chill']"


In [27]:
spotify_model_ohe.head(2)

Unnamed: 0,track_id,artists,album_name,track_name,popularity,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,Genre_Combined,acoustic,afrobeat,alt-rock,alternative,ambient,anime,black-metal,bluegrass,blues,brazil,breakbeat,british,cantopop,chicago-house,children,chill,classical,club,comedy,country,dance,dancehall,death-metal,deep-house,detroit-techno,disco,disney,drum-and-bass,dub,dubstep,edm,electro,electronic,emo,folk,forro,french,funk,garage,german,gospel,goth,grindcore,groove,grunge,guitar,happy,hard-rock,hardcore,hardstyle,heavy-metal,hip-hop,honky-tonk,house,idm,indian,indie,indie-pop,industrial,iranian,j-dance,j-idol,j-pop,j-rock,jazz,k-pop,kids,latin,latino,malay,mandopop,metal,metalcore,minimal-techno,mpb,new-age,opera,pagode,party,piano,pop,pop-film,power-pop,progressive-house,psych-rock,punk,punk-rock,r-n-b,reggae,reggaeton,rock,rock-n-roll,rockabilly,romance,sad,salsa,samba,sertanejo,show-tunes,singer-songwriter,ska,sleep,songwriter,soul,spanish,study,swedish,synth-pop,tango,techno,trance,trip-hop,turkish,world-music
0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,0.73,0.686294,0.461,0.090909,0.791392,0.148187,0.032329,1e-06,0.358,0.718593,0.361245,"['acoustic', 'j-pop', 'singer-songwriter', 'so...",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,0.55,0.426396,0.166,0.090909,0.597377,0.079067,0.927711,6e-06,0.101,0.268342,0.318397,"['acoustic', 'chill']",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# 1. Content-based recommender system.

### We will work with 40k observations

In [28]:
spot_mod = spotify_model.head(40000).copy()  # copy so new columns can be added safely
spot_mod.shape

(40000, 17)

#### Feature selection for recommendations

In [29]:
features = ['popularity', 'danceability', 'energy', 'key', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']

#### Create a string of all the features for each track

In [30]:
spot_mod['features'] = spot_mod[features].apply(lambda x: ' '.join(x.astype(str)), axis=1)

#### Vectorize the features

In [31]:
vectorizer = CountVectorizer().fit_transform(spot_mod['features'])

#### Compute cosine similarity between all tracks

In [32]:
cosine_similarities = cosine_similarity(vectorizer)
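
With its default settings, `CountVectorizer` treats each stringified number as text: it keeps runs of two or more word characters, so `0.686294` contributes the token `686294`. The similarity above therefore counts exact matches between digit strings rather than measuring how close the values are. As an alternative sketch, not the approach used in this notebook, cosine similarity can be computed directly on the scaled numeric columns. The matrix `X` below is made-up data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# How the default analyzer splits a stringified feature row.
tokens = CountVectorizer().build_analyzer()("0.73 0.686294 0.461")
print(tokens)

# Made-up scaled feature rows: tracks 0 and 1 are numerically close, track 2 is not.
X = np.array([
    [0.73, 0.69, 0.46],
    [0.72, 0.70, 0.45],
    [0.05, 0.10, 0.95],
])

# Cosine similarity computed directly on the numeric vectors.
sim = cosine_similarity(X)
print(sim.round(3))
```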

In [33]:
def get_recommendations(track_id, cosine_similarities=cosine_similarities):
    # Get the index of the track with the given ID
    idx = spot_mod[spot_mod['track_id'] == track_id].index[0]
    
    # Get the cosine similarities between the given track and all other tracks
    sim_scores = list(enumerate(cosine_similarities[idx]))
    
    # Sort the tracks by their similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    # Get the top 10 most similar tracks
    sim_scores = sim_scores[1:11]
    
    # Get the indices and titles of the top 10 most similar tracks
    track_indices = [i[0] for i in sim_scores]
    track_names = spot_mod['track_name'].iloc[track_indices].values
    track_id = spot_mod['track_id'].iloc[track_indices].values
    genre = spot_mod['Genre_Combined'].iloc[track_indices].values
    
    # Create a DataFrame with the track names and similarity scores as columns
    recommendations_df = pd.DataFrame({'track_id': track_id, 'track_name': track_names, 'genre': genre, 'similarity_score': [i[1] for i in sim_scores]})
    
    # Return the DataFrame
    return recommendations_df
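
Slicing `sim_scores[1:11]` assumes the query track sorts into position 0. If another track ties with it at the highest score, the query may stay in the list while a true match is dropped. The sketch below removes the query by position instead; `top_k_similar` is a hypothetical helper and the similarity row is made up.

```python
import numpy as np

def top_k_similar(sim_row, query_idx, k=10):
    """Return positions of the k highest scores, skipping the query itself."""
    order = np.argsort(-sim_row, kind="stable")   # descending by score
    order = order[order != query_idx]             # drop the query explicitly
    return order[:k]

# Made-up similarity row for query position 2; position 0 ties with the query at 1.0.
row = np.array([1.0, 0.3, 1.0, 0.8, 0.1])
print(top_k_similar(row, query_idx=2, k=3))
```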

#### Get recommendations for a given track ID

In [34]:
get_recommendations('4qPNDBW1i3p13qLCt0Ki3A')

Unnamed: 0,track_id,track_name,genre,similarity_score
0,2cuPD2epKOW51aapIlMEak,9 Variations on ‘Lison dormait’ from ‘Julie’ b...,['classical'],0.273861
1,3d8kYb3GqZkiHb6annxKEo,Swimmers,['acoustic'],0.261116
2,4Vfgc4BdSKM2hRlkaRFc9C,Bring It On Home To Me,['blues'],0.261116
3,2HXP76Wr3Cj2MlUQVa80Zi,Vem Ver (Ao Vivo),"['brazil', 'r-n-b', 'reggae']",0.261116
4,10d4n3B7NOZxxPs2uEBIDE,"Noëls sur les instruments, H 531, 534: 1. Vous...","['british', 'classical']",0.261116
5,3M01ewRy7GgG7JkSjy9EqN,Mad,['british'],0.261116
6,4bhstP5EAtJRiKOlkJZrW7,Eu Me Apaixonei Pela Morena,['forro'],0.261116
7,5brs9JsOe5Dx0joPjr8zjN,The Bed Is Burning,['grindcore'],0.261116
8,4vBk67mVsxz5tZ1LMMTkd9,Fogo (Adubado),['afrobeat'],0.25
9,4N7h4IHWRaJCOo1VFdTMHV,Truman Sleeps,['ambient'],0.25


In [35]:
get_recommendations('5SuOikwiRyPMVoIQDJUgSV')

Unnamed: 0,track_id,track_name,genre,similarity_score
0,15gcqcGzUYF56VI0GgpOYH,Mala,['guitar'],0.273861
1,63QsFC6m0CaTw2gU0iZa7N,Conrad,['british'],0.261116
2,2A9l1TsM78JQqQMYqAQEZe,Red Lights (feat. Wale),['chill'],0.261116
3,1gCQSac65SXFh7FPW6MQz3,Cry 4 U - Sheryl's Authentic Love Festival Mix,['detroit-techno'],0.261116
4,5JnK6FSCbpoTAXMULIpTi8,Zochrot,['grindcore'],0.261116
5,6mmyRIjtOmYDYxaFRDByfY,Blood,['afrobeat'],0.25
6,1M0RL0QLS6P25NwhPwOKaE,All wit My Hands,['alternative'],0.25
7,4uaRUpAdVcjF5ME0QoZfuG,"12 Variations in C Major on ""Ah, vous dirai-je...",['classical'],0.25
8,5cBCQJtvj9PThDtnLx6HM4,My Naffew Randy,['detroit-techno'],0.25
9,4LfGDWBS4Mlt9mBPVzr99o,New Orleans,['drum-and-bass'],0.25


In [36]:
get_recommendations('1EzrEOXmMH3G43AXT1y7pA')

Unnamed: 0,track_id,track_name,genre,similarity_score
0,6fD8ugOa1yUsW8lGnLFSCZ,308,['death-metal'],0.213201
1,0UK44t60tDM37GEAETglwp,Gårdakvarnar och skit (Live),['goth'],0.213201
2,1txGe6o7tZRqifxZ7daYXv,Another Day - Extended Mix,['happy'],0.213201
3,6FUZY1855hKvmtVRkzpWPC,Laat Maar Waaien,"['happy', 'hardcore']",0.204124
4,3CEntclq2oGD3AkwWsoTgJ,Cria Em Mim - Ao Vivo,"['brazil', 'gospel']",0.125
5,0YG5wOXVpvNUyPihvr1zsa,To Risk to Live,['acoustic'],0.117851
6,5TMgvUss7jcR3HSdMUP9Cf,Club Tonight - Cara B,['afrobeat'],0.117851
7,2vGDTOtslhkco6LVFfdEI2,Black Hole Sun,"['alt-rock', 'alternative', 'grunge', 'hard-ro...",0.117851
8,3gpgMfFAQDkVgcY1u76Ccx,Hola Noviembre,['alt-rock'],0.117851
9,6Xs9eaQXZaKfPkEyETxNl6,Night,['ambient'],0.117851


In [37]:
query_id = '6Xs9eaQXZaKfPkEyETxNl6'
recommendations_df = get_recommendations(query_id)

# Print the header for the same track that was queried
query_name = spot_mod.loc[spot_mod['track_id'] == query_id, 'track_name'].values[0]
print(f"Recommendations for track '{query_name}' ({query_id}):")
recommendations_df[['track_name', 'genre', 'similarity_score']]

Recommendations for track 'Night' (6Xs9eaQXZaKfPkEyETxNl6):


Unnamed: 0,track_name,genre,similarity_score
0,Yo Darlin',['dancehall'],0.222222
1,God I Look to You,"['ambient', 'world-music']",0.210819
2,Dark Hollow,['bluegrass'],0.210819
3,Delitfol,['drum-and-bass'],0.210819
4,Erilaisen rakkauden todistaja,['heavy-metal'],0.210819
5,Neck Breakers,['dancehall'],0.201008
6,Knights Forever,['garage'],0.19245
7,I'm Yours,"['acoustic', 'rock']",0.117851
8,Good night,"['acoustic', 'j-pop', 'j-rock']",0.117851
9,Cria Em Mim - Ao Vivo,"['brazil', 'gospel']",0.117851


#### Because this content-based recommender is unsupervised and the dataset contains no user feedback to serve as ground truth, standard classification metrics such as accuracy, precision, recall, and F1-score cannot be computed directly.

#### Instead, we assess the model by manually judging whether the recommendations it returns for a given track are relevant and useful.
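
One way to make the manual check more systematic is a simple proxy: the share of recommended tracks that have at least one genre in common with the query track. This is an assumption added for illustration, not a metric used in this notebook, and genre overlap is only a rough stand-in for relevance. `genre_hit_rate` is a hypothetical helper and the inputs are made up.

```python
def genre_hit_rate(query_genres, recommended_genres):
    """Fraction of recommendations sharing at least one genre with the query."""
    query = set(query_genres)
    if not recommended_genres:
        return 0.0
    hits = sum(1 for genres in recommended_genres if query & set(genres))
    return hits / len(recommended_genres)

# Made-up example: 2 of 4 recommendations share a genre with the query.
rate = genre_hit_rate(
    ["acoustic", "chill"],
    [["classical"], ["acoustic"], ["blues"], ["chill", "ambient"]],
)
print(rate)
```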

-------------

### We wanted to explore whether the recommendations change after one-hot encoding the genres

In [38]:
spot_mod_ohe = spotify_model_ohe.head(40000).copy()  # copy so new columns can be added safely
spot_mod_ohe.shape

(40000, 130)

#### Select all the features in the dataset excluding 'track_id', 'track_name', 'artists', 'album_name', 'Genre_Combined'

In [39]:
features_ohe = spot_mod_ohe.columns.difference(['track_id', 'track_name', 'artists', 'album_name','Genre_Combined'])
len(features_ohe)

125

#### Create a string of all the features for each track

In [40]:
spot_mod_ohe['features_ohe'] = spot_mod_ohe[features_ohe].apply(lambda x: ' '.join(x.astype(str)), axis=1)

#### Vectorize the features

In [41]:
vectorizer_ohe = CountVectorizer().fit_transform(spot_mod_ohe['features_ohe'])

#### Compute cosine similarity between all tracks

In [42]:
cosine_similarities_ohe = cosine_similarity(vectorizer_ohe)

In [43]:
def get_recommendations_ohe(track_id, cosine_similarities=cosine_similarities_ohe):
    # Get the index of the track with the given ID
    idx = spot_mod_ohe[spot_mod_ohe['track_id'] == track_id].index[0]
    
    # Get the cosine similarities between the given track and all other tracks
    sim_scores = list(enumerate(cosine_similarities[idx]))
    
    # Sort the tracks by their similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    # Get the top 10 most similar tracks
    sim_scores = sim_scores[1:11]
    
    # Get the indices and titles of the top 10 most similar tracks
    track_indices = [i[0] for i in sim_scores]
    track_names = spot_mod_ohe['track_name'].iloc[track_indices].values
    track_id = spot_mod_ohe['track_id'].iloc[track_indices].values
    genre = spot_mod_ohe['Genre_Combined'].iloc[track_indices].values
    
    recommendations_df = pd.DataFrame({'track_id': track_id, 'track_name': track_names, 'genre': genre, 'similarity_score': [i[1] for i in sim_scores]})
    
    return recommendations_df

In [44]:
get_recommendations_ohe('4qPNDBW1i3p13qLCt0Ki3A')

Unnamed: 0,track_id,track_name,genre,similarity_score
0,2cuPD2epKOW51aapIlMEak,9 Variations on ‘Lison dormait’ from ‘Julie’ b...,['classical'],0.273861
1,3d8kYb3GqZkiHb6annxKEo,Swimmers,['acoustic'],0.261116
2,4Vfgc4BdSKM2hRlkaRFc9C,Bring It On Home To Me,['blues'],0.261116
3,2HXP76Wr3Cj2MlUQVa80Zi,Vem Ver (Ao Vivo),"['brazil', 'r-n-b', 'reggae']",0.261116
4,10d4n3B7NOZxxPs2uEBIDE,"Noëls sur les instruments, H 531, 534: 1. Vous...","['british', 'classical']",0.261116
5,3M01ewRy7GgG7JkSjy9EqN,Mad,['british'],0.261116
6,4bhstP5EAtJRiKOlkJZrW7,Eu Me Apaixonei Pela Morena,['forro'],0.261116
7,5brs9JsOe5Dx0joPjr8zjN,The Bed Is Burning,['grindcore'],0.261116
8,4vBk67mVsxz5tZ1LMMTkd9,Fogo (Adubado),['afrobeat'],0.25
9,4u4yBJmIws6XF0bfJx12tA,Ceremony for the Gathering of Death,['black-metal'],0.25


In [45]:
get_recommendations_ohe('5SuOikwiRyPMVoIQDJUgSV')

Unnamed: 0,track_id,track_name,genre,similarity_score
0,15gcqcGzUYF56VI0GgpOYH,Mala,['guitar'],0.273861
1,63QsFC6m0CaTw2gU0iZa7N,Conrad,['british'],0.261116
2,2A9l1TsM78JQqQMYqAQEZe,Red Lights (feat. Wale),['chill'],0.261116
3,6mmyRIjtOmYDYxaFRDByfY,Blood,['afrobeat'],0.25
4,1M0RL0QLS6P25NwhPwOKaE,All wit My Hands,['alternative'],0.25
5,5cBCQJtvj9PThDtnLx6HM4,My Naffew Randy,['detroit-techno'],0.25
6,4LfGDWBS4Mlt9mBPVzr99o,New Orleans,['drum-and-bass'],0.25
7,5RHEWqyahElZpVulOa10t8,The River and the Mountain,['guitar'],0.25
8,64D3mSyqF6wwuxkdOeQmbl,The Triumph of King Freak (A Crypt of Preserva...,['industrial'],0.25
9,4RTTcgwT6FXqDEWdraGb2n,ようこそ!イコラブ沼,['j-idol'],0.240192


In [46]:
get_recommendations_ohe('1EzrEOXmMH3G43AXT1y7pA')

Unnamed: 0,track_id,track_name,genre,similarity_score
0,6fD8ugOa1yUsW8lGnLFSCZ,308,['death-metal'],0.213201
1,1txGe6o7tZRqifxZ7daYXv,Another Day - Extended Mix,['happy'],0.213201
2,6FUZY1855hKvmtVRkzpWPC,Laat Maar Waaien,"['happy', 'hardcore']",0.204124
3,4WV73uVSyGORPU6orKLvlg,Make Great Mistaks,['kids'],0.125
4,44XjoNvtwevktFKjvVe1vH,Andrea,"['latino', 'reggaeton']",0.125
5,0YG5wOXVpvNUyPihvr1zsa,To Risk to Live,['acoustic'],0.117851
6,5TMgvUss7jcR3HSdMUP9Cf,Club Tonight - Cara B,['afrobeat'],0.117851
7,2vGDTOtslhkco6LVFfdEI2,Black Hole Sun,"['alt-rock', 'alternative', 'grunge', 'hard-ro...",0.117851
8,3gpgMfFAQDkVgcY1u76Ccx,Hola Noviembre,['alt-rock'],0.117851
9,6IWumNs0oyQfa8mLd5rjcn,Voila!,['bluegrass'],0.117851


------------------------

# 2. Popularity-based recommender system

In [47]:
item_popularity = spot_mod.groupby(['track_id', 'track_name', 'Genre_Combined'])['popularity'].mean().reset_index()
item_popularity = item_popularity.sort_values('popularity', ascending=False)
top_items = item_popularity.head(10)

## Top 10 most popular tracks

In [48]:
top_items[['track_id', 'track_name', 'Genre_Combined']]

Unnamed: 0,track_id,track_name,Genre_Combined
27465,5N7EGcJp355QVIUsISLXXX,"Sonata for Violin and Cello in C Major, M. 73:...",['classical']
11357,2E01rRrzaz6TuwCdm27m4m,How Was Your Day?,['garage']
22988,4UsNNKzfrPBp3np4Eiznat,Uthi Uthi Gopala,['classical']
25823,532WuZy5kwDhU8iNUbLKQK,Pra Não Dizer Que Não Falei Das Flores,"['hard-rock', 'r-n-b']"
30429,5wgUKICt3iu2spNwz3jugc,Secret Bb,['idm']
2190,0QwZfbw26QeUoIy82Z2jYp,Good Times Bad Times - 1993 Remaster,['hard-rock']
35331,6sj4V2wZXZ2hW2kChTQRKs,"天使 - 劇集 ""寶寶大過天"" 主題曲",['cantopop']
37235,7Fhr46PhABM7LdDVLeucUP,Hoje Eu Só Procuro A Minha Paz,"['hard-rock', 'r-n-b']"
11181,2C9ZfqsCQ2bSI51q4uJHaU,"Modulating Prelude, K.284a/2","['classical', 'classical']"
29034,5gXeGIJCGNycRlkoBIpBeJ,Must Reach,['afrobeat']


## Top 10 most popular 'classical' genre tracks

In [49]:
classical_tracks = item_popularity[item_popularity['Genre_Combined'].apply(lambda x: 'classical' in x)]
top_classical_tracks = classical_tracks.head(10)
top_classical_tracks[['track_id', 'track_name', 'Genre_Combined']]

Unnamed: 0,track_id,track_name,Genre_Combined
27465,5N7EGcJp355QVIUsISLXXX,"Sonata for Violin and Cello in C Major, M. 73:...",['classical']
22988,4UsNNKzfrPBp3np4Eiznat,Uthi Uthi Gopala,['classical']
11181,2C9ZfqsCQ2bSI51q4uJHaU,"Modulating Prelude, K.284a/2","['classical', 'classical']"
12711,2UscVwaSo0fbaX5QxVgqH8,Hourglass,['classical']
23868,4eytLZ543ZiGAEs55xd0Ww,Murugane - Gsivachidambaram,['classical']
19301,3lIefB8ZiKOmxk7fRlPZtI,"The Nutcracker, Op. 71, Act II, Scene 3: No. 1...",['classical']
16108,38znqxzVreO8WnoEkBIAJI,"Symphony No. 22 in C, K.162: 3. Presto assai","['classical', 'classical']"
12758,2VMOoFwSgDRJoNq4n3s93u,Harry Potter (The Ultimate Indian Theme),['classical']
8692,1htyTZxRyirh8rEa8dvIOq,Majha Bhav Tujhe Charani,['classical']
24519,4nakDBdFLfwVoWCvvsFskf,12 Variations on ‘La belle Françoise’ in E fla...,['classical']
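
After `read_csv`, `Genre_Combined` holds strings such as `"['classical']"`, so `'classical' in x` is a substring test. That works for this genre, but a shorter name such as `rock` would also match `hard-rock`. The sketch below parses the string into a list for an exact membership test; the toy frame and the helper `has_genre` are illustrative assumptions.

```python
import ast
import pandas as pd

# Made-up rows; Genre_Combined holds string representations, as after read_csv.
toy = pd.DataFrame({
    "track_name": ["A", "B", "C"],
    "Genre_Combined": ["['rock']", "['hard-rock', 'punk']", "['classical']"],
    "popularity": [0.4, 0.9, 0.7],
})

def has_genre(cell, genre):
    """Exact membership test on the parsed genre list."""
    return genre in ast.literal_eval(cell)

# Substring test versus exact list membership for the genre "rock".
substring_matches = toy[toy["Genre_Combined"].apply(lambda x: "rock" in x)]
exact_matches = toy[toy["Genre_Combined"].apply(has_genre, genre="rock")]
print(len(substring_matches), len(exact_matches))
```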


## Top 10 most popular 'hard-rock' genre tracks

In [50]:
hard_rock_items = item_popularity[item_popularity['Genre_Combined'].apply(lambda x: 'hard-rock' in x)]
hard_rock_items = hard_rock_items.head(10)
hard_rock_items[['track_id', 'track_name', 'Genre_Combined']]

Unnamed: 0,track_id,track_name,Genre_Combined
25823,532WuZy5kwDhU8iNUbLKQK,Pra Não Dizer Que Não Falei Das Flores,"['hard-rock', 'r-n-b']"
2190,0QwZfbw26QeUoIy82Z2jYp,Good Times Bad Times - 1993 Remaster,['hard-rock']
37235,7Fhr46PhABM7LdDVLeucUP,Hoje Eu Só Procuro A Minha Paz,"['hard-rock', 'r-n-b']"
17646,3RtP7chdHFagVWsEmi1yux,Be Myself,"['hard-rock', 'r-n-b']"
33470,6WE7jSshLCuVKoCmobVKVf,Heartbreaker - 1990 Remaster,['hard-rock']
34015,6cpbVrZueaJIklHRs9cGFI,Phunky Buddha,"['hard-rock', 'punk-rock', 'punk']"
7844,1XXp1AY5XKaHIgn6huNzvP,O Lobo,['hard-rock']
28488,5a0bHLSL5t4eRNKjmh9PyV,Tão Iguais,"['hard-rock', 'hardcore', 'punk-rock']"
3495,0gkWw8Cx8D9f9wNhNMBy59,"Ritmo, Ritual E Responsa",['hard-rock']
7427,1SkwlmVJaqikct9OqPPn78,O Cidadão Do Mundo,['hard-rock']


## Show Popularity Scores

In [51]:
hard_rock = spot_mod[spot_mod['Genre_Combined'].apply(lambda x: 'hard-rock' in x)]

#### Group by track and calculate mean popularity score

In [52]:
item_popularity = hard_rock.groupby(['track_id', 'track_name', 'Genre_Combined'])['popularity'].mean().reset_index()
item_popularity = item_popularity.sort_values('popularity', ascending=False)

In [53]:
top_hard_rock = item_popularity.head(10)
top_hard_rock[['track_id', 'track_name', 'Genre_Combined', 'popularity']]

Unnamed: 0,track_id,track_name,Genre_Combined,popularity
504,532WuZy5kwDhU8iNUbLKQK,Pra Não Dizer Que Não Falei Das Flores,"['hard-rock', 'r-n-b']",0.96
50,0QwZfbw26QeUoIy82Z2jYp,Good Times Bad Times - 1993 Remaster,['hard-rock'],0.95
742,7Fhr46PhABM7LdDVLeucUP,Hoje Eu Só Procuro A Minha Paz,"['hard-rock', 'r-n-b']",0.93
355,3RtP7chdHFagVWsEmi1yux,Be Myself,"['hard-rock', 'r-n-b']",0.91
657,6WE7jSshLCuVKoCmobVKVf,Heartbreaker - 1990 Remaster,['hard-rock'],0.91
669,6cpbVrZueaJIklHRs9cGFI,Phunky Buddha,"['hard-rock', 'punk-rock', 'punk']",0.91
153,1XXp1AY5XKaHIgn6huNzvP,O Lobo,['hard-rock'],0.9
560,5a0bHLSL5t4eRNKjmh9PyV,Tão Iguais,"['hard-rock', 'hardcore', 'punk-rock']",0.9
148,1SkwlmVJaqikct9OqPPn78,O Cidadão Do Mundo,['hard-rock'],0.89
78,0gkWw8Cx8D9f9wNhNMBy59,"Ritmo, Ritual E Responsa",['hard-rock'],0.89


-----------------------------------------

# Step 5: Model Evaluation

In [54]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from math import sqrt
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

### We will work with 10k observations

In [55]:
spot_mod_head = spotify_model.head(10000)
spot_mod_head.head(2)

Unnamed: 0,track_id,artists,album_name,track_name,popularity,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_genre,Genre_Combined
0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,0.73,0.686294,0.461,0.090909,0.791392,0.148187,0.032329,1e-06,0.358,0.718593,0.361245,acoustic,"['acoustic', 'j-pop', 'singer-songwriter', 'so..."
1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,0.55,0.426396,0.166,0.090909,0.597377,0.079067,0.927711,6e-06,0.101,0.268342,0.318397,acoustic,"['acoustic', 'chill']"


In [56]:
recomm_list = []
for i in range(len(spot_mod_head)):
    recomm_i = get_recommendations(spot_mod_head['track_id'][i])
    recomm_list.append(recomm_i['track_name'])

In [57]:
recomm_list1 = pd.DataFrame(recomm_list)
recomm_list1 = recomm_list1.rename(columns={0: 'First Recommendation', 1: 'Second Recommendation', 2: 'Third Recommendation', 3: 'Fourth Recommendation', 4: 'Fifth Recommendation', 5: 'Sixth Recommendation', 6: 'Seventh Recommendation', 7: 'Eighth Recommendation', 8: 'Ninth Recommendation', 9: 'Tenth Recommendation'})

In [58]:
recomm_list1 = recomm_list1.rename(columns={'track_name': 'original_track_name'})
recomm_list1.insert(0, 'TRACK NAME', spot_mod_head['track_name'].values)

In [59]:
recomm_data = recomm_list1[['TRACK NAME','First Recommendation','Second Recommendation','Third Recommendation','Fourth Recommendation','Fifth Recommendation', 'Sixth Recommendation', 'Seventh Recommendation', 'Eighth Recommendation', 'Ninth Recommendation', 'Tenth Recommendation']]

In [60]:
recomm_data.head(10)

Unnamed: 0,TRACK NAME,First Recommendation,Second Recommendation,Third Recommendation,Fourth Recommendation,Fifth Recommendation,Sixth Recommendation,Seventh Recommendation,Eighth Recommendation,Ninth Recommendation,Tenth Recommendation
track_name,Comedy,Mala,Conrad,Red Lights (feat. Wale),Cry 4 U - Sheryl's Authentic Love Festival Mix,Zochrot,Blood,All wit My Hands,"12 Variations in C Major on ""Ah, vous dirai-je...",My Naffew Randy,New Orleans
track_name,Ghost - Acoustic,9 Variations on ‘Lison dormait’ from ‘Julie’ b...,Swimmers,Bring It On Home To Me,Vem Ver (Ao Vivo),"Noëls sur les instruments, H 531, 534: 1. Vous...",Mad,Eu Me Apaixonei Pela Morena,The Bed Is Burning,Fogo (Adubado),Truman Sleeps
track_name,To Begin Again,Perfect,Agarra Mi Mano,Air I Breathe - Sub Focus & Wilkinson,Fool for Your Loving,Aos Pés Da Cruz - Ao Vivo,天真,The Only One,Everything Must Change,Mur,Paradigms Warped
track_name,Can't Help Falling In Love,Heaven,Smoothie Song,Bullseye,"Call It Fate, Call It Karma",Halayo,Solución Suicida,Super Potato's Theme,"Sonate, Op. 5: IV. (Allegro)",Bhaja Govindam,Crowded Room
track_name,Hold On,Ocidentes Acontecem,School Rooftop,Pure Hatred,Bogotá (Instrumental),Runaway Horses - Abridged,Four Dimensions,驗孕的下晝 - Live,Nite Roads,Feel The Rhythm,Opening Remarks and a Request
track_name,Days I Will Remember,Kiss Me In Berlin,Saddle Road,Dedo Nucué,Dram,Prey,The Sound of Silence,Shellstar,Wyjcie Psy,Lost Dog,Rap du Bom Parte II - Mús. Incid.: Odara
track_name,Say Something,Full Tank - Edit,Are You Behind the Shining Star?,The Darkness That You Fear,Atlantis (Super Sped Up),Elusive Reaches,go if you want to,Little Sin,Count On Me (Cuenta Conmigo),Garota Comunista,Just You and Me
track_name,I'm Yours,308,Gårdakvarnar och skit (Live),Another Day - Extended Mix,Laat Maar Waaien,Cria Em Mim - Ao Vivo,To Risk to Live,Club Tonight - Cara B,Black Hole Sun,Hola Noviembre,Night
track_name,Lucky,Running from the Thoughts,Yaz Günü,The Reason - Acoustic,Quebrou a Cara,Rodrigo: Concierto de Aranjuez: II. Adagio (Ex...,Crosstitution,Copperhead Road,8-Track,No Way (feat. Darius Scott),Vento No Litoral
track_name,Hunger,"Le carnaval des animaux, R. 125: IV. Tortues -...","Symphony No. 5 in B flat, K.22: 3. Allegro molto",Verrat,Magma,A Colheita,Dizem - Ao Vivo,Trip to Ireland,The Story,The Resistance,Divenire


### Saving the recommendations to csv

In [61]:
recomm_data.to_csv('content-based recommendations.csv', index=False)

#### Train/test split

In [62]:
from sklearn.model_selection import train_test_split
train_ids, test_ids = train_test_split(spot_mod_head['track_id'], test_size=0.2, random_state=11)

In [63]:
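from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Running sum of squared distances; the square root is taken in the next cell
rmse = 0

for track_id in test_ids:
    original_track_name = spot_mod_head[spot_mod_head['track_id'] == track_id]['track_name'].values[0]

    recommended_track_names = get_recommendations(track_id)['track_name'].values

    # TF-IDF cosine similarity between the original title and each recommended title
    corpus = [original_track_name] + list(recommended_track_names)
    tfidf = TfidfVectorizer().fit_transform(corpus)
    cosine_sim = cosine_similarity(tfidf)[0][1:]

    # Add the squared distance (1 - best title similarity) to the running sum
    rmse += (1 - cosine_sim.max()) ** 2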

In [64]:
from math import sqrt
# Root of the mean squared distance over all test tracks
rmse = sqrt(rmse / len(test_ids))
print("RMSE (cosine similarity): {:.2f}".format(rmse))

RMSE (cosine similarity): 0.97
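
For reference, the quantity printed above can be written out as follows. This is a reading of cells 63 and 64, and the symbols are notation introduced here rather than names from the notebook: $T$ is the set of held-out test tracks, $N = |T|$, and $s_{ij}$ is the TF-IDF cosine similarity between the title of test track $i$ and the title of its $j$-th recommendation.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i \in T} \left(1 - \max_{j} s_{ij}\right)^{2}}
$$

A value close to 1 therefore means that, for most test tracks, even the best-matching recommended title shares almost no words with the original title. The score is computed from track names only, so it does not directly measure similarity in audio features or genre.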


-------------------

# Step 6. INSIGHTS

# Content-based recommender system

#### 1. Looking at the output of the content-based recommendation system, the similarity scores for the recommended tracks are relatively low. This indicates that our model is not performing well at recommending tracks that are very similar to the input track_id. Furthermore, some of the recommended tracks have genres that differ from those of the input track.

#### 2. RMSE measures the average deviation of predicted values from true values. Here it is computed from the TF-IDF cosine similarity between each original track name and its most similar recommended track name, so the value of 0.97 indicates that, on average, the recommended track names are quite different from the original track names. This suggests that the model for recommending tracks based on cosine similarity may not be very effective.

#### 3. We also need to consider that, because content-based recommendation is not a supervised learning model, standard evaluation metrics such as accuracy, precision, recall, or F1-score may not apply. Alternatively, we can assess the model's performance by manually judging whether the recommendations it offers for a given track are relevant and useful.
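
One way to support the manual review described above is to quantify how much the genres of the recommended tracks overlap with those of the input track. The sketch below is only an illustration under stated assumptions: `genre_overlap` and `mean_genre_overlap` are hypothetical helpers that are not part of the notebook, the Jaccard index is one possible choice of overlap measure, and the genre lists are made-up toy data standing in for values of the `Genre_Combined` column.

```python
def genre_overlap(seed_genres, rec_genres):
    """Jaccard overlap between two genre lists (1.0 = identical sets, 0.0 = disjoint)."""
    seed, rec = set(seed_genres), set(rec_genres)
    if not seed and not rec:
        return 0.0  # no genre information on either side
    return len(seed & rec) / len(seed | rec)


def mean_genre_overlap(seed_genres, recommended_genre_lists):
    """Average Jaccard overlap between one seed track and each of its recommendations."""
    if not recommended_genre_lists:
        return 0.0
    scores = [genre_overlap(seed_genres, rec) for rec in recommended_genre_lists]
    return sum(scores) / len(scores)


# Toy data, made up for illustration (not taken from the recommender's output)
seed = ['acoustic', 'chill']
recs = [['acoustic', 'chill'], ['acoustic', 'folk'], ['metal']]
print(round(mean_genre_overlap(seed, recs), 3))
```

A low average for a given input track would flag its recommendations for closer manual inspection. It would not by itself show that the recommendations are poor, since tracks from different genres can still sound alike.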

# Popularity-based recommender system

#### 1. The output shows the top 10 recommended tracks based on popularity score. It also reveals that some tracks have multiple genres, indicating that the popularity score reflects the overall popularity of a track rather than its popularity within a particular genre.

#### 2. Overall, the popularity-based recommendation system is a straightforward and efficient method of recommending popular tracks. However, it does not take users' individual tastes into account and may not suit all users. The algorithm may also recommend well-known tracks that users have already heard, resulting in a less personalized experience.
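
The difference between overall popularity and popularity within a genre can be illustrated with a small, library-free sketch. It is not the notebook's popularity recommender: `top_n_by_popularity` is a hypothetical function, the `genres` field stands in for the `Genre_Combined` column, and the tracks are made-up toy data.

```python
def top_n_by_popularity(tracks, n=10, genre=None):
    """Return the n most popular tracks, optionally restricted to a single genre."""
    if genre is not None:
        # Keep only tracks that list the requested genre
        tracks = [t for t in tracks if genre in t['genres']]
    return sorted(tracks, key=lambda t: t['popularity'], reverse=True)[:n]


# Toy catalogue, made up for illustration
tracks = [
    {'track_name': 'A', 'popularity': 0.90, 'genres': ['pop']},
    {'track_name': 'B', 'popularity': 0.80, 'genres': ['acoustic', 'pop']},
    {'track_name': 'C', 'popularity': 0.60, 'genres': ['acoustic']},
    {'track_name': 'D', 'popularity': 0.40, 'genres': ['metal']},
]

overall = [t['track_name'] for t in top_n_by_popularity(tracks, n=2)]
acoustic = [t['track_name'] for t in top_n_by_popularity(tracks, n=2, genre='acoustic')]
print(overall)   # ['A', 'B']
print(acoustic)  # ['B', 'C']
```

Ranking within a genre is still not personalized, but it shows how a global ranking can hide tracks that are popular only inside a smaller genre.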

-------------------