# KNN Recommender 

👉 K-Nearest-Neighbors (KNN) models are usually used to make predictions, but they can also be used to find the points in a dataset that are closest to a given point.

👨🏻‍🏫 In this recap, we will use a KNN model to create a basic music recommender system.

In [2]:
import pandas as pd

url = 'https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_spotify_data.csv'

# Using pandas, load the data from the provided URL
df = pd.read_csv(url)
df.head()

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,We're For The Dark - Remastered 2010,['Badfinger'],22,0.678,0.559,0.432,0,3,0.0727,-12.696,0.0334,117.674
1,Sixty Years On - Piano Demo,['Elton John'],25,0.456,0.259,0.368,0,6,0.156,-10.692,0.028,143.783
2,Got to Find Another Way,['The Guess Who'],21,0.433,0.833,0.724,0,0,0.17,-9.803,0.0378,84.341
3,Feelin' Alright - Live At The Fillmore East/1970,['Joe Cocker'],22,0.436,0.87,0.914,0,5,0.855,-6.955,0.061,174.005
4,Caravan - Take 7,['Van Morrison'],23,0.669,0.564,0.412,0,7,0.401,-13.095,0.0679,78.716


🎯 Let's find songs that are "similar" to Queen's legendary *Another One Bites the Dust* (the dataset contains the live version recorded at Wembley in 1986).

In [3]:
queen_song = df.iloc[4295:4296] # Another One Bites the Dust - Queen

queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991


## 1. Calculating the distances

👇 First, fit a KNN model on the dataset. A KNN model does not learn any parameters: fitting simply stores the observations so that the distances to them can be computed at query time.  
Since we only care about how similar the songs' features are, it doesn't matter which target the model is fitted on.

In [8]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

In [7]:
model = KNeighborsRegressor(n_neighbors=7)

In [10]:
X = df.drop(columns=['name', 'artists'])
X.head()

Unnamed: 0,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,22,0.678,0.559,0.432,0,3,0.0727,-12.696,0.0334,117.674
1,25,0.456,0.259,0.368,0,6,0.156,-10.692,0.028,143.783
2,21,0.433,0.833,0.724,0,0,0.17,-9.803,0.0378,84.341
3,22,0.436,0.87,0.914,0,5,0.855,-6.955,0.061,174.005
4,23,0.669,0.564,0.412,0,7,0.401,-13.095,0.0679,78.716


In [16]:
transformer = MinMaxScaler()
X_scaled = pd.DataFrame(transformer.fit_transform(X), columns= X.columns)
X_scaled.head()


Unnamed: 0,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,0.255814,0.687627,0.561245,0.432,0.0,0.272727,0.0727,0.774548,0.034901,0.524307
1,0.290698,0.462475,0.26004,0.368,0.0,0.545455,0.156,0.807362,0.029258,0.640639
2,0.244186,0.439148,0.836345,0.724,0.0,0.0,0.17,0.821918,0.039498,0.375789
3,0.255814,0.442191,0.873494,0.914,0.0,0.454545,0.855,0.868551,0.063741,0.775296
4,0.267442,0.678499,0.566265,0.412,0.0,0.636364,0.401,0.768015,0.070951,0.350726


In [18]:
y = df['tempo']

In [19]:
model.fit(X_scaled,y)
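
Because the target plays no role in the neighbor search, scikit-learn's unsupervised `NearestNeighbors` class can do the same job without any `y`. The cell below is only a minimal sketch of that alternative, not what this recap uses: `X_toy` and its values are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy feature matrix (made-up values): 5 points, 2 features already in [0, 1]
X_toy = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.5, 0.5],
    [0.9, 1.0],
    [1.0, 1.0],
])

# NearestNeighbors is fitted on the features only: no target is required
nn = NearestNeighbors(n_neighbors=2)
nn.fit(X_toy)

# Query with the first point of the toy data and retrieve its 2 nearest neighbors
distances, indices = nn.kneighbors(X_toy[[0]])
print(distances)
print(indices)
```

`kneighbors` behaves the same way as on a `KNeighborsRegressor`: it returns one array of distances and one array of row positions.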

## 2. Passing the new point

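👇 You can now pass a new point to the KNN model and find its closest points in the dataset. The new point must be scaled with the transformer that was fitted on the full dataset.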

In [20]:
queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991


In [22]:
X_new = queen_song.drop(columns=['name', 'artists'])

In [23]:
X_new_scaled = pd.DataFrame(transformer.transform(X_new), columns=X_new.columns)
X_new_scaled


Unnamed: 0,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,0.337209,0.541582,0.114458,0.984,0.0,0.363636,0.982,0.899612,0.310345,0.516809
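
The new point is scaled with `transformer.transform`, not `fit_transform`, so that it is mapped with the minimum and maximum of the whole dataset. The following sketch illustrates the difference on a made-up single-feature example (`train` and `new_point` are hypothetical values, not taken from the Spotify data).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: one feature, three training values and one new value
train = np.array([[10.0], [20.0], [30.0]])
new_point = np.array([[25.0]])

# Fit on the training data, then reuse the fitted min/max for the new point
scaler = MinMaxScaler().fit(train)
scaled_new = scaler.transform(new_point)

# Same computation by hand: (x - min) / (max - min) with the training min/max
manual = (new_point - train.min(axis=0)) / (train.max(axis=0) - train.min(axis=0))

# Re-fitting on the single new row would use that row's own min/max instead,
# so the result is no longer comparable with the scaled training data
refit = MinMaxScaler().fit_transform(new_point)

print(scaled_new, manual, refit)
```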


In [33]:
nearest_songs = model.kneighbors(X_new_scaled, n_neighbors=2)
nearest_songs

(array([[0.        , 0.35999219]]), array([[4295, 1164]]))
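
`kneighbors` returns a tuple: first the distances, then the row positions of the neighbors, both sorted from nearest to farthest. Since the Queen song is itself in the dataset, it comes back first with a distance of 0. The sketch below reproduces this behavior on random data (all names and values are made up) and checks that the reported distances match a manual Euclidean computation, which is scikit-learn's default metric for KNN.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up data: 20 points with 3 features, plus an arbitrary target
rng = np.random.default_rng(0)
X_demo = rng.random((20, 3))
y_demo = rng.random(20)

knn = KNeighborsRegressor(n_neighbors=3).fit(X_demo, y_demo)

# Query with a point that belongs to the fitted data (row 5)
query = X_demo[[5]]
dist, idx = knn.kneighbors(query, n_neighbors=2)

# Recompute the Euclidean distances by hand for the returned rows
manual = np.linalg.norm(X_demo[idx[0]] - query, axis=1)

print(idx[0])    # row positions, nearest first
print(dist[0])   # matching distances
print(manual)    # same values, computed manually
```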

In [34]:
df.iloc[nearest_songs[1][0][1]]

name            Hi, Hi, Hi - Live / Remastered
artists                              ['Wings']
popularity                                  27
danceability                             0.219
valence                                  0.162
energy                                   0.939
explicit                                     0
key                                          4
liveness                                 0.993
loudness                                -9.275
speechiness                              0.226
tempo                                  140.832
Name: 1164, dtype: object

In [36]:
df.iloc[nearest_songs[1][0][0]]

name            Another One Bites The Dust - Live at Wembley '86
artists                                                ['Queen']
popularity                                                    29
danceability                                               0.534
valence                                                    0.114
energy                                                     0.984
explicit                                                       0
key                                                            4
liveness                                                   0.982
loudness                                                  -5.058
speechiness                                                0.297
tempo                                                    115.991
Name: 4295, dtype: object

## 3. Making a playlist!

👇 Make a playlist of 10 songs based on Queen's *Another One Bites the Dust*, sorted by increasing tempo. The query song is its own nearest neighbor, so it counts as one of the 10.

In [60]:
base_playlist = model.kneighbors(X_new_scaled, n_neighbors=10)
indexes = base_playlist[1][0]
indexes

array([4295, 1164, 1761, 8607,  704, 1211, 3307, 2233, 2705, 1614])

In [62]:
playlist = df.iloc[indexes]
playlist

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991
1164,"Hi, Hi, Hi - Live / Remastered",['Wings'],27,0.219,0.162,0.939,0,4,0.993,-9.275,0.226,140.832
1761,Liar,['The Damned'],25,0.348,0.203,0.939,0,4,0.838,-11.54,0.0745,107.064
8607,Cheat Codes,['Nitro Fun'],51,0.626,0.146,0.96,0,4,0.894,-4.234,0.0837,128.001
704,"It Ain't Me, Babe - Live at LA Forum, Inglewoo...",['Bob Dylan'],23,0.455,0.308,0.981,0,7,0.995,-6.409,0.183,100.49
1211,A Light In The Black,['Rainbow'],32,0.334,0.0936,0.982,0,4,0.753,-10.19,0.0735,109.414
3307,Graveyard,['Butthole Surfers'],27,0.504,0.135,0.949,0,7,0.913,-8.797,0.0385,98.128
2233,YYZ - Live In Canada / 1980,['Rush'],26,0.334,0.278,0.911,0,4,0.937,-12.017,0.0642,145.905
2705,A Sort Of Homecoming - Live,['U2'],22,0.505,0.363,0.883,0,6,0.97,-6.794,0.0578,125.824
1614,"Clock Strikes Ten - Live at Nippon Budokan, To...",['Cheap Trick'],21,0.332,0.237,0.981,0,2,0.891,-8.838,0.129,161.16


In [63]:
# Reassign the sorted copy instead of sorting in place, which avoids pandas' SettingWithCopyWarning
playlist = playlist.sort_values(by='tempo', ascending=True)
playlist


Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
3307,Graveyard,['Butthole Surfers'],27,0.504,0.135,0.949,0,7,0.913,-8.797,0.0385,98.128
704,"It Ain't Me, Babe - Live at LA Forum, Inglewoo...",['Bob Dylan'],23,0.455,0.308,0.981,0,7,0.995,-6.409,0.183,100.49
1761,Liar,['The Damned'],25,0.348,0.203,0.939,0,4,0.838,-11.54,0.0745,107.064
1211,A Light In The Black,['Rainbow'],32,0.334,0.0936,0.982,0,4,0.753,-10.19,0.0735,109.414
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991
2705,A Sort Of Homecoming - Live,['U2'],22,0.505,0.363,0.883,0,6,0.97,-6.794,0.0578,125.824
8607,Cheat Codes,['Nitro Fun'],51,0.626,0.146,0.96,0,4,0.894,-4.234,0.0837,128.001
1164,"Hi, Hi, Hi - Live / Remastered",['Wings'],27,0.219,0.162,0.939,0,4,0.993,-9.275,0.226,140.832
2233,YYZ - Live In Canada / 1980,['Rush'],26,0.334,0.278,0.911,0,4,0.937,-12.017,0.0642,145.905
1614,"Clock Strikes Ten - Live at Nippon Budokan, To...",['Cheap Trick'],21,0.332,0.237,0.981,0,2,0.891,-8.838,0.129,161.16
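
The playlist steps above can be wrapped in a small function. This is only a sketch under assumptions: `build_playlist`, its parameters and the toy DataFrame are hypothetical and not part of the original notebook. It optionally drops the seed song, which otherwise always comes back as its own nearest neighbor.

```python
import pandas as pd

def build_playlist(songs, neighbor_indexes, seed_position=None, sort_by="tempo"):
    """Hypothetical helper: select the neighbor rows and sort them.

    songs            -- DataFrame holding one song per row
    neighbor_indexes -- row positions, e.g. the second array returned by kneighbors
    seed_position    -- row position of the query song to leave out (optional)
    sort_by          -- column used to order the playlist
    """
    positions = [i for i in neighbor_indexes if i != seed_position]
    # sort_values returns a new DataFrame, so no in-place modification is needed
    return songs.iloc[positions].sort_values(by=sort_by, ascending=True)

# Toy data (made-up songs and tempos) to show the call
toy_songs = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C", "Song D"],
    "tempo": [120.0, 95.0, 140.0, 101.5],
})

# Pretend positions 2, 0 and 3 came back from kneighbors, with 2 being the seed
toy_playlist = build_playlist(toy_songs, [2, 0, 3], seed_position=2)
print(toy_playlist)
```

With the objects of this recap, the equivalent call would be `build_playlist(df, indexes, seed_position=4295)`. Note that this returns 9 songs, so `n_neighbors=11` would be needed to keep 10 songs once the seed is removed.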
