# KNN Recommender 

👉 K-Nearest-Neighbors (KNN) is usually used to make predictions, but it can also be used simply to find the closest points in a dataset.

👨🏻‍🏫 In this recap, we will use a KNN model to create a basic music recommender system.

In [16]:
import pandas as pd

url = 'https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_spotify_data.csv'

# Using pandas, load the data from the provided URL
df = pd.read_csv(url)

df.head()

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,We're For The Dark - Remastered 2010,['Badfinger'],22,0.678,0.559,0.432,0,3,0.0727,-12.696,0.0334,117.674
1,Sixty Years On - Piano Demo,['Elton John'],25,0.456,0.259,0.368,0,6,0.156,-10.692,0.028,143.783
2,Got to Find Another Way,['The Guess Who'],21,0.433,0.833,0.724,0,0,0.17,-9.803,0.0378,84.341
3,Feelin' Alright - Live At The Fillmore East/1970,['Joe Cocker'],22,0.436,0.87,0.914,0,5,0.855,-6.955,0.061,174.005
4,Caravan - Take 7,['Van Morrison'],23,0.669,0.564,0.412,0,7,0.401,-13.095,0.0679,78.716


In [34]:
df[df['artists'] == "['Tom Zé']"]

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
47,Lá Vem a Onda,['Tom Zé'],24,0.494,0.832,0.528,0,2,0.126,-11.866,0.0436,132.887


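🎯 Let's find songs that are "similar" to a reference song. The notebook was written around Queen's *Another One Bites the Dust* (hence the variable name `queen_song`), but the row actually selected below is Tom Zé's *Lá Vem a Onda*.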

In [37]:
queen_song = df.iloc[47:48]  # row 47 as a one-row DataFrame; in this data it is "Lá Vem a Onda" by Tom Zé, not the Queen song

queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
47,Lá Vem a Onda,['Tom Zé'],24,0.494,0.832,0.528,0,2,0.126,-11.866,0.0436,132.887


## 1. Calculating the distances

👇 First, fit the KNN model on the dataset so that it can look up the nearest neighbors of any observation.  
Since we only care about how similar the songs' features are, no target is needed: the `NearestNeighbors` class used below is fitted on the features alone.
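
To make "fitting without a target" concrete, here is a minimal sketch on a made-up toy array (`X_toy` and its values are assumptions for illustration, not the Spotify data). It shows that `fit` receives only the features, and that the distances returned by `kneighbors` match a manual Euclidean computation, which is the default metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy feature matrix (made-up values, not the Spotify data)
X_toy = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 2.0],
    [3.0, 4.0],
])

# fit() only receives the features: no target is passed
nn = NearestNeighbors(n_neighbors=2).fit(X_toy)

# Query with the first row; kneighbors expects a 2D input
query = X_toy[[0]]
distances, indices = nn.kneighbors(query)

# The same distances computed by hand (the default metric is Euclidean)
manual = np.sqrt(((X_toy - query) ** 2).sum(axis=1))
manual_sorted = np.sort(manual)[:2]

print(distances[0], indices[0], manual_sorted)
```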

In [38]:
X = df.drop(columns =['name','artists'])
X.head(2)

Unnamed: 0,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,22,0.678,0.559,0.432,0,3,0.0727,-12.696,0.0334,117.674
1,25,0.456,0.259,0.368,0,6,0.156,-10.692,0.028,143.783


In [39]:
from sklearn.neighbors import NearestNeighbors

neigh = NearestNeighbors(n_neighbors=10)

neigh.fit(X)

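Check out the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor.kneighbors) of the `kneighbors` method. The link points to `KNeighborsRegressor`, but `NearestNeighbors` exposes the same method.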

## 2. Passing the new point

👇 You can now pass a new point to the KNN model and find its closest points.

In [40]:
queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
47,Lá Vem a Onda,['Tom Zé'],24,0.494,0.832,0.528,0,2,0.126,-11.866,0.0436,132.887


In [42]:
queen_song = queen_song.drop(columns=['name','artists'])
queen_song

Unnamed: 0,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
47,24,0.494,0.832,0.528,0,2,0.126,-11.866,0.0436,132.887


In [43]:
neigh.kneighbors(queen_song)

(array([[0.        , 1.26375425, 1.56498378, 2.29900189, 2.31585057,
         2.32889223, 2.74006332, 2.81587941, 2.88128038, 2.9511916 ]]),
 array([[  47,  608, 1535, 3915, 3017, 1217, 4230,  562, 1794, 2472]]))
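
The output above is a tuple `(distances, indices)`, each of shape `(n_queries, n_neighbors)`, and the first neighbor (position 47, distance 0) is the query row itself. The sketch below, on randomly generated toy data (an assumption for illustration), shows one possible way to unpack the result and leave the query out by asking for one extra neighbor.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Random toy data (not the Spotify data): 50 points, 3 features
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))

nn = NearestNeighbors(n_neighbors=5).fit(X_toy)

# Ask for one extra neighbor, because the query row is part of the fitted data
query_position = 7
distances, indices = nn.kneighbors(X_toy[[query_position]], n_neighbors=6)

# Keep every returned position except the query itself
mask = indices[0] != query_position
neighbor_positions = indices[0][mask]
neighbor_distances = distances[0][mask]

print(neighbor_positions, neighbor_distances)
```

Filtering on the position rather than slicing off the first element is a defensive choice: it still works if another row happens to be at distance 0 from the query.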

In [47]:
df.iloc[3915]

name            Bell Bottom Blues - Remastered
artists                ['Derek & The Dominos']
popularity                                  26
danceability                             0.484
valence                                  0.657
energy                                    0.44
explicit                                     0
key                                          2
liveness                                 0.111
loudness                               -12.201
speechiness                             0.0273
tempo                                  133.952
Name: 3915, dtype: object
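
Row 3915 is very close to the query in `tempo` and `popularity`, but less so in `valence` and `energy`. With the default Euclidean distance, columns with larger numeric ranges weigh more in the result. The sketch below is not part of the original notebook: it uses made-up toy values and `StandardScaler` to illustrate, under those assumptions, how standardizing the features can change which point is nearest.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Made-up toy features: column 0 on a 0-1 scale (like energy),
# column 1 in the hundreds (like tempo)
X_toy = np.array([
    [0.10, 120.0],  # position 0: the query
    [0.90, 121.0],  # very different "energy", close "tempo"
    [0.12, 128.0],  # close "energy", somewhat different "tempo"
    [0.50, 150.0],
    [0.80, 90.0],
])

def nearest_other(features, position):
    """Hypothetical helper: position of the nearest row other than `position`."""
    model = NearestNeighbors(n_neighbors=2).fit(features)
    _, indices = model.kneighbors(features[[position]])
    return [int(i) for i in indices[0] if i != position][0]

# Raw features: the large-scale column dominates the distance
raw_neighbor = nearest_other(X_toy, 0)

# Standardized features: each column has zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X_toy)
scaled_neighbor = nearest_other(X_scaled, 0)

print(raw_neighbor, scaled_neighbor)
```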

## 3. Making a playlist!

👇 Make a playlist with the 10 songs closest to the reference song (`queen_song`), sorted by increasing energy. Note that the 10 nearest neighbors include the reference song itself.

In [48]:
queen_song

Unnamed: 0,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
47,24,0.494,0.832,0.528,0,2,0.126,-11.866,0.0436,132.887


In [50]:
df.iloc[neigh.kneighbors(queen_song)[1][0]].sort_values(by='energy')

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
608,Petite Fleur,['New Orleans Heritage Hall Jazz Band'],24,0.522,0.53,0.287,0,2,0.17,-10.933,0.0395,132.129
2472,Rally Round Jah Throne,['Bad Brains'],24,0.813,0.635,0.406,0,1,0.305,-14.504,0.0658,133.637
3017,Coming Out Of The Dark,['Gloria Estefan'],24,0.64,0.493,0.422,0,4,0.149,-11.739,0.0464,131.792
3915,Bell Bottom Blues - Remastered,['Derek & The Dominos'],26,0.484,0.657,0.44,0,2,0.111,-12.201,0.0273,133.952
47,Lá Vem a Onda,['Tom Zé'],24,0.494,0.832,0.528,0,2,0.126,-11.866,0.0436,132.887
1535,Cygnus X-1 Book II: Hemispheres,['Rush'],23,0.362,0.206,0.531,0,2,0.275,-12.872,0.0418,132.962
1794,Never Say Never,['Styx'],24,0.565,0.646,0.654,0,4,0.104,-10.733,0.0278,131.166
562,Make up Your Mind,['The J. Geils Band'],23,0.651,0.949,0.683,0,0,0.277,-10.22,0.0464,132.52
1217,Secrets,['The Runaways'],24,0.62,0.848,0.713,0,0,0.107,-10.813,0.0396,132.373
4230,The Greeting Song,['Red Hot Chili Peppers'],26,0.459,0.307,0.962,0,2,0.188,-10.653,0.0485,134.139
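
The one-liner above can be wrapped into a small reusable function. The sketch below is only an illustration: `make_playlist`, its parameters, and the toy `songs` catalogue are hypothetical names and made-up values, not part of the notebook. Unlike the cell above, it leaves the seed song out of the playlist.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Made-up toy catalogue (not the Spotify data)
songs = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "energy": [0.50, 0.52, 0.90, 0.48, 0.10, 0.55],
    "tempo": [120.0, 121.0, 180.0, 118.5, 60.0, 122.0],
})
feature_cols = ["energy", "tempo"]

def make_playlist(df, feature_cols, position, n_songs, sort_by):
    """Hypothetical helper: the n_songs rows nearest to df.iloc[position], seed excluded."""
    # One extra neighbor, because the seed song is part of the fitted data
    model = NearestNeighbors(n_neighbors=n_songs + 1).fit(df[feature_cols])
    query = df.iloc[[position]][feature_cols]
    _, indices = model.kneighbors(query)
    # Drop the seed, keep the n_songs closest remaining rows
    positions = [int(i) for i in indices[0] if i != position][:n_songs]
    return df.iloc[positions].sort_values(by=sort_by)

playlist = make_playlist(songs, feature_cols, position=0, n_songs=3, sort_by="tempo")
print(playlist)
```

Passing `sort_by` as an argument makes it easy to switch between ordering by `tempo`, `energy`, or any other column.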
