![](../images/bunker_studer.jpeg)
<br>
Photo: [*The Bunker*](https://www.thebunkerstudio.com/)

# Who Do You Sound Like?
### Notebook 4: Recommender System
#### Adam Zucker
---

## Contents
- **Section 1:** Package and data imports, preprocessing
- **Section 2:** Vector generation and dataframe creation
- **Section 3:** Recommender algorithm

---
---
### Section 1
#### Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp

from sklearn.metrics.pairwise import pairwise_distances, cosine_distances, cosine_similarity, manhattan_distances

import librosa as lib
import librosa.display as libd

In [2]:
# Importing cleaned Spotify dataframe
df = pd.read_csv('../data_clean/spotify_kg_master.csv')
df.head()

| | name | artists | tempo | key | mode | full_key | A minor | A# major | A# minor | B major | ... | energy | instrumentalness | speechiness | acousticness | danceability | valence | popularity | liveness | year | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Thunderstruck | ['AC/DC'] | 133.5 | 4 | 1 | E major | 0 | 0 | 0 | 0 | ... | 0.89 | 0.0117 | 0.0364 | 0.000147 | 0.502 | 0.259 | 83 | 0.217 | 1990 | 57bgtoPSgt236HzfBOd8kj |
| 1 | The Gift of Love | ['Bette Midler'] | 157.5 | 8 | 1 | G# major | 0 | 0 | 0 | 0 | ... | 0.467 | 0.0 | 0.0287 | 0.359 | 0.486 | 0.286 | 38 | 0.11 | 1990 | 7FUc1xVSKvABmVwI6kS5Y4 |
| 2 | Thelma - Bonus Track | ['Paul Simon'] | 94.0 | 5 | 1 | F major | 0 | 0 | 0 | 0 | ... | 0.529 | 0.0845 | 0.077 | 0.872 | 0.71 | 0.882 | 29 | 0.093 | 1990 | 7pcEC5r1jVqWGRypo9D7f7 |
| 3 | How I Need You | ['Bad Boys Blue'] | 123.1 | 9 | 0 | A minor | 1 | 0 | 0 | 0 | ... | 0.67 | 0.00347 | 0.0398 | 0.0724 | 0.652 | 0.963 | 44 | 0.119 | 1990 | 1yq8h4zD0IDT5X1YTaEwZh |
| 4 | Nunca Dudes De Mi | ['El Golpe'] | 143.1 | 4 | 1 | E major | 0 | 0 | 0 | 0 | ... | 0.49 | 0.0 | 0.0295 | 0.151 | 0.476 | 0.514 | 31 | 0.305 | 1990 | 5kNYkLFs3WFFgE6qhfWDEm |


---

**BELOW:** Preprocessing and formatting the Spotify dataframe for conversion to a sparse matrix.

In [3]:
# Creating a copy of the dataframe to use for the recommender system
temp_df = df.copy()
print(temp_df.shape)
print(df.shape)

(56798, 41)
(56798, 41)


In [4]:
temp_df.columns

Index(['name', 'artists', 'tempo', 'key', 'mode', 'full_key', 'A minor',
       'A# major', 'A# minor', 'B major', 'B minor', 'C major', 'C minor',
       'C# major', 'C# minor', 'D major', 'D minor', 'D# major', 'D# minor',
       'E major', 'E minor', 'F major', 'F minor', 'F# major', 'F# minor',
       'G major', 'G minor', 'G# major', 'G# minor', 'loudness', 'duration_s',
       'energy', 'instrumentalness', 'speechiness', 'acousticness',
       'danceability', 'valence', 'popularity', 'liveness', 'year', 'id'],
      dtype='object')

In [5]:
# Combining 'name' and 'artists' features to use as indices
temp_df['name_and_artists'] = temp_df['name'] + ' - ' + temp_df['artists']
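
Since the combined label is meant to serve as an index, it can help to check whether any label repeats. The following is a minimal sketch on a hypothetical three-row frame (`toy_df`, `dupe_mask`, `n_dupes`, and `deduped` are illustrative names, not part of the notebook). Dropping repeats is one possible remedy, shown as an assumption rather than as the approach taken here.

```python
import pandas as pd

# Hypothetical stand-in rows; the notebook itself works on temp_df
toy_df = pd.DataFrame({
    'name': ['Thunderstruck', 'Thunderstruck', 'How I Need You'],
    'artists': ["['AC/DC']", "['AC/DC']", "['Bad Boys Blue']"],
})
toy_df['name_and_artists'] = toy_df['name'] + ' - ' + toy_df['artists']

# Flag every row whose combined label appears more than once
dupe_mask = toy_df['name_and_artists'].duplicated(keep=False)
n_dupes = int(dupe_mask.sum())
print(f'{n_dupes} rows share a label with another row')

# Assumed remedy for illustration only: keep the first occurrence of each label
deduped = toy_df.drop_duplicates(subset='name_and_artists', keep='first')
```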

In [6]:
# Dropping features that won't be included in the sparse matrix used for similarity comparisons
temp_df.drop(columns=['name', 'artists', 'key', 'mode', 'full_key', 'popularity', 'year', 'id'], inplace=True)

In [7]:
temp_df.columns

Index(['tempo', 'A minor', 'A# major', 'A# minor', 'B major', 'B minor',
       'C major', 'C minor', 'C# major', 'C# minor', 'D major', 'D minor',
       'D# major', 'D# minor', 'E major', 'E minor', 'F major', 'F minor',
       'F# major', 'F# minor', 'G major', 'G minor', 'G# major', 'G# minor',
       'loudness', 'duration_s', 'energy', 'instrumentalness', 'speechiness',
       'acousticness', 'danceability', 'valence', 'liveness',
       'name_and_artists'],
      dtype='object')

In [8]:
# Reordering columns
temp_df = temp_df[['name_and_artists', 'tempo', 'A minor', 'A# major', 'A# minor', 'B major', 'B minor', 'C major', 
                   'C minor', 'C# major', 'C# minor', 'D major', 'D minor', 'D# major', 'D# minor', 'E major', 'E minor', 
                   'F major', 'F minor', 'F# major', 'F# minor', 'G major', 'G minor', 'G# major', 'G# minor', 'loudness', 
                   'duration_s', 'energy', 'instrumentalness', 'speechiness', 'acousticness', 'danceability', 'valence', 
                   'liveness']]
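
The preprocessing above is aimed at converting the feature columns to a sparse matrix. A minimal sketch of that conversion is shown here, assuming `scipy.sparse.csr_matrix` as the target format and using a small hypothetical frame (`toy_df`) in place of `temp_df`. The names `features` and `sparse_features` are illustrative.

```python
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for temp_df with a handful of its columns
toy_df = pd.DataFrame({
    'name_and_artists': ["Song A - ['Artist 1']",
                         "Song B - ['Artist 2']",
                         "Song C - ['Artist 3']"],
    'tempo': [120.0, 95.5, 140.2],
    'A minor': [1, 0, 0],
    'E major': [0, 1, 0],
    'energy': [0.8, 0.4, 0.6],
})

# Move the label column into the index so only numeric features remain
features = toy_df.set_index('name_and_artists')

# Build a compressed sparse row matrix; zero entries (mostly the one-hot key columns) are not stored
sparse_features = sparse.csr_matrix(features.to_numpy(dtype=float))

print(sparse_features.shape)
print(sparse_features.nnz, 'stored values')
```

The one-hot key columns are mostly zeros, which is where a sparse representation saves the most space. The continuous features (tempo, energy, and so on) are stored in full.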

---
---
### Section 2
#### Vector Generation

In [10]:
# Creating a pivot table of my features
#p_table = temp_df.pivot_table(index='name_and_artists', columns=temp_df.drop(columns='name_and_artists').columns)
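
The commented-out call above passes the feature names through `columns`. In pandas, `pivot_table` expects features that should each stay in their own column to be passed through `values`, and it aggregates rows that share an index label (by mean, unless `aggfunc` says otherwise). A minimal sketch under those assumptions, on a hypothetical toy frame rather than `temp_df`:

```python
import pandas as pd

# Hypothetical rows; the first two share a label to show the aggregation
toy_df = pd.DataFrame({
    'name_and_artists': ["Song A - ['Artist 1']",
                         "Song A - ['Artist 1']",
                         "Song B - ['Artist 2']"],
    'tempo': [120.0, 124.0, 95.0],
    'energy': [0.8, 0.6, 0.4],
})

# Every column except the label is treated as a feature
feature_cols = toy_df.drop(columns='name_and_artists').columns.tolist()

# One row per label, one column per feature; duplicate labels are averaged
p_table = toy_df.pivot_table(index='name_and_artists', values=feature_cols)
print(p_table)
```

Whether averaging duplicate labels is appropriate depends on the data. Averaging one-hot key columns, for example, would produce fractional values.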