# Music Recommendation Model
- We are going to build two music recommendation models using data from Spotify. Each model takes a track and returns a few similar songs to listen to. The first model uses a dataset that describes each track by audio features such as tempo, valence, energy, and danceability. The second is based on listening behavior: it recommends songs that users who often listen to the input track also listen to. Finally, we will compare the results from both models and judge subjectively which one gives better recommendations.

- Let's start with the song feature model, using the spotify_dataset.csv file in the repository. We need to clean the data first; then we can use the dataset to build a model.

In [17]:
pip install pandas numpy scikit-learn matplotlib







In [55]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Read and examine the data set
data = pd.read_csv("spotify_dataset.csv")
data.head()

Unnamed: 0.1,Unnamed: 0,track_id,artists,album_name,track_name,popularity,duration_ms,explicit,danceability,energy,...,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature,track_genre
0,0,5SuOikwiRyPMVoIQDJUgSV,Gen Hoshino,Comedy,Comedy,73,230666,False,0.676,0.461,...,-6.746,0,0.143,0.0322,1e-06,0.358,0.715,87.917,4,acoustic
1,1,4qPNDBW1i3p13qLCt0Ki3A,Ben Woodward,Ghost (Acoustic),Ghost - Acoustic,55,149610,False,0.42,0.166,...,-17.235,1,0.0763,0.924,6e-06,0.101,0.267,77.489,4,acoustic
2,2,1iJBSr7s7jYXzM8EGcbK5b,Ingrid Michaelson;ZAYN,To Begin Again,To Begin Again,57,210826,False,0.438,0.359,...,-9.734,1,0.0557,0.21,0.0,0.117,0.12,76.332,4,acoustic
3,3,6lfxq3CG4xtTiEg7opyCyx,Kina Grannis,Crazy Rich Asians (Original Motion Picture Sou...,Can't Help Falling In Love,71,201933,False,0.266,0.0596,...,-18.515,1,0.0363,0.905,7.1e-05,0.132,0.143,181.74,3,acoustic
4,4,5vjLSffimiIP26QG5WcN2K,Chord Overstreet,Hold On,Hold On,82,198853,False,0.618,0.443,...,-9.681,1,0.0526,0.469,0.0,0.0829,0.167,119.949,4,acoustic


In [56]:
# Inspect the column names, non-null counts, and dtypes
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 114000 entries, 0 to 113999
Data columns (total 21 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Unnamed: 0        114000 non-null  int64  
 1   track_id          114000 non-null  object 
 2   artists           113999 non-null  object 
 3   album_name        113999 non-null  object 
 4   track_name        113999 non-null  object 
 5   popularity        114000 non-null  int64  
 6   duration_ms       114000 non-null  int64  
 7   explicit          114000 non-null  bool   
 8   danceability      114000 non-null  float64
 9   energy            114000 non-null  float64
 10  key               114000 non-null  int64  
 11  loudness          114000 non-null  float64
 12  mode              114000 non-null  int64  
 13  speechiness       114000 non-null  float64
 14  acousticness      114000 non-null  float64
 15  instrumentalness  114000 non-null  float64
 16  liveness          11

In [57]:
# It looks like we have a null value in artists, album_name, and track_name, so let's drop the affected rows
data.dropna(inplace=True)

# Then let's re-examine the columns after this change
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 113999 entries, 0 to 113999
Data columns (total 21 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Unnamed: 0        113999 non-null  int64  
 1   track_id          113999 non-null  object 
 2   artists           113999 non-null  object 
 3   album_name        113999 non-null  object 
 4   track_name        113999 non-null  object 
 5   popularity        113999 non-null  int64  
 6   duration_ms       113999 non-null  int64  
 7   explicit          113999 non-null  bool   
 8   danceability      113999 non-null  float64
 9   energy            113999 non-null  float64
 10  key               113999 non-null  int64  
 11  loudness          113999 non-null  float64
 12  mode              113999 non-null  int64  
 13  speechiness       113999 non-null  float64
 14  acousticness      113999 non-null  float64
 15  instrumentalness  113999 non-null  float64
 16  liveness          113999 
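
- As a complement to reading the `info()` output, the null counts can be checked directly. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration and are not taken from spotify_dataset.csv); it shows how `isna().sum()` reports missing values per column and how many rows `dropna()` removes.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "track_id": ["a", "b", "c"],
    "artists": ["X", None, "Z"],
    "track_name": ["s1", None, "s3"],
})

# Count missing values in each column
null_counts = toy.isna().sum()
print(null_counts)

# Drop every row that has at least one missing value
cleaned = toy.dropna()
print(len(toy) - len(cleaned), "row(s) dropped")
```

On the real data, the same two calls would be applied to `data` instead of `toy`.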

In [58]:
# We also have to look out for duplicate track IDs in this dataset and remove them so that we do not get the same song twice in recommendations
duplicate_rows = data['track_id'].duplicated().sum()
if duplicate_rows != 0:
    data = data.drop_duplicates(subset=['track_id'])

# Display the number of duplicate rows that were removed
duplicate_rows

24259
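
- One side effect of dropping rows is worth keeping in mind: the DataFrame keeps its original index labels, so they no longer match row positions. The sketch below uses a hypothetical toy frame (the values are invented) to show what `drop_duplicates(subset=...)` keeps and how `reset_index(drop=True)` realigns labels with positions. It assumes that a duplicated `track_id` may appear under more than one genre, in which case only the first occurrence survives.

```python
import pandas as pd

# Hypothetical toy frame: track "a" is listed twice, under two genres
toy = pd.DataFrame({
    "track_id": ["a", "b", "a", "c"],
    "track_name": ["s1", "s2", "s1", "s3"],
    "track_genre": ["rock", "pop", "alt-rock", "pop"],
})

# duplicated() marks every occurrence after the first
n_dupes = toy["track_id"].duplicated().sum()

# Keep the first occurrence of each track_id
deduped = toy.drop_duplicates(subset=["track_id"])
labels_before = deduped.index.tolist()  # original labels, with a gap

# Renumber rows 0..n-1 so labels and positions agree again
deduped = deduped.reset_index(drop=True)
labels_after = deduped.index.tolist()

print(n_dupes, labels_before, labels_after)
```

This matters later whenever a DataFrame label is used to index a plain NumPy array, which is purely positional.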

In [87]:
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

# Select relevant audio features to input into our Nearest Neighbors model
feature_columns = ["danceability", "energy", "loudness", "speechiness", "acousticness",
                   "instrumentalness", "liveness", "valence", "tempo"]

# Standardize the numerical features
scaler = StandardScaler()
scaled_features = scaler.fit_transform(data[feature_columns])

# Fit a nearest neighbors model using our scaled features
nn_model = NearestNeighbors(n_neighbors=6, metric="cosine") # 6 neighbors for 5 recommendations and our chosen song
nn_model.fit(scaled_features)

# Create a function to recommend songs based on a given track
def recommend_songs_knn(track_name, top_n=5):
    if track_name not in data["track_name"].values:
        return f"Track '{track_name}' not found in dataset"
    
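    # Extract the positional row of the track: scaled_features is a NumPy array,
    # and the DataFrame index labels have gaps after dropna() and drop_duplicates()
    track_index = np.flatnonzero(data["track_name"].values == track_name)[0]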

    # Get nearest neighbors
    distances, indices = nn_model.kneighbors([scaled_features[track_index]], n_neighbors=top_n * 2)

    # Retrieve recommended track names and artists, avoiding duplicates
    seen_tracks = set()
    recommendations = []

    for idx in indices[0][1:]:  # Skip the first since it's the song itself
        track = data.iloc[idx][["track_name", "artists", "track_genre"]]
        track_tuple = (track["track_name"], track["artists"])

        if track_tuple not in seen_tracks:
            seen_tracks.add(track_tuple)
            recommendations.append(track)

        if len(recommendations) >= top_n:
            break

    return pd.DataFrame(recommendations)

# Example output
recommend_songs_knn("Lithium")

Unnamed: 0,track_name,artists,track_genre
30921,Used To Love (with Dean Lewis),Martin Garrix;Dean Lewis,edm
97668,Se For Arrumar Alguém,Murilo Huff,sertanejo
70053,髮如雪,Jay Chou,mandopop
100238,Game Lover,Los Caligaris,ska
9318,O Processo,Suellen Brum,brazil
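
- To make the mechanics of the model easier to follow, here is a minimal sketch of standardizing features and querying cosine nearest neighbors on four hypothetical tracks. The feature values are invented for illustration; only `StandardScaler` and `NearestNeighbors` are the same tools used above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [danceability, energy, tempo]
X = np.array([
    [0.80, 0.90, 128.0],  # track 0: upbeat
    [0.78, 0.88, 126.0],  # track 1: upbeat, close to track 0
    [0.20, 0.10, 70.0],   # track 2: mellow
    [0.25, 0.15, 72.0],   # track 3: mellow
])

# Standardize so tempo (large values) does not dominate the other features
X_scaled = StandardScaler().fit_transform(X)

# Ask for 2 neighbors: the query itself plus one recommendation
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X_scaled)
distances, indices = nn.kneighbors(X_scaled[[0]])

print(indices[0])    # positions of the nearest rows, query first
print(distances[0])  # cosine distances (0 means same direction)
```

Because the features are centered before the cosine distance is computed, two tracks count as similar when they deviate from the average track in the same direction, regardless of genre.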


- These recommendations aren't bad in terms of audio features, but the genres are all over the place. Some people may want similar songs within a particular genre, so let's create a function that can filter by genre.

In [94]:
def filtered_recommend_songs(track_name, genre=None, top_n=5):
    if track_name not in data["track_name"].values:
        return f"Track '{track_name}' not found in dataset"
    
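    # Extract the positional row of the track: scaled_features is a NumPy array,
    # and the DataFrame index labels have gaps after dropna() and drop_duplicates()
    track_index = np.flatnonzero(data["track_name"].values == track_name)[0]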

    # Get nearest neighbors
    distances, indices = nn_model.kneighbors([scaled_features[track_index]], n_neighbors=top_n * 1000)

    # Retrieve recommended track names and artists, avoiding duplicates
    seen_tracks = set()
    recommendations = []

    for idx in indices[0][1:]:  # Skip the first since it's the song itself
        track = data.iloc[idx][["track_name", "artists", "track_genre"]]
        track_tuple = (track["track_name"], track["artists"])

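        # Keep the track only if it passes the genre filter and is not a repeat
        if (genre is None or track["track_genre"] == genre) and track_tuple not in seen_tracks:
            seen_tracks.add(track_tuple)
            recommendations.append(track)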

        if len(recommendations) >= top_n:
            break

    return pd.DataFrame(recommendations)

filtered_recommend_songs("Lithium", genre="alt-rock")

Unnamed: 0,track_name,artists,track_genre
2218,Automobile,KALEO,alt-rock
2113,It's Not My Time - Acoustic,3 Doors Down,alt-rock
2705,Where Did You Sleep Last Night,Nirvana,alt-rock
2836,Para No Verme Más - En Vivo,La Vela Puerca,alt-rock
2547,Un Toque Mágico,Tex Tex,alt-rock
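
- Fetching `top_n * 1000` neighbors and then discarding most of them works, but it can still come up short for a rare genre. One possible alternative, sketched here under stated assumptions, is to restrict the candidates to the genre first and search only among those. The helper `recommend_within_genre` and the toy data are hypothetical and not part of the notebook; `features` is assumed to be a scaled array aligned row-for-row with `frame`.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def recommend_within_genre(frame, features, track_pos, genre, top_n=5):
    """Return up to top_n rows of `frame` in `genre` closest to row `track_pos`."""
    # Positions of all candidate rows in the requested genre, minus the query
    genre_pos = np.flatnonzero(frame["track_genre"].to_numpy() == genre)
    genre_pos = genre_pos[genre_pos != track_pos]
    if len(genre_pos) == 0:
        return frame.iloc[[]]  # empty frame with the same columns

    # Fit on the genre subset only, then query with the chosen track
    k = min(top_n, len(genre_pos))
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(features[genre_pos])
    _, idx = nn.kneighbors(features[[track_pos]])

    # Map subset positions back to positions in the full frame
    return frame.iloc[genre_pos[idx[0]]]

# Hypothetical toy data: one query track and three candidates
toy = pd.DataFrame({
    "track_name": ["q", "a", "b", "c"],
    "track_genre": ["grunge", "alt-rock", "pop", "alt-rock"],
})
toy_features = np.array([[1.0, 1.0], [1.0, 0.9], [1.0, 1.0], [-1.0, 0.5]])

result = recommend_within_genre(toy, toy_features, track_pos=0, genre="alt-rock", top_n=1)
print(result)
```

The trade-off is that a model is fitted per query, which is cheap for a single genre but would be worth caching if many queries hit the same genre.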
