# Self-study try-it activity 8.2: Identifying similar songs using real-world data

In this activity, you will work with an open-source music database that contains many attributes for identifying the type of music, including tempo, genre and energy levels, to name a few. Then, you'll apply the k-nearest neighbours (KNN) algorithm to the data set to recommend songs that are similar to a selected song based on these features.

The data comes from the 'Spotify Tracks Dataset', a large-scale, structured and feature-rich data set of music tracks from Spotify that is designed primarily for data analysis, machine learning and recommendation system development. It provides both metadata and detailed audio features for each track in a convenient CSV format, making it accessible for a wide range of music data science applications.

You can download the 'Spotify Tracks Dataset' [here](https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset).


In [None]:
import pandas as pd

# Load your data set CSV file
data = pd.read_csv('/content/dataset.csv')

# View the first five rows (the default for head())
print(data.head())

# Check for missing values
print(data.isnull().sum())
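
The missing-value check above matters because distance calculations are not meaningful for rows that lack a feature value. The following is a minimal sketch of one way to handle this, assuming a small made-up table (`toy`) rather than the real CSV; the column values are hypothetical. Resetting the index after dropping rows keeps row labels and row positions in agreement, which the recommendation functions later in this activity rely on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real CSV file
toy = pd.DataFrame({
    "track_name": ["a", "b", "c", "d"],
    "tempo": [120.0, np.nan, 95.0, 140.0],
    "energy": [0.8, 0.5, 0.3, np.nan],
    "danceability": [0.7, 0.6, 0.4, 0.9],
})
feature_cols = ["tempo", "energy", "danceability"]

# Count missing values in the feature columns only
missing_per_feature = toy[feature_cols].isnull().sum()
print(missing_per_feature)

# Drop rows that lack any feature value, then renumber the rows from 0
toy_clean = toy.dropna(subset=feature_cols).reset_index(drop=True)
print(toy_clean)
```

Dropping rows is only one option; whether it is appropriate depends on how many rows are affected.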


Select the input `features`, such as `tempo`, `energy` and `danceability`: `features = ['tempo', 'energy', 'danceability']`. KNN uses these features to measure how similar two songs are. The `track_genre` column is stored separately so that it can be used later as a filter.

In [None]:
from sklearn.preprocessing import StandardScaler

# Select features for the recommendation
features = ['tempo', 'energy', 'danceability']

X = data[features]

# Scale the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# If the filter used here is `track_genre`, then:
genres = data['track_genre']
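
To see why the features are scaled before KNN is applied, consider the sketch below. It uses a few made-up rows (`X_toy`, hypothetical values) in which tempo is measured in beats per minute while the other two columns lie between 0 and 1. Without scaling, the Euclidean distance is driven almost entirely by tempo.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values; columns are tempo (BPM), energy, danceability
X_toy = np.array([
    [90.0, 0.20, 0.30],
    [120.0, 0.90, 0.80],
    [150.0, 0.50, 0.55],
    [180.0, 0.40, 0.35],
])

# Standardise each column to zero mean and unit variance
X_toy_scaled = StandardScaler().fit_transform(X_toy)

# Distance between rows 0 and 1 before and after scaling
raw_distance = np.linalg.norm(X_toy[0] - X_toy[1])
scaled_distance = np.linalg.norm(X_toy_scaled[0] - X_toy_scaled[1])

print("Raw distance:", raw_distance)        # dominated by the tempo gap of 30
print("Scaled distance:", scaled_distance)  # all three features contribute
print("Column means:", X_toy_scaled.mean(axis=0).round(6))
print("Column std devs:", X_toy_scaled.std(axis=0).round(6))
```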


Using a general KNN recommendation, find the songs that are nearest to a selected song in feature space and return their `artists`.

In [None]:
from sklearn.neighbors import NearestNeighbors
import numpy as np

# Fit a KNN model on the scaled features
knn = NearestNeighbors(n_neighbors=6, metric='euclidean')  # 6 because 1 will be the song itself
knn.fit(X_scaled)

def recommend_songs(song_index, n_recommendations=5):
    distances, indices = knn.kneighbors([X_scaled[song_index]], n_neighbors=n_recommendations+1)
    # Exclude the first index, as it is the song itself
    recommended_indices = indices[0][1:]
    recommended_songs = data.iloc[recommended_indices]
    #return recommended_songs[[ 'artists', 'loudness', 'popularity', 'liveness']]
    return recommended_songs[[ 'artists']]
# For example, recommend songs that are similar to the song at index 5
print(recommend_songs(5))
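
The function above drops the first neighbour on the assumption that it is the query song itself. If the data contain rows with identical feature values, that assumption may not hold, because tied distances can be returned in either order. The sketch below uses a handful of hypothetical points (`points`, with row 3 deliberately duplicating row 0) to show an alternative that removes the query by its index instead of by its position in the result.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical, already-scaled feature rows; row 3 duplicates row 0 on purpose
points = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [5.0, 5.0],
    [0.0, 0.0],
    [0.0, 2.0],
])

nn = NearestNeighbors(n_neighbors=3, metric="euclidean").fit(points)

query = 0
# points[[query]] keeps the 2D shape (1, n_features) that kneighbors expects
distances, indices = nn.kneighbors(points[[query]])

# Remove the query by index rather than by position, then keep two neighbours
neighbour_ids = [int(i) for i in indices[0] if i != query][:2]
print(neighbour_ids)
```

Here the duplicate row is kept as a recommendation; whether duplicates should be removed beforehand is a separate design decision.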


The genre-filtered KNN recommendation filters the data set to include only songs of the same genre as the selected song before fitting KNN. The output lists the `album_name`, `track_genre`, `tempo` and `energy` of each recommended song.

In [None]:
def recommend_songs_same_genre(song_index, n_recommendations=5):
    song_genre = genres.iloc[song_index]
    # Filter the data set by genre
    genre_filter = data['track_genre'] == song_genre
    X_genre = X_scaled[genre_filter]
    knn_genre = NearestNeighbors(n_neighbors=n_recommendations+1)
    knn_genre.fit(X_genre)

    # Find the relative index of the song in the filtered data set
    filtered_indices = data[genre_filter].index.tolist()
    song_relative_index = filtered_indices.index(song_index)

    distances, indices = knn_genre.kneighbors([X_genre[song_relative_index]], n_neighbors=n_recommendations+1)
    recommended_indices = [filtered_indices[i] for i in indices[0][1:]]
    recommended_songs = data.loc[recommended_indices]
    return recommended_songs[['album_name','track_genre', 'tempo', 'energy']]

print(recommend_songs_same_genre(10))
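
The genre-filtered approach can also be written in terms of row positions only, which avoids mixing index labels with positions. The following is a minimal sketch under stated assumptions: `toy` is a made-up table, `same_genre_neighbours` is a hypothetical helper name, and scaling is omitted to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data standing in for the real data set
toy = pd.DataFrame({
    "track_name": ["a", "b", "c", "d", "e", "f"],
    "track_genre": ["rock", "jazz", "rock", "rock", "jazz", "rock"],
    "tempo": [120.0, 90.0, 122.0, 160.0, 95.0, 118.0],
    "energy": [0.80, 0.30, 0.82, 0.95, 0.35, 0.40],
})
X_toy = toy[["tempo", "energy"]].to_numpy()  # scaling omitted for brevity

def same_genre_neighbours(position, k=2):
    """Return up to k songs of the same genre nearest to the song at `position`."""
    genre = toy["track_genre"].iloc[position]
    # Row positions of all songs that share the genre
    positions = np.flatnonzero((toy["track_genre"] == genre).to_numpy())
    # A genre with few songs cannot supply more than len(positions) - 1 neighbours
    k = min(k, len(positions) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_toy[positions])
    _, idx = nn.kneighbors(X_toy[[position]])
    # Map back to positions in the full table and drop the query song
    matches = [int(positions[i]) for i in idx[0] if positions[i] != position]
    return toy.iloc[matches[:k]]

result = same_genre_neighbours(0)
print(result[["track_name", "track_genre", "tempo", "energy"]])
```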


Now that you have explored the provided music data set, which includes features such as tempo, energy and danceability, you can see that it contains a diverse range of songs that spans multiple genres and audio characteristics. The data set is well suited for music recommendation tasks, clustering and feature-based analysis. Using these features, you can build effective machine learning models, such as KNN, to recommend songs with similar musical characteristics.

### To-do:

1. Using `popularity`, `liveness` and `tempo` as the features, use KNN to find songs that are similar to a selected song. For each recommended song, return `['album_name', 'track_name', 'duration_ms']`.

2. Filter the data set by `track_genre`, and for each recommended song, return `['album_name', 'loudness', 'acousticness']`.

In [None]:
from sklearn.preprocessing import StandardScaler

# Select features for the recommendation
features = ['popularity', 'liveness', 'tempo']

X = data[features]

# Scale the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# If the filter used here is `track_genre`, then:
genres = data['track_genre']

In [None]:
from sklearn.neighbors import NearestNeighbors
import numpy as np

# Fit the KNN model on the scaled features
knn = NearestNeighbors(n_neighbors=6, metric='euclidean')  # 6 because 1 will be the song itself
knn.fit(X_scaled)

def recommend_songs(song_index, n_recommendations=5):
    distances, indices = knn.kneighbors([X_scaled[song_index]], n_neighbors=n_recommendations+1)
    # Exclude the first index, as it is the song itself
    recommended_indices = indices[0][1:]
    recommended_songs = data.iloc[recommended_indices]
    return recommended_songs[[ 'album_name', 'track_name', 'duration_ms']]

# For example, recommend songs that are similar to the song at index 5
print(recommend_songs(5))

In [None]:
def recommend_songs_same_genre(song_index, n_recommendations=5):
    song_genre = genres.iloc[song_index]
    # Filter the data set by genre
    genre_filter = data['track_genre'] == song_genre
    X_genre = X_scaled[genre_filter]
    knn_genre = NearestNeighbors(n_neighbors=n_recommendations+1)
    knn_genre.fit(X_genre)

    # Find the relative index of the song in the filtered data set
    filtered_indices = data[genre_filter].index.tolist()
    song_relative_index = filtered_indices.index(song_index)

    distances, indices = knn_genre.kneighbors([X_genre[song_relative_index]], n_neighbors=n_recommendations+1)
    recommended_indices = [filtered_indices[i] for i in indices[0][1:]]
    recommended_songs = data.loc[recommended_indices]
    return recommended_songs[['album_name','loudness', 'acousticness']]

print(recommend_songs_same_genre(10))
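
Because the to-do repeats the earlier steps with different features and output columns, the pattern could be wrapped in a single reusable function. The sketch below is one possible way to do that, not the activity's prescribed solution; `make_recommender` and the `toy` table are hypothetical and introduced only for illustration.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def make_recommender(frame, feature_cols, output_cols):
    """Scale the chosen features, fit KNN and return a recommend() function."""
    X = StandardScaler().fit_transform(frame[feature_cols])
    nn = NearestNeighbors(metric="euclidean").fit(X)

    def recommend(position, n_recommendations=5):
        # Ask for one extra neighbour because the query song matches itself
        k = min(n_recommendations + 1, len(frame))
        _, idx = nn.kneighbors(X[[position]], n_neighbors=k)
        keep = [i for i in idx[0] if i != position][:n_recommendations]
        return frame.iloc[keep][output_cols]

    return recommend

# Hypothetical toy data standing in for the real data set
toy = pd.DataFrame({
    "track_name": ["a", "b", "c", "d", "e"],
    "popularity": [50, 52, 10, 90, 49],
    "liveness": [0.10, 0.12, 0.80, 0.30, 0.11],
    "tempo": [120.0, 121.0, 80.0, 170.0, 119.0],
})

recommend = make_recommender(toy, ["popularity", "liveness", "tempo"], ["track_name", "tempo"])
top_two = recommend(0, n_recommendations=2)
print(top_two)
```

With this shape, switching to a different feature set or a different set of output columns only requires another call to `make_recommender`.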