# Recommender Model

### Problem Statement

Spotify has recently decided to improve the recommendation system in its app by adding a more content-based approach. As a data scientist, I was hired to build a recommender that suggests songs based on artist and song title and that also favors songs released in a similar time frame. Spotify asked me to focus on track features such as Danceability, Instrumentalness, and Popularity. The goal is to improve the listener experience and make the app more appealing than competitors such as Apple Music and TIDAL, using a model that analyzes the characteristics of each track to produce more niche recommendations.

#### Imports and Reading in Data

In [1]:
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity, cosine_distances, pairwise_distances
from sklearn.feature_extraction.text import TfidfVectorizer
import re
import pickle
```

In [2]:
```python
songs = pd.read_csv('../data/clean_data.csv', index_col=0)
print(songs.shape)
songs.head(2)
```

(8716, 15)


| Unnamed: 0 | Track | Artist | Genre | Album Title | Album Type | Release Date | Thumbnail | Acousticness | Danceability | Energy | Instrumentalness | Popularity | Speechiness | Track ID | Release Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Shook Ones, Pt. II | Mobb Deep | east coast hip hop, hardcore hip hop, hip hop,... | The Infamous | album | 1995-04-25 | https://i.scdn.co/image/ab67616d00004851a2203f... | 0.0146 | 0.763 | 0.786 | 0.0114 | 0.787879 | 0.229 | 33ZXjLCpiINn8eQIDYEPTD | 1995 |
| 1 | Hypnotize - 2014 Remaster | The Notorious B.I.G. | east coast hip hop, gangster rap, hardcore hip... | Life After Death (2014 Remastered Edition) | album | 1997-03-04 | https://i.scdn.co/image/ab67616d00004851fde79b... | 0.145 | 0.901 | 0.697 | 0.0 | 0.838384 | 0.28 | 7KwZNVEaqikRSBSpyhXK2j | 1997 |


#### Creating a Track + Artist Column

In [3]:
```python
# First, create a column that combines the track title and the artist
songs['Track_Artist'] = songs['Track'] + ' -' + songs['Artist']

# Next, set the new column as the index
songs.set_index('Track_Artist', inplace=True)

# Finally, check .head(2) to make sure it worked
songs.head(2)
```

| Track_Artist | Track | Artist | Genre | Album Title | Album Type | Release Date | Thumbnail | Acousticness | Danceability | Energy | Instrumentalness | Popularity | Speechiness | Track ID | Release Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Shook Ones, Pt. II -Mobb Deep | Shook Ones, Pt. II | Mobb Deep | east coast hip hop, hardcore hip hop, hip hop,... | The Infamous | album | 1995-04-25 | https://i.scdn.co/image/ab67616d00004851a2203f... | 0.0146 | 0.763 | 0.786 | 0.0114 | 0.787879 | 0.229 | 33ZXjLCpiINn8eQIDYEPTD | 1995 |
| Hypnotize - 2014 Remaster -The Notorious B.I.G. | Hypnotize - 2014 Remaster | The Notorious B.I.G. | east coast hip hop, gangster rap, hardcore hip... | Life After Death (2014 Remastered Edition) | album | 1997-03-04 | https://i.scdn.co/image/ab67616d00004851fde79b... | 0.145 | 0.901 | 0.697 | 0.0 | 0.838384 | 0.28 | 7KwZNVEaqikRSBSpyhXK2j | 1997 |


#### Model Creation

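Below we create two helper functions for matching text against the index.
`preprocess_input` normalizes the user's query: it lowercases the string, trims surrounding whitespace, and removes every character that is not a letter, digit, or whitespace. `remove_special_characters` applies the same character filter to the index entries, without changing their case. Because both sides are stripped of special characters, and the index is lowercased at lookup time, queries match regardless of special characters or capitalization.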

In [7]:
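```python
def preprocess_input(input_string):
    """Normalize a user query: lowercase, trim, and drop non-alphanumeric characters."""
    processed_string = re.sub(r'[^a-zA-Z0-9\s]', '', input_string.lower().strip())
    return processed_string

def remove_special_characters(text):
    """Drop every character that is not a letter, digit, or whitespace (case is kept)."""
    pattern = r'[^a-zA-Z0-9\s]'
    return re.sub(pattern, '', text)
```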
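A quick check of what the two helpers return. The functions are copied from the cell above so the snippet runs on its own; the index entry is illustrative and not necessarily in the dataset.

```python
import re

def preprocess_input(input_string):
    # Same as above: lowercase, trim, keep only letters, digits, and whitespace
    return re.sub(r'[^a-zA-Z0-9\s]', '', input_string.lower().strip())

def remove_special_characters(text):
    # Same as above: keep only letters, digits, and whitespace (case preserved)
    return re.sub(r'[^a-zA-Z0-9\s]', '', text)

entry = 'Plain Jane - A$AP Ferg'  # illustrative "Track - Artist" index entry
query = '  A$AP Ferg '            # user input with a symbol and stray spaces

cleaned_entry = remove_special_characters(entry)
cleaned_query = preprocess_input(query)

print(repr(cleaned_entry))                     # 'Plain Jane  AAP Ferg'
print(repr(cleaned_query))                     # 'aap ferg'
print(cleaned_query in cleaned_entry.lower())  # True
```

Removing the hyphen from `' - '` leaves a double space between title and artist. That is harmless here, because the lookup only checks whether each query string is contained in the lowercased entry.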

In [8]:
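```python
def recommend_song(songs, similarity_matrix, song_name, artist_name, category, top_n=10):
    """Return the top_n songs most similar to the first song matching the query."""
    category = category.lower()
    if category not in ('track', 'artist', 'genre', 'album title'):
        raise ValueError(f"Unsupported category: {category!r}")

    # Normalize the query the same way the index was cleaned
    processed_song_name = preprocess_input(song_name)
    processed_artist_name = preprocess_input(artist_name)

    # Find rows whose index contains both query strings
    # (an empty string matches every row, so either field may be left blank)
    lowered_index = songs.index.str.lower()
    song_indices = np.where(lowered_index.str.contains(processed_song_name, regex=False)
                            & lowered_index.str.contains(processed_artist_name, regex=False))[0]
    if len(song_indices) == 0:
        return None

    song_index = song_indices[0]
    # Rank every song by similarity to the matched song, highest first
    sorted_indices = np.argsort(similarity_matrix[song_index])[::-1]

    recommended_songs = []
    for index in sorted_indices:
        if len(recommended_songs) >= top_n:
            break
        # Skip the query song itself: it is always the most similar to itself
        if index == song_index:
            continue
        song = songs.iloc[index]
        if category == 'track':
            recommendation = song.name
        elif category == 'artist':
            recommendation = song['Artist']
        elif category == 'genre':
            recommendation = song['Genre']
        else:  # 'album title'
            recommendation = song['Album Title']
        # Skip results whose value in the chosen category equals the query title
        if recommendation.lower() != song_name.lower():
            recommended_songs.append(song[['Thumbnail', 'Track', 'Artist', 'Album Title', 'Release Year']])

    return pd.DataFrame(recommended_songs) if recommended_songs else None
```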
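The ranking step inside `recommend_song` boils down to sorting one row of the similarity matrix. Below is a minimal sketch with a made-up similarity row; `top_n_similar` is a hypothetical helper, not part of the notebook. It removes the query's own position explicitly, since a song's cosine similarity with itself is 1.0, the highest possible score.

```python
import numpy as np

def top_n_similar(similarity_row, query_index, top_n=10):
    """Hypothetical helper: indices of the top_n most similar items, excluding the query."""
    # argsort sorts ascending, so reverse it to rank from most to least similar
    ranked = np.argsort(similarity_row)[::-1]
    # Drop the query's own position from the ranking
    ranked = ranked[ranked != query_index]
    return ranked[:top_n]

# Made-up similarities between item 0 and five items (item 0 scores 1.0 with itself)
row = np.array([1.0, 0.2, 0.9, 0.5, 0.7])
print(top_n_similar(row, query_index=0, top_n=3))  # [2 4 3]
```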

In [9]:
```python
def create_recommender():
    # Load the data and index it by "Track - Artist"
    songs = pd.read_csv('../data/clean_data.csv', index_col=0)
    songs['Track_Artist'] = songs['Track'] + ' - ' + songs['Artist']
    songs.set_index('Track_Artist', inplace=True)

    # Strip special characters from the index so queries match with or without them
    songs.index = songs.index.map(remove_special_characters)

    # Scale the numeric features (including Release Year) to [0, 1]
    features = ['Acousticness', 'Danceability', 'Energy', 'Instrumentalness', 'Popularity', 'Speechiness', 'Release Year']
    scaler = MinMaxScaler()
    normalized_features = scaler.fit_transform(songs[features])

    # Turn the comma-separated genre strings into TF-IDF vectors
    tfidf_vectorizer = TfidfVectorizer()
    genre_matrix = tfidf_vectorizer.fit_transform(songs['Genre'].fillna(''))

    # Combine scaled features and genre vectors. Release Year is already included
    # (scaled) above; appending the raw year as well (values around 2000) would
    # dominate every vector and push almost all cosine similarities toward 1.
    combined_matrix = np.hstack([normalized_features, genre_matrix.toarray()])
    similarity_matrix = cosine_similarity(combined_matrix)

    # Create the recommender dictionary
    recommender = {
        'songs': songs,
        'similarity_matrix': similarity_matrix,
        'tfidf_vectorizer': tfidf_vectorizer,
    }

    return recommender

# Create the recommender
recommender = create_recommender()
```
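Scaling matters for cosine similarity because cosine similarity depends only on the direction of each vector. A column with values around 2000 next to features in [0, 1] sets that direction almost on its own, so songs with very different audio features look nearly identical, and the year itself stops mattering too (1995 and 2020 point the same way). The two songs below are made up: `[Danceability, Energy, Release Year]`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Two made-up songs with opposite audio features and different release years
raw = np.array([
    [0.9, 0.8, 1995.0],
    [0.1, 0.2, 2020.0],
])

# Unscaled: the year axis dominates both vectors, so they look almost identical
raw_sim = cosine_similarity(raw)[0, 1]

# Min-max scaled: every column now spans [0, 1] and the differences show up
scaled = MinMaxScaler().fit_transform(raw)
scaled_sim = cosine_similarity(scaled)[0, 1]

print(raw_sim)     # very close to 1
print(scaled_sim)  # 0.0 for these two rows
```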

In [10]:
```python
# Pickle the recommender
with open('rec.pkl', 'wb') as f:
    pickle.dump(recommender, f)
```
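The pickled similarity matrix grows with the square of the number of songs: for 8,716 songs it holds 8,716 × 8,716 ≈ 76 million float64 values, roughly 608 MB before pickle overhead. One alternative, which is a different design from the notebook's and shown only as a sketch, is to pickle the combined feature matrix instead and compute a single row of similarities when a query arrives. `similarity_row` is a hypothetical helper, and the random matrix is a stand-in for the real combined features.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def similarity_row(feature_matrix, query_index):
    """Hypothetical helper: cosine similarity between one song and every song."""
    # Indexing with a list keeps the query as a 2-D (1, n_features) array
    return cosine_similarity(feature_matrix[[query_index]], feature_matrix)[0]

rng = np.random.default_rng(0)
features = rng.random((100, 8))  # stand-in for the combined feature matrix

row = similarity_row(features, query_index=5)
full = cosine_similarity(features)

print(row.shape)                  # (100,)
print(np.allclose(row, full[5]))  # True: same values as row 5 of the full matrix
```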

---

*NOTE: We apply a TF-IDF vectorizer to the Genre column because Spotify lists several genres for most songs. TF-IDF weights each genre word by how rare it is across the dataset, so a word that appears on many songs carries less weight than a more specific word that appears on few. This helps distinguish two songs that share the same general genre but differ in their more specific genre tags, which leads to more accurate recommendations.*

---
### Using the Model to Make Recommendations
The model can make recommendations based on song title and artist, song title only, or artist only. Queries also work with or without special characters, thanks to the regex-based text functions defined above.

> For example, artists like "A$AP Ferg" can be looked up either with the dollar sign or by typing "AAP Ferg" without it. Capitalization does not matter either.
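Title-only and artist-only lookups work because `str.contains('')` is `True` for every row: leaving one field empty means only the other field filters the index. The sketch below reproduces the matching step from `recommend_song` on a hypothetical cleaned index; `find_matches` is an illustrative helper, not part of the notebook.

```python
import re
import pandas as pd

def preprocess_input(input_string):
    # Copied from the notebook: lowercase, trim, keep only letters, digits, and whitespace
    return re.sub(r'[^a-zA-Z0-9\s]', '', input_string.lower().strip())

# Hypothetical index after remove_special_characters ("Track - Artist" minus symbols)
index = pd.Index([
    'Plain Jane  AAP Ferg',
    'Shook Ones Pt II  Mobb Deep',
    'Hypnotize  2014 Remaster  The Notorious BIG',
])
lowered = index.str.lower()

def find_matches(song_name, artist_name):
    # An empty query string matches every row, so either field can be left blank
    mask = (lowered.str.contains(preprocess_input(song_name), regex=False)
            & lowered.str.contains(preprocess_input(artist_name), regex=False))
    return list(index[mask])

print(find_matches('Shook Ones, Pt. II', 'Mobb Deep'))  # title + artist
print(find_matches('', 'A$AP FERG'))                    # artist only
print(find_matches('hypnotize', ''))                    # title only
```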

- This [StackOverflow Thread](https://stackoverflow.com/questions/54396405/how-can-i-preprocess-nlp-text-lowercase-remove-special-characters-remove-numb) helped me come up with the functions for the text.
- The [Machine Learning Geek](https://machinelearninggeek.com/spotify-song-recommender-system-in-python/) article on Spotify recommenders helped me decide which numeric features to use and also helped with writing the recommender function.
- I also used this [Towards Data Science](https://towardsdatascience.com/using-cosine-similarity-to-build-a-movie-recommendation-system-ae7f20842599) movie recommender as inspiration for my approach to the problem statement.
- Lastly, this [video](https://youtu.be/eyEabQRBMQA) provided guidance when writing the `recommend_song` function, which I extended to return the additional columns I needed.