# Music recommender system

A **recommender system** is a type of information filtering system that suggests items to users based on their preferences and behavior. It uses data analysis techniques to understand a user's behavior and to predict what they might like or be interested in.

There are two main types of recommender systems:

**Content-based recommender systems**: This type of system analyzes the characteristics of the items a user has interacted with, such as the content of a movie or the genre of a book, to recommend similar items.

**Collaborative filtering recommender systems**: This type of system analyzes the behavior of multiple users to recommend items that are popular among users with similar interests.
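The collaborative idea can be shown with a minimal user-based sketch. It assumes a small hypothetical table of play counts: the users `alice`, `bob` and `carol`, their counts, and the helper names are invented for illustration, and this notebook itself only implements the content-based approach. Each song a user has not heard is scored by other users' play counts, weighted by how similar those users' listening histories are to the target user's, with similarity measured by cosine similarity.

```python
from math import sqrt

# Hypothetical play counts: user -> {song: number of plays}.
plays = {
    "alice": {"Chiquitita": 5, "Cassandra": 3, "Bang": 1},
    "bob": {"Chiquitita": 4, "Cassandra": 2, "Crazy World": 5},
    "carol": {"Crying Over You": 4, "Bang": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_for(user, k=2):
    """Rank unheard songs by other users' plays, weighted by user similarity."""
    scores = {}
    for other, their_plays in plays.items():
        if other == user:
            continue
        sim = cosine(plays[user], their_plays)
        for song, count in their_plays.items():
            if song not in plays[user]:
                scores[song] = scores.get(song, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend_for("alice"))  # ['Crazy World', 'Crying Over You']
```

`bob` shares most of `alice`'s history, so his `Crazy World` outranks `carol`'s suggestion. A common refinement is to divide each weighted sum by the total similarity of the contributing users.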

Recommender systems are used in a variety of applications, such as e-commerce websites, social media platforms, and music and video streaming services, to personalize the user experience and improve engagement. They can also help businesses increase revenue by suggesting additional products or services that a user might be interested in.

# 1) Content-based recommender systems

Recommendations made by a content-based recommender can be seen as a user-specific classification problem: a classifier learns the user's likes and dislikes from the features of the songs.

The most straightforward approach is **keyword matching**.
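Keyword matching can be sketched as the overlap between the word sets of two texts. The tiny stop-word list, the Jaccard scoring rule, and the second phrase are illustrative assumptions. The first phrase is the opening line of *Andante, Andante* from the dataset used below.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"the", "a", "and", "i", "you", "me", "my", "to", "of"})

def keywords(text):
    """Lower-cased set of words in the text, minus the stop words."""
    return set(re.findall(r"[a-z']+", text.lower())) - STOP_WORDS

def keyword_overlap(a, b):
    """Jaccard similarity: shared keywords divided by all distinct keywords."""
    ka, kb = keywords(a), keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

print(keyword_overlap("Take it easy with me, please", "Take it easy, baby"))  # 0.5
```

Every shared word counts equally here, whether it is distinctive or a filler like "it". TF-IDF addresses exactly that by weighting the words.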

In our case, because we are working with text and words, **Term Frequency-Inverse Document Frequency (TF-IDF)** can be used for this matching process.

**TF-IDF** measures how relevant a word is to one particular text (here, one song's lyrics) within a collection of texts, the corpus. A word's weight increases proportionally to the number of times it appears in the text, but is offset by how many texts in the corpus (the dataset) contain it, so words that are common everywhere receive low weights.
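A minimal pure-Python sketch of the weighting follows. It assumes whitespace tokenization, which differs from scikit-learn's tokenizer, so values on real lyrics will not match exactly. Term frequency is the raw count of a word in a text. Inverse document frequency uses the smoothed formula ln((1 + n) / (1 + df)) + 1 that `TfidfVectorizer` applies by default, where n is the number of texts and df is the number of texts containing the word. Each vector is then scaled to unit length.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = ln((1 + n) / (1 + df)) + 1   (as with scikit-learn's smooth_idf=True)
    Each vector is then L2-normalised (as with scikit-learn's norm='l2').
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors

docs = ["love me love me", "love the night", "dance the night away"]  # toy corpus
vecs = tfidf(docs)
for doc, vec in zip(docs, vecs):
    print(doc, {term: round(w, 3) for term, w in vec.items()})
```

In the first text, `me` and `love` both occur twice. However, `me` ends up with the higher weight because it appears in fewer texts of the toy corpus.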

#### Importing libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```

#### Loading dataset

We downloaded the dataset from the following GitHub repository:
https://github.com/ugis22/music_recommender/blob/master/content%20based%20recommedation%20system/songdata.csv

```python
data = pd.read_csv('songdata.csv')
```

```python
data.head(20)
```

| | artist | song | link | text |
|---|---|---|---|---|
| 0 | ABBA | Ahe's My Kind Of Girl | /a/abba/ahes+my+kind+of+girl_20598417.html | Look at her face, it's a wonderful face \nAnd... |
| 1 | ABBA | Andante, Andante | /a/abba/andante+andante_20002708.html | Take it easy with me, please \nTouch me gentl... |
| 2 | ABBA | As Good As New | /a/abba/as+good+as+new_20003033.html | I'll never know why I had to go \nWhy I had t... |
| 3 | ABBA | Bang | /a/abba/bang_20598415.html | Making somebody happy is a question of give an... |
| 4 | ABBA | Bang-A-Boomerang | /a/abba/bang+a+boomerang_20002668.html | Making somebody happy is a question of give an... |
| 5 | ABBA | Burning My Bridges | /a/abba/burning+my+bridges_20003011.html | Well, you hoot and you holler and you make me ... |
| 6 | ABBA | Cassandra | /a/abba/cassandra_20002811.html | Down in the street they're all singing and sho... |
| 7 | ABBA | Chiquitita | /a/abba/chiquitita_20002978.html | Chiquitita, tell me what's wrong \nYou're enc... |
| 8 | ABBA | Crazy World | /a/abba/crazy+world_20003013.html | I was out with the morning sun \nCouldn't sle... |
| 9 | ABBA | Crying Over You | /a/abba/crying+over+you_20177611.html | I'm waitin' for you baby \nI'm sitting all al... |


```python
data.describe()
```

| | artist | song | link | text |
|---|---|---|---|---|
| count | 57650 | 57650 | 57650 | 57650 |
| unique | 643 | 44824 | 57650 | 57494 |
| top | Donna Summer | Have Yourself A Merry Little Christmas | /e/eurythmics/royal+infirmary_20584934.html | Chestnuts roasting on an open fire \nJack Fro... |
| freq | 191 | 35 | 1 | 6 |


```python
data.shape
```

```
(57650, 4)
```

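The dataset contains the artist, song title, link, and lyrics of 57,650 songs, mostly in English.

Because the dataset is large, and the pairwise similarity matrix computed later grows quadratically with the number of songs, we randomly sample 5,000 songs and drop the `link` column, which we do not need.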

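```python
# Draw 5,000 random songs. Without random_state, the sample (and the
# recommendations below) change on every run; pass e.g. random_state=0 to fix it.
data = data.sample(n=5000).drop('link', axis=1).reset_index(drop=True)
```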
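The cost of skipping the sampling step can be estimated with a quick calculation. This assumes the pairwise similarities are stored as a dense matrix of 8-byte `float64` values, which is what `cosine_similarity` returns for `TfidfVectorizer`'s default output. The function name is illustrative.

```python
def dense_matrix_bytes(n_items, bytes_per_value=8):
    """Memory needed for a dense n x n matrix of float64 similarity scores."""
    return n_items * n_items * bytes_per_value

full = dense_matrix_bytes(57650)
sample = dense_matrix_bytes(5000)
print(f"full dataset: {full / 1e9:.1f} GB")        # full dataset: 26.6 GB
print(f"5000-song sample: {sample / 1e6:.0f} MB")  # 5000-song sample: 200 MB
```

Sampling about a twelfth of the songs shrinks the matrix by a factor of roughly 130, because the size grows with the square of the number of songs.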

#### Data pre-processing

```python
data.isnull().sum()
```

```
artist    0
song      0
text      0
dtype: int64
```

No null values found.

```python
data.dtypes
```

```
artist    object
song      object
text      object
dtype: object
```

Data types are appropriate.

```python
data.duplicated().sum()
```

```
0
```

No duplicates.

We can also see `\n` (newline) characters in the lyrics, so we replace them with spaces.

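```python
# Replace actual newline characters with spaces. Since pandas 2.0, regex defaults
# to False, so the raw pattern r'\n' (a literal backslash + "n") would match nothing.
data['text'] = data['text'].str.replace('\n', ' ', regex=False)
```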
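Two details are worth knowing here. With its default settings, `TfidfVectorizer` lowercases the text and extracts tokens with the regular expression `(?u)\b\w\w+\b`, so a newline already separates words just as a space does. Removing `\n` is therefore mostly cosmetic. Deleting it outright, however, can glue the last word of one line to the first word of the next. The sketch below reproduces that default tokenization with Python's `re` module on a hypothetical lyric fragment.

```python
import re

# Default token_pattern of scikit-learn's TfidfVectorizer (tokens of 2+ word characters).
TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def tokenize(text):
    """Lower-case, then extract tokens as the default vectorizer settings do."""
    return re.findall(TOKEN_PATTERN, text.lower())

raw = "a wonderful face\nAnd it means"  # hypothetical lyric fragment

print(tokenize(raw))                     # ['wonderful', 'face', 'and', 'it', 'means']
print(tokenize(raw.replace("\n", " ")))  # same tokens: the newline acted as a separator
print(tokenize(raw.replace("\n", "")))   # ['wonderful', 'faceand', 'it', 'means']
```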

#### Data Preparation

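We use scikit-learn's `TfidfVectorizer`, which computes a TF-IDF score for every word of every song's lyrics. The `stop_words='english'` option discards very common English words such as "the" and "and".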

```python
tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
lyrics_matrix = tfidf.fit_transform(data['text'])
```

*How do we use this matrix for a recommendation?*

We now need to calculate the similarity of one lyric to another. We are going to use **cosine similarity**.

We want the cosine similarity of every song with every other song in the dataset, so we pass `lyrics_matrix` as the only argument. `cosine_similarity` then compares all rows pairwise and returns a 5,000 × 5,000 matrix.

```python
cosine_similarities = cosine_similarity(lyrics_matrix)
```
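Cosine similarity divides the dot product of two vectors by the product of their lengths. It therefore compares the direction of two word-weight vectors (which words matter, and in what proportion) rather than how long the lyrics are. Below is a small sketch with made-up 3-dimensional vectors. Because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), the cosine similarity of two rows reduces to their plain dot product. That is also why scikit-learn's `linear_kernel` would return the same matrix here.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (||u|| * ||v||); 0.0 if either vector is all zeros."""
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (norm_u * norm_v) if norm_u and norm_v else 0.0

def l2_normalize(u):
    """Scale u to unit length, as TfidfVectorizer does for each row by default."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

u, v = [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]
print(cosine(u, v))                           # 0.96
print(dot(l2_normalize(u), l2_normalize(v)))  # same value, up to rounding
print(cosine(u, [10 * b for b in v]))         # 0.96: vector length does not matter
```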

Once we have the similarities, we store in a dictionary, for each song in our dataset, the 50 most similar songs together with their artists and similarity scores.

```python
similarities = {}

for i in range(len(cosine_similarities)):
    # Sort the scores of song i in descending order and keep the 51 highest:
    # normally song i itself (score 1.0) plus its 50 nearest neighbours.
    similar_indices = cosine_similarities[i].argsort()[::-1][:51]
    # Store (score, song, artist) for the 50 most similar songs, skipping song i
    # by index rather than by position, in case another song ties at 1.0.
    # Keys are song titles, so songs that share a title overwrite each other.
    similarities[data['song'].iloc[i]] = [
        (cosine_similarities[i][x], data['song'].iloc[x], data['artist'].iloc[x])
        for x in similar_indices if x != i
    ][:50]
```
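Selecting the top k entries with reversed slicing is easy to get off by one. `argsort()[:-50:-1]` walks backwards from the last element and stops before position -50, so it yields 49 indices, not 50. Dropping position 0 to skip the query song also assumes that song ranks first, which may not hold if another song has identical lyrics (a tie at 1.0). A minimal alternative, sketched below with a hypothetical score row, selects by index with `heapq.nlargest`. On NumPy arrays, `np.argpartition` is the usual vectorized way to do the same partial selection.

```python
import heapq

scores = [0.1, 1.0, 0.4, 0.3, 0.9]  # hypothetical similarity row for song index 1

def top_k_similar(scores, self_index, k):
    """Indices of the k highest scores, excluding the query song itself."""
    candidates = (j for j in range(len(scores)) if j != self_index)
    return heapq.nlargest(k, candidates, key=lambda j: scores[j])

print(top_k_similar(scores, self_index=1, k=3))  # [4, 2, 3]

# Reversed slicing with a stop of -50 keeps one element fewer than it suggests:
print(len(list(range(100))[:-50:-1]))  # 49
```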

We can use these similarity scores to look up the most similar songs and make a recommendation.

For that, we define our content-based recommender class.

```python
class ContentBasedRecommender:
    def __init__(self, matrix):
        # `matrix` is the dictionary mapping each song title to its list of
        # (similarity score, song, artist) tuples.
        self.matrix_similar = matrix

    def _print_message(self, song, recom_song):
        rec_items = len(recom_song)

        print(f'The {rec_items} recommended songs for {song} are:')
        for i in range(rec_items):
            print(f"Number {i+1}:")
            print(f"{recom_song[i][1]} by {recom_song[i][2]} with {round(recom_song[i][0], 3)} similarity score")
            print("--------------------")

    def recommend(self, recommendation):
        # Get the song to find recommendations for
        song = recommendation['song']
        # Get the number of songs to recommend
        number_songs = recommendation['number_songs']
        # Get the most similar songs from the similarity dictionary
        recom_song = self.matrix_similar[song][:number_songs]
        # Print each item
        self._print_message(song=song, recom_song=recom_song)
```
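The class prints its results, which makes them hard to reuse or test. Below is a minimal sketch of a variant that returns the list instead. It assumes the same `{title: [(score, song, artist), ...]}` shape as the similarity dictionary. The function name `get_recommendations` and the toy index are hypothetical.

```python
from typing import Dict, List, Tuple

Recommendation = Tuple[float, str, str]  # (similarity score, song, artist)

def get_recommendations(
    similarities: Dict[str, List[Recommendation]], song: str, number_songs: int
) -> List[Recommendation]:
    """Return (rather than print) the top `number_songs` entries for `song`."""
    if song not in similarities:
        raise KeyError(f"{song!r} is not in the sampled dataset")
    return similarities[song][:number_songs]

# Hypothetical index with the same shape as the similarity dictionary.
toy_index = {
    "Song A": [(0.42, "Song B", "Artist X"), (0.17, "Song C", "Artist Y")],
}
for rank, (score, title, artist) in enumerate(get_recommendations(toy_index, "Song A", 2), start=1):
    print(f"Number {rank}: {title} by {artist} with {score:.3f} similarity score")
```

Returning the data keeps presentation separate from lookup. Callers can then print, filter, or serialize the results as they need.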

Now we instantiate the class with the `similarities` dictionary.

```python
recommedations = ContentBasedRecommender(similarities)
```

We are now ready to pick a song from the dataset and ask for recommendations.

```python
recommendation = {
    "song": data['song'].iloc[10],
    "number_songs": 4
}
```

```python
recommedations.recommend(recommendation)
```

```
The 4 recommended songs for I've Got A Crush On You are:
Number 1:
Now That I Have You by Luther Vandross with 0.164 similarity score
--------------------
Number 2:
Planet Telex by Radiohead with 0.158 similarity score
--------------------
Number 3:
To Share Our Love by Moody Blues with 0.156 similarity score
--------------------
Number 4:
Crush by Dave Matthews Band with 0.135 similarity score
--------------------
```


Let's try a second one.

```python
recommendation2 = {
    "song": data['song'].iloc[120],
    "number_songs": 4
}
```

```python
recommedations.recommend(recommendation2)
```

```
The 4 recommended songs for Atin Cu Pung Singsing are:
Number 1:
Cinderella by Lionel Richie with 0.092 similarity score
--------------------
Number 2:
Gonna Love Ya by Reba Mcentire with 0.074 similarity score
--------------------
Number 3:
Arthur by Kinks with 0.068 similarity score
--------------------
Number 4:
I'm Ready For You by Drake with 0.061 similarity score
--------------------
```


#### Disadvantages of content-based recommender systems

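These algorithms recommend items similar to those a user has liked in the past. The profile behind the recommendations is built only from the features of items the user has already interacted with; it does not even need a sign-in mechanism or data from other users. As a consequence, the same kind of content keeps being recommended, and the recommendations lack the element of surprise: the system rarely suggests anything outside the user's established tastes.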

### References

* https://www.kaggle.com/competitions/kkbox-music-recommendation-challenge/overview

* https://en.wikipedia.org/wiki/Recommender_system

* https://www.kaggle.com/code/infinator/music-recommendation-system

* https://github.com/ugis22/music_recommender