# Music recommender system

This notebook is focused on content-based filtering.
  
> Content-based filters predict what a user will like based on what that particular user has liked in the past. Collaborative filters, on the other hand, predict what a user will like based on what other, similar users have liked.

First, the classifier learns the user's likes and dislikes from features of certain songs.

The most straightforward approach is **keyword matching**.

In short, the idea is to extract meaningful keywords from the description of a song the user likes, search for those keywords in other song descriptions to estimate how similar they are, and recommend the most similar songs to the user.

Given we are working with text and words, **Term Frequency-Inverse Document Frequency (TF-IDF)** can be used for this matching process.
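To make the idea concrete, here is a minimal hand-rolled sketch of TF-IDF weighting followed by cosine similarity, using only the standard library. The toy lyrics and the helper names `tfidf_vectors` and `cosine` are made up for illustration. The weighting is the plain textbook form (term frequency times `log(N / df)`); scikit-learn's `TfidfVectorizer` applies smoothing and normalization on top of this, so its numbers will differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "lyrics", not taken from the dataset.
toy_lyrics = [
    "love me tender love me true",
    "tender is the night my love",
    "highway to the danger zone",
]
vectors = tfidf_vectors(toy_lyrics)
sim_01 = cosine(vectors[0], vectors[1])  # the two songs share "love" and "tender"
sim_02 = cosine(vectors[0], vectors[2])  # the two songs share no words
print(sim_01, sim_02)
```

The first pair scores higher than the second because shared words contribute to the dot product, while words that appear in fewer documents receive larger weights.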
  


### Importing required libraries

First, we'll import all the required libraries.

In [1]:
import numpy as np
import pandas as pd
from typing import List, Dict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

### Dataset

In [10]:
songs = pd.read_csv('songdata.csv')

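# Keep a random sample of 5,000 songs and drop the unused 'link' column
songs = songs.sample(n=5000).drop('link', axis=1).reset_index(drop=True)

# Strip newline characters from the lyrics; regex=True makes the pattern handling explicit
songs['text'] = songs['text'].str.replace(r'\n', '', regex=True)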




After that, we use a TF-IDF vectorizer, which calculates the TF-IDF score for each song's lyrics, word by word.

Here, we pay particular attention to the arguments we can specify.
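As a small illustration of why the arguments matter, the sketch below fits two vectorizers on a pair of made-up lyric lines (not taken from the dataset) and compares the vocabularies they learn. The variable names are hypothetical; the only difference between the two vectorizers is `stop_words='english'`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy lyrics used only for this comparison.
toy_lyrics = [
    "I will always love you",
    "You are the sunshine of my life",
]

# Default settings: every word token of two or more characters is kept.
keep_all = TfidfVectorizer(analyzer='word')
keep_all.fit(toy_lyrics)

# Same settings, but scikit-learn's built-in English stop word list is removed.
drop_stop = TfidfVectorizer(analyzer='word', stop_words='english')
drop_stop.fit(toy_lyrics)

vocab_all = sorted(keep_all.vocabulary_)
vocab_filtered = sorted(drop_stop.vocabulary_)
print(vocab_all)
print(vocab_filtered)
```

With stop words removed, common function words such as "the" and "you" no longer take part in the similarity computation, so matches depend on the more distinctive words in each lyric.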

In [12]:
tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
lyrics_matrix = tfidf.fit_transform(songs['text'])

cosine_similarities = cosine_similarity(lyrics_matrix) 
similarities = {}


In [13]:
for i in range(len(cosine_similarities)):
    # Sort row i by similarity and take the indices of the 49 highest-scoring songs, best first.
    similar_indices = cosine_similarities[i].argsort()[:-50:-1] 
    similarities[songs['song'].iloc[i]] = [(cosine_similarities[i][x], songs['song'][x], songs['artist'][x]) for x in similar_indices][1:]
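The slice `[:-50:-1]` in the loop above can be hard to read at first. The sketch below shows the same pattern on a small made-up score array, with a hypothetical `k` in place of the hard-coded value. `argsort()` sorts in ascending order, so reading the result backwards yields the highest scores first.

```python
import numpy as np

# Made-up similarity scores for one song against five songs.
# Index 1 stands in for the query song itself (similarity 1.0).
scores = np.array([0.10, 1.00, 0.35, 0.80, 0.05])

k = 3
# Walk the ascending argsort backwards and stop after k elements.
top_k = scores.argsort()[:-(k + 1):-1]

# Drop the first entry, which is the query song when its score is the largest.
neighbours = top_k[1:]

print(top_k.tolist())       # [1, 3, 2]
print(neighbours.tolist())  # [3, 2]
```

Applied to the notebook's loop, `[:-50:-1]` selects 49 indices, and the trailing `[1:]` leaves 48 candidate songs per entry. This assumes the song itself always ranks first; songs with identical lyrics could tie with it.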

After that, all the magic happens. We can use these similarity scores to look up the most similar items and make a recommendation.

For that, we'll define our content-based recommender class.

In [14]:
class ContentBasedRecommender:
    def __init__(self, matrix):
        self.matrix_similar = matrix

    def _print_message(self, song, recom_song):
        rec_items = len(recom_song)
        
        print(f'The {rec_items} recommended songs for {song} are:')
        for i in range(rec_items):
            print(f"Number {i+1}:")
            print(f"{recom_song[i][1]} by {recom_song[i][2]} with {round(recom_song[i][0], 3)} similarity score") 
            print("--------------------")
        
    def recommend(self, recommendation):
        # Get song to find recommendations for
        song = recommendation['song']
        # Get number of songs to recommend
        number_songs = recommendation['number_songs']
        # Take the top `number_songs` entries from this song's similarity list
        recom_song = self.matrix_similar[song][:number_songs]
        # Print each recommendation
        self._print_message(song=song, recom_song=recom_song)

Now we instantiate the class and test it.

In [15]:
recommender = ContentBasedRecommender(similarities)
recommendation = {
    "song": songs['song'].iloc[10],
    "number_songs": 4
}

In [18]:
recommender.recommend(recommendation)

The 4 recommended songs for I'm Wrong But You Ain't Right are:
Number 1:
Everything Ain't Right by George Jones with 0.403 similarity score
--------------------
Number 2:
Revolution by Rascal Flatts with 0.286 similarity score
--------------------
Number 3:
Ain't That Love by Ray Charles with 0.28 similarity score
--------------------
Number 4:
The Right One by Aiza Seguerra with 0.271 similarity score
--------------------


And we can pick another random song and recommend again: