<h5>LGM_Data_Science_internship   Task_3</h5>
<h1><center> Music Recommendation System </center></h1>
Recommendation systems are among the most widely used applications of machine learning. A <b>recommender</b> is a filtering system whose aim is to predict the rating or preference a user would give to an item, e.g. a film, a product, or a song.

<h4><b>=> Content Based Filters</b></h4>
Recommendations done using content-based recommenders can be seen as a user-specific classification problem. This classifier learns the user's likes and dislikes from the features of the song.

The most straightforward approach is keyword matching.

In short, the idea is to extract meaningful keywords from the description of a song the user likes, search for those keywords in other song descriptions to estimate how similar they are, and recommend the most similar songs to the user.

How is this performed?

In our case, because we are working with text and words, Term Frequency-Inverse Document Frequency (TF-IDF) can be used for this matching process.
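As a minimal sketch of what TF-IDF produces, the example below vectorizes three made-up lyric snippets (they are illustrative assumptions, not rows from the dataset) with the same `TfidfVectorizer` settings used later in this notebook. Each document becomes one row of weights, and common English stop words are dropped from the vocabulary.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up "lyrics" used only for illustration.
toy_lyrics = [
    "love me tender love me true",
    "all you need is love",
    "rain keeps falling on my head",
]

toy_tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
toy_matrix = toy_tfidf.fit_transform(toy_lyrics)

# One row per document, one column per distinct term kept in the vocabulary.
print(toy_matrix.shape)

# vocabulary_ maps each kept term to its column index.
print(sorted(toy_tfidf.vocabulary_))

# With the default settings each row is L2-normalised, so its length is 1.
row_norms = np.sqrt(toy_matrix.multiply(toy_matrix).sum(axis=1))
print(row_norms)
```

Because "love" occurs in two of the three snippets, it receives a lower inverse-document-frequency weight than a term such as "rain" that occurs in only one.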

We'll go through the steps for generating a content-based music recommender system.

<h1></h1>
<h4><center>Importing required Libraries</center></h4>

In [24]:
import numpy as np
import pandas as pd
from typing import List, Dict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


<h1></h1>
<h4><center>Dataset Reading and pre-processing</center></h4>

In [25]:
songs = pd.read_csv('https://raw.githubusercontent.com/GrayRobert/big-data-project/master/src/main/resources/temp/data/songdata.csv')

In [26]:
songs.head()

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \nAnd..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \nTouch me gentl..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \nWhy I had t...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


<h1></h1>
<h4><center>Transforming the Data as per need</center></h4>

In [27]:
# Because the dataset is large, we work with a random sample of 5000 songs.

songs = songs.sample(n=5000).drop('link', axis=1).reset_index(drop=True)

In [28]:
# The lyrics contain newline characters (\n), so we remove them.
# regex=True is passed explicitly because the default changed between pandas versions.

songs['text'] = songs['text'].str.replace(r'\n', '', regex=True)


In [29]:
tfidf = TfidfVectorizer(analyzer='word', stop_words='english')

In [30]:
lyrics_matrix = tfidf.fit_transform(songs['text'])

<h1></h1>
<h3><center>Analyzing the Songs</center></h3>
<h5>    Predicting the recommended songs       </h5>

In [31]:
#How do we use this matrix for a recommendation?
#We now need to calculate the similarity of one lyric to another. We are going to use cosine similarity.

cosine_similarities = cosine_similarity(lyrics_matrix) 
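
To make the similarity measure concrete, here is a small sketch that computes cosine similarity by hand and compares it with `cosine_similarity` from scikit-learn. The three vectors and the helper `cosine_by_hand` are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Three hypothetical TF-IDF style vectors.
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same direction as a
c = np.array([0.0, 0.0, 3.0])   # shares no non-zero term with a

def cosine_by_hand(u, v):
    """Dot product divided by the product of the vector lengths."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_by_hand(a, b))  # vectors pointing the same way
print(cosine_by_hand(a, c))  # vectors with no terms in common

# scikit-learn computes every pairwise score at once.
toy_scores = cosine_similarity(np.vstack([a, b, c]))
print(toy_scores)
```

The score depends only on the direction of the vectors, not their length, which is why a long and a short lyric using the same words in the same proportions still score close to 1.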

In [32]:
similarities = {}

In [33]:
for i in range(len(cosine_similarities)):
    # Sort row i of cosine_similarities and take the indices of the 51 highest scores, in descending order.
    similar_indices = cosine_similarities[i].argsort()[:-52:-1] 
    # Store the score, title, and artist of the 50 most similar songs,
    # skipping the first entry, which is normally the song itself.
    similarities[songs['song'].iloc[i]] = [(cosine_similarities[i][x], songs['song'][x], songs['artist'][x]) for x in similar_indices][1:]
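
The loop above drops the first sorted entry on the assumption that it is the query song. If two songs have identical lyrics, the tie could place a different song first. A possible alternative, sketched here on a made-up 4x4 similarity matrix with a hypothetical helper `top_similar`, is to exclude the query index explicitly.

```python
import numpy as np

def top_similar(sim_matrix, i, k):
    """Return the indices of the k rows most similar to row i, excluding i itself."""
    order = np.argsort(sim_matrix[i])[::-1]  # indices by descending similarity
    order = order[order != i]                # drop the query song explicitly
    return order[:k]

# Made-up symmetric similarity matrix for four songs.
toy_sim = np.array([
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.4],
    [0.1, 0.8, 0.4, 1.0],
])

print(top_similar(toy_sim, 0, 2))
print(top_similar(toy_sim, 3, 2))
```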

In [34]:
class ContentBasedRecommender:
    def __init__(self, matrix):
        self.matrix_similar = matrix

    def _print_message(self, song, recom_song):
        rec_items = len(recom_song)
        
        print(f'The {rec_items} recommended songs for {song} are:')
        for i in range(rec_items):
            print(f"Number {i+1}:")
            print(f"{recom_song[i][1]} by {recom_song[i][2]} with {round(recom_song[i][0], 3)} similarity score") 
            print("--------------------")
        
    def recommend(self, recommendation):
        # Get song to find recommendations for
        song = recommendation['song']
        # Get number of songs to recommend
        number_songs = recommendation['number_songs']
        # Get the most similar songs from the similarities dictionary
        recom_song = self.matrix_similar[song][:number_songs]
        # print each item
        self._print_message(song=song, recom_song=recom_song)
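
Note that `recommend` raises a `KeyError` when the requested title is not in the dictionary. One way to guard the lookup is sketched below. The function `lookup_recommendations` and the toy dictionary are hypothetical and not part of the class above.

```python
def lookup_recommendations(similar, song, number_songs):
    """Return up to number_songs (score, title, artist) tuples, or [] for an unknown title."""
    if song not in similar:
        print(f"No entry for {song!r} in the similarity dictionary.")
        return []
    return similar[song][:number_songs]

# Toy dictionary with the same layout as `similarities`:
# title -> list of (score, title, artist), sorted by descending score.
toy_similarities = {
    "Song A": [(0.9, "Song B", "Artist 2"), (0.4, "Song C", "Artist 3")],
}

found = lookup_recommendations(toy_similarities, "Song A", 1)
missing = lookup_recommendations(toy_similarities, "Unknown Song", 3)
print(found)
print(missing)
```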

In [35]:
# Instantiating the class
recommedations = ContentBasedRecommender(similarities)

In [36]:
#Predicting the songs
recommendation = {
    "song": songs['song'].iloc[20],
    "number_songs": 6 
}

recommedations.recommend(recommendation)

The 6 recommended songs for Can't Lie To My Heart are:
Number 1:
I Want You So Bad by Gloria Estefan with 0.255 similarity score
--------------------
Number 2:
Dreamlover by Mariah Carey with 0.252 similarity score
--------------------
Number 3:
I Don't Love Here Anymore by Cheap Trick with 0.249 similarity score
--------------------
Number 4:
Tell Me How You Feel by Michael Bolton with 0.245 similarity score
--------------------
Number 5:
I Get Evil by Pat Benatar with 0.245 similarity score
--------------------
Number 6:
Believe by Cher with 0.244 similarity score
--------------------


In [43]:
recommedation = {
    "song": songs['song'].iloc[654],
    "number_songs": 7
}

recommedations.recommend(recommedation)

The 7 recommended songs for Plain Gold Ring are:
Number 1:
Fool's Gold by Procol Harum with 0.229 similarity score
--------------------
Number 2:
A Poor Man's Roses by Patsy Cline with 0.177 similarity score
--------------------
Number 3:
Almost Gone by Mary Black with 0.152 similarity score
--------------------
Number 4:
Soul Mistake by INXS with 0.152 similarity score
--------------------
Number 5:
Power Of Two by Indigo Girls with 0.149 similarity score
--------------------
Number 6:
Angel by Yonder Mountain String Band with 0.147 similarity score
--------------------
Number 7:
Lords Of The Ring by Styx with 0.136 similarity score
--------------------
