<a href="https://colab.research.google.com/github/Sanele098/Comp-700---Project/blob/main/Music_Recommender_System_Content_Based.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Content-based filters**

Content-based recommendation can be viewed as a user-specific classification problem: a classifier infers the user's likes and dislikes from the features of the songs.

The simplest approach is keyword matching.

In short, keyword matching extracts significant keywords from the description of a song the user likes, searches for those keywords in other song descriptions to estimate similarity, and recommends the most similar songs to the user.
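The keyword-matching idea can be sketched in a few lines of plain Python. This is only an illustration, not the method used in the rest of this notebook: the stop-word list, the helper names `extract_keywords` and `keyword_overlap`, and the toy descriptions are all assumptions made for the example, and Jaccard overlap is just one possible way to score shared keywords.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "by", "my", "me", "you", "i"}

def extract_keywords(description, top_n=10):
    """Return up to top_n of the most frequent non-stop-words in a description."""
    words = re.findall(r"[a-z']+", description.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {word for word, _ in counts.most_common(top_n)}

def keyword_overlap(a, b):
    """Jaccard overlap between the keyword sets of two descriptions."""
    ka, kb = extract_keywords(a), extract_keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

# Hypothetical descriptions, invented for this example.
favorite = "dancing all night under the summer moon"
candidates = {
    "Song A": "summer night dancing by the sea",
    "Song B": "cold winter rain on the window",
}

# Rank candidate songs by how many keywords they share with the favorite.
ranked = sorted(
    candidates,
    key=lambda title: keyword_overlap(favorite, candidates[title]),
    reverse=True,
)
print(ranked)
```

Here "Song A" ranks first because it shares "summer", "night" and "dancing" with the favorite, while "Song B" shares no keywords at all.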


In [3]:
#Importing significant and required libraries
import numpy as np
import pandas as pd
from typing import List, Dict
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

In [6]:
#Loading the data
from google.colab import files
uploaded = files.upload()

Saving songdata.csv to songdata (1).csv


**Dataset**

This dataset contains the artist, song title, link, and lyrics (text) for 57,650 songs.

In [9]:
music_dataset = pd.read_csv('songdata (1).csv')
music_dataset.head(3)

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \nAnd..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \nTouch me gentl..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \nWhy I had t...


**Resampling**

Because the dataset is large, we pick 5000 random songs (*resampling the dataset*).

In [10]:
music_dataset = music_dataset.sample(n = 5000).drop('link', axis = 1).reset_index(drop = True)
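Because `sample` draws different rows on every run, the songs at a given index, and therefore the recommendations printed later, change between runs. A minimal sketch of a reproducible resample, assuming a fixed seed is acceptable: the `resample` helper, the seed value 42, and the toy table are illustrative and not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the real songs table.
toy = pd.DataFrame({
    "artist": ["A", "B", "C", "D", "E", "F"],
    "song": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "link": ["/a", "/b", "/c", "/d", "/e", "/f"],
})

def resample(df, n, seed=42):
    """Draw n rows reproducibly, drop the unused 'link' column, renumber rows."""
    return (
        df.sample(n=n, random_state=seed)
          .drop("link", axis=1)
          .reset_index(drop=True)
    )

first = resample(toy, 3)
second = resample(toy, 3)
print(first.equals(second))  # the same seed gives the same rows
```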

In [11]:
# The lyrics contain newline characters (\n), so we remove them.
# regex=False makes this a literal newline match regardless of the pandas version.
music_dataset['text'] = music_dataset['text'].str.replace('\n', '', regex=False)

**TF-IDF Vectorizer**

After resampling and cleaning the dataset, we use a TF-IDF vectorizer, which computes a TF-IDF score for every word in each song's lyrics.

In [12]:
tf_idf_score = TfidfVectorizer(analyzer='word', stop_words='english')
lyrics_matrix = tf_idf_score.fit_transform(music_dataset['text'])
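To make the TF-IDF scores less of a black box, here is a small hand-rolled sketch in plain Python. It follows the smoothed IDF formula, `ln((1 + n) / (1 + df)) + 1`, and the L2 row normalisation that scikit-learn documents as its defaults. The `tf_idf` helper and the toy lyrics are assumptions for illustration; the sketch splits on whitespace and removes no stop words, so its numbers are not expected to match `TfidfVectorizer` on real lyrics.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return (vocabulary, rows); rows[i][j] is the L2-normalised
    TF-IDF weight of vocabulary[j] in docs[i]."""
    tokenised = [doc.lower().split() for doc in docs]
    vocabulary = sorted({word for tokens in tokenised for word in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for tokens in tokenised for word in set(tokens))
    # Smoothed inverse document frequency: rarer words get larger weights.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocabulary}
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        raw = [counts[w] * idf[w] for w in vocabulary]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocabulary, rows

# Toy lyrics, invented for this example.
lyrics = ["love love me", "love hurts", "rain falls"]
vocabulary, weights = tf_idf(lyrics)

for text, row in zip(lyrics, weights):
    print(text, [round(v, 3) for v in row])
```

In the second toy lyric, "hurts" gets a higher weight than "love" because "love" also appears in another lyric and is therefore less distinctive.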

**Using the lyrics_matrix to construct recommendations**

An important step is measuring how similar one song's lyrics are to another's; this is where cosine similarity comes in.

We calculate the cosine similarity of each song with every other song in the dataset. To do this, we pass lyrics_matrix as the argument.

In [13]:
cosine_similarities = cosine_similarity(lyrics_matrix)
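Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on direction only, not on magnitude. A minimal NumPy sketch on three made-up vectors (the `cosine` helper and the vectors are illustrative, not taken from the dataset):

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

song_a = np.array([1.0, 2.0, 0.0])
song_b = np.array([2.0, 4.0, 0.0])  # same direction as song_a, twice the length
song_c = np.array([0.0, 0.0, 3.0])  # shares no non-zero component with song_a

print(cosine(song_a, song_b))  # close to 1: same word proportions
print(cosine(song_a, song_c))  # 0: no words in common
```

Since TF-IDF weights are never negative, the similarities between lyrics fall between 0 and 1.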

In [27]:
similarities = {}
for i in range(len(cosine_similarities)):
    # Sort row i by descending similarity. Position 0 is normally the song itself,
    # so skip it and keep the next 50 indices.
    similar_indices = cosine_similarities[i].argsort()[::-1][1:51]
    # Store (score, song title, artist) for the 50 most similar songs, keyed by song title.
    similarities[music_dataset['song'].iloc[i]] = [
        (cosine_similarities[i][x], music_dataset['song'][x], music_dataset['artist'][x])
        for x in similar_indices
    ]
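The top-N selection in the loop above can be checked on a single small row. The sketch below is an illustrative variant, not the notebook's code: the `top_k_similar` helper is hypothetical, and it removes the song by its index rather than by position, which stays correct even if another song happens to have identical lyrics.

```python
import numpy as np

def top_k_similar(row, self_index, k):
    """Indices of the k largest entries of row, skipping self_index, best first."""
    order = np.argsort(row)[::-1]        # indices from most to least similar
    order = order[order != self_index]   # drop the song itself by index
    return order[:k].tolist()

# Made-up similarities of song 1 to songs 0..4 (song 1 scores 1.0 against itself).
row = np.array([0.10, 1.00, 0.40, 0.05, 0.30])
neighbours = top_k_similar(row, self_index=1, k=3)
print(neighbours)  # → [2, 4, 0]
```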

**Class "Music_Recommender_Content_Based" to keep things object-oriented**

With the similarities in place, the class below looks up the most similar songs for a given song and prints them as recommendations.

In [20]:
class Music_Recommender_Content_Based:
    def __init__(self, matrix):
        self.matrix_similar = matrix

    def show_recommendations(self, song, recom_song):
        rec_items = len(recom_song)
        
        print(f'The {rec_items} recommended songs for {song} are:')
        for i in range(rec_items):
            print(f"Number {i+1}:")
            print(f"{recom_song[i][1]} by {recom_song[i][2]} with {round(recom_song[i][0], 3)} similarity score") 
            print("___________________******************___________________")
        
    def recommend123(self, track, num_songs):
        song = track
        number_songs = num_songs             # Get number of songs to recommend
        recom_song = self.matrix_similar[song][:number_songs]
        self.show_recommendations(song=song, recom_song=recom_song)  #Displaying each and every song
        
    def recommend(self, recommendation):
        song = recommendation['song']
        number_songs = recommendation['number_songs']               # Get number of songs to recommend
        recom_song = self.matrix_similar[song][:number_songs]
        self.show_recommendations(song=song, recom_song=recom_song)  #Displaying each and every song

In [22]:
#Now, we pick a song from the dataset and make a recommendation.
recommendation = {
    "song": music_dataset['song'].iloc[10],
    #song: "As Good As New"
    "number_songs": 4 
}

In [28]:
recommedations = Music_Recommender_Content_Based(similarities)

In [31]:
recommedations.recommend(recommendation)

The 4 recommended songs for You are:
Number 1:
Long Away by Queen with 0.218 similarity score
___________________******************___________________
Number 2:
Love Will Find A Way by Yes with 0.169 similarity score
___________________******************___________________
Number 3:
A Fallen Star by Conway Twitty with 0.16 similarity score
___________________******************___________________
Number 4:
Hope Alone by Indigo Girls with 0.157 similarity score
___________________******************___________________


In [25]:
recommendation2 = {
    "song": music_dataset['song'].iloc[120],
    "number_songs": 4 
}

In [29]:
recommedations.recommend(recommendation2)

The 4 recommended songs for Beautiful Boys are:
Number 1:
I'm Afraid by Neil Diamond with 0.48 similarity score
___________________******************___________________
Number 2:
Don't Be Afraid by Air Supply with 0.464 similarity score
___________________******************___________________
Number 3:
Hideaway by Erasure with 0.421 similarity score
___________________******************___________________
Number 4:
Not Afraid To Cry by Peter Cetera with 0.417 similarity score
___________________******************___________________
