
### **Music Recommender System**

**There are two main types of recommender systems:**
1.   Content-Based Filters
2.   Collaborative Filters

In [72]:
#Importing Basic Required Libraries
import numpy as np
import pandas as pd

In [88]:
#The dataset contains name, artist and lyrics for 57650 songs in English. 
data = pd.read_csv("songdata.csv")

data.head()

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \nAnd..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \nTouch me gentl..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \nWhy I had t...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


In [80]:
#Number of Attributes/ Characteristics = 4
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57650 entries, 0 to 57649
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   artist  57650 non-null  object
 1   song    57650 non-null  object
 2   link    57650 non-null  object
 3   text    57650 non-null  object
dtypes: object(4)
memory usage: 1.8+ MB


In [81]:
#The dataset does not contain any missing values.
data.isnull().sum()

artist    0
song      0
link      0
text      0
dtype: int64

In [90]:
#57650 songs require a lot of RAM, so let us work with a random sample of 3000 songs
data = data.sample(n=3000).drop('link', axis=1).reset_index(drop=True)

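#Remove the newline characters in the 'text' attribute
#(regex=False makes this a literal match regardless of the pandas version)
data['text'] = data['text'].str.replace('\n', '', regex=False)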
data


Unnamed: 0,artist,song,text
0,Roxy Music,To Turn You On,I could show you in a word If I wanted to A ...
1,Gucci Mane,Mi Casa Tu Casa,"Chorus: Coca, coca, coca cola ""yayo bought..."
2,Randy Travis,Love Lifted Me,I was sinkin' deep in sin Far from the peacef...
3,Enya,The Spirit Of Christmas Past,When tears are in your eyes It's time to look...
4,Weezer,Odd Couple,"I got a PC, you got a Mac I'm giving you flak..."
...,...,...,...
2995,Bruno Mars,Gold,"Chorus: There's no light in this room, It's ..."
2996,Donna Summer,Sweet Emotion,Seems every star is out tonight to light our l...
2997,Scorpions,Humanity,Humanity Auf wiedersehen It's time to say go...
2998,Hillsong,Am I To Believe?,Am I to believe.... That a God would give his...


### **TF-IDF: a technique used for information retrieval**

***Algorithm:***
1.   Compute the TF (term frequency) and IDF (inverse document frequency) scores for each term in the document
2.   Multiply TF by IDF
3.   The product is the TF-IDF weight of that term
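
The three steps above can be sketched in plain Python. This is a minimal illustration using the textbook definitions (TF = term count divided by document length, IDF = log of the number of documents divided by the number of documents containing the term). The toy documents and the helper name `tf_idf` are assumptions made for illustration. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter

# Toy corpus (hypothetical lyrics, for illustration only)
docs = [
    "love me tender love me true",
    "tender is the night",
    "true colors shining through",
]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using textbook TF-IDF."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / len(tokens)            # step 1: term frequency
            idf = math.log(n_docs / df[term])   # step 1: inverse document frequency
            doc_weights[term] = tf * idf        # steps 2-3: the product is the weight
        weights.append(doc_weights)
    return weights

weights = tf_idf(docs)
print(weights[0])
```

In the first document, "love" occurs twice and appears in no other document, so it receives a higher weight than "tender", which occurs once and is shared with the second document.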

In [91]:
#Using the TF-IDF vectorizer, which calculates the TF-IDF score for each
#song lyric, word by word
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(analyzer='word', stop_words='english')

In [97]:
#Lyric Matrix containing each word with its TF_IDF score
lyrics_matrix = tfidf.fit_transform(data['text'])

In [98]:
from sklearn.metrics.pairwise import cosine_similarity

# Calculate cosine similarity of each item with every other item in the dataset
cosineMatrix = cosine_similarity(lyrics_matrix) 
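
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it compares the direction of two lyric vectors and ignores their magnitude. The sketch below writes the formula out without libraries so each term is visible. The three vectors are made-up TF-IDF rows and the helper name `cosine` is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical TF-IDF rows for three songs over a four-word vocabulary
song_a = [0.5, 0.0, 0.5, 0.0]
song_b = [0.25, 0.0, 0.25, 0.0]   # same direction as song_a, smaller magnitude
song_c = [0.0, 0.7, 0.0, 0.3]     # shares no words with song_a

print(cosine(song_a, song_b))  # close to 1: same word proportions
print(cosine(song_a, song_c))  # 0.0: no words in common
```

Because TF-IDF weights are never negative, the scores for lyric vectors fall between 0 and 1.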

In [99]:
#Dictionary mapping each song name to the list of its most similar songs
similarities = {}

In [107]:
for i in range(len(cosineMatrix)):

    #argsort() sorts the similarity row in ascending order; slicing backwards
    #from the end gives the 51 highest-scoring indices, most similar first
    similarIndices = cosineMatrix[i].argsort()[:-52:-1] 

    #For each song we store the 50 most similar songs (excluding itself,
    #thus [1:]) with the details:
    #1. Cosine similarity score
    #2. Name of the song
    #3. Name of the artist
    
    similarities[data['song'].iloc[i]] = [(cosineMatrix[i][x], data['song'][x], data['artist'][x]) for x in similarIndices][1:]
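
The backwards slice applied to `argsort()` can be checked on a small array. The sketch below uses a made-up row of similarity scores (the values and the name `k` are assumptions): reversing the ascending `argsort` result and keeping `k + 1` entries yields the row's own index first, since a song has similarity 1.0 with itself, followed by its `k` nearest neighbours.

```python
import numpy as np

# Hypothetical similarity row for the song at index 2 (its own score is 1.0)
row = np.array([0.10, 0.40, 1.00, 0.25, 0.05])
k = 2  # number of neighbours wanted

# argsort() is ascending; stepping backwards from the end gives descending order.
# The slice stops before position -(k + 2), so it holds k + 1 entries.
top = row.argsort()[:-(k + 2):-1]
print(top)         # [2 1 3]

neighbours = top[1:]   # drop the song itself
print(neighbours)      # [1 3]
```

If two songs have identical lyrics, both score 1.0 and the song itself is not guaranteed to sort first, so filtering out the row's own index is more robust than dropping the first entry.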

In [162]:
class ContentBasedRecommender:
  #Parameterized Constructor
  def __init__(self, matrix):
    self.similarMatrix = matrix

  def _print_message(self, song, recomSong, recomNumber):
    print(f'The {recomNumber} recommended songs for \"{song}\" are:\n')
    
    for i in range(recomNumber):
      print(f"Song {i+1}: {recomSong[i][1]}")
      print(f"Artist: {recomSong[i][2]}")
      print(f"Similarity Score: {round(recomSong[i][0], 3)}")
      print("--------------------\n")

  def recommend(self, recommendation):
    #Retrieve the name of the song
    songName = recommendation['songName']

    #Number of songs to recommend
    numberSongs = recommendation['numberSongs']

    #Get the required number of similar songs from the 'similarities' dictionary
    recomendedSongs = self.similarMatrix[songName][:numberSongs]
    #Print each item; pass the actual count in case fewer songs are stored than requested
    self._print_message(songName, recomendedSongs, len(recomendedSongs))
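
The recommender expects a dictionary that maps each song title to a list of `(score, song, artist)` tuples, ordered from most to least similar. The sketch below builds such a dictionary by hand (the entries are made up) and shows one way to guard against a title that is missing from the sampled data, which would otherwise raise a `KeyError`. The helper name `top_matches` is hypothetical and not part of the notebook.

```python
# Hypothetical, hand-built stand-in for the 'similarities' dictionary
toy_similarities = {
    "Song A": [(0.42, "Song B", "Artist X"), (0.17, "Song C", "Artist Y")],
    "Song B": [(0.42, "Song A", "Artist Z")],
}

def top_matches(similar, song_name, count):
    """Return up to `count` (score, song, artist) tuples, or [] if the title is unknown."""
    return similar.get(song_name, [])[:count]

print(top_matches(toy_similarities, "Song A", 1))   # [(0.42, 'Song B', 'Artist X')]
print(top_matches(toy_similarities, "Missing", 3))  # []
```

Returning a list instead of printing keeps the lookup separate from the presentation, which makes it easier to reuse.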

In [163]:
#Instantiate the class
recommedations = ContentBasedRecommender(similarities)

In [164]:
#Some example song names (valid only if they are in the random sample of 3000 songs):
#1. To Turn You On
#2. Spinal Remains
#3. Letter To My Daughters

name = input("Enter your Song Name: ")
num = int(input("Number of Recommended Songs: "))

Enter your Song Name: Spinal Remains
Number of Recommended Songs: 4


In [165]:
recommendation = {
    "songName": name,
    "numberSongs": num
}

recommedations.recommend(recommendation)

The 4 recommended songs for "Spinal Remains" are:

Song 1: Not For Free
Artist: Face To Face
Similarity Score: 0.291
--------------------

Song 2: Sensible
Artist: Face To Face
Similarity Score: 0.194
--------------------

Song 3: You Really Got Me
Artist: Van Halen
Similarity Score: 0.189
--------------------

Song 4: Death Valley Lives
Artist: Jimmy Buffett
Similarity Score: 0.18
--------------------

