# **MELOVERSE** -> Music Recommendation System




*   **Type**: Content Based Recommendation System
*   **Dataset**: Spotify Music Dataset
*   **Number of Songs**: 57650
*   **Reduced Dataset Songs**: 10000







### To reduce computation time, we work with **10000** songs *sampled randomly* from the full dataset

In [1]:
# Importing packages
import pandas as pd
import spacy
import sklearn
import pickle

In [2]:
# Load SpaCy English model
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

In [None]:
# Load the music dataset
MusicSet=pd.read_csv('spotify_millsongdata.csv')

In [None]:
MusicSet.head(5)

In [None]:
MusicSet.describe()

In [None]:
# Look for MISSING Values
MusicSet.isnull().sum()

In [None]:
# Randomly sample 10000 songs and remove the link column as it is not needed
MusicSet=MusicSet.sample(10000).drop('link',axis=1).reset_index(drop=True)
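
Because `sample` draws a different subset on every run, later cells that look up a specific title may or may not find it. A minimal sketch of how passing `random_state` makes the draw repeatable, assuming a toy DataFrame with invented rows, a hypothetical helper `sample_songs`, and an arbitrary seed of 42 (none of which come from the notebook):

```python
import pandas as pd

# Toy stand-in for the dataset; the rows are invented for illustration
toy = pd.DataFrame({
    "artist": ["a", "b", "c", "d", "e"],
    "song": ["s1", "s2", "s3", "s4", "s5"],
    "link": ["/1", "/2", "/3", "/4", "/5"],
    "text": ["la la", "na na", "do re", "mi fa", "so la"],
})

def sample_songs(df, n, seed=42):
    """Draw n rows reproducibly, drop the link column, and renumber the index."""
    return df.sample(n=n, random_state=seed).drop("link", axis=1).reset_index(drop=True)

first = sample_songs(toy, 3)
second = sample_songs(toy, 3)
print(first.equals(second))  # compares two draws made with the same seed
```

Whether a fixed seed is desirable depends on the goal: it helps when the notebook must be rerun with identical results, but the chosen subset may not contain a title used in later cells.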

In [None]:
MusicSet.shape

In [None]:
# Let us look at one sample case -> The Lyrics of one song
MusicSet['text'][0]

In [None]:
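# Text Preprocessing: lowercase, replace punctuation with spaces, then replace newlines with spaces
# (regex=True is required; without it, replace() only matches whole cell values)
MusicSet['text']=MusicSet['text'].str.lower().replace(r'[^\w\s]', ' ', regex=True).replace(r'\n',' ',regex=True)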
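
To see what this kind of cleaning does to a single lyric, here is a small sketch using only the standard library. The helper name `clean_lyrics` and the sample string are hypothetical, and unlike the cell above it also collapses runs of whitespace into one space, which is an extra assumption:

```python
import re

def clean_lyrics(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # anything that is not a word char or whitespace
    text = re.sub(r"\s+", " ", text)      # newlines, tabs, and repeated spaces
    return text.strip()

sample = "Hello, World!\nIt's a NEW day."
print(clean_lyrics(sample))  # hello world it s a new day
```

Note that replacing apostrophes with spaces splits contractions ("it's" becomes "it s"); whether that matters depends on the tokenizer used afterwards.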

In [None]:
MusicSet.head(5)

In [None]:
MusicSet.tail(5)

In [None]:
# Lemmatization

def lemmatize(text):
    doc = nlp(text)
    lemmatized_text = [token.lemma_ for token in doc]
    return " ".join(lemmatized_text)

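# Assign the result back: apply() returns a new Series and does not modify the column in place
MusicSet['text'] = MusicSet['text'].apply(lemmatize)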

In [None]:
"""Vector Semantics -> Using Vectors -> BOW, Word2Vec, TF-IDF"""

# We will use the TF-IDF word vectorizer

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

obj=TfidfVectorizer(analyzer='word',stop_words='english')
matrix=obj.fit_transform(MusicSet['text'])

In [None]:
# Measure the cosine similarity for the vectors
similarity=cosine_similarity(matrix)
similarity[0]
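
The same two steps can be tried on a tiny corpus to get a feel for the numbers. This is only an illustrative sketch: the three "lyrics" are invented, and the names `toy_lyrics`, `toy_matrix`, and `toy_similarity` are not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented mini-corpus: the first two texts share the word "love", the third shares nothing
toy_lyrics = [
    "love me tender love me true",
    "love me sweet love me long",
    "highway road truck engine",
]

vectorizer = TfidfVectorizer(analyzer="word", stop_words="english")
toy_matrix = vectorizer.fit_transform(toy_lyrics)  # one L2-normalized row per text
toy_similarity = cosine_similarity(toy_matrix)     # square matrix, entry [i, j] compares text i with text j

print(toy_similarity.round(2))
```

Each text has similarity 1 with itself, the two "love" texts score above zero against each other, and the third text scores zero against both because it shares no terms with them.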

In [None]:
MusicSet[MusicSet['song']=="When A Child Is Born"]

In [None]:
# Recommender Function -> Recommend the 20 most similar songs
def recommend(song):
    index=MusicSet[MusicSet['song']==song].index[0]
    # Sort (index, similarity) pairs by similarity, highest first
    distances=sorted(list(enumerate(similarity[index])),reverse=True,key=lambda x:x[1])

    # Collect the 20 most similar songs, skipping position 0 (normally the query song itself)
    songs=[]
    for dist in distances[1:21]:
        songs.append(MusicSet.iloc[dist[0]].song)

    return songs

In [None]:
# Let us test one
recommend('When A Child Is Born')
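
The recommender assumes the query title exists in the sample and that the query song sorts first in its own similarity row. A hedged variant that handles both cases explicitly might look like the sketch below. The function `recommend_top_k`, its parameters, and the toy titles and matrix are hypothetical stand-ins, not the notebook's data:

```python
import numpy as np

def recommend_top_k(title, titles, sim, k=20):
    """Return up to k titles most similar to `title`, or [] if the title is unknown."""
    if title not in titles:
        return []
    idx = titles.index(title)
    # Negate so that argsort (ascending) yields the highest similarity first
    order = np.argsort(-sim[idx], kind="stable")
    # Exclude the query by index instead of assuming it sits at position 0
    return [titles[i] for i in order if i != idx][:k]

# Toy data for illustration only
titles = ["A", "B", "C", "D"]
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

print(recommend_top_k("A", titles, sim, k=2))  # ['C', 'D']
print(recommend_top_k("Z", titles, sim))       # []
```

Excluding the query by index matters when two songs have identical lyrics: both then score 1.0, and slicing from position 1 could return the query song itself.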

In [None]:
# Storing the data in pickle files (the with blocks close each file after writing)
with open('similarity.pkl','wb') as f:
    pickle.dump(similarity, f)
with open('MusicSet.pkl','wb') as f:
    pickle.dump(MusicSet, f)
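
For completeness, here is a sketch of reading a pickled object back, for example from a separate app that serves the recommendations. The helpers `save_pickle` and `load_pickle` and the small nested list are illustrative assumptions; the example writes to a temporary directory so it does not touch the notebook's files:

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Write obj to path, closing the file when done."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    """Read a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "similarity.pkl")
    original = [[1.0, 0.3], [0.3, 1.0]]  # stand-in for the similarity matrix
    save_pickle(original, path)
    restored = load_pickle(path)

print(restored == original)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a trusted source.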

### THE END