# Training The Model For Music Recommender System

---



---



In [1]:
# Import the Pandas library for data manipulation and analysis
import pandas as pd

In [2]:
# Read the data from the spotify_millsongdata.csv dataset.
songs_dataset = pd.read_csv("D:\\7th  Semeser\\My 7th Semester\\EC 9640 Artificial Intelligence\\Project\\spotify_millsongdata.csv")

In [3]:
songs_dataset.shape

(57650, 4)

In [4]:
# Look at the first 10 rows of the dataset.
songs_dataset.head(10)

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \r\nA..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \r\nTouch me gen..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \r\nWhy I had...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...
5,ABBA,Burning My Bridges,/a/abba/burning+my+bridges_20003011.html,"Well, you hoot and you holler and you make me ..."
6,ABBA,Cassandra,/a/abba/cassandra_20002811.html,Down in the street they're all singing and sho...
7,ABBA,Chiquitita,/a/abba/chiquitita_20002978.html,"Chiquitita, tell me what's wrong \r\nYou're e..."
8,ABBA,Crazy World,/a/abba/crazy+world_20003013.html,I was out with the morning sun \r\nCouldn't s...
9,ABBA,Crying Over You,/a/abba/crying+over+you_20177611.html,I'm waitin' for you baby \r\nI'm sitting all ...


In [5]:
# Check that there are no null values in the dataset.
songs_dataset.isnull().sum()

artist    0
song      0
link      0
text      0
dtype: int64

The output above shows no null values in any column, so no handling of null (or missing) values is needed.

In [6]:
# Remove the 'link' column from the dataset as it is not needed for our operations.
new_songs_dataset = songs_dataset.drop('link', axis=1).reset_index(drop=True)

In [7]:
# Look at the first 10 rows of the dataset after removing the 'link' column.
new_songs_dataset.head(10)

Unnamed: 0,artist,song,text
0,ABBA,Ahe's My Kind Of Girl,"Look at her face, it's a wonderful face \r\nA..."
1,ABBA,"Andante, Andante","Take it easy with me, please \r\nTouch me gen..."
2,ABBA,As Good As New,I'll never know why I had to go \r\nWhy I had...
3,ABBA,Bang,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,Making somebody happy is a question of give an...
5,ABBA,Burning My Bridges,"Well, you hoot and you holler and you make me ..."
6,ABBA,Cassandra,Down in the street they're all singing and sho...
7,ABBA,Chiquitita,"Chiquitita, tell me what's wrong \r\nYou're e..."
8,ABBA,Crazy World,I was out with the morning sun \r\nCouldn't s...
9,ABBA,Crying Over You,I'm waitin' for you baby \r\nI'm sitting all ...


In [8]:
# Look at the lyrics of the first song.
new_songs_dataset['text'][0]

"Look at her face, it's a wonderful face  \r\nAnd it means something special to me  \r\nLook at the way that she smiles when she sees me  \r\nHow lucky can one fellow be?  \r\n  \r\nShe's just my kind of girl, she makes me feel fine  \r\nWho could ever believe that she could be mine?  \r\nShe's just my kind of girl, without her I'm blue  \r\nAnd if she ever leaves me what could I do, what could I do?  \r\n  \r\nAnd when we go for a walk in the park  \r\nAnd she holds me and squeezes my hand  \r\nWe'll go on walking for hours and talking  \r\nAbout all the things that we plan  \r\n  \r\nShe's just my kind of girl, she makes me feel fine  \r\nWho could ever believe that she could be mine?  \r\nShe's just my kind of girl, without her I'm blue  \r\nAnd if she ever leaves me what could I do, what could I do?\r\n\r\n"

The lyrics above show that this dataset needs preprocessing: the text contains newline characters ('\n'), carriage return characters ('\r'), punctuation, a mix of uppercase and lowercase letters, and inconsistent spacing between words.
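
The cleanup described above can be sketched on a single plain string with Python's `re` module. The helper name `clean_lyrics` is hypothetical and used only for illustration; the sample is the opening of the first song's lyrics.

```python
import re

def clean_lyrics(text):
    """Lowercase the text, drop punctuation, and normalize whitespace."""
    text = text.lower()                        # unify letter case
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # replace punctuation and symbols with a space
    text = re.sub(r"\s+", " ", text)           # collapse \r, \n and repeated spaces
    return text.strip()                        # remove leading/trailing whitespace

sample = "Look at her face, it's a wonderful face  \r\nAnd it means something special to me  \r\n"
print(clean_lyrics(sample))
# look at her face it s a wonderful face and it means something special to me
```

Because `\s+` already matches `\r` and `\n`, a single whitespace pass is enough here.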


In [9]:
# Keep the first 20000 songs as the working dataset.
first_20000_songs_dataset = new_songs_dataset.head(20000)

### Text Preprocessing

---



---



In [10]:
# Replace unwanted characters with a single space and normalize whitespace.

first_20000_songs_dataset['text'] = (
    first_20000_songs_dataset['text']
    .str.lower()  # Convert to lowercase
    .replace(r'[^a-z0-9\s]', ' ', regex=True)  # Replace non-alphanumeric characters with a space
    .replace(r'\n', ' ', regex=True)  # Replace newline characters with a space
    .replace(r'\r', ' ', regex=True)  # Replace carriage return characters with a space
    .replace(r'\s+', ' ', regex=True)  # Collapse runs of whitespace into a single space
    .str.strip()  # Remove leading and trailing whitespace
)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  first_20000_songs_dataset['text'] = (
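
The warning above is pandas' `SettingWithCopyWarning`. It appears because the DataFrame being modified was derived from another DataFrame with `head()`, so pandas cannot tell whether the assignment is meant to affect the original. One common way to avoid it is to take an explicit copy when creating the subset. The sketch below uses a small made-up frame (`songs` and `subset` are hypothetical names), not the real dataset.

```python
import pandas as pd

# Toy stand-in for the songs table (values are invented for illustration)
songs = pd.DataFrame({
    "song": ["A", "B", "C"],
    "text": ["Hello\r\n", "World ", " Foo"],
})

# .copy() makes the subset an independent DataFrame, so later column
# assignments neither warn nor touch the original
subset = songs.head(2).copy()
subset["text"] = subset["text"].str.lower().str.strip()

print(subset["text"].tolist())
print(songs["text"].tolist())
```

In the notebook, the equivalent change would be to call `.copy()` on the result of `new_songs_dataset.head(20000)`.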


In [11]:
# Look at the lyrics of the first song after character cleaning.
first_20000_songs_dataset['text'][0]

'look at her face it s a wonderful face and it means something special to me look at the way that she smiles when she sees me how lucky can one fellow be she s just my kind of girl she makes me feel fine who could ever believe that she could be mine she s just my kind of girl without her i m blue and if she ever leaves me what could i do what could i do and when we go for a walk in the park and she holds me and squeezes my hand we ll go on walking for hours and talking about all the things that we plan she s just my kind of girl she makes me feel fine who could ever believe that she could be mine she s just my kind of girl without her i m blue and if she ever leaves me what could i do what could i do'

In [12]:
# Tokenization and stemming of text data.
import nltk
nltk.download('punkt')
from nltk.stem.porter import PorterStemmer as ps

stemmer = ps()

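def tokenization(txt):
    # Split the text into word tokens
    tokens = nltk.word_tokenize(txt)
    # Reduce each token to its stem with the Porter stemmer
    stemming = [stemmer.stem(w) for w in tokens]
    # Join the stems back into a single space-separated string
    return " ".join(stemming)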

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Safra\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [13]:
# Check tokenization output
tokenization("beauti, beauitful, beauty")

'beauti , beauit , beauti'

In [14]:
# Apply tokenization and stemming to the dataset.
first_20000_songs_dataset['text'] = first_20000_songs_dataset['text'].apply(lambda x: tokenization(x))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  first_20000_songs_dataset['text'] = first_20000_songs_dataset['text'].apply(lambda x: tokenization(x))


In [15]:
# Convert the tokenized text into TF-IDF vectors and compute the pairwise cosine similarity between songs.
from sklearn.feature_extraction.text import TfidfVectorizer as tfv
from sklearn.metrics.pairwise import cosine_similarity as cs

vector = tfv(analyzer='word',stop_words='english')
matrix = vector.fit_transform(first_20000_songs_dataset['text'])
distance_similarity = cs(matrix)
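
With its default settings `cosine_similarity` returns a dense n × n array, so memory use grows with the square of the number of songs. The rough estimate below assumes 8-byte float64 values (the default dtype of `TfidfVectorizer`); the helper name `dense_matrix_bytes` is hypothetical.

```python
def dense_matrix_bytes(n_rows, bytes_per_value=8):
    """Approximate size of a dense n_rows x n_rows matrix in bytes."""
    return n_rows * n_rows * bytes_per_value

subset_bytes = dense_matrix_bytes(20000)  # the 20000-song subset used here
full_bytes = dense_matrix_bytes(57650)    # all rows reported by songs_dataset.shape

print(f"subset: {subset_bytes / 1e9:.1f} GB")  # subset: 3.2 GB
print(f"full:   {full_bytes / 1e9:.1f} GB")    # full:   26.6 GB
```

The same size applies to the pickled similarity matrix if it is saved to disk.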

In [16]:
# Check the cosine similarity between the 1st song and every song in the dataset (1.0 is the song compared with itself).
distance_similarity[0]

array([1.        , 0.03079377, 0.01352288, ..., 0.03345309, 0.03239016,
       0.01688151])
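
Each value in this array is a cosine similarity: the dot product of two song vectors divided by the product of their lengths. A minimal plain-Python sketch of that formula follows, using invented toy term weights rather than the actual TF-IDF values; `cosine_similarity_pair` is a hypothetical helper name.

```python
import math

def cosine_similarity_pair(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)

song_a = [1.0, 2.0, 0.0]  # toy term weights
song_b = [2.0, 4.0, 0.0]  # same direction as song_a, different length
song_c = [0.0, 0.0, 3.0]  # shares no terms with song_a

sim_ab = cosine_similarity_pair(song_a, song_b)
sim_ac = cosine_similarity_pair(song_a, song_c)
print(sim_ab, sim_ac)
```

Since TF-IDF weights are non-negative, the scores fall between 0 (no shared terms) and 1 (same direction), so a higher value means more similar lyrics.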

In [17]:
# Look up the row index of a song by its title.
first_20000_songs_dataset[first_20000_songs_dataset['song'] == 'Crying Over You'].index[0]

9

### Creating Recommendation System

---



---



In [18]:
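# Recommend the 5 songs most similar to a given song title.
def recommendation(song_title):
    # Row index of the first song whose title matches
    idx = first_20000_songs_dataset[first_20000_songs_dataset['song'] == song_title].index[0]
    # Pair every song index with its similarity score, highest score first
    distances = sorted(enumerate(distance_similarity[idx]), reverse=True, key=lambda x: x[1])

    songs = []
    # Skip position 0 (normally the query song itself) and take the next 5
    for m_id in distances[1:6]:
        songs.append(first_20000_songs_dataset.iloc[m_id[0]].song)

    return songs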
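
The function above raises an `IndexError` when the title is not in the dataset, and it assumes the query song sorts first. A hedged alternative sketch is shown below on a toy similarity matrix: it returns an empty list for unknown titles and excludes the query row by index. The names `recommend_top_k`, `toy_titles`, and `toy_similarity` are hypothetical, and the values are invented.

```python
import numpy as np

def recommend_top_k(title, titles, similarity, k=5):
    """Return up to k titles most similar to `title`, excluding the query itself."""
    if title not in titles:
        return []  # unknown title: nothing to recommend
    idx = titles.index(title)
    # Indices ordered by descending similarity; a stable sort keeps ties in row order
    order = np.argsort(-similarity[idx], kind="stable")
    return [titles[i] for i in order if i != idx][:k]

toy_titles = ["Song A", "Song B", "Song C", "Song D"]
toy_similarity = np.array([
    [1.0, 0.2, 0.7, 0.1],
    [0.2, 1.0, 0.3, 0.6],
    [0.7, 0.3, 1.0, 0.4],
    [0.1, 0.6, 0.4, 1.0],
])

print(recommend_top_k("Song A", toy_titles, toy_similarity, k=2))
# ['Song C', 'Song B']
```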

In [19]:
recommendation('Crying Over You')

["Cryin'",
 'I Want You To Want Me',
 "Cryin' Time Again",
 'Green Eyes',
 'Crying Time']

In [20]:
recommendation('Crazy World')

['Crazy',
 'When I Close My Eyes',
 'Everytime I Close My Eyes',
 'L.O.V.E',
 'Crazy']

In [21]:
recommendation('As Good As New')

['Auntie -',
 'Ma Baker',
 "Ma' Dear Ma' Dear",
 'Ma',
 "Look What They've Done To My Song"]

In [22]:
recommendation('Burning My Bridges')

['Move Right Out',
 'Do It',
 'Run To You',
 'Tangled And Dark',
 'If It Makes You Feel Good']

In [25]:
# Make the similarity matrix and dataset available for use in the web application.
import pickle
with open('distance_similarity.pkl', 'wb') as f:
    pickle.dump(distance_similarity, f)
with open('first_20000_songs_dataset.pkl', 'wb') as f:
    pickle.dump(first_20000_songs_dataset, f)
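
On the web application side, the saved objects would be read back with `pickle.load`. The round trip below is a minimal sketch using a small stand-in object and a temporary directory; the file name `toy_similarity.pkl` is hypothetical, and how the real application loads its files is not shown in this notebook.

```python
import os
import pickle
import tempfile

# Small stand-in for the similarity matrix (values are invented)
toy_similarity = [[1.0, 0.2], [0.2, 1.0]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_similarity.pkl")

    # Save: the with-block closes the file even if dump raises
    with open(path, "wb") as f:
        pickle.dump(toy_similarity, f)

    # Load: what the consuming application would do at startup
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == toy_similarity)
```

Pickle files can execute arbitrary code when loaded, so an application should only load files it produced itself or otherwise trusts.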