# Music Recommendation System

This notebook builds a content-based music recommendation system from a dataset of song lyrics. The lyrics are preprocessed, converted to TF-IDF vectors, and compared with cosine similarity, so that given a song title the system returns the songs with the most similar lyrics.


### 1. Unzipping the Dataset

In [4]:
import zipfile

# Specify the file path
zip_file_path = 'archive (6).zip'  # Replace with the path to your zip file
extract_to_path = 'C:/Users/suyash/Documents/Extracted/'  # Replace with the desired extraction location

# Open the zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    # Extract all the contents to the specified directory
    zip_ref.extractall(extract_to_path)

print("File unzipped successfully!")


File unzipped successfully!
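
As a more portable variant of the extraction cell, the sketch below wraps the same `zipfile` calls in a small helper that builds paths with `pathlib` and creates the target folder if it is missing. The helper name `unzip_dataset` and the throwaway demo archive are assumptions made for illustration; they are not part of the original notebook.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, extract_to):
    """Extract every member of zip_path into extract_to and return the member names."""
    target = Path(extract_to)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target)
        return zip_ref.namelist()


# Demo on a throwaway archive (hypothetical file names, built only for this example)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("songs.csv", "artist,song\nABBA,Bang\n")
    extracted = unzip_dataset(demo_zip, Path(tmp) / "extracted")

print(extracted)
```

Returning the member names makes it easy to confirm that the expected CSV file was in the archive before trying to load it.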


In [2]:
import pandas as pd

### 2. Loading the Dataset

In [5]:
df = pd.read_csv('spotify_millsongdata.csv')

In [6]:
# Display the first five rows of the dataframe
df.head()

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \r\nA..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \r\nTouch me gen..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \r\nWhy I had...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


In [7]:
# Display the last five rows of the dataframe
df.tail()

Unnamed: 0,artist,song,link,text
57645,Ziggy Marley,Good Old Days,/z/ziggy+marley/good+old+days_10198588.html,Irie days come on play \r\nLet the angels fly...
57646,Ziggy Marley,Hand To Mouth,/z/ziggy+marley/hand+to+mouth_20531167.html,Power to the workers \r\nMore power \r\nPowe...
57647,Zwan,Come With Me,/z/zwan/come+with+me_20148981.html,all you need \r\nis something i'll believe \...
57648,Zwan,Desire,/z/zwan/desire_20148986.html,northern star \r\nam i frightened \r\nwhere ...
57649,Zwan,Heartsong,/z/zwan/heartsong_20148991.html,come in \r\nmake yourself at home \r\ni'm a ...


In [8]:
# Check for missing values
df.isnull().sum()

artist    0
song      0
link      0
text      0
dtype: int64

In [9]:
# Display the shape of the dataframe
df.shape

(57650, 4)

57650 : Rows

4     : Columns

In [10]:
df

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \r\nA..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \r\nTouch me gen..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \r\nWhy I had...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...
...,...,...,...,...
57645,Ziggy Marley,Good Old Days,/z/ziggy+marley/good+old+days_10198588.html,Irie days come on play \r\nLet the angels fly...
57646,Ziggy Marley,Hand To Mouth,/z/ziggy+marley/hand+to+mouth_20531167.html,Power to the workers \r\nMore power \r\nPowe...
57647,Zwan,Come With Me,/z/zwan/come+with+me_20148981.html,all you need \r\nis something i'll believe \...
57648,Zwan,Desire,/z/zwan/desire_20148986.html,northern star \r\nam i frightened \r\nwhere ...


### 3. Data Sampling and Preprocessing

In [11]:
# Sample 5000 rows and drop the 'link' column
df = df.sample(5000).drop('link', axis=1).reset_index(drop=True)

In [12]:
# Display the shape of the sampled DataFrame
df.shape

(5000, 3)
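
Because `df.sample(5000)` draws a different random subset on every run, a song that is present in one run may be missing in the next. A minimal sketch of a reproducible variant, assuming a fixed `random_state` is acceptable for this project, is shown below. The toy DataFrame, the helper name `sample_songs`, and the seed value 42 are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the full dataset (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    "artist": ["A", "B", "C", "D", "E", "F"],
    "song": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "link": ["/a", "/b", "/c", "/d", "/e", "/f"],
    "text": ["la", "da", "na", "ha", "ba", "ma"],
})


def sample_songs(frame, n, seed=42):
    """Draw n rows with a fixed seed, drop 'link', and renumber the index."""
    return frame.sample(n, random_state=seed).drop("link", axis=1).reset_index(drop=True)


first = sample_songs(toy, 3)
second = sample_songs(toy, 3)
print(first.equals(second))  # the same seed yields the same sample
```

With a fixed seed, the sampled songs, the similarity matrix built from them, and any saved pickle files stay consistent between runs.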

In [13]:
# Display the first text entry
df['text'][0]

"[Intro]  \r\nOh, oh  \r\nUh-oh-oh-oh  \r\nEh  \r\n  \r\n[Verse 1]  \r\nIt's a little blurry how the whole thing started  \r\nI don't even really know what you intended  \r\nThought that you were cute and you could make me jealous  \r\nPoured it down, so I poured it down  \r\nNext thing that I know I'm in a hotel with you  \r\nYou were talking deep like it was mad love to you  \r\nYou wanted my heart but I just liked your tattoos  \r\nPoured it down, so I poured it down  \r\n  \r\n[Pre-Chorus 1]  \r\nAnd now I don't understand it  \r\nYou don't mess with love, you mess with the truth  \r\nAnd I know I shouldn't say it  \r\nBut my heart don't understand  \r\n  \r\n[Chorus 1]  \r\nWhy I got you on my mind  \r\nWhy I got you on my mind  \r\nWhy I got you on my mind  \r\nWhy I got you on my mind  \r\nBut my heart don't understand  \r\nWhy I got you on my mind  \r\nWhy I got you on my mind  \r\nWhy I got you on my mind  \r\nWhy I got you on my mind  \r\n  \r\n[Verse 2]  \r\nI always hear, a

In [14]:
# Text preprocessing: convert the lyrics to lowercase and replace newline characters with spaces
df['text'] = df['text'].str.lower().replace(r'\n', ' ', regex=True)
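
The output of the next cell still contains punctuation, section tags such as `[intro]`, and `\r` characters. If a stricter cleanup is wanted, one possible sketch using only the standard `re` module is shown below. The function name `clean_lyrics` is an assumption, and replacing every non-word character with a space is one design choice among several (it turns "it's" into "it s", for example).

```python
import re


def clean_lyrics(text):
    """Lowercase the text, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # anything that is not a word character or whitespace
    text = re.sub(r"\s+", " ", text)      # collapses runs of spaces, \r and \n into one space
    return text.strip()


sample = "[Intro]  \r\nOh, oh  \r\nUh-oh-oh-oh"
print(clean_lyrics(sample))
```

Applied to the notebook's data, this would be called as `df['text'].apply(clean_lyrics)`.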

In [15]:
df['text']

0       [intro]  \r oh, oh  \r uh-oh-oh-oh  \r eh  \r ...
1       there's something on your mind  \r by the way ...
2       i wake up in a different daylight  \r guess i ...
3       [verse:]  \r stare in the face of the grim dea...
4       like a candle  \r burning bright  \r love is g...
                              ...                        
4995    in the passion your heart is abused  \r he is ...
4996    i remember when you seemed real shiny with the...
4997    fill the sky with love  \r fill the sky with l...
4998    well, you rang me up you say you wanna get hig...
4999    sometimes they're in a bottle  \r sometimes a ...
Name: text, Length: 5000, dtype: object

In [16]:
df.head()

Unnamed: 0,artist,song,text
0,Ellie Goulding,On My Mind,"[intro] \r oh, oh \r uh-oh-oh-oh \r eh \r ..."
1,Etta James,There's Something On Your Mind,there's something on your mind \r by the way ...
2,Roxette,You Turn Me On,i wake up in a different daylight \r guess i ...
3,Marilyn Manson,Day 3,[verse:] \r stare in the face of the grim dea...
4,Whitney Houston,Nobody Loves Me Like You Do,like a candle \r burning bright \r love is g...


### 4. Tokenization and Stemming


In [17]:
import nltk
from nltk.stem.porter import PorterStemmer

# word_tokenize needs the Punkt tokenizer data; if it is missing, run
# nltk.download('punkt') once (newer NLTK releases may ask for 'punkt_tab' instead)

# Initialize the Porter Stemmer
stemmer = PorterStemmer()

# Define a function for tokenization and stemming
def tokenization(txt):
    tokens = nltk.word_tokenize(txt)  # Tokenize the text
    stemmed = [stemmer.stem(w) for w in tokens]  # Apply stemming
    return " ".join(stemmed)  # Return the processed text



In [18]:
# Apply the tokenization function to the text column
df['text'] = df['text'].apply(tokenization)
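
To see what stemming does to the lyrics, the short sketch below runs the Porter stemmer on a few inflected forms of one word. The helper name `stem_words` is an assumption; the words are passed in as a list so the example does not depend on the tokenizer data.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def stem_words(words):
    """Return the Porter stem of each word in the list."""
    return [stemmer.stem(w) for w in words]


variants = ["loving", "loved", "loves"]
print(stem_words(variants))
```

All three forms reduce to the same stem, so songs that use different inflections of a word end up sharing a TF-IDF feature.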

### 5. Feature Extraction With TF-IDF

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
# Initialize the TF-IDF vectorizer
tfidvector = TfidfVectorizer(analyzer='word', stop_words='english')

# Fit and transform the text data into a TF-IDF matrix
matrix = tfidvector.fit_transform(df['text'])

# Calculate cosine similarity between the songs
similarity = cosine_similarity(matrix)

In [None]:
# Display similarity scores for the first song
similarity[0]
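
Each entry of the similarity matrix is the cosine of the angle between two songs' TF-IDF vectors: the dot product divided by the product of the vector lengths. The sketch below computes this by hand for small toy vectors. The function name `cosine` and the sample vectors are assumptions for illustration, and returning 0.0 for an all-zero vector is a convention chosen for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: an empty vector matches nothing
    return dot / (norm_u * norm_v)


same_direction = cosine([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])  # one vector is a multiple of the other
no_overlap = cosine([1.0, 0.0, 0.0], [0.0, 3.0, 0.0])      # no shared nonzero terms
print(same_direction, no_overlap)
```

Since TF-IDF weights are non-negative, the scores fall between 0 (no terms in common) and 1 (same direction), and a song with a nonzero vector scores 1 against itself.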

### 6. Song Recommendation Function

We will define a function that takes a song name as input and returns a list of recommended songs.

In [None]:
# Function to recommend songs based on similarity
def recommendation(song_name):
    matches = df[df['song'] == song_name]
    if matches.empty:  # The title may be absent from the 5000-row sample
        return []
    idx = matches.index[0]  # Find the index of the input song
    distances = sorted(enumerate(similarity[idx]), reverse=True, key=lambda x: x[1])  # Sort songs by similarity

    songs = []
    for m_id in distances[1:21]:  # Get top 20 similar songs, skipping the first entry (normally the input song itself)
        songs.append(df.iloc[m_id[0]].song)

    return songs

In [None]:
# Test the recommendation function (the title must be present in the sampled DataFrame)
recommendation('Crying Over You')
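
The slice `distances[1:21]` assumes the input song sorts first in its own similarity row. If another song in the sample has identical lyrics, that assumption may not hold. A sketch of a ranking step that excludes the query index explicitly, using NumPy's `argsort`, is shown below. The function name `top_k_similar` and the toy score row are assumptions for illustration.

```python
import numpy as np


def top_k_similar(scores, idx, k):
    """Return the indices of the k highest scores, excluding position idx."""
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest score first
    order = order[order != idx]                             # drop the query song itself
    return order[:k].tolist()


toy_scores = [1.0, 0.2, 0.9, 0.5]  # hypothetical similarity row for song 0
best = top_k_similar(toy_scores, idx=0, k=2)
print(best)
```

In the notebook, the returned indices could be mapped to titles with `df.iloc[i].song`, as the loop in `recommendation` already does.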


### 7. Saving the Model

In [None]:
import pickle

# Save the similarity matrix and the DataFrame to pickle files
with open("similarity.pkl", "wb") as f:
    pickle.dump(similarity, f)
with open("df.pkl", "wb") as f:
    pickle.dump(df, f)
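
To reuse the saved objects later, for example in a separate script or app, they can be read back with `pickle.load`. The sketch below shows the round trip on a small stand-in object written to a temporary folder. The helper names `save_pickle` and `load_pickle` and the demo file name are assumptions for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Write obj to path, closing the file when done."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read back an object previously written with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip on a tiny stand-in for the similarity matrix (hypothetical values)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "similarity_demo.pkl"
    save_pickle([[1.0, 0.3], [0.3, 1.0]], demo_path)
    restored = load_pickle(demo_path)

print(restored)
```

In the notebook's setting, the same pattern would be `similarity = load_pickle("similarity.pkl")` and `df = load_pickle("df.pkl")`. Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.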