<a href="https://colab.research.google.com/github/Saifullah785/Machine_Learning_Projects/blob/main/Project-8-Music-Recommendation-System-Using-ML/Project_8_Music_Recommendation_System_Using_ML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Data Loading and Exploration:**

The project begins by loading song data from a CSV file into a pandas DataFrame. Initial steps involve displaying the head of the DataFrame and checking its dimensions to understand the data structure.

# **Data Preprocessing:**

A sample of 5000 songs is selected, the 'link' column is removed, and the index is reset. The song lyrics ('text' column) are cleaned by converting them to lowercase, removing special characters, and replacing newlines with spaces.

# **Text Vectorization and Similarity Calculation:**

Natural Language Toolkit (nltk) is used for text processing, including tokenization and stemming. The text data is then transformed into a TF-IDF matrix using TfidfVectorizer. Cosine similarity is calculated on this matrix to determine the similarity between songs.

# **Recommendation System:**

A function is defined to recommend songs based on their similarity. Given a song title, the function finds the index of the song, sorts the similarity scores, and returns the titles of the top similar songs.

# **Model Application and Output:**

The unique song titles in the DataFrame are printed. Finally, the recommendation function is applied to a specific song ('Some People Are Crazy') to demonstrate the system's functionality and print the recommended songs.

In [95]:
# importing necessary libraries

import pandas as pd
import numpy as np


In [96]:
# Read the song data from a CSV file into a pandas DataFrame

df = pd.read_csv('songdata.csv')

# Display the first few rows of the DataFrame

df.head()

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \nAnd..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \nTouch me gentl..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \nWhy I had t...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


In [97]:
# Display the dimensions (number of rows and columns) of the DataFrame

df.shape

(57650, 4)

In [98]:
# Sample 5000 random rows from the DataFrame, drop the 'link' column, and reset the index

df = df.sample(n=5000).drop('link', axis=1).reset_index(drop=True)

In [99]:
# Display the dimensions of the DataFrame after sampling

df.shape

(5000, 3)

In [100]:
# Clean the 'text' column by converting to lowercase, removing special characters and replacing newlines with spaces

df['text'] = df['text'].str.lower().replace(r'[^\w\s]','').replace(r'\n',' ', regex=True)

In [101]:
# Display the cleaned text of the first song

df['text'][0]

"chestnuts roasting on an open fire   jack frost nipping at your nose   yule-tide carols being sung by a choir   and folks dressed up like eskimos.      everybody knows a turkey and some mistletoe   help to make the season bright   tiny tots with their eyes all aglow   will find it hard to sleep tonight.      they know that santa's on his way   he's loaded lots of toys and goodies on his sleigh   and every mother's child is gonna spy   to see if reindeer really know how to fly.      and so i'm offering this simple phrase   to kids from one to ninety-two   although it's been said many times, many ways   merry christmas to you!  "

In [102]:
import nltk
from nltk.stem.porter import PorterStemmer

# Initialize the Porter Stemmer

stemmer = PorterStemmer()

# Define a function for tokenization and stemming

def tokenization(txt):
    tokens = nltk.word_tokenize(txt)
    stemming = [stemmer.stem(w) for w in tokens]
    return " ".join(stemming)

In [103]:
# Check if the 'punkt_tab' tokenizer is available, otherwise download it

try:
    nltk.data.find('tokenizers/punkt_tab')
except LookupError:
    nltk.download('punkt_tab')

In [90]:
# Apply the tokenization and stemming function to the 'text' column

df['text'] = df['text'].apply(lambda x: tokenization(x))

In [91]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [92]:
# Initialize TfidfVectorizer with 'word' analyzer and 'english' stop words

tfidvector = TfidfVectorizer(analyzer='word',stop_words='english')

# Fit the vectorizer to the 'text' data and transform it into a TF-IDF matrix

matrix = tfidvector.fit_transform(df['text'])

# Calculate the cosine similarity between all pairs of songs

similarity = cosine_similarity(matrix)

In [93]:
# Display the similarity scores for the first song

similarity[0]

array([1.        , 0.02830446, 0.        , ..., 0.01618107, 0.0267796 ,
       0.0080036 ])

In [94]:
# Find songs with an empty title (likely for debugging or checking data integrity)

df[df['song']=='']

Unnamed: 0,artist,song,text,song_cleaned


# **recommedation function**

In [84]:
# Define the recommendation function

def recommendation(song_df):

    # Get the index of the input song
    idx = df[df['song'] == song_df].index[0]

    # Sort the similarity scores for the input song in descending order and get the indices
    distances = sorted(list(enumerate(similarity[idx])),reverse=True,key=lambda x:x[1])

    songs = []

     # Get the top 20 most similar songs (excluding the input song itself)

    for m_id in distances[1:21]:
        songs.append(df.iloc[m_id[0]].song)

    # Return the list of recommended song titles
    return songs

In [89]:
# Print all unique song titles in the DataFrame
print(df['song'].unique())

['Some People Are Crazy' 'My Eyes' 'Silent Night' ... 'Write Off The Debt'
 'The Great Misconception Of Me' 'Antechrist']


In [88]:
# Get recommendations for the song 'Some People Are Crazy' and print them

recommendation('Some People Are Crazy')

['Green Grass',
 'People',
 'People Need The Lord',
 'Two People',
 'Eleanor Rigby',
 'An Innocent Man',
 'Kissing A Fool',
 'Margaret On The Guillotine',
 'Take It To Heart',
 'An Innocent Man',
 'Trinity',
 'Armagideon Time',
 'Beautiful People',
 'City Song',
 "Don't Save It All For Christmas Day",
 'Drunken Hearted Boy',
 'Dallas',
 'The Way It Is',
 'Something So Right',
 "Let's See Action"]

In [67]:
# import pickle
# pickle.dump(similarity,open('similarity.pkl','wb'))
# pickle.dump(df,open('df.pkl','wb'))