# Music Recommendation System
## Introduction:
This music recommendation system suggests songs based on the textual content of their lyrics. It uses machine learning and Natural Language Processing (NLP) techniques to find songs with similar lyrics and introduce music lovers to new songs they might like.
The dataset used for this project is the "Songs Recommendation Dataset", available on Kaggle (https://www.kaggle.com/datasets/noorsaeed/songs-recommendation-dataset/).

## Use:
Users enter the name of a song they like, and the system returns a list of 20 recommended songs ranked by the textual similarity of their lyrics. The name must exactly match a song title in the sampled dataset. The recommendations are generated using TF-IDF vectorization and cosine similarity, making the system a simple tool for discovering new music.

In [None]:
import pandas as pd
import numpy as np
import nltk
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


* It begins by reading the song dataset from the CSV file 'songdata.csv'.
* It then randomly samples 5,000 rows and resets the index, so each row's index label matches its position in the similarity matrix computed later.







In [None]:

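df = pd.read_csv('songdata.csv')

# Work with a random sample of 5,000 songs so the 5,000 x 5,000 similarity matrix stays manageable;
# random_state makes the sample reproducible, and reset_index aligns row labels with row positions
df = df.sample(n=5000, random_state=42).reset_index(drop=True)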
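
To see what sampling and re-indexing do, here is a small sketch on a hypothetical five-row DataFrame standing in for 'songdata.csv' (only the 'song' and 'text' column names come from the notebook; the values are made up). `sample` returns rows in shuffled order with their original labels, and `reset_index(drop=True)` renumbers them from 0, which is what lets a row's label double as its position in the similarity matrix later on.

```python
import pandas as pd

# Hypothetical stand-in for songdata.csv, using the column names the notebook relies on
toy = pd.DataFrame({
    "song": ["s1", "s2", "s3", "s4", "s5"],
    "text": ["la la", "oh oh", "hey hey", "na na", "yeah yeah"],
})

# Draw 3 rows at random; the fixed seed makes the draw repeatable
sample = toy.sample(n=3, random_state=0)
print(sample.index.tolist())  # original row labels, in shuffled order

# drop=True discards the old labels instead of keeping them as an 'index' column
sample = sample.reset_index(drop=True)
print(sample.index.tolist())  # → [0, 1, 2]
```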


* Standardizes the text data for processing by converting it to lowercase, removing punctuation and other special characters, and replacing line breaks with spaces.

In [None]:

# Standardize the text: lowercase, strip punctuation/special characters, turn line breaks into spaces
df['text'] = (df['text'].str.lower()
              .str.replace(r'[^\w\s]', '', regex=True)
              .str.replace(r'\n', ' ', regex=True))
df.head(2)
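
A minimal check of the standardization on a made-up lyric line. One detail worth knowing: `Series.replace` treats its pattern as a regular expression only when `regex=True` is passed; otherwise it replaces only cells whose entire value equals the pattern, so the punctuation would survive.

```python
import pandas as pd

# Hypothetical lyric snippet with punctuation, capitals, and a line break
lyrics = pd.Series(["Hello, World!\nIt's ME"])

# Without regex=True, Series.replace only swaps cells equal to the whole pattern: nothing changes
unchanged = lyrics.str.lower().replace(r"[^\w\s]", "")

# With regex=True the pattern is applied inside each string
cleaned = (lyrics.str.lower()
           .str.replace(r"[^\w\s]", "", regex=True)  # drop punctuation and symbols
           .str.replace(r"\n", " ", regex=True))     # turn line breaks into spaces

print(repr(unchanged[0]))  # punctuation still present
print(cleaned[0])          # → hello world its me
```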

The first line initializes a Porter stemmer. The "tokenization" function then splits the input text into word tokens with NLTK's word tokenizer, stems each token with the Porter stemmer, and returns the stemmed tokens joined into a single string. Stemming reduces words to their root forms, so "loving" and "loved" both become "love".

In [None]:


stemmer = PorterStemmer()


def tokenization(txt):
    # Split the lyrics into word tokens, stem each token, and rejoin them into one string
    tokens = nltk.word_tokenize(txt)
    stemming = [stemmer.stem(w) for w in tokens]
    return " ".join(stemming)
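
The stemming behaviour can be tried in isolation with the sketch below. It assumes NLTK is installed and swaps `nltk.word_tokenize` for plain `str.split()` so that no tokenizer data has to be downloaded; the `stem_text` helper is hypothetical. Inflected forms such as "loving" and "loved" collapse to the same root, which lets TF-IDF count them as one term.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def stem_text(txt):
    # Hypothetical helper: str.split() stands in for nltk.word_tokenize (no 'punkt' download needed)
    return " ".join(stemmer.stem(w) for w in txt.split())

print(stem_text("loving songs"))  # → love song
print(stem_text("loved song"))    # → love song
```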

The next cell downloads NLTK's 'punkt' tokenizer models, which "word_tokenize" needs, and then applies the "tokenization" function to every entry in the 'text' column of the DataFrame 'df'. This tokenizes and stems the lyrics, preparing them for vectorization.

In [None]:



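# word_tokenize needs the 'punkt' models; recent NLTK releases look for 'punkt_tab' instead
nltk.download('punkt')
nltk.download('punkt_tab')
df['text'] = df['text'].apply(tokenization)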

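* It creates a TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer using scikit-learn.
* The vectorizer converts the 'text' column into a matrix with one weighted bag-of-words vector per song, with English stop words removed. Because the lyrics are already stemmed, some stop words no longer match the stop-word list (for example, "this" is stemmed to "thi") and survive this filter.
* It then computes the cosine similarity between every pair of TF-IDF vectors.
* The resulting 5,000 × 5,000 similarity matrix, where entry (i, j) measures how similar the lyrics of songs i and j are, is the basis for the song recommendations.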

In [None]:



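# Fit TF-IDF on the stemmed lyrics with English stop words removed: one row per song
tfidvector = TfidfVectorizer(analyzer='word', stop_words='english')
matrix = tfidvector.fit_transform(df['text'])
# Cosine similarity between every pair of songs, a (5000, 5000) array
similarity = cosine_similarity(matrix)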
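
The same TF-IDF plus cosine-similarity pipeline on a hypothetical three-line corpus makes the output easier to inspect. The sketch also recomputes one entry by hand as the dot product of two vectors divided by the product of their norms, which is the quantity `cosine_similarity` returns.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical lyrics: the first two share words, the third shares none
lyrics = [
    "love you baby love",
    "baby love me tonight",
    "rain falls on the city",
]

vectorizer = TfidfVectorizer(analyzer="word", stop_words="english")
matrix = vectorizer.fit_transform(lyrics)  # sparse, one row per song
similarity = cosine_similarity(matrix)     # dense, shape (3, 3)

# Cosine similarity by hand: dot product divided by the product of the norms
a = matrix[0].toarray().ravel()
b = matrix[1].toarray().ravel()
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity.shape)                   # → (3, 3)
print(np.isclose(similarity[0, 1], manual))  # → True
print(similarity[0, 2])                   # no shared words, so the similarity is zero
```

Each song has similarity 1 with itself, which is why the diagonal of the matrix must be skipped when ranking recommendations.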


* It checks for rows in the DataFrame whose 'song' value is missing or equal to an empty string, since such titles cannot be looked up.
* If any are found, it prints the index of the first such row as a quick data-quality check.
* Otherwise, it prints a message confirming that no 'song' values are empty.

In [None]:


# Count missing (NaN) titles as empty, as well as empty strings
empty_song_rows = df[df['song'].isna() | (df['song'] == '')]
if not empty_song_rows.empty:
    empty_song_index = empty_song_rows.index[0]
    print("Index of the first empty 'song':", empty_song_index)
else:
    print("No rows with an empty 'song' value were found.")
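
A toy DataFrame shows why comparing with the empty string on its own can miss problem titles: missing values (NaN) never compare equal to '', and whitespace-only titles are not empty strings either. The broader mask in this sketch, which also strips whitespace, is one possible stricter check; the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical titles: one missing (NaN), one empty, one whitespace-only
toy = pd.DataFrame({"song": ["Hello", np.nan, "", "  ", "Yesterday"]})

# An equality test against '' alone catches only the truly empty string
only_empty_string = toy[toy["song"] == ""].index.tolist()

# Treat missing, empty, and whitespace-only titles as empty
blank = toy["song"].isna() | (toy["song"].str.strip() == "")
blank_rows = toy[blank].index.tolist()

print(only_empty_string)  # → [2]
print(blank_rows)         # → [1, 2, 3]
```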

## Recommendation Function:

* The core functionality of the recommendation system is encapsulated in a function called "recommendation."
* Given the name of a song, this function finds songs with lyrics similar to the input song.
* It looks up the input song's row in the DataFrame, reads that song's row of the precomputed similarity matrix, ranks all other songs by similarity, and returns the 20 most similar titles.

In [None]:
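#recommendation function
def recommendation(song_name):
    # Find the row of the requested song; return no recommendations if it is not in the sample
    matches = df[df['song'] == song_name].index
    if len(matches) == 0:
        return []
    idx = matches[0]
    # Pair every song index with its similarity to the input song, most similar first
    distances = sorted(enumerate(similarity[idx]), reverse=True, key=lambda x: x[1])
    # Skip the input song itself and keep the 20 most similar songs
    return [df.iloc[i].song for i, _ in distances if i != idx][:20]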
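
The ranking step can also be written with `numpy.argsort`, shown here on a hypothetical four-song catalogue with a made-up similarity matrix; `top_k` is an illustrative helper, not part of the notebook. Excluding the query by index rather than slicing off the first element avoids dropping a different song when two songs tie at similarity 1.0 (for example, duplicate lyrics).

```python
import numpy as np
import pandas as pd

# Hypothetical catalogue and a made-up symmetric similarity matrix
songs = pd.DataFrame({"song": ["A", "B", "C", "D"]})
similarity = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

def top_k(song_name, k=2):
    matches = songs[songs["song"] == song_name].index
    if len(matches) == 0:
        return []  # unknown title: nothing to recommend
    idx = matches[0]
    # Row positions sorted by similarity to the query, highest first
    order = np.argsort(similarity[idx])[::-1]
    # Drop the query song itself, then keep the k best
    order = [i for i in order if i != idx][:k]
    return songs["song"].iloc[order].tolist()

print(top_k("A"))  # → ['C', 'D']
print(top_k("Z"))  # → []
```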


## Creating the Gradio Interface:

* The system uses Gradio, an open-source Python library for creating user interfaces, to make the recommendation system accessible.
* Sets up a Gradio interface that takes a song name as input and returns a list of recommended songs as output.
* It employs a compact theme with a specified title and description.
* Additional CSS styling is applied to limit the maximum height of the output.


In [None]:
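!pip install gradio
import gradio as gr

def recommend_text(song_name):
    # Return one title per line; a list would be shown as its Python repr in the text box
    songs = recommendation(song_name)
    return "\n".join(songs) if songs else "Song not found in the sampled dataset."

iface = gr.Interface(
    fn=recommend_text,
    inputs="text",
    outputs="text",
    live=True,
    theme="compact",
    title="Song Recommender",
    description="Enter a song name to get recommendations",
    css=""".output {max-height: 400px;}""",
)

iface.launch()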

## Conclusion:
This Music Recommendation System combines data preprocessing, text processing, and machine learning to offer content-based song recommendations from lyrics. The Gradio interface makes it user-friendly and accessible to music enthusiasts, giving them a new way to explore music starting from a song they already like.