# Music Recommendation System

- **`import pandas as pd`**: Imports the `pandas` library, which is used for data manipulation and analysis, and aliases it as `pd`.
- **`import numpy as np`**: Imports the `numpy` library, which provides support for numerical operations and array handling, and aliases it as `np`.
- **`from sklearn.model_selection import train_test_split`**: Imports the `train_test_split` function from `scikit-learn`, which is used to split data into training and testing sets.
- **`from sklearn.metrics.pairwise import cosine_similarity`**: Imports the `cosine_similarity` function from `scikit-learn`, used to calculate the cosine similarity between vectors.
- **`from scipy.sparse import csr_matrix`**: Imports the `csr_matrix` from `scipy.sparse`, which is a compressed sparse row matrix format for efficient storage and operations on large sparse matrices.


In [50]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import csr_matrix


- **`file_path = '/mnt/data/Spotify-2000.csv'`**: Defines the file path to the dataset.
- **`data = pd.read_csv('./Spotify-2000.csv')`**: Loads the CSV file into a pandas DataFrame named `data`.
- **`data.head()`**: Displays the first few rows of the DataFrame to give a quick overview of the dataset's structure and content.


In [51]:
file_path = '/mnt/data/Spotify-2000.csv'
data = pd.read_csv('./Spotify-2000.csv')

data.head()

Unnamed: 0,Index,Title,Artist,Top Genre,Year,Beats Per Minute (BPM),Energy,Danceability,Loudness (dB),Liveness,Valence,Length (Duration),Acousticness,Speechiness,Popularity
0,1,Sunrise,Norah Jones,adult standards,2004,157,30,53,-14,11,68,201,94,3,71
1,2,Black Night,Deep Purple,album rock,2000,135,79,50,-11,17,81,207,17,7,39
2,3,Clint Eastwood,Gorillaz,alternative hip hop,2001,168,69,66,-9,7,52,341,2,17,69
3,4,The Pretender,Foo Fighters,alternative metal,2007,173,96,43,-4,3,37,269,0,4,76
4,5,Waitin' On A Sunny Day,Bruce Springsteen,classic rock,2002,106,82,58,-5,10,87,256,1,3,59


### Data Preparation and Encoding

The code cleans the DataFrame by removing duplicates and missing values, then resets the index. It displays summary statistics for numerical columns and converts categorical features into numerical format using one-hot encoding, dropping the first category to avoid redundancy.


In [57]:
data.drop_duplicates(inplace=True)
data.dropna(inplace=True)

data.reset_index(drop=True, inplace=True)

data.describe()

data_encoded = pd.get_dummies(data, columns=['Artist', 'Top Genre'], drop_first=True)


### Creating a User-Item Interaction Matrix for Collaborative Filtering

For collaborative filtering, we need a user-item interaction matrix. Since user information isn't available, we'll simulate users by grouping by 'Artist' or 'Top Genre'. 


In [58]:
# Example: Group by 'Artist' and Create a User-Item Matrix
user_item_matrix = data.pivot_table(index='Artist', columns='Title', values='Popularity', fill_value=0)

user_item_sparse = csr_matrix(user_item_matrix.values)


### Calculating Item Similarity

The code calculates the cosine similarity between items based on the user-item matrix and creates a DataFrame to display the similarity matrix. This helps in identifying how similar different items are to each other.


In [59]:
item_similarity = cosine_similarity(user_item_sparse.T)
item_similarity_df = pd.DataFrame(item_similarity, index=user_item_matrix.columns, columns=user_item_matrix.columns)

item_similarity_df.head()


Title,(Don't Fear) The Reaper,(Everything I Do) I Do It For You,(I Can't Get No) Satisfaction - Mono Version,"(I've Had) The Time of My Life - From ""Dirty Dancing"" Soundtrack",(Sittin' On) the Dock of the Bay,(Something Inside) So Strong,(They Long To Be) Close To You,(What A) Wonderful World,(You Make Me Feel Like) A Natural Woman,(You're The) Devil in Disguise,...,Zonder Jou,Zoutelande - feat. Geike,Zwart Wit,bad guy,k Heb Je Lief,kom terug,lippy kids,t Dondert En 't Bliksemt,tous les mêmes,Élan
Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
(Don't Fear) The Reaper,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
(Everything I Do) I Do It For You,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
(I Can't Get No) Satisfaction - Mono Version,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"(I've Had) The Time of My Life - From ""Dirty Dancing"" Soundtrack",0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
(Sittin' On) the Dock of the Bay,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Song Recommendation Function

The `recommend_songs` function suggests songs similar to a given `song_title` using a similarity matrix. It checks if the song exists in the matrix, sorts similar songs by their similarity scores, and excludes the input song from recommendations. The function returns a DataFrame with the top recommendations, including song titles, artists, and a serial number for easy viewing.


In [55]:
def recommend_songs(song_title, similarity_matrix, num_recommendations=5):
    
    if song_title not in similarity_matrix.columns:#song title similarity
        return pd.DataFrame(columns=['S.No', 'Title', 'Artist'])  

    similar_songs = similarity_matrix[song_title].sort_values(ascending=False)
    
    recommended_songs = similar_songs.iloc[1:num_recommendations+1].index.tolist()
    
    recommendations = data[data['Title'].isin(recommended_songs)][['Title', 'Artist']].reset_index()
    
    recommendations.insert(0, 'S.No', recommendations.index + 1)
    
    return recommendations

In [56]:
# Example
example_song = 'Sunrise' 
recommendations = recommend_songs(example_song, item_similarity_df, num_recommendations=5)

if not recommendations.empty:
    display(recommendations.style.set_properties(**{'text-align': 'left'}).set_table_styles(
        [{'selector': 'th', 'props': [('text-align', 'left')]}]
    ))
else:
    print("No recommendations available.")

Unnamed: 0,S.No,index,Title,Artist
0,1,33,Don't Know Why,Norah Jones
1,2,1318,Holding Back the Years - 2008 Remaster,Simply Red
2,3,1481,If You Don't Know Me by Now - 2008 Remaster,Simply Red
3,4,1556,Stars,Simply Red
4,5,1693,Fairground,Simply Red
