<a href="https://colab.research.google.com/github/Rajnandanigithub/Songs_Recommendation_using_Word2Vec_model/blob/main/Song_Recommendation_using_word2Vec.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Song Recommendation Using Word2Vec Model**

This notebook uses the Word2Vec algorithm to learn song embeddings from human-curated playlists. Each song is treated as a "word" and each playlist as a "sentence." Training on playlists lets the model learn which songs tend to co-occur, so songs that appear in similar playlist contexts end up with similar vectors in the embedding space. To recommend songs, we look up the nearest neighbors of a given song's vector. The recommendations are therefore item-to-item: they reflect the aggregate choices of playlist curators rather than a profile of any individual listener.
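To make the "playlist as sentence" framing concrete, here is a minimal sketch that counts how often pairs of songs share a playlist. The playlists and song names are hypothetical, and Word2Vec does not count pairs explicitly (it learns vectors from windowed contexts), but shared-playlist counts like these are roughly the signal it picks up on.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy playlists; each inner list plays the role of one "sentence".
toy_playlists = [
    ["metallica_1", "maiden_1", "rush_1"],
    ["metallica_1", "maiden_1", "vanhalen_1"],
    ["beatles_1", "stones_1"],
]

def cooccurrence_counts(playlists):
    """Count how often each unordered pair of songs shares a playlist."""
    counts = Counter()
    for playlist in playlists:
        # set() ignores repeated songs; sorted() gives each pair one canonical order.
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[(a, b)] += 1
    return counts

pair_counts = cooccurrence_counts(toy_playlists)
print(pair_counts.most_common(1))  # [(('maiden_1', 'metallica_1'), 2)]
```

Songs that never share a playlist, such as `beatles_1` and `metallica_1` here, have a count of zero, which is why their embeddings would not be pulled toward each other.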

In [2]:
import pandas as pd
import numpy as np

In [3]:
from google.colab import files
import pandas as pd

# Step 1: Upload the playlist file (play_list_train.txt)
uploaded = files.upload()

Saving play_list_train.txt to play_list_train.txt


In [4]:
from google.colab import files
import pandas as pd

# Step 1 (continued): Upload the song metadata file (playlist _song_hash.txt)
uploaded = files.upload()

Saving playlist _song_hash.txt to playlist _song_hash.txt


In [5]:
import os
os.listdir('/content/')

['.config', 'play_list_train.txt', 'playlist _song_hash.txt', 'sample_data']

In [6]:


# The uploaded files will be in '/content/', so you can use these paths
train_file_path = '/content/play_list_train.txt'  # Replace with the name of your uploaded file
song_file_path = '/content/playlist _song_hash.txt'  # Replace with the name of your uploaded file

# Step 2: Read the playlist dataset file from the uploaded location.
# The first two lines are skipped; the code treats them as header lines rather than playlists.
with open(train_file_path, 'r') as file:
    lines = file.read().split('\n')[2:]

# Step 3: Split each line into song IDs and drop playlists with only one song
playlists = [s.rstrip().split() for s in lines if len(s.split()) > 1]

# Step 4: Read song metadata from the uploaded location
with open(song_file_path, 'r') as file:
    songs_file = file.read().split('\n')

# Step 5: Parse the song metadata
songs = [s.rstrip().split('\t') for s in songs_file]

# Step 6: Create a DataFrame for the song metadata
songs_df = pd.DataFrame(data=songs, columns=['id', 'title', 'artist'])
songs_df = songs_df.set_index('id')

# Display the first few rows of the DataFrame
songs_df.head()


id,title,artist
0,Gucci Time (w\/ Swizz Beatz),Gucci Mane
1,Aston Martin Music (w\/ Drake & Chrisette Mich...,Rick Ross
2,Get Back Up (w\/ Chris Brown),T.I.
3,Hot Toddy (w\/ Jay-Z & Ester Dean),Usher
4,Whip My Hair,Willow
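Before training, it can be worth checking that every song ID used in a playlist has a metadata row. The sketch below mirrors the parsing steps above on small in-memory strings; the sample text and the helper names `parse_playlists` and `parse_song_ids` are hypothetical and not part of the notebook.

```python
# Hypothetical in-memory stand-ins for the two uploaded files.
playlist_text = "header line 1\nheader line 2\n0 1 2\n3\n1 2 4 \n"
song_text = "0 \tSong A\tArtist A\n1 \tSong B\tArtist B\n2 \tSong C\tArtist C\n"

def parse_playlists(text, header_lines=2):
    """Split each line into song IDs, skipping headers and single-song playlists."""
    rows = (line.split() for line in text.split("\n")[header_lines:])
    return [row for row in rows if len(row) > 1]

def parse_song_ids(text):
    """Collect the ID column, stripping stray whitespace and skipping blank lines."""
    return {line.split("\t")[0].strip() for line in text.split("\n") if line.strip()}

parsed_playlists = parse_playlists(playlist_text)
known_ids = parse_song_ids(song_text)

# IDs that appear in a playlist but have no metadata row.
missing_ids = {song for playlist in parsed_playlists for song in playlist} - known_ids
print(parsed_playlists)  # [['0', '1', '2'], ['1', '2', '4']]
print(missing_ids)       # {'4'}
```

An empty `missing_ids` set would mean every recommended ID can be resolved to a title and artist.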


In [7]:
## Model Training

In [8]:
from gensim.models import Word2Vec

# Train the Word2Vec model; each playlist is a "sentence" of song IDs
model = Word2Vec(
    playlists, vector_size=32, window=20, negative=50, min_count=1, workers=4
)


In [9]:
song_id = 2172
# Ask the model for songs similar to song #2172
model.wv.most_similar(positive=str(song_id))


[('2849', 0.9980979561805725),
 ('3167', 0.9969621300697327),
 ('11473', 0.9968299269676208),
 ('2640', 0.9965595006942749),
 ('5549', 0.9962553381919861),
 ('3094', 0.9957565069198608),
 ('3105', 0.9954192638397217),
 ('3136', 0.9953827857971191),
 ('2715', 0.9953437447547913),
 ('2976', 0.9942629933357239)]
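The scores above are cosine similarities between embedding vectors. As a rough illustration of what `most_similar` computes, the sketch below ranks a few hypothetical 3-dimensional vectors by cosine similarity in plain Python; the vectors and the helper names are assumptions for illustration, not gensim's implementation.

```python
import math

# Hypothetical 3-dimensional embeddings; the trained model above uses 32 dimensions.
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(key, vectors, topn=2):
    """Return the topn other keys, ordered from most to least similar to `key`."""
    query = vectors[key]
    scores = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != key]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

ranked = rank_by_similarity("a", toy_vectors)
print(ranked)
```

Here `"b"` ranks first because it points in nearly the same direction as `"a"`, while `"c"` is orthogonal to it and scores zero.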

In [10]:
# Note: this is the list of songs whose embeddings are most similar to song ID 2172, with their cosine similarity scores


In [11]:
# Let's find out which song corresponds to song ID 2172

In [12]:
print(songs_df.iloc[2172])

title     Fade To Black
artist        Metallica
Name: 2172 , dtype: object


In [13]:
# Define print_recommendations, which takes a song_id and returns a DataFrame
# with the ID, title, and artist of the most similar songs
def print_recommendations(song_id, topn=5):
    similar = model.wv.most_similar(positive=str(song_id), topn=topn)
    # most_similar returns (id, score) pairs with string IDs; .iloc needs integer positions
    positions = [int(song) for song, _ in similar]
    return songs_df.iloc[positions]

# Extract recommendations
print_recommendations(2172)


id,title,artist
2849,Run To The Hills,Iron Maiden
3167,Unchained,Van Halen
11473,Little Guitars,Van Halen
2640,Red Barchetta,Rush
5549,November Rain,Guns N' Roses


In [14]:
# Count the unique artists in the metadata
songs_df["artist"].nunique()

15976

In [15]:
songs_df.shape

(75263, 2)

In [16]:
songs_df["artist"].value_counts()

artist,count
-,1812
The Beatles,201
Frank Sinatra,166
Vicente Fernandez,166
Metallica,141
...,...
"Peedi Crakk, Beanie Sigel, Freeway & Young Chris",1
Blackberry Smoke,1
Earl Hooker,1
Mable John,1


In [17]:
songs_df[songs_df["artist"]=="The Beatles"]

id,title,artist
1675,Let It Be,The Beatles
2578,Don't Let Me Down (w\/ Billy Preston),The Beatles
2663,Come Together,The Beatles
2788,Sgt. Pepper's Lonely Hearts Club Band,The Beatles
2789,A Day In The Life,The Beatles
...,...,...
73490,Goodnight,The Beatles
73494,Anna (Go To Him),The Beatles
74639,The Continuing Story Of Bungalow Bill,The Beatles
74946,"I'll Get You (Mono, Past masters)",The Beatles


In [18]:
# Reuse print_recommendations to find songs similar to "Let It Be" (ID 1675)
print_recommendations(1675)

id,title,artist
3106,You Ain't Seen Nothing Yet,Bachman-Turner Overdrive
2915,Some Kind Of Wonderful,Grand Funk Railroad
2819,Magic Carpet Ride,Steppenwolf
2963,Born To Be Wild,Steppenwolf
3052,Honky Tonk Women,The Rolling Stones


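The songs recommended as similar to "Let It Be" are "You Ain't Seen Nothing Yet", "Some Kind Of Wonderful", "Magic Carpet Ride", "Born To Be Wild", and "Honky Tonk Women". Because Word2Vec training with multiple workers is not deterministic, the exact list can vary between runs.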