## Dataset Information

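The Million Song Dataset, as used here, consists of two files: a triplets file (triplets_file.csv) and a song metadata file (song_data.csv). The triplets file contains user_id, song_id and listen_count, the number of times the user played the song. The metadata file contains song_id, title, release, artist_name and year, where a year of 0 means the release year is unknown. Together they link song metadata to users' listening histories; the listen counts act as implicit feedback rather than ratings that users gave explicitly.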

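Three common types of recommendation systems are content-based, collaborative filtering and popularity-based. This notebook builds a popularity-based recommender and an item-similarity recommender, which is a form of collaborative filtering.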

## Import modules

In [1]:
```python
import pandas as pd
import numpy as np
import Recommenders  # custom module providing popularity_recommender_py and item_similarity_recommender_py
```

## Loading the dataset

In [3]:
```python
# listening history: one row per (user, song) pair with its play count
song_df_1 = pd.read_csv(r'D:\Github_slash_mark_project\slash_mark_project\major-project\model-3\data\triplets_file\triplets_file.csv')
song_df_1.head()
```

|  | user_id | song_id | listen_count |
|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1 |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2 |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1 |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1 |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1 |


In [4]:
```python
# song metadata: title, release, artist_name and year for each song_id
song_df_2 = pd.read_csv(r'D:\Github_slash_mark_project\slash_mark_project\major-project\model-3\data\song_data\song_data.csv')
song_df_2.head()
```

|  | song_id | title | release | artist_name | year |
|---|---|---|---|---|---|
| 0 | SOQMMHC12AB0180CB8 | Silent Night | Monster Ballads X-Mas | Faster Pussy cat | 2003 |
| 1 | SOVFVAK12A8C1350D9 | Tanssi vaan | Karkuteillä | Karkkiautomaatti | 1995 |
| 2 | SOGTUKN12AB017F4F1 | No One Could Ever | Butter | Hudson Mohawke | 2006 |
| 3 | SOBNYVR12A8C13558C | Si Vos Querés | De Culo | Yerba Brava | 2003 |
| 4 | SOHSBXH12A8C13B0DF | Tangle Of Aspens | Rene Ablaze Presents Winter Sessions | Der Mystic | 0 |


In [5]:
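```python
# left-join the song metadata onto the triplets; dropping duplicate song_ids first
# ensures that each triplet matches at most one metadata row
song_df = pd.merge(song_df_1, song_df_2.drop_duplicates(['song_id']), on='song_id', how='left')
song_df.head()
```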

|  | user_id | song_id | listen_count | title | release | artist_name | year |
|---|---|---|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1 | The Cove | Thicker Than Water | Jack Johnson | 0 |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2 | Entre Dos Aguas | Flamenco Para Niños | Paco De Lucia | 1976 |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1 | Stronger | Graduation | Kanye West | 2007 |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1 | Constellations | In Between Dreams | Jack Johnson | 2005 |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1 | Learn To Fly | There Is Nothing Left To Lose | Foo Fighters | 1999 |


In [6]:
```python
print(len(song_df_1), len(song_df_2))
```

```
2000000 1000000
```


In [7]:
```python
len(song_df)
```

```
2000000
```
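The row count is unchanged because a left join keeps every triplet and `drop_duplicates(['song_id'])` leaves at most one metadata row per song. A minimal sketch on invented toy frames shows what would happen without the de-duplication; pandas' `validate='many_to_one'` argument can also enforce the uniqueness assumption explicitly:

```python
import pandas as pd

# invented toy stand-ins for song_df_1 (triplets) and song_df_2 (metadata)
triplets = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["S1", "S2", "S1"],
    "listen_count": [1, 2, 3],
})
metadata = pd.DataFrame({
    "song_id": ["S1", "S1", "S2"],  # S1 is listed twice
    "title": ["Song One", "Song One", "Song Two"],
})

# without de-duplication, each S1 triplet is repeated once per S1 metadata row
naive = pd.merge(triplets, metadata, on="song_id", how="left")

# with de-duplication, the row count matches the triplets table
deduped = pd.merge(
    triplets,
    metadata.drop_duplicates(["song_id"]),
    on="song_id",
    how="left",
    validate="many_to_one",  # raises MergeError if song_id is not unique on the right
)
print(len(triplets), len(naive), len(deduped))  # 3 5 3

# the same validation on the raw metadata fails loudly instead of adding rows
try:
    pd.merge(triplets, metadata, on="song_id", how="left", validate="many_to_one")
except pd.errors.MergeError as err:
    print("MergeError:", err)
```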

## Data Preprocessing

In [8]:
```python
# new feature "title - artist_name", used as the song key by the recommenders
song_df['song'] = song_df['title'] + ' - ' + song_df['artist_name']
song_df.head()
```

|  | user_id | song_id | listen_count | title | release | artist_name | year | song |
|---|---|---|---|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1 | The Cove | Thicker Than Water | Jack Johnson | 0 | The Cove - Jack Johnson |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2 | Entre Dos Aguas | Flamenco Para Niños | Paco De Lucia | 1976 | Entre Dos Aguas - Paco De Lucia |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1 | Stronger | Graduation | Kanye West | 2007 | Stronger - Kanye West |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1 | Constellations | In Between Dreams | Jack Johnson | 2005 | Constellations - Jack Johnson |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1 | Learn To Fly | There Is Nothing Left To Lose | Foo Fighters | 1999 | Learn To Fly - Foo Fighters |
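Because the merge is a left join, any triplet whose song_id has no match in song_data would carry NaN in title and artist_name, and `+` on pandas string columns propagates the missing value, so `song` would be NaN for that row. The preview above shows no such rows; a quick check is sketched below on invented toy data, with dropping incomplete rows as one possible (assumed, not prescribed) remedy:

```python
import numpy as np
import pandas as pd

# invented rows: the second one has no metadata, as after an unmatched left join
df = pd.DataFrame({
    "title": ["The Cove", np.nan],
    "artist_name": ["Jack Johnson", np.nan],
})

df["song"] = df["title"] + " - " + df["artist_name"]
n_missing = df["song"].isna().sum()
print(n_missing)  # 1

# one option: drop rows without metadata before building the feature
clean = df.dropna(subset=["title", "artist_name"]).copy()
clean["song"] = clean["title"] + " - " + clean["artist_name"]
print(clean["song"].tolist())  # ['The Cove - Jack Johnson']
```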


## Small Sample

In [9]:
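```python
# keep only the first 20,000 rows for quick results; the rows appear to be grouped
# by user, so this keeps the histories of a limited set of users, not a random sample
song_df = song_df.head(20000)
```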

In [10]:
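```python
# number of rows per song, i.e. how many users in the sample listened to it;
# 'count' counts rows and does not add up the listen_count values
song_grouped = song_df.groupby(['song']).agg({'listen_count': 'count'}).reset_index()
song_grouped.head(15)
```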

|  | song | listen_count |
|---|---|---|
| 0 | #40 - DAVE MATTHEWS BAND | 1 |
| 1 | & Down - Boys Noize | 5 |
| 2 | ' Cello Song - Nick Drake | 1 |
| 3 | '97 Bonnie & Clyde - Eminem | 3 |
| 4 | 'Round Midnight - Amy Winehouse | 2 |
| 5 | 'Round Midnight - Miles Davis | 3 |
| 6 | 'Til We Die (Album Version) - Slipknot | 1 |
| 7 | 'Till I Collapse - Eminem / Nate Dogg | 10 |
| 8 | (Anaesthesia) Pulling Teath - Metallica | 1 |
| 9 | (I Cant Get No) Satisfaction - Cat Power | 2 |


In [11]:
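```python
# each song's share of all rows in the sample; sort by popularity, then alphabetically
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = (song_grouped['listen_count'] / grouped_sum) * 100
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])
```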

|  | song | listen_count | percentage |
|---|---|---|---|
| 5183 | Sehr kosmisch - Harmonia | 79 | 0.395 |
| 1514 | Dog Days Are Over (Radio Edit) - Florence + Th... | 69 | 0.345 |
| 4918 | Revelry - Kings Of Leon | 65 | 0.325 |
| 6603 | Undo - Björk | 65 | 0.325 |
| 5175 | Secrets - OneRepublic | 64 | 0.320 |
| ... | ... | ... | ... |
| 7226 | Zwitter - Rammstein | 1 | 0.005 |
| 7228 | aNYway - Armand Van Helden & A-TRAK Present Du... | 1 | 0.005 |
| 7230 | high fives - Four Tet | 1 | 0.005 |
| 7231 | in white rooms - Booka Shade | 1 | 0.005 |


## Popularity Recommendation Engine (Method 1)

In [12]:
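```python
pr = Recommenders.popularity_recommender_py()
```

In [13]:
```python
# build the popularity model from the user and song columns
pr.create(song_df, 'user_id', 'song')
```

In [14]:
```python
# display the top 10 most popular songs (the list does not depend on the user)
pr.recommend(song_df['user_id'][5])
```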

|  | user_id | song | score | Rank |
|---|---|---|---|---|
| 5183 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Sehr kosmisch - Harmonia | 79 | 1.0 |
| 1514 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Dog Days Are Over (Radio Edit) - Florence + Th... | 69 | 2.0 |
| 4918 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Revelry - Kings Of Leon | 65 | 3.0 |
| 6603 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Undo - Björk | 65 | 4.0 |
| 5175 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Secrets - OneRepublic | 64 | 5.0 |
| 7185 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | You're The One - Dwight Yoakam | 60 | 6.0 |
| 2631 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Horn Concerto No. 4 in E flat K495: II. Romanc... | 53 | 7.0 |
| 1972 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Fireflies - Charttraxx Karaoke | 52 | 8.0 |
| 6199 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | The Scientist - Coldplay | 50 | 9.0 |
| 2547 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Hey_ Soul Sister - Train | 45 | 10.0 |


In [15]:
```python
pr.recommend(song_df['user_id'][100])
```

|  | user_id | song | score | Rank |
|---|---|---|---|---|
| 5183 | e006b1a48f466bf59feefed32bec6494495a4436 | Sehr kosmisch - Harmonia | 79 | 1.0 |
| 1514 | e006b1a48f466bf59feefed32bec6494495a4436 | Dog Days Are Over (Radio Edit) - Florence + Th... | 69 | 2.0 |
| 4918 | e006b1a48f466bf59feefed32bec6494495a4436 | Revelry - Kings Of Leon | 65 | 3.0 |
| 6603 | e006b1a48f466bf59feefed32bec6494495a4436 | Undo - Björk | 65 | 4.0 |
| 5175 | e006b1a48f466bf59feefed32bec6494495a4436 | Secrets - OneRepublic | 64 | 5.0 |
| 7185 | e006b1a48f466bf59feefed32bec6494495a4436 | You're The One - Dwight Yoakam | 60 | 6.0 |
| 2631 | e006b1a48f466bf59feefed32bec6494495a4436 | Horn Concerto No. 4 in E flat K495: II. Romanc... | 53 | 7.0 |
| 1972 | e006b1a48f466bf59feefed32bec6494495a4436 | Fireflies - Charttraxx Karaoke | 52 | 8.0 |
| 6199 | e006b1a48f466bf59feefed32bec6494495a4436 | The Scientist - Coldplay | 50 | 9.0 |
| 2547 | e006b1a48f466bf59feefed32bec6494495a4436 | Hey_ Soul Sister - Train | 45 | 10.0 |
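Both users receive the same top-10 list, because a popularity recommender only attaches the user id to a global ranking. The pandas sketch below reproduces that behaviour under an assumption: `popularity_recommend` is a hypothetical function meant to mirror the output above (score = number of users per song, ties broken alphabetically), not the code inside `Recommenders.py`:

```python
import pandas as pd


def popularity_recommend(df, user_id, user_col="user_id", item_col="song", n=10):
    """Hypothetical sketch: rank items by the number of users who listened to them."""
    scores = (
        df.groupby(item_col)[user_col]
        .count()
        .rename("score")
        .reset_index()
        .sort_values(["score", item_col], ascending=[False, True])
    )
    # positional rank after sorting; ties keep the alphabetical order from the sort
    scores["rank"] = range(1, len(scores) + 1)
    top = scores.head(n).copy()
    top.insert(0, user_col, user_id)  # only the user id differs between users
    return top


# invented toy data: three users played A, one user played B
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1"],
    "song": ["A", "A", "A", "B"],
})
rec_u1 = popularity_recommend(toy, "u1", n=2)
rec_u3 = popularity_recommend(toy, "u3", n=2)
print(rec_u1)
print(rec_u3)
```

The design is cheap and works for brand-new users, but it cannot personalise: a user who already knows the top songs gets nothing new.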


## Item Similarity Recommendation

In [16]:
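```python
# build the item-similarity model from the user and song columns
ir = Recommenders.item_similarity_recommender_py()
ir.create(song_df, 'user_id', 'song')
```

In [17]:
```python
# songs in this user's listening history
user_items = ir.get_user_items(song_df['user_id'][5])
```

In [18]:
```python
song_df['user_id'][5]
```

```
'b80344d063b5ccb3212f76538f3d9e43d87dca9e'
```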

In [19]:
```python
# all rows for this user in the sample
user_5_id = song_df['user_id'][5]
user_5_info = song_df[song_df['user_id'] == user_5_id]
user_5_info
```

|  | user_id | song_id | listen_count | title | release | artist_name | year | song |
|---|---|---|---|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1 | The Cove | Thicker Than Water | Jack Johnson | 0 | The Cove - Jack Johnson |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2 | Entre Dos Aguas | Flamenco Para Niños | Paco De Lucia | 1976 | Entre Dos Aguas - Paco De Lucia |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1 | Stronger | Graduation | Kanye West | 2007 | Stronger - Kanye West |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1 | Constellations | In Between Dreams | Jack Johnson | 2005 | Constellations - Jack Johnson |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1 | Learn To Fly | There Is Nothing Left To Lose | Foo Fighters | 1999 | Learn To Fly - Foo Fighters |
| 5 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODDNQT12A6D4F5F7E | 5 | Apuesta Por El Rock 'N' Roll | Antología Audiovisual | Héroes del Silencio | 2007 | Apuesta Por El Rock 'N' Roll - Héroes del Sile... |
| 6 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODXRTY12AB0180F3B | 1 | Paper Gangsta | The Fame Monster | Lady GaGa | 2008 | Paper Gangsta - Lady GaGa |
| 7 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOFGUAY12AB017B0A8 | 1 | Stacked Actors | There Is Nothing Left To Lose | Foo Fighters | 1999 | Stacked Actors - Foo Fighters |
| 8 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOFRQTD12A81C233C0 | 1 | Sehr kosmisch | Musik von Harmonia | Harmonia | 0 | Sehr kosmisch - Harmonia |
| 9 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOHQWYZ12A6D4FA701 | 1 | Heaven's gonna burn your eyes | Hôtel Costes 7 by Stéphane Pompougnac | Thievery Corporation feat. Emiliana Torrini | 2002 | Heaven's gonna burn your eyes - Thievery Corpo... |


In [20]:
```python
# display the user's song history
for user_item in user_items:
    print(user_item)
```

```
The Cove - Jack Johnson
Entre Dos Aguas - Paco De Lucia
Stronger - Kanye West
Constellations - Jack Johnson
Learn To Fly - Foo Fighters
Apuesta Por El Rock 'N' Roll - Héroes del Silencio
Paper Gangsta - Lady GaGa
Stacked Actors - Foo Fighters
Sehr kosmisch - Harmonia
Heaven's gonna burn your eyes - Thievery Corporation feat. Emiliana Torrini
Let It Be Sung - Jack Johnson / Matt Costa / Zach Gill / Dan Lebowitz / Steve Adams
I'll Be Missing You (Featuring Faith Evans & 112)(Album Version) - Puff Daddy
Love Shack - The B-52's
Clarity - John Mayer
I?'m A Steady Rollin? Man - Robert Johnson
The Old Saloon - The Lonely Island
Behind The Sea [Live In Chicago] - Panic At The Disco
Champion - Kanye West
Breakout - Foo Fighters
Ragged Wood - Fleet Foxes
Mykonos - Fleet Foxes
Country Road - Jack Johnson / Paula Fuga
Oh No - Andrew Bird
Love Song For No One - John Mayer
Jewels And Gold - Angus & Julia Stone
83 - John Mayer
Neon - John Mayer
The Middle - Jimmy Eat World
High and dry - Jorge Drexle
```

### Method 2: Recommendations for a User

In [21]:
```python
# recommend songs for this user based on co-occurrence with their listening history
ir.recommend(song_df['user_id'][5])
```

```
No. of unique songs for the user: 45
no. of unique songs in the training set: 7242
Non zero values in cooccurence_matrix :11401
```


|  | user_id | song | score | rank |
|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Oil And Water - Incubus | 0.033294 | 1 |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Quiet Houses - Fleet Foxes | 0.030706 | 2 |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Your Protector - Fleet Foxes | 0.030706 | 3 |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Tiger Mountain Peasant Song - Fleet Foxes | 0.030706 | 4 |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Sun It Rises - Fleet Foxes | 0.030706 | 5 |
| 5 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | The End - Pearl Jam | 0.030219 | 6 |
| 6 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | White Winter Hymnal - Fleet Foxes | 0.029673 | 7 |
| 7 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Take California - Propellerheads | 0.028793 | 8 |
| 8 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Pressure - MORS PRINCIPIUM EST | 0.028059 | 9 |
| 9 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Pills Drunk Daddy - Kyle Cease | 0.028059 | 10 |
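A minimal sketch of this kind of item-to-item recommendation follows. It assumes the Jaccard index between the sets of listeners, averaged over the seed songs; that is an assumption about how `item_similarity_recommender_py` scores candidates, not its actual source. `similar_songs` and the toy triplets are hypothetical:

```python
from collections import defaultdict


def similar_songs(triplets, seed_songs, n=10):
    """Hypothetical sketch: rank songs by mean Jaccard similarity to the seed songs.

    triplets: iterable of (user_id, song) pairs.
    """
    listeners = defaultdict(set)  # song -> set of users who played it
    for user, song in triplets:
        listeners[song].add(user)

    seeds = [s for s in set(seed_songs) if s in listeners]
    if not seeds:
        return []

    scores = {}
    for candidate, cand_users in listeners.items():
        if candidate in seeds:
            continue  # never recommend a seed song back
        total = 0.0
        for seed in seeds:
            shared = len(listeners[seed] & cand_users)
            total += shared / len(listeners[seed] | cand_users)  # Jaccard index
        scores[candidate] = total / len(seeds)

    # highest score first, ties alphabetically; drop songs with no shared listeners
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(song, score) for song, score in ranked if score > 0][:n]


# invented toy history
triplets = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "C"),
    ("u3", "B"), ("u3", "C"), ("u3", "D"),
]
print(similar_songs(triplets, ["A", "B"]))  # C scores 1/3, D scores 0.25
```

For a per-user recommendation the seeds are that user's own songs; passing an arbitrary list of songs instead gives the "songs similar to these songs" variant.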


### Method 3: Songs Similar to Given Songs


In [22]:
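```python
# find songs that co-occur with the given songs in users' listening histories
# (based on shared listeners, not on the words in the titles)
ir.get_similar_items(['Oliver James - Fleet Foxes', 'The End - Pearl Jam'])
```

```
no. of unique songs in the training set: 7242
Non zero values in cooccurence_matrix :272
```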


|  | user_id | song | score | rank |
|---|---|---|---|---|
| 0 |  | Quiet Houses - Fleet Foxes | 0.5 | 1 |
| 1 |  | Your Protector - Fleet Foxes | 0.5 | 2 |
| 2 |  | Tiger Mountain Peasant Song - Fleet Foxes | 0.5 | 3 |
| 3 |  | Sun It Rises - Fleet Foxes | 0.5 | 4 |
| 4 |  | He Doesn't Know Why - Fleet Foxes | 0.45 | 5 |
| 5 |  | Meadowlarks - Fleet Foxes | 0.416667 | 6 |
| 6 |  | Heard Them Stirring - Fleet Foxes | 0.416667 | 7 |
| 7 |  | St. Elsewhere - Dave Grusin | 0.375 | 8 |
| 8 |  | White Winter Hymnal - Fleet Foxes | 0.330357 | 9 |
| 9 |  | Oil And Water - Incubus | 0.266667 | 10 |
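If, as assumed here and not confirmed by the output, the scores are mean Jaccard similarities over the seed songs (the two songs passed to `get_similar_items`), they can be read as follows, with $U_i$ the set of users who played song $i$ and $S$ the set of seed songs:

$$
\operatorname{sim}(i, j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|},
\qquad
\operatorname{score}(j) = \frac{1}{|S|} \sum_{i \in S} \operatorname{sim}(i, j)
$$

With $|S| = 2$, a score of $0.5$ means the two similarities sum to $1$, for example $\operatorname{sim} = 1$ with one seed and $0$ with the other, or $0.5$ with each.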


In [23]:
```python
import pickle

# save the merged dataset (song_df)
with open('song_df.pkl', 'wb') as f:
    pickle.dump(song_df, f)

# save the popularity-based recommendation model
with open('popularity_model.pkl', 'wb') as f:
    pickle.dump(pr, f)

# save the item similarity-based recommendation model
with open('item_similarity_model.pkl', 'wb') as f:
    pickle.dump(ir, f)

print("Pickle files created successfully!")
```

```
Pickle files created successfully!
```
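Loading the files back reverses this step. The self-contained sketch below round-trips a toy DataFrame through a temporary directory (names are illustrative). pickle stores class instances by reference to their defining module, so unpickling popularity_model.pkl or item_similarity_model.pkl also requires Recommenders to be importable. Only load pickle files from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

import pandas as pd

# toy frame standing in for song_df
df = pd.DataFrame({"user_id": ["u1", "u2"], "song": ["A - X", "B - Y"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "song_df.pkl")

    # write; 'with' closes the file even if dumping fails
    with open(path, "wb") as f:
        pickle.dump(df, f)

    # read back
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded.equals(df))  # True
```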
