## Dataset Information

The Million Songs Dataset consists of two files: triplet_file and metadata_file. The triplet_file contains user_id, song_id and listen_count. The metadata_file contains song_id, title, release, year and artist_name. The dataset is a mixture of songs from various websites, along with the ratings that users gave after listening to them.

There are three types of recommendation systems: content-based, collaborative and popularity-based. This notebook builds a popularity-based recommender and an item-similarity recommender.

## Import modules

In [1]:
import pandas as pd
import numpy as np
import Recommenders

## Loading the dataset

In [2]:
song_df_1 = pd.read_csv('triplets_file.csv')
song_df_1.head()

Unnamed: 0,user_id,song_id,listen_count
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1


In [3]:
song_df_2 = pd.read_csv('song_data.csv')
song_df_2.head()

Unnamed: 0,song_id,title,release,artist_name,year
0,SOQMMHC12AB0180CB8,Silent Night,Monster Ballads X-Mas,Faster Pussy cat,2003
1,SOVFVAK12A8C1350D9,Tanssi vaan,Karkuteillä,Karkkiautomaatti,1995
2,SOGTUKN12AB017F4F1,No One Could Ever,Butter,Hudson Mohawke,2006
3,SOBNYVR12A8C13558C,Si Vos Querés,De Culo,Yerba Brava,2003
4,SOHSBXH12A8C13B0DF,Tangle Of Aspens,Rene Ablaze Presents Winter Sessions,Der Mystic,0


In [4]:
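# merge listening records with song metadata; dropping duplicate song_ids
# first keeps one metadata row per song, so the left join adds no rows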
song_df = pd.merge(song_df_1, song_df_2.drop_duplicates(['song_id']), on='song_id', how='left')
song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999


In [5]:
print(len(song_df_1), len(song_df_2))

2000000 1000000


In [6]:
len(song_df)

2000000
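
The row counts above match because the metadata is deduplicated on song_id before the left merge. The following is a minimal sketch on made-up toy data (the frames `triplets` and `metadata` and their values are illustrative assumptions, not the real files) showing what could happen without that step:

```python
import pandas as pd

# Toy stand-ins for the triplets and metadata files (values are made up).
triplets = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["s1", "s2", "s1"],
    "listen_count": [1, 2, 5],
})
metadata = pd.DataFrame({
    "song_id": ["s1", "s1", "s2"],  # s1 appears twice on purpose
    "title": ["Song A", "Song A (alt)", "Song B"],
})

# Without deduplication, every triplet for s1 matches two metadata rows.
naive = pd.merge(triplets, metadata, on="song_id", how="left")

# Deduplicating first keeps one metadata row per song_id (the first one),
# so the merged frame has exactly as many rows as the triplets frame.
deduped = pd.merge(
    triplets,
    metadata.drop_duplicates(["song_id"]),
    on="song_id",
    how="left",
)

print(len(triplets), len(naive), len(deduped))
```

Comparing `len(deduped)` with `len(triplets)`, as the notebook does with `len(song_df)`, is a cheap check that the join did not duplicate any listening records.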

## Data Preprocessing

In [7]:
# creating new feature combining title and artist name
song_df['song'] = song_df['title']+' - '+song_df['artist_name']
song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year,song
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0,The Cove - Jack Johnson
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976,Entre Dos Aguas - Paco De Lucia
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007,Stronger - Kanye West
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005,Constellations - Jack Johnson
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999,Learn To Fly - Foo Fighters
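
One thing to keep in mind with this feature: because the merge is a left join, any song_id missing from the metadata would have a missing title and artist_name, and string concatenation with a missing value yields a missing value. The sketch below uses made-up rows to illustrate this; the `song_filled` column and the placeholder strings are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy rows: the second one simulates a song with no title in the metadata.
toy = pd.DataFrame({
    "title": ["Stronger", None],
    "artist_name": ["Kanye West", "Unknown Artist"],
})

# Same construction as in the notebook: a missing title makes the result missing.
toy["song"] = toy["title"] + " - " + toy["artist_name"]

# One possible workaround (an assumption, not the notebook's approach):
# fill missing parts with a placeholder before concatenating.
toy["song_filled"] = (
    toy["title"].fillna("Unknown title")
    + " - "
    + toy["artist_name"].fillna("Unknown artist")
)

print(toy[["song", "song_filled"]])
```

Rows whose `song` value is missing are silently dropped by a default `groupby(['song'])`, so it can be worth checking `song_df['song'].isna().sum()` before aggregating.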


In [8]:
# keep only the first 10,000 rows for quick results
song_df = song_df.head(10000)

In [9]:
# number of listening records (user-song rows) per song;
# note that 'count' counts rows, it does not sum the listen_count values
song_grouped = song_df.groupby(['song']).agg({'listen_count':'count'}).reset_index()
song_grouped.head()

Unnamed: 0,song,listen_count
0,#40 - DAVE MATTHEWS BAND,1
1,& Down - Boys Noize,4
2,'97 Bonnie & Clyde - Eminem,2
3,'Round Midnight - Miles Davis,3
4,'Till I Collapse - Eminem / Nate Dogg,6


In [10]:
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = (song_grouped['listen_count'] / grouped_sum ) * 100
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])

Unnamed: 0,song,listen_count,percentage
3660,Sehr kosmisch - Harmonia,45,0.45
4678,Undo - Björk,32,0.32
5105,You're The One - Dwight Yoakam,32,0.32
1071,Dog Days Are Over (Radio Edit) - Florence + Th...,28,0.28
3655,Secrets - OneRepublic,28,0.28
...,...,...,...
5139,high fives - Four Tet,1,0.01
5140,in white rooms - Booka Shade,1,0.01
5143,paranoid android - Christopher O'Riley,1,0.01
5149,¿Lo Ves? [Piano Y Voz] - Alejandro Sanz,1,0.01
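
The aggregation above uses 'count', so the listen_count column in song_grouped holds the number of rows per song rather than the total number of plays. A minimal sketch on made-up data, with the hypothetical column names `n_rows` and `total_plays`, shows the difference and reproduces the percentage and sorting steps:

```python
import pandas as pd

# Made-up listening records: song A has few rows but many plays.
toy = pd.DataFrame({
    "song": ["A", "A", "B", "B", "B", "C"],
    "listen_count": [10, 1, 1, 1, 1, 7],
})

# 'count' gives rows per song, 'sum' gives total plays per song.
grouped = toy.groupby("song").agg(
    n_rows=("listen_count", "count"),
    total_plays=("listen_count", "sum"),
).reset_index()

# Share of all rows that belong to each song, as in the notebook.
grouped["percentage"] = grouped["n_rows"] / grouped["n_rows"].sum() * 100

# Most frequent first, ties broken alphabetically by song name.
grouped = grouped.sort_values(["n_rows", "song"], ascending=[False, True])
print(grouped)
```

Here song B ranks first by row count even though song A has more total plays, which is the behaviour the notebook's popularity figures reflect.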


## Popularity Recommendation Engine

In [11]:
pr = Recommenders.popularity_recommender_py()

In [12]:
pr.create(song_df, 'user_id', 'song')

In [13]:
# display the top 10 popular songs
pr.recommend(song_df['user_id'][2])

Unnamed: 0,user_id,song,score,Rank
3660,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Sehr kosmisch - Harmonia,45,1.0
4678,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Undo - Björk,32,2.0
5105,b80344d063b5ccb3212f76538f3d9e43d87dca9e,You're The One - Dwight Yoakam,32,3.0
1071,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Dog Days Are Over (Radio Edit) - Florence + Th...,28,4.0
3655,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Secrets - OneRepublic,28,5.0
4378,b80344d063b5ccb3212f76538f3d9e43d87dca9e,The Scientist - Coldplay,27,6.0
4712,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Use Somebody - Kings Of Leon,27,7.0
3476,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Revelry - Kings Of Leon,26,8.0
1387,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Fireflies - Charttraxx Karaoke,24,9.0
1862,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Horn Concerto No. 4 in E flat K495: II. Romanc...,23,10.0
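
The source of the Recommenders module is not shown here, so the following is only a sketch of how a popularity recommender of this kind might work, inferred from the output above: the score equals the number of rows per song, and tied scores still get distinct ranks. The function name `popularity_recommend` and its parameters are hypothetical.

```python
import pandas as pd

def popularity_recommend(df, user_id, user_col="user_id", item_col="song", top_n=10):
    """Return the top_n most frequent items, labelled with the given user_id."""
    # Score each item by how many rows (user-item records) mention it.
    scores = (
        df.groupby(item_col)[user_col]
        .count()
        .reset_index(name="score")
        .sort_values(["score", item_col], ascending=[False, True])
    )
    # method="first" gives tied scores distinct ranks in their sorted order.
    scores["Rank"] = scores["score"].rank(ascending=False, method="first")
    top = scores.head(top_n).copy()
    # The user_id is only attached as a label; it does not affect the ranking.
    top.insert(0, user_col, user_id)
    return top

# Made-up records for illustration.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "song": ["A", "A", "A", "B", "B", "C"],
})
recs = popularity_recommend(toy, "u3", top_n=2)
print(recs)
```

Under this assumption every user receives the same list, which is why a popularity recommender is mainly useful as a baseline or for users with no listening history.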


## Item Similarity Recommendation

In [15]:
song_df

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year,song
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0,The Cove - Jack Johnson
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976,Entre Dos Aguas - Paco De Lucia
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007,Stronger - Kanye West
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005,Constellations - Jack Johnson
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999,Learn To Fly - Foo Fighters
...,...,...,...,...,...,...,...,...
9995,15cc706a7f24975ca831aaaf297bf0392746b3fe,SOFSETB12A8C134038,2,Show Me How To Live,Audioslave,Audioslave,2002,Show Me How To Live - Audioslave
9996,15cc706a7f24975ca831aaaf297bf0392746b3fe,SOHIROU12AB01852AF,5,Billy Liar,Billy Liar (CD-Single),The Decemberists,2003,Billy Liar - The Decemberists
9997,15cc706a7f24975ca831aaaf297bf0392746b3fe,SOOAVGC12AB01821EC,5,The Bachelor and the Bride,Her Majesty The Decemberists,The Decemberists,2003,The Bachelor and the Bride - The Decemberists
9998,15cc706a7f24975ca831aaaf297bf0392746b3fe,SOPKEIV12AB018220D,5,Red Right Ankle,Her Majesty The Decemberists,The Decemberists,2003,Red Right Ankle - The Decemberists


In [26]:
ir = Recommenders.item_similarity_recommender_py()
ir.create(song_df, 'user_id', 'song')

In [29]:
user_items = ir.get_user_items(song_df['user_id'][355])

In [30]:
# display user songs history
for user_item in user_items:
    print(user_item)

Harder Better Faster Stronger - Daft Punk
Phantom Part 1.5 (Album Version) - Justice
I Got Mine - The Black Keys
Face To Face (Cosmo VItelli Remix) - Daft Punk
Pogo - Digitalism
Rorol - Octopus Project
Tchaparian - Hot Chip
Sometimes Things Get_ Whatever - Deadmau5
Auto-Dub - Skream
That Was Just A Dream - Cut Copy
Face To Face (Demon Remix) - Daft Punk
Aerodynamic - Daft Punk
Swallowed In The Sea - Coldplay
Marble House - The Knife
Hilarious Movie Of The 90s - Four Tet
One More Time (Romanthony's Unplugged) - Daft Punk
Echo Sam - Holy Fuck
Indo Silver Club - Daft Punk
Younger Than Springtime - William Tabbert
Take It In - Hot Chip
We're Looking For A Lot Of Love - Hot Chip
Emotion - Daft Punk
Slip - Deadmau5
Parks - Four Tet
Korg Rhythm Afro - Holy Fuck
Us V Them - LCD Soundsystem
Monkey Man - Amy Winehouse
Love - Simian Mobile Disco
Stay Lit - Holy Fuck
TTHHEE PPAARRTTYY - Justice
Love Theme - Fred Falke
We Have Love - Hot Chip
Newjack - Justice
Staralfur - Sigur Ros
Full Circle (Exp

In [32]:
# give song recommendation for that user
ir.recommend(song_df['user_id'][355])

No. of unique songs for the user: 396
no. of unique songs in the training set: 5151
Non zero values in cooccurence_matrix :193268


Unnamed: 0,user_id,song,score,rank
0,5a905f000fc1ff3df7ca807d57edb608863db05d,Shake A Fist - Hot Chip,0.090615,1
1,5a905f000fc1ff3df7ca807d57edb608863db05d,La Rock 01 - Vitalic,0.090615,2
2,5a905f000fc1ff3df7ca807d57edb608863db05d,Indra - Thievery Corporation,0.090615,3
3,5a905f000fc1ff3df7ca807d57edb608863db05d,Needy Girl - Chromeo,0.090615,4
4,5a905f000fc1ff3df7ca807d57edb608863db05d,Vietnam - Crystal Castles,0.07545,5
5,5a905f000fc1ff3df7ca807d57edb608863db05d,Suffocation - Crystal Castles,0.07545,6
6,5a905f000fc1ff3df7ca807d57edb608863db05d,Tomorrow Comes Today - Gorillaz,0.07545,7
7,5a905f000fc1ff3df7ca807d57edb608863db05d,Riot Van - Arctic Monkeys,0.07545,8
8,5a905f000fc1ff3df7ca807d57edb608863db05d,In A Darkened Room - Skid Row,0.07545,9
9,5a905f000fc1ff3df7ca807d57edb608863db05d,Red Light Indicates Doors Are Secured - Arctic...,0.07545,10


In [33]:
ir.recommend(song_df['user_id'][15])

No. of unique songs for the user: 45
no. of unique songs in the training set: 5151
Non zero values in cooccurence_matrix :6844


Unnamed: 0,user_id,song,score,rank
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Oliver James - Fleet Foxes,0.043076,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Quiet Houses - Fleet Foxes,0.043076,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Your Protector - Fleet Foxes,0.043076,3
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Tiger Mountain Peasant Song - Fleet Foxes,0.043076,4
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Sun It Rises - Fleet Foxes,0.043076,5
5,b80344d063b5ccb3212f76538f3d9e43d87dca9e,The End - Pearl Jam,0.037531,6
6,b80344d063b5ccb3212f76538f3d9e43d87dca9e,St. Elsewhere - Dave Grusin,0.037531,7
7,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Misled - Céline Dion,0.037531,8
8,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Oil And Water - Incubus,0.037531,9
9,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Meadowlarks - Fleet Foxes,0.037531,10


In [42]:
# find songs similar to the given song, based on co-occurrence among listeners
ir.get_similar_items(["Going Nowhere - Cut Copy"])

no. of unique songs in the training set: 5151
Non zero values in cooccurence_matrix :554


Unnamed: 0,user_id,song,score,rank
0,,Time Stands Still - Cut Copy,1.0,1
1,,Suena tu guitarra - Fernando Soto,1.0,2
2,,That Was Just A Dream - Cut Copy,1.0,3
3,,Visions - Cut Copy,0.75,4
4,,A Dream - Cut Copy,0.75,5
5,,Feel The Love - Cut Copy,0.75,6
6,,Hearts On Fire - Cut Copy,0.75,7
7,,Saturdays - Cut Copy,0.75,8
8,,We Fight For Diamonds - Cut Copy,0.666667,9
9,,So Haunted - Cut Copy,0.666667,10
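
Scores such as 1.0, 0.75 and 0.666667 look like ratios of small integers, which would be consistent with a Jaccard index over the sets of users who listened to each song. The module's source is not shown, so this is an assumption; the sketch below, with the hypothetical function `jaccard_similar_items`, illustrates that idea on made-up data.

```python
import pandas as pd

def jaccard_similar_items(df, item, user_col="user_id", item_col="song", top_n=10):
    """Rank other items by Jaccard similarity of their listener sets to `item`."""
    # Map each item to the set of users who have a record for it.
    listeners = df.groupby(item_col)[user_col].apply(set)
    target = listeners[item]
    rows = []
    for other, users in listeners.items():
        if other == item:
            continue
        union = target | users
        # Jaccard index: shared listeners divided by all listeners of either item.
        score = len(target & users) / len(union) if union else 0.0
        if score > 0:
            rows.append((other, score))
    out = pd.DataFrame(rows, columns=[item_col, "score"])
    out = out.sort_values(["score", item_col], ascending=[False, True]).head(top_n)
    out = out.reset_index(drop=True)
    out["rank"] = range(1, len(out) + 1)
    return out

# Made-up records: A and B share all listeners, D shares one, C shares none.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3", "u4", "u1"],
    "song": ["A", "A", "A", "B", "B", "B", "C", "D"],
})
similar = jaccard_similar_items(toy, "A")
print(similar)
```

With only 10,000 rows, many songs have a handful of listeners, so a perfect score of 1.0 can come from a single shared user; that may explain why an unrelated artist can appear near the top of a list like the one above.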
