## Dataset Information

The Million Songs Dataset consists of two files: triplet_file and metadata_file. The triplet_file contains user_id, song_id and listen_count (the number of times the user played the song). The metadata_file contains song_id, title, release, artist_name and year. The Million Songs Dataset is a mixture of songs from various websites, together with the ratings that users gave after listening to each song.

There are three types of recommendation system: content-based, collaborative and popularity-based. This notebook builds a popularity-based recommender and an item-similarity recommender.

## Import modules

In [1]:
import pandas as pd
import numpy as np
import Recommenders

## Loading the dataset

In [2]:
song_df_1 = pd.read_csv('triplets_file.csv')
song_df_1.head()

Unnamed: 0,user_id,song_id,listen_count
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1


In [3]:
song_df_2 = pd.read_csv('song_data.csv')
song_df_2.head()

Unnamed: 0,song_id,title,release,artist_name,year
0,SOQMMHC12AB0180CB8,Silent Night,Monster Ballads X-Mas,Faster Pussy cat,2003.0
1,SOVFVAK12A8C1350D9,Tanssi vaan,Karkuteillä,Karkkiautomaatti,1995.0
2,SOGTUKN12AB017F4F1,No One Could Ever,Butter,Hudson Mohawke,2006.0
3,SOBNYVR12A8C13558C,Si Vos Querés,De Culo,Yerba Brava,2003.0
4,SOHSBXH12A8C13B0DF,Tangle Of Aspens,Rene Ablaze Presents Winter Sessions,Der Mystic,0.0


In [4]:
# left-merge listen records with song metadata; song_ids without metadata keep NaN
song_df = pd.merge(song_df_1, song_df_2.drop_duplicates(['song_id']), on='song_id', how='left')
song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,,,,
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976.0
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,,,,
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,,,,
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,,,,
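
Because the merge is a left join, every listen record survives, and records whose `song_id` has no row in the metadata file keep `NaN` in `title`, `release`, `artist_name` and `year`, as four of the five rows above show. The following is a minimal sketch of how that could be quantified with the `indicator` option of `pd.merge`. The frames `triplets` and `metadata` and all their values are made up for illustration; they are not the notebook's variables.

```python
import pandas as pd

# Toy stand-ins for the triplets and metadata frames (values are made up).
triplets = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["S1", "S2", "S3"],
    "listen_count": [1, 2, 1],
})
metadata = pd.DataFrame({
    "song_id": ["S2", "S2", "S4"],
    "title": ["Entre Dos Aguas", "Entre Dos Aguas", "Other"],
})

merged = pd.merge(
    triplets,
    metadata.drop_duplicates(["song_id"]),
    on="song_id",
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Rows whose song_id found no metadata match are flagged "left_only".
unmatched = int((merged["_merge"] == "left_only").sum())
print(f"{unmatched} of {len(merged)} rows have no metadata")
```

On the real frames, the same `indicator=True` argument could be passed to the merge in cell 4 to see how much of the data lacks a title.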


In [5]:
print(len(song_df_1), len(song_df_2))

33258 40146


In [6]:
len(song_df)

33258

## Data Preprocessing

In [7]:
# creating new feature combining title and artist name
song_df['song'] = song_df['title']+' - '+song_df['artist_name']
song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year,song
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,,,,,
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976.0,Entre Dos Aguas - Paco De Lucia
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,,,,,
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,,,,,
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,,,,,
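
The `song` column is `NaN` wherever `title` or `artist_name` is missing, because string concatenation in pandas propagates missing values. A small sketch of that behaviour on a made-up two-row frame, together with one possible way of handling it (dropping unlabelled rows is an assumption here, not a step the notebook takes):

```python
import pandas as pd

# Made-up frame: one row with metadata, one without.
df = pd.DataFrame({
    "title": ["Entre Dos Aguas", None],
    "artist_name": ["Paco De Lucia", None],
})

# Concatenating with a missing value yields a missing value.
df["song"] = df["title"] + " - " + df["artist_name"]

# Assumption: rows without a song label are of no use to a recommender,
# so one option is to drop them before any grouping.
labelled = df.dropna(subset=["song"])
print(labelled["song"].tolist())
```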


In [8]:
# keep only the first 10,000 rows for quick results
song_df = song_df.head(10000)

In [9]:
# number of listen records per song (a row count, not the sum of listen_count)
song_grouped = song_df.groupby(['song']).agg({'listen_count':'count'}).reset_index()
song_grouped.head()

Unnamed: 0,song,listen_count
0,(Anaesthesia) Pulling Teath - Metallica,1
1,A Dream - Cut Copy,4
2,A Kind Of Hope - Pilot Speed,1
3,Achy Breaky Heart - Billy Ray Cyrus,1
4,Addicted - Amy Winehouse,2
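
The aggregation above uses `'count'`, so `listen_count` in `song_grouped` is the number of records per song rather than the total number of plays. The sketch below contrasts the two on made-up data; the column names `n_records` and `total_plays` are illustrative. It also shows that `groupby` skips rows whose key is missing by default, so records without a `song` label do not contribute to any group.

```python
import pandas as pd

# Made-up listen records; the last row has no song label.
plays = pd.DataFrame({
    "song": ["A", "A", "B", None],
    "listen_count": [1, 5, 2, 3],
})

per_song = plays.groupby("song").agg(
    n_records=("listen_count", "count"),  # rows per song
    total_plays=("listen_count", "sum"),  # summed plays per song
).reset_index()

print(per_song)
```

Which of the two is the better popularity measure depends on whether one heavy listener should count as much as many occasional ones.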


In [10]:
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = (song_grouped['listen_count'] / grouped_sum ) * 100
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])

Unnamed: 0,song,listen_count,percentage
129,Nothin' On You [feat. Bruno Mars] (Album Versi...,16,3.579418
227,You've Got The Love - Florence + The Machine,10,2.237136
86,If I Ain't Got You - Alicia Keys,9,2.013423
198,Toxic - Britney Spears,9,2.013423
145,Rabbit Heart (Raise It Up) - Florence + The Ma...,8,1.789709
...,...,...,...
222,You Love Me (Album Version) - Devotchka,1,0.223714
224,You Wanted A Hit - LCD Soundsystem,1,0.223714
225,You Were Meant For Me (LP Version) - Jewel,1,0.223714
226,You'll Never Know (My Love) (Bovellian 07 Mix)...,1,0.223714


## Popularity Recommendation Engine

In [11]:
pr = Recommenders.popularity_recommender_py()

In [12]:
pr.create(song_df, 'user_id', 'song')

In [13]:
# display the top 10 popular songs
pr.recommend(song_df['user_id'][5])

Unnamed: 0,user_id,song,score,Rank
129,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Nothin' On You [feat. Bruno Mars] (Album Versi...,16,1.0
227,b80344d063b5ccb3212f76538f3d9e43d87dca9e,You've Got The Love - Florence + The Machine,10,2.0
86,b80344d063b5ccb3212f76538f3d9e43d87dca9e,If I Ain't Got You - Alicia Keys,9,3.0
198,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Toxic - Britney Spears,9,4.0
145,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Rabbit Heart (Raise It Up) - Florence + The Ma...,8,5.0
75,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Here Without You - 3 Doors Down,7,6.0
55,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Eenie Meenie - Sean Kingston and Justin Bieber,6,7.0
28,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Big Me - Foo Fighters,5,8.0
62,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Forgive Me - Leona Lewis,5,9.0
81,b80344d063b5ccb3212f76538f3d9e43d87dca9e,I Can't Stay - The Killers,5,10.0


In [14]:
pr.recommend(song_df['user_id'][100])

Unnamed: 0,user_id,song,score,Rank
129,e006b1a48f466bf59feefed32bec6494495a4436,Nothin' On You [feat. Bruno Mars] (Album Versi...,16,1.0
227,e006b1a48f466bf59feefed32bec6494495a4436,You've Got The Love - Florence + The Machine,10,2.0
86,e006b1a48f466bf59feefed32bec6494495a4436,If I Ain't Got You - Alicia Keys,9,3.0
198,e006b1a48f466bf59feefed32bec6494495a4436,Toxic - Britney Spears,9,4.0
145,e006b1a48f466bf59feefed32bec6494495a4436,Rabbit Heart (Raise It Up) - Florence + The Ma...,8,5.0
75,e006b1a48f466bf59feefed32bec6494495a4436,Here Without You - 3 Doors Down,7,6.0
55,e006b1a48f466bf59feefed32bec6494495a4436,Eenie Meenie - Sean Kingston and Justin Bieber,6,7.0
28,e006b1a48f466bf59feefed32bec6494495a4436,Big Me - Foo Fighters,5,8.0
62,e006b1a48f466bf59feefed32bec6494495a4436,Forgive Me - Leona Lewis,5,9.0
81,e006b1a48f466bf59feefed32bec6494495a4436,I Can't Stay - The Killers,5,10.0
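
Both users receive the same ten songs because a popularity recommender ignores the user's own history and only ranks items by how often they occur. The source of the `Recommenders` module is not shown here, so the following is only a sketch of how such a recommender might work; `popularity_recommend` and its parameters are hypothetical names, and the toy data is made up.

```python
import pandas as pd

def popularity_recommend(df, user_id, user_col="user_id", item_col="song", top_n=10):
    """Return the top_n most frequent items, labelled with the requesting user."""
    scores = (
        df.groupby(item_col)[user_col]
        .count()                      # number of listen records per item
        .reset_index(name="score")
        .sort_values(["score", item_col], ascending=[False, True])
    )
    # method="first" breaks ties by position, giving ranks 1, 2, 3, ...
    scores["Rank"] = scores["score"].rank(ascending=False, method="first")
    top = scores.head(top_n).copy()
    top.insert(0, user_col, user_id)  # the user only labels the output
    return top

# Made-up listen records.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "song": ["A", "A", "A", "B", "B", "C"],
})

for_u1 = popularity_recommend(toy, "u1", top_n=2)
for_u2 = popularity_recommend(toy, "u2", top_n=2)
print(for_u1)
```

In this sketch the `user_id` argument never influences the ranking, which matches the identical lists seen in cells 13 and 14.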


## Item Similarity Recommendation

In [15]:
ir = Recommenders.item_similarity_recommender_py()
ir.create(song_df, 'user_id', 'song')

In [16]:
user_items = ir.get_user_items(song_df['user_id'][5])

In [17]:
# display the user's listening history
for user_item in user_items:
    print(user_item)

nan
Entre Dos Aguas - Paco De Lucia
Love Song For No One - John Mayer


In [18]:
# recommend songs for that user
ir.recommend(song_df['user_id'][5])

No. of unique songs for the user: 3
no. of unique songs in the training set: 232
Non zero values in cooccurence_matrix :8


Unnamed: 0,user_id,song,score,rank
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,DONTTRUSTME (Explicit Album Version) - 3OH!3,0.111111,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Here Without You - 3 Doors Down,0.041667,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Rabbit Heart (Raise It Up) - Florence + The Ma...,0.037037,3
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Toxic - Britney Spears,0.033333,4
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Clenching The Fists Of Dissent (Explicit Album...,0.0,5
5,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Helpless - Crosby_ Stills_ Nash and Young,0.0,6
6,b80344d063b5ccb3212f76538f3d9e43d87dca9e,We've Only Just Begun - Carpenters,0.0,7
7,b80344d063b5ccb3212f76538f3d9e43d87dca9e,You Wanted A Hit - LCD Soundsystem,0.0,8
8,b80344d063b5ccb3212f76538f3d9e43d87dca9e,The Light - Common,0.0,9
9,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Wooden Ships (LP Version) - Crosby_ Stills & Nash,0.0,10
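
An item-similarity recommender scores each candidate song by how much its listeners overlap with the listeners of the songs the user already played. The module's code is not shown, so the sketch below is an assumption about the general technique rather than the module's implementation: it uses the Jaccard index between listener sets, averaged over the user's history. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def item_similarity_recommend(listens, user, top_n=10):
    """listens: iterable of (user, song) pairs. Returns [(song, score), ...]."""
    listeners = defaultdict(set)  # song -> users who played it
    history = defaultdict(set)    # user -> songs they played
    for u, s in listens:
        listeners[s].add(u)
        history[u].add(s)

    user_songs = history.get(user, set())
    if not user_songs:
        return []  # no history, nothing to compare against

    scores = {}
    for candidate, cand_users in listeners.items():
        if candidate in user_songs:
            continue  # do not recommend songs the user already played
        total = 0.0
        for s in user_songs:
            union = cand_users | listeners[s]
            total += len(cand_users & listeners[s]) / len(union)  # Jaccard index
        scores[candidate] = total / len(user_songs)  # average over the history

    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Made-up listen records.
listens = [("u1", "A"), ("u1", "B"), ("u2", "A"),
           ("u2", "C"), ("u3", "C"), ("u3", "D")]
recs = item_similarity_recommend(listens, "u1")
print(recs)
```

Candidates that share no listeners with the user's history get a score of 0.0, as in the lower rows of the table above.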


In [19]:
# find songs similar to the given songs
ir.get_similar_items(['Oliver James - Fleet Foxes', 'The End - Pearl Jam'])

no. of unique songs in the training set: 232
Non zero values in cooccurence_matrix :0


Unnamed: 0,user_id,song,score,rank
0,,Clenching The Fists Of Dissent (Explicit Album...,0.0,1
1,,Helpless - Crosby_ Stills_ Nash and Young,0.0,2
2,,We've Only Just Begun - Carpenters,0.0,3
3,,You Wanted A Hit - LCD Soundsystem,0.0,4
4,,The Light - Common,0.0,5
5,,Wooden Ships (LP Version) - Crosby_ Stills & Nash,0.0,6
6,,Arc Of Time (time Code) (Album Version) - Brig...,0.0,7
7,,Slow Death - Flamin' Groovies,0.0,8
8,,Lo Que Yo No Tengo - Son By 4,0.0,9
9,,Wonderwall - Ryan Adams,0.0,10
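
All scores are 0.0 and the co-occurrence matrix has no non-zero values, which suggests the queried songs share no listeners with any song in the 10,000-row sample, possibly because they do not appear in it at all. A minimal sketch of a check that could be run before querying; `split_known` and the small catalogue are hypothetical, and in the notebook the catalogue might be built from `song_df['song'].dropna()`.

```python
def split_known(query_songs, training_songs):
    """Partition query_songs into those present in and absent from the training set."""
    known = [s for s in query_songs if s in training_songs]
    unknown = [s for s in query_songs if s not in training_songs]
    return known, unknown

# Hypothetical training catalogue (made up for illustration).
training_songs = {"Toxic - Britney Spears", "The Light - Common"}

known, unknown = split_known(
    ["Oliver James - Fleet Foxes", "Toxic - Britney Spears"],
    training_songs,
)
print("known:", known)
print("unknown:", unknown)
```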
