# Dataset Information

The dataset contains two files: triplet_file and metadata_file. The triplet_file contains user_id, song_id and listen_count.
The metadata_file contains song_id, title, release, year and artist_name. The dataset is a mixture of songs from various websites, with the listen count recorded for each user and song serving as an implicit rating.


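Three types of recommendation system can be built on this dataset: content-based, collaborative and popularity-based. This notebook builds a popularity-based recommender and an item-similarity recommender.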

## Import modules

In [26]:
import pandas as pd
import numpy as np
import Recommenders  # module defining the recommender classes used below

## Load the dataset

In [27]:
song_df_1 = pd.read_csv('triplets_file.csv')
song_df_1.head()


Unnamed: 0,user_id,song_id,listen_count
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1


In [28]:
song_df_2 = pd.read_csv('song_data.csv')
song_df_2.head()

Unnamed: 0,song_id,title,release,artist_name,year
0,SOQMMHC12AB0180CB8,Silent Night,Monster Ballads X-Mas,Faster Pussy cat,2003
1,SOVFVAK12A8C1350D9,Tanssi vaan,Karkuteillä,Karkkiautomaatti,1995
2,SOGTUKN12AB017F4F1,No One Could Ever,Butter,Hudson Mohawke,2006
3,SOBNYVR12A8C13558C,Si Vos Querés,De Culo,Yerba Brava,2003
4,SOHSBXH12A8C13B0DF,Tangle Of Aspens,Rene Ablaze Presents Winter Sessions,Der Mystic,0


In [29]:
# merge listen counts with song metadata, keeping one metadata row per song_id
song_df = pd.merge(song_df_1,
                   song_df_2.drop_duplicates(['song_id']),
                   on='song_id', how='left')
song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999


In [30]:
print(len(song_df_1), len(song_df_2))

2000000 1000000


In [31]:
len(song_df)

2000000
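
The merged frame has the same 2,000,000 rows as the triplets file because duplicate song_id values are dropped from the metadata before the left join. The sketch below shows the effect on toy data; the two small frames and their values are made up for illustration and are not taken from the real files.

```python
import pandas as pd

# Toy stand-ins for the two files (illustrative values only).
plays = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["s1", "s2", "s1"],
    "listen_count": [1, 2, 5],
})
meta = pd.DataFrame({
    "song_id": ["s1", "s1", "s2"],  # s1 appears twice in the metadata
    "title": ["A", "A (remaster)", "B"],
})

# Without deduplication, every s1 play matches two metadata rows.
naive = pd.merge(plays, meta, on="song_id", how="left")

# Dropping duplicate song_ids first keeps exactly one row per play.
deduped = pd.merge(plays, meta.drop_duplicates(["song_id"]),
                   on="song_id", how="left")

print(len(plays), len(naive), len(deduped))
```

Comparing `len(deduped)` with `len(plays)` is a cheap check that the join did not multiply rows.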

## Data Preprocessing
### Combining multiple keywords into a feature

In [32]:
# a new feature combining title and artist name
song_df['song'] = song_df['title']+' - '+song_df['artist_name']
song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year,song
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0,The Cove - Jack Johnson
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976,Entre Dos Aguas - Paco De Lucia
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007,Stronger - Kanye West
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005,Constellations - Jack Johnson
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999,Learn To Fly - Foo Fighters


In [33]:
# testing out with 15,000 samples 
song_df = song_df.head(15000)

In [39]:
# number of listen records (rows) per song; 'count' counts rows, it does not sum listen_count
song_grouped = song_df.groupby(['song']).agg({'listen_count':'count'}).reset_index()
song_grouped.head()

Unnamed: 0,song,listen_count
0,#40 - DAVE MATTHEWS BAND,1
1,& Down - Boys Noize,4
2,' Cello Song - Nick Drake,1
3,'97 Bonnie & Clyde - Eminem,2
4,'Round Midnight - Miles Davis,3


In [40]:
# share of all listen records that each song accounts for, in percent
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = (song_grouped['listen_count'] / grouped_sum) * 100
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])

Unnamed: 0,song,listen_count,percentage
4577,Sehr kosmisch - Harmonia,62,0.413333
1312,Dog Days Are Over (Radio Edit) - Florence + Th...,53,0.353333
5844,Undo - Björk,48,0.320000
4570,Secrets - OneRepublic,46,0.306667
4334,Revelry - Kings Of Leon,44,0.293333
...,...,...,...
6400,aNYway - Armand Van Helden & A-TRAK Present Du...,1,0.006667
6402,high fives - Four Tet,1,0.006667
6403,in white rooms - Booka Shade,1,0.006667
6406,paranoid android - Christopher O'Riley,1,0.006667
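
Because the aggregation uses 'count', the listen_count column above holds the number of rows per song, not the total number of plays. This is consistent with the top entry, where 62 / 15000 * 100 gives 0.413333. The sketch below contrasts 'count' with 'sum' on a toy frame whose values are invented for illustration.

```python
import pandas as pd

# Toy listen records (illustrative values only).
toy = pd.DataFrame({
    "song": ["A", "A", "B"],
    "listen_count": [3, 1, 10],
})

# 'count' is the number of (user, song) rows; 'sum' is the total plays.
stats = toy.groupby("song")["listen_count"].agg(["count", "sum"]).reset_index()

# Percentage of all rows that each song accounts for.
stats["percentage"] = stats["count"] / stats["count"].sum() * 100
print(stats)
```

Song B has the most plays but the fewest rows, so the two aggregations can rank songs differently.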


## Popularity Recommendation Engine

In [41]:
pr = Recommenders.popularity_recommender_py()

In [42]:
pr.create(song_df, 'user_id', 'song')

### Displaying top 10 songs based on "popularity"

In [43]:
pr.recommend(song_df['user_id'][5])

Unnamed: 0,user_id,song,score,Rank
4577,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Sehr kosmisch - Harmonia,62,1.0
1312,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Dog Days Are Over (Radio Edit) - Florence + Th...,53,2.0
5844,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Undo - Björk,48,3.0
4570,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Secrets - OneRepublic,46,4.0
4334,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Revelry - Kings Of Leon,44,5.0
6363,b80344d063b5ccb3212f76538f3d9e43d87dca9e,You're The One - Dwight Yoakam,41,6.0
2299,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Horn Concerto No. 4 in E flat K495: II. Romanc...,38,7.0
5476,b80344d063b5ccb3212f76538f3d9e43d87dca9e,The Scientist - Coldplay,36,8.0
5884,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Use Somebody - Kings Of Leon,34,9.0
2227,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Hey_ Soul Sister - Train,33,10.0


In [44]:
# displaying top 10 recommended songs
pr.recommend(song_df['user_id'][100])

Unnamed: 0,user_id,song,score,Rank
4577,e006b1a48f466bf59feefed32bec6494495a4436,Sehr kosmisch - Harmonia,62,1.0
1312,e006b1a48f466bf59feefed32bec6494495a4436,Dog Days Are Over (Radio Edit) - Florence + Th...,53,2.0
5844,e006b1a48f466bf59feefed32bec6494495a4436,Undo - Björk,48,3.0
4570,e006b1a48f466bf59feefed32bec6494495a4436,Secrets - OneRepublic,46,4.0
4334,e006b1a48f466bf59feefed32bec6494495a4436,Revelry - Kings Of Leon,44,5.0
6363,e006b1a48f466bf59feefed32bec6494495a4436,You're The One - Dwight Yoakam,41,6.0
2299,e006b1a48f466bf59feefed32bec6494495a4436,Horn Concerto No. 4 in E flat K495: II. Romanc...,38,7.0
5476,e006b1a48f466bf59feefed32bec6494495a4436,The Scientist - Coldplay,36,8.0
5884,e006b1a48f466bf59feefed32bec6494495a4436,Use Somebody - Kings Of Leon,34,9.0
2227,e006b1a48f466bf59feefed32bec6494495a4436,Hey_ Soul Sister - Train,33,10.0
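
Both users receive the same ten songs because a popularity recommender ignores the individual user and ranks songs by how many records mention them. The Recommenders module is not shown here, so the following is only a minimal sketch of that idea; the function name `top_n_popular` and the toy data are hypothetical and not part of the module.

```python
import pandas as pd

def top_n_popular(df, user_col, item_col, n=10):
    """Rank items by the number of rows that mention them (hypothetical helper)."""
    scores = (
        df.groupby(item_col)[user_col]
        .count()
        .reset_index(name="score")
        .sort_values(["score", item_col], ascending=[False, True])
    )
    # Ties are broken by order of appearance after sorting.
    scores["rank"] = scores["score"].rank(ascending=False, method="first")
    return scores.head(n)

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "song": ["A", "A", "A", "B", "B", "C"],
})
top = top_n_popular(toy, "user_id", "song", n=2)
print(top)
```

The result does not depend on which user asks, which is why this approach serves mainly as a baseline or a fallback for users with no history.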


## Song Similarity Recommendation

In [46]:
ir = Recommenders.item_similarity_recommender_py()
ir.create(song_df, 'user_id', 'song')

In [47]:
user_items = ir.get_user_items(song_df['user_id'][5])

### Displaying the songs from user history

In [48]:
for user_item in user_items:
    print(user_item)

The Cove - Jack Johnson
Entre Dos Aguas - Paco De Lucia
Stronger - Kanye West
Constellations - Jack Johnson
Learn To Fly - Foo Fighters
Apuesta Por El Rock 'N' Roll - Héroes del Silencio
Paper Gangsta - Lady GaGa
Stacked Actors - Foo Fighters
Sehr kosmisch - Harmonia
Heaven's gonna burn your eyes - Thievery Corporation feat. Emiliana Torrini
Let It Be Sung - Jack Johnson / Matt Costa / Zach Gill / Dan Lebowitz / Steve Adams
I'll Be Missing You (Featuring Faith Evans & 112)(Album Version) - Puff Daddy
Love Shack - The B-52's
Clarity - John Mayer
I?'m A Steady Rollin? Man - Robert Johnson
The Old Saloon - The Lonely Island
Behind The Sea [Live In Chicago] - Panic At The Disco
Champion - Kanye West
Breakout - Foo Fighters
Ragged Wood - Fleet Foxes
Mykonos - Fleet Foxes
Country Road - Jack Johnson / Paula Fuga
Oh No - Andrew Bird
Love Song For No One - John Mayer
Jewels And Gold - Angus & Julia Stone
83 - John Mayer
Neon - John Mayer
The Middle - Jimmy Eat World
High and dry - Jorge Drexle

In [49]:
ir.recommend(song_df['user_id'][5])

No. of unique songs for the user: 45
no. of unique songs in the training set: 6414
Non zero values in cooccurence_matrix :9186


Unnamed: 0,user_id,song,score,rank
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Oil And Water - Incubus,0.041464,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,The End - Pearl Jam,0.037225,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Oliver James - Fleet Foxes,0.031305,3
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Quiet Houses - Fleet Foxes,0.031305,4
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Your Protector - Fleet Foxes,0.031305,5
5,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Tiger Mountain Peasant Song - Fleet Foxes,0.031305,6
6,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Sun It Rises - Fleet Foxes,0.031305,7
7,b80344d063b5ccb3212f76538f3d9e43d87dca9e,White Winter Hymnal - Fleet Foxes,0.028909,8
8,b80344d063b5ccb3212f76538f3d9e43d87dca9e,St. Elsewhere - Dave Grusin,0.028877,9
9,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Meadowlarks - Fleet Foxes,0.028877,10


In [51]:
# songs similar to the given songs, based on co-occurrence in user listening histories
ir.get_similar_items(['Oliver James - Fleet Foxes', 'The End - Pearl Jam'])

no. of unique songs in the training set: 6414
Non zero values in cooccurence_matrix :194


Unnamed: 0,user_id,song,score,rank
0,,Quiet Houses - Fleet Foxes,0.666667,1
1,,Your Protector - Fleet Foxes,0.666667,2
2,,Tiger Mountain Peasant Song - Fleet Foxes,0.666667,3
3,,Sun It Rises - Fleet Foxes,0.666667,4
4,,St. Elsewhere - Dave Grusin,0.5,5
5,,Meadowlarks - Fleet Foxes,0.5,6
6,,Id Die Without You - P.M. Dawn,0.5,7
7,,Meet Virginia - Train,0.5,8
8,,Heard Them Stirring - Fleet Foxes,0.5,9
9,,He Doesn't Know Why - Fleet Foxes,0.5,10
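
The outputs above mention a co-occurrence matrix, which suggests that two songs count as similar when the same users listened to both. The Recommenders module is not shown here, so the sketch below is an assumption about the general technique rather than the module's actual code: it scores candidates by the Jaccard index of their listener sets, averaged over the seed songs. The names `jaccard` and `similar_songs` and the toy data are hypothetical.

```python
import pandas as pd

def jaccard(users_a, users_b):
    """Size of the intersection over size of the union of two user sets."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def similar_songs(df, seed_songs, user_col="user_id", item_col="song", n=10):
    """Score every non-seed song by its mean Jaccard similarity to the seeds."""
    listeners = df.groupby(item_col)[user_col].apply(set)
    seeds = [s for s in seed_songs if s in listeners.index]
    if not seeds:
        return pd.DataFrame(columns=[item_col, "score"])
    rows = []
    for song, users in listeners.items():
        if song in seeds:
            continue  # never recommend the seed songs themselves
        score = sum(jaccard(users, listeners[s]) for s in seeds) / len(seeds)
        rows.append((song, score))
    out = pd.DataFrame(rows, columns=[item_col, "score"])
    return out.sort_values("score", ascending=False).head(n).reset_index(drop=True)

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "song": ["A", "A", "A", "B", "B", "C"],
})
result = similar_songs(toy, ["A"])
print(result)
```

Under the same assumption, a personalised recommendation would use the songs in a user's history as the seeds.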
