
## Dataset Information

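The Million Song Dataset used here comes as two files: a triplet_file and a metadata_file (loaded below as triplets_file.csv and song_data.csv). The triplet_file contains user_id, song_id and listen_count, which is the number of times the user played the song. The metadata_file contains song_id, title, release, artist_name and year. The dataset pairs songs from various sources with users' listening history. It holds no explicit ratings, so the listen count serves as implicit feedback on how much a user likes a song.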

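There are three common types of recommendation systems: popularity-based, content-based and collaborative filtering. This notebook builds a popularity-based recommender and an item-similarity recommender, which is a form of item-based collaborative filtering.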

## Import modules

In [1]:
import pandas as pd
import numpy as np
# Recommenders.py: local helper module that provides popularity_recommender_py and item_similarity_recommender_py
import Recommenders

## Loading the dataset

In [2]:
song_df_1 = pd.read_csv('triplets_file.csv')
song_df_1.head()

| | user_id | song_id | listen_count |
|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1.0 |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2.0 |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1.0 |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1.0 |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1.0 |
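Because the triplets carry play counts rather than ratings, a few quick statistics help characterise the data before building recommenders. The sketch below uses a small hand-made DataFrame with the same columns as triplets_file.csv (the values and the `summarize_triplets` helper are hypothetical); on the real data you would pass `song_df_1` instead.

```python
import pandas as pd

def summarize_triplets(df: pd.DataFrame) -> dict:
    """Basic statistics for a user_id / song_id / listen_count table (hypothetical helper)."""
    return {
        "rows": len(df),
        "unique_users": df["user_id"].nunique(),
        "unique_songs": df["song_id"].nunique(),
        # implicit feedback: every row should record at least one play
        "min_listen_count": df["listen_count"].min(),
        "max_listen_count": df["listen_count"].max(),
    }

# toy triplets (made-up values, same schema as triplets_file.csv)
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "song_id": ["S1", "S2", "S1", "S3"],
    "listen_count": [1.0, 2.0, 5.0, 1.0],
})
stats = summarize_triplets(toy)
print(stats)
```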


In [3]:
song_df_2 = pd.read_csv('song_data.csv')
song_df_2.head()

| | song_id | title | release | artist_name | year |
|---|---|---|---|---|---|
| 0 | SOQMMHC12AB0180CB8 | Silent Night | Monster Ballads X-Mas | Faster Pussy cat | 2003.0 |
| 1 | SOVFVAK12A8C1350D9 | Tanssi vaan | Karkuteillä | Karkkiautomaatti | 1995.0 |
| 2 | SOGTUKN12AB017F4F1 | No One Could Ever | Butter | Hudson Mohawke | 2006.0 |
| 3 | SOBNYVR12A8C13558C | Si Vos Querés | De Culo | Yerba Brava | 2003.0 |
| 4 | SOHSBXH12A8C13B0DF | Tangle Of Aspens | Rene Ablaze Presents Winter Sessions | Der Mystic | 0.0 |


In [4]:
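# merge the listening triplets with the song metadata; the left join keeps every triplet,
# and dropping duplicate song_ids in the metadata prevents triplet rows from being duplicated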
song_df = pd.merge(song_df_1, song_df_2.drop_duplicates(['song_id']), on='song_id', how='left')
song_df.head()

| | user_id | song_id | listen_count | title | release | artist_name | year |
|---|---|---|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1.0 | NaN | NaN | NaN | NaN |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2.0 | Entre Dos Aguas | Flamenco Para Niños | Paco De Lucia | 1976.0 |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1.0 | NaN | NaN | NaN | NaN |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1.0 | NaN | NaN | NaN | NaN |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1.0 | NaN | NaN | NaN | NaN |


In [5]:
print(len(song_df_1), len(song_df_2))

16630 26791


In [6]:
len(song_df)

16630
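The merged frame has the same number of rows as the triplet file (16630) because the metadata was deduplicated on song_id before the left join, so each triplet matches at most one metadata row. The merged head also shows many rows with no metadata: those songs are absent from song_data.csv. The toy sketch below makes both facts explicit with two documented `pd.merge` options: `validate="many_to_one"` raises an error if the right side still has duplicate keys, and `indicator=True` adds a `_merge` column recording whether each row found a match. The toy frames are made up for illustration.

```python
import pandas as pd

triplets = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["S1", "S2", "S1"],
    "listen_count": [1.0, 2.0, 3.0],
})
# S1 appears twice in the metadata, S2 not at all
metadata = pd.DataFrame({
    "song_id": ["S1", "S1", "S3"],
    "title": ["Song One", "Song One", "Song Three"],
    "artist_name": ["Artist A", "Artist A", "Artist C"],
})

merged = pd.merge(
    triplets,
    metadata.drop_duplicates(["song_id"]),
    on="song_id",
    how="left",
    validate="many_to_one",  # fail loudly if song_id is not unique on the right
    indicator=True,          # adds '_merge' with 'both' or 'left_only'
)
# the left join keeps exactly one row per triplet
assert len(merged) == len(triplets)
# fraction of triplets that found song metadata
coverage = (merged["_merge"] == "both").mean()
print(f"{coverage:.2%} of triplets have metadata")
```

On the real data, the same `coverage` figure tells you how many listening records can actually be labelled with a song title.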

## Data Preprocessing

In [7]:
# new feature "title - artist_name"; it is NaN whenever the title or artist name is missing
song_df['song'] = song_df['title']+' - '+song_df['artist_name']
song_df.head()

| | user_id | song_id | listen_count | title | release | artist_name | year | song |
|---|---|---|---|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2.0 | Entre Dos Aguas | Flamenco Para Niños | Paco De Lucia | 1976.0 | Entre Dos Aguas - Paco De Lucia |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1.0 | NaN | NaN | NaN | NaN | NaN |
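Concatenating strings with `+` propagates missing values: when `title` or `artist_name` is NaN, `song` is NaN as well, as in rows 0, 2, 3 and 4 above. By default `groupby` drops NaN keys (`dropna=True`), so those rows silently vanish from any per-song aggregation. A small illustration on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "title": ["Entre Dos Aguas", np.nan, "Toxic"],
    "artist_name": ["Paco De Lucia", np.nan, "Britney Spears"],
    "listen_count": [2.0, 1.0, 1.0],
})
# NaN in either part makes the combined label NaN
df["song"] = df["title"] + " - " + df["artist_name"]
print(df["song"].isna().sum())  # rows without a song label

# groupby skips the NaN key by default...
per_song = df.groupby("song")["listen_count"].count()
# ...unless dropna=False is passed (available since pandas 1.1)
per_song_all = df.groupby("song", dropna=False)["listen_count"].count()
print(per_song)
print(per_song_all)
```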


In [8]:
# keep only the first 10,000 rows to get quick results
song_df = song_df.head(10000)
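`head(10000)` keeps whichever rows come first in the file. The first triplets shown earlier all belong to one user, which suggests the file is ordered by user, so this cut likely keeps complete histories for the first users and drops everyone else. If you want a subset that depends less on file order while still keeping each selected user's full history, one option (an alternative, not what this notebook does) is to sample user ids first. The `sample_users` helper and its `n_users` and `seed` parameters are hypothetical.

```python
import pandas as pd

def sample_users(df: pd.DataFrame, n_users: int, seed: int = 0) -> pd.DataFrame:
    """Keep every row of n_users randomly chosen users (hypothetical helper)."""
    users = df["user_id"].drop_duplicates().sample(n=n_users, random_state=seed)
    return df[df["user_id"].isin(users)]

# toy listening records (made-up values)
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "song": ["A", "B", "A", "C", "A", "B"],
})
subset = sample_users(toy, n_users=2, seed=42)
print(subset)
```

Sampling whole users rather than individual rows keeps per-user histories intact, which matters for the item-similarity recommender, since it relies on songs that the same user listened to.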

In [9]:
# number of records (user-song rows) per song: 'count' counts rows, it does not sum listen_count
song_grouped = song_df.groupby(['song']).agg({'listen_count':'count'}).reset_index()
song_grouped.head()

| | song | listen_count |
|---|---|---|
| 0 | (Anaesthesia) Pulling Teath - Metallica | 1 |
| 1 | A Dream - Cut Copy | 4 |
| 2 | A Kind Of Hope - Pilot Speed | 1 |
| 3 | Addicted - Amy Winehouse | 2 |
| 4 | All Men Are Liars - Nick Lowe | 3 |
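Because `agg({'listen_count': 'count'})` counts rows, the resulting `listen_count` column is the number of user-song records per song, not the total number of plays. It equals the number of distinct listeners only if each (user, song) pair appears once in the data. The toy example below (made-up values) contrasts the three aggregates using pandas named aggregation:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1"],
    "song": ["Toxic", "Toxic", "Toxic", "One"],
    "listen_count": [5.0, 1.0, 2.0, 10.0],
})
stats = df.groupby("song").agg(
    records=("listen_count", "count"),    # what the notebook computes
    total_plays=("listen_count", "sum"),  # total number of plays
    listeners=("user_id", "nunique"),     # distinct users per song
).reset_index()
print(stats)
```

Ranking by `records` puts Toxic first, while ranking by `total_plays` puts One first, so the choice of aggregate changes what "popular" means.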


In [10]:
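grouped_sum = song_grouped['listen_count'].sum()
# share of all counted records that belong to each song
song_grouped['percentage'] = (song_grouped['listen_count'] / grouped_sum) * 100
# most-listened songs first; ties broken alphabetically by song name
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])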

| | song | listen_count | percentage |
|---|---|---|---|
| 89 | Nothin' On You [feat. Bruno Mars] (Album Versi... | 16 | 5.245902 |
| 157 | You've Got The Love - Florence + The Machine | 10 | 3.278689 |
| 137 | Toxic - Britney Spears | 9 | 2.950820 |
| 39 | Eenie Meenie - Sean Kingston and Justin Bieber | 6 | 1.967213 |
| 20 | Big Me - Foo Fighters | 5 | 1.639344 |
| ... | ... | ... | ... |
| 151 | Wooden Ships (LP Version) - Crosby_ Stills & Nash | 1 | 0.327869 |
| 152 | Work It Out - Beyoncé | 1 | 0.327869 |
| 154 | You Love Me (Album Version) - Devotchka | 1 | 0.327869 |
| 156 | You Wanted A Hit - LCD Soundsystem | 1 | 0.327869 |
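The `percentage` column is each song's share of all counted records. Writing $c_s$ for the `listen_count` value of song $s$:

$$
p_s = 100 \cdot \frac{c_s}{\sum_{s'} c_{s'}}
$$

For the top song, $c_s = 16$ and $p_s \approx 5.245902$, which implies $\sum_{s'} c_{s'} = 16 / 0.05245902 \approx 305$ (check: $100 \cdot 1/305 \approx 0.327869$, matching the songs with a single record). So only about 305 of the 10,000 retained rows carry a song label; the rows whose `song` is NaN were dropped by the `groupby`.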


## Popularity Recommendation Engine

In [11]:
pr = Recommenders.popularity_recommender_py()

In [12]:
pr.create(song_df, 'user_id', 'song')

In [13]:
# display the top 10 popular songs
pr.recommend(song_df['user_id'][5])

| | user_id | song | score | Rank |
|---|---|---|---|---|
| 89 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Nothin' On You [feat. Bruno Mars] (Album Versi... | 16 | 1.0 |
| 157 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | You've Got The Love - Florence + The Machine | 10 | 2.0 |
| 137 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Toxic - Britney Spears | 9 | 3.0 |
| 39 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Eenie Meenie - Sean Kingston and Justin Bieber | 6 | 4.0 |
| 20 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Big Me - Foo Fighters | 5 | 5.0 |
| 73 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Marshall Examines His Carcass - Octopus Project | 5 | 6.0 |
| 94 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | One - Metallica | 5 | 7.0 |
| 115 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Sincerité Et Jalousie - Alliance Ethnik | 5 | 8.0 |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | A Dream - Cut Copy | 4 | 9.0 |
| 65 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Kiss (LP Version) - Prince & The Revolution | 4 | 10.0 |


In [14]:
pr.recommend(song_df['user_id'][100])

| | user_id | song | score | Rank |
|---|---|---|---|---|
| 89 | e006b1a48f466bf59feefed32bec6494495a4436 | Nothin' On You [feat. Bruno Mars] (Album Versi... | 16 | 1.0 |
| 157 | e006b1a48f466bf59feefed32bec6494495a4436 | You've Got The Love - Florence + The Machine | 10 | 2.0 |
| 137 | e006b1a48f466bf59feefed32bec6494495a4436 | Toxic - Britney Spears | 9 | 3.0 |
| 39 | e006b1a48f466bf59feefed32bec6494495a4436 | Eenie Meenie - Sean Kingston and Justin Bieber | 6 | 4.0 |
| 20 | e006b1a48f466bf59feefed32bec6494495a4436 | Big Me - Foo Fighters | 5 | 5.0 |
| 73 | e006b1a48f466bf59feefed32bec6494495a4436 | Marshall Examines His Carcass - Octopus Project | 5 | 6.0 |
| 94 | e006b1a48f466bf59feefed32bec6494495a4436 | One - Metallica | 5 | 7.0 |
| 115 | e006b1a48f466bf59feefed32bec6494495a4436 | Sincerité Et Jalousie - Alliance Ethnik | 5 | 8.0 |
| 1 | e006b1a48f466bf59feefed32bec6494495a4436 | A Dream - Cut Copy | 4 | 9.0 |
| 65 | e006b1a48f466bf59feefed32bec6494495a4436 | Kiss (LP Version) - Prince & The Revolution | 4 | 10.0 |
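Both users receive the same top-10 list: a popularity recommender ignores who is asking. The source of `Recommenders.popularity_recommender_py` is not shown here, so the following is only a minimal pandas sketch of the same idea. It assumes the recommender scores each song by its number of records (the scores above match the per-song counts computed earlier), sorts by score with ties broken by song name (consistent with the order above), and returns the top N. The function name and the `top_n` parameter are hypothetical.

```python
import pandas as pd

def popularity_recommend(df: pd.DataFrame, user_id: str,
                         user_col: str = "user_id", item_col: str = "song",
                         top_n: int = 10) -> pd.DataFrame:
    """Hypothetical popularity recommender: the same top-N songs for every user."""
    scores = (
        df.groupby(item_col)[user_col].count()  # records per song
        .rename("score")
        .reset_index()
        .sort_values(["score", item_col], ascending=[False, True])
        .head(top_n)
        .reset_index(drop=True)
    )
    scores["Rank"] = range(1, len(scores) + 1)
    # tag the list with the requesting user; the songs do not depend on it
    scores.insert(0, user_col, user_id)
    return scores

# toy listening records (made-up values)
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "song": ["Toxic", "Toxic", "Toxic", "One", "One", "Big Me"],
})
r1 = popularity_recommend(toy, "u1", top_n=2)
r3 = popularity_recommend(toy, "u3", top_n=2)
print(r1)
print(r3)
```

This makes the main weakness of the approach visible: the list is useful as a cold-start fallback for new users, but it is not personalised.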


## Item Similarity Recommendation

In [15]:
ir = Recommenders.item_similarity_recommender_py()
ir.create(song_df, 'user_id', 'song')

In [16]:
user_items = ir.get_user_items(song_df['user_id'][5])

In [17]:
# display the user's listening history
for user_item in user_items:
    print(user_item)

nan
Entre Dos Aguas - Paco De Lucia
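The `nan` entry comes from a record whose song had no metadata, so its `song` label is missing; the recommender still counts it as one of the user's two songs. One way to avoid this (not applied in this notebook, so the outputs that follow still include it) is to drop unlabelled rows before fitting, for example with `song_df.dropna(subset=['song'])`. On made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song": [np.nan, "Entre Dos Aguas - Paco De Lucia", "Toxic - Britney Spears"],
})
# listening history as a list of unique songs, NaN included
history_raw = list(df.loc[df["user_id"] == "u1", "song"].unique())
# the same history after removing rows without a song label
clean = df.dropna(subset=["song"])
history_clean = list(clean.loc[clean["user_id"] == "u1", "song"].unique())
print(history_raw)
print(history_clean)
```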


In [18]:
# recommend songs for the same user
ir.recommend(song_df['user_id'][5])

No. of unique songs for the user: 2
no. of unique songs in the training set: 161
Non zero values in cooccurence_matrix :2


| | user_id | song | score | rank |
|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Toxic - Britney Spears | 0.05 | 1 |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Clenching The Fists Of Dissent (Explicit Album... | 0.0 | 2 |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Helpless - Crosby_ Stills_ Nash and Young | 0.0 | 3 |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | We've Only Just Begun - Carpenters | 0.0 | 4 |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | You Wanted A Hit - LCD Soundsystem | 0.0 | 5 |
| 5 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | The Light - Common | 0.0 | 6 |
| 6 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Wooden Ships (LP Version) - Crosby_ Stills & Nash | 0.0 | 7 |
| 7 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Arc Of Time (time Code) (Album Version) - Brig... | 0.0 | 8 |
| 8 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Wonderwall - Ryan Adams | 0.0 | 9 |
| 9 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Don't Upset The Rhythm (Go Baby Go) - Noisettes | 0.0 | 10 |
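Item-similarity scores come from listener overlap: two songs count as similar when many of the same users played both. The module's source is not shown here, so the sketch below assumes a common scheme: the Jaccard similarity between the listener sets of two songs, $|U_i \cap U_j| / |U_i \cup U_j|$, averaged over the songs in the user's history. The function name and toy data are hypothetical. Under this scheme, a user with only two songs in their history gets a score of 0.0 for every candidate that shares no listeners with those songs, and ties among zero scores carry no information. This fits the output above, where only one song scores above zero.

```python
import pandas as pd

def item_similarity_recommend(df: pd.DataFrame, user_id: str,
                              user_col: str = "user_id", item_col: str = "song",
                              top_n: int = 10) -> pd.DataFrame:
    """Hypothetical item-based recommender using Jaccard similarity of listener sets."""
    # rows without a song label cannot be compared, so drop them first
    df = df.dropna(subset=[item_col])
    # song -> set of users who listened to it
    listeners = {song: set(users) for song, users in df.groupby(item_col)[user_col]}
    user_songs = set(df.loc[df[user_col] == user_id, item_col])
    rows = []
    for candidate, cand_users in listeners.items():
        if candidate in user_songs:
            continue  # skip songs already in the user's history
        # Jaccard similarity between the candidate and each song the user played
        sims = [len(listeners[s] & cand_users) / len(listeners[s] | cand_users)
                for s in user_songs]
        # average over the user's history (0.0 for an unknown user)
        score = sum(sims) / len(sims) if sims else 0.0
        rows.append((candidate, score))
    out = pd.DataFrame(rows, columns=[item_col, "score"])
    out = (out.sort_values(["score", item_col], ascending=[False, True])
              .head(top_n)
              .reset_index(drop=True))
    out["rank"] = range(1, len(out) + 1)
    return out

# toy listening records (made-up values)
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3", "u3"],
    "song":    ["A",  "B",  "A",  "C",  "C",  "D"],
})
rec = item_similarity_recommend(toy, "u1")
print(rec)
```

Here user u1 played A and B. Song C shares listener u2 with A (Jaccard 1/3) and nobody with B (0), so its averaged score is 1/6. Song D shares no listeners with either song and scores 0.0.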


In [19]:
# find songs similar to the given ones, based on shared listeners (not on words in the titles)
ir.get_similar_items(['Oliver James - Fleet Foxes', 'The End - Pearl Jam'])

no. of unique songs in the training set: 161
Non zero values in cooccurence_matrix :0


| | user_id | song | score | rank |
|---|---|---|---|---|
| 0 | | Clenching The Fists Of Dissent (Explicit Album... | 0.0 | 1 |
| 1 | | Helpless - Crosby_ Stills_ Nash and Young | 0.0 | 2 |
| 2 | | We've Only Just Begun - Carpenters | 0.0 | 3 |
| 3 | | You Wanted A Hit - LCD Soundsystem | 0.0 | 4 |
| 4 | | The Light - Common | 0.0 | 5 |
| 5 | | Wooden Ships (LP Version) - Crosby_ Stills & Nash | 0.0 | 6 |
| 6 | | Arc Of Time (time Code) (Album Version) - Brig... | 0.0 | 7 |
| 7 | | Wonderwall - Ryan Adams | 0.0 | 8 |
| 8 | | Don't Upset The Rhythm (Go Baby Go) - Noisettes | 0.0 | 9 |
| 9 | | No One - Cold | 0.0 | 10 |
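The log reports no non-zero values in the co-occurrence matrix, so every score is 0.0 and the order of these ten songs carries no information. One likely cause is that the query songs do not occur in the 10,000-row subset, or share no listeners with any other song in it. A quick check before calling `get_similar_items`, sketched with a hypothetical `missing_items` helper on made-up data:

```python
import pandas as pd

def missing_items(df: pd.DataFrame, items, item_col: str = "song") -> list:
    """Return the requested items that never occur in df[item_col] (hypothetical helper)."""
    known = set(df[item_col].dropna())
    return [item for item in items if item not in known]

toy = pd.DataFrame({"song": ["Toxic - Britney Spears", None, "One - Metallica"]})
missing = missing_items(toy, ["One - Metallica", "The End - Pearl Jam"])
print(missing)
```

On the real data the call would be `missing_items(song_df, ['Oliver James - Fleet Foxes', 'The End - Pearl Jam'])`. A non-empty result means those songs cannot co-occur with anything in the training set.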
