# Recommender System: Python, HetRec 2011 Last.FM Dataset
## Simple Recommender based on artist popularity

Reference: I. Cantador, P. Brusilovsky, and T. Kuflik. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM Conference on Recommender Systems (RecSys 2011).<br>
https://grouplens.org/datasets/hetrec-2011/

In [1]:
import pandas as pd
import numpy as np

In [2]:
artists = pd.read_csv('artists.dat',
                      delimiter='\t', low_memory=False)

In [3]:
artists.head(3)

,id,name,url,pictureURL
0,1,MALICE MIZER,http://www.last.fm/music/MALICE+MIZER,http://userserve-ak.last.fm/serve/252/10808.jpg
1,2,Diary of Dreams,http://www.last.fm/music/Diary+of+Dreams,http://userserve-ak.last.fm/serve/252/3052066.jpg
2,3,Carpathian Forest,http://www.last.fm/music/Carpathian+Forest,http://userserve-ak.last.fm/serve/252/40222717...


In [44]:
artists = artists.rename(columns = {'id':'artistID'})
artists = artists.set_index('artistID')
artists.head(3)

artistID,name,url,pictureURL
1,MALICE MIZER,http://www.last.fm/music/MALICE+MIZER,http://userserve-ak.last.fm/serve/252/10808.jpg
2,Diary of Dreams,http://www.last.fm/music/Diary+of+Dreams,http://userserve-ak.last.fm/serve/252/3052066.jpg
3,Carpathian Forest,http://www.last.fm/music/Carpathian+Forest,http://userserve-ak.last.fm/serve/252/40222717...


In [4]:
user_artists = pd.read_csv('user_artists.dat',
                      delimiter='\t', low_memory=False)

In [5]:
user_artists.head()
#weight corresponds to listening count

,userID,artistID,weight
0,2,51,13883
1,2,52,11690
2,2,53,11351
3,2,54,10300
4,2,55,8983


In [70]:
tags = pd.read_csv('tags.dat', delimiter='\t', low_memory=False, encoding='latin-1')

In [71]:
tags.head(3)

,tagID,tagValue
0,1,metal
1,2,alternative metal
2,3,goth rock


In [82]:
tags = tags.set_index('tagID')
tags.head(3)

tagID,tagValue
1,metal
2,alternative metal
3,goth rock


In [9]:
user_taggedartists = pd.read_csv('user_taggedartists.dat',
                                 delimiter='\t', low_memory=False)

In [79]:
user_taggedartists.head(3)

,userID,artistID,tagID,day,month,year
0,2,52,13,1,4,2009
1,2,52,15,1,4,2009
2,2,52,18,1,4,2009


In [80]:
user_taggedartists.dtypes

userID      int64
artistID    int64
tagID       int64
day         int64
month       int64
year        int64
dtype: object

In [81]:
art_user_taggedartists = user_taggedartists.set_index('artistID')
art_user_taggedartists.head(3)

artistID,userID,tagID,day,month,year
52,2,13,1,4,2009
52,2,15,1,4,2009
52,2,18,1,4,2009


### Weighting Formula Used
$$\mathrm{WP} = \frac{u}{u+m}\,R + \frac{m}{u+m}\,A$$
where<br>
u is the number of users who listened to a particular artist,<br>
m is the minimum number of listeners an artist needs to be included in the ranking (set below to the 90th percentile of listener counts),<br>
R is the total number of times the artist was listened to (the sum of `weight` over that artist's listeners), and<br>
A is the mean of R across all artists.
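
As a quick sanity check of the formula, the sketch below plugs in the values this notebook reports further down for artistID 2 (u = 12 listeners, summed `weight` R = 8012, m = 8.0, A ≈ 3923.77). The function name `weighted_popularity` is illustrative only and is not used elsewhere in the notebook.

```python
def weighted_popularity(u, R, m, A):
    """Blend an artist's own listening count R with the global mean A.

    The share given to R grows with the number of listeners u; m sets how
    many listeners are needed before R outweighs A.
    """
    return (u / (u + m)) * R + (m / (u + m)) * A

# Values taken from the outputs shown later in this notebook (artistID 2)
wp = weighted_popularity(u=12, R=8012, m=8.0, A=3923.773536751361)
print(round(wp, 6))
```

With 12 listeners against m = 8, the artist's own count gets a share of 12/20 = 0.6 and the global mean gets the remaining 0.4, so artists with few listeners are pulled toward A.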

In [59]:
# calculate A: first sum the listening counts per artist (this is R)
art_sumweight = user_artists.groupby('artistID').sum()
art_sumweight = art_sumweight.drop(['userID'], axis=1)
art_sumweight.head(5)

artistID,weight
1,771
2,8012
3,775
4,563
5,913


In [60]:
A = art_sumweight['weight'].mean()
A

3923.773536751361
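
The same two steps (per-artist totals, then their mean) can be written by selecting the `weight` column before aggregating, which avoids summing `userID` and dropping it afterwards. The sketch below uses a small made-up frame, `toy`, with the same column names as `user_artists`; the values are hypothetical.

```python
import pandas as pd

# Hypothetical listening records with the same columns as user_artists
toy = pd.DataFrame({
    'userID':   [1, 2, 3, 1],
    'artistID': [10, 10, 20, 20],
    'weight':   [5, 7, 1, 3],
})

# Total listening count per artist (R in the weighting formula)
total_per_artist = toy.groupby('artistID')['weight'].sum()

# Mean of those totals across artists (A in the weighting formula)
A_toy = total_per_artist.mean()
print(total_per_artist.to_dict(), A_toy)
```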

In [61]:
# calculate m: first count the listeners per artist (this is u)
numuser_art = user_artists.groupby('artistID').count()
numuser_art.head(5)

artistID,userID,weight
1,3,3
2,12,12
3,3,3
4,2,2
5,2,2


In [62]:
m = numuser_art['userID'].quantile(0.90)
m

8.0
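
To illustrate how a quantile-based cutoff behaves, here is a minimal sketch on made-up listener counts (the `listeners` series is hypothetical, not taken from the dataset). `Series.quantile` interpolates linearly between neighbouring sorted values by default, so the threshold need not be one of the observed counts.

```python
import pandas as pd

# Hypothetical listener counts per artist
listeners = pd.Series([1, 1, 2, 2, 3, 3, 4, 5, 8, 40],
                      index=pd.Index(range(1, 11), name='artistID'))

# 90th percentile of the listener counts
threshold = listeners.quantile(0.90)

# Keep only artists with at least `threshold` listeners
kept = listeners[listeners >= threshold]
print(threshold, list(kept.index))
```

Raising the quantile keeps fewer, more widely heard artists; lowering it admits artists whose totals rest on only a handful of listeners.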

In [63]:
# keep only artists with at least m listeners
art_mostusers = numuser_art.loc[numuser_art['userID'] >= m].copy()
art_mostusers = art_mostusers.drop(['weight'], axis=1)
art_mostusers.head(3)

artistID,userID
2,12
6,10
7,133


In [64]:
# combine art_sumweight with art_mostusers
art_mostusers_sumweight = pd.merge(art_sumweight, art_mostusers,
                                  how='inner', left_index=True,
                                   right_index=True)
art_mostusers_sumweight.head(3)

artistID,weight,userID
2,8012,12
6,5080,10
7,96201,133


In [65]:
# compute the weighted popularity of one artist (one row of the DataFrame)
def weighted_pop(row, m=m, A=A):
    u = row['userID']  # number of listeners
    R = row['weight']  # total listening count
    return (u/(u+m)*R)+(m/(u+m)*A)

In [66]:
art_mostusers_sumweight['popularity'] = art_mostusers_sumweight.apply(weighted_pop,
                                                                      axis=1)

In [67]:
art_mostusers_sumweight.head(3)

artistID,weight,userID,popularity
2,8012,12,6376.709415
6,5080,10,4566.121572
7,96201,133,90965.412683
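
Because the formula only uses column arithmetic, it can also be evaluated on whole columns at once instead of row by row with `apply`. The sketch below rebuilds the three rows shown above by hand (the `sample` frame is for illustration only) and reuses the `m` and `A` values printed earlier; it should reproduce the `popularity` column above up to rounding.

```python
import pandas as pd

m, A = 8.0, 3923.773536751361  # values computed earlier in this notebook

# The three rows displayed above, rebuilt by hand for illustration
sample = pd.DataFrame({'weight': [8012, 5080, 96201],
                       'userID': [12, 10, 133]},
                      index=pd.Index([2, 6, 7], name='artistID'))

u, R = sample['userID'], sample['weight']

# Same weighting formula, applied to entire columns at once
sample['popularity'] = (u / (u + m)) * R + (m / (u + m)) * A
print(sample)
```

On a frame of this size the difference is negligible, but column-wise arithmetic generally scales better than a Python-level function called once per row.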


In [68]:
# combine art_mostusers_sumweight with artists
popular_artists = pd.merge(artists, art_mostusers_sumweight,
                                  how='inner', left_index=True,
                                   right_index=True)
popular_artists.head(3)

artistID,name,url,pictureURL,weight,userID,popularity
2,Diary of Dreams,http://www.last.fm/music/Diary+of+Dreams,http://userserve-ak.last.fm/serve/252/3052066.jpg,8012,12,6376.709415
6,Moonspell,http://www.last.fm/music/Moonspell,http://userserve-ak.last.fm/serve/252/2181591.jpg,5080,10,4566.121572
7,Marilyn Manson,http://www.last.fm/music/Marilyn+Manson,http://userserve-ak.last.fm/serve/252/2558217.jpg,96201,133,90965.412683


In [69]:
popular_artists = popular_artists.sort_values('popularity', ascending=False)

# show 10 most popular artists on Last.FM 
popular_artists.head(10)

artistID,name,url,pictureURL,weight,userID,popularity
289,Britney Spears,http://www.last.fm/music/Britney+Spears,http://userserve-ak.last.fm/serve/252/60126439...,2393140,522,2357076.0
89,Lady Gaga,http://www.last.fm/music/Lady+Gaga,http://userserve-ak.last.fm/serve/252/47390093...,1291387,611,1274748.0
72,Depeche Mode,http://www.last.fm/music/Depeche+Mode,http://userserve-ak.last.fm/serve/252/75022.jpg,1301308,282,1265518.0
292,Christina Aguilera,http://www.last.fm/music/Christina+Aguilera,http://userserve-ak.last.fm/serve/252/47363849...,1058405,407,1038078.0
498,Paramore,http://www.last.fm/music/Paramore,http://userserve-ak.last.fm/serve/252/35837991...,963449,399,944588.6
67,Madonna,http://www.last.fm/music/Madonna,http://userserve-ak.last.fm/serve/252/340387.jpg,921198,429,904405.8
288,Rihanna,http://www.last.fm/music/Rihanna,http://userserve-ak.last.fm/serve/252/53023109...,905423,484,890764.5
701,Shakira,http://www.last.fm/music/Shakira,http://userserve-ak.last.fm/serve/252/52116105...,688529,319,671780.2
227,The Beatles,http://www.last.fm/music/The+Beatles,http://userserve-ak.last.fm/serve/252/2588646.jpg,662116,480,651326.0
300,Katy Perry,http://www.last.fm/music/Katy+Perry,http://userserve-ak.last.fm/serve/252/42128121...,532545,473,523753.0
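
The steps in this notebook can be collected into a single function. The following is a sketch under stated assumptions rather than the notebook's own code: `top_popular`, its parameters `n` and `q`, and the toy frames are all hypothetical names introduced here, and the function assumes `artists` is indexed by `artistID` as done above.

```python
import pandas as pd

def top_popular(user_artists, artists, n=10, q=0.90):
    """Hypothetical helper: rank artists by weighted popularity."""
    # R = total listens per artist, u = number of listeners per artist
    stats = user_artists.groupby('artistID')['weight'].agg(R='sum', u='count')
    A = stats['R'].mean()        # mean total listens across all artists
    m = stats['u'].quantile(q)   # listener-count threshold
    stats = stats[stats['u'] >= m].copy()
    stats['popularity'] = ((stats['u'] / (stats['u'] + m)) * stats['R']
                           + (m / (stats['u'] + m)) * A)
    # Attach artist metadata and return the n highest-scoring artists
    ranked = artists.join(stats, how='inner')
    return ranked.sort_values('popularity', ascending=False).head(n)

# Toy data with the same column names as the dataset (values are made up)
toy_plays = pd.DataFrame({
    'userID':   [1, 1, 2, 2, 3, 3],
    'artistID': [10, 20, 10, 30, 10, 20],
    'weight':   [100, 5, 200, 7, 300, 15],
})
toy_artists = pd.DataFrame({'name': ['A', 'B', 'C']},
                           index=pd.Index([10, 20, 30], name='artistID'))

result = top_popular(toy_plays, toy_artists, n=2, q=0.5)
print(result)
```

Exposing `q` as a parameter makes it easy to see how sensitive the ranking is to the listener threshold, which the notebook fixes at the 90th percentile.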
