# Simple Recommender System: Last.FM Dataset
## Recommender based on artist popularity

Reference: 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). I. Cantador, P. Brusilovsky, T. Kuflik. Proceedings of the 5th ACM Conference on Recommender Systems.<br>
https://grouplens.org/datasets/hetrec-2011/<br>

In [1]:
import pandas as pd
import numpy as np

In [2]:
# opening artist data as pandas dataframe
artists = pd.read_csv('artists.dat',
                      delimiter='\t', low_memory=False)
artists.head(3)

|   | id | name | url | pictureURL |
|---|---|---|---|---|
| 0 | 1 | MALICE MIZER | http://www.last.fm/music/MALICE+MIZER | http://userserve-ak.last.fm/serve/252/10808.jpg |
| 1 | 2 | Diary of Dreams | http://www.last.fm/music/Diary+of+Dreams | http://userserve-ak.last.fm/serve/252/3052066.jpg |
| 2 | 3 | Carpathian Forest | http://www.last.fm/music/Carpathian+Forest | http://userserve-ak.last.fm/serve/252/40222717... |


In [3]:
# changing column title from id to artistID to match user_artists dataframe
artists = artists.rename(columns = {'id':'artistID'})
# setting artistID as index 
artists = artists.set_index('artistID')
artists.head(3)

| artistID | name | url | pictureURL |
|---|---|---|---|
| 1 | MALICE MIZER | http://www.last.fm/music/MALICE+MIZER | http://userserve-ak.last.fm/serve/252/10808.jpg |
| 2 | Diary of Dreams | http://www.last.fm/music/Diary+of+Dreams | http://userserve-ak.last.fm/serve/252/3052066.jpg |
| 3 | Carpathian Forest | http://www.last.fm/music/Carpathian+Forest | http://userserve-ak.last.fm/serve/252/40222717... |


In [4]:
print('Total number of artistID = {}'.format(len(artists)))

Total number of artistID = 17632


In [5]:
# opening user_artist data as pandas dataframe
user_artists = pd.read_csv('user_artists.dat',
                      delimiter='\t', low_memory=False)

In [6]:
# weight column corresponds to listening count
user_artists.head(5)

|   | userID | artistID | weight |
|---|---|---|---|
| 0 | 2 | 51 | 13883 |
| 1 | 2 | 52 | 11690 |
| 2 | 2 | 53 | 11351 |
| 3 | 2 | 54 | 10300 |
| 4 | 2 | 55 | 8983 |


### Weighting Formula Used
Weighted Popularity: $WP = \frac{u}{u+m} \cdot R + \frac{m}{u+m} \cdot A$<br>
where<br>
u is the number of users who listened to a particular artist,<br>
m is the minimum number of listeners an artist needs to be included (set below to the 90th percentile of listeners per artist),<br>
R is the total number of times the artist was listened to (the summed `weight`), and<br>
A is the mean of that total across all artists.

Weighted popularity takes into account not only how many times an artist is listened to, but also how many users listen to that artist. The fewer listeners an artist has, the more its score is pulled toward the overall average A.
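As a minimal worked sketch of the formula, the snippet below plugs in figures that appear in this notebook for artistID 2 (12 listeners, 8012 total plays) together with m = 8. The function name `weighted_popularity` is hypothetical, and A is assumed to be roughly 3923.77, a rounded value back-computed from the displayed popularity scores (the notebook itself prints only the integer part, 3923).

```python
def weighted_popularity(u, R, m, A):
    """Blend an artist's own play count R with the global mean A.

    The blend weight u / (u + m) grows with the number of listeners u,
    so artists with few listeners are pulled toward A.
    """
    return (u / (u + m)) * R + (m / (u + m)) * A


# Values for artistID 2 as shown in this notebook; A is an approximation.
m = 8          # listener threshold (90th percentile)
A = 3923.77    # assumed: mean total plays per artist, rounded
score = weighted_popularity(u=12, R=8012, m=m, A=A)
print(round(score, 1))  # 6376.7
```

With 12 listeners against a threshold of 8, the artist's own total counts for 60% of the score and the global mean for the remaining 40%.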

In [7]:
# calculating A: first sum the listening counts (weight) per artist
art_sumweight = user_artists.groupby('artistID').sum()
art_sumweight = art_sumweight.drop(['userID'], axis=1)
art_sumweight.head(5)

| artistID | weight |
|---|---|
| 1 | 771 |
| 2 | 8012 |
| 3 | 775 |
| 4 | 563 |
| 5 | 913 |


In [8]:
A = art_sumweight['weight'].mean()
print('The average number of times any artist was listened to across all artists (A) = {}'.format(int(A)))

The average number of times any artist was listened to across all artists (A) = 3923


In [9]:
# calculating m: first count the users listening to each artist
numuser_art = user_artists.groupby('artistID').count()
numuser_art.head(5)

| artistID | userID | weight |
|---|---|---|
| 1 | 3 | 3 |
| 2 | 12 | 12 |
| 3 | 3 | 3 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |


In [10]:
# m is the 90th percentile of the number of listeners per artist
m = numuser_art['userID'].quantile(0.90)
print('The minimum number of users listening to any one artist (m) = {}'.format(int(m)))

The minimum number of users listening to any one artist (m) = 8
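
To illustrate how a quantile-based cutoff behaves, the sketch below applies the same `quantile(0.90)` call to a small made-up Series of listener counts. The data and the names `listeners`, `threshold` and `kept` are hypothetical; only the pandas calls mirror the cell above.

```python
import pandas as pd

# Hypothetical listener counts for ten artists (not from the Last.FM data)
listeners = pd.Series([1, 1, 1, 2, 2, 3, 3, 5, 8, 40],
                      index=range(101, 111), name='userID')

# 90th percentile of the listener counts, used as the inclusion threshold
threshold = listeners.quantile(0.90)

# Artists at or above the threshold are kept
kept = listeners[listeners >= threshold]
print(threshold)
print(kept)
```

Because pandas interpolates linearly by default, the threshold need not be an integer, and only roughly the top tenth of artists pass the filter.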


In [11]:
# keeping only artists with at least m listeners
art_mostusers = numuser_art.loc[numuser_art['userID'] >= m].copy()
art_mostusers = art_mostusers.drop(['weight'], axis=1)
art_mostusers.head(3)

| artistID | userID |
|---|---|
| 2 | 12 |
| 6 | 10 |
| 7 | 133 |


In [12]:
# combining total weight per artist (art_sumweight) with 
# dataframe of artists most users listen to (art_mostusers)
art_mostusers_sumweight = pd.merge(art_sumweight, art_mostusers,
                                  how='inner', left_index=True,
                                   right_index=True)
art_mostusers_sumweight.head(3)

| artistID | weight | userID |
|---|---|---|
| 2 | 8012 | 12 |
| 6 | 5080 | 10 |
| 7 | 96201 | 133 |
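
The two per-artist columns merged above (total plays and listener count) could also be produced in a single pass with pandas named aggregation. The following is a sketch on a toy frame with the same column names as `user_artists`; the rows and the output column names `plays` and `listeners` are hypothetical.

```python
import pandas as pd

# Toy stand-in for user_artists (values are made up)
toy = pd.DataFrame({
    'userID':   [2, 2, 3, 3, 4],
    'artistID': [51, 52, 51, 53, 51],
    'weight':   [100, 40, 60, 10, 20],
})

# One groupby yields both the summed weight and the number of listeners
per_artist = toy.groupby('artistID').agg(
    plays=('weight', 'sum'),
    listeners=('userID', 'count'),
)
print(per_artist)
```

This avoids building two intermediate dataframes and merging them, at the cost of departing from the step-by-step layout used in this notebook.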


In [13]:
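# computes the weighted popularity of one artist (one row of the dataframe)
def weighted_pop(row, m=m, A=A):
    u = row['userID']   # number of users listening to the artist
    R = row['weight']   # total listening count of the artist
    return (u/(u+m)*R) + (m/(u+m)*A)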

In [14]:
# computing the weighted popularity for combined dataframe (art_mostusers_sumweight)
art_mostusers_sumweight['popularity'] = art_mostusers_sumweight.apply(weighted_pop,
                                                                      axis=1)
art_mostusers_sumweight.head(3)

| artistID | weight | userID | popularity |
|---|---|---|---|
| 2 | 8012 | 12 | 6376.709415 |
| 6 | 5080 | 10 | 4566.121572 |
| 7 | 96201 | 133 | 90965.412683 |
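
Since the formula uses only element-wise arithmetic, it can also be evaluated on whole columns instead of row by row with `apply`. A minimal sketch, assuming a frame with the same `userID` and `weight` columns: the two toy rows are copied from the output above, `A` is an assumed rounded value (the notebook prints only its integer part), and `weighted_pop_vectorized` is a hypothetical name.

```python
import pandas as pd


def weighted_pop_vectorized(df, m, A):
    """Weighted popularity computed on whole columns at once."""
    u = df['userID']   # number of listeners per artist
    R = df['weight']   # total plays per artist
    return (u / (u + m)) * R + (m / (u + m)) * A


toy = pd.DataFrame({'weight': [8012, 5080], 'userID': [12, 10]},
                   index=pd.Index([2, 6], name='artistID'))

m, A = 8, 3923.77   # A is an assumed, rounded value
toy['popularity'] = weighted_pop_vectorized(toy, m, A)

# Row-wise evaluation for comparison; both approaches should agree
row_wise = toy.apply(lambda row: weighted_pop_vectorized(row, m, A), axis=1)
print(toy)
```

On a frame of this size the difference is negligible, but column-wise arithmetic generally scales better than `apply(..., axis=1)`.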


In [15]:
# combining weighted popularity (art_mostusers_sumweight) with artist dataframe
popular_artists = pd.merge(artists, art_mostusers_sumweight,
                           how='inner', left_index=True,
                           right_index=True)
popular_artists.head(3)

| artistID | name | url | pictureURL | weight | userID | popularity |
|---|---|---|---|---|---|---|
| 2 | Diary of Dreams | http://www.last.fm/music/Diary+of+Dreams | http://userserve-ak.last.fm/serve/252/3052066.jpg | 8012 | 12 | 6376.709415 |
| 6 | Moonspell | http://www.last.fm/music/Moonspell | http://userserve-ak.last.fm/serve/252/2181591.jpg | 5080 | 10 | 4566.121572 |
| 7 | Marilyn Manson | http://www.last.fm/music/Marilyn+Manson | http://userserve-ak.last.fm/serve/252/2558217.jpg | 96201 | 133 | 90965.412683 |


In [19]:
# ordering dataframe according to artist popularity
popular_artists = popular_artists.sort_values('popularity', ascending=False)

# showing 10 most popular artists on Last.FM 
print('Top 10 artists are: ')
print(popular_artists['name'].iloc[:10])

Top 10 artists are: 
artistID
289        Britney Spears
89              Lady Gaga
72           Depeche Mode
292    Christina Aguilera
498              Paramore
67                Madonna
288               Rihanna
701               Shakira
227           The Beatles
300            Katy Perry
Name: name, dtype: object
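
When only the top few rows are needed, `DataFrame.nlargest` is an alternative to sorting the whole frame. A small sketch on made-up data (the artist names and scores are hypothetical, not taken from the Last.FM results):

```python
import pandas as pd

# Hypothetical popularity scores for four artists
toy = pd.DataFrame({'name': ['A', 'B', 'C', 'D'],
                    'popularity': [10.0, 40.0, 30.0, 20.0]},
                   index=pd.Index([1, 2, 3, 4], name='artistID'))

# The two rows with the highest popularity, in descending order
top2 = toy.nlargest(2, 'popularity')['name']
print(top2)
```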
