#Building a song recommender


#Fire up GraphLab Create

In [2]:
import graphlab

#Load music data

In [3]:
song_data = graphlab.SFrame('song_data.gl/')




[INFO] Start server at: ipc:///tmp/graphlab_server-22512 - Server binary: /Library/Python/2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1454768831.log


[INFO] GraphLab Server Version: 1.8.1


#Explore data

Each row of the music data records how many times a user listened to a song, along with details of that song such as its artist.

In [None]:
song_data.head()

##Showing the most popular songs in the dataset

In [None]:
graphlab.canvas.set_target('ipynb')

In [None]:
song_data['song'].show()

In [None]:
len(song_data)

##Count number of unique users in the dataset

In [None]:
users = song_data['user_id'].unique()

In [None]:
len(users)

#Create a song recommender

In [34]:
train_data,test_data = song_data.random_split(.8,seed=0)
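
To make the split above concrete, here is a minimal pure-Python sketch of a row-level random split. It is not GraphLab's implementation: the helper name `random_split` and the toy `rows` list are hypothetical, and it assumes each row is assigned independently, so the result is only approximately 80/20. Because the split is by row rather than by user, the same user can appear in both the training and the test set.

```python
import random

def random_split(rows, fraction, seed=0):
    """Send each row to the first list with probability `fraction`, else to the second."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

# Hypothetical stand-in for the rows of song_data.
rows = list(range(1000))
train_rows, test_rows = random_split(rows, 0.8, seed=0)
```

Calling the helper again with the same seed returns the same two lists, which is what `seed=0` buys in the cell above.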

##Simple popularity-based recommender

In [None]:
popularity_model = graphlab.popularity_recommender.create(train_data,
                                                         user_id='user_id',
                                                         item_id='song')

###Use the popularity model to make some predictions

A popularity model makes the same prediction for all users, so it provides no personalization.

In [None]:
popularity_model.recommend(users=[users[0]])

In [None]:
popularity_model.recommend(users=[users[1]])
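
The idea behind a popularity model can be sketched in a few lines of plain Python. This is an illustration, not GraphLab's code: the function `popularity_recommend` and the toy `(user_id, song)` pairs are hypothetical, popularity is assumed to mean the number of distinct listeners, and songs a user has already heard are assumed to be left out of that user's list.

```python
from collections import Counter

def popularity_recommend(interactions, user, k=3):
    """Rank songs by number of distinct listeners; skip songs the user already heard."""
    listeners = Counter()
    heard = set()
    for u, song in set(interactions):  # set() drops repeated (user, song) pairs
        listeners[song] += 1
        if u == user:
            heard.add(song)
    # Most listeners first; ties broken alphabetically for a stable order.
    ranked = sorted(listeners, key=lambda s: (-listeners[s], s))
    return [s for s in ranked if s not in heard][:k]

# Hypothetical toy data: (user_id, song) pairs.
toy = [("u1", "A"), ("u2", "A"), ("u3", "A"),
       ("u1", "B"), ("u2", "B"),
       ("u3", "C")]

new_user_a = popularity_recommend(toy, "u4")  # a user with no history
new_user_b = popularity_recommend(toy, "u5")  # another user with no history
```

Both new users receive the identical ranking, since nothing about the user enters the score. The only per-user variation in this sketch comes from filtering out songs already heard.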

##Build a song recommender with personalization

We now create a model that allows us to make personalized recommendations to each user. 

In [None]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')

###Applying the personalized model to make song recommendations

Different users now get different recommendations.

In [None]:
personalized_model.recommend(users=[users[0]])

In [None]:
personalized_model.recommend(users=[users[1]])

###We can also apply the model to find songs similar to any song in the dataset

In [None]:
personalized_model.get_similar_items(['With Or Without You - U2'])

In [None]:
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])
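
To show how an item-similarity recommender can produce both similar-song lists and personalized recommendations, here is a small pure-Python sketch. It assumes Jaccard similarity between the sets of users who listened to each song, and it scores a candidate song by its average similarity to the songs a user has already heard. The library's actual similarity measure, scoring rule, and defaults may differ; all function names and the toy data are hypothetical.

```python
from collections import defaultdict

def build_listeners(interactions):
    """Map each song to the set of users who listened to it."""
    listeners = defaultdict(set)
    for user, song in interactions:
        listeners[song].add(user)
    return dict(listeners)

def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0

def similar_items(listeners, song, k=2):
    """Return the k songs whose listener sets overlap most with those of `song`."""
    scores = [(other, jaccard(listeners[song], users))
              for other, users in listeners.items() if other != song]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]

def recommend(listeners, user, k=2):
    """Score each unheard song by its average similarity to the user's songs."""
    heard = [s for s, users in listeners.items() if user in users]
    scores = {}
    for cand, users in listeners.items():
        if user in users or not heard:
            continue
        scores[cand] = sum(jaccard(users, listeners[h]) for h in heard) / len(heard)
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]

# Hypothetical toy data: (user_id, song) pairs.
toy = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
       ("u3", "A"), ("u3", "C"), ("u4", "C"), ("u4", "D")]
listeners = build_listeners(toy)
```

On this toy data `recommend(listeners, "u1")` and `recommend(listeners, "u4")` return different lists, because each user's own listening history drives the scores.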

#Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves. 

In [None]:
# Compare versions numerically; comparing strings would rank "1.10" below "1.6".
if tuple(int(p) for p in graphlab.version.split('.')[:2]) >= (1, 6):
    model_performance = graphlab.compare(test_data, [popularity_model, personalized_model], user_sample=0.05)
    graphlab.show_comparison(model_performance, [popularity_model, personalized_model])
else:
    %matplotlib inline
    model_performance = graphlab.recommender.util.compare_models(test_data, [popularity_model, personalized_model], user_sample=.05)

The precision-recall curves show that the personalized model performs much better than the popularity model.
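
For readers unfamiliar with the metric, the sketch below computes precision and recall for a single user at a cutoff `k`. It is a plain-Python illustration with hypothetical names and toy data, not the code GraphLab runs. Precision is the fraction of the top `k` recommendations that appear in the user's held-out songs, and recall is the fraction of the held-out songs that appear in the top `k`. A precision-recall curve is obtained by varying `k` and averaging over users.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for the top-k items of a ranked list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# Hypothetical example: a ranked recommendation list and the songs
# the user actually listened to in the held-out test data.
recommended = ["A", "B", "C", "D"]
relevant = {"B", "D", "E"}

p2, r2 = precision_recall_at_k(recommended, relevant, k=2)
p4, r4 = precision_recall_at_k(recommended, relevant, k=4)
```

Here raising `k` from 2 to 4 keeps precision at 0.5 while recall rises from 1/3 to 2/3, which is the kind of trade-off the curves trace out.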

In [6]:
(song_data['artist'] == 'Kanye West').sum()

3775

In [8]:
(song_data['artist'] == 'Foo Fighters').sum()

3429

In [11]:
(song_data['artist'] == 'Taylor Swift').sum()

6227

In [12]:
(song_data['artist'] == 'Lady GaGa').sum()

4129

In [None]:
from graphlab import aggregate as agg

In [23]:
artists_pop = song_data.groupby('artist',
                                operations={'total_listening': agg.SUM('listen_count')})

In [28]:
artists_pop.sort('total_listening',
                 ascending=False)

artist,total_listening
Kings Of Leon,43218
Dwight Yoakam,40619
Björk,38889
Coldplay,35362
Florence + The Machine,33387
Justin Bieber,29715
Alliance Ethnik,26689
OneRepublic,25754
Train,25402
The Black Keys,22184


In [33]:
artists_pop.sort('total_listening', ascending=True)

artist,total_listening
William Tabbert,14
Reel Feelings,24
Beyoncé feat. Bun B and Slim Thug ...,26
Diplo,30
Boggle Karaoke,30
harvey summers,31
Nâdiya,36
Kanye West / Talib Kweli / Q-Tip / Common / ...,38
Aneta Langerova,38
Jody Bernal,38
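
The group-by above sums `listen_count` per artist, which is different from the earlier `(song_data['artist'] == ...).sum()` cells that count rows. A minimal pure-Python equivalent of the aggregation is sketched below; the helper name and the toy rows are hypothetical and the numbers are made up for illustration.

```python
from collections import defaultdict

def total_listening_by_artist(rows):
    """Sum listen_count per artist and return (artist, total) pairs, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["artist"]] += row["listen_count"]
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy rows with the two columns the aggregation uses.
toy_rows = [
    {"artist": "Artist X", "listen_count": 5},
    {"artist": "Artist Y", "listen_count": 2},
    {"artist": "Artist X", "listen_count": 1},
    {"artist": "Artist Z", "listen_count": 4},
]
ranking = total_listening_by_artist(toy_rows)
```

Reversing the returned list gives the least-listened artists first, mirroring the `ascending=True` sort.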


In [38]:
personalized_model = graphlab.item_similarity_recommender.create(train_data, user_id='user_id', item_id='song')

PROGRESS: Recsys training: model = item_similarity
PROGRESS:     To use one of these as a target column, set target = <column_name>
PROGRESS:     and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 893580 observations with 66085 users and 9952 items.
PROGRESS:     Data prepared in: 3.22369s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 9952 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 3.92109         |
PROGRESS: | 2000            | 4.20795         |
PROGRESS: | 3000            | 4.43116         |
PROGRESS: | 4000            | 4.58437         |
PROGRESS: | 5000            | 4.69488         |
PROGRESS: | 6000            | 4.95266         |
PROGRESS: | 7000            | 5.08086         |
PROGRESS: | 8000            | 5.47291         |
PROGRESS: | 9000          

In [None]:
subset_test_users = test_data['user_id'].unique()[0:10000]

In [None]:
recom_songs = personalized_model.recommend(subset_test_users, k=1)

In [45]:
agg_recom_songs = recom_songs.groupby('song', {'recom_amount':agg.COUNT()})

In [48]:
agg_recom_songs.sort('recom_amount', ascending=False)

song,recom_amount
Undo - Björk,431
Secrets - OneRepublic,382
Revelry - Kings Of Leon,231
You're The One - Dwight Yoakam ...,170
Fireflies - Charttraxx Karaoke ...,122
Hey_ Soul Sister - Train,107
Horn Concerto No. 4 in E flat K495: II. Romance ...,98
Sehr kosmisch - Harmonia,71
OMG - Usher featuring will.i.am ...,58
Dog Days Are Over (Radio Edit) - Florence + The ...,54
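
The count aggregation above can be mimicked with `collections.Counter`. The sketch below also reports each song's share of all top-1 recommendations, which makes it easier to judge how concentrated the recommendations are. The helper name and the toy list are hypothetical.

```python
from collections import Counter

def top_recommendation_counts(top_songs):
    """Return (song, count, share) triples, most frequently recommended first."""
    counts = Counter(top_songs)
    total = float(len(top_songs))
    return [(song, n, n / total) for song, n in counts.most_common()]

# Hypothetical top-1 recommendation for each of six users.
top_songs = ["X", "Y", "X", "Z", "X", "Y"]
summary = top_recommendation_counts(top_songs)
```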
