#Building a song recommender


#Fire up GraphLab Create

In [1]:
import graphlab



#Load music data

In [2]:
song_data = graphlab.SFrame('song_data.gl/')

[INFO] GraphLab Create v1.8.2 started. Logging: C:\Users\PROID_~1\AppData\Local\Temp\graphlab_server_1456491803.log.0


#Explore data

The music data records how many times each user listened to each song (listen_count), along with details of the song (title and artist).

In [4]:
song_data.head()

user_id,song_id,listen_count,title,artist,song
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SOAKIMP12A8C130995,1,The Cove,Jack Johnson,The Cove - Jack Johnson
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Paco De Lucia,Entre Dos Aguas - Paco De Lucia ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SOBXHDL12A81C204C0,1,Stronger,Kanye West,Stronger - Kanye West
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SOBYHAJ12A6701BF1D,1,Constellations,Jack Johnson,Constellations - Jack Johnson ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SODACBL12A8C13C273,1,Learn To Fly,Foo Fighters,Learn To Fly - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SODDNQT12A6D4F5F7E,5,Apuesta Por El Rock 'N' Roll ...,Héroes del Silencio,Apuesta Por El Rock 'N' Roll - Héroes del ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SODXRTY12AB0180F3B,1,Paper Gangsta,Lady GaGa,Paper Gangsta - Lady GaGa
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SOFGUAY12AB017B0A8,1,Stacked Actors,Foo Fighters,Stacked Actors - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SOFRQTD12A81C233C0,1,Sehr kosmisch,Harmonia,Sehr kosmisch - Harmonia
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...,SOHQWYZ12A6D4FA701,1,Heaven's gonna burn your eyes ...,Thievery Corporation feat. Emiliana Torrini ...,Heaven's gonna burn your eyes - Thievery ...


##Showing the most popular songs in the dataset

In [3]:
graphlab.canvas.set_target('ipynb')

In [None]:
song_data['song'].show()

In [None]:
len(song_data)

##Count the number of unique users in the dataset

In [None]:
users = song_data['user_id'].unique()

In [14]:
s1=song_data[song_data['artist']=='Kanye West']
len(s1['user_id'].unique())

2522

In [13]:
s2=song_data[song_data['artist']=='Foo Fighters']
len(s2['user_id'].unique())

2055

In [15]:
s3=song_data[song_data['artist']=='Taylor Swift']
len(s3['user_id'].unique())

3246

In [16]:
s4=song_data[song_data['artist']=='Lady GaGa']
len(s4['user_id'].unique())

2928

In [None]:
len(users)
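The per-artist cells above filter song_data by artist and count the distinct user_id values in the result. The same idea can be sketched in plain Python by collecting a set of users per artist. The toy rows and the function name unique_listeners_per_artist are hypothetical illustrations, not taken from song_data or from GraphLab.

```python
from collections import defaultdict

def unique_listeners_per_artist(rows):
    """Count distinct users per artist from (user_id, artist) pairs."""
    listeners = defaultdict(set)  # artist -> set of user_ids who listened
    for user_id, artist in rows:
        listeners[artist].add(user_id)
    return {artist: len(user_ids) for artist, user_ids in listeners.items()}

# Hypothetical toy rows: one row per (user, song) listen record.
rows = [
    ("u1", "Kanye West"),
    ("u1", "Kanye West"),   # same user, different song: counted once
    ("u2", "Kanye West"),
    ("u2", "Foo Fighters"),
    ("u3", "Foo Fighters"),
]
counts = unique_listeners_per_artist(rows)
print(counts["Kanye West"], counts["Foo Fighters"])  # 2 2
```

Using a set per artist is what makes a user who listened to several songs by the same artist count only once, matching the unique() call in the cells above.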

In [18]:
# Total number of listens per artist (sum of listen_count over all of the artist's rows)
artists = song_data.groupby(key_columns='artist',
                            operations={'total_count': graphlab.aggregate.SUM('listen_count')})

In [21]:
artists.sort('total_count',ascending=False).print_rows(30)

+-------------------------------+-------------+
|             artist            | total_count |
+-------------------------------+-------------+
|         Kings Of Leon         |    43218    |
|         Dwight Yoakam         |    40619    |
|             Björk             |    38889    |
|            Coldplay           |    35362    |
|     Florence + The Machine    |    33387    |
|         Justin Bieber         |    29715    |
|        Alliance Ethnik        |    26689    |
|          OneRepublic          |    25754    |
|             Train             |    25402    |
|         The Black Keys        |    22184    |
| Barry Tuckwell/Academy of ... |    21953    |
|            Harmonia           |    21646    |
|             Eminem            |    21627    |
|          Linkin Park          |    21368    |
|           Metallica           |    20336    |
|              Muse             |    20220    |
|          Jack Johnson         |    20177    |
|          Taylor Swift         |    193

In [23]:
artists.sort('total_count',ascending=True).print_rows(30)

+-------------------------------+-------------+
|             artist            | total_count |
+-------------------------------+-------------+
|        William Tabbert        |      14     |
|         Reel Feelings         |      24     |
| Beyoncé feat. Bun B and Sl... |      26     |
|             Diplo             |      30     |
|         Boggle Karaoke        |      30     |
|         harvey summers        |      31     |
|             Nâdiya            |      36     |
| Kanye West / Talib Kweli /... |      38     |
|        Aneta Langerova        |      38     |
|          Jody Bernal          |      38     |
|          John Altman          |      39     |
|           Trademark           |      40     |
|   Lloyd / Ashanti / Scarface  |      42     |
| Yung Joc feat Trick Daddy_... |      42     |
|           Deadmau 5           |      43     |
|             Tandem            |      43     |
|        Light This City        |      44     |
|         Elvis Perkins         |      4
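The groupby with SUM('listen_count') followed by sort(..., ascending=False/True) can be sketched in plain Python as a dictionary of running totals that is then sorted by value. The toy rows and the helper name total_listens_per_artist are hypothetical; this only illustrates the aggregation, not how GraphLab computes it internally.

```python
from collections import defaultdict

def total_listens_per_artist(rows):
    """Sum listen_count per artist from (artist, listen_count) pairs."""
    totals = defaultdict(int)
    for artist, listen_count in rows:
        totals[artist] += listen_count
    return dict(totals)

# Hypothetical toy rows, not taken from song_data.
rows = [("Coldplay", 3), ("Muse", 5), ("Coldplay", 4), ("Diplo", 1)]
totals = total_listens_per_artist(rows)

# Most- and least-listened artists, analogous to sort(ascending=False/True).
most = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
least = sorted(totals.items(), key=lambda kv: kv[1])
print(most)   # [('Coldplay', 7), ('Muse', 5), ('Diplo', 1)]
print(least)  # [('Diplo', 1), ('Muse', 5), ('Coldplay', 7)]
```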

#Create a song recommender

In [None]:
train_data,test_data = song_data.random_split(.8,seed=0)
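random_split(.8, seed=0) puts roughly 80% of the rows in train_data and the rest in test_data, and the fixed seed makes the split repeatable. A minimal sketch of a seeded random split, assuming each row is assigned independently with probability fraction (the helper name random_split here is hypothetical and will not reproduce GraphLab's exact rows):

```python
import random

def random_split(rows, fraction, seed=None):
    """Assign each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # private generator so the split is reproducible
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(1000))
train, test = random_split(rows, 0.8, seed=0)
print(len(train) + len(test))  # 1000
```

Because rows are assigned independently, the training set holds about, not exactly, 80% of the rows.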

##Simple popularity-based recommender

In [None]:
popularity_model = graphlab.popularity_recommender.create(train_data,
                                                         user_id='user_id',
                                                         item_id='song')

###Use the popularity model to make some predictions

A popularity model ranks songs by their overall popularity, so it makes essentially the same recommendations for every user and provides no personalization.
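A minimal sketch of the idea, assuming popularity is simply the number of (user, song) observations per song and that songs a user has already heard are skipped. The function popularity_recommend and the toy observations are hypothetical; this is not GraphLab's implementation.

```python
from collections import Counter

def popularity_recommend(observations, user, k=10):
    """Recommend the k most frequently observed songs the user has not heard."""
    popularity = Counter(song for _, song in observations)  # song -> count
    heard = {song for u, song in observations if u == user}
    ranked = [song for song, _ in popularity.most_common() if song not in heard]
    return ranked[:k]

# Hypothetical (user_id, song) observations.
obs = [("u1", "A"), ("u2", "A"), ("u3", "A"),
       ("u1", "B"), ("u2", "B"), ("u3", "C")]
print(popularity_recommend(obs, "u4", k=2))  # ['A', 'B']
print(popularity_recommend(obs, "u1"))       # ['C']
```

Every user sees the same global ranking; the lists differ only because each user's already-heard songs are removed.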

In [None]:
popularity_model.recommend(users=[users[0]])

In [None]:
popularity_model.recommend(users=[users[1]])

##Build a song recommender with personalization

We now create a model that makes personalized recommendations for each user.

In [None]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')
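An item-similarity model scores a song for a user by how similar it is to the songs that user has already listened to. The sketch below assumes Jaccard similarity between the sets of users who listened to each song, and scores each unheard song by its average similarity to the user's songs; the function names and toy data are hypothetical, and this is a simplification rather than GraphLab's exact scoring.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0

def item_similarity_recommend(observations, user, k=10):
    """Score unheard songs by average Jaccard similarity to the user's songs."""
    listeners = defaultdict(set)  # song -> set of users who listened to it
    for u, song in observations:
        listeners[song].add(u)
    heard = {song for u, song in observations if u == user}
    if not heard:
        return []  # no history, nothing to personalize on
    scores = {}
    for candidate in listeners:
        if candidate in heard:
            continue
        total = sum(jaccard(listeners[candidate], listeners[h]) for h in heard)
        scores[candidate] = total / len(heard)
    return sorted(scores, key=lambda s: scores[s], reverse=True)[:k]

# Hypothetical (user_id, song) observations.
obs = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
       ("u2", "C"), ("u3", "C"), ("u3", "D")]
print(item_similarity_recommend(obs, "u1"))  # ['C', 'D']
print(item_similarity_recommend(obs, "u3"))  # ['A', 'B']
```

Unlike the popularity sketch, u1 and u3 receive different lists because their listening histories differ.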

###Applying the personalized model to make song recommendations

Unlike the popularity model, the personalized model gives different users different recommendations.

In [None]:
personalized_model.recommend(users=[users[0]])

In [None]:
personalized_model.recommend(users=[users[1]])

###We can also apply the model to find songs similar to any song in the dataset

In [None]:
personalized_model.get_similar_items(['With Or Without You - U2'])

In [None]:
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])
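get_similar_items returns, for a given song, the songs the model considers most similar to it. A sketch of that lookup, assuming Jaccard similarity between listener sets (the function similar_songs and the toy songs "A", "B", "C" are hypothetical):

```python
from collections import defaultdict

def similar_songs(observations, song, k=10):
    """Return up to k (song, score) pairs ranked by Jaccard similarity to `song`."""
    listeners = defaultdict(set)  # song -> users who listened to it
    for user, s in observations:
        listeners[s].add(user)
    target = listeners.get(song, set())
    scored = []
    for other, users in listeners.items():
        if other == song:
            continue
        union = target | users
        score = len(target & users) / float(len(union)) if union else 0.0
        scored.append((score, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(other, score) for score, other in scored[:k]]

# Hypothetical (user_id, song) observations.
obs = [("u1", "A"), ("u2", "A"), ("u1", "B"),
       ("u2", "B"), ("u3", "B"), ("u3", "C")]
print(similar_songs(obs, "A"))  # B shares 2 of 3 listeners with A; C shares none
```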

#Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves. 

In [None]:
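from distutils.version import LooseVersion

# graphlab.compare / graphlab.show_comparison are used from v1.6 onward.
# Compare versions numerically: a string comparison would rank "1.10" below "1.6".
if LooseVersion(graphlab.version) >= LooseVersion("1.6"):
    # Evaluate both models on a 5% sample of test users and plot precision-recall curves
    model_performance = graphlab.compare(test_data, [popularity_model, personalized_model], user_sample=0.05)
    graphlab.show_comparison(model_performance, [popularity_model, personalized_model])
else:
    %matplotlib inline
    model_performance = graphlab.recommender.util.compare_models(test_data, [popularity_model, personalized_model], user_sample=.05)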

The precision-recall curves show that the personalized model performs much better than the popularity model.
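Each point on a precision-recall curve comes from a cutoff k: precision@k is the fraction of a user's top-k recommendations that are relevant, and recall@k is the fraction of the user's relevant songs that appear in the top k, both averaged over users. A minimal sketch, assuming "relevant" means the songs a user listened to in the held-out test data; the function names and toy data are hypothetical, and GraphLab's exact definitions may differ.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendation list."""
    hits = len(set(recommended[:k]) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

def mean_precision_recall(recs_by_user, relevant_by_user, cutoffs):
    """Average precision@k and recall@k over users, one (k, P, R) point per cutoff."""
    curve = []
    for k in cutoffs:
        pairs = [precision_recall_at_k(recs_by_user[u], relevant_by_user[u], k)
                 for u in relevant_by_user]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        mean_r = sum(r for _, r in pairs) / len(pairs)
        curve.append((k, mean_p, mean_r))
    return curve

# Hypothetical ranked recommendations and held-out listens.
recs = {"u1": ["A", "B", "C"], "u2": ["C", "D", "E"]}
relevant = {"u1": ["A", "C"], "u2": ["F"]}
print(mean_precision_recall(recs, relevant, [1, 3]))
```

Raising k typically trades precision for recall, which is why the comparison is drawn as a curve rather than a single number.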

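##Most frequently recommended songs

The cells below re-create the 80/20 split (seed=0) and the item-similarity model, request a single recommendation (k=1) for each of up to 10,000 unique users from test_data, and count how many times each song is recommended.

In [25]: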
train_data,test_data = song_data.random_split(.8,seed=0)

In [26]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')

In [27]:
subset_test_users = test_data['user_id'].unique()[0:10000]

In [32]:
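# Top-1 recommendation for each user in subset_test_users
song_new = personalized_model.recommend(subset_test_users, k=1)

In [33]:
# Count how many users received each song as their top recommendation
song_recommend = song_new.groupby(key_columns='song', operations={'count': graphlab.aggregate.COUNT('song')})

In [34]:
song_recommend.sort('count', ascending=False)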

song,count
Undo - Björk,430
Secrets - OneRepublic,382
Revelry - Kings Of Leon,232
You're The One - Dwight Yoakam ...,169
Fireflies - Charttraxx Karaoke ...,122
Hey_ Soul Sister - Train,105
Horn Concerto No. 4 in E flat K495: II. Romance ...,98
Sehr kosmisch - Harmonia,75
OMG - Usher featuring will.i.am ...,58
Dog Days Are Over (Radio Edit) - Florence + The ...,54
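The groupby/COUNT/sort sequence above amounts to tallying how often each song appears as a user's top-1 recommendation. In plain Python that tally can be sketched with collections.Counter; the function most_recommended and the (user_id, song) rows below are hypothetical stand-ins for the recommend(..., k=1) output, not real model results.

```python
from collections import Counter

def most_recommended(top1_rows, n=10):
    """Return the n songs that appear most often as a user's top-1 recommendation."""
    counts = Counter(song for _user, song in top1_rows)
    return counts.most_common(n)

# Hypothetical (user_id, song) pairs standing in for recommend(..., k=1) output.
top1 = [
    ("u1", "Song A - Artist X"),
    ("u2", "Song B - Artist Y"),
    ("u3", "Song A - Artist X"),
]
print(most_recommended(top1))  # [('Song A - Artist X', 2), ('Song B - Artist Y', 1)]
```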
