#Building a song recommender


#Fire up GraphLab Create

In [1]:
import graphlab

#Load music data

In [2]:
song_data = graphlab.SFrame('song_data.gl/')

[INFO] This non-commercial license of GraphLab Create is assigned to anindya.saha@tamu.edu and will expire on September 21, 2016. For commercial licensing options, visit https://dato.com/buy/.

[INFO] Start server at: ipc:///tmp/graphlab_server-14248 - Server binary: C:\Users\Anindya\AppData\Local\Dato\Dato Launcher\lib\site-packages\graphlab\unity_server.exe - Server log: C:\Users\Anindya\AppData\Local\Temp\graphlab_server_1443940361.log.0
[INFO] GraphLab Server Version: 1.6.1


#Explore data

Music data shows how many times a user listened to a song, as well as the details of the song.

In [3]:
song_data.head()

user_id,song_id,listen_count,title,artist
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOAKIMP12A8C130995,1,The Cove,Jack Johnson
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Paco De Lucia
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBXHDL12A81C204C0,1,Stronger,Kanye West
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBYHAJ12A6701BF1D,1,Constellations,Jack Johnson
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SODACBL12A8C13C273,1,Learn To Fly,Foo Fighters
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SODDNQT12A6D4F5F7E,5,Apuesta Por El Rock 'N' Roll ...,Héroes del Silencio
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SODXRTY12AB0180F3B,1,Paper Gangsta,Lady GaGa
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOFGUAY12AB017B0A8,1,Stacked Actors,Foo Fighters
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOFRQTD12A81C233C0,1,Sehr kosmisch,Harmonia
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOHQWYZ12A6D4FA701,1,Heaven's gonna burn your eyes ...,Thievery Corporation feat. Emiliana Torrini ...

song
The Cove - Jack Johnson
Entre Dos Aguas - Paco De Lucia ...
Stronger - Kanye West
Constellations - Jack Johnson ...
Learn To Fly - Foo Fighters ...
Apuesta Por El Rock 'N' Roll - Héroes del ...
Paper Gangsta - Lady GaGa
Stacked Actors - Foo Fighters ...
Sehr kosmisch - Harmonia
Heaven's gonna burn your eyes - Thievery ...


In [4]:
graphlab.canvas.set_target('ipynb')

##Count the number of unique users in the dataset who listen to Kanye West

In [11]:
len(song_data[song_data['artist'] == 'Kanye West']['user_id'].unique())

2522

##Count the number of unique users in the dataset who listen to Foo Fighters

In [12]:
len(song_data[song_data['artist'] == 'Foo Fighters']['user_id'].unique())

2055

##Count the number of unique users in the dataset who listen to Taylor Swift

In [13]:
len(song_data[song_data['artist'] == 'Taylor Swift']['user_id'].unique())

3246

##Count the number of unique users in the dataset who listen to Lady GaGa

In [14]:
len(song_data[song_data['artist'] == 'Lady GaGa']['user_id'].unique())

2928
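
The four cells above repeat the same filter-then-unique pattern. As a minimal sketch of what that pattern computes, the pure-Python version below counts distinct listeners for several artists in one pass. The rows are hypothetical toy data, not taken from song_data, and the helper name unique_listeners is made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for song_data: (user_id, artist, listen_count).
rows = [
    ("u1", "Kanye West", 1),
    ("u1", "Kanye West", 3),
    ("u2", "Kanye West", 2),
    ("u2", "Foo Fighters", 1),
    ("u3", "Taylor Swift", 5),
    ("u1", "Lady GaGa", 1),
]


def unique_listeners(rows, artists):
    """Return {artist: number of distinct user_ids with at least one row for that artist}."""
    users_by_artist = defaultdict(set)
    for user_id, artist, _listen_count in rows:
        if artist in artists:
            # A set keeps each user once, however many rows they have.
            users_by_artist[artist].add(user_id)
    return {artist: len(users_by_artist[artist]) for artist in artists}


artists = ["Kanye West", "Foo Fighters", "Taylor Swift", "Lady GaGa"]
counts = unique_listeners(rows, artists)
for artist in artists:
    print("%s: %d" % (artist, counts[artist]))
```

Note that the count is of distinct users, so u1's two Kanye West rows contribute only once.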

## Using groupby-aggregate to find the most popular and least popular artist

In [18]:
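# Sum listen_count per artist into a new column called total_count.
song_data_grouped_by_artist = song_data.groupby(
    key_columns='artist',
    operations={'total_count': graphlab.aggregate.SUM('listen_count')})
# Sort artists from most to least total listens.
song_data_grouped_by_artist = song_data_grouped_by_artist.sort('total_count', ascending=False)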

### Most popular artist

In [23]:
song_data_grouped_by_artist[0]

{'artist': 'Kings Of Leon', 'total_count': 43218L}

### Least popular artist

In [24]:
song_data_grouped_by_artist[len(song_data_grouped_by_artist) - 1]

{'artist': 'William Tabbert', 'total_count': 14L}
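
To make the groupby-aggregate step concrete, here is a small pure-Python sketch of the same idea: sum the listen counts per artist, sort in descending order, then read off the first and last entries. The rows and artist names are hypothetical toy data, and total_listens_by_artist is an illustrative helper, not part of GraphLab Create.

```python
from collections import defaultdict

# Hypothetical toy rows: (user_id, artist, listen_count).
rows = [
    ("u1", "Artist A", 3),
    ("u2", "Artist A", 4),
    ("u1", "Artist B", 1),
    ("u3", "Artist C", 2),
    ("u2", "Artist C", 2),
]


def total_listens_by_artist(rows):
    """Sum listen_count per artist and return (artist, total) pairs, largest total first."""
    totals = defaultdict(int)
    for _user_id, artist, listen_count in rows:
        totals[artist] += listen_count
    # Mirrors sorting on total_count with ascending=False.
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranked = total_listens_by_artist(rows)
most_popular = ranked[0]    # first row after the descending sort
least_popular = ranked[-1]  # last row after the descending sort
print(most_popular, least_popular)
```

Popularity here means total listens, not the number of distinct listeners, which is the same definition the cells above use.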

#Using groupby-aggregate to find the most recommended songs

##Create a song recommender

In [25]:
train_data,test_data = song_data.random_split(.8,seed=0)
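
As a rough illustration of what a seeded 80/20 row split does, the sketch below assigns each row to one of two lists using a seeded random number generator. This is not GraphLab's implementation; the function name and the per-row coin-flip approach are assumptions made for illustration. The point is that a fixed seed makes the split reproducible and that the 80% is approximate rather than exact.

```python
import random


def random_split(rows, fraction, seed):
    """Split rows into two lists; each row lands in the first with probability `fraction`."""
    rng = random.Random(seed)  # a private generator, so the global random state is untouched
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))  # hypothetical stand-in for the rows of song_data
train_rows, test_rows = random_split(rows, 0.8, seed=0)
print(len(train_rows), len(test_rows))

# The same seed gives the same split again.
assert random_split(rows, 0.8, seed=0) == (train_rows, test_rows)
```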

##Build a song recommender with personalization

We now create a model that allows us to make personalized recommendations to each user. 

In [29]:
item_similarity_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')

PROGRESS: Recsys training: model = item_similarity
PROGRESS:     To use one of these as a target column, set target = <column_name>
PROGRESS:     and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 893580 observations with 66085 users and 9952 items.
PROGRESS:     Data prepared in: 1.13106s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 9952 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 0.933053        |
PROGRESS: | 2000            | 1.01406         |
PROGRESS: | 3000            | 1.09306         |
PROGRESS: | 4000            | 1.17107         |
PROGRESS: | 5000            | 1.24307         |
PROGRESS: | 6000            | 1.31208         |
PROGRESS: | 7000            | 1.38008         |
PROGRESS: | 8000            | 1.46908         |
PROGRESS: | 9000          
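
To give some intuition for what an item-similarity recommender computes, here is a minimal pure-Python sketch. It assumes Jaccard similarity between the sets of users who listened to each song, and it scores a candidate song by its average similarity to the songs the user has already heard. Both choices are assumptions for illustration; the notebook does not state which similarity or scoring rule the model used. The observations are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical toy observations: (user_id, song).
observations = [
    ("u1", "Song A"), ("u1", "Song B"),
    ("u2", "Song A"), ("u2", "Song B"), ("u2", "Song C"),
    ("u3", "Song B"), ("u3", "Song C"),
    ("u4", "Song A"),
]


def listeners_by_song(observations):
    """Map each song to the set of users who listened to it."""
    listeners = defaultdict(set)
    for user_id, song in observations:
        listeners[song].add(user_id)
    return dict(listeners)


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0


def recommend(user_id, observations, k=1):
    """Return the top-k (song, score) pairs among songs the user has not heard."""
    listeners = listeners_by_song(observations)
    heard = {song for u, song in observations if u == user_id}
    scores = {}
    for candidate in listeners:
        if candidate in heard:
            continue  # never recommend a song the user already has
        sims = [jaccard(listeners[candidate], listeners[s]) for s in heard]
        # Assumed scoring rule: average similarity to the user's own songs.
        scores[candidate] = sum(sims) / len(sims) if sims else 0.0
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


print(recommend("u4", observations, k=1))
```

In the toy data, u4 has only heard Song A. Song B shares two of four listeners with Song A (similarity 0.5) and Song C shares one of four (0.25), so Song B is recommended.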

We are going to make recommendations for the users in the test data, but the test set contains over 200,000 user entries.
Computing recommendations for that many users can be slow on some computers, so in this question we use only the first 10,000.
Note that the slice below takes the first 10,000 entries of the user_id column, so a user with several rows in the test data appears more than once. Use this command to select the subset:

In [31]:
subset_test_users = test_data['user_id'][0:10000]
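
The slice above keeps the first 10,000 entries of the column, duplicates included. If distinct users were wanted instead, one option is to de-duplicate while preserving order before taking the subset. The sketch below shows the difference on hypothetical toy IDs; first_unique is an illustrative helper, not a GraphLab function.

```python
def first_unique(values, limit):
    """Return the first `limit` distinct values, in order of first appearance."""
    if limit <= 0:
        return []
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
            if len(unique) == limit:
                break  # stop early once enough distinct values are collected
    return unique


# Hypothetical toy user_id column with repeated users, as in a listen log.
user_ids = ["u1", "u1", "u1", "u2", "u2", "u3", "u4"]

first_four_rows = user_ids[0:4]             # plain slice: repeats survive
first_four_users = first_unique(user_ids, 4)  # four distinct users

print(first_four_rows)
print(first_four_users)
```

Which of the two is appropriate depends on the question being asked; the notebook uses the plain slice.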

###Applying the personalized model to get the top recommended (k = 1) song for each of the first 10,000 users in the test data

In [33]:
most_recommended_song_for_each_user = item_similarity_model.recommend(subset_test_users, k = 1)

PROGRESS: recommendations finished on 1000/10000 queries. users per second: 2066
PROGRESS: recommendations finished on 2000/10000 queries. users per second: 2129.8
PROGRESS: recommendations finished on 3000/10000 queries. users per second: 2118.52
PROGRESS: recommendations finished on 4000/10000 queries. users per second: 2135.49
PROGRESS: recommendations finished on 5000/10000 queries. users per second: 2151.34
PROGRESS: recommendations finished on 6000/10000 queries. users per second: 2161.26
PROGRESS: recommendations finished on 7000/10000 queries. users per second: 2165.05
PROGRESS: recommendations finished on 8000/10000 queries. users per second: 2171.43
PROGRESS: recommendations finished on 9000/10000 queries. users per second: 2172.21
PROGRESS: recommendations finished on 10000/10000 queries. users per second: 2158.31


In [34]:
most_recommended_song_for_each_user

user_id,song,score,rank
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,Meadowlarks - Fleet Foxes,0.024807244384,1
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,Meadowlarks - Fleet Foxes,0.024807244384,1
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,Meadowlarks - Fleet Foxes,0.024807244384,1
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,Meadowlarks - Fleet Foxes,0.024807244384,1
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,Meadowlarks - Fleet Foxes,0.024807244384,1
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,Meadowlarks - Fleet Foxes,0.024807244384,1
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,Meadowlarks - Fleet Foxes,0.024807244384,1
969cc6fb74e076a68e36a0440 9cb9d3765757508 ...,I Give You To His Heart - Alison Krauss ...,0.0199217228222,1
969cc6fb74e076a68e36a0440 9cb9d3765757508 ...,I Give You To His Heart - Alison Krauss ...,0.0199217228222,1
969cc6fb74e076a68e36a0440 9cb9d3765757508 ...,I Give You To His Heart - Alison Krauss ...,0.0199217228222,1


### Get the most recommended song across the first 10,000 users in the test data
Finally, we can use .groupby() to find the most recommended song. We simply want to count how often each song is recommended, so we use the COUNT aggregator with 'song' as the key and store the results in a column called 'count'.

Sorting the results by 'count' in descending order puts the most recommended song for the first 10,000 users in the test data at the top.

In [36]:
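# Count how many times each song appears as a top recommendation.
most_recommended_song = most_recommended_song_for_each_user.groupby(
    key_columns='song',
    operations={'count': graphlab.aggregate.COUNT()})
# Sort songs from most to least often recommended.
most_recommended_song = most_recommended_song.sort('count', ascending=False)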

In [37]:
most_recommended_song

song,count
Undo - Björk,423
Secrets - OneRepublic,385
You're The One - Dwight Yoakam ...,229
Revelry - Kings Of Leon,198
Hey_ Soul Sister - Train,165
Fireflies - Charttraxx Karaoke ...,146
Horn Concerto No. 4 in E flat K495: II. Romance ...,101
Sehr kosmisch - Harmonia,92
Dog Days Are Over (Radio Edit) - Florence + The ...,78
Mia - Emmy The Great,66
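
The count-and-sort step can be sketched in pure Python with collections.Counter. The recommendations below are hypothetical toy data. The sketch also shows a variant that counts each (user, song) pair only once, which matters when the same user appears several times in the input, as in the recommendation output above; whether to de-duplicate is a judgment call, not something the notebook does.

```python
from collections import Counter

# Hypothetical toy output of a k = 1 recommender: (user_id, song).
# u1 appears twice, mimicking a user repeated in the query list.
recommendations = [
    ("u1", "Song A"),
    ("u1", "Song A"),
    ("u2", "Song B"),
    ("u3", "Song A"),
    ("u4", "Song C"),
    ("u5", "Song A"),
    ("u6", "Song B"),
]

# Count every row, as a COUNT aggregator keyed on the song would.
row_counts = Counter(song for _user_id, song in recommendations)
ranked_by_rows = row_counts.most_common()  # (song, count) pairs, largest count first

# Variant: count each (user, song) pair once, so repeated users do not inflate totals.
user_counts = Counter(song for _user_id, song in set(recommendations))
ranked_by_users = user_counts.most_common()

print(ranked_by_rows)
print(ranked_by_users)
```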
