# Building a song recommender


# Fire up GraphLab Create
(See [Getting Started with SFrames](../Week%201/Getting%20Started%20with%20SFrames.ipynb) for setup instructions)

In [1]:
import graphlab

# Load music data

In [2]:
song_data = graphlab.SFrame('song_data.gl/')


# Explore data

Music data shows how many times a user listened to a song, as well as the details of the song.

In [3]:
song_data[:2]

user_id,song_id,listen_count,title,artist,song
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOAKIMP12A8C130995,1,The Cove,Jack Johnson,The Cove - Jack Johnson
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Paco De Lucia,Entre Dos Aguas - Paco De Lucia ...


## Count number of unique users in the dataset

In [4]:
users = song_data['user_id'].unique()

In [5]:
len(users)

66346

# Question 1

Count the unique users who listened to each of four artists.

In [6]:
singer_name = ['Kanye West', 'Foo Fighters', 'Taylor Swift', 'Lady GaGa']
for name in singer_name:
    data = song_data[song_data['artist'] == name]['user_id'].unique()
    print(name + '\t' + str(len(data)))

Kanye West	2522
Foo Fighters	2055
Taylor Swift	3246
Lady GaGa	2928
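
The loop above filters by artist and then de-duplicates user IDs. As a rough illustration of the same idea without GraphLab, here is a minimal plain-Python sketch; the toy `plays` rows and the helper name `unique_listeners` are made up for this example and are not part of the dataset or the library.

```python
from collections import defaultdict

# Toy play log with the same fields as song_data (values are invented).
plays = [
    {"user_id": "u1", "artist": "Kanye West", "listen_count": 3},
    {"user_id": "u1", "artist": "Kanye West", "listen_count": 1},
    {"user_id": "u2", "artist": "Kanye West", "listen_count": 2},
    {"user_id": "u2", "artist": "Taylor Swift", "listen_count": 5},
]

def unique_listeners(rows):
    """Map each artist to the number of distinct users who played them."""
    users_by_artist = defaultdict(set)
    for row in rows:
        # A set ignores repeat plays by the same user.
        users_by_artist[row["artist"]].add(row["user_id"])
    return {artist: len(users) for artist, users in users_by_artist.items()}

listeners = unique_listeners(plays)
print(listeners)
```

Note that this counts users, not plays: a user with several rows for the same artist is counted once.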


# Question 2

Sum listen_count per artist, then sort to find the most and least played artists.

In [8]:
group = song_data.groupby(key_columns='artist', operations={'total_count': graphlab.aggregate.SUM('listen_count')})

In [11]:
# Most-played artists first
group.sort('total_count', ascending=False)

artist,total_count
Kings Of Leon,43218
Dwight Yoakam,40619
Björk,38889
Coldplay,35362
Florence + The Machine,33387
Justin Bieber,29715
Alliance Ethnik,26689
OneRepublic,25754
Train,25402
The Black Keys,22184


In [12]:
# Least-played artists first
group.sort('total_count', ascending=True)

artist,total_count
William Tabbert,14
Reel Feelings,24
Beyoncé feat. Bun B and Slim Thug ...,26
Diplo,30
Boggle Karaoke,30
harvey summers,31
Nâdiya,36
Jody Bernal,38
Kanye West / Talib Kweli / Q-Tip / Common / ...,38
Aneta Langerova,38
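
The group-by with a SUM aggregate used in this question can be pictured as a running total per artist. The sketch below is a plain-Python stand-in, not GraphLab's implementation; the toy `(artist, listen_count)` pairs and the name `total_listens` are assumptions for illustration.

```python
from collections import Counter

# Toy (artist, listen_count) pairs; the values are invented.
plays = [("Coldplay", 4), ("Train", 1), ("Coldplay", 2), ("Diplo", 1), ("Train", 2)]

def total_listens(rows):
    """Sum listen counts per artist, like a group-by with a SUM aggregate."""
    totals = Counter()
    for artist, listen_count in rows:
        totals[artist] += listen_count
    return totals

totals = total_listens(plays)
most_played = max(totals, key=totals.get)
least_played = min(totals, key=totals.get)
print(most_played, least_played)
```

Sorting the totals in descending order gives the most-played artists; sorting in ascending order gives the least-played ones.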


# Create a song recommender

In [13]:
# Roughly 80% of rows go to training, the rest to testing; the seed makes the split repeatable
train_data, test_data = song_data.random_split(0.8, seed=0)
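
To make the split concrete, here is a minimal sketch of a seeded random row split. It assumes each row is assigned independently with the given probability, which is one common way to implement such a split; GraphLab's internals may differ, and the function below is a hypothetical stand-in rather than the library's method.

```python
import random

def random_split(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction`,
    otherwise to the second. The same seed gives the same split."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(1000))  # stand-in for the rows of song_data
train_rows, test_rows = random_split(rows, 0.8, seed=0)
print(len(train_rows), len(test_rows))
```

Because the split is over rows (user and song pairs) rather than over users, the same user can appear in both the training and the test set.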

## Build a song recommender with personalization

We now create a model that allows us to make personalized recommendations to each user. 

In [14]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')
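
For intuition about what an item-similarity recommender does, the sketch below scores each unheard song by its average Jaccard similarity to the songs a user has already played. This is a simplified illustration under stated assumptions: the choice of Jaccard similarity, the averaging rule, and the names `jaccard` and `recommend` are ours, and GraphLab's actual defaults and scoring may differ.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def recommend(plays, user, k=1):
    """Return the top-k songs the user has not heard, ranked by average
    Jaccard similarity (over listener sets) to the songs they have heard."""
    listeners = defaultdict(set)  # song -> users who played it
    history = defaultdict(set)    # user -> songs they played
    for u, song in plays:
        listeners[song].add(u)
        history[u].add(song)
    seen = history[user]
    scores = {}
    for candidate in listeners:
        if candidate in seen:
            continue
        sims = [jaccard(listeners[candidate], listeners[s]) for s in seen]
        scores[candidate] = sum(sims) / len(sims) if sims else 0.0
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:k]

# Toy (user, song) pairs; the values are invented.
plays = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
         ("u2", "C"), ("u3", "C"), ("u3", "D"), ("u4", "A")]
print(recommend(plays, "u4", k=2))
```

In the toy data, song B shares most of its listeners with song A, so a user who has only played A is recommended B first.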

### Applying the personalized model to make song recommendations

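To see which songs the model favors, we take the first 10,000 unique users in the test set, ask for a single top recommendation (k=1) for each, and count how often each song is recommended.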

In [15]:
subset_test_users = test_data['user_id'].unique()[0:10000]

In [16]:
recommend = personalized_model.recommend(subset_test_users,k=1)

In [19]:
result = recommend.groupby(key_columns='song', operations={'count': graphlab.aggregate.COUNT()})

In [20]:
result.sort('count', False)

song,count
Secrets - OneRepublic,404
Undo - Björk,393
Revelry - Kings Of Leon,211
You're The One - Dwight Yoakam ...,183
Fireflies - Charttraxx Karaoke ...,119
Hey_ Soul Sister - Train,116
Horn Concerto No. 4 in E flat K495: II. Romance ...,93
Sehr kosmisch - Harmonia,77
OMG - Usher featuring will.i.am ...,65
Dog Days Are Over (Radio Edit) - Florence + The ...,56
