# Recommending songs
In this module, we focused on building recommender systems to find products, music and movies that interest users. We also built an exciting Jupyter notebook for recommending songs, in which we compared simple popularity-based recommendations with a personalized model and saw the significant improvement that personalization provides.

In this assignment, we are going to explore the song data and the recommendations made by our model. **In the process, you are going to learn how to use one of the most important data manipulation primitives: groupby.**

Follow the rest of the instructions on this page to complete your program. When you are done, instead of uploading your code, you will answer a series of quiz questions (see the quiz after this reading) to document your completion of this assignment. The instructions will indicate what data to collect for answering the quiz.

### Learning outcomes
- Execute song recommendation code with the Jupyter notebook
- Load and transform real song data
- Build a song recommender model
- Use the model to recommend songs to individual users
- Use groupby to compute aggregate statistics of the data

```python
import turicreate
```

```python
# Load the song data and display the first rows
songs = turicreate.SFrame('song_data.sframe/')
songs
```

| user_id | song_id | listen_count | title | artist | song |
|---|---|---|---|---|---|
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SOAKIMP12A8C130995 | 1 | The Cove | Jack Johnson | The Cove - Jack Johnson |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SOBBMDR12A8C13253B | 2 | Entre Dos Aguas | Paco De Lucia | Entre Dos Aguas - Paco De Lucia ... |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SOBXHDL12A81C204C0 | 1 | Stronger | Kanye West | Stronger - Kanye West |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SOBYHAJ12A6701BF1D | 1 | Constellations | Jack Johnson | Constellations - Jack Johnson ... |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SODACBL12A8C13C273 | 1 | Learn To Fly | Foo Fighters | Learn To Fly - Foo Fighters ... |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SODDNQT12A6D4F5F7E | 5 | Apuesta Por El Rock 'N' Roll ... | Héroes del Silencio | Apuesta Por El Rock 'N' Roll - Héroes del ... |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SODXRTY12AB0180F3B | 1 | Paper Gangsta | Lady GaGa | Paper Gangsta - Lady GaGa |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SOFGUAY12AB017B0A8 | 1 | Stacked Actors | Foo Fighters | Stacked Actors - Foo Fighters ... |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SOFRQTD12A81C233C0 | 1 | Sehr kosmisch | Harmonia | Sehr kosmisch - Harmonia |
| b80344d063b5ccb3212f76538 f3d9e43d87dca9e ... | SOHQWYZ12A6D4FA701 | 1 | Heaven's gonna burn your eyes ... | Thievery Corporation feat. Emiliana Torrini ... | Heaven's gonna burn your eyes - Thievery ... |


# 1. Counting unique users
The method `.unique()` selects the unique elements in a column of data. In this question, you will compute the number of unique users who have listened to songs by various artists. For example, to find the number of unique users who listened to songs by 'Kanye West', all you need to do is:
1. select the rows of the song data where the artist is 'Kanye West';
2. count the number of unique entries in the `user_id` column.

Repeat this computation for each of these artists: 'Kanye West', 'Foo Fighters', 'Taylor Swift' and 'Lady GaGa'.

Save these results to answer the quiz at the end.
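
The two steps above can be sketched in plain Python to make the logic concrete. This is an illustrative sketch only: the `rows` list is a hypothetical toy stand-in for the song data (not the real SFrame), and `count_unique_users` is a hypothetical helper, not part of turicreate.

```python
# Hypothetical toy rows of (user_id, artist); not the real song data.
rows = [
    ("u1", "Kanye West"),
    ("u2", "Kanye West"),
    ("u1", "Kanye West"),   # same user listening to another Kanye West song
    ("u3", "Foo Fighters"),
    ("u2", "Lady GaGa"),
]

def count_unique_users(rows, artist):
    """Step 1: keep rows for `artist`. Step 2: count the distinct user IDs."""
    users = {user for user, row_artist in rows if row_artist == artist}
    return len(users)

print(count_unique_users(rows, "Kanye West"))  # 2 (u1 is counted once)
```

The set comprehension does the de-duplication, which is the same job `.unique()` does on the `user_id` column.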

```python
# Step 1: keep only the rows whose artist is 'Kanye West'
songs_kanye = songs[songs['artist'] == 'Kanye West']
```

```python
# Step 2: the distinct user IDs among those rows
songs_kanye_users = songs_kanye['user_id'].unique()
```

```python
len(songs_kanye_users)
```

2522

```python
# Both steps in a single expression, for 'Foo Fighters'
len(songs['user_id'][songs['artist'] == 'Foo Fighters'].unique())
```

2055

```python
def get_num_users(artist):
    """Return a sentence reporting how many unique users listened to `artist`."""
    num_users = len(songs['user_id'][songs['artist'] == artist].unique())
    return f"{num_users} users are listening to {artist}"
```

```python
get_num_users('Taylor Swift')
```

'3246 users are listening to Taylor Swift'

```python
get_num_users('Lady GaGa')
```

'2928 users are listening to Lady GaGa'
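
Calling `get_num_users` once per artist scans the data once per artist. As a hedged alternative, the sketch below collects the distinct users of every artist in a single pass with a dictionary of sets. It is plain Python on hypothetical toy rows, and `unique_users_per_artist` is a hypothetical helper. (turicreate's `aggregate` module also lists a `COUNT_DISTINCT` aggregator that can express the same idea through `groupby`; check its API reference before relying on it.)

```python
from collections import defaultdict

# Hypothetical toy rows of (user_id, artist); not the real song data.
rows = [
    ("u1", "Kanye West"),
    ("u2", "Kanye West"),
    ("u1", "Kanye West"),
    ("u3", "Foo Fighters"),
    ("u2", "Lady GaGa"),
    ("u3", "Lady GaGa"),
]

def unique_users_per_artist(rows):
    """One pass over the rows: collect each artist's set of distinct users."""
    users_by_artist = defaultdict(set)
    for user, artist in rows:
        users_by_artist[artist].add(user)
    return {artist: len(users) for artist, users in users_by_artist.items()}

counts = unique_users_per_artist(rows)
for artist in ["Kanye West", "Foo Fighters", "Taylor Swift", "Lady GaGa"]:
    # Artists that never appear in the data get 0.
    print(artist, counts.get(artist, 0))
```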

# 2. Using groupby-aggregate to find the most popular and least popular artist
Each row of the song data contains the number of times a user listened to a particular song by a particular artist. To find out how many times any song by 'Kanye West' was listened to, we select all the rows where `artist == 'Kanye West'` and sum the `listen_count` column. To find the most popular artist, we would need to repeat this procedure for every artist, scanning the whole dataset once per artist, which would be very slow. Instead, you will learn about a very important method, `groupby()`, which computes the aggregate for every artist at once.
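
To see why one groupby is cheaper than repeating the filter-and-sum procedure, here is a plain-Python sketch on hypothetical toy rows. `total_listens_slow` mimics the per-artist approach (one scan per artist), while `total_listens_groupby` makes a single scan and keeps a running total per key, which is the essence of a groupby with a SUM aggregate. Both helper names are hypothetical, and this is not how turicreate implements `groupby` internally.

```python
from collections import defaultdict

# Hypothetical toy rows of (user_id, artist, listen_count).
rows = [
    ("u1", "Kanye West", 3),
    ("u2", "Kanye West", 1),
    ("u3", "Foo Fighters", 4),
    ("u1", "Foo Fighters", 2),
    ("u2", "Lady GaGa", 1),
]

def total_listens_slow(rows):
    # One full scan of the data per distinct artist.
    artists = {artist for _, artist, _ in rows}
    return {a: sum(c for _, artist, c in rows if artist == a) for a in artists}

def total_listens_groupby(rows):
    # A single scan: each row adds its listen_count to its artist's total.
    totals = defaultdict(int)
    for _, artist, count in rows:
        totals[artist] += count
    return dict(totals)

assert total_listens_slow(rows) == total_listens_groupby(rows)
print(total_listens_groupby(rows))
```

Both functions return the same totals; the difference is that the slow version's work grows with (number of artists) × (number of rows), while the groupby version touches each row once.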

```python
# Group rows by artist and sum listen_count within each group
songs_groupby = songs.groupby(key_column_names='artist',
                              operations={'total_listen_count': turicreate.aggregate.SUM('listen_count')})
songs_groupby.head()
```

| artist | total_listen_count |
|---|---|
| The Dells | 274 |
| 16Volt | 579 |
| The Stray Cats | 411 |
| Billy Preston / Syreeta | 189 |
| Emma Shapplin | 252 |
| Lil Jon & The East Side Boyz / Ludacris / Usher ... | 256 |
| Spoon | 1061 |
| Sam & Dave | 656 |
| Blue Swede | 266 |
| Scooter | 1202 |


```python
# Most popular artists first
songs_groupby.sort('total_listen_count', ascending=False)
```

| artist | total_listen_count |
|---|---|
| Kings Of Leon | 43218 |
| Dwight Yoakam | 40619 |
| Björk | 38889 |
| Coldplay | 35362 |
| Florence + The Machine | 33387 |
| Justin Bieber | 29715 |
| Alliance Ethnik | 26689 |
| OneRepublic | 25754 |
| Train | 25402 |
| The Black Keys | 22184 |


```python
# Least popular artists first
songs_groupby.sort('total_listen_count', ascending=True)
```

| artist | total_listen_count |
|---|---|
| William Tabbert | 14 |
| Reel Feelings | 24 |
| Beyoncé feat. Bun B and Slim Thug ... | 26 |
| Boggle Karaoke | 30 |
| Diplo | 30 |
| harvey summers | 31 |
| Nâdiya | 36 |
| Jody Bernal | 38 |
| Aneta Langerova | 38 |
| Kanye West / Talib Kweli / Q-Tip / Common / ... | 38 |
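
Once the totals are computed, finding the most and least popular artists is just a sort (or a max/min) on the aggregate column. A minimal plain-Python sketch, using a few of the totals shown in the outputs above as a hypothetical dictionary in place of the SFrame:

```python
# A few (artist -> total_listen_count) pairs taken from the outputs above.
totals = {
    "Kings Of Leon": 43218,
    "Dwight Yoakam": 40619,
    "Reel Feelings": 24,
    "William Tabbert": 14,
}

# Descending sort, like sort('total_listen_count', ascending=False)
by_count_desc = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
most_popular = by_count_desc[0]

# min() finds the least popular artist without sorting everything
least_popular = min(totals.items(), key=lambda kv: kv[1])

print(most_popular, least_popular)
```

Several artists share the same total (for example 30 and 38 in the ascending output), so the order among tied artists reflects the sort, not anything meaningful in the data.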


# 3. [OPTIONAL] Using groupby-aggregate to find the most recommended songs
Now that we have learned how to use `.groupby()` to compute aggregates for each value in a column, let's use it to find the song that is recommended most often by the personalized model from the song recommender Jupyter notebook. Follow these steps to find the most recommended song:
- Split the data into 80% training and 20% testing data, using `seed=0`, as was done in the song recommender Jupyter notebook.
- Train an `item_similarity_recommender` on the training data, as done in the Jupyter notebook.
- Next, we are going to make recommendations for the users in the test data, but the test set has over 200,000 entries (58,628 unique users). Computing recommendations for this many users can be slow on some computers.


Thus, in this question we will use only the first 10,000 unique users in the test set. The commands below split the data, train the model, and select this subset of users:

```python
# 80/20 train/test split; seed=0 makes the split reproducible
train_data, test_data = songs.random_split(0.8, seed=0)
```
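
As we understand it, `random_split(0.8, seed=0)` places approximately (not exactly) 80% of the rows in the first SFrame. The sketch below illustrates that idea under the assumption that each row is assigned independently with probability 0.8 by a seeded generator. It uses Python's `random.Random` and a hypothetical `random_split` helper, so it will not reproduce turicreate's actual split.

```python
import random

def random_split(rows, fraction, seed):
    """Hypothetical helper: put each row in the first part with probability
    `fraction`, using a seeded generator so the split is reproducible."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(1000))
train, test = random_split(rows, 0.8, seed=0)
print(len(train), len(test))  # roughly 800 and 200
```

Because rows (not users) are split, the same user can appear in both the training and the test data.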

```python
# Item similarity model: users are identified by 'user_id', items by 'song'
personalised_model = turicreate.item_similarity_recommender.create(train_data,
                                                                   user_id='user_id',
                                                                   item_id='song')
```
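
An item similarity recommender scores a song for a user by how similar it is to songs the user has already listened to, where two songs count as similar when they share many listeners. The sketch below is a simplified plain-Python version of that idea. It assumes Jaccard similarity between the sets of listeners (our understanding is that this is turicreate's default `similarity_type`, but check the API reference) and scores each unheard song by its average similarity to the user's songs. The listening histories and helper names are hypothetical, and turicreate's actual scoring may differ in details.

```python
# Hypothetical listening histories: user -> set of songs heard.
histories = {
    "u1": {"Clocks", "Secrets", "Undo"},
    "u2": {"Clocks", "Secrets"},
    "u3": {"Undo", "Revelry"},
    "u4": {"Secrets", "Revelry"},
}

def listeners_by_song(histories):
    """Invert the histories: song -> set of users who heard it."""
    listeners = {}
    for user, songs in histories.items():
        for song in songs:
            listeners.setdefault(song, set()).add(user)
    return listeners

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score_unseen(user, histories):
    """Score each song the user has not heard by its mean Jaccard
    similarity to the songs the user has heard."""
    listeners = listeners_by_song(histories)
    heard = histories[user]
    if not heard:
        return {}
    scores = {}
    for candidate in listeners:
        if candidate in heard:
            continue
        sims = [jaccard(listeners[candidate], listeners[s]) for s in heard]
        scores[candidate] = sum(sims) / len(heard)
    return scores

print(score_unseen("u2", histories))
```

For `u2`, "Undo" scores (1/3 + 1/4) / 2 ≈ 0.29 and "Revelry" scores (0 + 1/4) / 2 = 0.125, so "Undo" would be recommended first.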

```python
# Keep only the first 10,000 unique test users
subset_test_users = test_data['user_id'].unique()[0:10000]
```
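
Slicing `[0:10000]` takes the first 10,000 entries in whatever order `unique()` returns them. As far as we know, turicreate's documentation notes that `SArray.unique()` does not necessarily preserve the original order, so "first" here means first in that returned order rather than first to appear in `test_data`. If an order-preserving subset were wanted, a plain-Python sketch (with a hypothetical `first_n_unique` helper) could look like this:

```python
def first_n_unique(values, n):
    """Hypothetical helper: the first n distinct values, in order of first appearance."""
    # dict keys keep insertion order (Python 3.7+), so this de-duplicates stably.
    return list(dict.fromkeys(values))[:n]

user_ids = ["u3", "u1", "u3", "u2", "u1", "u4"]
print(first_n_unique(user_ids, 3))  # ['u3', 'u1', 'u2']
```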

- Let’s compute one recommended song for each of these test users. Use this command to compute these recommendations:

```python
# k=1: keep only the single top-ranked song for each user
personalised_model_recommend = personalised_model.recommend(subset_test_users, k=1)
personalised_model_recommend.head()
```

| user_id | song | score | rank |
|---|---|---|---|
| c067c22072a17d33310d7223d 7b79f819e48cf42 ... | Grind With Me (Explicit Version) - Pretty Ricky ... | 0.0459424376487731 | 1 |
| 696787172dd3f5169dc94deef 97e427cee86147d ... | Senza Una Donna (Without A Woman) - Zucchero / ... | 0.0170265776770455 | 1 |
| 532e98155cbfd1e1a474a28ed 96e59e50f7c5baf ... | Jive Talkin' (Album Version) - Bee Gees ... | 0.0118288653237479 | 1 |
| 18325842a941bc58449ee71d6 59a08d1c1bd2383 ... | Goodnight And Goodbye - Jonas Brothers ... | 0.0159257985651493 | 1 |
| 507433946f534f5d25ad1be30 2edb9a2376f503c ... | Find The Cost Of Freedom - Crosby_ Stills_ Nash & ... | 0.0165806589303193 | 1 |
| 18fafad477f9d72ff86f7d0bd 838a6573de0f64a ... | Rabbit Heart (Raise It Up) - Florence + The ... | 0.0799399726092815 | 1 |
| fe85b96ba1983219b296f6b48 69dd29eb2b72ff9 ... | Secrets - OneRepublic | 0.0788827141125996 | 1 |
| 225ea420b4bede50919d1bfe2 4a599691522d176 ... | Clocks - Coldplay | 0.0271030251796428 | 1 |
| 95dc7e2b188b1148b2d25f4e6 b6e94afacc4efc3 ... | Bust a Move - Infected Mushroom ... | 0.0534738540649414 | 1 |
| 4a3a1ae2748f12f7ab921a47d 6d79abf82e3e325 ... | Isis (Spam Remix) - Alaska Y Dinarama ... | 0.0418030211800023 | 1 |
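
Every row above has `rank` 1 because `k=1` keeps only each user's top-scoring song. The sketch below illustrates a top-k selection over precomputed scores using `heapq.nlargest`, producing rows shaped like the output above. The per-user scores are hypothetical, and `recommend_top_k` is a hypothetical helper, not turicreate's implementation.

```python
import heapq

# Hypothetical per-user candidate scores: user -> {song: score}.
candidate_scores = {
    "u1": {"Clocks": 0.027, "Secrets": 0.079, "Undo": 0.041},
    "u2": {"Revelry": 0.016, "Undo": 0.053},
}

def recommend_top_k(candidate_scores, k=1):
    """Return (user, song, score, rank) rows with each user's k best songs."""
    rows = []
    for user, scores in candidate_scores.items():
        best = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        for rank, (song, score) in enumerate(best, start=1):
            rows.append((user, song, score, rank))
    return rows

for row in recommend_top_k(candidate_scores, k=1):
    print(row)
```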


- Finally, we can use `.groupby()` to find the most recommended song! :) When we used `.groupby()` in the previous question, we summed up the total `listen_count` for each artist by using the SUM aggregator.

For this question, we simply want to count **how often each song is recommended**, so we will use the COUNT aggregator instead of SUM and store the results in a column we will call `total_count`:

```python
# COUNT() counts rows per song, i.e. how many users were recommended each song
personailised_model_recommend_groupby = personalised_model_recommend.groupby(key_column_names='song',
                                                                             operations={'total_count': turicreate.aggregate.COUNT()})
personailised_model_recommend_groupby.head()
```

| song | total_count |
|---|---|
| Arco Arena - Cake | 1 |
| Too Deep - Girl Talk | 2 |
| Guys Like Me - Eric Church ... | 2 |
| Freedom - Akon | 2 |
| Nomenclature - Andrew Bird ... | 1 |
| Wish You Were Here - Incubus ... | 1 |
| Change - Blind Melon | 1 |
| Get:On - Moguai | 1 |
| Pitter-Pat - Erin McCarley ... | 3 |
| Dog Days Are Over (Radio Edit) - Florence + The ... | 28 |
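
The difference between the two aggregators can be sketched in plain Python: SUM adds up a numeric column within each group, while COUNT just counts the rows in each group. Because every user receives exactly one recommendation here (`k=1`), counting rows per song gives the number of users the song was recommended to. The toy `recommendations` list below is hypothetical, `collections.Counter` stands in for `groupby` with `COUNT()`, and `most_common` plays the role of the descending sort.

```python
from collections import Counter

# Hypothetical (user_id, song) pairs: one top-ranked recommendation per user.
recommendations = [
    ("u1", "Undo - Björk"),
    ("u2", "Secrets - OneRepublic"),
    ("u3", "Undo - Björk"),
    ("u4", "Clocks - Coldplay"),
    ("u5", "Undo - Björk"),
]

# COUNT: number of rows per song (no numeric column is summed)
counts = Counter(song for _, song in recommendations)

# most_common(1) returns the song with the highest count
most_recommended = counts.most_common(1)[0]
print(most_recommended)  # ('Undo - Björk', 3)
```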


- By sorting the results, you will find the song recommended most often to the first 10,000 users in the test data! **Due to randomness in the train-test split, the most recommended song may differ from person to person. This is why we chose not to assign a quiz question for this section.**

```python
# Most recommended songs first
personailised_model_recommend_groupby.sort('total_count', ascending=False)
```

| song | total_count |
|---|---|
| Undo - Björk | 426 |
| Secrets - OneRepublic | 381 |
| Revelry - Kings Of Leon | 227 |
| You're The One - Dwight Yoakam ... | 161 |
| Fireflies - Charttraxx Karaoke ... | 110 |
| Sehr kosmisch - Harmonia | 105 |
| Hey_ Soul Sister - Train | 96 |
| Horn Concerto No. 4 in E flat K495: II. Romance ... | 91 |
| OMG - Usher featuring will.i.am ... | 60 |
| Bigger - Justin Bieber | 43 |
