# Recommending songs Assignment

In [1]:
import turicreate

## Load some music data

In [3]:
song_data = turicreate.SFrame('../../data/song_data.sframe/')

In [4]:
song_data

user_id,song_id,listen_count,title,artist,song
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOAKIMP12A8C130995,1,The Cove,Jack Johnson,The Cove - Jack Johnson
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Paco De Lucia,Entre Dos Aguas - Paco De Lucia ...
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBXHDL12A81C204C0,1,Stronger,Kanye West,Stronger - Kanye West
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBYHAJ12A6701BF1D,1,Constellations,Jack Johnson,Constellations - Jack Johnson ...
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SODACBL12A8C13C273,1,Learn To Fly,Foo Fighters,Learn To Fly - Foo Fighters ...
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SODDNQT12A6D4F5F7E,5,Apuesta Por El Rock 'N' Roll ...,Héroes del Silencio,Apuesta Por El Rock 'N' Roll - Héroes del ...
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SODXRTY12AB0180F3B,1,Paper Gangsta,Lady GaGa,Paper Gangsta - Lady GaGa
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOFGUAY12AB017B0A8,1,Stacked Actors,Foo Fighters,Stacked Actors - Foo Fighters ...
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOFRQTD12A81C233C0,1,Sehr kosmisch,Harmonia,Sehr kosmisch - Harmonia
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOHQWYZ12A6D4FA701,1,Heaven's gonna burn your eyes ...,Thievery Corporation feat. Emiliana Torrini ...,Heaven's gonna burn your eyes - Thievery ...


## Count the unique users who listened to specific artists

In [7]:
artists = ['Kanye West', 'Foo Fighters', 'Taylor Swift', 'Lady GaGa']

In [14]:
for artist in artists:
    song_data_for_artist = song_data[song_data['artist'] == artist]
    users_for_artist = song_data_for_artist['user_id'].unique()
    print(artist + ': ' + str(len(users_for_artist)))

Kanye West: 2522
Foo Fighters: 2055
Taylor Swift: 3246
Lady GaGa: 2928
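
The filter-then-`unique()` pattern above can also be pictured in plain Python. The sketch below is only an illustration on a few made-up rows; the `rows` list and the helper `count_unique_users` are hypothetical and not part of Turi Create. It collects the user IDs seen for an artist into a set, so a user who played several songs by the same artist is counted once.

```python
def count_unique_users(rows, artist):
    """Count distinct users with at least one listen record for `artist`."""
    return len({user for user, row_artist in rows if row_artist == artist})


# Made-up (user_id, artist) pairs, for illustration only.
rows = [
    ("u1", "Kanye West"),
    ("u1", "Kanye West"),    # same user, second song: still one unique user
    ("u2", "Kanye West"),
    ("u2", "Foo Fighters"),
    ("u3", "Taylor Swift"),
]

for artist in ["Kanye West", "Foo Fighters", "Taylor Swift", "Lady GaGa"]:
    print(artist + ": " + str(count_unique_users(rows, artist)))
```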


## Using groupby-aggregate to find the most popular and least popular artist

In [17]:
# Sum listen_count per artist to get each artist's total number of plays.
grouped = song_data.groupby(
    key_column_names='artist',
    operations={'total_count': turicreate.aggregate.SUM('listen_count')})

### Most popular artist

In [27]:
grouped.sort('total_count',ascending=False)

artist,total_count
Kings Of Leon,43218
Dwight Yoakam,40619
Björk,38889
Coldplay,35362
Florence + The Machine,33387
Justin Bieber,29715
Alliance Ethnik,26689
OneRepublic,25754
Train,25402
The Black Keys,22184


### Least popular artist

In [28]:
grouped.sort('total_count')

artist,total_count
William Tabbert,14
Reel Feelings,24
Beyoncé feat. Bun B and Slim Thug ...,26
Diplo,30
Boggle Karaoke,30
harvey summers,31
Nâdiya,36
Kanye West / Talib Kweli / Q-Tip / Common / ...,38
Jody Bernal,38
Aneta Langerova,38
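
As a rough picture of what the groupby-aggregate step computes, the plain-Python sketch below sums listen counts per artist and then picks the largest and smallest totals. The data and the helper `total_listens_by_artist` are made up for illustration; this is not how Turi Create implements `groupby`.

```python
from collections import defaultdict


def total_listens_by_artist(rows):
    """Sum listen counts per artist from (artist, listen_count) pairs."""
    totals = defaultdict(int)
    for artist, listen_count in rows:
        totals[artist] += listen_count
    return dict(totals)


# Made-up (artist, listen_count) rows, for illustration only.
rows = [("A", 3), ("B", 1), ("A", 2), ("C", 7), ("B", 1)]

totals = total_listens_by_artist(rows)
most_popular = max(totals, key=totals.get)
least_popular = min(totals, key=totals.get)

print("most popular:", most_popular, totals[most_popular])
print("least popular:", least_popular, totals[least_popular])
```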


## Using groupby-aggregate to find the most recommended songs

### We first need to create a personalized recommendation model

#### Split the data

In [30]:
train_data, test_data = song_data.random_split(0.8, seed=0)
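
A seeded random split can be sketched in plain Python as follows. This assumes each row is assigned independently to the first part with probability `fraction`, so the 80/20 proportion is approximate rather than exact. The helper `random_split` here is a hypothetical stand-in and not the Turi Create method.

```python
import random


def random_split(rows, fraction, seed):
    """Assign each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))  # stand-in for the rows of the data set
train, test = random_split(rows, 0.8, seed=0)
print(len(train), len(test))
```

Fixing the seed matters for an assignment like this one: rerunning the notebook yields the same train and test sets, so results stay comparable between runs.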

#### Build a recommender model with personalization

In [31]:
item_similarity_recommender = turicreate.item_similarity_recommender.create(
    train_data,
    user_id='user_id',
    item_id='song')
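
An item-similarity recommender scores candidate songs by how similar they are to songs a user has already played. The sketch below shows one common similarity measure, Jaccard similarity over the sets of users who listened to each song. Whether the model above uses exactly this measure is an assumption, and the `listeners` data and `jaccard` helper are made up for illustration.

```python
def jaccard(users_a, users_b):
    """Jaccard similarity: shared listeners divided by all listeners."""
    union = users_a | users_b
    if not union:
        return 0.0
    return len(users_a & users_b) / len(union)


# Made-up mapping from song to the set of users who listened to it.
listeners = {
    "song_x": {"u1", "u2", "u3"},
    "song_y": {"u2", "u3", "u4"},
    "song_z": {"u5"},
}

# song_x and song_y share 2 of 4 distinct listeners.
print(jaccard(listeners["song_x"], listeners["song_y"]))
# song_x and song_z share no listeners.
print(jaccard(listeners["song_x"], listeners["song_z"]))
```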

### Get a subset of users (since there are too many)

In [32]:
subset_test_users = test_data['user_id'].unique()[0:10000]

### Compute one recommended song for each of these test users

In [34]:
item_similarity_recommender.recommend(subset_test_users,k=1)

user_id,song,score,rank
c067c22072a17d33310d7223d 7b79f819e48cf42 ...,Grind With Me (Explicit Version) - Pretty Ricky ...,0.0459424376487731,1
696787172dd3f5169dc94deef 97e427cee86147d ...,Senza Una Donna (Without A Woman) - Zucchero / ...,0.0170265776770455,1
532e98155cbfd1e1a474a28ed 96e59e50f7c5baf ...,Jive Talkin' (Album Version) - Bee Gees ...,0.0118288653237479,1
18325842a941bc58449ee71d6 59a08d1c1bd2383 ...,Goodnight And Goodbye - Jonas Brothers ...,0.0159257985651493,1
507433946f534f5d25ad1be30 2edb9a2376f503c ...,Find The Cost Of Freedom - Crosby_ Stills_ Nash & ...,0.0165806589303193,1
18fafad477f9d72ff86f7d0bd 838a6573de0f64a ...,Rabbit Heart (Raise It Up) - Florence + The ...,0.0799399726092815,1
fe85b96ba1983219b296f6b48 69dd29eb2b72ff9 ...,Secrets - OneRepublic,0.0788827141125996,1
225ea420b4bede50919d1bfe2 4a599691522d176 ...,Clocks - Coldplay,0.0271030251796428,1
95dc7e2b188b1148b2d25f4e6 b6e94afacc4efc3 ...,Bust a Move - Infected Mushroom ...,0.0534738540649414,1
4a3a1ae2748f12f7ab921a47d 6d79abf82e3e325 ...,Isis (Spam Remix) - Alaska Y Dinarama ...,0.0418030211800023,1


### Finally, we can use .groupby() to find the most recommended song!

In [40]:
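# Note: this groups the full song_data, so 'count' is the number of rows
# (user-song records) per song in the data set, not the number of times
# the model recommended the song.
grouped = song_data.groupby(key_column_names='song', operations={'count': turicreate.aggregate.COUNT()})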

In [42]:
grouped.sort('count',ascending=False)

song,count
Sehr kosmisch - Harmonia,5970
Undo - Björk,5281
You're The One - Dwight Yoakam ...,4806
Dog Days Are Over (Radio Edit) - Florence + The ...,4536
Revelry - Kings Of Leon,4339
Horn Concerto No. 4 in E flat K495: II. Romance ...,3949
Secrets - OneRepublic,3916
Tive Sim - Cartola,3185
Fireflies - Charttraxx Karaoke ...,3171
Hey_ Soul Sister - Train,3132
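
To rank songs by how often they are recommended, the grouping would need to run over the output of `recommend(...)` rather than over `song_data`. The plain-Python sketch below illustrates that counting step on made-up data. The `recommendations` list is hypothetical and stands in for the (user, song) pairs a recommender would return with `k=1`.

```python
from collections import Counter

# Made-up (user_id, recommended_song) pairs, one recommendation per user.
recommendations = [
    ("u1", "Song A"),
    ("u2", "Song B"),
    ("u3", "Song A"),
    ("u4", "Song C"),
    ("u5", "Song A"),
]

# Count how many users were recommended each song.
counts = Counter(song for _, song in recommendations)

# most_common() lists songs from most to least recommended.
for song, count in counts.most_common():
    print(song, count)
```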
