# Building a Song Recommender

## Fire Up GraphLab Create
(See [Getting Started with SFrames](../../week-1/work/Getting-Started-With-SFrames.ipynb) for setup instructions)

In [11]:
# import graphlab

# Use Pandas
import pandas as pd
# Use NumPy
import numpy as np
# Use Seaborn
import seaborn as sns

In [1]:
# Limit number of worker processes. This preserves system memory, which prevents hosted notebooks from crashing.
# graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)

# Load Music Data

In [2]:
# song_data = graphlab.SFrame('song_data.gl/')
song_data = pd.read_csv('song_data.csv')

# Explore Data

Each row of the music data records how many times a user listened to a song, along with the song's title and artist.

In [3]:
song_data.head()

| Unnamed: 0 | user_id | song_id | listen_count | title | artist | song |
|---|---|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1 | The Cove | Jack Johnson | The Cove - Jack Johnson |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2 | Entre Dos Aguas | Paco De Lucia | Entre Dos Aguas - Paco De Lucia |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1 | Stronger | Kanye West | Stronger - Kanye West |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1 | Constellations | Jack Johnson | Constellations - Jack Johnson |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1 | Learn To Fly | Foo Fighters | Learn To Fly - Foo Fighters |


## Showing The Most Popular Songs in the Data Set

In [None]:
# graphlab.canvas.set_target('ipynb')

In [7]:
# song_data['song'].show()

# Use seaborn to plot a histogram
# sns.histplot(song_data['song'])
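Since the GraphLab canvas call is commented out, one way to see the most popular songs is to aggregate with pandas. The sketch below is an assumed replacement, not the original notebook's method. It uses a small made-up DataFrame (`toy_data`, hypothetical) with the same `song` and `listen_count` columns as `song_data`.

```python
import pandas as pd

# Hypothetical stand-in for song_data, using the same column names.
toy_data = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'song': ['A', 'B', 'A', 'A', 'B', 'C'],
    'listen_count': [1, 2, 5, 1, 1, 3],
})

# Number of rows (user-song pairs) per song, most frequent first.
rows_per_song = toy_data['song'].value_counts()

# Total plays per song, which weights each row by its listen_count.
plays_per_song = (toy_data.groupby('song')['listen_count']
                  .sum()
                  .sort_values(ascending=False))

print(rows_per_song.head(10))
print(plays_per_song.head(10))
```

Replacing `toy_data` with `song_data` should give the same two rankings for the full data set.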

In [8]:
len(song_data)

1116609

## Count Number of Unique Users in the Data Set

In [9]:
users = song_data['user_id'].unique()

In [10]:
len(users)

66346

# Create a Song Recommender

In [13]:
# train_data,test_data = song_data.random_split(.8,seed=0)

# Split the data with scikit-learn: 80% in `train_data`, 20% in `test_data`.
from sklearn.model_selection import train_test_split
# random_state=0 mirrors seed=0 in the original GraphLab call and makes the split reproducible.
train_data, test_data = train_test_split(song_data, test_size=0.2, random_state=0)

## Simple Popularity-Based Recommender

In [None]:
# popularity_model = graphlab.popularity_recommender.create(train_data,
#                                                          user_id='user_id',
#                                                          item_id='song')
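The GraphLab call above is commented out, so `popularity_model` is never defined in this pandas version of the notebook. A minimal sketch of a pandas stand-in follows. The class name `PopularityRecommender`, its `k` parameter, and the choice to score a song by its number of distinct listeners are all assumptions made for illustration, not GraphLab's implementation.

```python
import pandas as pd


class PopularityRecommender:
    """Hypothetical pandas stand-in for a popularity recommender."""

    def __init__(self, data, user_id='user_id', item_id='song'):
        self.user_id = user_id
        self.item_id = item_id
        # Score each song by the number of distinct users who played it.
        self.scores = (data.groupby(item_id)[user_id]
                       .nunique()
                       .sort_values(ascending=False, kind='mergesort'))

    def recommend(self, users, k=10):
        # The top-k list is computed once and repeated for every user.
        top = self.scores.head(k)
        frames = []
        for user in users:
            frames.append(pd.DataFrame({
                self.user_id: user,
                self.item_id: top.index,
                'score': top.values,
                'rank': range(1, len(top) + 1),
            }))
        return pd.concat(frames, ignore_index=True)


# Made-up training data with the same column names as song_data.
toy_train = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'song': ['A', 'B', 'A', 'A', 'B', 'C'],
})

toy_popularity_model = PopularityRecommender(toy_train)
recs_u1 = toy_popularity_model.recommend(users=['u1'], k=2)
recs_u2 = toy_popularity_model.recommend(users=['u2'], k=2)
print(recs_u1)
print(recs_u2)
```

Fitting the same class on `train_data` would give an object whose `recommend(users=[...])` call has the same shape as the cells that follow.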

### Use the Popularity Model to Make Some Predictions

A popularity model makes the same prediction for every user, so it provides no personalization.

In [None]:
popularity_model.recommend(users=[users[0]])

In [None]:
popularity_model.recommend(users=[users[1]])

## Build a Song Recommender with Personalization

We now create a model that allows us to make personalized recommendations to each user. 

In [None]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')
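The cell above depends on GraphLab, which is not imported in this pandas version of the notebook. As a rough illustration of the idea behind an item-similarity recommender, here is a sketch that uses Jaccard similarity between the sets of users who played each song. The class `ItemSimilarityRecommender`, its method signatures, and the mean-similarity scoring rule are assumptions for illustration, not GraphLab's API or algorithm.

```python
import numpy as np
import pandas as pd


class ItemSimilarityRecommender:
    """Hypothetical item-item recommender based on Jaccard similarity."""

    def __init__(self, data, user_id='user_id', item_id='song'):
        self.user_id = user_id
        self.item_id = item_id
        # Binary user x song matrix: 1 if the user played the song at least once.
        self.matrix = pd.crosstab(data[user_id], data[item_id]).clip(upper=1)
        m = self.matrix.to_numpy(dtype=float)
        # intersection[i, j] = number of users who played both song i and song j.
        intersection = m.T @ m
        counts = np.diag(intersection)
        union = counts[:, None] + counts[None, :] - intersection
        sim = np.divide(intersection, union,
                        out=np.zeros_like(intersection), where=union > 0)
        np.fill_diagonal(sim, 0.0)  # a song is not its own neighbor
        songs = self.matrix.columns
        self.similarity = pd.DataFrame(sim, index=songs, columns=songs)

    def get_similar_items(self, items, k=10):
        rows = []
        for item in items:
            nearest = self.similarity[item].sort_values(ascending=False).head(k)
            for rank, (other, score) in enumerate(nearest.items(), start=1):
                rows.append({self.item_id: item, 'similar': other,
                             'score': score, 'rank': rank})
        return pd.DataFrame(rows)

    def recommend(self, users, k=10):
        rows = []
        for user in users:
            played = self.matrix.loc[user]
            seen = played[played > 0].index
            # Score each unseen song by its mean similarity to the songs the user played.
            scores = self.similarity.loc[seen].mean(axis=0).drop(seen)
            top = scores.sort_values(ascending=False).head(k)
            for rank, (song, score) in enumerate(top.items(), start=1):
                rows.append({self.user_id: user, self.item_id: song,
                             'score': score, 'rank': rank})
        return pd.DataFrame(rows)


# Made-up listening data with the same column names as song_data.
toy_train = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u2', 'u2', 'u3', 'u3', 'u4'],
    'song': ['A', 'B', 'A', 'B', 'C', 'C', 'D', 'A'],
})

toy_model = ItemSimilarityRecommender(toy_train)
toy_recs = toy_model.recommend(users=['u1', 'u4'], k=1)
toy_similar = toy_model.get_similar_items(['A'], k=1)
print(toy_recs)
print(toy_similar)
```

This version builds a dense user-by-song matrix, which is fine for a toy example but would likely need a sparse representation (for example `scipy.sparse`) at the size of the full data set.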

### Applying the personalized model to make song recommendations

As you can see, different users get different recommendations now.

In [None]:
personalized_model.recommend(users=[users[0]])

In [None]:
personalized_model.recommend(users=[users[1]])

### We can also apply the model to find similar songs to any song in the dataset

In [None]:
personalized_model.get_similar_items(['With Or Without You - U2'])

In [None]:
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])

# Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves. 

In [None]:
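# Requires GraphLab Create: the `import graphlab` line at the top must be enabled first.
# Compare (major, minor) as integers; a plain string comparison misorders versions such as "1.10".
if tuple(int(part) for part in graphlab.version.split('.')[:2]) >= (1, 6):
    model_performance = graphlab.compare(test_data, [popularity_model, personalized_model], user_sample=0.05)
    graphlab.show_comparison(model_performance, [popularity_model, personalized_model])
else:
    %matplotlib inline
    model_performance = graphlab.recommender.util.compare_models(test_data, [popularity_model, personalized_model], user_sample=0.05)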

The precision-recall curves show that the personalized model performs much better than the popularity model.
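A precision-recall curve is traced out by computing precision and recall at several cutoffs k. The sketch below shows that calculation for a single user. The function `precision_recall_at_k` and the example data are hypothetical and not part of GraphLab or the original notebook.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    recommended: ranked list of song names, best first.
    relevant: set of songs the user actually played in the test data.
    """
    top_k = recommended[:k]
    hits = sum(1 for song in top_k if song in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up ranking and held-out songs for a single user.
ranked = ['A', 'B', 'C', 'D', 'E']
held_out = {'B', 'E', 'F'}

# One (precision, recall) point per cutoff k traces out the curve.
curve = [precision_recall_at_k(ranked, held_out, k) for k in range(1, 6)]
for k, (p, r) in enumerate(curve, start=1):
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Averaging these points over a sample of test users, once per model, gives curves that can be compared directly.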

# Reference