Building a song recommender 

Dataset used: Million Songs Dataset
Source: http://labrosa.ee.columbia.edu/millionsong/
Paper: http://ismir2011.ismir.net/papers/OS6-1.pdf

The current notebook uses a subset of the above data containing 10,000 songs obtained from:
https://github.com/turi-code/tutorials/blob/master/notebooks/recsys_rank_10K_song.ipynb

In [None]:
%matplotlib inline

import pandas
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
import numpy as np
import time
import joblib  # sklearn.externals.joblib was removed in newer scikit-learn releases
import Recommenders as Recommenders
import Evaluation as Evaluation




Load music data

In [None]:
#Read userid-songid-listen_count triplets
#This step might take time to download data from external sources
triplets_file = 'https://static.turi.com/datasets/millionsong/10000.txt'
songs_metadata_file = 'https://static.turi.com/datasets/millionsong/song_data.csv'

song_df_1 = pandas.read_table(triplets_file,header=None)
song_df_1.columns=['user_id','song_id','listen_count']

#Read song metadata
song_df_2 = pandas.read_csv(songs_metadata_file)

#Merge the two dataframes above to create the input dataframe for the recommender systems
song_df = pandas.merge(song_df_1,song_df_2.drop_duplicates(['song_id']))
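The merge above relies on two pandas behaviours: with no `on` argument the join uses the columns the two frames share (here `song_id`), and dropping duplicate `song_id` rows from the metadata first keeps the merge from multiplying triplet rows. A minimal sketch on toy data, where every value and the names `triplets`, `metadata`, `unique_metadata`, and `merged` are made up for illustration:

```python
import pandas as pd

# Toy stand-ins for the triplets and metadata tables (all values are invented).
triplets = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["s1", "s2", "s1"],
    "listen_count": [3, 1, 5],
})
metadata = pd.DataFrame({
    "song_id": ["s1", "s1", "s2"],
    "title": ["Song A", "Song A (dup)", "Song B"],
    "artist_name": ["Artist X", "Artist X", "Artist Y"],
})

# Keep the first metadata row per song_id so each triplet matches at most one row.
unique_metadata = metadata.drop_duplicates(["song_id"])

# Spelling out on= and how= makes the default behaviour of pandas.merge explicit.
merged = pd.merge(triplets, unique_metadata, on="song_id", how="inner")
print(merged)
```

Without the `drop_duplicates` call, the two `s1` metadata rows would match each `s1` triplet twice and inflate the listen records.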


Explore data

Music data shows how many times a user listened to a song, as well as the details of the song.

In [None]:
song_df.head()

Length of the dataset

In [None]:
len(song_df)

Create a subset of the dataset

In [None]:
song_df = song_df.head(10000).copy()  # copy so the column assignment below does not trigger SettingWithCopyWarning

#Merge song title and artist_name columns to make a merged column
song_df['song'] = song_df['title'].map(str) + " - " + song_df['artist_name']

Show the most popular songs in the dataset

In [24]:
song_grouped = song_df.groupby(['song']).agg({'listen_count':'count'}).reset_index()
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage']=song_grouped['listen_count'].div(grouped_sum)*100
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])



Unnamed: 0,song,listen_count,percentage
5,Sehr kosmisch - Harmonia,6354,63.54
7,Stronger - Kanye West,1082,10.82
3,Learn To Fly - Foo Fighters,917,9.17
1,Constellations - Jack Johnson,557,5.57
4,Paper Gangsta - Lady GaGa,527,5.27
8,The Cove - Jack Johnson,194,1.94
2,Entre Dos Aguas - Paco De Lucia,157,1.57
6,Stacked Actors - Foo Fighters,136,1.36
0,Apuesta Por El Rock 'N' Roll - Héroes del Sile...,76,0.76
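Note that the `'count'` aggregation used above counts rows (user-song pairs), so the `listen_count` column in the output is the number of listeners per song, not the total number of plays. A small sketch of the difference on invented data (the frame `toy` and both result names are hypothetical):

```python
import pandas as pd

# Song A has two listeners, one of them very heavy; song B has three light listeners.
toy = pd.DataFrame({
    "song": ["A", "A", "B", "B", "B"],
    "listen_count": [10, 1, 1, 1, 1],
})

# 'count' counts rows per song, i.e. how many user-song pairs exist.
by_rows = toy.groupby("song")["listen_count"].count()

# 'sum' adds up the plays per song.
by_plays = toy.groupby("song")["listen_count"].sum()

print(by_rows)
print(by_plays)
```

Under the row count, B ranks above A; under the play total, A ranks above B. Which one is the better popularity signal depends on whether a few heavy listeners should dominate the ranking.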


Count number of unique users in the dataset

In [None]:
users = song_df['user_id'].unique()


In [None]:
len(users)

Quiz 1. Count the number of unique songs in the dataset

In [None]:
#Fill in the code here
songs = song_df['song'].unique()
len(songs)

Create a song recommender

In [None]:
train_data,test_data = train_test_split(song_df, test_size=0.20, random_state=0)
print(train_data.head(5))
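The split above holds out 20% of the rows for testing and fixes the random seed so the result is reproducible. As a rough illustration of the idea only (this is a hypothetical helper, not scikit-learn's implementation, and it will not reproduce the same rows as `train_test_split`):

```python
import random

def holdout_split(rows, test_size=0.2, seed=0):
    """Shuffle rows with a fixed seed and hold out a fraction for testing."""
    rng = random.Random(seed)          # a fixed seed makes the shuffle repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = holdout_split(range(10), test_size=0.2, seed=0)
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two parts, which is what lets the test part act as unseen data during evaluation.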

Simple popularity-based recommender class (Can be used as a black box)

In [None]:
#Recommenders.popularity_recommender_py

Create an instance of the popularity-based recommender class

In [None]:
pm = Recommenders.popularity_recommender_py()
pm.create(train_data,'user_id','song')
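The `Recommenders` module is used as a black box in this notebook, so its internals are not shown. As a rough idea of what a popularity-based model can do, the sketch below ranks songs by the number of distinct listeners in the training data. The function `top_k_popular`, its parameters, and the toy data are assumptions for illustration and may differ from what `popularity_recommender_py` does.

```python
import pandas as pd

def top_k_popular(train, user_col, item_col, k=10):
    """Hypothetical helper: rank items by number of distinct users."""
    scores = (
        train.groupby(item_col)[user_col]
        .nunique()                      # distinct listeners per item
        .reset_index(name="score")
    )
    # Highest score first; break ties alphabetically so the order is deterministic.
    scores = scores.sort_values(["score", item_col], ascending=[False, True])
    top = scores.head(k).reset_index(drop=True)
    top["rank"] = range(1, len(top) + 1)
    return top

# Invented training data: A has 3 listeners, B has 2, C has 1.
toy_train = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "song": ["A", "A", "A", "B", "B", "C"],
})
top2 = top_k_popular(toy_train, "user_id", "song", k=2)
print(top2)
```

A model of this kind returns the same list for every user, which is why it serves as a baseline for the personalized model.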

Use the popularity model to make some predictions

In [None]:
user_id=users[5]
pm.recommend(user_id)

Quiz 2: Use the popularity-based model to make predictions for the following user id. (Note the difference in recommendations from the first user id.)

In [None]:
#Fill in the code here
user_id = users[8]
pm.recommend(user_id)


Build a song recommender with personalization

We now create an item similarity based collaborative filtering model that allows us to make personalized recommendations to each user.
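To make the idea concrete, here is a minimal sketch of one common form of item-similarity collaborative filtering: songs are compared by the Jaccard overlap of their listener sets, and each unheard song is scored by its average similarity to the songs the user already has. The choice of Jaccard similarity, the function names, and the toy data are assumptions for illustration; the `Recommenders` class used below is a black box and may work differently.

```python
def jaccard(users_a, users_b):
    """Jaccard similarity between two sets of listeners."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def recommend_by_item_similarity(listeners, user, k=3):
    """Hypothetical helper: listeners maps each song to the set of users who played it."""
    user_songs = {song for song, users in listeners.items() if user in users}
    scores = {}
    for candidate, candidate_users in listeners.items():
        if candidate in user_songs:
            continue  # do not recommend songs the user already has
        sims = [jaccard(candidate_users, listeners[s]) for s in user_songs]
        scores[candidate] = sum(sims) / len(sims) if sims else 0.0
    # Highest average similarity first; ties broken alphabetically.
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]

# Invented listening data.
toy_listeners = {
    "A": {"u1", "u2"},
    "B": {"u1", "u2", "u3"},
    "C": {"u3"},
    "D": {"u2", "u4"},
}
recs_u1 = recommend_by_item_similarity(toy_listeners, "u1")
print(recs_u1)
```

Unlike the popularity model, the ranking here depends on the user's own history, so two users with different songs get different lists.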

Class for an item similarity based personalized recommender system (Can be used as a black box)

In [None]:
#Recommenders.item_similarity_recommender_py

Create an instance of the item similarity based recommender class

In [None]:
is_model= Recommenders.item_similarity_recommender_py()
is_model.create(train_data,'user_id','song')

Use the personalized model to make some song recommendations

In [None]:
#Print the songs for the user in training data
user_id = users[5]
user_items = is_model.get_user_items(user_id)

print("----------------------------------------------------------")
print("Training data songs for the userid: %s:" % user_id)
print("----------------------------------------------------------")

for user_item in user_items:
    print(user_item)
    
print("----------------------------------------------------------")
print("Recommendation process going on:")
print("----------------------------------------------------------")

#Recommend songs for the user using personalized model
is_model.recommend(user_id)


Quiz 3. Use the personalized model to make recommendations for the following user id. (Note the difference in recommendations from the first user id.)

In [None]:
user_id = users[7]
#Fill in the code here
user_items = is_model.get_user_items(user_id)


print("----------------------------------------------------------------------")
print("Training data songs for the user userid: %s:" % user_id)
print("----------------------------------------------------------------------")

for user_item in user_items:
    print(user_item)

print("----------------------------------------------------------------------")
print("Recommendation process going on:")
print("----------------------------------------------------------------------")

#Recommend songs for the user using personalized model
is_model.recommend(user_id)


We can also apply the model to find similar songs to any song in the dataset

In [None]:
is_model.get_similar_items(['U Smile - Justin Bieber'])

Quiz 4. Use the personalized recommender model to get similar songs for the following song.

In [None]:
song = 'YELLOW - Coldplay'
#Fill in the code here
is_model.get_similar_items([song])

Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves.
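For reference, precision at a cutoff k is the fraction of the top-k recommendations that the user actually listened to in the test data, and recall is the fraction of the user's test songs that appear in the top k. A minimal sketch of both measures for a single user follows; the function name and the toy lists are hypothetical, and the `Evaluation` class used below may average over users and cutoffs in its own way.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: a ranked recommendation list and the user's held-out test songs.
toy_recommended = ["A", "B", "C", "D"]
toy_relevant = {"B", "D", "E"}

for k in (2, 4):
    p, r = precision_recall_at_k(toy_recommended, toy_relevant, k)
    print(k, p, r)
```

Sweeping k from small to large and plotting the resulting pairs gives one precision-recall curve per model.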

Class to calculate precision and recall (This can be used as a black box)

In [None]:
#Evaluation.precision_recall_calculator

Use the above precision recall calculator class to calculate the evaluation measures

In [None]:
start = time.time()

#Define what percentage of users to use for precision recall calculation
user_sample = 0.05

#Instantiate the precision_recall_calculator class
pr=Evaluation.precision_recall_calculator(test_data, train_data,pm,is_model)

#Call method to calculate precision and recall values
(pm_avg_precision_list, pm_avg_recall_list, ism_avg_precision_list, ism_avg_recall_list) = pr.calculate_measures(user_sample)

end = time.time()
print(end - start)
