# Music Recommender System
How would you design a recommendation system? <br>
This notebook uses the Last.fm 360K dataset as an example:
http://ocelma.net/MusicRecommendationDataset/lastfm-360K.html

- Please implement a song recommender system prototype for all female users (in the
dataset).
- You may just pick a subset of the given dataset in order to speed up the development.
- Please document the system design and environment setup.
- Moreover, highlight the possible concerns if we put the prototype into production.


In [1]:
# import libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix
from fuzzywuzzy import fuzz
#import os
#from bs4 import BeautifulSoup

# pretty display for notebook
%matplotlib inline



## Import the data

In [2]:
df_usage_info = pd.read_csv("usersha1_artmbid_artname_plays.tsv", sep = '\t', \
                            names = ['user_id', 'musicbrainz_id', 'artist_name', 'plays'])
df_usage_info.info()
df_usage_info.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17535655 entries, 0 to 17535654
Data columns (total 4 columns):
user_id           object
musicbrainz_id    object
artist_name       object
plays             int64
dtypes: int64(1), object(3)
memory usage: 535.1+ MB


Unnamed: 0,user_id,musicbrainz_id,artist_name,plays
0,00000c289a1829a808ac09c00daf10bc3c4e223b,3bd73256-3905-4f3a-97e2-8b341527f805,betty blowtorch,2137
1,00000c289a1829a808ac09c00daf10bc3c4e223b,f2fb0ff0-5679-42ec-a55c-15109ce6e320,die Ärzte,1099
2,00000c289a1829a808ac09c00daf10bc3c4e223b,b3ae82c2-e60b-4551-a76d-6620f1b456aa,melissa etheridge,897
3,00000c289a1829a808ac09c00daf10bc3c4e223b,3d6bbeb7-f90e-4d10-b440-e153c0d10b53,elvenking,717
4,00000c289a1829a808ac09c00daf10bc3c4e223b,bbd2ffd7-17f4-4506-8572-c1ea58c3f9a8,juliette & the licks,706


In [3]:
df_user_info = pd.read_csv("usersha1_profile.tsv", sep = '\t', \
                            names = ['user_id', 'gender', 'age', 'country', 'signup'])
df_user_info.head()

Unnamed: 0,user_id,gender,age,country,signup
0,00000c289a1829a808ac09c00daf10bc3c4e223b,f,22.0,Germany,"Feb 1, 2007"
1,00001411dc427966b17297bf4d69e7e193135d89,f,,Canada,"Dec 4, 2007"
2,00004d2ac9316e22dc007ab2243d6fcb239e707d,,,Germany,"Sep 1, 2006"
3,000063d3fe1cf2ba248b9e3c3f0334845a27a6bf,m,19.0,Mexico,"Apr 28, 2008"
4,00007a47085b9aab8af55f52ec8846ac479ac4fe,m,28.0,United States,"Jan 27, 2006"



## Data processing 

In [4]:
# merge df_usage_info with df_user_info to find the gender of the users 
df_usage_info = df_usage_info.merge(df_user_info[['user_id', 'gender', 'country']], how = 'left', \
                                                   on = 'user_id')

# keep female users only 
df_usage_info = df_usage_info[df_usage_info['gender'] == 'f']

# for prototype, use only United States users to speed up the development 
df_usage_info = df_usage_info[df_usage_info['country'] == 'United States']


In [5]:
df_usage_info.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 875447 entries, 1038 to 17532909
Data columns (total 6 columns):
user_id           875447 non-null object
musicbrainz_id    865378 non-null object
artist_name       875446 non-null object
plays             875447 non-null int64
gender            875447 non-null object
country           875447 non-null object
dtypes: int64(1), object(5)
memory usage: 46.8+ MB



## Stages:
### 1. Create the artist popularity coefficient 
Compute a listening coefficient that captures the popularity of each item (artist), following the pseudocode proposed in Sanchez-Moreno et al., 2016. At this stage the coefficient is not used to address the gray sheep problem; it is simply added as an item attribute, with the aim of improving the accuracy of the recommender system.


<img src="pseudocode.png" style="width: 300px; "/>
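
As a rough sketch of what the cells below compute, the coefficient for an artist can be read as (number of listeners / average listeners per artist) multiplied by (normalized plays received / average normalized plays per artist). The toy play counts and the names `plays` and `coef` are made up for illustration, and this is one reading of the notebook's code rather than a verbatim transcription of the paper's pseudocode.

```python
import pandas as pd

# Hypothetical toy data: rows are artists, columns are users, values are play counts.
plays = pd.DataFrame(
    {"u1": [10, 0, 30], "u2": [20, 20, 0], "u3": [0, 0, 40]},
    index=["artist_a", "artist_b", "artist_c"],
)

num_artists = plays.shape[0]

# Number of users who played each artist, relative to the average over all artists.
users_per_artist = (plays > 0).sum(axis=1)
popularity = users_per_artist / (users_per_artist.sum() / num_artists)

# Each user's average plays per artist they actually played.
avg_plays_per_user = plays.sum(axis=0) / (plays > 0).sum(axis=0)

# Plays normalized by the user's own average, summed per artist,
# then expressed relative to the average over all artists.
behavior = plays.div(avg_plays_per_user, axis=1).sum(axis=1)
relative_behavior = behavior / (behavior.sum() / num_artists)

coef = popularity * relative_behavior
print(coef.sort_values(ascending=False))
```

In this toy case `artist_c` gets the largest coefficient because it has as many listeners as `artist_a` and those listeners play it more than their own average.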

In [6]:
num_unique_artists = df_usage_info['artist_name'].nunique()
num_unique_users = df_usage_info['user_id'].nunique()
total_num_plays = df_usage_info['plays'].sum()

In [7]:
# create artist x user matrix (1)
artist_user_matrix = df_usage_info.groupby(['artist_name', 'user_id'])['plays'].max().unstack()
artist_user_matrix = artist_user_matrix.fillna(0)

artist_user_matrix.head()

user_id,00032c7933e0eb05f2258f1147ef81a90f2d4d6c,000752c87a61bc4247f5219b4769c347c0062c8a,0008b075deee53a3a090668c7ec581e15c3d8430,0009fbcb5120332beefdb12af5e60957688f6765,000d8c54934cc3a9eab276ccb412dbf52b980a44,000f5ca9514226b8b1589f57f02bbdc839bf8727,00145c6f4477a15b5ea78d86f6e60c28e33f353c,001656f03e1fae9a79239e6e2e9edd641977000a,001f8dbc1a7256151fc46b1a513348cbec02c753,00243767e4ba9ad88986d8da01cfa4e4bb3d07df,...,ffd41e64d50ea0e7ef1c480faef0f2ba4bd87a0b,ffd72327349ac2f382158e028aef0f166f3dc313,ffd98068bbf8f7d3e236a6c9aae1467c9d708c83,ffea7fdee086759e70b01bed160c1c3a886b92d6,ffea89340d45b3fc6cc43e7c7aea73628babaaa7,ffeb193e80fabff1804e71cc2b6bb6bb2a31ac03,ffeba36821730969d92cd74036ea712ae592b95e,ffee40c3deb9ac6576255c51b12276034229613e,fff296f402ecb66864563e55fd669195981db86f,fff58a5c95280b7af63f9c552f9159b58ae5efa3
artist_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
!!!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,113.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!action pact!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!deladap,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!green day,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!hero,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
# artist set, (2-6)
# all artists 
df_artist = artist_user_matrix.T
# num of users played per artist
df_artist.loc['num_users_played'] = df_artist[df_artist >= 1.0].count()
# total num of plays for each artist 
df_artist.loc['total_plays'] = df_artist[:-1].sum()
# average plays per user
df_artist = df_artist.fillna(0).T
df_artist['avg_plays_per_user'] = df_artist['total_plays']/ df_artist['num_users_played']
df_artist = df_artist[['num_users_played', 'total_plays', 'avg_plays_per_user']].fillna(0)

df_artist.head()

artist_name,num_users_played,total_plays,avg_plays_per_user
!!!,161.0,24916.0,154.757764
!action pact!,1.0,45.0,45.0
!deladap,2.0,239.0,119.5
!green day,1.0,1354.0,1354.0
!hero,1.0,77.0,77.0


In [9]:
# average number of users per artist (7)
avg_num_users_per_artist = df_artist['num_users_played'].sum()/ num_unique_artists
avg_num_users_per_artist

16.805670736389466

In [10]:
# played artists by user j (8-12)
# all users 
# num of artists played per user
df_user = artist_user_matrix.copy(deep = True)
df_user.loc['num_artists_played'] = df_user[df_user >= 1.0].count()
# total num of plays by each user
df_user.loc['total_plays'] = df_user[:-1].sum()
# avg num of plays per artist of each user 
df_user = df_user.fillna(0).T
df_user['avg_plays_per_artist'] = df_user['total_plays']/ df_user['num_artists_played']
df_user = df_user[['num_artists_played', 'total_plays', 'avg_plays_per_artist']]
# compute played artists by user j
# clip_upper is deprecated in newer pandas versions; clip(upper=1) is the equivalent call
artist_played_by_user_matrix  = artist_user_matrix.clip(upper=1)
artist_played_by_user_matrix




user_id,00032c7933e0eb05f2258f1147ef81a90f2d4d6c,000752c87a61bc4247f5219b4769c347c0062c8a,0008b075deee53a3a090668c7ec581e15c3d8430,0009fbcb5120332beefdb12af5e60957688f6765,000d8c54934cc3a9eab276ccb412dbf52b980a44,000f5ca9514226b8b1589f57f02bbdc839bf8727,00145c6f4477a15b5ea78d86f6e60c28e33f353c,001656f03e1fae9a79239e6e2e9edd641977000a,001f8dbc1a7256151fc46b1a513348cbec02c753,00243767e4ba9ad88986d8da01cfa4e4bb3d07df,...,ffd41e64d50ea0e7ef1c480faef0f2ba4bd87a0b,ffd72327349ac2f382158e028aef0f166f3dc313,ffd98068bbf8f7d3e236a6c9aae1467c9d708c83,ffea7fdee086759e70b01bed160c1c3a886b92d6,ffea89340d45b3fc6cc43e7c7aea73628babaaa7,ffeb193e80fabff1804e71cc2b6bb6bb2a31ac03,ffeba36821730969d92cd74036ea712ae592b95e,ffee40c3deb9ac6576255c51b12276034229613e,fff296f402ecb66864563e55fd669195981db86f,fff58a5c95280b7af63f9c552f9159b58ae5efa3
artist_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
!!!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!action pact!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!deladap,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!green day,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!hero,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Ｓｕｇａｒ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ｖｅｒｓａｉｌｌｅｓ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ａｋｉｋｏ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ａｔｔｉｃ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
# listening coefficient (13-16)
# popularity of an artist = number of users who played the artist / average number of users per artist
artist_popularity = df_artist['num_users_played'] / avg_num_users_per_artist
# user behavior = a user's plays for an artist / that user's average plays per artist
user_behavior_matrix = artist_user_matrix/ df_user['avg_plays_per_artist']
# normalized behavior received by each artist, summed over all users
artist_received_user_behavior = user_behavior_matrix.sum(axis = 1)
# equation 3:
# listening_coef = artist_popularity *
#     (artist_received_user_behavior /
#      (sum of artist_received_user_behavior over all artists / number of artists))
listening_coef = artist_popularity * \
                    (artist_received_user_behavior/ (artist_received_user_behavior.sum()/ num_unique_artists))
listening_coef = listening_coef.to_frame().rename(columns = {0: 'listening_coef'})
listening_coef.head()

artist_name,listening_coef
!!!,70.400841
!action pact!,0.001952
!deladap,0.013545
!green day,0.035787
!hero,0.006633



### 2. Create the item-user rating matrix  
First, create a user-item matrix from the play counts. The rating of artist i by user j is the number of times user j played artist i divided by user j's total number of plays. Then transpose the matrix to obtain the item-user rating matrix.
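
To make the normalization concrete, here is a minimal sketch on made-up play counts. The frame and the names `user_artist_plays`, `user_artist_rating` and `item_user_rating` are hypothetical and only mirror the steps described above.

```python
import pandas as pd

# Hypothetical toy data: rows are users, columns are artists, values are play counts.
user_artist_plays = pd.DataFrame(
    {"artist_a": [30, 0], "artist_b": [10, 50], "artist_c": [60, 50]},
    index=["u1", "u2"],
)

# Rating = plays of artist i by user j / total plays of user j, so each user's row sums to 1.
user_artist_rating = user_artist_plays.div(user_artist_plays.sum(axis=1), axis=0)

# Transpose so that artists (items) are rows and users are columns.
item_user_rating = user_artist_rating.T
print(item_user_rating)
```

Because every rating is a share of the user's own listening, heavy and light listeners end up on the same scale.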


In [12]:
user_artist_matrix = artist_user_matrix.T
user_artist_matrix = user_artist_matrix.merge(df_user[['total_plays']], how = 'right', \
                                              left_index = True, right_index = True)
user_artist_matrix = user_artist_matrix.div(user_artist_matrix['total_plays'], axis=0)
user_artist_matrix = user_artist_matrix.drop(columns = 'total_plays')
item_user_rating_matrix = user_artist_matrix.T
item_user_rating_matrix

user_id,00032c7933e0eb05f2258f1147ef81a90f2d4d6c,000752c87a61bc4247f5219b4769c347c0062c8a,0008b075deee53a3a090668c7ec581e15c3d8430,0009fbcb5120332beefdb12af5e60957688f6765,000d8c54934cc3a9eab276ccb412dbf52b980a44,000f5ca9514226b8b1589f57f02bbdc839bf8727,00145c6f4477a15b5ea78d86f6e60c28e33f353c,001656f03e1fae9a79239e6e2e9edd641977000a,001f8dbc1a7256151fc46b1a513348cbec02c753,00243767e4ba9ad88986d8da01cfa4e4bb3d07df,...,ffd41e64d50ea0e7ef1c480faef0f2ba4bd87a0b,ffd72327349ac2f382158e028aef0f166f3dc313,ffd98068bbf8f7d3e236a6c9aae1467c9d708c83,ffea7fdee086759e70b01bed160c1c3a886b92d6,ffea89340d45b3fc6cc43e7c7aea73628babaaa7,ffeb193e80fabff1804e71cc2b6bb6bb2a31ac03,ffeba36821730969d92cd74036ea712ae592b95e,ffee40c3deb9ac6576255c51b12276034229613e,fff296f402ecb66864563e55fd669195981db86f,fff58a5c95280b7af63f9c552f9159b58ae5efa3
artist_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
!!!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012096,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!action pact!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!deladap,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!green day,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
!hero,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Ｓｕｇａｒ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ｖｅｒｓａｉｌｌｅｓ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ａｋｉｋｏ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ａｔｔｉｃ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0



### 3. Merge the artist popularity coefficient with the item-user rating matrix  


In [13]:
#item_matrix = item_user_rating_matrix.unstack().reset_index().rename(columns = {'level_0': 'user_id', 0: 'rating'})
# pivot ratings into item features
#item_matrix = item_matrix.pivot(index='artist_name', columns='user_id', values='rating').fillna(0)

item_matrix = item_user_rating_matrix.merge(listening_coef, how = 'left', left_index = True, right_index = True).fillna(0)
item_matrix.head()

Unnamed: 0_level_0,00032c7933e0eb05f2258f1147ef81a90f2d4d6c,000752c87a61bc4247f5219b4769c347c0062c8a,0008b075deee53a3a090668c7ec581e15c3d8430,0009fbcb5120332beefdb12af5e60957688f6765,000d8c54934cc3a9eab276ccb412dbf52b980a44,000f5ca9514226b8b1589f57f02bbdc839bf8727,00145c6f4477a15b5ea78d86f6e60c28e33f353c,001656f03e1fae9a79239e6e2e9edd641977000a,001f8dbc1a7256151fc46b1a513348cbec02c753,00243767e4ba9ad88986d8da01cfa4e4bb3d07df,...,ffd72327349ac2f382158e028aef0f166f3dc313,ffd98068bbf8f7d3e236a6c9aae1467c9d708c83,ffea7fdee086759e70b01bed160c1c3a886b92d6,ffea89340d45b3fc6cc43e7c7aea73628babaaa7,ffeb193e80fabff1804e71cc2b6bb6bb2a31ac03,ffeba36821730969d92cd74036ea712ae592b95e,ffee40c3deb9ac6576255c51b12276034229613e,fff296f402ecb66864563e55fd669195981db86f,fff58a5c95280b7af63f9c552f9159b58ae5efa3,listening_coef
artist_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
!!!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012096,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,70.400841
!action pact!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001952
!deladap,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013545
!green day,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035787
!hero,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006633



### 4. K-nearest neighbors (K-NN) to find similar items and generate an item recommendation table
K-NN builds a neighborhood of the K items most similar to the item the user is listening to. The top N most similar items are selected and recommended to the user. The result is used to generate an item recommendation table.  
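
As an illustration of the idea, the sketch below finds the nearest items by cosine distance with plain NumPy on a made-up 4x3 item matrix. It is a simplified stand-in for the brute-force cosine `NearestNeighbors` model used in this notebook, not a replacement for it. `toy_items` and `nearest_items` are hypothetical names, and the sketch assumes no item row is all zeros.

```python
import numpy as np

# Hypothetical item-feature matrix: one row per item, one column per user rating.
toy_items = np.array([
    [1.0, 0.0, 0.0],   # item 0
    [0.9, 0.1, 0.0],   # item 1, very close to item 0
    [0.0, 1.0, 0.0],   # item 2
    [0.1, 0.8, 0.6],   # item 3, mostly similar to item 2
])

def nearest_items(matrix, query_idx, n):
    """Return the indices of the n items with the smallest cosine distance to query_idx."""
    # Scale every row to unit length so the dot product equals the cosine similarity.
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    distances = 1.0 - unit @ unit[query_idx]
    distances[query_idx] = np.inf   # never return the query item itself
    return np.argsort(distances)[:n].tolist()

print(nearest_items(toy_items, query_idx=0, n=2))
print(nearest_items(toy_items, query_idx=2, n=2))
```

Cosine distance compares the direction of the rating vectors rather than their magnitude, which suits sparse, share-based ratings like the ones built in stage 2.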


In [14]:
# create mapper from item (artist name) to index
item_to_idx = {
    artist: i for i, artist in 
    enumerate(list(item_matrix.index))
}
# get reverse mapper
reverse_mapper = {v: k for k, v in item_to_idx.items()}

# transform matrix to scipy sparse matrix
item_matrix_csr = csr_matrix(item_matrix.values)
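
The dense pivot used so far can become large as more users are included. A possible alternative, sketched here under the assumption that the data is still in long format (one row per user-artist pair), is to build the CSR matrix directly from categorical codes. The toy frame and the names `long_df`, `artist_cat`, `user_cat` and `sparse_plays` are hypothetical.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical long-format data, shaped like df_usage_info.
long_df = pd.DataFrame({
    "artist_name": ["a", "a", "b", "c"],
    "user_id": ["u1", "u2", "u2", "u3"],
    "plays": [10, 20, 5, 7],
})

# Categorical codes give each artist and each user a stable integer position.
artist_cat = long_df["artist_name"].astype("category")
user_cat = long_df["user_id"].astype("category")

# Build the artist x user matrix from (value, (row, column)) triples.
sparse_plays = csr_matrix(
    (
        long_df["plays"].to_numpy(),
        (artist_cat.cat.codes.to_numpy(), user_cat.cat.codes.to_numpy()),
    ),
    shape=(len(artist_cat.cat.categories), len(user_cat.cat.categories)),
)

print(sparse_plays.shape, sparse_plays.nnz)
print(sparse_plays.toarray())
```

Note that this constructor sums duplicate (artist, user) pairs, whereas the `groupby(...).max()` pivot earlier in the notebook keeps the maximum, so duplicates would need to be aggregated first to match that behavior.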

In [15]:
# knn, NearestNeighbors
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=5, n_jobs=-1)
# fit
model_knn.fit(item_matrix_csr)

NearestNeighbors(algorithm='brute', leaf_size=30, metric='cosine',
                 metric_params=None, n_jobs=-1, n_neighbors=5, p=2, radius=1.0)

In [16]:
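# make predictions (query with the same sparse matrix the model was fitted on;
# ask for one extra neighbor because the query point itself is returned)
distances, indices = model_knn.kneighbors(item_matrix_csr, n_neighbors= 5+1)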

# create the item prediction df
item_prediction = pd.DataFrame(indices)
# drop the query point
item_prediction = item_prediction.drop(columns = 0)

# map the values back to artist names
# (DataFrame.iteritems is deprecated in newer pandas versions, so iterate over the columns instead)
for column in item_prediction.columns:
    item_prediction[column] = item_prediction[column].map(reverse_mapper)
item_prediction['query_point'] = item_prediction.index.map(reverse_mapper.get)
item_prediction = item_prediction[['query_point', 1, 2, 3, 4, 5]]


#### Item Recommendation Table
The item recommendation table contains the 5 items most similar to each query_point, sorted by similarity. Column 1 contains the most similar item, and column 5 contains the 5th most similar item. 

In [17]:
item_prediction

Unnamed: 0,query_point,1,2,3,4,5
0,!!!,hot chip,girl talk,animal collective,lcd soundsystem,ladytron
1,!action pact!,tokyo electron,imperial leather,baseball furies,cpc gangbangs,wolfbrigade
2,!deladap,teddy rok seven,herbert & dani siciliano,black zone ensemble,black spade,zo!
3,!green day,the old bethpage brass band,the new american brass band,janice strand,jacqueline schwab,whole wheat bread
4,!hero,dr reanimator,pirates of the caribbean 2,apache,partystylerz,teriyaki boys
...,...,...,...,...,...,...
52087,Ｓｕｇａｒ,moll'e node,echostream,robert,hikawa kiyoshi,noir fleurir
52088,Ｖｅｒｓａｉｌｌｅｓ,the tellers,ark sano,mademoiselle k,holden,the babys
52089,ａｋｉｋｏ,ａｋｉｋｏ,akiko,ivana santilli,clara hill's folkwaves,ayuse kozue
52090,ａｔｔｉｃ,kein,マカロニ,ａｔｔｉｃ,the novembers,ネガ
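
In the table above, ａｋｉｋｏ and ａｔｔｉｃ appear among their own recommendations, which suggests the query item is not always returned in column 0 (for example, when several items sit at the same distance). One possible way to handle this, sketched with a made-up `indices` array shaped like the output of `kneighbors`, is to remove the query index wherever it appears in each row instead of always dropping the first column. `drop_query` is a hypothetical helper.

```python
import numpy as np

def drop_query(indices, k):
    """For row r, remove index r wherever it appears and keep the first k remaining neighbours."""
    cleaned = []
    for row_idx, row in enumerate(indices):
        neighbours = [int(i) for i in row if i != row_idx]
        cleaned.append(neighbours[:k])
    return np.array(cleaned)

# Hypothetical kneighbors output for 3 items with n_neighbors = k + 1 = 3.
# Row 1 does not list itself first, as can happen when distances tie.
indices = np.array([
    [0, 2, 1],
    [2, 1, 0],
    [2, 0, 1],
])

print(drop_query(indices, k=2))
```

If the query index is missing from a row altogether, the row is simply truncated to its first `k` neighbours.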



### Make prediction
When a user is listening to artist x, we can query the item recommendation table and make recommendations to them.

In [18]:
# example 
listening_to = 'avril lavigne'

# make prediction 
prediction_list = [1, 2, 3, 4, 5]
recommended_item_list = item_prediction[item_prediction['query_point'] == listening_to] \
                            [prediction_list].values.tolist()[0]

print('The user is listening to: ', listening_to)
print('The system recommends: ', recommended_item_list)

The user is listening to:  avril lavigne
The system recommends:  ['paramore', 'the beatles', 'death cab for cutie', 'coldplay', 'radiohead']
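
The lookup above raises an `IndexError` when the artist is not in the table. One way to make it more forgiving, sketched below, is to fall back to the closest known artist name. The sketch uses the standard library's `difflib` rather than `fuzzywuzzy` to stay dependency-free, and `toy_table` and `recommend` are hypothetical stand-ins for `item_prediction` and the cell above.

```python
import difflib

# Hypothetical stand-in for the item recommendation table: artist -> ranked similar artists.
toy_table = {
    "avril lavigne": ["paramore", "the beatles", "coldplay"],
    "hot chip": ["lcd soundsystem", "ladytron", "girl talk"],
}

def recommend(artist, table, cutoff=0.8):
    """Return (matched_name, recommendations); matched_name is None when nothing is close enough."""
    name = artist.strip().lower()
    if name not in table:
        # Fall back to the single closest artist name above the similarity cutoff.
        close = difflib.get_close_matches(name, list(table), n=1, cutoff=cutoff)
        if not close:
            return None, []
        name = close[0]
    return name, table[name]

print(recommend("Avril Lavinge", toy_table))
print(recommend("unknown artist", toy_table))
```

Returning the matched name alongside the recommendations lets the caller show which artist the query was resolved to, which matters when the fuzzy match is wrong.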

