[View in Colaboratory](https://colab.research.google.com/github/divsinha99/Million_Song_Recommender/blob/master/Recommendation_System.ipynb)

## Recommendation Engines

The basis of a Recommendation Engine --> Recorded interaction between the Users & Products.

For Example: A movie Recommendation Engine will be based on the Ratings (interaction) provided by the Users to different movies.


## Types of Recommendation Engine:
Depends upon the entity that they assume is the most important for generating Recommendations.

### - User-Based Recommendation Engines: 
Central Entity --> User (will look for similarities among Users)
### - Content-Based Recommendation Engines: 
Central Entity --> Content (tries to find features about the content & find similar Content. Here, entity will be Songs we are trying to recommend).
### - Hybrid-Recommendation Engines (Collaborative Filtering): 
Uses both features of User & Content  to develop recommendations.

In [0]:
import pandas as pd

In [0]:
triplet_dataset_sub_song_merged = pd.read_csv("triplet_dataset_sub_song_merged.csv")
triplet_dataset_sub_song_merged.head()

Unnamed: 0,user,song,listen_count,title,release,artist_mbid,artist_name,year
0,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOADQPP12A67020C82,12,You And Me Jesus,Tribute To Jake Hess,d8881a78-e6c6-4cd7-bd43-1a6c4b8ef4ba,Jake Hess,2004.0
1,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOAFTRR12AF72A8D4D,1,Harder Better Faster Stronger,Discovery,056e4f3e-d505-4dad-8ec1-d04f521cbb56,Daft Punk,2007.0
2,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOANQFY12AB0183239,1,Uprising,Uprising,fd857293-5ab8-40de-b29e-55a69d4e4d0f,Muse,0.0
3,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOAYATB12A6701FD50,1,Breakfast At Tiffany's,Home,ae3f6a8a-c465-4707-8667-8ce0172bc417,Deep Blue Something,1993.0
4,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOBOAFP12A8C131F36,7,Lucky (Album Version),We Sing. We Dance. We Steal Things.,82eb8936-7bf6-4577-8320-a2639465206d,Jason Mraz & Colbie Caillat,0.0


In [0]:
import sklearn

In [0]:
import Recommenders as Recommenders

In [0]:
from sklearn.cross_validation import train_test_split

## 1. Popularity based recommendations 
                                                                                    ## Simplest Recommendation Engine 

If some item is liked (or listened to) by a vast majority of User database, then it is a good idea to recommend that item to users who have not interacted with that item.
So, we will try to find out which songs have been listened to by the users most number of times, so that will be our standard recommendation set for each user.

In [0]:
triplet_dataset_sub_song_merged_set = triplet_dataset_sub_song_merged
train_data, test_data = train_test_split(triplet_dataset_sub_song_merged_set, test_size=0.4, random_state=0)

In [0]:
train_data.head()

Unnamed: 0,user,song,listen_count,title,release,artist_mbid,artist_name,year
54517,7ac815e4eddf5f892fd06b55ef666e25fceb930c,SONQBUB12A6D4F8ED0,2,Angie (1993 Digital Remaster),Jump Back - The Best Of The Rolling Stones_ '7...,b071f9fa-14b0-4217-8e97-eb41da73f598,The Rolling Stones,0.0
223603,d23fb44ab5693625b66717a7aa166de85f84cd55,SOKVUKL12A8C1357B5,1,Mula Noon Hanggang Ngayon,OPM Timeless Collection,67f4ab30-bfe1-483a-a5ae-c6e6067497b8,Lea Salonga,0.0
92635,686f2add4eb9a011fee804ee0e9b843333d33838,SOVUEEM12A8C135974,1,No Such Thing,Carry On,cbf9738d-8f81-4a92-bc64-ede09341652d,Chris Cornell,2007.0
157848,0345a2367c91563dbb46adcfa611dd066b9e3e08,SOSIDTC12A8C1383E1,2,Alanson_ Crooked River,Greetings From Michigan_ The Great Lake State,01d3c51b-9b98-418a-8d8e-37f6fab59d8c,Sufjan Stevens,0.0
251153,2f3e2b0ade854f990ae61833345b6381566544be,SOLXQEG12A67AE2285,3,Pushing Me Away (Album Version),Hybrid Theory,f59c5520-5f46-4d2c-b2c4-822eabf53419,Linkin Park,2000.0


In [0]:

def  create_popularity_recommendation(train_data, user_id, item_id):
    #Get a count of user_ids for each unique song as recommendation score
    train_data_grouped = train_data.groupby([item_id]).agg({user_id: 'count'}).reset_index()
    train_data_grouped.rename(columns = {user_id: 'score'},inplace=True)
    
    #Sort the songs based upon recommendation score
    train_data_sort = train_data_grouped.sort_values(['score', item_id], ascending = [0,1])
    
    #Generate a recommendation rank based upon score
    train_data_sort['Rank'] = train_data_sort['score'].rank(ascending=0, method='first')
        
    #Get the top 10 recommendations
    popularity_recommendations = train_data_sort.head(20)
    return popularity_recommendations

### Generate top 20 Recommendations to each of our Users:

In [0]:
recommendations  = create_popularity_recommendation(triplet_dataset_sub_song_merged,'user','title')
recommendations

Unnamed: 0,title,score,Rank
18773,Sehr kosmisch,448,1.0
5536,Dog Days Are Over (Radio Edit),440,2.0
24036,Undo,382,3.0
18737,Secrets,370,4.0
26175,You're The One,370,5.0
17871,Revelry,342,6.0
9511,Horn Concerto No. 4 in E flat K495: II. Romanc...,327,7.0
23295,Tive Sim,315,8.0
24176,Use Somebody,313,9.0
7208,Fireflies,301,10.0


## 2. Item similarity based recommendations

In [0]:
song_count_df = pd.read_csv("song_count_df.csv")
play_count_df = pd.read_csv("play_count_df.csv")

In [0]:
play_count_subset = play_count_df.head(n=100000)
user_subset = list(play_count_subset.user)

In [0]:
song_count_subset = song_count_df.head(n=5000)
user_subset = list(play_count_subset.user)
song_subset = list(song_count_subset.song)
triplet_dataset_sub_song_merged_sub = triplet_dataset_sub_song_merged[triplet_dataset_sub_song_merged.song.isin(song_subset)]

In [0]:
triplet_dataset_sub_song_merged_sub.head()

Unnamed: 0,user,song,listen_count,title,release,artist_mbid,artist_name,year
0,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOADQPP12A67020C82,12,You And Me Jesus,Tribute To Jake Hess,d8881a78-e6c6-4cd7-bd43-1a6c4b8ef4ba,Jake Hess,2004.0
1,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOAFTRR12AF72A8D4D,1,Harder Better Faster Stronger,Discovery,056e4f3e-d505-4dad-8ec1-d04f521cbb56,Daft Punk,2007.0
2,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOANQFY12AB0183239,1,Uprising,Uprising,fd857293-5ab8-40de-b29e-55a69d4e4d0f,Muse,0.0
3,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOAYATB12A6701FD50,1,Breakfast At Tiffany's,Home,ae3f6a8a-c465-4707-8667-8ce0172bc417,Deep Blue Something,1993.0
4,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOBOAFP12A8C131F36,7,Lucky (Album Version),We Sing. We Dance. We Steal Things.,82eb8936-7bf6-4577-8320-a2639465206d,Jason Mraz & Colbie Caillat,0.0


In [0]:
train_data, test_data = train_test_split(triplet_dataset_sub_song_merged_sub, test_size = 0.30, random_state=0)
is_model = Recommenders.item_similarity_recommender_py()
is_model.create(train_data, 'user', 'title')
user_id = list(train_data.user)[7]
user_items = is_model.get_user_items(user_id)

#Recommend songs for the user using personalized model
is_model.recommend(user_id)

No. of unique songs for the user: 79
no. of unique songs in the training set: 4864
Non zero values in cooccurence_matrix :101497


Unnamed: 0,user_id,song,score,rank
0,9f421913b988809407f87de9dc6377637abdf4e7,Intervention,0.040527,1.0
1,9f421913b988809407f87de9dc6377637abdf4e7,A-Punk (Album),0.036398,2.0
2,9f421913b988809407f87de9dc6377637abdf4e7,Crown Of Love,0.035368,3.0
3,9f421913b988809407f87de9dc6377637abdf4e7,Horchata,0.034175,4.0
4,9f421913b988809407f87de9dc6377637abdf4e7,Neon Bible,0.033561,5.0
5,9f421913b988809407f87de9dc6377637abdf4e7,They Might Follow You,0.033173,6.0
6,9f421913b988809407f87de9dc6377637abdf4e7,Mia,0.032267,7.0
7,9f421913b988809407f87de9dc6377637abdf4e7,The Kids Dont Stand A Chance (Album),0.031778,8.0
8,9f421913b988809407f87de9dc6377637abdf4e7,White Sky,0.03044,9.0
9,9f421913b988809407f87de9dc6377637abdf4e7,Kiss With A Fist,0.030204,10.0


## 3. Matrix factorization based recommendations

In [0]:
triplet_dataset_sub_song_merged_sum_df = triplet_dataset_sub_song_merged[['user','listen_count']].groupby('user').sum().reset_index()
triplet_dataset_sub_song_merged_sum_df.rename(columns={'listen_count':'total_listen_count'},inplace=True)
triplet_dataset_sub_song_merged = pd.merge(triplet_dataset_sub_song_merged,triplet_dataset_sub_song_merged_sum_df)
triplet_dataset_sub_song_merged['fractional_play_count'] = triplet_dataset_sub_song_merged['listen_count']/triplet_dataset_sub_song_merged['total_listen_count']

In [0]:
triplet_dataset_sub_song_merged[triplet_dataset_sub_song_merged.user =='d6589314c0a9bcbca4fee0c93b14bc402363afea'][['user','song','listen_count','fractional_play_count']].head()

Unnamed: 0,user,song,listen_count,fractional_play_count
0,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOADQPP12A67020C82,12,0.036474
1,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOAFTRR12AF72A8D4D,1,0.00304
2,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOANQFY12AB0183239,1,0.00304
3,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOAYATB12A6701FD50,1,0.00304
4,d6589314c0a9bcbca4fee0c93b14bc402363afea,SOBOAFP12A8C131F36,7,0.021277


In [0]:
from scipy.sparse import coo_matrix

small_set = triplet_dataset_sub_song_merged
user_codes = small_set.user.drop_duplicates().reset_index()
song_codes = small_set.song.drop_duplicates().reset_index()
user_codes.rename(columns={'index':'user_index'}, inplace=True)
song_codes.rename(columns={'index':'song_index'}, inplace=True)
song_codes['so_index_value'] = list(song_codes.index)
user_codes['us_index_value'] = list(user_codes.index)
small_set = pd.merge(small_set,song_codes,how='left')
small_set = pd.merge(small_set,user_codes,how='left')
mat_candidate = small_set[['us_index_value','so_index_value','fractional_play_count']]
data_array = mat_candidate.fractional_play_count.values
row_array = mat_candidate.us_index_value.values
col_array = mat_candidate.so_index_value.values

data_sparse = coo_matrix((data_array, (row_array, col_array)),dtype=float)

In [0]:

data_sparse

<2480x28674 sparse matrix of type '<class 'numpy.float64'>'
	with 264437 stored elements in COOrdinate format>

In [0]:
user_codes[user_codes.user =='9f421913b988809407f87de9dc6377637abdf4e7']

Unnamed: 0,user_index,user,us_index_value
1886,202831,9f421913b988809407f87de9dc6377637abdf4e7,1886


In [0]:
import math as mt
from scipy.sparse.linalg import * #used for matrix multiplication
from scipy.sparse.linalg import svds
from scipy.sparse import csc_matrix

In [0]:
def compute_svd(urm, K):
    U, s, Vt = svds(urm, K)

    dim = (len(s), len(s))
    S = np.zeros(dim, dtype=np.float32)
    for i in range(0, len(s)):
        S[i,i] = mt.sqrt(s[i])

    U = csc_matrix(U, dtype=np.float32)
    S = csc_matrix(S, dtype=np.float32)
    Vt = csc_matrix(Vt, dtype=np.float32)
    
    return U, S, Vt

def compute_estimated_matrix(urm, U, S, Vt, uTest, K, test):
    rightTerm = S*Vt 
    max_recommendation = 250
    estimatedRatings = np.zeros(shape=(MAX_UID, MAX_PID), dtype=np.float16)
    recomendRatings = np.zeros(shape=(MAX_UID,max_recommendation ), dtype=np.float16)
    for userTest in uTest:
        prod = U[userTest, :]*rightTerm
        estimatedRatings[userTest, :] = prod.todense()
        recomendRatings[userTest, :] = (-estimatedRatings[userTest, :]).argsort()[:max_recommendation]
    return recomendRatings

In [0]:
import numpy as np
K=2479
urm = data_sparse
MAX_PID = urm.shape[1]
MAX_UID = urm.shape[0]

U, S, Vt = compute_svd(urm, K)

In [0]:
uTest = [4,5,6,7,8,873,23]

uTest_recommended_items = compute_estimated_matrix(urm, U, S, Vt, uTest, K, True)

In [0]:
for user in uTest:
    print("Recommendation for user with user id {}". format(user))
    rank_value = 1
    for i in uTest_recommended_items[user,0:10]:
        song_details = small_set[small_set.so_index_value == i].drop_duplicates('so_index_value')[['title','artist_name']]
        print("The number {} recommended song is {} BY {}".format(rank_value, list(song_details['title'])[0],list(song_details['artist_name'])[0]))
        rank_value+=1

Recommendation for user with user id 4
The number 1 recommended song is Relax BY Frankie Goes To Hollywood
The number 2 recommended song is Dress Me Like a Clown BY Margot & The Nuclear So And So's
The number 3 recommended song is Volvere BY Corazones Estrangulados
The number 4 recommended song is Picture BY Sheryl Crow
The number 5 recommended song is Tennessee (Pirate Radio Mix) BY ARRESTED DEVELOPMENT
The number 6 recommended song is Shine On (Album Version) BY NEEDTOBREATHE
The number 7 recommended song is Gone Going BY Black Eyed Peas
The number 8 recommended song is Plaisir d'amour BY Charlotte Church
The number 9 recommended song is Haley (Album Version) BY NEEDTOBREATHE
The number 10 recommended song is If I Could BY Jack Johnson
Recommendation for user with user id 5
The number 1 recommended song is Fascination BY Alphabeat
The number 2 recommended song is Boy Who Stopped The World_ The  (Lackluster Album Version) BY Aaron Sprinkle
The number 3 recommended song is All Men Are 

In [0]:
uTest = [2478]
#Get estimated rating for test user
print("Predictied ratings:")
uTest_recommended_items = compute_estimated_matrix(urm, U, S, Vt, uTest, K, True)

Predictied ratings:


In [0]:
for user in uTest:
    print("Recommendation for user with user id {}". format(user))
    rank_value = 1
    for i in uTest_recommended_items[user,0:10]:
        song_details = small_set[small_set.so_index_value == i].drop_duplicates('so_index_value')[['title','artist_name']]
        print("The number {} recommended song is {} BY {}".format(rank_value, list(song_details['title'])[0],list(song_details['artist_name'])[0]))
        rank_value+=1

Recommendation for user with user id 2478
The number 1 recommended song is As I Em BY Asher Roth / Chester French
The number 2 recommended song is Good Things BY Rich Boy / Polow Da Don / Keri Hilson
The number 3 recommended song is Ching BY SWAMI featuring SPEE
The number 4 recommended song is What If (Album Version) BY Jason Derulo
The number 5 recommended song is Kiss the sky BY Shawn Lee feat. Nino Moschella
The number 6 recommended song is Suicide On Downing St. BY Tim Finn
The number 7 recommended song is The Joker BY Fatboy Slim
The number 8 recommended song is Gimme That BY Chris Brown
The number 9 recommended song is If You Don't_ Don't BY Jimmy Eat World
The number 10 recommended song is A Strangely Isolated Place BY Ulrich Schnauss
