In [None]:
# Look alternative style SVD in this same notebook, first style is optional, alternative style is imp.

## Recommendation based on Matrix Factorization

The starting point of any matrix factorization-based method is the utility matrix, a matrix of user Vs item dimension. Not, this is a sparse matrix, since not all item is used by the user. The process of matrix factorization means finding out a low rank approximation of the utility matrix. So we want to break down the utility matrix U into two low rank matrices so that we can recreate the matrix U by multiplying those two matrices:

Assuming the process helps us identify latent factors/features, meaning as K, our aim is to find two matrices X and Y such that their product (matrix multiplication) approximates R.

X = |U| x K matrix (A matrix with dimensions of num_users * factors)

Y = |P| x K matrix (A matrix with dimensions of factors * num_songs)

![iamge](https://image.slidesharecdn.com/dlrsworkshop-161201170151/95/deep-learning-for-audiobased-music-recommendation-15-638.jpg?cb=1480612282)

To make a recommendation to the user, we can multiply the corresponding user's row from the first matrix by the item matrix and determine the items from the row with maximum ratings. That will become our recommendations for the user. The first matrix represents the association between the users and the latent features, while the second matrix takes care of the associations between items (songs in our case) and the latent features. 

#### Matrix Factorization and Singular Value Decomposition (SVD)
There are multiple algorithms available for determining factorization of any matrix. We use one of the simplest algorithms, which is the singular value decomposition or SVD. You can follow these steps to determine the factorization of a matrix using the output of SVD function.
- Factorize the matrix to obtain U, S, and V matrices.
- Reduce the matrix S to first k components. (The function we are using will only provide k dimensions, so we can skip this step.)
- Compute the square root of reduced matrix S<sub>k</sub> to obtain the matrix S<sub>k</sub><sup>1/2</sup>.
- Compute the two resultant matrix U\*S<sub>k</sub><sup>1/2</sup> and S<sub>k</sub><sup>1/2</sup>\*V as these will serve as our two factorized matrices

We can then generate the prediction of user i for product j by taking the dot product of the i<sup>th</sup> row of the first matrix with the j<sup>th</sup> column of the second matrix. 

Let's create tree functions to help us on it:
- ***compute_svd***: It use the svds function provided by the scipy library to break down our utility matrix into three different matrices, and prosed with the others tree steps above.
- ***compute_estimated_matrix***: use the decomposed matrices by SVD and provide the predictions.

## Import

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import scipy.sparse as sp
from scipy.sparse.linalg import svds
from scipy.sparse import coo_matrix
import math as mt
from scipy.sparse import csc_matrix

%matplotlib inline

* i/p -> user numbers/ids
* o/p -> songs recommended for reels
* Hypothesis: If a song is performing well above avg. for many of the users, then recommend that song, it will surely derive the more than avg engagement rate for the influencer

## DataFrame

In [2]:
df1 = pd.read_excel('clean_train1.xlsx')
display(df1.head(2) )
df2 = pd.read_excel('clean_train2.xlsx')
display(df2.head(2) )

Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
0,SOAKIMP12A8C130995,1,tejaswini wagh,83789,348,“ You'll never find peace of mind until you li...,https://www.instagram.com/waghtejaswini/,9.9,The Cove,Thicker Than Water,Jack Johnson,0,-7.719191,2.180809,1827,16445,3d
1,SOAKIMP12A8C130995,1,Shounak Nayak,7151,505,UK/India All posts my own unless stated. Email:,https://www.instagram.com/shounaknayak/,3.4,The Cove,Thicker Than Water,Jack Johnson,0,1.337458,4.737458,161,967,1d


Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
0,SOBQYCF12AC909726F,8,bét.🌿 Witcheswoods currently,15151,215,✦ ORNAMENTS & BLACKWORK ✦ Resident at @oryxtat...,https://www.instagram.com/beatriceblxck/,0.6,Cooler Than Me,Cooler Than Me,Mike Posner,2010,3.37136,3.97136,7220,28881,3w
1,SOBQYCF12AC909726F,1,Pout Pretty,43285,2518,"Indian beauty and lifestyle blogger, history b...",https://www.instagram.com/poutpretty/,0.4,Cooler Than Me,Cooler Than Me,Mike Posner,2010,2.763777,3.163777,7173,43039,2w


In [3]:
#append two dataset
df = df1.append(df2, ignore_index=False)
df.tail(2)

Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
33903,SOGCHYZ12AF72A69EC,2,Riya Deepsi,25955,515,Human being Email for any collaboration or enq...,https://www.instagram.com/riya_d_m/,3.2,That Tree (feat. Kid Cudi),That Tree Featuring Kid Cudi,Snoop Dogg featuring Kid Cudi,2010,-0.05021,3.14979,7721,38605,2w
33904,SOSSZPW12A8C13843D,20,Kashika Kapur Makeupartist,49130,1921,Dreamer of making this world a better place🙏🏻 ...,https://www.instagram.com/kashikakapurmua/,2.9,Figures,Dreams,The Whitest Boy Alive,2006,-2.33451,0.56549,92,833,1d


In [4]:
test = pd.read_excel('clean_test.xlsx')
test.head(2)

Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
0,SOBBMDR12A8C13253B,1,Pratima Dagar,33797,1059,Celebrity MakeupArtist & Hairstylist BRANDS |C...,https://www.instagram.com/pratimadagar/,1.0,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976,4.949297,5.949297,17128,137024,3w
1,SOBXHDL12A81C204C0,1,Nigar Khan نگار,6592,422,🇮🇳(MA)@inega.in 🇫🇷M P P A R I S 🇮🇹 M A J O R M...,https://www.instagram.com/nigarkhan21/,0.0,Stronger,Graduation,Kanye West,2007,4.830947,4.830947,3715,26007,4w


## Preparing DataFrame for use of Scipy Sparse Matrix Transformation

In [5]:
df = df.rename(columns={'URL':'Profile_URL', 'Song':'Song_Name'}) #rename
test = test.rename(columns={'URL':'Profile_URL', 'Song':'Song_Name'}) #rename
df.head(3)

Unnamed: 0,Song_Name,use_count,Name,Followers,Posts,Bio,Profile_URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
0,SOAKIMP12A8C130995,1,tejaswini wagh,83789,348,“ You'll never find peace of mind until you li...,https://www.instagram.com/waghtejaswini/,9.9,The Cove,Thicker Than Water,Jack Johnson,0,-7.719191,2.180809,1827,16445,3d
1,SOAKIMP12A8C130995,1,Shounak Nayak,7151,505,UK/India All posts my own unless stated. Email:,https://www.instagram.com/shounaknayak/,3.4,The Cove,Thicker Than Water,Jack Johnson,0,1.337458,4.737458,161,967,1d
2,SOAKIMP12A8C130995,3,JD Institute of Fashion,16047,3120,"Empowering creative minds since 1988, JD insti...",https://www.instagram.com/jdinstitute/,0.5,The Cove,Thicker Than Water,Jack Johnson,0,0.130806,0.630806,1012,9110,4w


In [6]:
user_codes = df.Profile_URL.drop_duplicates().reset_index()
user_codes.rename(columns={'index':'user_old_index'}, inplace=True)
user_codes['user_new_index'] = list(user_codes.index)
user_codes.head(3)

Unnamed: 0,user_old_index,Profile_URL,user_new_index
0,0,https://www.instagram.com/waghtejaswini/,0
1,1,https://www.instagram.com/shounaknayak/,1
2,2,https://www.instagram.com/jdinstitute/,2


In [7]:
song_codes = df.Song_Name.drop_duplicates().reset_index()
song_codes.rename(columns={'index':'song_old_index'}, inplace=True)
song_codes['song_new_index'] = list(song_codes.index)
song_codes.head(7)

Unnamed: 0,song_old_index,Song_Name,song_new_index
0,0,SOAKIMP12A8C130995,0
1,9,SOBBMDR12A8C13253B,1
2,13,SOBXHDL12A81C204C0,2
3,55,SOBYHAJ12A6701BF1D,3
4,77,SODACBL12A8C13C273,4
5,114,SODDNQT12A6D4F5F7E,5
6,120,SODXRTY12AB0180F3B,6


In [8]:
df = pd.merge(df, song_codes,how='left')
df = pd.merge(df, user_codes,how='left')
df.head(3)

Unnamed: 0,Song_Name,use_count,Name,Followers,Posts,Bio,Profile_URL,ER,title,release,...,year,Delta,Curr_ER,Comments,Likes,Time,song_old_index,song_new_index,user_old_index,user_new_index
0,SOAKIMP12A8C130995,1,tejaswini wagh,83789,348,“ You'll never find peace of mind until you li...,https://www.instagram.com/waghtejaswini/,9.9,The Cove,Thicker Than Water,...,0,-7.719191,2.180809,1827,16445,3d,0,0,0,0
1,SOAKIMP12A8C130995,1,Shounak Nayak,7151,505,UK/India All posts my own unless stated. Email:,https://www.instagram.com/shounaknayak/,3.4,The Cove,Thicker Than Water,...,0,1.337458,4.737458,161,967,1d,0,0,1,1
2,SOAKIMP12A8C130995,3,JD Institute of Fashion,16047,3120,"Empowering creative minds since 1988, JD insti...",https://www.instagram.com/jdinstitute/,0.5,The Cove,Thicker Than Water,...,0,0.130806,0.630806,1012,9110,4w,0,0,2,2


In [9]:
test = pd.merge(test, song_codes,how='left')
test = pd.merge(test, user_codes,how='left')
test.head(3)

Unnamed: 0,Song_Name,use_count,Name,Followers,Posts,Bio,Profile_URL,ER,title,release,...,year,Delta,Curr_ER,Comments,Likes,Time,song_old_index,song_new_index,user_old_index,user_new_index
0,SOBBMDR12A8C13253B,1,Pratima Dagar,33797,1059,Celebrity MakeupArtist & Hairstylist BRANDS |C...,https://www.instagram.com/pratimadagar/,1.0,Entre Dos Aguas,Flamenco Para Niños,...,1976,4.949297,5.949297,17128,137024,3w,9.0,1.0,4423,2729
1,SOBXHDL12A81C204C0,1,Nigar Khan نگار,6592,422,🇮🇳(MA)@inega.in 🇫🇷M P P A R I S 🇮🇹 M A J O R M...,https://www.instagram.com/nigarkhan21/,0.0,Stronger,Graduation,...,2007,4.830947,4.830947,3715,26007,4w,13.0,2.0,5661,3103
2,SOBXHDL12A81C204C0,1,©সৌম্যাজিৎ sarkar🔵,47490,217,"Kolkata,🇮🇳✨ বাঙালি Content creator : @divineco...",https://www.instagram.com/saumyajitsarkar1963/,0.2,Stronger,Graduation,...,2007,1.949696,2.149696,14973,59892,3w,13.0,2.0,5611,3092


## Relevant info for matrix

In [10]:
mat_candidate = df[['user_new_index','song_new_index','Delta'] ]
mat_candidate.head(3)

Unnamed: 0,user_new_index,song_new_index,Delta
0,0,0,-7.719191
1,1,0,1.337458
2,2,0,0.130806


In [11]:
data_array = mat_candidate.Delta.values

row_array = mat_candidate.user_new_index.values

col_array = mat_candidate.song_new_index.values

## Conversion To Sparse Matrix: Scalability and Data Compaction Reasons
The next transformation of data that is required is to convert our dataframe into a numpy matrix in the format of utility matrix. We will convert our dataframe into a sparse matrix, as we will have a lot of missing values and sparse matrices are suitable for representation of such a matrix.

In [12]:
data_sparse = coo_matrix((data_array, (row_array, col_array)),dtype=float)
#sparse matrix coordinate  (i, j, value)

display(data_sparse)

<4218x9873 sparse matrix of type '<class 'numpy.float64'>'
	with 83905 stored elements in COOrdinate format>

## Factorise the Matrix Using SVD

In [13]:
def compute_svd(urm, K):
    U, s, Vt = svds(urm, K)       
    
    #s is given in form of 1d array, convert it to 2d diagonal matrix
    dim = (len(s), len(s))
    S = np.zeros(dim, dtype=np.float32)
    for i in range(0, len(s)):
        S[i,i] = mt.sqrt(s[i])

    #Compressed Sparse Column matrix: Not to csr, becoz we want to access columns easily    
    U = csc_matrix(U, dtype=np.float32)
    S = csc_matrix(S, dtype=np.float32)
    Vt = csc_matrix(Vt, dtype=np.float32)   #still need to know csc
    
    return U, S, Vt

In [14]:
#define parameters


K= 100   #must be less than our existing data of urm => 24*13 
#I have selected K=5 features


urm = data_sparse
MAX_PID = urm.shape[1]
MAX_UID = urm.shape[0]

U, S, Vt = compute_svd(urm, K)

## Select User as I/p and Recommend

In [15]:
def compute_estimated_matrix(urm, U, S, Vt, uTest, K, test):
    max_recommendation = 10                        #total unique songs are 13 in our data
    
    rightTerm = S*Vt                                #this is the right term S^0.5 * V
    
    #create matrix of specified size and fill with zeros
    estimatedRatings = np.zeros(shape=(MAX_UID, MAX_PID), dtype=np.float16)
    recomendRatings = np.zeros(shape=(MAX_UID,max_recommendation ), dtype=np.float16)
    
    #make recommendation for every user
    for userTest in uTest:
        
        prod = U[userTest, :]*rightTerm                      #have to understand this
        estimatedRatings[userTest, :] = prod.todense()       
        recomendRatings[userTest, :] = (-estimatedRatings[userTest, :]).argsort()[:max_recommendation]     #have to understand this
    return recomendRatings

In [16]:
uTest = [4,5,16,7,20]

uTest_recommended_items = compute_estimated_matrix(urm, U, S, Vt, uTest, K, True)

## Show Recommendations on Users Chosen

In [17]:
def show_recomendations(uTest, num_recomendations = 5):
    
    #iterate for every user
    for user in uTest:
        #design
        print('-'*70)                             
        print("Recommendation for user id {}".format(user))
        
        
        rank_value = 1    #if max 8recommendations are made, stop urself
        i = 0             #iterator
        
        
        while (rank_value <  num_recomendations + 1):
            so = uTest_recommended_items[user,i:i+1][0]   #go to each song
            
            #if song is not used previously by current user
            if (df.Profile_URL[(df.song_new_index == so) & (df.user_new_index == user)].count()==0):
                
                song_details = df[(df.song_new_index == so)].\
                    drop_duplicates('song_new_index')[['Song_Name']]
                
                print("The number {} recommended song is {}".format(rank_value, list(song_details['Song_Name'])[0]) )
                rank_value+=1  #1 recommended, increase the count
            i += 1

In [18]:
show_recomendations(uTest)

----------------------------------------------------------------------
Recommendation for user id 4
The number 1 recommended song is SOIBLKQ12AB0183E85
The number 2 recommended song is SOSROFB12AAF3B4C5D
The number 3 recommended song is SOULTKQ12AB018A183
The number 4 recommended song is SOBADEB12AB018275F
The number 5 recommended song is SOSDIHQ12A8C13C23F
----------------------------------------------------------------------
Recommendation for user id 5
The number 1 recommended song is SOLRGNF12AB0187CF4
The number 2 recommended song is SOYMIMI12AB0181E5C
The number 3 recommended song is SOQGVCS12AF72A078D
The number 4 recommended song is SODCLQR12A67AE110D
The number 5 recommended song is SOLHXCQ12A6D4F403E
----------------------------------------------------------------------
Recommendation for user id 16
The number 1 recommended song is SOADJQJ12A8C141D38
The number 2 recommended song is SODCADR12AF72A1A99
The number 3 recommended song is SOYMIMI12AB0181E5C
The number 4 recommende

## Verify:

In [None]:
df[df['user_new_index']==20].head(1)

In [21]:
kf = df.groupby('Song_Name').mean().reset_index()
kf.head(2)

display(kf[kf['Song_Name']=='SOLGPOU12A58A7EA20'])
display(kf[kf['Song_Name']=='SOPVQLJ12A67AE2281'])
display(kf[kf['Song_Name']=='SOXNZOW12AB017F756'])
display(kf[kf['Song_Name']=='SOLWZVR12AB01849C6'])
display(kf[kf['Song_Name']=='SOBADEB12AB018275F'])

Unnamed: 0,Song_Name,use_count,Followers,Posts,ER,year,Delta,Curr_ER,Comments,Likes,song_old_index,song_new_index,user_old_index,user_new_index
4402,SOLGPOU12A58A7EA20,3.596154,32710.307692,969.711538,2.5975,2004.0,1.722993,4.320493,11312.673077,60442.634615,31574.0,1765.0,4586.961538,2192.326923


Unnamed: 0,Song_Name,use_count,Followers,Posts,ER,year,Delta,Curr_ER,Comments,Likes,song_old_index,song_new_index,user_old_index,user_new_index
6190,SOPVQLJ12A67AE2281,2.703125,46486.15625,1281.0,2.367344,2000.0,1.427392,3.794736,10957.84375,65921.171875,22967.0,1100.0,3802.890625,1916.234375


Unnamed: 0,Song_Name,use_count,Followers,Posts,ER,year,Delta,Curr_ER,Comments,Likes,song_old_index,song_new_index,user_old_index,user_new_index
9025,SOXNZOW12AB017F756,2.240964,36082.036145,1315.53012,1.998795,0.0,1.483538,3.482333,8296.457831,50842.975904,10123.0,335.0,4228.915663,2250.518072


Unnamed: 0,Song_Name,use_count,Followers,Posts,ER,year,Delta,Curr_ER,Comments,Likes,song_old_index,song_new_index,user_old_index,user_new_index
4667,SOLWZVR12AB01849C6,2.634146,33336.341463,1102.817073,2.088293,2009.0,1.794056,3.882349,8808.426829,53947.585366,22411.0,1091.0,4374.878049,2183.02439


Unnamed: 0,Song_Name,use_count,Followers,Posts,ER,year,Delta,Curr_ER,Comments,Likes,song_old_index,song_new_index,user_old_index,user_new_index
389,SOBADEB12AB018275F,1.8,37622.325,1200.0125,2.180625,2009.0,1.635128,3.815753,10616.8625,69072.7125,20683.0,1056.0,4197.8125,2164.0875


## SVD in different Style

In [130]:
df1 = pd.read_excel('cleaned_data1.xlsx')
display(df1.head(2) )
df2 = pd.read_excel('cleaned_data2.xlsx')
display(df2.head(2) )

Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
0,SOAKIMP12A8C130995,1,tejaswini wagh,83789,348,“ You'll never find peace of mind until you li...,https://www.instagram.com/waghtejaswini/,9.9,The Cove,Thicker Than Water,Jack Johnson,0,-7.719191,2.180809,1827,16445,3d
1,SOAKIMP12A8C130995,1,Shounak Nayak,7151,505,UK/India All posts my own unless stated. Email:,https://www.instagram.com/shounaknayak/,3.4,The Cove,Thicker Than Water,Jack Johnson,0,1.337458,4.737458,161,967,1d


Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
0,SOPZKGR12A6D4F3F3A,1,RIDHIMA,3322,400,Wooden works by hand Shop: +998 97 543 0000 Ma...,https://www.instagram.com/ridz5014/,0.7,Red Red Wine (Edit),Original Hits - Party,UB40,2008,4.623247,5.323247,2750,13754,4w
1,SOPZKGR12A6D4F3F3A,1,,26241,912,Si vede bene solo con il cuore. L'essenziale è...,https://www.instagram.com/lady_golf_mk4/,0.3,Red Red Wine (Edit),Original Hits - Party,UB40,2008,3.376269,3.676269,9325,83928,4w


In [131]:
df = df1.append(df2, ignore_index=False)
df.tail(2)

Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,artist_name,year,Delta,Curr_ER,Comments,Likes,Time
44904,SOGCHYZ12AF72A69EC,2,Riya Deepsi,25955,515,Human being Email for any collaboration or enq...,https://www.instagram.com/riya_d_m/,3.2,That Tree (feat. Kid Cudi),That Tree Featuring Kid Cudi,Snoop Dogg featuring Kid Cudi,2010,-0.05021,3.14979,7721,38605,2w
44905,SOSSZPW12A8C13843D,20,Kashika Kapur Makeupartist,49130,1921,Dreamer of making this world a better place🙏🏻 ...,https://www.instagram.com/kashikakapurmua/,2.9,Figures,Dreams,The Whitest Boy Alive,2006,-2.33451,0.56549,92,833,1d


In [132]:
n_users = df.URL.nunique()
n_items = df.Song.nunique()

print('Num. of Users: '+ str(n_users))
print('Num of Movies: '+str(n_items))

Num. of Users: 4218
Num of Movies: 9921


* user_codes and song_codes

In [133]:
user_codes = df.URL.drop_duplicates().reset_index()
user_codes.rename(columns={'index':'user_old_index'}, inplace=True)
user_codes['user_new_index'] = list(user_codes.index)
user_codes.head(3)

Unnamed: 0,user_old_index,URL,user_new_index
0,0,https://www.instagram.com/waghtejaswini/,0
1,1,https://www.instagram.com/shounaknayak/,1
2,2,https://www.instagram.com/jdinstitute/,2


In [134]:
song_codes = df.Song.drop_duplicates().reset_index()
song_codes.rename(columns={'index':'song_old_index'}, inplace=True)
song_codes['song_new_index'] = list(song_codes.index)
song_codes.head(7)

Unnamed: 0,song_old_index,Song,song_new_index
0,0,SOAKIMP12A8C130995,0
1,9,SOBBMDR12A8C13253B,1
2,14,SOBXHDL12A81C204C0,2
3,62,SOBYHAJ12A6701BF1D,3
4,89,SODACBL12A8C13C273,4
5,134,SODDNQT12A6D4F5F7E,5
6,140,SODXRTY12AB0180F3B,6


In [135]:
df = pd.merge(df, song_codes,how='left')
df = pd.merge(df, user_codes,how='left')
df.head(3)

Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,...,year,Delta,Curr_ER,Comments,Likes,Time,song_old_index,song_new_index,user_old_index,user_new_index
0,SOAKIMP12A8C130995,1,tejaswini wagh,83789,348,“ You'll never find peace of mind until you li...,https://www.instagram.com/waghtejaswini/,9.9,The Cove,Thicker Than Water,...,0,-7.719191,2.180809,1827,16445,3d,0,0,0,0
1,SOAKIMP12A8C130995,1,Shounak Nayak,7151,505,UK/India All posts my own unless stated. Email:,https://www.instagram.com/shounaknayak/,3.4,The Cove,Thicker Than Water,...,0,1.337458,4.737458,161,967,1d,0,0,1,1
2,SOAKIMP12A8C130995,3,JD Institute of Fashion,16047,3120,"Empowering creative minds since 1988, JD insti...",https://www.instagram.com/jdinstitute/,0.5,The Cove,Thicker Than Water,...,0,0.130806,0.630806,1012,9110,4w,0,0,2,2


* spilt the dataset

In [136]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(df, test_size=0.25)

In [137]:
train_data

Unnamed: 0,Song,use_count,Name,Followers,Posts,Bio,URL,ER,title,release,...,year,Delta,Curr_ER,Comments,Likes,Time,song_old_index,song_new_index,user_old_index,user_new_index
884,SORWLTW12A670208FA,1,Always Dry India,1270,161,The last Car Coat you'll ever need !!!! Mail: ...,https://www.instagram.com/alwaysdryindia/,5.60,The Middle,Bleed American,...,2001,-3.716298,1.883702,191,1722,3w,871,28,884,825
28678,SOXEZLY12A8C137AB0,1,Serendipity Delhi,51600,2983,INDIA'S TOP 50 BEST BOUTIQUES - CNT Travel ins...,https://www.instagram.com/serendipitydelhi/,1.30,Your Hand In Mine,The Earth Is Not A Cold Dead Place,...,2003,3.664397,4.964397,12808,89657,2w,28672,1226,2575,1958
31711,SOMESIV12A6D4FC6F2,5,Rushikesh pawar,26004,45,⬛ Luxury Lifestyle Young and Hot business Man ...,https://www.instagram.com/millionboy7/,1.90,The Kindness Of Strangers,The World We Live In,...,2006,-0.439310,1.460690,3657,29261,4w,31711,1474,350,343
89181,SOZAQGS12A6D4FB4F5,3,Drink Mumbai,22028,290,Curating the city's best drinks 𝙆𝙚𝙚𝙥 𝙮𝙤𝙪𝙧 𝙝𝙚𝙖𝙙...,https://www.instagram.com/drinkmumbai/,1.00,Hollow (LP Version),Vulgar Display Of Power,...,1992,0.750042,1.750042,5782,28912,4w,39181,8298,354,347
52902,SOCWJDB12A58A776AF,2,"Vikas Sharma |Delhi, India 🇮🇳|",41561,2027,Combination of #food & #fitness Use #virus_sha...,https://www.instagram.com/virus_sharma/,1.50,Never Gonna Give You Up,Big Tunes - Back 2 The 80s,...,1987,4.631113,6.131113,2548,22933,3d,2888,3292,878,821
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55097,SOCNYYO12A6D4F910B,1,Mishti😘,3523,316,*Miss.LAD17-18#MissBeautifulHair Title Award❤️...,https://www.instagram.com/sakshi657/,0.00,Conspiracy (Album Version),All We Know Is Falling,...,0,3.833838,3.833838,450,4051,1w,5091,3524,2358,1826
2626,SOAUWYT12A81C206F1,4,Rajat Panđey 'Amît' 🇮🇳,4024,557,||Celebrity EMCEE| Compere| Voice over Artist|...,https://www.instagram.com/ksvde_rajat/,0.01,Undo,Vespertine Live,...,2001,2.427256,2.437256,326,1961,1w,2369,88,210,205
17495,SONNSYV12A8C146BEC,1,,2669,410,➴ Marketing & Communications @hyattregencyahme...,https://www.instagram.com/geevargram/,3.70,Float On,Float On,...,2003,3.846034,7.546034,1074,9667,2w,17418,779,7250,3498
5010,SOUSMXX12AB0185C24,1,Aamir_hameeda,65718,189,Facebook page aamirandhameeda partner @aamirsh...,https://www.instagram.com/aamir_hameeda/,4.30,OMG,OMG - The Remixes,...,2010,0.387174,4.687174,3422,17112,2d,4898,106,1404,1229


* create matrix

In [138]:
#Create two user-item matrices, one for training and another for testing
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    train_data_matrix[line[21]-1, line[19]-1] = line[13] 

test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_matrix[line[21]-1, line[19]-1] = line[13]

* define root mean square error

In [139]:
from sklearn.metrics import mean_squared_error
from math import sqrt
def rmse(prediction, ground_truth):
    prediction = prediction[ground_truth.nonzero()].flatten() 
    ground_truth = ground_truth[ground_truth.nonzero()].flatten()
    return sqrt(mean_squared_error(prediction, ground_truth))

* svds for prediction

In [140]:
#get SVD components from train matrix. Choose k.
u, s, vt = svds(train_data_matrix, k = 100)
s_diag_matrix=np.diag(s)
X_pred = np.dot(np.dot(u, s_diag_matrix), vt)
print('SVD MSE: ' + str(rmse(X_pred, test_data_matrix)))

User-based CF MSE: 2.8741408852921997


# End of the Project 

## Extra Stuff: Please Ignore

In [141]:
def mapk(actual, predicted, k=10):
    """
    Computes the mean average precision at k.
    This function computes the mean average prescision at k between two lists
    of lists of items.
    Parameters
    ----------
    actual : list
             A list of lists of elements that are to be predicted 
             (order doesn't matter in the lists)
    predicted : list
                A list of lists of predicted elements
                (order matters in the lists)
    k : int, optional
        The maximum number of predicted elements
    Returns
    -------
    score : double
            The mean average precision at k over the input lists
    """
    return np.mean([apk(a,p,k) for a,p in zip(actual, predicted)])

In [142]:
def apk(actual, predicted, k=10):
    """
    Computes the average precision at k.
    This function computes the average prescision at k between two lists of
    items.
    Parameters
    ----------
    actual : list
             A list of elements that are to be predicted (order doesn't matter)
    predicted : list
                A list of predicted elements (order does matter)
    k : int, optional
        The maximum number of predicted elements
    Returns
    -------
    score : double
            The average precision at k over the input lists
    """
    if len(predicted)>k:
        predicted = predicted[:k]

    score = 0.0
    num_hits = 0.0
    
    #i from [0 to 19]
    for i,p in enumerate(predicted):
        #if the current is in actual and not yet included in predicted then increase count and add it to score
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i+1.0)

    if not actual:
        return 0.0

    return score / min(len(actual), k)