## Group 6:
### AU1741001: Akash Tike
### AU1741011: Smit Mandavia
### AU1741068: Parth Maniyar
### AU1741095: Shaunak Vyas

## collaborative filtering (stochastic gradient descent for matrix factorization)

In [0]:
# libraries

import numpy as np
import random

In [0]:
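# n = 100 items and m = 500 users
n = 100
m = 500

# each user gives random ratings from 1 to 10 to randomly chosen items;
# a 0 in R means "not rated"
R = np.zeros((m, n))
for i in range(m):
    # number of rating draws for this user (0 to n//2, inclusive)
    items = random.randint(0,n//2)
    for j in range(items):
        # item indices are drawn with replacement, so a repeated draw overwrites
        # the earlier rating and the user may end up with fewer than `items` ratings
        R[i][random.randint(0,n-1)]=random.randint(1,10)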
        

print("Rating matrix R:")
print(R)
print("shape of R:",R.shape)

Rating matrix R:
[[0. 0. 0. ... 0. 0. 0.]
 [7. 0. 4. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 ...
 [9. 9. 0. ... 0. 5. 6.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 6. 0. 0.]]
shape of R: (500, 100)
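The cell above draws item indices with replacement, so a user can end up with fewer distinct ratings than were drawn. As a hedged alternative (not the method used in this notebook), the sketch below uses `random.sample` to pick distinct items. The function `make_ratings` and its parameters `max_rated` and `seed` are hypothetical names introduced only for this illustration.

```python
import random
import numpy as np

def make_ratings(m, n, max_rated, seed=None):
    """Build an m x n rating matrix where 0 means 'not rated' (illustrative helper)."""
    rng = random.Random(seed)
    R = np.zeros((m, n))
    for i in range(m):
        count = rng.randint(0, max_rated)        # inclusive on both ends
        for j in rng.sample(range(n), count):    # distinct item indices
            R[i, j] = rng.randint(1, 10)
    return R

R_demo = make_ratings(m=500, n=100, max_rated=50, seed=0)
rated_per_user = np.count_nonzero(R_demo, axis=1)
print("shape:", R_demo.shape)
print("max ratings per user:", rated_per_user.max())
```

With this variant the number of nonzero entries in each row equals the number of draws for that user, which makes the sparsity of `R` easier to reason about.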


In [0]:
# binary mask: 1 where a user has rated the item, 0 otherwise
is_rated = (R != 0)*1

print("Binary matrix to check if the item is rated by a real user or not")
print(is_rated)

Binary matrix to check if the item is rated by a real user or not
[[0 0 0 ... 0 0 0]
 [1 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 ...
 [1 1 0 ... 0 1 1]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 1 0 0]]
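The mask matters because error metrics should only count entries that were actually rated; unrated zeros would otherwise be treated as true ratings of 0. A minimal sketch of that idea follows. The helper `masked_errors` and the toy arrays are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

def masked_errors(R, prediction):
    """Return (MAE, MSE) computed over the rated (nonzero) entries of R only."""
    mask = (R != 0)
    if mask.sum() == 0:
        raise ValueError("R contains no ratings")
    diff = (R - prediction)[mask]     # errors on rated entries only
    return np.abs(diff).mean(), np.square(diff).mean()

# toy example: two rated entries, with errors 1 and -2
R_small = np.array([[5.0, 0.0],
                    [0.0, 2.0]])
pred_small = np.array([[4.0, 9.0],
                       [9.0, 4.0]])
mae, mse = masked_errors(R_small, pred_small)
print("MAE:", mae)   # 1.5
print("MSE:", mse)   # 2.5
```

The large predictions at the unrated positions (the 9s) do not affect either metric, which is the behaviour the mask is meant to guarantee.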


## algorithm for stochastic gradient descent for matrix factorization
<img src="image4.jpg">
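For reference, the update rules below are reconstructed from the training code in the next cell, so the notation in the image may differ. Here $u_i$ is column $i$ of `U`, $v_j$ is column $j$ of `V`, $\eta$ is `eta`, $\lambda$ is `lamda`, and the error symbol $e_{ij}$ is introduced only for this summary. For each rated entry $R_{ij} \neq 0$:

$$
e_{ij} = u_i^\top v_j - R_{ij}
$$

$$
u_i \leftarrow u_i - 2\eta\left(e_{ij}\, v_j + \frac{\lambda}{m}\, u_i\right),
\qquad
v_j \leftarrow v_j - 2\eta\left(e_{ij}\, u_i + \frac{\lambda}{n}\, v_j\right)
$$

These are gradient steps on one possible per-rating loss, stated here as an assumption consistent with the code:

$$
\ell_{ij}(u_i, v_j) = \left(u_i^\top v_j - R_{ij}\right)^2 + \frac{\lambda}{m}\lVert u_i\rVert^2 + \frac{\lambda}{n}\lVert v_j\rVert^2
$$

since $\partial \ell_{ij} / \partial u_i = 2\left(e_{ij}\, v_j + \frac{\lambda}{m}\, u_i\right)$, and symmetrically for $v_j$ with $\frac{\lambda}{n}$.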

In [0]:
# number of hidden (latent) features is assumed to be 6
k = 6

print("\nnumber of hidden features: k=",k)

# randomly initialized item feature matrix V (k x n) and user preference matrix U (k x m)
V = np.random.uniform( low=0,high=5, size = (k,n) )
U = np.random.uniform( low=0,high=5, size = (k,m) )

print("shape of User matrix U:",U.shape)
print("shape of Item matrix V:",V.shape) 

#learning rate
eta = 0.01

#regularization parameter to control overfitting
lamda = 30

# number of epochs (full passes over the ratings), used as the stopping criterion
epoch=50

initial_prediction = np.dot(U.T,V) 

print("\n\nParameters and hyperparameters:")
print("learning rate: eta=",eta)
print("regularization parameter: lamda=",lamda)
print("number of iterations: epoch=",epoch)


print("\n\ninitial error in prediction, before training")
print("MAE: ",np.sum(abs(R-initial_prediction)*is_rated)/np.sum(is_rated))
print("MSE: ",np.sum((np.square(R - initial_prediction)*is_rated))/np.sum(is_rated))


print("\n\ntraining model")
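for itr in range(epoch):
    # visit every rated entry once per epoch, in a fixed item-by-item order
    for j in range(n):
        for i in range(m):
            if R[i][j]!=0:
                # prediction error for this rating with the current factors
                err = np.dot(U[:,i],V[:,j])-R[i][j]

                # compute both updates from the old U[:,i] and V[:,j] before assigning
                new_Ui = U[:,i] - 2*eta*(err*V[:,j] + (lamda/m)*U[:,i])
                new_Vj = V[:,j] - 2*eta*(err*U[:,i] + (lamda/n)*V[:,j])

                U[:,i] = new_Ui
                V[:,j] = new_Vj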
                
    if (itr+1)%10==0:
        print("\nerror after epoch",itr+1)
        new_prediction = np.dot(U.T,V)
        print("MAE: ",np.sum(abs(R-new_prediction)*is_rated)/np.sum(is_rated))
        print("MSE: ",np.sum((np.square(R - new_prediction)*is_rated))/np.sum(is_rated))

        
new_prediction = np.dot(U.T,V)

print("\n\nfinal error in prediction, after training")
print("MAE: ",np.sum(abs(R-new_prediction)*is_rated)/np.sum(is_rated))
print("MSE: ",np.sum((np.square(R - new_prediction)*is_rated))/np.sum(is_rated))


number of hidden features: k= 6
shape of User matrix U: (6, 500)
shape of Item matrix V: (6, 100)


Parameters and hyperparameters:
learning rate: eta= 0.01
regularization parameter: lamda= 30
number of iterations: epoch= 50


initial error in prediction, before training
MAE:  33.24184111438905
MSE:  1316.6309888294961


training model

error after epoch 10
MAE:  2.318024850162026
MSE:  7.942661538710121

error after epoch 20
MAE:  2.1298809724415744
MSE:  6.725962836587995

error after epoch 30
MAE:  2.074547006567158
MSE:  6.422227275662345

error after epoch 40
MAE:  2.0461670968600947
MSE:  6.285359718979099

error after epoch 50
MAE:  2.028758431196917
MSE:  6.203435816071627


final error in prediction, after training
MAE:  2.028758431196917
MSE:  6.203435816071627
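Once `U` and `V` are trained, the predicted ratings can be used to suggest unrated items to a user. The sketch below is one possible way to do that, assuming the same layout as above (`U` is k x m, `V` is k x n, and 0 in `R` means "not rated"). The function `recommend_top_n`, the parameter `top_n`, and the small random matrices are hypothetical and used only for illustration.

```python
import numpy as np

def recommend_top_n(U, V, R, user, top_n=5):
    """Return up to top_n (item, predicted_rating) pairs among items the user has not rated."""
    scores = U[:, user] @ V                            # predicted rating for every item
    scores = np.where(R[user] != 0, -np.inf, scores)   # hide items already rated
    unrated = np.count_nonzero(R[user] == 0)
    order = np.argsort(-scores)[:min(top_n, unrated)]  # highest predictions first
    return [(int(j), float(scores[j])) for j in order]

# toy factors: 3 hidden features, 4 users, 6 items
rng = np.random.default_rng(0)
U_toy = rng.uniform(0, 5, size=(3, 4))
V_toy = rng.uniform(0, 5, size=(3, 6))
R_toy = np.zeros((4, 6))
R_toy[0, [1, 3]] = [7, 2]      # user 0 has already rated items 1 and 3

recs = recommend_top_n(U_toy, V_toy, R_toy, user=0, top_n=3)
for item, score in recs:
    print("item", item, "predicted rating", round(score, 2))
```

Items 1 and 3 never appear in the result for user 0, because their scores are replaced with negative infinity before sorting.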
