# Collaborative Filtering

In this CF method, I am exploring the model based CF method through matrix factorization. This will build predictions for user-item rankings 


## Preliminaries

### Imports

In [1]:
import os
import pickle

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split,KFold

import warnings
warnings.filterwarnings('ignore')

### Random Seed

In [2]:
seed=5543
np.random.seed(seed)

### Loading Data

In [3]:
#Parameterize directory path 
raw_dir="../raw/ml-1m"
raw_file="/ratings.dat"

#In CF techniques, reading only the user-item interaction data: ratings
filename=raw_dir+raw_file
data_all=pd.read_csv(filename,sep="::",header=None,names=["userId","movieId","rating","TimeStamp"])
data_all.head()

Unnamed: 0,userId,movieId,rating,TimeStamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [4]:
#Quick Data Stats
print("Dimension",data_all.shape)
print("Unique Users",data_all['userId'].nunique())
print("Unique Movie",data_all['movieId'].nunique())
print("Data Sparsity {0:.2f}".format(len(data_all)*100/(data_all['userId'].nunique()*data_all['movieId'].nunique())))

Dimension (1000209, 4)
Unique Users 6040
Unique Movie 3706
Data Sparsity 4.47


### Train Test Split

Note: Production product to treat new users/movies separately

In [5]:
#Currently, the new users or new movies are asigned at random
userEncoder=LabelEncoder()
movieEncoder=LabelEncoder()

In [6]:
# Creating arrays of users, movies and ratings
users_all=userEncoder.fit_transform(data_all[["userId"]].values.ravel())
movies_all=movieEncoder.fit_transform(data_all[["movieId"]].values.ravel())
ratings_all=data_all[["rating"]].values.ravel()

In [7]:
# Create train, validation & test datasets
users,users_test,movies,movies_test,ratings, ratings_test=train_test_split(users_all,movies_all,ratings_all,test_size=0.15)
users_train,users_val, movies_train,movies_val, ratings_train,ratings_val=train_test_split(users,movies,ratings,test_size=0.15)
print("Training Size", len(users_train))
print("Validation Size", len(users_val))
print("Test Size", len(users_test))


Training Size 722650
Validation Size 127527
Test Size 150032


In [8]:
#Create labeling and counts of unique users & movies
unique_users=userEncoder.classes_
unique_movies=movieEncoder.classes_

N_users=len(unique_users) # all users
N_movies=len(unique_movies) # all movies
print("N_users",N_users)
print("N_movies",N_movies)

N_users 6040
N_movies 3706


## Collaborative Filtering




Collaborative filtering
analyzes relationships between users and
interdependencies among products to
identify new user-item associations.
It's major appeal is that it is domain free, yet it can
address data aspects that are often elusive
and difficult to profile using content filtering. 

### Matrix Factorization


A successful approach to the prediction of the ranking matrixri,uis basedon amatrix decompositionmodel, where our ranking predictions are:

\begin{equation}
	\hat{r}_{i,u} =\mu + b_u + b_i + p_u^T q_i
\end{equation}


The model parameters θ= (μ,bu,bi,pu,qi) are defined as:

*μ  Mean rating, its the average rating of all users over all movies in our trainingset
bu User Bias, it will  be  higher  for  users  that  give  high average ratings to all movies
bi Item Bias, it will be higher for the more popular (higherranked) movies
pu User Embedding, a user F-dimensional vector that maps user u into some kind of abstract taste space
qi Item Embedding, an item F-dimensional vector that maps itemsi into the taste space

### Mean Rating

Mean rating over the training set is 
	\begin{equation}
			\mu = \frac{1}{N_\mathcal{T}}\sum_{(i,u)\in\mathcal{T}} r_{u,i}
		\end{equation}
And, we define the differential rating

\begin{equation}
    \Delta r_{u,i} = r_{u,i} - \mu
\end{equation}

In [9]:
# de-meaning the train ratings array
mu=ratings_train.mean()
drating=np.mean((ratings_train-mu)**2) 
print("overall mean rating of training data:", round(mu,2))


overall mean rating of training data: 3.58


### Parameter Initialization

We implement the following initialization
\begin{align}
	b_u^0  &\sim \mathcal{N}(0, 10^{-4}) \\
	b_i^0  &\sim \mathcal{N}(0, 10^{-4}) \\
	p_{u,f}^0  &\sim \mathcal{N}\left(0, \frac{1}{\max({1},\sqrt{F})}\right) \\
	q_{i,f}^0  &\sim \mathcal{N}\left(0, \frac{1}{\max({1},\sqrt{F})}\right) \\
\end{align}

In [10]:
def initialize_params(F,N_users,N_movies):
    b_users=np.random.normal(0,0.0001,N_users)
    b_movies=np.random.normal(0,0.0001,N_movies)
    p_users=np.random.normal(0,1/max(1,np.sqrt(F)),(N_users,F))
    p_movies=np.random.normal(0,1//max(1,np.sqrt(F)),(N_movies,F))
    return b_users,b_movies,p_users,p_movies

In [11]:
F=2

In [12]:
#initializing params for the unique users and movies
b_users,b_movies,p_users,p_movies=initialize_params(F,N_users,N_movies) 
params=[mu,b_users,b_movies,p_users,p_movies]

In [13]:
b_users.shape, b_movies.shape , p_users.shape , p_movies.shape

((6040,), (3706,), (6040, 2), (3706, 2))

### Rating Model

The model prediction for $r$ is given by
\begin{equation}
	\hat{r}_{i,u} =\mu + b_u + b_i + p_u^T q_i
\end{equation}

The interaction dot  product pu*qi drives  users  and  items  pointing  in  nearly parallel directions in the latent space towards higher ratings. The  dimension F of  the  latent  space is  a  modelhyper-parameter that will be chosen by cross validation.



```Popularity Model```
In  the  special  case F=0,  where  there  is  no  interaction  term,  the relative ranking of items are the same for all users ru,i−ru,i′=bi−bi′ and we say that items are ranked by popularity.

```Personalized Model```
A recommender system that wishes to providepersonalized rankings instead of just suggesting the same items to all users needs an interaction term, because the preferences of each user for particular kinds of items is encoded exclusively on the interaction term.

In [14]:
def predict_rating(users,movies,params):
    mu,b_users,b_movies,p_users,p_movies=params 
    b_u=b_users[users] # is label encoding at work here? as the dim are of mismatch due to subsetting
    b_m=b_movies[movies]
    p_u=p_users[users]
    p_m=p_movies[movies]
    prod=np.sum(p_u*p_m,axis=1)
    r_hat=mu+b_u+b_m+prod
    return r_hat

In [15]:
R=predict_rating(users_train,movies_train,params)
R[:10]

array([3.5811178 , 3.58137423, 3.58115786, 3.58146872, 3.58122582,
       3.58112864, 3.58093923, 3.58123367, 3.58094814, 3.58119053])

### Evaluation Metric: MSE

The error function is 
\begin{equation}
	L(\theta;\{r\}) = \frac{1}{N_\mathcal{S}} \sum_{{u,i}\in \mathcal{S}} \left( r_{u,i} - \hat{r}_{u,i}\right)^2
\end{equation}


Assess performance using the mean square error between observed rankings ru,i and the predicted rankings ˆru,i

For a test set S of observations (u,i) that have not been use during model training. NS is the number of elements of S, and θ is a vector of model parametersthat we will have to learn from the pairs (u,i) in the training set T.

In [16]:
def rating_error(users,movies,rating,params0):
    dr=rating-predict_rating(users,movies,params0)
    return np.mean(dr**2)

In [17]:
rating_error(users_train,movies_train,ratings_train,params)

1.248742966734941

In [18]:
batch_size = 10
N = 100
for i1 in range(0,N,batch_size):
    print(i1)

0
10
20
30
40
50
60
70
80
90


### Loss Function

The loss function is 
\begin{equation}
	L(\ b_u,b_i,p_u,q_i;\{r\}) = \frac{1}{N_\mathcal{S}} \sum_{{u,i}\in \mathcal{S}} \left( r_{u,i} - \hat{r}_{u,i}\right)^2 + \frac{λ}{2}(p_u^2 +p_q^2)
\end{equation}

We can write the mean square error loss function expliclity in terms of the model parameters, including a regularization penalty λ


where, for convenience, we have assumed no regularization penalty for the biases

### Learning Step

We will implement a step of stochastic gradient descent as:
\begin{equation}
θ_{t+1}=θ_{t}−\frac{γ∂L}{∂θ}
\end{equation}


We  can  learn  the  model  parametersθ=  (bu,bi,pu,qi)  by  gradient  descent. Assuming a learning rate γ we have

\begin{align}
	b_u & \leftarrow  b_u +  \gamma \Delta r_{u,i} \\
	b_i &\leftarrow b_i + \gamma \Delta r_{u,i} \\
	p_u & \leftarrow p_u + \gamma\left(q_{i}\Delta r_{u,i} - \lambda p_u\right) \\
	q_i &\leftarrow q_i + \gamma \left(p_{u}\Delta r_{u,i} - \lambda q_u \right)  
	\label{eq:step}
\end{align}


In [19]:
def learning_step(user,movie,rating,parms0,penalty,batch_size, learning_rate):
    N=len(rating)
    mu,b_users,b_movies,p_users,p_movies=parms0
    perm=np.random.permutation(len(rating)) # shuffling the order
    for i1 in range(0,N,batch_size):
        idx=perm[i1:i1+batch_size]  # defining the batches    
        u=user[idx] # creating batches of user, movie & corresponding ratings
        m=movie[idx]
        r=rating[idx]
        # running on batches
        b_u=b_users[u] 
        b_m=b_movies[m]
        p_u=p_users[u]
        p_m=p_movies[m]
        prod=np.sum(p_u*p_m,axis=1)
        r_hat=mu+b_u+b_m+prod
        dr=r-r_hat
        # all parameters being updated for the batch 
        b_users[u] +=learning_rate*(dr) 
        b_movies[m]+=learning_rate*(dr) 
        p_users[u] +=learning_rate*(dr[:,np.newaxis]*p_m-penalty*p_u) 
        p_movies[m]+=learning_rate*(dr[:,np.newaxis]*p_u-penalty*p_m) 
    return 

### Training Function

Given the hyperparameters we just train for a fixed number of epochs


In [20]:
def fit_ratings(users_train,movies_train,ratings_train,users_val,movies_val,ratings_val,
                F,learning_rate,penalty,steps,batch_size):
    
    mu=ratings_train.mean()
    b_users,b_movies,p_users,p_movies=initialize_params(F,N_users,N_movies)
    parms=[mu,b_users,b_movies,p_users,p_movies]
    for i1 in range(steps):  # for each iteration 
        # calculate loss
        loss=rating_error(users_train,movies_train,ratings_train,parms)  
        # calculate gradient : this step updates the param
        learning_step(users_train,movies_train,ratings_train,parms,penalty,batch_size,learning_rate)
        if i1 % (steps//10)==0:
            val_loss=rating_error(users_val,movies_val,ratings_val,parms)
            print("\t",i1,loss,val_loss)
    loss=rating_error(users_train,movies_train,ratings_train,parms)
    val_loss=rating_error(users_val,movies_val,ratings_val,parms)
    print("\tFinal",loss,val_loss)
    return val_loss,parms

### Model Hyper-parameters

In [21]:
learning_rate=0.005
penalty=0.1
batch_size=50
steps=200

### Train Popularity Model

Here we asume the embedding spaceis not present $F=0$

In [22]:
loss,params=fit_ratings(users_train,movies_train,ratings_train,
            users_val,  movies_val, ratings_val,
           0,learning_rate,penalty,10,batch_size
           )

	 0 1.248746642725307 0.8909740526539852
	 1 0.8789293209282619 0.8551414073649105
	 2 0.8413121945710145 0.8422269447426097
	 3 0.827094512576697 0.8356778554860488
	 4 0.8199053667259332 0.831931600744101
	 5 0.8155451055453977 0.8298184162152527
	 6 0.8130133859324009 0.8284141083235405
	 7 0.8111815395205997 0.8274597323768608
	 8 0.8099879740889979 0.8269773267236272
	 9 0.8090191069252736 0.8264105330218547
	Final 0.8081858378145389 0.8264105330218547


### Train Model with interaction Term

Here we asume the embedding space has dimension $F=2$.

In [23]:
loss,params=fit_ratings(users_train,movies_train,ratings_train,
            users_val,  movies_val, ratings_val,
           F,learning_rate,penalty,steps,batch_size
           )

	 0 1.2487413272977004 0.8919647278968952
	 20 0.8005053824001939 0.8255740981556862
	 40 0.7887831455433789 0.8150789791241544
	 60 0.758732024001011 0.7911134361845616
	 80 0.7444282606397932 0.781016275974047
	 100 0.737458723836546 0.7765548494867625
	 120 0.7336375008227668 0.7746195277755272
	 140 0.7312443829111536 0.7729298978347575
	 160 0.7299435936606753 0.7722617994295322
	 180 0.7288682180958788 0.7721030687230466
	Final 0.7282453362354298 0.7719968017142065


### Colaborative Filter Model

We graph the hyperparameters, training and prediction in a single model for ease of use.

In [25]:
class Recommender:
    def __init__(self,F,penalty,learning_rate,steps,batch_size):
        self.F=F
        self.penalty=penalty
        self.learning_rate=learning_rate
        self.steps=steps
        self.batch_size=batch_size
    def fit(self,users,movies,ratings, users_val,movies_val,ratings_val):
        print(self.learning_rate)
        loss,params=fit_ratings(users,movies,ratings,
                                users_val,movies_val,ratings_val,
                                self.F,self.learning_rate,self.penalty,self.steps,self.batch_size
                               )
        self.params=params
        return loss
    def predict(users,movies):
        return predict_ratings(users,movies,self.params)
        

In [26]:
model=Recommender(F=5,penalty=0.1,learning_rate=0.05,steps=10,batch_size=50)

model.fit(users_train,movies_train,ratings_train,users_val,movies_val,ratings_val)

0.05
	 0 1.2487469417709074 0.862490541990662
	 1 0.8372222075003591 0.8541518090455639
	 2 0.8289942646731444 0.841746795804403
	 3 0.8119672643828387 0.8255759124791926
	 4 0.7958220560133544 0.8199763143835254
	 5 0.7877187927183887 0.8161607958539047
	 6 0.7804145261714563 0.8133187552081782
	 7 0.7726392392335325 0.8091079521261225
	 8 0.764157733836076 0.806333153899414
	 9 0.7594965349311226 0.8055844535672791
	Final 0.7554680341329729 0.8055844535672791


0.8055844535672791

## Parameter Search

We do a grid search using a single validation set to find the range of penalties and embedding dimension that seems to perform best

In [None]:
results=[]
best_loss=1e10
best_F=None
best_penalty=None

if True:
 for F in [1,5,10,20,30,50,100,150]:
    for penalty in [0,0.01,0.05,0.1,0.15,0.2,1]:
        print()
        print(f"F {F}, penalty {penalty} :")
        model=Recommender(F,penalty,learning_rate,steps,batch_size)
        loss=model.fit(users_train,movies_train,ratings_train,
                               users_val,movies_val,ratings_val)
        results.append((F,penalty,loss))
        if loss<best_loss:
            best_loss=loss
            best_F=F
            best_penalty=penalty
        print()
        print(f"==> {F},{penalty},{loss} == best ({best_F},{best_penalty},{best_loss}) =============")


F 1, penalty 0 :
0.005
	 0 2.2344801906517 0.959256220584027
	 20 0.8002396962671804 0.8312209200203222
	 40 0.7803912003181386 0.8140237061299194
	 60 0.7595285194941104 0.794964023337011
	 80 0.7492251876246103 0.7877851508586758
	 100 0.7443787589575674 0.7849951256437588
	 120 0.7419771298225459 0.7833840405070284
	 140 0.7405735156601039 0.7831632036380954
	 160 0.7397415128263173 0.7827135662563025
	 180 0.7393891671239669 0.7828090357866355
	Final 0.7389985939702544 0.783025690314756


F 1, penalty 0.01 :
0.005
	 0 2.1727577648255756 0.9619170246177703
	 20 0.799817323624675 0.8314801540021648
	 40 0.7744193903725163 0.8087748595117703
	 60 0.7539163386299534 0.7913928340905486
	 80 0.745699449495427 0.7857732181669718
	 100 0.7423456279385277 0.7836465732214088
	 120 0.7406458363490719 0.7829601330529306
	 140 0.739913160501363 0.7825950475279763
	 160 0.7393758644733875 0.7824914394617295
	 180 0.7391145232808307 0.7824549477446708
	Final 0.7389318797554232 0.782245544238934


	 20 0.6946459172164565 0.8059196806693047
	 40 0.6178624498328793 0.7877722701764298
	 60 0.5851009917881986 0.785602886001359
	 80 0.5682867103908575 0.7880250157092928
	 100 0.5586461648801608 0.7910344514433932
	 120 0.5523904894296855 0.79515608307838
	 140 0.5481496991741914 0.7987258318116642
	 160 0.5451225068111829 0.8024894688294515
	 180 0.5427293886802181 0.8066435594603161
	Final 0.5409430577934915 0.8098907319819626


F 10, penalty 0.01 :
0.005
	 0 1.2487413560250222 0.891063950697708
	 20 0.6950510096497438 0.7948938380044696
	 40 0.6146140700633023 0.7760206381607712
	 60 0.583601459746858 0.7734589120928691
	 80 0.5687043496305516 0.7748738339199879
	 100 0.560034358370441 0.7779814149876839
	 120 0.554586915595641 0.7804912043460952
	 140 0.550685478572145 0.7828372783959465
	 160 0.5478774390083434 0.7855160850725289
	 180 0.5456484844098666 0.7875930298983886
	Final 0.5438713206257556 0.7895522686935582


F 10, penalty 0.05 :
0.005
	 0 1.2487444407401112 0.891164793

	 40 0.4428035244004919 0.8766881417649168
	 60 0.3981245577373359 0.9187884261731454
	 80 0.3759688516452823 0.9530717618113255
	 100 0.3626812795005577 0.9832192241437021
	 120 0.35373335097595715 1.0091397575345704
	 140 0.34719719761503803 1.0333565597768315
	 160 0.3422753898100813 1.0548251505823083
	 180 0.3383466906082895 1.074290908788388
	Final 0.33514100322162094 1.0920977661389086


F 30, penalty 0.01 :
0.005
	 0 1.2487399806463462 0.8905193005811562
	 20 0.58305582096466 0.7943801435677353
	 40 0.4535569069469815 0.8225152134017165
	 60 0.40773785047171446 0.8504066700599652
	 80 0.38534176731043573 0.8724005904314591
	 100 0.37202087030725695 0.8892364430771884
	 120 0.3631411550199517 0.9037442831317904
	 140 0.35671681195685345 0.9150977350921367
	 160 0.35192093900623833 0.9252738651925596
	 180 0.34816515374816503 0.9339123262631603
	Final 0.3450455189007698 0.9412031657122283


F 30, penalty 0.05 :
0.005
	 0 1.248746408122378 0.8905333583827507
	 20 0.693830731407255

In [35]:
fits=pd.DataFrame(results,columns=["F","penalty","val_rms"])
fits.head()

Unnamed: 0,F,penalty,val_rms
0,1,0.0,0.782696
1,1,0.01,0.782128
2,1,0.05,0.781258
3,1,0.1,0.787065
4,1,0.15,0.798355


In [36]:
summary=pd.pivot_table(fits,index="F",columns="penalty",values="val_rms")
summary

penalty,0.0,0.01,0.05,0.1,0.15,0.2,1.0
F,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,0.782696,0.782128,0.781258,0.787065,0.798355,0.815372,0.825193
5,0.764203,0.758633,0.741395,0.754068,0.788218,0.813963,0.825245
10,0.813056,0.787762,0.737906,0.747374,0.786918,0.813867,0.825278
20,0.952808,0.870673,0.749341,0.744736,0.787138,0.813556,0.82529
30,1.083229,0.935764,0.754461,0.742779,0.786446,0.813682,0.825612
50,1.302623,1.032195,0.754754,0.741674,0.786159,0.813525,0.825358
100,1.582101,1.053754,0.748517,0.741486,0.786485,0.813546,0.825746
150,1.522999,0.979739,0.738657,0.741149,0.786583,0.813672,0.825255


In [37]:
best=fits.iloc[fits["val_rms"].idxmin()]
best

F          10.000000
penalty     0.050000
val_rms     0.737906
Name: 16, dtype: float64

In [38]:
import seaborn as sns

cm = sns.light_palette("#60FF60", reverse=True, as_cmap=True)
s = summary.style.highlight_min(axis=None)
s

penalty,0.0,0.01,0.05,0.1,0.15,0.2,1.0
F,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,0.782696,0.782128,0.781258,0.787065,0.798355,0.815372,0.825193
5,0.764203,0.758633,0.741395,0.754068,0.788218,0.813963,0.825245
10,0.813056,0.787762,0.737906,0.747374,0.786918,0.813867,0.825278
20,0.952808,0.870673,0.749341,0.744736,0.787138,0.813556,0.82529
30,1.08323,0.935764,0.754461,0.742779,0.786446,0.813682,0.825612
50,1.30262,1.0322,0.754754,0.741674,0.786159,0.813525,0.825358
100,1.5821,1.05375,0.748517,0.741486,0.786485,0.813546,0.825746
150,1.523,0.979739,0.738657,0.741149,0.786583,0.813672,0.825255


In [39]:
F_best=int(best["F"])
penalty_best=best["penalty"]
F_best,penalty_best

(10, 0.05)

## Cross Validation

It seems $F\approx 10$ with penalty around 0.05 gives best results.

We use 5-Fold cross validation to find the optimal value of $F$

In [40]:
K=5
kfold=KFold(K,shuffle=True)
folds=[]
for fold in kfold.split(users):
    folds.append(fold)

In [41]:
def ratings_cross_validate(model,users,movies,ratings,folds):
    accuracies=[]
    count=0
    for train,val in folds:
        print()
        print("============= Fold",count+1,"===========")
        users_train=users[train]
        movies_train=movies[train]
        ratings_train=ratings[train]
        users_val=users[val]
        movies_val=movies[val]
        ratings_val=ratings[val]
        loss=model.fit(users_train,movies_train,ratings_train,
                                   users_val,movies_val,ratings_val)
        
        accuracies.append(loss)
        print("======= fold",count+1,"loss",loss,"============")
        print()
        count+=1
    accuracies=np.array(accuracies)
    return accuracies.mean()

In [58]:
results=[]
best_loss=1e10
best_F=None
steps=100
if True:
  for F in [5,10,15]: 
        print()
        print(f"F {F} :")
        model=Recommender(F,best_penalty,learning_rate,steps,batch_size)
        loss=ratings_cross_validate(model,users,movies,ratings,
                               folds)
        results.append((F,loss))
        if loss<best_loss:
            best_loss=loss
            best_F=F
        print()
        print(f"==> {F},{penalty},{loss} == best ({best_F},{best_penalty},{best_loss}) =============")


F 5 :

	 0 1.2485533530336144 0.8932260820557525
	 10 0.792153645151391 0.8284441461561335
	 20 0.7596851350083867 0.8073201016757684
	 30 0.7291479115139694 0.7905257533022896
	 40 0.7057719194647919 0.7788425942450067
	 50 0.6892993059143258 0.7706883615736058
	 60 0.6777306704150264 0.7652558488252055
	 70 0.6692703522347676 0.7611975269798796
	 80 0.6627616923298618 0.7585839984881941
	 90 0.6580722714403028 0.7560365320775337
	Final 0.6544426100756603 0.754407832790789


	 0 1.2478524832474827 0.8962957102144979
	 10 0.7922599406899052 0.8333267782960897
	 20 0.7670963705646401 0.8174016801086219
	 30 0.7343180194957236 0.7978551599139871
	 40 0.7100195764880174 0.7845717707513513
	 50 0.6914439061011414 0.7748161832518049
	 60 0.6778810060704996 0.7676659654676805
	 70 0.6686360374904579 0.763427531750924
	 80 0.6619130533879918 0.760341507843067
	 90 0.6572000510507808 0.758578890886326
	Final 0.6536456243057528 0.757479807183501


	 0 1.2492642792078534 0.8947221627768374
	 10

	 50 0.6009716181301853 0.7580625474067376
	 60 0.582379256415953 0.7558669642638263
	 70 0.5692704228510136 0.7543656458231661
	 80 0.5596501254639026 0.7537558141946534
	 90 0.5524028808630612 0.7537394345606137
	Final 0.5466941424695451 0.7537372173462676




## Test Best model

Seems best model is really $F=10$ with penalty $0.02$, so we test performance on the test set

In [59]:
model=Recommender(best_F,penalty_best,learning_rate,steps,batch_size)
loss=model.fit(users,movies,ratings,users_test,movies_test,ratings_test)
print(best_F,penalty_best,loss)

	 0 1.2487151147154587 0.8734996712312967
	 10 0.7854459353697039 0.8168954476243606
	 20 0.7298827082827767 0.7789219089035535
	 30 0.6857647008788721 0.7546617468115456
	 40 0.6565707493132016 0.7414681641613389
	 50 0.6377613938778485 0.7344088473281033
	 60 0.6254117611629383 0.7306500346148762
	 70 0.6170301391922364 0.7286495844998203
	 80 0.611048728955314 0.7274774636837298
	 90 0.606653455554716 0.7263877950958051
	Final 0.6032863549775624 0.7261302778562422
10 0.05 0.7261302778562422


In [60]:
loss

0.7261302778562422

We have achieved a $\approx 0.73$ mean square error, a 12% improvement in accuracy over the 0.83 mean square error of the popularity model. 