# Recommendation with User Ratings (Explicit Feedback) 
In this part, we focus on the rating prediction recommendation task with explicit feedback. We will:

* load and process the MovieLens 1M dataset, 
* build a baseline estimation model,
* build a collaborative filtering model,
* build a matrix factorization model,
* and try to improve upon these models.

First, we need to load and preprocess the dataset. This homework uses the MovieLens 1M data from https://grouplens.org/datasets/movielens/1m/. The code is provided in the following cells; you only need to run it. It produces two variables: train_mat, a numpy array of size (#users, #items) holding the training data, where non-zero entries are user-item ratings and zero entries are unknown ratings; and test_mat, a numpy array of the same size holding the testing data.

In [None]:
# if you're using colab, this is a clunky way to load the ratings.dat file we need
# navigate in your finder to ratings.dat when asked
from google.colab import files
uploaded = files.upload()

Saving ratings (1).dat to ratings (1).dat


In [None]:
import io
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix

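# A multi-character separator such as '::' is only supported by the Python parser engine,
# so request it explicitly instead of relying on pandas' fallback (which emits a ParserWarning)
#data_df = pd.read_csv(io.BytesIO(uploaded['ratings (1).dat']), sep='::', engine='python', names=["UserID", "MovieID", "Rating", "Timestamp"])

# if you are running this notebook locally, you can replace above with something like this:
data_df = pd.read_csv('./ratings.dat', sep='::', engine='python', names=["UserID", "MovieID", "Rating", "Timestamp"])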

# First, generate dictionaries for mapping old id to new id for users and movies
unique_MovieID = data_df['MovieID'].unique()
unique_UserID = data_df['UserID'].unique()
j = 0
user_old2new_id_dict = dict()
for u in unique_UserID:
    user_old2new_id_dict[u] = j
    j += 1
j = 0
movie_old2new_id_dict = dict()
for i in unique_MovieID:
    movie_old2new_id_dict[i] = j
    j += 1
    
# Then, use the generated dictionaries to reindex UserID and MovieID in the data_df
data_df['UserID'] = data_df['UserID'].map(user_old2new_id_dict)
data_df['MovieID'] = data_df['MovieID'].map(movie_old2new_id_dict)

# generate train_df with about 70% of the samples and test_df with the remaining 30%; the two sets do not overlap
train_index = np.random.random(len(data_df)) <= 0.7
train_df = data_df[train_index]
test_df = data_df[~train_index]

# generate train_mat and test_mat
num_user = len(data_df['UserID'].unique())
num_movie = len(data_df['MovieID'].unique())

train_mat = coo_matrix((train_df['Rating'].values, (train_df['UserID'].values, train_df['MovieID'].values)), shape=(num_user, num_movie)).astype(float).toarray()
test_mat = coo_matrix((test_df['Rating'].values, (test_df['UserID'].values, test_df['MovieID'].values)), shape=(num_user, num_movie)).astype(float).toarray()



In [None]:
x = [i for i in test_mat[0] if i!=0]
print(x)

[5.0, 3.0, 4.0, 5.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0]


In [None]:
len(train_mat),len(train_mat[0])

(6040, 3706)

In [None]:
len(test_mat),len(test_mat[0])

(6040, 3706)

In [None]:
train_mat == test_mat

array([[False, False, False, ...,  True,  True,  True],
       [False,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       ...,
       [ True,  True,  True, ...,  True,  True,  True],
       [ True, False, False, ...,  True,  True,  True],
       [False,  True,  True, ...,  True,  True,  True]])

## Build the Baseline Estimation Model

First, let's implement a simple personalized recommendation model -- the baseline estimate -- introduced in class: $b_{u,i}=\mu+b_i+b_u$, where $\mu$ is the overall mean of all ratings in train_mat, $b_u=\bar{r}_u-\mu$ is the average rating of user $u$ minus $\mu$, and $b_i=\bar{r}_i-\mu$ is the average rating of item $i$ minus $\mu$. Store your prediction as a numpy array variable 'prediction_mat' of size (#users, #movies) with each entry showing the predicted rating for the corresponding user-movie pair.

* Hint: for users who do not have ratings in train_mat, set $b_u=0$ for them; and for movies which do not have ratings in train_mat, set $b_i=0$ for them
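
As a rough illustration of the formula and the hint above, here is a minimal vectorized sketch on a toy matrix. The function name `baseline_predict` and the toy data are made up for this example; it assumes, as in train_mat, that 0 marks an unknown rating.

```python
import numpy as np

def baseline_predict(ratings):
    """Return the (#users, #items) matrix of mu + b_u + b_i (0 = unknown rating)."""
    rated = ratings > 0
    mu = ratings[rated].mean()            # overall mean of the observed ratings
    user_cnt = rated.sum(axis=1)
    item_cnt = rated.sum(axis=0)
    # Mean observed rating per user / item; fall back to mu when there are no
    # ratings, which makes the corresponding bias exactly 0 (the hint).
    user_mean = np.divide(ratings.sum(axis=1), user_cnt,
                          out=np.full(len(user_cnt), mu), where=user_cnt > 0)
    item_mean = np.divide(ratings.sum(axis=0), item_cnt,
                          out=np.full(len(item_cnt), mu), where=item_cnt > 0)
    b_u = user_mean - mu
    b_i = item_mean - mu
    # Broadcasting builds the full prediction matrix without explicit loops
    return mu + b_u[:, None] + b_i[None, :]

toy = np.array([[5., 3., 0.],
                [4., 0., 0.],
                [0., 1., 0.]])
pred = baseline_predict(toy)
print(pred)
```

In the toy matrix the third movie has no ratings, so its bias is 0 and its predictions reduce to $\mu+b_u$.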

In [None]:
# calculate the prediction_mat by the baseline estimation recommendation algorithm
# Your Code Here...
user_sum = train_mat.sum(axis=1)
movie_sum = train_mat.sum(axis=0)

In [None]:
matrix_sum = train_mat.sum()
global_average = matrix_sum/len(train_mat.nonzero()[0])
global_average

3.5811211716986286

In [None]:
prediction_mat = train_mat.copy()
prediction_mat[prediction_mat==0] = np.nan
user_average = np.nanmean(prediction_mat,axis=1)
movie_average = np.nanmean(prediction_mat,axis=0)



In [None]:
print('User average:{}\nMovie average:{}'.format(user_average,movie_average))

User average:[4.17647059 3.65625    3.86486486 ... 3.76470588 3.80898876 3.54693878]
Movie average:[4.39899413 3.50285714 4.14058957 ... 1.                nan        nan]


In [None]:
b_i = movie_average - global_average
b_u = user_average - global_average
print('b_i:{}\nb_u:{}'.format(b_i,b_u))

b_i:[ 0.81787296 -0.07826403  0.5594684  ... -2.58112117         nan
         nan]
b_u:[ 0.59534942  0.07512883  0.28374369 ...  0.18358471  0.22786759
 -0.0341824 ]


In [None]:
#Setting users without rating and items without rating to 0 as suggested in the hint
b_i[np.isnan(b_i)]=0
b_u[np.isnan(b_u)]=0

In [None]:
# Broadcasting adds b_u down the rows and b_i across the columns in one step
prediction_mat = global_average + b_u[:, None] + b_i[None, :]

print(prediction_mat)

[[5.04349033 4.10612182 4.76316276 ... 1.63405478 5.63405478 4.21621622]
 [4.47013126 3.53276275 4.18980369 ... 1.06069571 5.06069571 3.64285714]
 [4.57727412 3.6399056  4.29694655 ... 1.16783857 5.16783857 3.75      ]
 ...
 [4.95227412 4.0149056  4.67194655 ... 1.54283857 5.54283857 4.125     ]
 [4.69312778 3.75575926 4.4128002  ... 1.28369222 5.28369222 3.86585366]
 [4.39147645 3.45410794 4.11114888 ... 0.9820409  4.9820409  3.56420233]]


Now, with this prediction_mat based on the baseline estimate, let's calculate the RMSE to evaluate the quality of the baseline estimate model. Please print out the RMSE of your prediction_mat using test_mat in the next cell.
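
One possible way to compute such an RMSE is sketched below. The helper name `masked_rmse` and the toy arrays are assumptions for illustration only; the key point is that only the rated (non-zero) entries of the test matrix contribute to the error.

```python
import numpy as np

def masked_rmse(prediction, truth):
    """RMSE over the entries where `truth` holds a rating (non-zero)."""
    rated = truth != 0
    diff = prediction[rated] - truth[rated]
    return float(np.sqrt(np.mean(diff ** 2)))

truth = np.array([[5., 0.],
                  [0., 3.]])
guess = np.array([[4., 2.],
                  [1., 3.]])
# Only (0, 0) and (1, 1) are rated: errors are 1 and 0
score = masked_rmse(guess, truth)
print(score)
```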


In [None]:
# calculate and print out the RMSE for your prediction_mat and the test_mat
# Your Code Here...
import math

# Only the rated (non-zero) entries of test_mat count towards the error
test_rated = test_mat != 0
squared_error = ((test_mat - prediction_mat)[test_rated] ** 2).sum()
rmse_error = math.sqrt(squared_error / test_rated.sum())

In [None]:
print('RMSE of prediction_mat using test_mat: {}'.format(rmse_error))

RMSE of prediction_mat using test_mat: 0.9344457404253174


# Collaborative Filtering with Jaccard Similarity

In this part, you need to build a collaborative filtering recommendation model with **Jaccard similarity** to predict user-movie ratings. 

The prediction of the score for a user-item pair $(u,i)$ should use the formulation: $p_{u,i}=\bar{r}_u+\frac{\sum_{u^\prime\in N}s(u,u^\prime)(r_{u^\prime,i}-\bar{r}_{u^\prime})}{\sum_{u^\prime\in N}|s(u, u^\prime)|}$ as introduced in class, where $s(u, u^\prime)$ is the Jaccard similarity. We set the size of $N$ as 10.

In the next cell, you need to write your code to implement this algorithm, and generate a numpy array variable named 'prediction_mat' of size (#user, #movie) with each entry showing the predicted rating for the corresponding user-movie pair.

* Hint: when you find the nearest neighbor set $N$ of a user $u$, do not include user $u$ in $N$  
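
The formula and hint above can be sketched on the CPU with NumPy as follows. This is only one reading of the formulation, not the required solution: the name `jaccard_cf_predict` is hypothetical, the Jaccard similarity is taken between the sets of movies two users rated, and it is assumed that only neighbors who actually rated movie $i$ contribute to $p_{u,i}$ (when none did, the prediction falls back to $\bar{r}_u$).

```python
import numpy as np

def jaccard_cf_predict(ratings, n_neighbors=10):
    """User-user CF with Jaccard similarity; 0 marks an unknown rating."""
    rated = (ratings > 0).astype(float)
    n_users = ratings.shape[0]
    # |A ∩ B| via a matrix product, |A ∪ B| = |A| + |B| - |A ∩ B|
    inter = rated @ rated.T
    counts = rated.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    sim = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(sim, -1.0)           # never pick a user as its own neighbor

    user_mean = np.divide(ratings.sum(axis=1), counts,
                          out=np.zeros_like(counts), where=counts > 0)
    # Deviation from the neighbor's own mean, zeroed where nothing was rated
    deviation = (ratings - user_mean[:, None]) * rated

    k = min(n_neighbors, n_users - 1)
    pred = np.empty_like(ratings, dtype=float)
    for u in range(n_users):
        nbrs = np.argsort(sim[u])[::-1][:k]   # k most similar users
        w = sim[u, nbrs]
        num = w @ deviation[nbrs]             # sum of s(u,u') * (r_{u',i} - mean_{u'})
        den = np.abs(w) @ rated[nbrs]         # sum of |s(u,u')| over raters of i
        pred[u] = user_mean[u] + np.divide(num, den,
                                           out=np.zeros_like(num), where=den > 0)
    return pred

toy = np.array([[5., 3., 0., 0.],
                [4., 0., 2., 0.],
                [0., 0., 4., 5.]])
pred = jaccard_cf_predict(toy, n_neighbors=1)
print(pred)
```

The matrix-product form computes all pairwise similarities at once, which avoids the per-row loop at the cost of holding a (#users, #users) matrix in memory.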


In [None]:
# calculate the prediction_mat by your user-user collaborative filtering recommendation algorithm
# Your Code Here...

#Pre-computation
# Printing the whole set of rows exceeds the notebook's output rate limit, so only its size is printed
train_hashable = map(tuple, train_mat)
y = set(train_hashable)
print(len(y))



Please print out the RMSE of your prediction_mat using test_mat in the next cell.

In [None]:
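# calculate and print out the RMSE for your prediction_mat and the test_mat
# Your Code Here...
import torch
import math
# The cells below move tensors to "cuda", so a CUDA-capable GPU is required
assert torch.cuda.is_available(), "this implementation needs a CUDA device"
cuda = torch.device('cuda')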

In [None]:

torch.cuda.synchronize()
torch.cuda.empty_cache()

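def jaccard_similarity(a,b):
  # a: ratings of one user, shape (#movies,); b: ratings of all users, shape (#users, #movies)
  # A movie is in the intersection if both users rated it (product != 0)
  # and in the union if at least one of them did (sum != 0, since ratings are non-negative)
  intersect = torch.mul(a,b)
  union = a.add(b)
  return (intersect!=0).sum(1)/(union!=0).sum(1)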

In [None]:
# Vectorising each row of the matrix : train_mat[i] and multiplying with complete_matrix of dimension mxn
def get_jaccard_matrix_s():
    jaccard_matrix=[]
    complete_matrix = torch.Tensor(train_mat).to("cuda")

    for i in range(train_mat.shape[0]):
        jaccard_matrix.append(jaccard_similarity(torch.Tensor(train_mat[i]).to("cuda") ,complete_matrix))

    jaccard_matrix = torch.stack(jaccard_matrix).to ("cuda")
    return jaccard_matrix

In [None]:
jaccard_matrix = get_jaccard_matrix_s()
print(jaccard_matrix)
sorted, sorted_indices = torch.sort(jaccard_matrix)

tensor([[1.0000, 0.0236, 0.0441,  ..., 0.0000, 0.0885, 0.0333],
        [0.0236, 1.0000, 0.0472,  ..., 0.0089, 0.0221, 0.0723],
        [0.0441, 0.0472, 1.0000,  ..., 0.0385, 0.0413, 0.0330],
        ...,
        [0.0000, 0.0089, 0.0385,  ..., 1.0000, 0.0392, 0.0397],
        [0.0885, 0.0221, 0.0413,  ..., 0.0392, 1.0000, 0.0705],
        [0.0333, 0.0723, 0.0330,  ..., 0.0397, 0.0705, 1.0000]],
       device='cuda:0')


In [None]:
#Getting the prediction_matrix based on the formula
def runPrediction(top_jaccard_val,top_idx):
    train_mat_tensor = torch.Tensor(train_mat).to('cuda')
    # Mean rating of each user over rated movies only (zeros are treated as missing)
    train_mat_tensor_copy = train_mat_tensor.clone().detach()
    train_mat_tensor_copy[train_mat_tensor_copy == 0] = torch.nan
    userMeans = torch.nanmean(train_mat_tensor_copy, axis=1)
    b_user = userMeans
    
    prediction_matrix = torch.Tensor(train_mat.copy()).to("cuda")
    top_jaccard_val=top_jaccard_val.to("cpu")
    for i in range(train_mat.shape[0]):
        idxs=top_idx[i].to("cpu")
        jacSimVal=torch.Tensor(top_jaccard_val[i]).to("cuda")
        sub_n=[]      # similarity-weighted rating deviations, one row per neighbor
        jaccard_n=[]  # similarities, kept only where the neighbor rated the movie
        for dim1,dim2 in enumerate(idxs):
            r_u=torch.Tensor(train_mat[dim2,:]).to("cuda")
            ru_i=r_u.clone().detach()
            ru_i[ru_i>0]=1  # indicator of the movies this neighbor rated
            subt=r_u-b_user[dim2]
            subt=torch.mul(subt,ru_i)
            subt_nz=torch.mul(subt,jacSimVal[dim1])
            sub_n.append(subt_nz)
            jaccard_n.append(jacSimVal[dim1]*ru_i)

        sub_n=torch.sum(torch.stack(sub_n),0)
        jaccard_n=torch.sum(torch.stack(jaccard_n),0)

        # Weighted average deviation; 0/0 gives NaN where no neighbor rated the movie.
        # The user's own mean rating is added back later, in getRmse.
        prediction_matrix[i]=torch.divide(sub_n,jaccard_n)
    return prediction_matrix

In [None]:
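def getRmse(prediction_matrix,neighbors):
    # Where no neighbor rated a movie the weighted deviation is 0/0; treat it as 0
    prediction_matrix[torch.isnan(prediction_matrix)] = 0
    prediction_matrix[torch.isinf(prediction_matrix)] = 0
    print()
    print('Prediction Matrix')
    print()
    print (prediction_matrix)
    print (prediction_matrix.shape)
    print()

    # Add each user's mean training rating back (the \bar{r}_u term of the formula).
    # The means are recomputed here because b_user is local to runPrediction.
    train_tensor = torch.Tensor(train_mat).to(prediction_matrix.device)
    train_tensor[train_tensor == 0] = torch.nan
    user_means = torch.nanmean(train_tensor, dim=1)
    prediction_matrix = prediction_matrix + user_means.unsqueeze(1)

    # RMSE over the rated entries of test_mat only
    test_tensor = torch.Tensor(test_mat).to(prediction_matrix.device)
    rated = test_tensor > 0
    squared_err = (test_tensor[rated] - prediction_matrix[rated]) ** 2
    rmse = math.sqrt(squared_err.mean().item())

    print('Calculated RMSE is for {} nearest neighbors is: {}'.format(neighbors,rmse))
    print()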

In [None]:
def getPredictionForNeighbors(neighbors):
    # torch.sort is ascending, so the last column is normally each user's similarity to itself (1.0).
    # Take the `neighbors` columns just before it, which leaves the user out of its own neighbor set.
    idx_required = torch.LongTensor(list(range(-(neighbors+1),-1)))
    top_jaccard_val=sorted[:,idx_required]
    top_idx = sorted_indices[:, idx_required]
    print('-----------------------Number of neighbors:{}-------------------------'.format(neighbors))
    prediction_mat = runPrediction(top_jaccard_val,top_idx)
    getRmse(prediction_mat,neighbors)


In [None]:
neighbors = [10,30,50,100]
for n in neighbors:
    getPredictionForNeighbors(n)


-----------------------Number of neighbors:10-------------------------

Prediction Matrix

tensor([[ 1.0000, -1.1949,  1.1627,  ...,  0.0000,  0.0000,  0.0000],
        [ 1.1000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.9300,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        ...,
        [ 1.5882,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.1379, -0.0822,  0.6320,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.8764,  0.3087, -0.3091,  ...,  0.0000,  0.0000,  0.0000]],
       device='cuda:0')
torch.Size([6040, 3706])

Calculated RMSE is for 10 nearest neighbors is: 1.0461161821434874

-----------------------Number of neighbors:30-------------------------

Prediction Matrix

tensor([[ 0.2948, -0.8474,  0.5717,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.5339,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.6528,  0.0000,  1.6875,  ...,  0.0000,  0.0000,  0.0000],
        ...,
        [ 0.6906,  0.0000,  0.0000,  ...,  

Comparing the RMSE results of collaborative filtering and the baseline estimate, what do you observe? Does collaborative filtering produce the best performance? What reasons could explain what you observe?

### **Answer:**
1. RMSE of baseline estimate: 0.934
2. RMSE of collaborative filtering: 0.934 with 100 neighbors, 1.04 with 10 neighbors.

In collaborative filtering, for each user we pick the N most similar users. Based on the preferences of these similar users, we predict the rating this user would give to a particular movie.

**Observations:**
1. The RMSE of collaborative filtering is higher than the baseline when N=10. As we increase the number of neighbors to 100, the RMSE drops to about 0.934, which matches the baseline estimate rather than beating it.
2. We have around 6k users, and considering more nearest neighbors makes the prediction more accurate here. However, increasing N beyond 1000 may not lower the RMSE further, because with that many neighbors the prediction averages over users who are only weakly similar and becomes too general.
3. Conversely, CF with N=10 performs worse (RMSE = 1.04) than the baseline estimate, likely because with so few neighbors many movies are rated by only one or two of them, or by none, so the weighted average is noisy.


# Matrix Factorization

Now we turn to matrix factorization. First, let's implement the matrix factorization (MF for short) model introduced in class. The MF model can be mathematically represented as: 

<center>$\underset{\mathbf{P},\mathbf{Q}}{\text{min}}\,\,L=\sum_{(u,i)\in\mathcal{O}}(\mathbf{P}_u\cdot\mathbf{Q}^\top_i-r_{u,i})^2+\lambda(\lVert\mathbf{P}\rVert^2_{\text{F}}+\lVert\mathbf{Q}\rVert^2_{\text{F}})$,</center>
    
where $\mathbf{P}$ is the user latent factor matrix of size (#user, #latent); $\mathbf{Q}$ is the movie latent factor matrix of size (#movie, #latent); $\mathcal{O}$ is the set of all user-movie pairs having ratings in train_mat; $r_{u,i}$ is the rating of user $u$ for movie $i$; $\lambda(\lVert\mathbf{P}\rVert^2_{\text{F}}+\lVert\mathbf{Q}\rVert^2_{\text{F}})$ is the regularization term that mitigates overfitting; $\lambda$ is the regularization weight (a hyper-parameter set manually by the developer, i.e., you); and $\lVert\mathbf{P}\rVert^2_{\text{F}}=\sum_{x}\sum_{y}(\mathbf{P}_{x,y})^2$, $\lVert\mathbf{Q}\rVert^2_{\text{F}}=\sum_{x}\sum_{y}(\mathbf{Q}_{x,y})^2$. The function $L$ is called the **loss function** of the matrix factorization model. The goal of training an MF model is to find $\mathbf{P}$ and $\mathbf{Q}$ that minimize the loss $L$.
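
To make the loss concrete, here is a small sketch that evaluates $L$ for given factor matrices. The function name `mf_loss` and the toy matrices are illustrative assumptions; the squared error is summed only over rated (non-zero) entries, as $\mathcal{O}$ prescribes.

```python
import numpy as np

def mf_loss(P, Q, ratings, reg):
    """Squared error on rated entries plus the Frobenius-norm regularizer."""
    rated = ratings > 0
    residual = (P @ Q.T - ratings)[rated]
    return float((residual ** 2).sum() + reg * ((P ** 2).sum() + (Q ** 2).sum()))

P = np.array([[1., 0.],
              [0., 1.]])
Q = np.array([[2., 0.],
              [0., 3.]])
R = np.array([[3., 0.],
              [0., 3.]])
# Rated residuals: (2 - 3) and (3 - 3); norms: ||P||^2 = 2, ||Q||^2 = 13
loss = mf_loss(P, Q, R, reg=0.1)
print(loss)
```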

To implement such an MF, here we will write a Python class for the model. There are three functions in this MF class: init, train, and predict. 

* The 'init' function (**already provided**) initializes the variables the MF class needs. It takes 5 inputs: train_mat, test_mat, latent, lr, and reg. 'train_mat' and 'test_mat' are the corresponding training and testing matrices. 'latent' is the latent dimension of the MF model. 'lr' is the learning rate, i.e., the update step in each optimization iteration (default 0.01). 'reg' is the regularization weight, i.e., the $\lambda$ in the MF formulation.

* The 'train' function (**partially provided; you need to complete it**) trains the MF model on the training data train_mat. It takes one input: an int variable 'epoch' giving the number of training epochs. The main body of this function should be a loop of 'epoch' iterations. In each iteration, update the MF model with the following algorithm:

        1. Randomly shuffle training user-movie pairs  (i.e., user-movie pairs having ratings in train_mat)
        2. Have an inner loop to iterate each user-movie pair:
                a. given a user-movie pair (u,i), update the user latent factor and movie latent factor by gradient descent:    
<center>$\mathbf{P}_u=\mathbf{P}_u-\gamma [2(\mathbf{P}_u\cdot\mathbf{Q}_i^\top-r_{u,i})\cdot\mathbf{Q}_i+2\lambda\mathbf{P}_u]$</center>    
<center>$\mathbf{Q}_i=\mathbf{Q}_i-\gamma [2(\mathbf{P}_u\cdot\mathbf{Q}_i^\top-r_{u,i})\cdot\mathbf{P}_u+2\lambda\mathbf{Q}_i]$</center>    
<center>where $\mathbf{P}_u$ and $\mathbf{Q}_i$ are row vectors of size (1, #latent), $\gamma$ is learning rate (default is 0.01), $\lambda$ is regularization weight.</center>
        
        3. After iterating over all user-movie pairs, we have finished the training for the current epoch. Now calculate and print out the value of the loss function L after this epoch, and the RMSE on test_mat by the current MF model. Then append them to lists to keep a record of them.
The train function needs to return two lists: 'epoch_loss_list' recording the loss after each training epoch, and 'epoch_test_RMSE_list' recording the RMSE on test_mat after each training epoch.

* The 'predict' function (**already provided**) is to calculate the prediction_mat by the learned $\mathbf{P}$ and $\mathbf{Q}$.
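
The per-pair update described for the 'train' function can be sketched in isolation as follows. The helper `sgd_step` is hypothetical and is not part of the provided MF class; the example uses `lr=0.1` and `reg=0.0` only to keep the arithmetic easy to follow.

```python
import numpy as np

def sgd_step(P, Q, u, i, r_ui, lr=0.01, reg=0.01):
    """Apply one gradient-descent update to P[u] and Q[i] in place."""
    p_u = P[u].copy()                 # keep the old values so both updates
    q_i = Q[i].copy()                 # use the same (pre-update) factors
    err = p_u @ q_i - r_ui            # P_u . Q_i^T - r_ui
    P[u] = p_u - lr * (2 * err * q_i + 2 * reg * p_u)
    Q[i] = q_i - lr * (2 * err * p_u + 2 * reg * q_i)
    return err

P = np.array([[1., 1.]])
Q = np.array([[1., 1.]])
# Prediction is 2 and the rating is 4, so err = -2 and both vectors grow
err = sgd_step(P, Q, u=0, i=0, r_ui=4.0, lr=0.1, reg=0.0)
print(P, Q, P[0] @ Q[0])
```

Copying the old $\mathbf{P}_u$ before updating matters: otherwise the update of $\mathbf{Q}_i$ would use the already-modified user vector.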

In the next cell, we provide the 'init' and 'predict' functions. You will need to fill in the 'train' function based on the description above. 

**NOTE that you should not delete or modify the provided code.**

In [None]:
class MF:
    def __init__(self, train_mat, test_mat, latent=5, lr=0.01, reg=0.01):
        self.train_mat = train_mat  # the training rating matrix of size (#user, #movie)
        self.test_mat = test_mat  # the testing rating matrix of size (#user, #movie)
        
        self.latent = latent  # the latent dimension
        self.lr = lr  # learning rate
        self.reg = reg  # regularization weight, i.e., the lambda in the objective function
        
        self.num_user, self.num_movie = train_mat.shape
        
        self.sample_user, self.sample_movie = self.train_mat.nonzero()  # get the user-movie pairs having ratings in train_mat
        self.num_sample = len(self.sample_user)  # the number of user-movie pairs having ratings in train_mat

        self.train_indicator_mat = 1.0 * (train_mat > 0)  # binary matrix to indicate whether a user-movie pair has a rating or not in train_mat
        self.test_indicator_mat = 1.0 * (test_mat > 0)  # binary matrix to indicate whether a user-movie pair has a rating or not in test_mat

        self.P = np.random.random((self.num_user, self.latent))  # latent factors for users, size (#user, self.latent), randomly initialized
        self.Q = np.random.random((self.num_movie, self.latent))  # latent factors for movies, size (#movie, self.latent), randomly initialized

    def train(self, epoch=20, verbose=True):
        """
        Goal: Write your code to train your matrix factorization model for epoch iterations in this function
        Input: epoch -- the number of training epoch 
        Output: epoch_loss_list -- a list recording the training loss for each epoch
                epoch_test_RMSE_list -- a list recording the testing RMSE after each training epoch
        """
        epoch_loss_list = []
        epoch_test_RMSE_list = []
        for ep in range(epoch):
            """ 
            Write your code here to implement the training process for one epoch, 
            and at the end of each epoch, print out the epoch number, the training loss after this epoch, 
            and the test RMSE after this epoch
            """

            # Visit the training user-movie pairs in a fresh random order each epoch
            training_indices = np.arange(self.num_sample)
            np.random.shuffle(training_indices)
            for index in training_indices:
                u=self.sample_user[index]
                i=self.sample_movie[index]

                # Gradient-descent update of P_u and Q_i, following the equations in the question.
                # error_ui = r_ui - P_u.Q_i is the negative of the residual in those equations,
                # which is why the error term is added here rather than subtracted.
                error_ui = self.train_mat[u,i] - self.P[u,:].dot(self.Q[i,:])
                p_u = self.P[u,:].copy()  # keep the old P_u so that Q_i is updated with it
                self.P[u,:] += self.lr * (2 * error_ui * self.Q[i,:] - 2*self.reg * self.P[u,:])
                self.Q[i,:] += self.lr * (2 * error_ui * p_u - 2*self.reg * self.Q[i,:])

            # Computing training loss after each epoch.
            pred_matrix=self.predict()
            E = (self.train_mat - pred_matrix)**2
            loss = E[self.train_mat.nonzero()].sum() + self.reg*((self.P**2).sum() +(self.Q**2).sum())

            # Computing test RMSE after each epoch
            rmse_score=self.calculate_rmse(self.test_mat,pred_matrix)

            if verbose:
                print('Epoch:{}  Loss:{}  RMSE_score:{}'.format(ep+1,loss,rmse_score))
            epoch_loss_list.append(loss)
            epoch_test_RMSE_list.append(rmse_score)

            """
            End of your code for this function
            """
        return epoch_loss_list, epoch_test_RMSE_list
        

    def calculate_rmse(self,actual,pred):
        # RMSE over the rated (non-zero) entries of `actual` only
        rated = actual != 0
        return np.sqrt(((actual[rated] - pred[rated])**2).mean())


    def predict(self):
        prediction_mat = np.matmul(self.P, self.Q.T)
        return prediction_mat

Now, let's train an MF model based on your implementation. The code is provided; you just need to execute the next cell. The expectations are: 

* first, the code executes successfully without error; 
* and second, the training loss and RMSE on **test_mat** are printed out for each of the 20 training epochs.


In [None]:
mf = MF(train_mat, test_mat, latent=5, lr=0.01, reg=0.001)
epoch_loss_list, epoch_test_RMSE_list = mf.train(epoch=20)

Epoch:1  Loss:610865.7308371149  RMSE_score:0.9515669804252719
Epoch:2  Loss:597781.3863400008  RMSE_score:0.945730356571922
Epoch:3  Loss:586311.4749284006  RMSE_score:0.9410168630513732
Epoch:4  Loss:566651.8929471895  RMSE_score:0.9312108354967252
Epoch:5  Loss:551184.2714029644  RMSE_score:0.923508778671051
Epoch:6  Loss:538083.8138885034  RMSE_score:0.9165114945769472
Epoch:7  Loss:528263.305247537  RMSE_score:0.9119019908006063
Epoch:8  Loss:518894.6717339124  RMSE_score:0.9066756600530428
Epoch:9  Loss:513709.2376504036  RMSE_score:0.9043481764557498
Epoch:10  Loss:508178.56970574206  RMSE_score:0.9022163638451258
Epoch:11  Loss:505112.0421342366  RMSE_score:0.9009106267038767
Epoch:12  Loss:500074.0463960121  RMSE_score:0.8973389827368815
Epoch:13  Loss:500118.0778517156  RMSE_score:0.8992484857294937
Epoch:14  Loss:500116.37269078067  RMSE_score:0.8993579460518053
Epoch:15  Loss:498174.25312779826  RMSE_score:0.8993615231211485
Epoch:16  Loss:494796.4372613662  RMSE_score:0.89