# CX 4803 / CSE 6240 Web Search and Text Mining
## Homework 5: Recommender Systems

This homework asks you to build several recommender systems on MovieLens, a real-world movie rating dataset.

In [203]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import time
from collections import defaultdict
from scipy.spatial import distance

Please run the following cell substituting your student and user names.

In [204]:
def author_honor_code (student_name='Jiarui_Xu', user_name='jxu605'):
  print (f'I, {student_name} ({user_name}), state that I performed the tasks in this assignment following the Georgia Tech honor code(https://osi.gatech.edu/content/honor-code).')

# print the honor code before submission (substitute your name and username)
author_honor_code ()

I, Jiarui_Xu (jxu605), state that I performed the tasks in this assignment following the Georgia Tech honor code(https://osi.gatech.edu/content/honor-code).


## Section 1: **Data Parsing & Preprocessing** [1.0 points]

Let's mount Google Drive in the notebook so that you can access and load files from your drive. When you run the cell below, you will be asked to connect this notebook to your Google Drive. Select your Google account and then select _Allow_ to let this notebook access the contents of your Google Drive.

In [205]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Set the path to the working directory. In a Python notebook, you can run a terminal command by placing `!` in front of it, while commands prefixed with `%` are IPython **magic commands**. Here we set the working directory to the _HW5_ folder so that we can access the data files using relative paths.
To learn more about magic commands: [Read Documentation](https://ipython.readthedocs.io/en/stable/interactive/magics.html)


**NOTE**: If HW5 is at a different path in your Google Drive, change the variable **`path`** accordingly.

In [206]:
path = '/content/drive/MyDrive/HW5'
%cd $path

/content/drive/MyDrive/HW5


In [207]:
#Check if your present working directory has changed to the path specified in the previous cell.
!pwd

/content/drive/MyDrive/HW5


### Data Description: **[MovieLens](https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html)**

The MovieLens dataset (ml-latest-small) contains 100836 ratings and 3683 tag applications across 9742 movies. In this homework, we provide two files, `train.csv` and `test.csv`, which are used for training and testing the model, respectively. Each line of these files represents one rating of one movie by one user, and has the following format:

    userId,movieId,rating,timestamp

The lines within each file are ordered first by userId, then, within user, by movieId.
Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).
Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.


### 1.1 Write a function to parse and normalize the training and test data from files [0.5 points]
Store only **user_id, movie_id, and ratings** since we will ignore timestamps in this homework.  
**All ratings should be divided by 5**, so they are in 0-1 scale.
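
As a minimal sketch of the parsing step described above, the snippet below splits one rating line and rescales the rating to the 0-1 range. The helper name `parse_rating_line` and the sample line (including its timestamp) are illustrative assumptions, not taken from the provided files.

```python
def parse_rating_line(line):
    """Split 'userId,movieId,rating,timestamp' and return [user, movie, rating/5] as strings."""
    user_id, movie_id, rating, _timestamp = line.strip().split(',')
    # Ratings range from 0.5 to 5.0, so dividing by 5 maps them into 0.1-1.0.
    return [user_id, movie_id, str(float(rating) / 5)]

# Hypothetical input line; the timestamp is made up and is discarded anyway.
sample = '1,260,5.0,964981680'
parsed = parse_rating_line(sample)
print(parsed)
```

Keeping every field as a string matches the str-type array that the rest of the notebook expects; the rating is converted back to `float` wherever arithmetic is needed.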

In [208]:
def load_dataset (filename):
  """
  Arguments:
  filename (str): The name of the file which contains the movie ratings.

  Returns:
  processed_data (str-type numpy.array): the str-type numpy.array containing
   - ex:) [user_id, movie_id, normalized rating (0-1)] in each row.

  Steps:
  1. split each row(type: str) into a list [userID,movieId,rating,timestamp] using delimiter ','.
  2. append the list [userID,movieID,normalized rating] on input_data.
  """
  input_data = []
  with open (filename) as fin:
    for i, line in enumerate (fin):
      ## Add code below [0.5 points] ##
      line2=line.split(',')
      input_data.append([line2[0],line2[1],str(float(line2[2])/5)])
      #################################
  return np.array(input_data).astype(str)

In [209]:
training_data = load_dataset ('train.csv')
test_data = load_dataset ('test.csv')
print (f'=== Dataset statistics ===')
print (f'Number of Training Data: {training_data.shape[0]}')
print (f'Number of Test Data: {test_data.shape[0]}')
print (f'Min/max Ratings of Training Data: {min(training_data[:,2]),max(training_data[:,2])}')
print (f'======')

=== Dataset statistics ===
Number of Training Data: 90718
Number of Test Data: 9715
Min/max Ratings of Training Data: ('0.1', '1.0')


**[Sanity Check]**  
=== Dataset statistics ===  
Number of Training Data: 90718  
Number of Test Data: 9715  
Min/max Ratings of Training Data: ('0.1', '1.0')

### 1.2 Write code to load pre-trained embeddings and rating dictionaries of each user and item  [0.5 points]
Pre-trained user and item embeddings will be used in recommendation models 
to accelerate our computations.   
Rating dictionaries of users and items will be used for item-item collaborative filtering and matrix factorization.
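
The two rating dictionaries are the same data indexed in two directions. A small sketch with hypothetical toy ratings (the names `toy_ratings`, `R_ui_toy`, and `R_iu_toy` are assumptions made for illustration):

```python
from collections import defaultdict

# Hypothetical toy ratings: (user_id, item_id, normalized rating).
toy_ratings = [('1', '260', 1.0), ('1', '296', 0.6), ('2', '260', 0.8)]

R_ui_toy = defaultdict(dict)  # user -> {item: rating}
R_iu_toy = defaultdict(dict)  # item -> {user: rating}
for user, item, rating in toy_ratings:
    R_ui_toy[user][item] = rating
    R_iu_toy[item][user] = rating

print(dict(R_ui_toy))
print(dict(R_iu_toy))
```

Indexing by user makes it cheap to list everything one user rated (needed for collaborative filtering), while indexing by item makes it cheap to list everyone who rated one item (needed when updating item profiles).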

In [210]:
def preprocessing(training_data):
  """
  Arguments:
  training data (str-type numpy.array): the training data containing user_id, movie_id, and normalized rating (0-1) information.

  Returns:
  R_ui (dictionary of dictionaries): this dictionary contains rating information of each user. 
   - The key is user_id (string), the value is a dictionary whose key is item_id (string) and value is rating (float).
   - Thus, R_ui['1']['260'] = 1.0. R_ui should be computed with training data.
  R_iu (dictionary of dictionaries): it is similar to R_ui, but the key of the outer dictionary is item_id (string). 
   - Thus, R_iu['260']['1'] = 1.0. R_iu should be computed with training data.
  user_emb and item_emb (dictionaries of numpy.array): pre-trained embeddings of users and items. 
   - The keys are user_id (string) and item_id (string).
   - The values are the corresponding embeddings (numpy.array; dim=32) of user_id and item_id, respectively.

  Steps:
  1. for each training example with (user u, item i, and rating r), R_ui[u][i] should be r (float). R_iu can be computed similarly.
  2. Raw embedding files can be read by the pd.read_csv(file_name,header=None,sep=' ').values command.
      - file name: 'pre_trained_user_emb', 'pre_trained_item_emb'
  3. Convert the raw embeddings to dictionaries of numpy.array by taking the first column as keys and the rest of columns as values.
      - ex:) for user_emb, [user_id, corresponding user embedding(dim=32)] > {user_id: corresponding user embedding(dim=32)}
  """
  R_ui,R_iu = defaultdict(dict),defaultdict(dict)

  ## Add code below [0.5 points] ##
  user_emb={}
  item_emb={}
  for row in training_data:
    R_ui[row[0]][row[1]]=float(row[2])
    R_iu[row[1]][row[0]]=float(row[2])
  pre_trained_user_emb=pd.read_csv('pre_trained_user_emb',header=None,sep=' ')
  pre_trained_item_emb=pd.read_csv('pre_trained_item_emb',header=None,sep=' ')
  for i in range(len(pre_trained_user_emb[0])):
    user_emb[str(pre_trained_user_emb[0][i])]=np.zeros(32)
    for j in range(32):
      user_emb[str(pre_trained_user_emb[0][i])][j]=pre_trained_user_emb[j+1][i]
  for i in range(len(pre_trained_item_emb[0])):
    item_emb[str(pre_trained_item_emb[0][i])]=np.zeros(32)
    for j in range(32):
      item_emb[str(pre_trained_item_emb[0][i])][j]=pre_trained_item_emb[j+1][i]
  #################################
  return R_ui,R_iu,user_emb,item_emb

In [211]:
R_ui,R_iu,user_emb,item_emb = preprocessing(training_data)
print('R_ui of user 1 and item 260 = {}'.format(R_ui['1']['260']))
print('R_iu of user 1 and item 260 = {}'.format(R_iu['260']['1']))
print('Pre-trained embedding of user 1 = {}'.format(user_emb['1']))

R_ui of user 1 and item 260 = 1.0
R_iu of user 1 and item 260 = 1.0
Pre-trained embedding of user 1 = [ 0.516262  0.0786   -0.903148 -0.125587 -0.670954 -0.282254 -0.00769
 -0.455289  0.190866  0.143283 -0.10494  -0.489509 -0.740679  0.596688
  0.491465  0.22938   0.757515 -0.131054 -0.046257  0.181893 -0.751261
  0.448121 -0.315641 -0.09361  -0.229705  0.224912  0.065417  0.025272
 -0.070755 -0.075175 -0.090409 -0.365512]


**[Sanity check]**   
R_ui of user 1 and item 260 = 1.0  
R_iu of user 1 and item 260 = 1.0  
pre-trained embedding of user 1 = [ 0.516262  0.0786   -0.903148 -0.125587 -0.670954 -0.282254 -0.00769
 -0.455289  0.190866  0.143283 -0.10494  -0.489509 -0.740679  0.596688
  0.491465  0.22938   0.757515 -0.131054 -0.046257  0.181893 -0.751261
  0.448121 -0.315641 -0.09361  -0.229705  0.224912  0.065417  0.025272
 -0.070755 -0.075175 -0.090409 -0.365512 ]

## Section 2 : **Item-Item Collaborative Filtering** [2.0 points]

The first recommendation model we are developing is a collaborative filtering algorithm using an item-oriented approach. 

First, we use the cosine similarity to compute similarity between items $i$ and $j$ as follows.

$$
sim(i,j) = \frac{E^{item}_{i} \bullet E^{item}_{j}}{||E^{item}_{i}||_2 ||E^{item}_{j}||_2} 
$$

where $E^{item}_{i}$ is the pre-trained embedding of item $i$, $\bullet$ denotes the inner product between vectors, and $||X||_2$ is the L2 norm of a vector $X$.

With the above similarity function, we can compute **the predicted rating** $P_{u,i}$ on an item $i$ for a user $u$ by computing the weighted sum of the ratings given by the user on the other items **except $i$**.
$$P_{u,i} = \frac{\sum_{\forall j \neq i, (u,j) \in R}(sim(i,j)R_{u,j})}{\sum_{\forall j \neq i, (u,j) \in R}(|sim(i,j)|)}$$
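
To make the two formulas concrete, here is a small worked sketch on hypothetical 2-dimensional embeddings. All names and values (`toy_item_emb`, `toy_user_ratings`, `cosine_sim`) are assumptions for illustration and are unrelated to the pre-trained embeddings.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity: inner product divided by the product of L2 norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical item embeddings and one user's known ratings.
toy_item_emb = {
    'a': np.array([1.0, 0.0]),
    'b': np.array([1.0, 1.0]),
    'c': np.array([1.0, -1.0]),
}
toy_user_ratings = {'b': 0.8, 'c': 0.4}

target = 'a'  # item whose rating we want to predict
numerator, denominator = 0.0, 0.0
for item_j, rating in toy_user_ratings.items():
    if item_j == target:
        continue  # the target item itself is excluded from the sum
    sim = cosine_sim(toy_item_emb[target], toy_item_emb[item_j])
    numerator += sim * rating
    denominator += abs(sim)
toy_prediction = numerator / denominator
print(toy_prediction)
```

Items `b` and `c` are equally similar to `a` in this toy setup, so the prediction reduces to the plain average of the two ratings, 0.6. In general, more similar items pull the prediction more strongly toward their ratings.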


### 2.1 Write a function to get the **similarity** of items $i$ and $j$ [0.5 points]

In [156]:
def item_similarity(item_emb, item_i, item_j):
    """
    Arguments: 
    item_emb (dictionary of numpy.array): pre-trained embeddings of items. 
     - The key is item_id (string), and the value is the corresponding item embedding (numpy.array; dim=32).
    item_i, item_j (strings): two item_ids for similarity computation.

    Returns:
    sim(i,j) (float): cosine similarity of two embeddings of item i and j.

    Steps: 
    1. You can use scipy.spatial.distance.cosine to compute the similarity.
    """
    ## Add code below [0.5 points] ##
    cosine_similarity=1-distance.cosine(item_emb[item_i],item_emb[item_j])
    #################################
    return cosine_similarity

In [None]:
print('sim(1,163) = {:.6f}'.format(item_similarity(item_emb,'1','163')))

sim(1,163) = 0.391101


**[Sanity check]**     
sim(1,163) = 0.391101

### 2.2 Write a function to compute **the predicted rating $P_{u,i}$** on an item $i$ for a user $u$ [1.0 points] 


In [157]:
def item_item_collaborative_filtering(item_emb, user_u, item_i):
    """
    Arguments: 
    item_emb (dictionary of numpy.array): pre-trained embeddings of items. 
     - The key is item_id (string), and the value is the corresponding item embedding (numpy.array; dim=32).
    user_u (string): user_id.
    item_i (string): item_id.

    Returns:
    P_{u,i} (float): the predicted rating of user u on item i based on the item-item collaborative filtering.

    Steps:
    1. retrieve the set of items I a user rated using the keys of R_ui.
    2. for each item in I (must be different from item_i), compute sim(current_item, item_i).
    3. update numerator and denominator values for the current item based on the above P_{u,i} equation. 
        - Don't forget to use absolute value of the similarity while computing denominator.
    4. repeat 2 and 3 for all items in I, and return numerator/denominator.
    """
    numerator,denominator = 0,0
    ## Add code below [1.0 points] ##
    I=R_ui[user_u].keys()
    for item in I:
      if not item==item_i:
        sim=item_similarity(item_emb, item, item_i)
        numerator+=sim*float(R_ui[user_u][item])
        denominator+=abs(sim)
    #################################
    return numerator/denominator

In [None]:
print('P_(1,296) = {:.6f}'.format(item_item_collaborative_filtering(item_emb,'1','296')))

P_(1,296) = 0.872167


**[Sanity check]**          
P_(1,296) = 0.872167

### 2.3 Write a function to measure **root-mean-square-error (RMSE)** on test data. [0.5 points] 

**Root-mean-square error (RMSE)** is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed. 
Mathematically, test RMSE of a recommendation model for movie rating prediction is calculated as follows.
$$ \text{Test-RMSE} = \sqrt{\frac{\sum_{(u,i)\in R^{test}} (R_{u,i}^{test} - P_{u,i})^2}{N}} $$
where $R^{test}$ is the test data and $N$ is the number of test examples. Note that training RMSE is computed in the same way on the training data.
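
As a quick numerical check of the formula, the following sketch computes RMSE for three hypothetical (ground truth, prediction) pairs. The helper `rmse` and the toy values are assumptions for illustration.

```python
import numpy as np

def rmse(truth, pred):
    """Root-mean-square error between two equal-length sequences."""
    truth = np.asarray(truth, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((truth - pred) ** 2)))

toy_truth = [1.0, 0.6, 0.8]
toy_pred = [0.8, 0.6, 1.0]
# Errors are 0.2, 0.0, -0.2, so the mean squared error is 0.08 / 3.
toy_rmse = rmse(toy_truth, toy_pred)
print(toy_rmse)
```

Because ratings are normalized to 0-1, an RMSE of 0.2 here corresponds to an average error of about one star on the original 5-star scale.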

In [158]:
def test_RMSE_of_collaborative_filtering(test_data,item_emb):
  """
  Arguments: 
  test data (str-type numpy.array): the test data containing user_id, movie_id, and normalized rating (0-1) information.
  item_emb (dictionary of numpy.array): pre-trained embeddings of items. 
   - The key is item_id (string), and the value is the corresponding item embedding (numpy.array; dim=32).

  Returns:
  test_RMSE (float): the test RMSE of item-item collaborative filtering model.

  Steps:
  1. for each test example in the test data, compute P_{u,i} using the item_item_collaborative_filtering function.
  2. compute the error (R_{u,i}^{test} - P_{u,i}) for the current test example.
  3. sum the square of the error for all test examples.
  4. divide the sum by the number of test examples and compute the root of it.
  """
  test_RMSE = 0
  ## Add code below [0.5 points] ##
  square=0
  for test_sample in test_data:
    P_ui=item_item_collaborative_filtering(item_emb, test_sample[0], test_sample[1])
    error=float(test_sample[2])-P_ui
    square+=error*error
  test_RMSE=np.sqrt(square/test_data.shape[0])
  #################################
  return test_RMSE

In [None]:
print('test_RMSE = {:.6f}'.format(test_RMSE_of_collaborative_filtering(test_data,item_emb)))

test_RMSE = 0.182710


**[Sanity check]**       
   test RMSE = 0.182710

## Section 3: **Matrix Factorization** [3.0 points]

The second recommendation algorithm is called matrix factorization. Specifically, the matrix factorization algorithm works by decomposing the rating matrix into the product of two lower-dimensional rectangular matrices. Let’s define $K$ as the number of latent factors (or reduced dimensionality). Then, we learn a user profile $U \in  \mathbb{R}^{N \times K}$ and an item profile $V \in  \mathbb{R}^{M \times K}$ ($N$ and $M$ are the number of users and items, respectively).
We want to approximate a rating by an inner product of two length-$K$ vectors, one representing user profile and the other item profile. Mathematically, a
rating $R_{u,i}$ of the user $u$ on the movie $i$ is approximated by
$$R_{u,i} \approx \sum_{k=1}^{K} U_{u,k} V_{i,k}$$


### 3.1 Write a function to learn **$U$ and $V$** with gradient descent. [2.0 points]

We want to fit each element of $U$ and $V$ by minimizing squared reconstruction error over all training data points. That is, the objective function we minimize is given by
$$ L(U,V,\lambda) = \sum_{(u,i) \in R} (R_{u,i} - \sum_{k=1}^{K} U_{u,k} V_{i,k})^2 + \lambda \sum_{u,k} U_{u,k}^2 + \lambda \sum_{i,k} V_{i,k}^2 $$
where $U_{u}$ is the $u$th row of $U$ and $V_{i}$ is the $i$th row of $V$, and $\lambda$ is a regularization hyperparameter controlling the degree of penalization of large values in $U$ and $V$. As $U$ and $V$ are interrelated, there is no closed form solution. Thus, we update each element of $U$ and $V$
using the gradient descent formula.
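
For reference, differentiating the objective above with respect to a single entry gives the following gradients. This is a sketch of the standard derivation; the step size symbol $\mu$ is an assumption introduced here for the update rule.

$$ \frac{\partial L}{\partial U_{u,k}} = -2 \sum_{i : (u,i) \in R} \Big(R_{u,i} - \sum_{k'=1}^{K} U_{u,k'} V_{i,k'}\Big) V_{i,k} + 2 \lambda U_{u,k} $$

$$ \frac{\partial L}{\partial V_{i,k}} = -2 \sum_{u : (u,i) \in R} \Big(R_{u,i} - \sum_{k'=1}^{K} U_{u,k'} V_{i,k'}\Big) U_{u,k} + 2 \lambda V_{i,k} $$

A plain gradient descent step then moves each entry against its gradient:

$$ U_{u,k} \leftarrow U_{u,k} - \mu \frac{\partial L}{\partial U_{u,k}}, \qquad V_{i,k} \leftarrow V_{i,k} - \mu \frac{\partial L}{\partial V_{i,k}} $$

The provided `gradient_descent_update` function follows the same pattern, except that it subtracts the regularization term $2\lambda U_{u,k}$ (and $2\lambda V_{i,k}$) directly rather than scaling it by $\mu$.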

Since the gradient descent update is quite complicated, **we will provide the code of it**. 
We can iteratively update $U$ and $V$ by calling the gradient descent update function multiple times.
Use the number of latent factors **$K=10$** and **initialize $U$ and $V$ randomly between 0 and $\sqrt{\frac{avg(R_{u,i})}{K}}$**, where $avg(R_{u,i})$ is the average rating over all training examples in the training data.
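
The initialization range can be illustrated with a short sketch. The toy ratings and the use of `np.random.default_rng` are assumptions for illustration; the notebook itself seeds the global `np.random` state instead.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative generator, not the notebook's seeding
K = 10
toy_ratings = np.array([1.0, 0.6, 0.8, 0.4])  # hypothetical normalized ratings
upper = np.sqrt(toy_ratings.mean() / K)       # sqrt(avg(R) / K)

u_vec = rng.random(K) * upper  # uniform in [0, upper)
v_vec = rng.random(K) * upper
initial_pred = float(np.inner(u_vec, v_vec))
print(upper, initial_pred)
```

Since every coordinate is below $\sqrt{avg(R_{u,i})/K}$, the inner product of two such vectors stays below $K \cdot avg(R_{u,i})/K = avg(R_{u,i})$, so the initial predictions start in a sensible range rather than far outside the 0-1 rating scale.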

In [177]:
def gradient_descent_update(U,V,K):
  """
  Do not modify this function. There is a -2.0 point penalty if you modify this function.

  Arguments: 
  U,V (dictionary of numpy.array): current user and item profile dictionaries. 
   - The key is either user_id or item_id, and the value is the corresponding user or item profile (numpy.array; dim:K).
  K (int): the number of latent factors.

  Returns:
  Updated U,V (dictionary of numpy.array): updated user and item profile dictionaries. 
   - The key is either user_id or item_id, and the value is the corresponding user or item profile (numpy.array; dim:K).
  """
  mu = 0.001
  lambda_value = 0.001
  for user in U.keys():
    updates = np.zeros(K)
    for item in R_ui[user].keys():
      pred = np.inner(U[user],V[item])
      error = R_ui[user][item] - pred
      updates += error*V[item]
    final_updates = 2*mu*updates - 2*lambda_value*U[user]
    U[user] += final_updates

  for item in V.keys():
    updates = np.zeros(K)
    for user in R_iu[item].keys():
      pred = np.inner(U[user],V[item])
      error = R_iu[item][user] - pred
      updates += error*U[user]
    final_updates = 2*mu*updates - 2*lambda_value*V[item]
    V[item] += final_updates
  return U,V

In [175]:
def matrix_factorization (training_data, K=10, epochs = 200):
  """
  Arguments:
  training data (str-type numpy.array): the training data containing user_id, movie_id, and normalized rating (0-1) information.
  K (int): number of latent factors used for matrix factorization.
  epochs (int): number of repetitions of the updates of U and V.

  Returns:
  U,V (dictionary of float-type numpy.array): learned user and item profile dictionaries. 
   - The key is either user_id or item_id, and the value is the corresponding user or item profile (float-type numpy.array; dim:K).

  Steps for the first code block:
  1. compute the maximum value using 'sqrt(avg(ratings of all training examples)/K)' for the initialization (ratings of all training examples should be float-type, not str-type here).
  2. for each user u in training_data, initialize the value of U[u] with a size-K numpy.array (float) filled with random values between 0 and the maximum value.
  3. initialize V[v] for each item v in training_data like step 2.
      - when you assign the initial value, please use R_ui.keys(), R_iu.keys() to keep the order and to avoid multiple initialization.
  """

  np.random.seed(0)
  U,V = defaultdict(np.array),defaultdict(np.array)

  ## Add code below [1.0 points] ##
  sum_rating=0
  for item in training_data:
    sum_rating+=float(item[2])
  maximum=np.sqrt((sum_rating/training_data.shape[0])/K)
  for u in R_ui.keys():
    U[u]=np.random.sample(K)*maximum
  for v in R_iu.keys():
    V[v]=np.random.sample(K)*maximum
  #################################

  """
  Steps for the second code block:
  1. for each iteration, call the gradient_descent_update with current U and V.
  2. update the user and item profile matrices with the returned U and V.
  """
  ## Add code below [1.0 points] ##
  for i in range(epochs):
    U,V=gradient_descent_update(U,V,K)
  #################################
  return U,V

In [180]:
(W,H) = matrix_factorization(training_data)
print('the user profile of user 1 =\n{}'.format(W['1']))

the user profile of user 1 =
[0.4514257  0.5078636  0.5814194  0.44791872 0.49933625 0.51698918
 0.47100742 0.6167688  0.57151657 0.49267464]


**[Sanity check]** (*The running time should be less than 10 minutes)   
the user profile of the user 1 =      
[0.4514257 , 0.5078636 , 0.5814194 , 0.44791872, 0.49933625, 0.51698918,      
0.47100742, 0.6167688 , 0.57151657, 0.49267464]


### 3.2 Write a function to compute **training/test RMSE** of the matrix factorization model. [1.0 points]

Compute RMSE values of the matrix factorization model on training/test data, and report the two RMSE values.  
You can refer to Part 2.3 for the definition of training/test RMSE values.
$$ RMSE = \sqrt{\frac{\sum_{(u,i)\in R} (R_{u,i} - P_{u,i})^2}{N}} $$
where $R$ and $P$ are the ground-truth and predicted ratings, respectively, and $N$ is the number of examples.

In [187]:
def training_and_test_RMSE_of_MF(training_data, test_data):
  """
  Arguments: 
  training_data and test_data (str-type numpy.array): the training and test data containing user_id, movie_id, and normalized rating (0-1) information.

  Returns:
  training_RMSE and test_RMSE (float): the training and test RMSE of the matrix factorization model.

  Steps for the first code block:
  1. for each training example in the training data, compute P_{u,i} by the inner product of user and item profile vectors of u and i.
  2. compute the error (R_{u,i} - P_{u,i}) for the current training example.
  3. sum the square of the error for all training examples.
  4. derive training RMSE by dividing the sum by the number of training examples and computing the root of it.
  """
  training_RMSE,test_RMSE = 0,0

  ## Add code below [0.5 points] ##
  # Uses the global profiles W (users) and H (items) returned by matrix_factorization in the previous cell.
  sum_error=0
  for example in training_data:
    P_ui=np.inner(W[example[0]],H[example[1]])
    error=float(example[2])-P_ui
    sum_error+=np.square(error)
  training_RMSE=np.sqrt(sum_error/training_data.shape[0])
  #################################

  """
  Steps for the second code block:
  1. for each test example in the test data, compute P_{u,i} by the inner product of user and item profile vectors of u and i.
  2. compute the error (R_{u,i} - P_{u,i}) for the current test example.
  3. sum the square of the error for all test examples.
  4. derive test RMSE by dividing the sum by the number of test examples and computing the root of it.
  """
  ## Add code below [0.5 points] ##
  # Reuses the same W and H; the model is never re-trained on the test data.
  sum2_error=0
  for example in test_data:
    P_ui2=np.inner(W[example[0]],H[example[1]])
    error2=float(example[2])-P_ui2
    sum2_error+=np.square(error2)
  test_RMSE=np.sqrt(sum2_error/test_data.shape[0])
  #################################
  return training_RMSE,test_RMSE

In [188]:
training_RMSE, test_RMSE = training_and_test_RMSE_of_MF(training_data,test_data)
print('training RMSE = {:.6f}'.format(training_RMSE))
print('test RMSE = {:.6f}'.format(test_RMSE))

training RMSE = 0.164524
test RMSE = 0.180798


**[Sanity check]**       
training RMSE = 0.164524        
test RMSE = 0.180798

## Section 4: **Deep learning-based Recommendation** [4.0 points]

Finally, we will use a neural network to predict ratings and recommend items to users. We will implement a simple multi-layer perceptron (MLP) model with the PyTorch library, which makes it straightforward to define and train neural networks.

<img src='https://drive.google.com/uc?id=1kEFTCa7ragS44aYmF89V958TbJpcPv-u'>

As shown in the above figure, the MLP model consists of **embedding layer, hidden layer** (or neural CF layers in the figure), and **output layer**. 

The embedding layer transforms user_id and item_id to user and item latent vectors, respectively. Although the weights of the embedding layer are usually trainable, we will use **pre-trained embeddings** of users and items for faster learning speed. 

The hidden layer consists of **multiple fully connected (FC) layers** with different sizes. Each FC layer 1) takes the previous layer's output as input, 2) multiplies it by its weight matrix and adds a bias term, 3) applies an activation function such as ReLU or Sigmoid, and 4) sends the result to the next layer. In this homework, we will use **two FC layers with sizes [64,512] and [512,512]** and the **ReLU activation function**.

The output layer is also **fully connected (with ReLU activation) with size [512,1]**, producing a single prediction value **since our task is a regression problem**. The error between the ground-truth value and our prediction is used for **backpropagation**, which updates all **trainable parameters** (e.g., the weight matrices of the FC layers) of the MLP model. After enough iterations, the trained parameters can be used for predicting ratings for new items. In our mini-batch training setting, we compare the ground-truth ratings in the training batch with our prediction values for the current batch.
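
The layer computation described above can be sketched in plain NumPy to show how the shapes change from one layer to the next. The random weights, the batch of 4 inputs, and the helper `fc_relu` are assumptions for illustration only; the actual model is defined with PyTorch below.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_relu(x, weight, bias):
    """One fully connected layer followed by ReLU: max(0, x @ W + b)."""
    return np.maximum(0.0, x @ weight + bias)

# 4 hypothetical inputs, each a 64-dim concatenation of user and item embeddings.
x = rng.normal(size=(4, 64))
W1, b1 = rng.normal(scale=0.05, size=(64, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.05, size=(512, 512)), np.zeros(512)
W3, b3 = rng.normal(scale=0.05, size=(512, 1)), np.zeros(1)

h1 = fc_relu(x, W1, b1)     # shape (4, 512)
h2 = fc_relu(h1, W2, b2)    # shape (4, 512)
pred = fc_relu(h2, W3, b3)  # shape (4, 1): one non-negative value per input
print(h1.shape, h2.shape, pred.shape)
```

Note that PyTorch's `nn.Linear` stores its weight with shape `[out_features, in_features]` and computes `x @ W.T + b`, so the matrices above correspond to the transposes of the PyTorch weights.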

Fortunately, we do not need to calculate the error and gradients for backpropagation by ourselves. **PyTorch** offers very simple functions for defining and training the MLP model! Please refer to the following documentation https://pytorch.org/tutorials/beginner/basics/intro.html for detailed information.

### 4.1 Write a class to **define the MLP model**. [1.0 points] 

As mentioned above, the MLP class consists of **two hidden layers (fully-connected with ReLU activation)** and **one output layer** (which is also fully-connected with ReLU activation). 
We will define each component of the MLP model with PyTorch commands (e.g., nn.Linear() and nn.ReLU()), and we will also write a prediction function for a given input data, which will be used for model updates and inference.

In [212]:
class MLP(nn.Module):
    def __init__(self):
        """
        Arguments: 
        self: the MLP model class.

        Steps:
        1. define the first hidden layer (self.fc1) with size of [64,512] (use nn.Linear()).
        2. define the second hidden layer (self.fc2) with size of [512,512] (use nn.Linear()).
        3. define the output layer (self.output) with size of [512,1] (use nn.Linear()).
        4. define the ReLU activation layer (use nn.ReLU() with default hyperparameters). 
            - Please use one ReLU layer for the MLP model, instead of using multiple ReLU layers.
        """
        
        torch.manual_seed(0)
        np.random.seed(0)
        super(MLP, self).__init__()

        self.fc1 = None
        self.fc2 = None
        self.output = None
        self.relu = None

        ## Add code below [0.5 points] ##
        self.fc1=nn.Linear(64,512)
        self.fc2=nn.Linear(512,512)
        self.output=nn.Linear(512,1)
        self.relu=nn.ReLU()
        #################################

    def forward(self, input_emb):
        """
        Arguments: 
        self: the MLP model class.
        input_emb: the input data which is a concatenation of user and item embeddings.
      
        Returns:
        prediction (torch.FloatTensor): the prediction values (torch.FloatTensor format) for a given input.

        Steps:
        1. compute the first intermediate output by feeding the input_emb to the first hidden layer (self.fc1) and applying ReLU activation.
        2. compute the second intermediate output by feeding the first intermediate output to the second hidden layer (self.fc2) and applying ReLU activation. 
        3. compute the prediction values by feeding the second intermediate output to the output layer (self.output) and applying ReLU activation.
        4. return the prediction values.
        """
        ## Add code below [0.5 points] ##
        output1=self.fc1(input_emb)
        output1=self.relu(output1)
        output2=self.fc2(output1)
        output2=self.relu(output2)
        prediction=self.output(output2)
        prediction=self.relu(prediction)
        output=prediction
        #################################
        return output

In [213]:
MLP_model = MLP()
print(MLP_model)

MLP(
  (fc1): Linear(in_features=64, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=512, bias=True)
  (output): Linear(in_features=512, out_features=1, bias=True)
  (relu): ReLU()
)


**[Sanity check]** (The output should look like the following; the layer names can be different.)

MLP(  
  (fc1): Linear(in_features=64, out_features=512, bias=True)  
  (fc2): Linear(in_features=512, out_features=512, bias=True)  
  (output): Linear(in_features=512, out_features=1, bias=True)  
  (relu): ReLU()  
)

### 4.2 Write a function to **initialize and train the MLP model**. [2.0 points] 

The detailed steps for initializing and training the MLP model are given as follows.  

Step 1) Define the MLP model using the above class.  
Step 2) Construct input data for the MLP model by concatenating the user and item embeddings.  
Step 3) Feed the input data to the MLP and obtain the prediction values.  
Step 4) Perform Backpropagation to update the model parameters.   
Step 5) Repeat Steps 3-4 for several epochs (or iterations) and print training RMSE every 10 epochs.  
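
The steps above can be sketched on a tiny synthetic problem. The model size, the random inputs and targets, and the names `toy_model` and `rmse_history` are all assumptions for illustration; only the order of the PyTorch calls mirrors the steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Step 1: a small stand-in model (not the homework architecture).
toy_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Step 2: hypothetical inputs and targets in place of concatenated embeddings and ratings.
toy_inputs = torch.randn(32, 8)
toy_targets = torch.rand(32)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(toy_model.parameters())

rmse_history = []
for epoch in range(200):
    optimizer.zero_grad()                          # clear gradients from the previous step
    prediction = toy_model(toy_inputs).flatten()   # Step 3: forward pass
    loss = criterion(prediction, toy_targets)
    loss.backward()                                # Step 4: backpropagation
    optimizer.step()                               # update the parameters
    diff = (prediction - toy_targets).detach().numpy()
    rmse_history.append(float((diff ** 2).mean() ** 0.5))
    if epoch % 10 == 0:                            # Step 5: report every 10 epochs
        print(epoch, rmse_history[-1])
```

Calling `zero_grad()` at the start of every iteration matters because PyTorch accumulates gradients across `backward()` calls by default.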

In [214]:
def MLP_train(training_data,user_emb,item_emb,MLP_model,epochs=200):
  """
  Arguments: 
  training_data (str-type numpy.array): the training data containing user_id, movie_id, and normalized rating (0-1) information.
  user_emb and item_emb (dictionaries of numpy.array): pre-trained embeddings of users and items. 
   - The keys are user_id (string) and item_id (string), and the values are the corresponding embeddings (numpy.array; dim=32) of user_id and item_id, respectively.
  MLP_model (MLP class): the untrained MLP model.
  epochs (int): number of iterations required for updating the MLP model.
  
  Returns:
  MLP_model (MLP class): the MLP model trained with given training data and pre-trained embeddings.

  Steps for the first code block:
  1. for each training example, retrieve pre-trained embeddings for a user u and item i.
  2. concatenate those embeddings by np.concatenate((user_emb_of_u,item_emb_of_i)) and append it to a list.
  3. convert a list of concatenated embeddings to a PyTorch FloatTensor by torch.FloatTensor(np.array(list)).
  """
  torch.manual_seed(0)
  np.random.seed(0)

  input_emb = []
  ## Add code below [0.5 points] ##
  for example in training_data:
    u=example[0]
    v=example[1]
    input_emb.append(np.concatenate((user_emb[u],item_emb[v])))
  emb_input=torch.FloatTensor(np.array(input_emb))
  #################################
  
  criterion = nn.MSELoss()
  optimizer = torch.optim.Adam(MLP_model.parameters())
  
  for epoch in range(epochs):
    """  
    Steps for the second code block:
    1. set the gradients to zero before backpropagation by optimizer.zero_grad().
    2. call the prediction function of the MLP model by MLP_model(input_data).flatten() and obtain the prediction result for the input data, 
       where the input data is a PyTorch FloatTensor from Step 3 of the previous code block.
    3. obtain the ground-truth rating values from training_data (in torch.FloatTensor format; use torch.FloatTensor(np.array) for type conversion).
    4. compute the loss by loss=criterion(prediction,ground_truth).
    5. perform backpropagation by loss.backward().
    6. update the model parameters by optimizer.step().
    7. compute the differences between the prediction and ground_truth by (prediction-ground_truth).detach().numpy().
    8. compute and print training RMSE using the differences for every 10 epochs (epoch 0, epoch 10, ..., epoch 190).
        - Don't use loss in Step 4 to derive training RMSE
    """
    ## Add code below [1.5 points] ##
    optimizer.zero_grad()
    prediction=MLP_model(emb_input).flatten()
    ground_truth=torch.FloatTensor(training_data[:,2].astype(np.float64))
    loss=criterion(prediction,ground_truth)
    loss.backward()
    optimizer.step()
    difference=(prediction-ground_truth).detach().numpy()
    if epoch%10==0:
      # accumulate squared errors (sq_sum avoids shadowing the built-in sum)
      sq_sum = 0
      num = 0
      for error in difference:
        sq_sum += np.square(error)
        num += 1
      train_RMSE = np.sqrt(sq_sum/num)
      print('@ epoch :',epoch)
      print('train_RMSE:',train_RMSE)
    #################################
  return MLP_model

In [215]:
trained_MLP_model = MLP_train(training_data,user_emb,item_emb,MLP_model)

@ epoch : 0
train_RMSE: 0.7276686443456213
@ epoch : 10
train_RMSE: 0.26427001787751536
@ epoch : 20
train_RMSE: 0.20403357031448344
@ epoch : 30
train_RMSE: 0.19665153696338888
@ epoch : 40
train_RMSE: 0.1950174310652665
@ epoch : 50
train_RMSE: 0.18971142885549117
@ epoch : 60
train_RMSE: 0.18791412144294312
@ epoch : 70
train_RMSE: 0.18545279366003994
@ epoch : 80
train_RMSE: 0.18328055667826693
@ epoch : 90
train_RMSE: 0.18133406298295532
@ epoch : 100
train_RMSE: 0.17948955019813978
@ epoch : 110
train_RMSE: 0.17779194069933832
@ epoch : 120
train_RMSE: 0.1762556726172096
@ epoch : 130
train_RMSE: 0.17487426905641276
@ epoch : 140
train_RMSE: 0.17361098129822228
@ epoch : 150
train_RMSE: 0.1724533243367808
@ epoch : 160
train_RMSE: 0.17137432608297912
@ epoch : 170
train_RMSE: 0.17034822884743708
@ epoch : 180
train_RMSE: 0.16936454061878048
@ epoch : 190
train_RMSE: 0.16841782512579018


**[Sanity check]** (*The running time should be less than 30 minutes)       
train_RMSE @ epoch 0 should be around 0.727669.  
train_RMSE @ epoch 10 should be around 0.264270.  
train_RMSE @ epoch 20 should be around 0.204034.  
train_RMSE @ epoch 190 should be around 0.168418.  

### 4.3 Write a function to compute **test RMSE** of the MLP model. [1.0 points]

Compute and report the test RMSE of the MLP model on test data.  
You can refer to Part 2.3 for the definition of training/test RMSE values.
$$ RMSE = \sqrt{\frac{\sum_{(u,i)\in R} (R_{u,i} - P_{u,i})^2}{N}} $$
where $R$ and $P$ are the ground-truth and predicted ratings, respectively, and $N$ is the number of examples.
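
At evaluation time no gradients are needed. One common alternative to calling `.detach()` on the result, shown here as an optional sketch on a hypothetical stand-in model and random data, is to run the forward pass inside `torch.no_grad()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a trained model and a small test batch.
toy_model = nn.Sequential(nn.Linear(8, 1), nn.ReLU())
toy_inputs = torch.randn(5, 8)
toy_truth = torch.rand(5)

toy_model.eval()
with torch.no_grad():  # no computation graph is built inside this block
    toy_pred = toy_model(toy_inputs).flatten()

toy_test_rmse = torch.sqrt(torch.mean((toy_pred - toy_truth) ** 2)).item()
print(toy_test_rmse)
```

Either approach yields the same RMSE value; `torch.no_grad()` simply avoids the bookkeeping needed for backpropagation.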

In [199]:
def test_RMSE_of_MLP(test_data,user_emb,item_emb,MLP_model):
  """
  Arguments: 
  test_data (str-type numpy.array): the test data containing user_id, movie_id, and normalized rating (0-1) information.
  user_emb and item_emb (dictionaries of numpy.array): pre-trained embeddings of users and items. 
   - The keys are user_id (string) and item_id (string), and the values are the corresponding embeddings (numpy.array; dim=32) of user_id and item_id, respectively.
  MLP_model (the MLP class): the trained MLP model.

  Returns:
  test_RMSE (float): the test RMSE of the MLP model.

  Steps for the first code block:
  1. for each test example, retrieve pre-trained embeddings for a user u and item i.
  2. concatenate those embeddings by np.concatenate((user_emb_of_u,item_emb_of_i)) and append it to a list.
  3. convert a list of concatenated embeddings to a PyTorch FloatTensor by torch.FloatTensor(np.array(list))
  """
  
  ## Add code below [0.5 points] ##
  input_emb = []
  for example in test_data:
    u=example[0]
    v=example[1]
    input_emb.append(np.concatenate((user_emb[u],item_emb[v])))
  emb_input=torch.FloatTensor(np.array(input_emb))
  #################################

  """
  Steps for the second code block:
  1. call the prediction function of the trained MLP model by MLP_model(input_data).flatten() and obtain the prediction result for the input data, 
     where the input data is a PyTorch FloatTensor from Step 3 of the previous code block.
  2. obtain the ground truth rating values from test_data.
  3. compute the differences between the prediction and ground_truth by (prediction-ground_truth).detach().numpy().
  4. compute the test RMSE using the differences and return the value.
  """
  ## Add code below [0.5 points] ##
  prediction=MLP_model(emb_input).flatten()
  ground_truth=torch.FloatTensor(test_data[:,2].astype(np.float64))
  difference=(prediction-ground_truth).detach().numpy()
  # accumulate squared errors (sq_sum avoids shadowing the built-in sum)
  sq_sum = 0
  num = 0
  for error in difference:
    sq_sum += np.square(error)
    num += 1
  test_RMSE = np.sqrt(sq_sum/num)
  #################################
  return test_RMSE

In [200]:
print('test RMSE = {:.6f}'.format(test_RMSE_of_MLP(test_data,user_emb,item_emb,trained_MLP_model)))

test RMSE = 0.172493


**[Sanity check]**       
test RMSE = 0.172493