<p style="text-align:center">
    <a href="https://skills.network/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML321ENSkillsNetwork817-2022-01-01" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Course Rating Prediction using Neural Networks**


Estimated time needed: **60** minutes


In the previous labs, we crafted several types of user and item feature vectors. For example, given a user `i`, we may build their profile feature vector and course rating feature vector, and given an item `j`, we may create its genre vector and user enrollment vector.



With these explicit feature vectors, we can perform machine learning tasks such as calculating the similarities among users or items, finding nearest neighbors, and using the dot product to estimate a rating value.

The main advantage of these explicit features is that they are highly interpretable and can also perform well. The main disadvantage is that building and storing them takes considerable effort.


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-ML321EN-SkillsNetwork/labs/module_4/images/explicit_user_item_features.png)


Is it possible to predict a rating without building explicit feature vectors beforehand?  

Yes. As you may recall, Non-negative Matrix Factorization (NMF) decomposes the user-item interaction matrix into a user matrix and an item matrix, which contain the latent features of users and items. You can simply take the dot product of a user's and an item's latent vectors to get an estimated rating.


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-ML321EN-SkillsNetwork/labs/module_4/images/nmf.png)
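
The dot-product step of matrix factorization can be sketched with NumPy. The latent factor values below are hypothetical and chosen by hand for illustration; they are not learned from the lab's rating data.

```python
import numpy as np

# Hypothetical latent factors with 2 latent features (not learned from the lab data)
user_factors = np.array([[1.0, 0.5],    # user 0
                         [0.2, 1.5]])   # user 1
item_factors = np.array([[2.0, 0.0],    # item 0
                         [1.0, 1.0],    # item 1
                         [0.0, 2.0]])   # item 2

# Estimated rating of one user for one item: dot product of their latent vectors
rating_u0_i1 = user_factors[0] @ item_factors[1]

# Estimated ratings for every user-item pair at once: U times V transposed
estimated_ratings = user_factors @ item_factors.T
print(estimated_ratings.shape)
```

Each row of `estimated_ratings` holds one user's estimated ratings for all items, so the full matrix has one entry per user-item pair.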


In addition to NMF, neural networks can also be used to extract the latent user and item features. In fact, neural networks are very good at learning patterns from data and are widely used to extract latent features. As a neural network trains, it gradually captures these features in its hidden layers as weight matrices, which can then be extracted to represent the original data.


In this lab, you will be training neural networks to predict course ratings while simultaneously extracting users' and items' latent features. 


## Objectives


After completing this lab you will be able to:


* Use `tensorflow` to train neural networks to extract the user and item latent features from the hidden layers
* Predict course ratings with trained neural networks


----


## Prepare and setup lab environment


Install `tensorflow` 2.7 and `torch` in your Python environment if they are not already installed, and import the required libraries:


In [26]:
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import torch
import numpy as np
from math import sqrt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tensorflow import keras
from tensorflow.keras import layers

In [27]:
# also set a random state
rs = 123

### Load and processing rating dataset


In [28]:
rating_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-ML321EN-SkillsNetwork/labs/datasets/ratings.csv"
rating_df = pd.read_csv(rating_url)
rating_df.head()

Unnamed: 0,user,item,rating
0,1889878,CC0101EN,3.0
1,1342067,CL0101EN,3.0
2,1990814,ML0120ENv3,3.0
3,380098,BD0211EN,3.0
4,779563,DS0101EN,3.0


This is the same rating dataset we have been using in the previous labs. It contains three main columns: `user`, `item`, and `rating`.


Next, let's find out how many unique users and items there are. These totals determine the sizes of the one-hot encoding vectors.


In [29]:
num_users = len(rating_df['user'].unique())
num_items = len(rating_df['item'].unique())
print(f"There are `{num_users}` users and `{num_items}` items in total")

There are `33901` users and `126` items in total


This means each user can be represented as a `33901 x 1` one-hot vector and each item as a `126 x 1` one-hot vector.


The goal is to create a neural network structure that takes the user and item one-hot vectors as inputs and outputs a rating estimation or the probability of interaction (such as the probability of completing a course).

While training and updating the weights in the neural network, its hidden layers should be able to capture the pattern or features for each user and item. Based on this idea, we can design a simple neural network architecture like the following:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-ML321EN-SkillsNetwork/labs/module_4/images/embedding_feature_vector.png)


The network inputs are two one-hot encoding vectors: the blue one is for the user and the green one is for the item. On top of them, we add two embedding layers. Here, embedding means mapping the one-hot encoding vector into a latent feature space. Each embedding layer is a fully-connected layer that outputs an embedding feature vector. For example, the user embedding layer takes a `33901 x 1` one-hot vector as input and outputs a `16 x 1` embedding vector.


The embedding layers output two embedding vectors, which play the same role as the latent user and item factors in Non-negative Matrix Factorization. We can then simply take the dot product of the user and item embedding vectors to output a rating estimation.
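
To see why an embedding layer can work from integer indices rather than explicit one-hot vectors, here is a minimal NumPy sketch. It assumes a bias-free fully-connected layer with toy sizes and random stand-in weights; `W`, `num_users_demo`, and `embedding_dim` are illustrative names, not part of the lab code.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users_demo, embedding_dim = 5, 3   # toy sizes, far smaller than 33901 x 16
W = rng.normal(size=(num_users_demo, embedding_dim))  # stand-in embedding weights

user_idx = 2
one_hot = np.zeros(num_users_demo)
one_hot[user_idx] = 1.0

via_matmul = one_hot @ W   # fully-connected layer applied to the one-hot vector
via_lookup = W[user_idx]   # direct row lookup by integer index
same = np.allclose(via_matmul, via_lookup)
print(same)
```

Multiplying a one-hot vector by the weight matrix selects a single row, so looking that row up by index gives the same embedding without ever building the one-hot vector.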


#### Implementing the recommender neural network using tensorflow 


This network architecture can be defined as a sub-class of the `tensorflow.keras.Model` super class. The cell below implements the same architecture in PyTorch instead, as a sub-class of `torch.nn.Module`. Let's call it `RecommenderNet()`.


In [30]:
class RecommenderNet(nn.Module):
    """
    PyTorch version of the Keras RecommenderNet.
    """
    def __init__(self, num_users, num_items, embedding_size=16):
        super(RecommenderNet, self).__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_size)
        self.item_embedding = nn.Embedding(num_items, embedding_size)
        self.user_bias      = nn.Embedding(num_users, 1)
        self.item_bias      = nn.Embedding(num_items, 1)

        # He‐normal initialization for embeddings
        nn.init.kaiming_normal_(self.user_embedding.weight, nonlinearity='relu')
        nn.init.kaiming_normal_(self.item_embedding.weight, nonlinearity='relu')

    def forward(self, user_idx, item_idx):
        # squeeze only the last dim so a batch of one keeps shape (1,)
        u_b   = self.user_bias(user_idx).squeeze(-1) # (B,)
        i_b   = self.item_bias(item_idx).squeeze(-1) # (B,)
        u_vec = self.user_embedding(user_idx)      # (B, E)
        i_vec = self.item_embedding(item_idx)      # (B, E)
        dot   = (u_vec * i_vec).sum(dim=1)         # (B,)
        x     = dot + u_b + i_b
        return torch.relu(x)


### TASK: Train and evaluate the RecommenderNet()


Now it's time to train and evaluate the defined `RecommenderNet()`. First, we need to process the original rating dataset a little by converting the actual user ids and item ids into integer indices, which the embedding layers use in place of explicit one-hot encoding vectors.


In [31]:
def process_dataset(raw_data):
    
    encoded_data = raw_data.copy()
    
    # Mapping user ids to indices
    user_list = encoded_data["user"].unique().tolist()
    user_id2idx_dict = {x: i for i, x in enumerate(user_list)}
    user_idx2id_dict = {i: x for i, x in enumerate(user_list)}
    
    # Mapping course ids to indices
    course_list = encoded_data["item"].unique().tolist()
    course_id2idx_dict = {x: i for i, x in enumerate(course_list)}
    course_idx2id_dict = {i: x for i, x in enumerate(course_list)}

    # Convert original user ids to idx
    encoded_data["user"] = encoded_data["user"].map(user_id2idx_dict)
    # Convert original course ids to idx
    encoded_data["item"] = encoded_data["item"].map(course_id2idx_dict)
    # Convert rating to int
    encoded_data["rating"] = encoded_data["rating"].values.astype("int")

    return encoded_data, user_idx2id_dict, course_idx2id_dict

In [32]:
encoded_data, user_idx2id_dict, course_idx2id_dict = process_dataset(rating_df)

In [33]:
encoded_data.head()

Unnamed: 0,user,item,rating
0,0,0,3
1,1,1,3
2,2,2,3
3,3,3,3
4,4,4,3
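
The id-to-index mapping can be illustrated on a tiny list in plain Python. This is a sketch only: it uses three user ids from the table above plus one repeat, and the variable names are illustrative.

```python
raw_users = [1889878, 1342067, 1889878, 380098]  # sample ids with one repeat

# Keep first-seen order, as pandas' unique() does
unique_users = list(dict.fromkeys(raw_users))
id2idx = {user_id: idx for idx, user_id in enumerate(unique_users)}
idx2id = {idx: user_id for user_id, idx in id2idx.items()}

encoded = [id2idx[u] for u in raw_users]   # integer indices for the model
decoded = [idx2id[i] for i in encoded]     # back to the original ids
print(encoded)
```

The reverse dictionaries returned by `process_dataset()` serve the same purpose as `idx2id` here: they translate model indices back into the original user and course ids.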


Then we can split the encoded dataset into training and testing datasets.


In [34]:
def generate_train_test_datasets(dataset, scale=True, random_state=42):
    """
    Splits into train (70%), val (10%), test (20%) and (optionally) scales
    the ratings with a MinMaxScaler fitted on the training split only.
    Returns 6 arrays: X_train, X_val, X_test, y_train, y_val, y_test.
    """
    # 1) Features & target
    X = dataset[['user', 'item']].values
    y = dataset['rating'].values

    # 2) Hold out 20% for test
    X_train_val, X_test, y_train_val, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, shuffle=True
    )

    # 3) Split the remaining 80% into 70% train / 10% val
    #    10% of original is 0.125 of the 80%
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_val, y_train_val, test_size=0.125,
        random_state=random_state, shuffle=True
    )

    # 4) Scale all targets with a scaler fitted on train only, so that
    #    predictions can be compared with y_test on the same scale
    if scale:
        scaler = MinMaxScaler()
        # reshape to (-1,1) for scaler, then flatten back
        y_train = scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
        y_val   = scaler.transform(y_val.reshape(-1, 1)).ravel()
        y_test  = scaler.transform(y_test.reshape(-1, 1)).ravel()

    # 5) Return exactly six arrays
    return X_train, X_val, X_test, y_train, y_val, y_test



In [35]:
x_train, x_val, x_test, y_train, y_val, y_test = generate_train_test_datasets(encoded_data)
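
The two-stage split can be checked with exact fractions. The sketch below only verifies the arithmetic behind the two `test_size` values; the sizes of the actual splits may differ by a row because `train_test_split` rounds to whole samples.

```python
from fractions import Fraction

test_frac = Fraction(1, 5)              # test_size=0.2 in the first split
remaining = 1 - test_frac               # 80% left for train + validation
val_frac = remaining * Fraction(1, 8)   # test_size=0.125 in the second split
train_frac = remaining - val_frac

print(train_frac, val_frac, test_frac)
```

Taking 0.125 of the remaining 80% yields 10% of the original data for validation, which leaves 70% for training.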

If we take a look at the training input data, it is simply a list of user indices and item indices. This is a compact representation of the one-hot encoding vectors: each index gives the position of the single 1 in the corresponding vector.


In [36]:
user_indices = x_train[:, 0]
user_indices

array([ 6483, 10636,  1219, ...,  7055,  1072, 20911])

In [37]:
item_indices = x_train[:, 1]
item_indices

array([ 8, 17, 22, ..., 15, 22,  0])

The training output labels are a list of 0s and 1s indicating whether the user has completed a course.


In [38]:
y_train

array([1., 1., 1., ..., 1., 1., 1.])
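
The 0/1 labels come from the min-max scaling step in `generate_train_test_datasets()`. The plain-Python sketch below shows the idea on hypothetical raw ratings of 2 and 3, which are assumed here for illustration only. The helper `min_max_scale` is not part of the lab code, and unlike `MinMaxScaler` it simply raises an error when all training values are equal.

```python
def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] using a minimum and maximum taken from the training split."""
    if hi == lo:
        raise ValueError("cannot scale when all training values are equal")
    return [(v - lo) / (hi - lo) for v in values]

train_ratings = [3, 2, 3, 3]   # hypothetical raw ratings, assumed for illustration
val_ratings = [2, 3]

# The minimum and maximum come from the training split only
lo, hi = min(train_ratings), max(train_ratings)
scaled_train = min_max_scale(train_ratings, lo, hi)
scaled_val = min_max_scale(val_ratings, lo, hi)
print(scaled_train, scaled_val)
```

Reusing the training minimum and maximum for the other splits keeps every target on the same scale without letting information from held-out data influence the scaler.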

Then we can choose a small embedding vector size of 16 and create a `RecommenderNet()` model to be trained.


In [39]:
embedding_size = 16
model = RecommenderNet(num_users, num_items, embedding_size)

_TODO: Train the RecommenderNet() model_


In [44]:
## WRITE YOUR CODE HERE:

## - call model.compile() method to set up the loss and optimizer and metrics for the model training, you may use
##  - - tf.keras.losses.MeanSquaredError() as training loss
##  - - keras.optimizers.Adam() as optimizer
##  - - tf.keras.metrics.RootMeanSquaredError() as metric

## - call model.fit() to train the model

## - optionally call model.save() to save the model

## - plot the train and validation loss
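import math

model     = RecommenderNet(len(user_idx2id_dict), len(course_idx2id_dict), embedding_size=16)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# 1. Convert the NumPy arrays to tensors once, instead of once per sample
x_train_t = torch.as_tensor(x_train, dtype=torch.long)
y_train_t = torch.as_tensor(y_train, dtype=torch.float32)
x_val_t   = torch.as_tensor(x_val,   dtype=torch.long)
y_val_t   = torch.as_tensor(y_val,   dtype=torch.float32)

# 2. Training & validation loop over mini-batches
#    (single-sample updates are too slow for a dataset of this size)
n_epochs   = 20
batch_size = 64
train_losses, val_losses = [], []
for epoch in range(1, n_epochs + 1):
    # Training (no explicit shuffle)
    model.train()
    total_loss = 0.0
    for start in range(0, len(x_train_t), batch_size):
        xb = x_train_t[start:start + batch_size]
        yb = y_train_t[start:start + batch_size]

        pred = model(xb[:, 0], xb[:, 1])
        loss = criterion(pred, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # criterion returns the batch mean, so weight it by the batch size
        total_loss += loss.item() * len(xb)
    train_losses.append(total_loss / len(x_train_t))

    # Validation on the held-out split, without gradient tracking
    model.eval()
    with torch.no_grad():
        val_pred = model(x_val_t[:, 0], x_val_t[:, 1])
        val_losses.append(criterion(val_pred, y_val_t).item())

    print(f"Epoch {epoch} → Train RMSE: {math.sqrt(train_losses[-1]):.4f}, "
          f"Val RMSE: {math.sqrt(val_losses[-1]):.4f}")

# 3. Plot the train and validation loss
plt.plot(train_losses, label="train loss")
plt.plot(val_losses, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()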

<details>
    <summary>Click here for Hints</summary>
    
When you are fitting a Keras model with `model.fit()`, don't forget to specify the parameters `x=x_train, y=y_train`, as well as `batch_size=64`, `epochs=10`, and of course `validation_data=(x_val, y_val)`. You can also set `verbose=1`, which shows an animated progress bar for each epoch. In a PyTorch training loop, the equivalents are the mini-batch size, the number of epochs, and a validation pass on `(x_val, y_val)` after each epoch.
    
* You can set `history = model.fit()`, which gives you a `history.history` dictionary of losses that is very useful for plotting the train and validation loss. To plot it, use `plt.plot()` with `history.history["loss"]` as its parameter for the train loss and `history.history["val_loss"]` for the validation loss.

</details>


_TODO:_ Evaluate the trained model


In [48]:
def evaluate(X, y, model, criterion, batch_size=4096):
    """
    Compute RMSE of model on the (X,y) dataset.
    X: array of shape (N,2) with [user_idx, item_idx]
    y: array of shape (N,) with true ratings, on the same scale
       as the targets the model was trained on
    """
    model.eval()
    X_t = torch.as_tensor(X, dtype=torch.long)
    y_t = torch.as_tensor(y, dtype=torch.float32)
    total_sq_err = 0.0

    with torch.no_grad():
        # process mini-batches instead of single samples to keep evaluation fast
        for start in range(0, len(X_t), batch_size):
            xb = X_t[start:start + batch_size]
            yb = y_t[start:start + batch_size]

            # forward + loss
            pred = model(xb[:, 0], xb[:, 1])
            # criterion returns the batch mean, so weight it by the batch size
            total_sq_err += criterion(pred, yb).item() * len(xb)

    mse  = total_sq_err / len(X_t)
    rmse = math.sqrt(mse)
    return rmse

# Evaluate the trained model on the training and validation sets
train_rmse = evaluate(x_train, y_train, model, criterion)
val_rmse   = evaluate(x_val,   y_val,   model, criterion)
print(f"Train RMSE: {train_rmse:.4f}, Val RMSE: {val_rmse:.4f}")
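
`evaluate()` accumulates squared errors over all samples before taking the square root. If the loop is changed to process mini-batches, each batch's mean loss has to be weighted by the batch size, otherwise a small final batch distorts the result. The plain-Python sketch below uses made-up predictions and targets to show the difference; `rmse_from_batches` is an illustrative helper, not part of the lab code.

```python
import math

def rmse_from_batches(batches):
    """batches: list of (predictions, targets) pairs of equal-length lists."""
    total_sq_err, n = 0.0, 0
    for preds, targets in batches:
        total_sq_err += sum((p - t) ** 2 for p, t in zip(preds, targets))
        n += len(targets)
    return math.sqrt(total_sq_err / n)

# Made-up values: a batch of three samples followed by a batch of one
batches = [([1.0, 0.0, 1.0], [1.0, 1.0, 1.0]),  # squared errors: 0, 1, 0
           ([0.0], [1.0])]                       # squared error: 1
overall = rmse_from_batches(batches)             # sqrt(2 / 4)

# Averaging the two batch MSEs without weighting overstates the error here
naive = math.sqrt((1 / 3 + 1.0) / 2)
print(round(overall, 4), round(naive, 4))
```

The unweighted average gives the single-sample batch as much influence as the three-sample batch, which is why the two numbers disagree.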

<details>
    <summary>Click here for Hints</summary>
    
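Use `x_test, y_test` as parameters for `model.evaluate()`. With the `evaluate()` helper defined in this notebook, the call is `evaluate(x_test, y_test, model, criterion)`. Make sure `y_test` is on the same scale as the targets the model was trained on.

</details>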


### Extract the user and item embedding vectors as latent feature vectors


Now, we have trained the `RecommenderNet()` model and it can predict the ratings with relatively small RMSE. 

If we print the trained model then we can see its layers and their parameters/weights.


In [None]:
print(model)

In the `RecommenderNet`, the `user_embedding` and `item_embedding` layers contain the trained weights. Essentially, these weights are the latent user and item features learned by `RecommenderNet`, and they are used to predict the interaction. As such, while the neural network is trained to predict ratings, the embedding layers are simultaneously trained to extract the user and item embedding features.


We can easily get the actual weights from the `weight` attribute of each embedding layer:


In [None]:
# User features
user_latent_features = model.user_embedding.weight.detach().cpu().numpy()
print(f"User features shape: {user_latent_features.shape}")

In [None]:
user_latent_features[0]

In [None]:
# Item features
item_latent_features = model.item_embedding.weight.detach().cpu().numpy()
print(f"Item features shape: {item_latent_features.shape}")

In [None]:
item_latent_features[0]

Now, each of the 33901 users has been transformed into a 16 x 1 latent feature vector, and each of the 126 items has been transformed into a 16 x 1 latent feature vector.
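
Once extracted, latent feature vectors can be compared in the same way as explicit feature vectors, for example with cosine similarity. The NumPy sketch below uses a random stand-in matrix with the same shape as the item embeddings, so the similarity values themselves carry no meaning; `item_features_demo` is an illustrative name.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-in for an item embedding matrix of shape (num_items, embedding_size)
rng = np.random.default_rng(123)
item_features_demo = rng.normal(size=(126, 16))

sim_self = cosine_similarity(item_features_demo[0], item_features_demo[0])
sim_01 = cosine_similarity(item_features_demo[0], item_features_demo[1])
print(round(sim_self, 4))
```

With trained embeddings in place of the random matrix, items with a high cosine similarity are candidates for "similar course" recommendations.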


### TASK (Optional): Customize the RecommenderNet to potentially improve the model performance


The pre-defined `RecommenderNet()` is actually a very basic neural network. You are encouraged to customize it to see whether the prediction performance improves. Here are some directions:
- Hyperparameter tuning, such as the embedding layer dimensions
- Add more hidden layers
- Try different activation functions such as `ReLU`


In [None]:
## WRITE YOUR CODE HERE

## Update RecommenderNet() class

## compile and fit the updated model

## evaluate the updated model


### Summary


In this lab, you have learned and practiced predicting course ratings using neural networks. With a predefined and trained neural network, we can extract or embed users and items into latent feature spaces and further predict the interaction between a user and an item with the latent feature vectors.


## Authors


[Yan Luo](https://www.linkedin.com/in/yan-luo-96288783/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML321ENSkillsNetwork817-2022-01-01)


### Other Contributors


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2021-10-25|1.0|Yan|Created the initial version|


Copyright © 2021 IBM Corporation. All rights reserved.
