---
title: Collaborative Filtering From Scratch
author: "Gaurav Adlakha"
date: "2025-05-15"
categories: [fastai,Deep-Learning,Machine-Learning,Recommender-Systems]
description: "Collaborative filtering system from scratch using fastai and PyTorch"
toc: true
toc-depth: 3
format:
  html:
    code-fold: true
    code-tools: true
    highlight-style: github
    fig-width: 8
    fig-height: 6
---




# Collaborative Filtering From Scratch


## Table of contents

* Introduction to Collaborative Filtering
* The MovieLens Dataset
* Understanding the Data
* Building the Model
* Improving Model Performance
* Evaluating Results

## Introduction to Collaborative Filtering

>_Ever wondered how Netflix knows what you might want to watch next? The magic behind that is called collaborative filtering._

Collaborative filtering is one of the most powerful and widespread techniques in recommendation systems. The core idea is beautifully simple: people who agreed in the past are likely to agree again in the future. Instead of analyzing the content of movies or books, collaborative filtering looks at patterns of user behavior.

In this post, we'll build a collaborative filtering system from scratch using PyTorch and fastai. We'll start with the basics and gradually add sophistication to our model. By the end, you'll understand not just how to implement these systems, but why they work and how to improve them.

## The MovieLens Dataset

We'll use the classic MovieLens 100k dataset, which contains 100,000 movie ratings from 943 users on 1,682 movies. This dataset is perfect for learning collaborative filtering:

```python
from fastai.imports import *
from fastai.collab import *
from fastai.tabular.all import *
set_seed(42)
```

Let's load the dataset and take a look at the files it contains:

```python
path = untar_data(URLs.ML_100k)
path.ls()
```

The MovieLens dataset comes with several files. The main one we're interested in is `u.data`, which contains the ratings. Let's load it into a pandas DataFrame:

```python
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                    names=['user','movie','rating','timestamp'])
ratings.head()
```

This gives us a DataFrame with four columns:
- `user`: A unique identifier for each user
- `movie`: A unique identifier for each movie
- `rating`: The rating given by the user (1-5 stars)
- `timestamp`: When the rating was given, as a Unix timestamp (seconds since 1970-01-01 UTC)

## Understanding the Data

Looking at the first few rows, we can see each rating is a single interaction between a user and a movie. For example, user 196 gave movie 242 a rating of 3 stars.

```
   user  movie  rating  timestamp
0   196    242       3  881250949
1   186    302       3  891717742
2    22    377       1  878887116
3   244     51       2  880606923
4   166    346       1  886397596
```

While this format is good for storing data efficiently, it's not the most intuitive way to understand the data. To get a better feel for the patterns, we can reshape it into a user-movie matrix, where each row represents a user, each column represents a movie, and the cells contain ratings.

In the full matrix, there would be 943 rows (users) and 1,682 columns (movies). That's over 1.5 million cells, but our dataset only has 100,000 ratings - meaning our matrix is very sparse, with most cells empty. This sparsity is a key challenge in collaborative filtering.
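The arithmetic behind that sparsity claim is quick to check. This uses only the dataset sizes quoted above:

```python
# Dimensions of MovieLens 100k quoted above
n_users, n_movies, n_ratings = 943, 1682, 100_000

total_cells = n_users * n_movies   # size of the full user-by-movie matrix
density = n_ratings / total_cells  # fraction of cells that actually hold a rating

print(f"{total_cells:,} cells, {density:.1%} filled, {1 - density:.1%} empty")
# 1,586,126 cells, 6.3% filled, 93.7% empty
```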



## Visualizing the User-Movie Matrix

To better understand our data, let's create a more manageable view of the ratings matrix. We'll focus on the most active users and most-rated movies:

```python
def get_top_ratings(df, n_users=20, n_movies=20):
    "Return crosstab of ratings from top users for top movies"
    top_users = df.user.value_counts().index[:n_users]
    top_movies = df.movie.value_counts().index[:n_movies]
    filtered = df[(df.user.isin(top_users)) & (df.movie.isin(top_movies))]
    return pd.crosstab(filtered.user, filtered.movie, filtered.rating, aggfunc='mean')

ratings_matrix = get_top_ratings(ratings)
```

This function creates a cross-tabulation of our data, showing how the 20 most active users rated the 20 most-rated movies. Let's take a look at this matrix:

```python
ratings_matrix
```

Now we can see the ratings in a more intuitive format. Each row represents a user, each column represents a movie, and the values are the ratings (1-5 stars). The `NaN` values indicate that the user hasn't rated that particular movie.
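To see where the `NaN`s come from, here is the same `pd.crosstab` call applied to a tiny ratings table invented purely for illustration. A user-movie pair with no rating has nothing to average, so pandas fills that cell with `NaN`:

```python
import pandas as pd

# Invented ratings in the same long format as `ratings`
toy = pd.DataFrame({
    'user':   [1, 1, 2, 3, 3],
    'movie':  [10, 20, 10, 20, 30],
    'rating': [4, 5, 3, 2, 4],
})

# Same call as in get_top_ratings: one row per user, one column per movie
wide = pd.crosstab(toy.user, toy.movie, toy.rating, aggfunc='mean')
print(wide)  # 9 cells, 5 ratings, so 4 cells are NaN
```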

Looking at this matrix, we can already spot some patterns:
- Some users tend to give higher ratings overall (like user 276)
- Some movies receive consistently high ratings (like movie 50)
- There are many missing values, which is typical in recommendation systems

## The Core of Collaborative Filtering: Vector Similarity

The fundamental idea behind collaborative filtering is to represent users and items as vectors in a shared space, then score each user-item pair with the dot product of their vectors. When these vectors point in similar directions, the score is high, which suggests the user will like the movie.

Let's see a simple example of how this works:

```python
movie = np.array([0.98, 0.9, -0.9])
user = np.array([0.9, 0.8, -0.6])
(user*movie).sum()
```

This gives us a similarity score of approximately 2.14, which is quite high. The vectors are aligned in a similar direction, suggesting this user would like this movie.

In this example:
- The movie vector might represent features like [SciFi, Action, Romance]
- The user vector represents how much they like each feature
- Both the user and movie have positive values for SciFi and Action, and negative values for Romance
- The dot product gives us a measure of their similarity
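One nuance: the dot product depends on the vectors' lengths as well as their direction. The sketch below splits the score into per-feature contributions and, for comparison, computes cosine similarity, which divides out the lengths and so measures direction alone. Our model uses the plain dot product; cosine similarity appears here only as a reference point.

```python
import numpy as np

movie = np.array([0.98, 0.9, -0.9])
user = np.array([0.9, 0.8, -0.6])

# Each feature contributes user[i] * movie[i]; the dot product is their sum
contributions = user * movie
score = contributions.sum()

# Cosine similarity divides out both vector lengths, leaving only direction
cosine = score / (np.linalg.norm(user) * np.linalg.norm(movie))

print(contributions, round(score, 3), round(cosine, 3))
```

All three features push the score up here, and the cosine close to 1 confirms the two vectors point almost the same way.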

This is a simplified version of what happens in our recommendation model. In practice:

1. We don't manually define these features
2. We let the model learn them from the data
3. We use many more dimensions (typically 5-100)

The beauty of matrix factorization models is that they automatically discover these latent features that explain user preferences. The model might discover dimensions like "gritty vs. lighthearted" or "intellectual vs. action-packed" without us explicitly defining them.




## When Users and Movies Don't Match

We've seen what happens when a user's preferences align with a movie's features. But what about when they don't? Let's look at an example of a poor match:

```python
movie = np.array([0.98,0.9,-0.9])
user = np.array([0.1,-1.0,-0.6])
(user*movie).sum()
```

This gives us a negative similarity score of -0.262, suggesting this user would probably dislike this movie.

In this case:
- The user has a weak positive preference for the first feature (0.1)
- The user strongly dislikes the second feature (-1.0)
- Both user and movie have negative values for the third feature, which actually contributes positively to the similarity
- Overall, the negative similarity suggests a mismatch
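In practice we score many users (or movies) at once rather than one pair at a time. Stacking the two example users from this section into a matrix, a single matrix-vector product produces both scores, and ranking them picks out the better match:

```python
import numpy as np

movie = np.array([0.98, 0.9, -0.9])

# The two example users from above, one per row
users = np.array([
    [0.9, 0.8, -0.6],    # the good match
    [0.1, -1.0, -0.6],   # the poor match
])

# One matrix-vector product scores every user against the movie
scores = users @ movie
best = int(np.argmax(scores))   # index of the user most likely to enjoy the movie

print(scores, best)  # scores roughly [2.142, -0.262]; best is 0
```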

This simple example illustrates how vector similarity captures preferences. In our full model, these vectors will be learned from the data rather than manually specified.

## Adding Movie Titles

So far, we've been working with movie IDs, which aren't very informative. Let's load the movie titles from the dataset:

```python
movies = pd.read_csv(path/'u.item', delimiter='|', header=None, encoding='latin1',
                     usecols=[0, 1], names=['movie', 'title'])
movies.head()
```

Now we can see that movie 1 is "Toy Story (1995)", movie 2 is "GoldenEye (1995)", and so on. This will make our recommendations more interpretable.

Let's remind ourselves of the structure of our ratings data:

```python
ratings
```

We have 100,000 ratings from users on movies. Now that we have the movie titles, we could merge them into our ratings dataframe to make it more readable.


## Combining Ratings with Movie Titles

Now let's merge our ratings data with the movie titles to make our dataset more interpretable:

```python
ratings = ratings.merge(movies)
ratings
```

Perfect! Now our ratings dataframe includes the movie titles alongside the IDs. This makes it much easier to understand what movies users are rating. For example, we can see that user 196 gave "Kolya (1996)" a rating of 3, and user 716 gave "Back to the Future (1985)" a perfect 5.
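Called with no arguments, `merge` performs an inner join on every column the two frames share, which here is just `movie`. A slightly more defensive version, shown on invented miniature frames, names the key and asks pandas to check that each movie ID maps to a single title:

```python
import pandas as pd

# Invented miniature versions of `ratings` and `movies`
toy_ratings = pd.DataFrame({'user': [1, 2, 3], 'movie': [10, 20, 10], 'rating': [4, 2, 5]})
toy_movies = pd.DataFrame({'movie': [10, 20], 'title': ['Movie A', 'Movie B']})

# Name the join key explicitly; validate='many_to_one' raises an error
# if any movie id appears more than once in toy_movies
merged = toy_ratings.merge(toy_movies, on='movie', how='left', validate='many_to_one')
print(merged)
```

With `how='left'`, a rating whose movie is missing from the titles table is kept with a `NaN` title instead of being silently dropped, as the default inner join would do.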

## Setting Up for Model Training

Now, let's create a fastai `CollabDataLoaders` object, which will handle the data preparation for our collaborative filtering model:

```python
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()
```


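The `CollabDataLoaders` class takes care of several important preprocessing steps:
1. Splitting the data into training and validation sets (by default, a random 20% of the rows is held out for validation)
2. Converting user IDs and movie titles into contiguous integer indices (categorical codes) that can index an embedding table
3. Creating mini-batches for efficient training (here, 64 ratings per batch)
4. Working directly from the long list of observed ratings, so the mostly empty user-movie matrix never has to be built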
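Step 1 is easy to picture. By default, `CollabDataLoaders.from_df` holds out a random 20% of the rows (`valid_pct=0.2`). The helper below is a hypothetical stand-in that performs the same kind of random row split with NumPy; it is not fastai's implementation:

```python
import numpy as np
import pandas as pd

def random_split(df, valid_pct=0.2, seed=42):
    "Hypothetical helper: shuffle row positions and hold out `valid_pct` of them"
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    cut = int(len(df) * valid_pct)
    return df.iloc[idx[cut:]], df.iloc[idx[:cut]]   # (train, valid)

toy = pd.DataFrame({'user': range(10), 'movie': range(10), 'rating': [3] * 10})
train, valid = random_split(toy)
print(len(train), len(valid))  # 8 2
```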

Once we have our DataLoader set up, we can examine some basic information about our dataset:

```python
n_users = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
n_users, n_movies
```

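This tells us how many rows each embedding table will need: 944 for users and 1,665 for movies. Both counts include a `#na#` placeholder that fastai reserves at index 0 for unknown values, so 944 is the 943 real users plus one. The movie count is below 1,682 because we indexed movies by title, and a few different movie IDs share the same title; that leaves 1,664 unique titles, plus `#na#`.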

The fastai `CollabDataLoaders` provides a convenient abstraction over the data preparation process. It automatically handles the conversion of our raw dataframe into a format suitable for training collaborative filtering models, saving us from having to write a lot of boilerplate code.
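Those counts come from fastai's categorical processing, which reserves index 0 for a `#na#` placeholder and, because we passed `item_name='title'`, builds the movie vocabulary from titles rather than IDs, so different movie IDs with the same title share one index. Here's a rough sketch of that mapping on invented data; `build_vocab` is a hypothetical helper, not fastai's code:

```python
import pandas as pd

def build_vocab(values):
    "Hypothetical sketch: '#na#' placeholder at index 0, then the sorted unique values"
    return ['#na#'] + sorted(set(values))

# Invented data: two different movie ids that happen to share a title
toy_movies = pd.DataFrame({'movie': [1, 2, 3], 'title': ['Title A', 'Title A', 'Title B']})

vocab = build_vocab(toy_movies.title)
stoi = {v: i for i, v in enumerate(vocab)}   # title -> integer index
codes = toy_movies.title.map(stoi).tolist()

print(vocab)   # ['#na#', 'Title A', 'Title B']
print(codes)   # [1, 1, 2]: ids 1 and 2 collapse into one index
```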


## Creating Latent Factors

Now that our data is prepared, let's explore the concept of latent factors, which are at the heart of collaborative filtering. We'll start by manually creating random factors for our users and movies:

```python
n_users
```

`n_users` is 944: the 943 users in the dataset plus fastai's `#na#` placeholder class. Now let's create random latent factors for both users and movies:

```python
user_factor = torch.randn(n_users,5)
movie_factor= torch.randn(n_movies,5)
```

We've created two matrices:
- `user_factor`: A matrix of size (944, 5) where each row represents a user's latent factors
- `movie_factor`: A matrix of size (1665, 5) where each row represents a movie's latent factors

The number 5 here represents the dimensionality of our latent space. We're saying that we want to represent users and movies in a 5-dimensional space, where each dimension captures some aspect of user preferences or movie characteristics.
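With random factors the scores are meaningless, but the shapes already show how factorization works: multiplying the two factor matrices yields a score for every user-movie pair, and each entry is the dot product of one user row with one movie row. A quick sketch, hard-coding the sizes used in this post:

```python
import torch

torch.manual_seed(0)
n_users, n_movies, n_factors = 944, 1665, 5   # sizes used in this post

user_factor = torch.randn(n_users, n_factors)
movie_factor = torch.randn(n_movies, n_factors)

# One prediction: dot product of user 3's row with movie 10's row
one = (user_factor[3] * movie_factor[10]).sum()

# Every prediction at once: (944, 5) @ (5, 1665) -> (944, 1665)
all_scores = user_factor @ movie_factor.T

print(all_scores.shape)                                    # torch.Size([944, 1665])
print(torch.allclose(one, all_scores[3, 10], atol=1e-5))   # same number either way
```

Training adjusts the two small factor matrices (944 × 5 and 1,665 × 5) so that the entries of this product match the ratings we do observe.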

Let's confirm the shapes of these matrices:

```python
user_factor.shape
```

```python
movie_factor.shape
```

These outputs confirm that we have the expected shapes: 944 users with 5 factors each, and 1665 movies with 5 factors each.

## Understanding Embeddings

In deep learning, these latent factors are often called "embeddings". An embedding is a way to represent categorical data (like user IDs or movie titles) as continuous vectors in a lower-dimensional space.

One way to think about embeddings is as lookups in a table. For example, if we want to get the embedding for user 5:

```python
torch.embedding(user_factor,torch.tensor(5))
```

This gives us the 5-dimensional vector for user 5. In a trained model, these vectors would capture meaningful patterns about users' preferences and movies' characteristics, but right now they're just random values.

The key insight of matrix factorization for collaborative filtering is that we can learn these embeddings from the rating data. The model will adjust these vectors during training so that the dot product between a user vector and a movie vector approximates the user's rating for that movie.
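Lookups also work on a whole batch of indices at once, which is how the model will use them. `torch.embedding`, `torch.nn.functional.embedding`, and plain indexing return the same rows; the main thing to watch is that the first two take their arguments in opposite orders:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
user_factor = torch.randn(944, 5)

idx = torch.tensor([5, 10, 5])   # a "batch" of user indices; repeats are fine

# torch.embedding(weight, indices) vs. F.embedding(indices, weight)
a = torch.embedding(user_factor, idx)
b = F.embedding(idx, user_factor)
c = user_factor[idx]             # plain integer-array indexing

print(a.shape)                                   # torch.Size([3, 5])
print(torch.equal(a, b) and torch.equal(b, c))   # True
```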

## Using PyTorch's Embedding Layer

Instead of manually creating embedding matrices, PyTorch provides a dedicated `Embedding` layer that's designed specifically for this purpose. Let's create an embedding layer for our users:

```python
user_emb= nn.Embedding(n_users,5)
```

The `nn.Embedding` module creates a lookup table of embeddings. In this case, we're creating embeddings for our 944 users, with each embedding being a 5-dimensional vector.

Unlike our manually created `user_factor`, this embedding layer is a proper PyTorch module that can be trained with gradient descent. It has weights that will be updated during training to minimize the prediction error.

Let's look at the embedding for user 10:

```python
user_emb.weight[10]
```

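We can see that PyTorch has initialized these embeddings with random values (for `nn.Embedding`, drawn from a standard normal distribution). The `grad_fn=<SelectBackward0>` shows that this row was produced by an indexing operation on a tensor that requires gradients, so autograd tracks it and the weights can be updated during training.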
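To make "can be updated during training" concrete, here's a toy sketch with a made-up loss that just pushes two users' vectors toward zero. Because a lookup only touches the selected rows, only those rows get non-zero gradients, and with plain SGD and no weight decay every other row stays exactly as it was:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(944, 5)
opt = torch.optim.SGD(emb.parameters(), lr=0.1)   # plain SGD: no momentum, no weight decay
before = emb.weight.detach().clone()

users = torch.tensor([3, 10])
loss = emb(users).pow(2).sum()   # made-up loss: squared length of the two looked-up vectors
loss.backward()
opt.step()

# Which rows changed? Only the ones we looked up
changed = (emb.weight.detach() != before).any(dim=1)
print(changed.nonzero().flatten().tolist())   # [3, 10]
```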

For comparison, let's look at our manually created `user_factor`:

```python
user_factor
```

While both `user_emb` and `user_factor` represent the same concept (embeddings for users), the `nn.Embedding` module provides additional functionality that makes it more suitable for deep learning models.

Let's compare the embedding for user 10 in both representations:

```python
user_factor[10]
```

The values are different because they were randomly initialized, but the concept is the same. Both are 5-dimensional vectors representing user 10's preferences in our latent space.

Finally, let's look at what the `user_emb` object actually is:

```python
user_emb
```

This confirms that `user_emb` is an `Embedding` module with 944 embeddings, each of dimension 5.

The advantage of using PyTorch's `Embedding` layer is that it's designed specifically for this use case and integrates seamlessly with PyTorch's autograd system for computing gradients. It also provides efficient implementations of operations like looking up embeddings for batches of indices, which is exactly what we need for our collaborative filtering model.

## The Mathematics Behind Embeddings

There's an elegant mathematical relationship between embeddings and one-hot encodings that's worth understanding:

>_Multiplying a matrix by a one-hot-encoded vector is the same as looking up the row of that matrix at the index where the vector has its 1_.

Let's see this in action. First, let's create a one-hot encoding for index 3 in a vector of length 100:

```python
one_hot(3,100)
```

This creates a vector with all zeros except for a single 1 at index 3. The `torch.uint8` dtype means the values are stored as 8-bit unsigned integers rather than floats.

To use this one-hot vector in matrix operations, we need to convert it to floating point:

```python
one_hot(3,100).float()
```

Now, here's the key insight: when we multiply a matrix by a one-hot vector, we're effectively selecting a single row from that matrix. This is precisely what happens when we look up an embedding.

For example, if we wanted to get the embedding for user 10, we could do it in two ways:

1. Direct lookup (what we did earlier):
```python
user_emb.weight[10]
```

2. Using matrix multiplication with a one-hot vector (mathematically equivalent):
```python
user_emb.weight.t() @ one_hot(10, n_users).float()
```

This mathematical equivalence is important because it helps us understand what's happening when we use embedding layers in deep learning models.

Let's look at our user embedding again to remind ourselves what it contains:

```python
user_emb
```

The embedding layer contains weights for each user (944 in total), with each weight being a 5-dimensional vector. During training, these weights will be updated to better predict the ratings.

This understanding of embeddings as rows in a matrix that can be selected either through direct indexing or through multiplication with one-hot vectors provides a solid foundation for understanding how collaborative filtering works at a mathematical level.

## Digging Deeper into Embeddings

Let's further explore the relationship between embeddings and one-hot encodings by examining the shapes of our tensors:

```python
user_emb.weight.shape
```

Our embedding weight matrix has shape `[944, 5]`, which means we have 944 embeddings (one for each user) and each embedding is a 5-dimensional vector.

Now let's look at the shape of a one-hot encoded vector:

```python
one_hot(10,n_users).float().shape
```

This gives us a vector of shape `[944]`, which makes sense because we're creating a one-hot encoding for user 10 out of 944 total users.

Now, let's verify our earlier claim that matrix multiplication with a one-hot vector is equivalent to directly indexing the embedding:

```python
user_emb.weight.t() @ one_hot(10,n_users).float()
```

This matrix multiplication gives us the same 5-dimensional vector we saw earlier when we directly indexed `user_emb.weight[10]`. The `grad_fn=<MvBackward0>` indicates that PyTorch is tracking gradients for this operation, which is important for training.
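The same equivalence holds for a whole batch: stacking one-hot rows into a matrix and multiplying selects one row of the weight matrix per example. This sketch uses PyTorch's `torch.nn.functional.one_hot` in place of fastai's `one_hot` so it runs on its own:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
weight = torch.randn(944, 5)      # stand-in for user_emb.weight
idx = torch.tensor([10, 3, 10])   # a batch of user indices

# One row per example, each a one-hot vector of length 944
one_hots = F.one_hot(idx, num_classes=944).float()   # shape (3, 944)

via_matmul = one_hots @ weight    # (3, 944) @ (944, 5) -> (3, 5)
via_lookup = weight[idx]

print(torch.allclose(via_matmul, via_lookup))   # True
```

In practice we always use the lookup: the matrix-multiplication form builds a (batch × 944) matrix that is almost entirely zeros and multiplies through all of it, while the lookup just copies the rows it needs.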

## What is an Embedding?

An embedding is a learned mapping from discrete objects (like words, users, or movies) to vectors of continuous numbers. In our case, we're learning embeddings for users and movies to represent them in a shared latent space.

Let's create another embedding layer to further explore this concept:

```python
u_e= Embedding(944, 5)
```

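Here we've created a new embedding layer `u_e` with the same dimensions as our previous `user_emb`. Note that this is fastai's `Embedding` class (no `nn.` prefix), a subclass of `nn.Embedding` that initializes its weights from a truncated normal distribution with a small standard deviation (0.01 by default). Because the initialization is random, the values will differ from those in `user_emb`, and they will also be much smaller in magnitude: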

```python
u_e.weight[10]
```

Indeed, we see different values for user 10's embedding in this new layer.

## Working with fastai DataLoaders

Now, let's shift our focus back to the data preparation aspect. We previously created a `CollabDataLoaders` object, which is fastai's way of organizing data for collaborative filtering models.

Let's examine our data loaders:

```python
dls
```

This shows us our `TabularDataLoaders` object, which contains both training and validation data loaders.

We can access the training data loader specifically:

```python
dls[0]
```

This gives us the training data loader, which is a `TabWeightedDL` object.


Finally, let's grab a single batch of data to see what our model will be working with during training:

```python
batch = dls.one_batch()
```

This gives us a batch of data: an input tensor whose two columns hold user and movie indices (the integer codes fastai assigned, not the raw IDs), plus the corresponding ratings. Our model will use these indices to look up the respective embeddings, compute the dot product, and try to predict the rating.

Understanding these data structures is crucial for building and training our collaborative filtering model. The embeddings provide a way to represent users and movies in a shared space, and the data loaders provide a way to efficiently feed this data to our model during training.

## Building Our Collaborative Filtering Model

Now that we understand the data and the concept of embeddings, let's build a complete collaborative filtering model. First, let's remind ourselves of the dimensions of our problem:

```python
n_users, n_movies
```

This confirms we have 944 user classes and 1,665 movie classes to embed.

Let's examine what's in our batch of data:

```python
batch[0][:,1]
```

This shows the movie indices in our batch (the second column of the input). Each number is a position in `dls.classes['title']`, so it identifies one movie title.

Now, let's create embedding layers for both users and movies, and see how they interact with our data:

```python
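us_em = Embedding(n_users, 5)    # fastai Embedding: one 5-d vector per user index
mo_em = Embedding(n_movies, 5)   # one 5-d vector per movie index

batch = dls.one_batch()

x = batch[0][:, 0]   # user indices
y = batch[0][:, 1]   # movie indices

print(us_em(x).shape, mo_em(y).shape)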
```

This outputs `torch.Size([64, 5]) torch.Size([64, 5])`, which tells us:
- We have a batch of 64 samples
- For each sample, we have a 5-dimensional embedding for both the user and the movie

## Creating a Neural Network for Collaborative Filtering

Now let's define a simple neural network that implements the collaborative filtering algorithm:

```python
class CollabNN(nn.Module):
    "Simple collaborative filtering model with embeddings"
    def __init__(self, n_users, n_items, n_factors=5):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)
        
    def forward(self, x):
        users, items = x[:,0], x[:,1]
        u_embs = self.user_factors(users)
        i_embs = self.item_factors(items)
        return (u_embs * i_embs).sum(dim=1)
```

This model:
1. Takes user and item IDs as input
2. Looks up the embeddings for each
3. Computes the element-wise product of the embeddings
4. Sums up the resulting vector to get a single number (the predicted rating)
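Steps 3 and 4 together compute one dot product per row of the batch, which is why the sum is taken over `dim=1` (the factor dimension) rather than over the whole tensor. A small check with random stand-in embeddings shows that this matches two other ways of writing a row-wise dot product:

```python
import torch

torch.manual_seed(0)
u_embs = torch.randn(64, 5)   # stand-in for a batch of user embeddings
i_embs = torch.randn(64, 5)   # stand-in for the matching item embeddings

preds = (u_embs * i_embs).sum(dim=1)                   # what the model does: shape (64,)
via_einsum = torch.einsum('bf,bf->b', u_embs, i_embs)  # same row-wise dot product
via_diag = (u_embs @ i_embs.T).diagonal()              # same again, but builds a 64x64 matrix

print(preds.shape)   # torch.Size([64])
print(torch.allclose(preds, via_einsum, atol=1e-6), torch.allclose(preds, via_diag, atol=1e-6))
```

The element-wise-then-sum version is what the model uses; the `diagonal()` version gives the same answer but computes all 64 × 64 user-item scores just to keep 64 of them.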

Let's instantiate our model and get a batch of data:

```python
model = CollabNN(n_users, n_movies)
batch = dls.one_batch()
```

Now, let's see what predictions our (untrained) model makes for this batch:

```python
model(batch[0])
```

These are the raw predictions from our model for the 64 samples in our batch. Since the model hasn't been trained yet, these predictions are essentially random.

Let's check the shape of our predictions:

```python
model(batch[0]).shape
```

This confirms that we get one prediction per sample in our batch (64 in total).

## Training the Model

Now that we have our model defined, let's train it using fastai's `Learner` class:

```python
mdl =CollabNN(n_users, n_movies)
learner = Learner(dls, mdl, loss_func=MSELossFlat())
```

We're using Mean Squared Error (MSE) as our loss function, which is appropriate for a regression task like predicting ratings.

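Let's train our model for 5 epochs using the one-cycle policy, a learning rate schedule that ramps the learning rate up from a low value to a maximum early in training and then anneals it back down to a very small value (fastai also varies the optimizer's momentum in the opposite direction). This typically lets models train faster and more stably than a constant learning rate: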

```python
learner.fit_one_cycle(5, 5e-3)
```

This trains our model for 5 epochs with a maximum learning rate of 0.005. The training progress shows both the training and validation loss decreasing over time, which is a good sign that our model is learning.
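For intuition about the shape of that schedule, here's a sketch of the learning rate as a function of training progress. It assumes fastai's default `fit_one_cycle` settings (`div=25`, `div_final=1e5`, `pct_start=0.25`) with cosine interpolation in both phases, and it ignores momentum, which fastai also cycles:

```python
import math

def sched_cos(start, end, pos):
    "Cosine interpolation: returns `start` at pos=0 and `end` at pos=1"
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos, lr_max=5e-3, div=25.0, div_final=1e5, pct_start=0.25):
    "Learning rate at training progress `pos` in [0, 1], for a one-cycle schedule"
    if pos < pct_start:
        # Warm-up phase: lr_max/div -> lr_max
        return sched_cos(lr_max / div, lr_max, pos / pct_start)
    # Annealing phase: lr_max -> lr_max/div_final
    return sched_cos(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

for pos in [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]:
    print(f"progress {pos:.2f}: lr = {one_cycle_lr(pos):.2e}")
```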

By the end of training, our model has reached a training loss of 2.32 and a validation loss of 2.79. These are mean squared errors, so they are in squared stars; taking the square root gives the root mean squared error (RMSE) in stars. The square root of 2.79 is about 1.67, so on the validation set our predictions are typically off by roughly 1.7 stars.

Let's grab a batch of data for further analysis:

```python
x,y = dls.one_batch()
```

This gives us the inputs (user and movie indices) and targets (ratings) from a batch of our data.

We've successfully built and trained a collaborative filtering model! This model learns embeddings for users and movies that capture their latent characteristics, and uses these embeddings to predict how users would rate movies they haven't seen yet.

## Analyzing Our Model's Performance

After training our model, it's helpful to analyze its performance more closely. Let's create a function to display the training and validation losses:

```python
def get_training_losses(learner):
    """Display training and validation losses from a fastai Learner as a DataFrame."""
    losses = learner.recorder.values
    train_losses = [x[0] for x in losses]
    valid_losses = [x[1] for x in losses]
    
    return pd.DataFrame({
        'Epoch': range(1, len(train_losses)+1),
        'Training Loss': train_losses,
        'Validation Loss': valid_losses
    })
```

Let's use this function to examine our model's learning progress:

```python
get_training_losses(learner)
```

This shows us how both the training and validation losses decreased over the 5 epochs of training. We can see that both losses drop significantly in the first few epochs and then start to level off, suggesting that our model is learning effectively.

## Improving Our Model with Output Range Constraints

One issue with our current model is that it can output any value, while movie ratings are typically constrained to a specific range (1-5 stars). Let's improve our model by adding a sigmoid activation with a specified output range:

```python
class CollabNN(nn.Module):
    "Simple collaborative filtering model with embeddings"
    def __init__(self, n_users, n_items, n_factors=5,y_range=(0,5.5)):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)
        self.y_range = y_range

        
    def forward(self, x):
        users, items = x[:,0], x[:,1]
        u_embs = self.user_factors(users)
        i_embs = self.item_factors(items)
        return sigmoid_range((u_embs * i_embs).sum(dim=1) , *self.y_range)
```

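The key change here is the use of fastai's `sigmoid_range`, which applies a sigmoid activation and scales the output to our desired range (0 to 5.5 in this case). This keeps our predictions within a sensible range for movie ratings. The upper bound is 5.5 rather than 5 because a sigmoid only approaches its limits asymptotically: with an upper bound of exactly 5, the model could never actually output a 5-star prediction.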
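Here's a minimal re-implementation, written out so the block runs without fastai; it should behave like fastai's one-line `sigmoid_range`, but treat it as a sketch. The second call shows that with a range of (0, 5.5), a finite logit of ln 10 (about 2.3) lands exactly on 5 stars:

```python
import math
import torch

def sigmoid_range(x, low, high):
    "Sigmoid scaled to the open interval (low, high)"
    return torch.sigmoid(x) * (high - low) + low

logits = torch.tensor([-10.0, 0.0, 2.0, 10.0])
print(sigmoid_range(logits, 0, 5.5))   # every value stays strictly between 0 and 5.5

# sigmoid(ln 10) = 10/11, and 10/11 * 5.5 = 5
needed = torch.tensor(math.log(10))
print(sigmoid_range(needed, 0, 5.5))
```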

Let's train this improved model:

```python
mdl =CollabNN(n_users, n_movies)
learner = Learner(dls, mdl, loss_func=MSELossFlat())
```

```python
learner.fit_one_cycle(5, 5e-3)
```

Looking at the training progress, we can see that this model starts with a higher loss but quickly improves, eventually reaching a lower validation loss than our previous model. This suggests that constraining the output range helps the model make more accurate predictions.

Let's check the final losses:

```python
get_training_losses(learner)
```

The final training loss is around 1.18 and the validation loss is around 1.40, which is significantly better than our previous model. In RMSE terms, the validation error is now about 1.18 stars (the square root of 1.40), down from about 1.67.

## Adding User and Item Bias Terms

We can further improve our model by adding bias terms for users and items. Some users tend to rate all movies higher or lower than average, and some movies tend to receive higher or lower ratings regardless of who's rating them. Let's capture these biases:

```python
class ModifiedCollabNN(nn.Module):
    "Collaborative filtering model with embeddings and bias terms"
    def __init__(self, n_users, n_items, n_factors=5, y_range=(0,5.5)):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)
        self.y_range = y_range
        
    def forward(self, x):
        users, items = x[:,0], x[:,1]
        u_embs = self.user_factors(users)
        i_embs = self.item_factors(items)
        u_bias = self.user_bias(users).squeeze()
        i_bias = self.item_bias(items).squeeze()
        dot = (u_embs * i_embs).sum(dim=1)
        return sigmoid_range((dot + u_bias + i_bias),  *self.y_range)
```

This model adds two new embedding layers: `user_bias` and `item_bias`, which capture the tendency of users to rate higher or lower, and the tendency of items to receive higher or lower ratings.
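For intuition about what the bias terms can capture, here's a non-learned baseline built from plain averages on invented data: a global mean plus a per-user and a per-movie offset. This is not what our model computes (its biases are learned jointly with the factors by gradient descent and pass through `sigmoid_range`), but the offsets play a similar role:

```python
import pandas as pd

# Invented ratings in the same long format as `ratings`
toy = pd.DataFrame({
    'user':   [1, 1, 1, 2, 2, 3],
    'movie':  [10, 20, 30, 10, 20, 10],
    'rating': [5, 4, 5, 2, 3, 4],
})

mu = toy.rating.mean()                                   # global average rating
user_offset = toy.groupby('user').rating.mean() - mu     # generous (+) vs. harsh (-) raters
movie_offset = toy.groupby('movie').rating.mean() - mu   # well-liked (+) vs. disliked (-) movies

def baseline(user, movie):
    "Additive baseline: global mean + user offset + movie offset"
    return mu + user_offset[user] + movie_offset[movie]

# User 2 rates harshly, but movie 30 is well liked
print(round(mu, 3), round(baseline(2, 30), 3))
```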

Let's train this enhanced model:

```python
mdl =ModifiedCollabNN(n_users, n_movies)
learner = Learner(dls, mdl, loss_func=MSELossFlat())
```

```python
learner.fit_one_cycle(5, 5e-3)
```

Looking at the training progress, we can see that this model achieves an even lower loss than our previous models, with a final training loss around 0.96 and validation loss around 1.17.

Let's check the final losses:

```python
get_training_losses(learner)
```

The improved performance confirms that accounting for user and item biases helps our model make more accurate predictions. This is a common enhancement in collaborative filtering systems and often leads to substantial improvements.

## Further Enhancing Our Model

There's one more enhancement we can make to our model: adding a global bias term. Here's the updated version:

```python
class ModifiedCollabNN(nn.Module):
    "Collaborative filtering model with embeddings and bias terms"
    def __init__(self, n_users, n_items, n_factors=5, y_range=(0,5.5)):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)
        self.bias = nn.Parameter(torch.zeros(1))
        self.y_range = y_range
        
    def forward(self, x):
        users, items = x[:,0], x[:,1]
        u_embs = self.user_factors(users)
        i_embs = self.item_factors(items)
        u_bias = self.user_bias(users).squeeze()
        i_bias = self.item_bias(items).squeeze()
        dot = (u_embs * i_embs).sum(dim=1)
        return sigmoid_range(dot + u_bias + i_bias + self.bias, *self.y_range)
```

This model adds one more parameter compared to our previous version: a global bias term (`self.bias`). This term captures the overall average rating in the dataset, while the user and item biases capture deviations from this average.
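Wrapping the zero tensor in `nn.Parameter` is what makes the global bias trainable: assigning a `Parameter` as a module attribute registers it, so it appears in `parameters()` and is handed to the optimizer alongside the embeddings. A quick check with a hypothetical stripped-down module that declares the same parameters as `ModifiedCollabNN` (no `forward`, since we only count):

```python
import torch
import torch.nn as nn

class ParamCountDemo(nn.Module):
    "Hypothetical module declaring the same parameters as ModifiedCollabNN"
    def __init__(self, n_users, n_items, n_factors=5):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)
        self.bias = nn.Parameter(torch.zeros(1))   # registered because it's an nn.Parameter

m = ParamCountDemo(944, 1665)
names = [name for name, _ in m.named_parameters()]
n_params = sum(p.numel() for p in m.parameters())

print('bias' in names)   # True
print(n_params)          # (944 + 1665) * (5 + 1) + 1 = 15655
```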

Let's train this enhanced model:

```python
mdl =ModifiedCollabNN(n_users, n_movies)
learner = Learner(dls, mdl, loss_func=MSELossFlat())
```

Now let's train the model, but this time we'll add weight decay to help prevent overfitting:

```python
learner.fit_one_cycle(5, 5e-3, wd=0.1)
```

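Weight decay is closely related to L2 regularization: both discourage large weights, which helps prevent the model from overfitting. (With fastai's default Adam optimizer, weight decay is applied in decoupled form, shrinking the weights directly at each step rather than adding a penalty term to the loss.) This is particularly important in collaborative filtering, where we have many parameters and relatively sparse data.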
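Weight decay comes in two closely related forms, sketched below with stand-in tensors rather than the real model. Adding an L2 penalty `wd * sum(w**2)` to the loss adds `2 * wd * w` to each gradient; decoupled weight decay instead multiplies the weights by `(1 - lr * wd)` directly. Exact conventions (for example, whether the factor of 2 is folded into `wd`) vary between libraries:

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)   # stand-in for some model weights
x = torch.randn(5)                       # stand-in for data
wd, lr = 0.1, 5e-3

# Form 1, L2 regularization: add wd * sum(w**2) to the (stand-in) data loss
loss = (w * x).sum() + wd * (w ** 2).sum()
(grad,) = torch.autograd.grad(loss, w)
# The data term contributes x; the penalty adds 2 * wd * w, pulling each weight toward 0
print(torch.allclose(grad, x + 2 * wd * w.detach()))

# Form 2, decoupled weight decay: leave the loss alone and shrink the weights directly
with torch.no_grad():
    w_decayed = w * (1 - lr * wd)
print(bool(w_decayed.norm() < w.detach().norm()))
```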

This run ends with a training loss of around 0.76 and a validation loss of around 0.87, the lowest of all our models so far.

Let's examine the training losses more closely:

```python
get_training_losses(learner)
```

This gives us a clear view of how our model's performance improved over the training epochs. The training loss decreased from 0.87 to 0.76, while the validation loss decreased from 1.02 to 0.87. This is a significant improvement over our earlier models.
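Because MSE is measured in squared stars, taking its square root (the RMSE) puts each model's error back on the rating scale. Here are the final validation losses quoted in this post, copied from the runs above rather than recomputed:

```python
import math

# Final validation MSE reported in this post for each version of the model
valid_mse = {
    'dot product only': 2.79,
    '+ sigmoid_range': 1.40,
    '+ user/item biases': 1.17,
    '+ global bias, wd=0.1': 0.87,
}

# RMSE is in stars, the same units as the ratings
for name, mse in valid_mse.items():
    print(f"{name:<24} MSE {mse:.2f}   RMSE {math.sqrt(mse):.2f} stars")
# RMSE column: 1.67, 1.18, 1.08, 0.93
```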

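Together, the global bias term and weight decay have further improved our model's performance: the global bias lets the model represent the overall average rating directly, while weight decay keeps the weights small to limit overfitting. Because we changed both at once, this run alone doesn't tell us how much of the gain comes from each; training the same model with and without `wd` would separate the two effects.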

Our final model now includes:
1. User and movie embeddings to capture latent factors
2. User and movie bias terms to capture individual tendencies
3. A global bias term to capture the overall average
4. Output range constraints to ensure predictions are within the valid range
5. Weight decay to prevent overfitting

This is a comprehensive collaborative filtering model that should provide good recommendations for users based on their past ratings.