[ Credits ] : Andrew Ng, DeepLearning.AI, Machine Learning Specialization on Coursera

# Collaborative Filtering Recommender System for Movie Recommendations

Collaborative filtering (CF) is a popular technique used in recommendation systems. It operates under the assumption that if two users agree on one issue, they will likely agree on others as well. In the context of movie recommendations, if two users both liked certain movies, they are likely to share similar tastes in other movies as well.

There are two primary types of collaborative filtering:

1. **User-Based CF**: This method finds users that are similar to the targeted user and recommends items based on what those similar users liked.
2. **Item-Based CF**: Instead of looking at user similarities, this approach finds similarities between items. So if a user liked a particular item, other items that are similar to it will be recommended.

## Why use Collaborative Filtering?

1. **Personalization**: Collaborative filtering offers a personalized user experience. Each user gets recommendations based on their unique tastes and preferences.
2. **Scalability**: Modern collaborative filtering methods can handle large datasets efficiently, making them ideal for today's internet-scale applications.
3. **No need for item metadata**: Unlike content-based recommendation systems, CF doesn't require any information about the items. It works purely based on user-item interactions.
4. **Adaptability**: Collaborative filtering models can adapt over time. As more users interact with items, the system becomes more refined and accurate.

## Notation
|General <br />  Notation  | Description| Python (if any) |
|:-------------|:------------------------------------------------------------||
| $r(i,j)$     | scalar; = 1  if user j rated movie i,  = 0  otherwise             ||
| $y(i,j)$     | scalar; = rating given by user j on movie  i    (if r(i,j) = 1 is defined) ||
|$\mathbf{w}^{(j)}$ | vector; parameters for user j ||
|$b^{(j)}$     |  scalar; parameter for user j ||
| $\mathbf{x}^{(i)}$ |   vector; feature ratings for movie i        ||     
| $n_u$        | number of users |num_users|
| $n_m$        | number of movies | num_movies |
| $n$          | number of features | num_features                    |
| $\mathbf{X}$ |  matrix of vectors $\mathbf{x}^{(i)}$         | X |
| $\mathbf{W}$ |  matrix of vectors $\mathbf{w}^{(j)}$         | W |
| $\mathbf{b}$ |  vector of bias parameters $b^{(j)}$ | b |
| $\mathbf{R}$ | matrix of elements $r(i,j)$                    | R |

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

## Data Loading

The data set is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset. It contains 9724 movies and 610 users.

**Preprocess the Data:**
We need to perform some preprocessing to reshape the data into the expected format for collaborative filtering.
- Create a user-movie `matrix Y` from ratings.csv, where `Y[i, j]` is the rating by user j for movie i. If a movie hasn't been rated by a user, fill it with 0.
- Create a `matrix R` from ratings.csv, where `R[i, j]` is 1 if movie i has been rated by user j, and 0 otherwise.

In [2]:
# Load and preprocess data
def load_movielens_data(movies_path, ratings_path):
    movies = pd.read_csv(movies_path)
    ratings = pd.read_csv(ratings_path)
    
    Y = ratings.pivot(index='movieId', columns='userId', values='rating').fillna(0).values
    R = (Y > 0).astype(int)
    
    return Y, R

In [3]:
def initialize_parameters(num_movies, num_users, num_features):
    X = np.random.randn(num_movies, num_features) * 0.01
    W = np.random.randn(num_users, num_features) * 0.01
    b = np.zeros((1, num_users))
    
    return X, W, b

In [4]:
Y, R = load_movielens_data('../data/movies.csv', '../data/ratings.csv')
num_movies, num_users = Y.shape
num_features = 10

In [5]:
assert (Y[(Y > 0)] >= 0.5).all() and (Y[(Y > 0)] <= 5).all(), "Ratings outside the 0.5-5 range found!"

In [6]:
X, W, b = initialize_parameters(num_movies, num_users, num_features)

## Data Exploration

In [7]:
print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)

Y (9724, 610) R (9724, 610)
X (9724, 10)
W (610, 10)
b (1, 610)


In [8]:
print("R[0][0] :", R[0][0])    # equal to 1 if user at index 0 rated movie at index 0 
print("Y[0][0] :", Y[0][0])    # if R[0][0] == 1, this rating should be > 0

print('-'*20)
print("R[0][1] :", R[0][1])    # equal to 0 if user at index 1 has not rated movie at index 0
print("Y[0][1] :", Y[0][1])    # movie rating is 0 since it is not rated by that user

R[0][0] : 1
Y[0][0] : 4.0
--------------------
R[0][1] : 0
Y[0][1] : 0.0


## Collaborative Filtering Learning Algorithm
The collaborative filtering algorithm in the setting of movie recommendations considers a set of $n$-dimensional parameter vectors $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$, $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$, where the model predicts the rating for movie $i$ by user $j$ as $y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$ . Given a dataset that consists of a set of ratings produced by some users on some movies, we wish to learn the parameter vectors $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}, \mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$  and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimizes the squared error).

### 4.1 Collaborative filtering cost function

To understand how the cost function is derived, the theory is detailed [here.](https://github.com/ali-izhar/machine-learning/tree/main/Theory/Recommender/Collaborative_Filtering)

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
\tag{1}$$

The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$

The below implementation is vectorized for speed. However, a simple `for loop` implementation is given here:

```python
def collaborative_filtering_cost(X, W, b, Y, R, lambda_):
    nm, nu = Y.shape
    J = 0
    for i in range(nm):
        for j in range(nu):
            if R[i, j] == 1:
                J += (np.dot(W[j, :], X[i, :]) + b[0, j] - Y[i, j]) ** 2
    
    J /= 2
    reg_term = (lambda_ / 2) * (np.sum(W**2) + np.sum(X**2))
    J += reg_term
    
    return J
```

In the code below, we're using the `reduce_sum` function from the tensorflow library. <i>"It computes the sum of elements across dimensions of a tensor."</i> Read more on how to sum across different dimensions [here](https://www.tensorflow.org/api_docs/python/tf/math/reduce_sum).

```python 
# x has a shape of (2, 3) (two rows and three columns):
x = tf.constant([[1, 1, 1], [1, 1, 1]])
x.numpy()
array([[1, 1, 1],
       [1, 1, 1]], dtype=int32)

# sum all the elements
# 1 + 1 + 1 + 1 + 1 + 1 = 6
tf.reduce_sum(x).numpy()
6
```

In [9]:
def collaborative_filtering_cost(X, W, b, Y, R, lambda_):
    """
    Returns the cost for the collaborative-based filtering
    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users)            : vector of user parameters
      Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies
      R (ndarray (num_movies,num_users)    : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    
    # multiplication with R automatically filters out the 0 entries
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

## Learning Movie Recommendations

First, we need a list of movies. Next, we'll generate random ratings for these movies and add them to our dataset. After that, we'll normalize the ratings.

In [10]:
def load_movie_list_pd():
    """ 
    Returns a list of movies and a dataframe with the corresponding indices.
    """
    df = pd.read_csv('../data/movies.csv', usecols=['movieId', 'title'], index_col='movieId')
    mlist = df["title"].to_list()
    return mlist, df

In [11]:
all_movies_list, all_movies_df = load_movie_list_pd()

In [12]:
"""
78499,Toy Story 3 (2010)
74508,Persuasion (2007)
7153,Lord of the Rings: The Return of the King, The (2003)
4306,Shrek (2001)
79132,Inception (2010)
8961,Incredibles, The (2004)
4973,Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
4896,Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
5816,Harry Potter and the Chamber of Secrets (2002)
7361,Eternal Sunshine of the Spotless Mind (2004)
86668,Louis Theroux: Law & Disorder (2008)
86922,Nothing to Declare (Rien à déclarer) (2010)
6539,Pirates of the Caribbean: The Curse of the Black Pearl (2003)
"""

# Initialize the ratings with zeros for all movies
my_ratings = np.zeros((num_movies, 1))

movie_ids = [78499, 74508, 7153, 4306, 79132, 8961, 4973, 4896, 5816, 7361, 86668, 86922, 6539]
ratings = [5, 2, 5, 5, 3, 5, 2, 5, 5, 3, 1, 1, 5]

for i in range(len(movie_ids)):
    if movie_ids[i] in all_movies_df.index:
        index_loc = all_movies_df.index.get_loc(movie_ids[i])
        my_ratings[index_loc] = ratings[i]
    else:
        print(f"Warning: Movie ID {movie_ids[i]} not found in the dataset.")

my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]
        
print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        movie_id = all_movies_df.iloc[i].name
        print(f'Rated {my_ratings[i][0]} for {all_movies_df.loc[movie_id, "title"]}')


New user ratings:

Rated 5.0 for Shrek (2001)
Rated 5.0 for Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Rated 2.0 for Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for Harry Potter and the Chamber of Secrets (2002)
Rated 5.0 for Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 5.0 for Lord of the Rings: The Return of the King, The (2003)
Rated 3.0 for Eternal Sunshine of the Spotless Mind (2004)
Rated 5.0 for Incredibles, The (2004)
Rated 2.0 for Persuasion (2007)
Rated 5.0 for Toy Story 3 (2010)
Rated 3.0 for Inception (2010)
Rated 1.0 for Louis Theroux: Law & Disorder (2008)
Rated 1.0 for Nothing to Declare (Rien à déclarer) (2010)


Now, let's add these reviews to $Y$ and $R$ and normalize the ratings.

## Normalize Ratings
Normalize the ratings to ensure that the optimization procedure works better.

In [13]:
def normalize_ratings(Y, R):
    """
    Preprocess data by subtracting mean rating for every movie (every row). Only include real ratings R(i,j)=1. 
    Returns the mean rating for every movie.
    
    - Y*R: Multiplies the ratings matrix Y by the matrix R which indicates whether a movie was rated (R(i,j) = 1) 
        or not (R(i,j) = 0). This has the effect of "zeroing-out" unrated movies, making them not contribute to the sum.
    - np.sum(Y*R, axis=1): Sum the ratings for each movie across all users. A movie is a row in the dataset, therefore,
        add along all cols (axis=1) for each row.
    - np.sum(R, axis=1): Count the number of ratings for each movie.
    - The division computes the average (mean) rating for each movie.
    - reshape(-1,1): Reshapes the resulting array into a column vector.
    - 1e-12: A small number is added to the denominator to prevent division by zero.
    """
    
    Ymean = (np.sum(Y*R, axis=1)/(np.sum(R, axis=1)+1e-12)).reshape(-1,1)
    Ynorm = Y - np.multiply(Ymean, R) 
    return (Ynorm, Ymean)

In [14]:
# Add normalized ratings 
Y = np.hstack((Y, my_ratings))
R = np.hstack((R, (my_ratings > 0).astype(int)))
  
# Normalize all ratings
Ynorm, Ymean = normalize_ratings(Y, R)

## Training the Recommender System

With our data in place, we can now train the recommender system.

In [15]:
def train_recommender(X, W, b, Ynorm, R, learning_rate, num_epochs, lambda_):
    optimizer = tf.optimizers.Adam(learning_rate)

    for epoch in range(num_epochs):
        with tf.GradientTape() as tape:
            J = collaborative_filtering_cost(X, W, b, Ynorm, R, lambda_)
            
        # Calculate Gradients
        grads = tape.gradient(J, [X, W, b])
        
        # Update Parameters using Gradient Descent
        optimizer.apply_gradients(zip(grads, [X, W, b]))
        
        if epoch % 20 == 0:
            print(f'Epoch: {epoch}, Loss: {J:0.2f}')

    return X, W, b

In [16]:
# Re-initialize varialbes since we added new ratings
num_movies, num_users = Y.shape
num_features = 100

In [17]:
# Training parameters
lambda_ = 1
learning_rate = 1e-1
num_epochs = 200

In [22]:
tf.random.set_seed(1234)

# Set Initial Parameters (W, X), use tf.Variable to track these variables
X = tf.Variable(tf.random.normal((num_movies, num_features), dtype=tf.float64), name='X') 

W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64), name='W')

b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name='b')

In [23]:
X_trained, W_trained, b_trained = train_recommender(X, W, b, Ynorm, R, learning_rate, num_epochs, lambda_)

Epoch: 0, Loss: 5574885.47
Epoch: 20, Loss: 281720.28
Epoch: 40, Loss: 109026.97
Epoch: 60, Loss: 53495.20
Epoch: 80, Loss: 30611.95
Epoch: 100, Loss: 19556.34
Epoch: 120, Loss: 13611.08
Epoch: 140, Loss: 10195.54
Epoch: 160, Loss: 8143.67
Epoch: 180, Loss: 6867.99


## Movie Recommendations

Let's see the movie recommendations for us based on our given ratings.

To predict the rating of movie $i$ for user $j$, we compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication.

In [24]:
def recommend_movies(X, W, b, Ymean, all_movies_list, my_ratings, my_rated):
    """
    Provide personalized movie recommendations and compare original ratings to predictions.
    """
    
    # Make a prediction using trained weights and biases
    p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()
    
    # Restore the mean
    pm = p + Ymean
    my_predictions = pm[:, 0]

    # Rescale predictions to be within 0.5 and 5
    min_pred = np.min(my_predictions)
    max_pred = np.max(my_predictions)
    
    my_predictions = 0.5 + (my_predictions - min_pred) * (4.5 / (max_pred - min_pred))

    # Sort predictions
    ix = tf.argsort(my_predictions, direction='DESCENDING')

    for i in range(17):
        j = ix[i]
        if j not in my_rated:
            print(f'Predicting rating {my_predictions[j]:0.2f} for movie {all_movies_list[j]}')

    print('\n\nOriginal vs Predicted ratings:\n')
    for i in range(len(my_ratings)):
        if my_ratings[i] > 0:
            print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {all_movies_list[i]}')

In [25]:
recommend_movies(X_trained, W_trained, b_trained, Ymean, all_movies_list, my_ratings, my_rated)

Predicting rating 5.00 for movie Taxi Driver (1976)
Predicting rating 4.92 for movie Léon: The Professional (a.k.a. The Professional) (Léon) (1994)
Predicting rating 4.90 for movie 2001: A Space Odyssey (1968)
Predicting rating 4.88 for movie Mystery Science Theater 3000: The Movie (1996)
Predicting rating 4.86 for movie All of Me (1984)
Predicting rating 4.78 for movie Citizen Ruth (1996)
Predicting rating 4.70 for movie Jesus Camp (2006)
Predicting rating 4.65 for movie Hood of Horror (2006)
Predicting rating 4.65 for movie Mortal Kombat: Annihilation (1997)
Predicting rating 4.65 for movie Interstate 60 (2002)
Predicting rating 4.62 for movie Down by Law (1986)
Predicting rating 4.52 for movie Legends of the Fall (1994)
Predicting rating 4.46 for movie Cooler, The (2003)
Predicting rating 4.43 for movie For Richer or Poorer (1997)
Predicting rating 4.42 for movie Soul Plane (2004)
Predicting rating 4.42 for movie Trapped (2002)
Predicting rating 4.35 for movie Ender's Game (2013)




## Conclusion

The model uses collaborative filtering to provide personalized movie recommendations. It can be further enhanced using more sophisticated architectures, but the core idea remains the same: to predict user ratings based on historical data and recommend items accordingly.