# Collaborative Filtering Recommender System for Movie Recommendations

Collaborative filtering (CF) is a widely used technique in recommendation systems. It assumes that users who agreed in the past will tend to agree again. In the context of movie recommendations, if two users liked many of the same movies, each is likely to enjoy other movies that the other rated highly.

There are two primary types of collaborative filtering:

1. **User-Based CF**: This method finds users that are similar to the targeted user and recommends items based on what those similar users liked.
2. **Item-Based CF**: Instead of looking at user similarities, this approach finds similarities between items. So if a user liked a particular item, other items that are similar to it will be recommended.
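
To make the distinction concrete, here is a minimal sketch on a made-up ratings matrix. It assumes cosine similarity as the similarity measure (one common choice, not the only one) and treats unrated entries as 0, which is a simplification. The names `Y_toy` and `cosine_similarity` are hypothetical and exist only for this illustration.

```python
import numpy as np

# Toy ratings: rows are movies, columns are users (0 = not rated).
Y_toy = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# User-based CF compares columns (users).
sim_users_01 = cosine_similarity(Y_toy[:, 0], Y_toy[:, 1])
sim_users_02 = cosine_similarity(Y_toy[:, 0], Y_toy[:, 2])

# Item-based CF compares rows (movies).
sim_movies_01 = cosine_similarity(Y_toy[0], Y_toy[1])
sim_movies_02 = cosine_similarity(Y_toy[0], Y_toy[2])

print(f"user 0 vs user 1: {sim_users_01:.2f}, user 0 vs user 2: {sim_users_02:.2f}")
print(f"movie 0 vs movie 1: {sim_movies_01:.2f}, movie 0 vs movie 2: {sim_movies_02:.2f}")
```

In this toy data, user 0 is far more similar to user 1 than to user 2, so a user-based method would lean on user 1's ratings when recommending to user 0.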

## Why use Collaborative Filtering?

1. **Personalization**: Collaborative filtering offers a personalized user experience. Each user gets recommendations based on their unique tastes and preferences.
2. **Scalability**: Model-based collaborative filtering, such as the matrix factorization approach used in this notebook, relies on vectorized matrix operations and can be trained efficiently on large user-item datasets.
3. **No need for item metadata**: Unlike content-based recommendation systems, CF doesn't require any information about the items. It works purely based on user-item interactions.
4. **Adaptability**: Collaborative filtering models can adapt over time. As more users interact with items, the system becomes more refined and accurate.

## Notation
| General notation | Description | Python (if any) |
|:-------------|:------------------------------------------------------------|:-------------|
| $r(i,j)$     | scalar; = 1 if user $j$ rated movie $i$, = 0 otherwise | |
| $y(i,j)$     | scalar; rating given by user $j$ to movie $i$ (defined only if $r(i,j) = 1$) | |
| $\mathbf{w}^{(j)}$ | vector; parameters for user $j$ | |
| $b^{(j)}$    | scalar; bias parameter for user $j$ | |
| $\mathbf{x}^{(i)}$ | vector; feature values for movie $i$ | |
| $n_u$        | number of users | num_users |
| $n_m$        | number of movies | num_movies |
| $n$          | number of features | num_features |
| $\mathbf{X}$ | matrix of vectors $\mathbf{x}^{(i)}$ | X |
| $\mathbf{W}$ | matrix of vectors $\mathbf{w}^{(j)}$ | W |
| $\mathbf{b}$ | vector of bias parameters $b^{(j)}$ | b |
| $\mathbf{R}$ | matrix of elements $r(i,j)$ | R |

In [37]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

## Data Loading

The data set is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset. We need to perform some preprocessing to reshape the data into the expected format for collaborative filtering.

**Preprocess the Data:**
- Create a user-movie matrix Y from ratings.csv, where Y[i, j] is the rating by user j for movie i. If a movie hasn't been rated by a user, fill it with 0.
- Create a matrix R from ratings.csv, where R[i, j] is 1 if movie i has been rated by user j, and 0 otherwise.
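
The two preprocessing steps can be sketched on a tiny, made-up ratings table. The frame `toy_ratings` is hypothetical; it only mirrors the `userId`, `movieId`, and `rating` columns of ratings.csv.

```python
import pandas as pd

# Hypothetical miniature ratings table with the same columns as ratings.csv.
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating":  [4.0, 5.0, 3.5, 2.0],
})

# Rows are movies, columns are users; missing ratings become 0.
Y_toy = (
    toy_ratings
    .pivot(index="movieId", columns="userId", values="rating")
    .fillna(0)
    .to_numpy()
)

# R marks which entries of Y are real ratings.
R_toy = (Y_toy > 0).astype(int)

print(Y_toy)
print(R_toy)
```

Note that the rows of the resulting matrix follow the sorted `movieId` values, so row positions are not the same thing as movie IDs.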

In [7]:
def load_movielens_data(movies_path, ratings_path):
    movies = pd.read_csv(movies_path)
    ratings = pd.read_csv(ratings_path)
    
    # Create a user-movie matrix
    # Each row corresponds to a movie, Each column corresponds to a user.
    Y = ratings.pivot(index='movieId', columns='userId', values='rating').fillna(0).values
    
    # Create R matrix
    R = Y.copy()
    R[R > 0] = 1
    
    return Y, R

In [8]:
def initialize_parameters(num_movies, num_users, num_features):
    # Initialize movie features matrix X
    X = np.random.randn(num_movies, num_features) * 0.01

    # Initialize user features matrix W
    W = np.random.randn(num_users, num_features) * 0.01

    # Initialize user biases b
    b = np.zeros((1, num_users))
    
    return X, W, b

In [9]:
Y, R = load_movielens_data('data/movies.csv', 'data/ratings.csv')

In [10]:
num_movies, num_users = Y.shape
num_features = 10  # or any other value depending on your needs

# Initialize X, W, and b
X, W, b = initialize_parameters(num_movies, num_users, num_features)

## Data Exploration

In [11]:
print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (9724, 610) R (9724, 610)
X (9724, 10)
W (610, 10)
b (1, 610)
num_features 10
num_movies 9724
num_users 610


## Collaborative Filtering Learning Algorithm
The collaborative filtering algorithm in the setting of movie recommendations considers a set of $n$-dimensional parameter vectors $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$ and $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$, together with scalar biases $b^{(0)},...,b^{(n_u-1)}$. The model predicts the rating of movie $i$ by user $j$ as $\mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$, while $y^{(i,j)}$ denotes the observed rating. Given a dataset of ratings produced by some users on some movies, we wish to learn the parameters $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}, \mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit, that is, that minimize the regularized squared error.
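
As a small illustration of the prediction rule, the sketch below computes the prediction for one (movie, user) pair and then for all pairs at once. The toy sizes and the `_toy` arrays are assumptions made for this example and are unrelated to the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 4, 3, 2   # toy sizes (assumed for illustration)

X_toy = rng.normal(size=(n_m, n))   # one feature vector per movie
W_toy = rng.normal(size=(n_u, n))   # one parameter vector per user
b_toy = rng.normal(size=(1, n_u))   # one bias per user

# Prediction for a single (movie i, user j) pair: w^(j) . x^(i) + b^(j)
i, j = 2, 1
single = W_toy[j] @ X_toy[i] + b_toy[0, j]

# Predictions for every pair at once; b_toy broadcasts across the movie rows.
all_predictions = X_toy @ W_toy.T + b_toy   # shape (n_m, n_u)

print(single, all_predictions[i, j])
```

The matrix form is what makes the vectorized cost function practical: entry `[i, j]` of `all_predictions` equals the single-pair prediction.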

### Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{\text{regularization}}
\tag{1}$$

The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$
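
The equivalence between the masked double sum and the sum over rated pairs can be checked numerically. The sketch below uses plain NumPy and a hand-checkable toy example; the function names `cost_vectorized` and `cost_loop` are hypothetical and used only here.

```python
import numpy as np

def cost_vectorized(X, W, b, Y, R, lambda_):
    """Equation (1) written with the r(i,j) mask."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err**2) + (lambda_ / 2) * (np.sum(X**2) + np.sum(W**2))

def cost_loop(X, W, b, Y, R, lambda_):
    """Same cost, summing explicitly over the pairs with r(i,j) = 1."""
    total = 0.0
    for i, j in zip(*np.nonzero(R)):
        total += (W[j] @ X[i] + b[0, j] - Y[i, j]) ** 2
    return 0.5 * total + (lambda_ / 2) * (np.sum(X**2) + np.sum(W**2))

# Two movies, one user, two features.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
W_toy = np.array([[1.0, 1.0]])
b_toy = np.array([[0.0]])
Y_toy = np.array([[2.0], [0.0]])
R_toy = np.array([[1], [0]])      # only movie 0 was rated

# Rated pair: prediction 1, rating 2, so the error term is 0.5 * (1 - 2)^2 = 0.5.
# Regularization with lambda = 1: 0.5 * (2 + 2) = 2.0.
J_vec = cost_vectorized(X_toy, W_toy, b_toy, Y_toy, R_toy, lambda_=1.0)
J_loop = cost_loop(X_toy, W_toy, b_toy, Y_toy, R_toy, lambda_=1.0)
print(J_vec, J_loop)
```

Both forms give 2.5 here; the unrated movie contributes nothing to the error term even though the model's prediction for it differs from the stored 0.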

In [12]:
def collaborative_filtering_cost(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering.
    Vectorized for speed. Uses TensorFlow operations to be compatible with a custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user bias parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Prediction error, masked by R so that only rated (movie, user) pairs contribute
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
    # Squared-error term plus L2 regularization on X and W (b is not regularized)
    J = 0.5 * tf.reduce_sum(err**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

In [None]:
# def collaborative_filtering_cost(X, W, b, Y, R, lambda_):
#     """
#     For loop implementation.
#     """
#     nm, nu = Y.shape
#     J = 0

#     for i in range(nm):
#         for j in range(nu):
#             if R[i, j] == 1:
#                 J += (np.dot(W[j, :], X[i, :]) + b[0, j] - Y[i, j]) ** 2
    
#     J /= 2
#     reg_term = (lambda_ / 2) * (np.sum(W**2) + np.sum(X**2))
#     J += reg_term
    
#     return J

## Learning Movie Recommendations

In [25]:
def load_movie_list_pd(n=None):
    """ 
    Returns a list and a dataframe with an index of movies. 
    If n is provided, it loads only n random movies.
    """
    df = pd.read_csv('data/movies.csv', usecols=['movieId', 'title'], index_col='movieId')
    
    # If n is specified, sample n random rows from the dataframe
    if n:
        df = df.sample(n)
    
    mlist = df["title"].to_list()
    return mlist, df

In [26]:
movie_list, movie_list_df = load_movie_list_pd(20)

In [27]:
movie_list_df

| movieId | title |
|:-------------|:------------------------------------------------------------|
| 25795 | Dinner at Eight (1933) |
| 26375 | Silver Streak (1976) |
| 7616 | Body Double (1984) |
| 61236 | Waltz with Bashir (Vals im Bashir) (2008) |
| 52241 | Lookout, The (2007) |
| 159858 | The Conjuring 2 (2016) |
| 91842 | Contraband (2012) |
| 31030 | I Remember Mama (1948) |
| 185029 | A Quiet Place (2018) |
| 165103 | Keeping Up with the Joneses (2016) |


In [33]:
# Initialize the ratings with zeros for all movies
my_ratings = np.zeros(num_movies)

# Generate ratings, with a bias towards rating of 5
ratings_choices = [1, 2, 3, 4, 5]
weights = [0.1, 0.1, 0.1, 0.2, 0.5]  # Here, 50% chance for a 5 rating

for idx, movie_id in enumerate(movie_list_df.index):
    my_ratings[idx] = np.random.choice(ratings_choices, p=weights)

# Now print the rated movies
print('\nNew user ratings:\n')
for idx, movie_id in enumerate(movie_list_df.index):
    if my_ratings[idx] > 0:  # Only printing movies which got rated
        print(f'Rated {int(my_ratings[idx])} for {movie_list_df.loc[movie_id, "title"]}')


New user ratings:

Rated 5 for Dinner at Eight (1933)
Rated 5 for Silver Streak (1976)
Rated 4 for Body Double (1984)
Rated 4 for Waltz with Bashir (Vals im Bashir) (2008)
Rated 1 for Lookout, The (2007)
Rated 5 for The Conjuring 2 (2016)
Rated 5 for Contraband (2012)
Rated 5 for I Remember Mama (1948)
Rated 5 for A Quiet Place (2018)
Rated 4 for Keeping Up with the Joneses (2016)
Rated 5 for Hatari! (1962)
Rated 4 for Last Castle, The (2001)
Rated 4 for Chain of Fools (2000)
Rated 5 for Return of Martin Guerre, The (Retour de Martin Guerre, Le) (1982)
Rated 5 for Caligula (1979)
Rated 5 for Commitments, The (1991)
Rated 4 for Onegin (1999)
Rated 1 for What Love Is (2007)
Rated 4 for Vanilla Sky (2001)
Rated 5 for Farewell My Concubine (Ba wang bie ji) (1993)


Now, let's add these ratings to $Y$ and $R$ and normalize the ratings.

## Normalize Ratings
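Subtract each movie's mean rating (computed over rated entries only) so that the targets are centered around zero. This tends to help the optimizer converge, and it means that a user with no ratings is predicted something close to each movie's mean rating once the mean is added back.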
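
A small worked example of mean normalization, on made-up data, may help. This sketch guards against division by zero with `np.maximum(counts, 1)`, which is an assumption of this example and differs slightly from adding a small constant to the denominator; the `_toy` names are hypothetical.

```python
import numpy as np

Y_toy = np.array([
    [5.0, 3.0, 0.0],   # rated by users 0 and 1 -> mean 4.0
    [0.0, 2.0, 4.0],   # rated by users 1 and 2 -> mean 3.0
    [0.0, 0.0, 0.0],   # never rated
])
R_toy = (Y_toy > 0).astype(int)

# Number of ratings per movie, kept as a column vector.
counts = R_toy.sum(axis=1, keepdims=True)

# Mean over rated entries only; movies without ratings get a mean of 0.
Y_mean_toy = (Y_toy * R_toy).sum(axis=1, keepdims=True) / np.maximum(counts, 1)

# Subtract the mean only where a rating exists, so unrated entries stay 0.
Y_norm_toy = Y_toy - Y_mean_toy * R_toy

print(Y_mean_toy.ravel())
print(Y_norm_toy)
```

After normalization, each rated movie's ratings average to zero over the users who rated it.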

In [34]:
def normalize_ratings(Y, R):
    """
    Preprocess data by subtracting the mean rating for every movie (every row). Only real ratings, R(i,j)=1, are included.
    Ynorm, Ymean = normalize_ratings(Y, R) normalizes Y so that each movie has a rating of 0 on average.
    Unrated movies then have a mean rating of 0.
    Returns the normalized ratings in Ynorm and the mean rating of each movie in Ymean.
    
    * Y*R: Multiplies the ratings matrix Y by the matrix R which indicates whether a movie was rated (R(i,j) = 1) 
        or not (R(i,j) = 0). This has the effect of "zeroing-out" unrated movies, making them not contribute to the sum.
    * np.sum(Y*R, axis=1): Sum the ratings for each movie across all users.
    * np.sum(R, axis=1): Count the number of ratings for each movie.
    * The division computes the average (mean) rating for each movie.
    * reshape(-1,1): Reshapes the resulting array into a column vector.
    * 1e-12: A small number is added to the denominator to prevent division by zero.
    """
    
    Ymean = (np.sum(Y*R, axis=1)/(np.sum(R, axis=1)+1e-12)).reshape(-1,1)
    # Subtract the mean only from entries that were actually rated
    Ynorm = Y - np.multiply(Ymean, R)
    return Ynorm, Ymean

In [35]:
# Reshape my_ratings to a 2D column vector
my_ratings = np.array(my_ratings).reshape(-1, 1)

# Add new user ratings to Y 
Y = np.hstack((my_ratings, Y))

# Add new user indicator matrix to R
R = np.hstack(((my_ratings != 0).astype(int), R))

# Normalize the Dataset
Ynorm, Ymean = normalize_ratings(Y, R)

## Training Collaborative Filtering

In [44]:
# Re-initialize variables since we added new ratings
num_movies, num_users = Y.shape
num_features = 100

In [45]:
tf.random.set_seed(1234)

# Set Initial Parameters (W, X), use tf.Variable to track these variables
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

In [46]:
iterations = 200
lambda_ = 1
for step in range(iterations):  # `step` avoids shadowing the built-in iter()
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = collaborative_filtering_cost(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if step % 20 == 0:
        print(f"Training loss at iteration {step}: {cost_value:0.1f}")

Training loss at iteration 0: 5541103.4
Training loss at iteration 20: 279359.8
Training loss at iteration 40: 108046.2
Training loss at iteration 60: 53029.9
Training loss at iteration 80: 30319.3
Training loss at iteration 100: 19364.6
Training loss at iteration 120: 13491.1
Training loss at iteration 140: 10121.6
Training loss at iteration 160: 8099.7
Training loss at iteration 180: 6843.9


## Recommendations
Below, we compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as `my_ratings[]` above. To predict the rating of movie $i$ for user $j$, we compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication.
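
The steps described above (predict for every pair, restore the mean, rank the unrated movies) can be sketched on a toy example. All `_toy` arrays below are made-up values standing in for trained parameters, and the choice of the top 2 is arbitrary.

```python
import numpy as np

# Hypothetical "trained" parameters for 4 movies, 2 users, 2 features.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
W_toy = np.array([[1.0, -1.0], [0.0, 1.0]])
b_toy = np.array([[0.0, 0.5]])
Y_mean_toy = np.array([[3.0], [3.0], [3.5], [4.0]])   # per-movie mean ratings
R_toy = np.array([[1, 0], [0, 1], [0, 0], [0, 1]])    # 1 = already rated

# Predict every (movie, user) rating, then add the per-movie mean back.
predictions = X_toy @ W_toy.T + b_toy + Y_mean_toy

# Rank the movies that user 0 has not rated yet, highest prediction first.
user = 0
user_pred = predictions[:, user]
unrated = np.flatnonzero(R_toy[:, user] == 0)
top_k = unrated[np.argsort(user_pred[unrated])[::-1][:2]]

for row in top_k:
    print(f"movie row {row}: predicted {user_pred[row]:.2f}")
```

Filtering on `R_toy` before ranking keeps movies the user has already rated out of the recommendation list.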

In [57]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# Restore the mean
pm = p + Ymean

my_predictions = pm[:, 0]

# Sort predictions
sorted_indices = np.argsort(my_predictions)[::-1]  # [::-1] is for descending order

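# Row positions of the movies the new user has already rated (column 0 of R)
my_rated = set(np.flatnonzero(R[:, 0]))

print('Top recommended movies:\n')
recommended_count = 0
for idx in sorted_indices:
    # idx is a row position in Y, while movie_list_df is indexed by movieId,
    # so this lookup assumes the two line up
    if idx not in my_rated and idx in movie_list_df.index: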
        movie_title = movie_list_df.loc[idx, 'title']
        print(f'Predicting rating {my_predictions[idx]:.2f} for movie {movie_title}')
        recommended_count += 1
        if recommended_count == 17:
            break
            
print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movie_list[i]}')

Top recommended movies:

Predicting rating 5.50 for movie Body Double (1984)
Predicting rating 4.97 for movie Hatari! (1962)
Predicting rating 4.06 for movie Farewell My Concubine (Ba wang bie ji) (1993)
Predicting rating 4.01 for movie Caligula (1979)
Predicting rating 4.00 for movie Commitments, The (1991)
Predicting rating 3.99 for movie Chain of Fools (2000)
Predicting rating 3.52 for movie Last Castle, The (2001)
Predicting rating 3.01 for movie Onegin (1999)
Predicting rating 2.98 for movie Vanilla Sky (2001)


Original vs Predicted ratings:

Original [5.], Predicted 4.95 for Dinner at Eight (1933)
Original [5.], Predicted 4.90 for Silver Streak (1976)
Original [4.], Predicted 4.01 for Body Double (1984)
Original [4.], Predicted 3.91 for Waltz with Bashir (Vals im Bashir) (2008)
Original [1.], Predicted 1.33 for Lookout, The (2007)
Original [5.], Predicted 4.95 for The Conjuring 2 (2016)
Original [5.], Predicted 4.89 for Contraband (2012)
Original [5.], Predicted 4.81 for I Remem