<img src="data/images/lecture-notebook-header.png" />

# Recommender Systems: Collaborative Filtering

Collaborative filtering recommender systems are a type of recommendation system that predicts a user's preferences or interests by collecting and analyzing information from multiple users. The fundamental idea behind collaborative filtering is that if two users have similar preferences or interests on some items, they are likely to have similar preferences on other items as well.

Collaborative filtering algorithms typically work by building a model or matrix of user-item interactions or ratings. This model is then used to make predictions or recommendations for items that a user has not yet interacted with. There are two main approaches to collaborative filtering:

* **User-based collaborative filtering:** In this approach, the system identifies users who have similar preferences to the target user and recommends items that those similar users have liked or rated highly. The system looks for users with similar patterns of item ratings and uses their preferences to predict recommendations for the target user.

* **Item-based collaborative filtering:** In this approach, the system identifies items that are similar to the ones the target user has liked or interacted with in the past. It then recommends items that are similar to those the user has shown interest in. This method is based on the assumption that if a user likes one item, they are likely to be interested in similar items.
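
To make the difference between the two approaches concrete, here is a minimal sketch on a made-up rating matrix. The matrix `toy_ratings` and the helper `cosine_sim_rows()` are hypothetical and only serve as an illustration; the notebook itself uses `scikit-learn` for this step later on. User-based filtering compares the rows of the matrix, item-based filtering compares its columns.

```python
import numpy as np

# Hypothetical toy ratings: rows = users, columns = items, 0 = not rated
toy_ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim_rows(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    X_unit = X / norms
    return X_unit @ X_unit.T

user_sim = cosine_sim_rows(toy_ratings)    # user-based: compare rows
item_sim = cosine_sim_rows(toy_ratings.T)  # item-based: compare columns

print(np.round(user_sim, 2))
print(np.round(item_sim, 2))
```

In this toy example, users 0 and 1 rate the first two items highly and end up far more similar to each other than either is to user 2.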

Collaborative filtering has been widely used in recommendation systems, particularly in domains such as e-commerce, music, and movie recommendations. It can provide personalized recommendations based on the collective wisdom and behavior of a large user community. However, collaborative filtering approaches can face challenges such as the cold start problem (when there is limited or no information about a new user or item) and the sparsity problem (when the user-item interaction matrix is sparse, meaning there are few ratings or interactions available for most users and items).

## Setting up the Notebook

### Make all Required Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tqdm import tqdm

from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

## Load & Prepare MovieLens Dataset (Small)

Throughout this notebook, we are using a small sample of the [MovieLens](https://grouplens.org/datasets/movielens/) dataset. GroupLens Research has collected and made available rating data sets from the [MovieLens web site](https://movielens.org). Apart from the ratings, the dataset also comes with tags assigned to movies. The information about the movies includes their id, title, and genre(s). Here are some key features of the MovieLens dataset:

* **Ratings:** The dataset includes ratings given by users to movies on a numerical scale, typically ranging from 1 to 5. Users provide their subjective ratings based on their personal preferences.

* **Movie Metadata:** MovieLens dataset also provides additional information about movies, such as genre, release year, and tags. This metadata can be used to build content-based recommendation models.

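* **User Information:** Some versions of the dataset (e.g., MovieLens 100K and MovieLens 1M) contain demographic information about users, such as age, gender, occupation, and zip code. This information can be utilized to analyze user preferences and personalize recommendations. The small version used in this notebook identifies users only by their id.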

* **Dataset Size:** The MovieLens dataset is available in different sizes. The smallest version, MovieLens 100K, contains 100,000 ratings from approximately 1,000 users on around 1,700 movies. Larger versions, such as MovieLens 1M, MovieLens 10M, and MovieLens 20M, contain respectively 1 million, 10 million, and 20 million ratings.

Content-based recommender systems assume that the items (here: movies) come with a set of features describing each item. Collaborative filtering does not need such features since it relies only on the ratings. In this notebook, we therefore load the movie metadata (title and genres) only to make the printed recommendations easier to read.

### Load Movies

We first load all movies into a `pandas` DataFrame for further processing.

In [None]:
# Read file using pandas
df_movies = pd.read_csv('data/datasets/ml-latest-small/movies.csv', sep=',', engine='python')

df_movies.head()

As we can see, all the genres of a movie are represented as a single string with the genres separated by a pipe symbol. To simplify subsequent steps, let's convert the dataframe into a dictionary with the movie ids as keys and the information about the movies as values. Each value is a 2-tuple with the movie title and the genres (as a set, with all genres in lowercase).

**Important:** We keep the genres only to add some information to the recommendations; the genres are NOT used to find the recommendations!

In [None]:
# Convert to dictionary {movie_id -> (movie_name, genres)}
movie_dict = { row[0]: (row[1], set(row[2].lower().split('|'))) for row in df_movies.values}

print(movie_dict[1])

Later on, when we deal with the User-Item matrix, users and movies are identified by the row and column index in that matrix. For example, the movie with id 1000 might be represented by the 700th column in the rating matrix. To map back to the movies and users, we have to create mappings that allow us to map between movie and user ids and their respective row/column indices.

In [None]:
movie_ids = df_movies['movieId'].unique()

movie_id2idx, movie_idx2id = {}, {}

for idx, movie_id in tqdm(enumerate(movie_ids), total=len(movie_ids)):
    movie_id2idx[movie_id] = idx
    movie_idx2id[idx] = movie_id

num_movies = len(movie_ids)

### Load Ratings

Now we can look at the ratings. Again, we first load the information into a `pandas` DataFrame.

In [None]:
df_ratings = pd.read_csv('data/datasets/ml-latest-small/ratings.csv', sep=',', engine='python')

df_ratings.head()

Again, we have to create the mapping between the ids of users and their respective indices in the User-Item matrix.

In [None]:
user_ids = df_ratings['userId'].unique()

user_id2idx, user_idx2id = {}, {}
 
for idx, user_id in tqdm(enumerate(user_ids), total=len(user_ids)):
    user_id2idx[user_id] = idx
    user_idx2id[idx] = user_id

num_users = user_ids.shape[0]

After preprocessing both movies and ratings, we can get a sense of the size of our dataset.

In [None]:
print('Number of users: {}'.format(num_users))
print('Number of movies: {}'.format(num_movies))

Of course, 610 users and 9,742 movies can hardly be considered a big dataset. But to get the basic ideas and concepts of collaborative filtering, it's more than sufficient.

### Initialize User-Item Matrix

The core information of recommender systems is the user-item matrix R. For our use case, the items are the movies, and each matrix element represents a user's rating of a movie.

In [None]:
R = np.zeros((num_users, num_movies))

print('The User-Item Matrix R has shape of {}'.format(R.shape))

Obviously, all elements in R are currently 0. Now we can go through the ratings in `df_ratings` to fill R. Note how we need to map the user and movie ids to valid matrix indices.

In [None]:
for index, row in tqdm(df_ratings.iterrows(), total=len(df_ratings)):
    user_id, movie_id, rating = row['userId'], row['movieId'], row['rating']

    # Convert movie and user ids to indices 
    user_idx = user_id2idx[user_id]
    movie_idx = movie_id2idx[movie_id]
    
    # Fill matrix at the right spot with the rating
    R[user_idx][movie_idx] = rating
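
The row-by-row loop above is easy to follow but slow for large rating files. As a rough sketch of an alternative, the same fill can be expressed with NumPy integer-array indexing. The names `ratings_demo`, `user_id2idx_demo`, `movie_id2idx_demo`, and `R_demo` are made up for this self-contained example and are not part of the notebook.

```python
import numpy as np

# Hypothetical mini example: (user_id, movie_id, rating) triples
ratings_demo = np.array([
    [10, 100, 4.0],
    [10, 300, 2.5],
    [20, 100, 5.0],
])
user_id2idx_demo = {10: 0, 20: 1}
movie_id2idx_demo = {100: 0, 200: 1, 300: 2}

# Translate all ids to matrix indices first ...
rows = np.array([user_id2idx_demo[int(u)] for u in ratings_demo[:, 0]])
cols = np.array([movie_id2idx_demo[int(m)] for m in ratings_demo[:, 1]])

# ... then write all ratings into the matrix in a single step
R_demo = np.zeros((len(user_id2idx_demo), len(movie_id2idx_demo)))
R_demo[rows, cols] = ratings_demo[:, 2]

print(R_demo)
```

The result is the same matrix the loop would produce; only the assignment is done in one vectorized operation.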

Let's check how sparse matrix R is, i.e., what percentage of entries are non-zero.

In [None]:
num_nonzero = np.count_nonzero(R > 0)
percent_nonzero = num_nonzero / np.prod(R.shape) * 100

print('Number of non-zero entries in R: {} (density: {:.3f}%)'.format(num_nonzero, percent_nonzero))

With around 1.7% of all entries being non-zero, R is actually rather dense for a rating matrix. In real-world settings with many more users and movies, the percentage of known ratings is typically much lower than that. This obviously calls for more efficient data structures to store very sparse matrices, but that's beyond the scope of this notebook.
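
Although a full treatment is out of scope, the basic idea behind such data structures can be sketched in a few lines: instead of storing every cell, store only the positions and values of the non-zero entries (the so-called coordinate format). The matrix `R_demo` and its size are made up for this illustration; in practice one would typically reach for a dedicated library such as `scipy.sparse` rather than hand-rolled arrays.

```python
import numpy as np

# Hypothetical dense matrix with only a handful of non-zero entries
rng = np.random.default_rng(0)
R_demo = np.zeros((1000, 2000))
rows = rng.integers(0, 1000, size=500)
cols = rng.integers(0, 2000, size=500)
R_demo[rows, cols] = rng.integers(1, 6, size=500)

# Coordinate format: keep only positions and values of the non-zero entries
nz_rows, nz_cols = np.nonzero(R_demo)
nz_vals = R_demo[nz_rows, nz_cols]

dense_bytes = R_demo.nbytes
coo_bytes = nz_rows.nbytes + nz_cols.nbytes + nz_vals.nbytes
print('dense: {} bytes, coordinate format: {} bytes'.format(dense_bytes, coo_bytes))

# The dense matrix can be rebuilt from the three arrays at any time
R_rebuilt = np.zeros_like(R_demo)
R_rebuilt[nz_rows, nz_cols] = nz_vals
```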

### Calculating Average User and Movie Ratings

We saw in the lecture that we need to normalize the ratings w.r.t. the users' average ratings. Firstly, all ratings are positive (1-5). This means that there's no explicit notion of "dislike", and unrated movies (0) would be treated as rated worst. Secondly, different users might have different notions of what rating represents a good movie. For example, one user might rate a good movie with 3 or higher, while another user considers good movies only from 4 upwards.

The following code cell calculates the average rating for each user as well as the average rating of each movie. Note that we have to exclude unrated movies (rating=0), which would otherwise distort these averages. We can do this by "masking" the 0-ratings so they are not considered when computing the averages. The actual normalization of the ratings is done below as it differs between the User-User and the Item-Item approaches.

In [None]:
masked = np.ma.masked_equal(R, 0)
movie_mean_ratings = np.mean(masked, axis=0)
user_mean_ratings = np.mean(masked, axis=1).reshape(-1, 1)
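
To see what the masking buys us, here is a small sketch on a made-up matrix (`R_demo` is hypothetical). It compares the naive mean, which counts the 0 entries, with the masked mean, which ignores them. It also shows that a movie without any rating ends up with a masked (i.e., undefined) mean.

```python
import numpy as np

# Hypothetical ratings; 0 = not rated, column 1 has no ratings at all
R_demo = np.array([
    [5, 0, 3],
    [0, 0, 4],
    [1, 0, 0],
], dtype=float)

masked_demo = np.ma.masked_equal(R_demo, 0)

naive_user_means = R_demo.mean(axis=1)         # zeros drag the means down
masked_user_means = masked_demo.mean(axis=1)   # zeros are ignored
masked_movie_means = masked_demo.mean(axis=0)  # entry for column 1 is masked

print(naive_user_means)
print(masked_user_means)
print(masked_movie_means)
```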

-------------------------------------------------

## Memory-Based Collaborative Filtering

Memory-based collaborative filtering is a technique used in collaborative filtering recommender systems that relies on the similarity between users or items to make recommendations. It is called "memory-based" because it uses the entire dataset of user-item interactions or ratings to calculate similarities and make predictions.

In memory-based collaborative filtering, the system builds a user-item matrix that represents the historical interactions or ratings of users for different items. This matrix can be sparse, meaning that most entries are unknown or missing. The system then computes the similarity between users or items based on the available data. Common similarity metrics include cosine similarity and Pearson correlation coefficient.

User-based memory-based collaborative filtering predicts a user's preferences by finding other users who have similar tastes and interests. It identifies users with similar patterns of item ratings and uses their ratings to estimate the target user's preferences for unrated items. The system calculates the predicted rating for an item by taking a weighted average of the ratings given by similar users for that item.

Item-based memory-based collaborative filtering, on the other hand, focuses on the similarity between items. It identifies items that are similar to the ones the user has interacted with and recommends those similar items. To make predictions, the system calculates the predicted rating for an item by taking a weighted average of the user's ratings for similar items.

Memory-based collaborative filtering has several advantages. It is easy to understand and implement, and it can capture complex relationships and user preferences. However, it also has limitations. It can be computationally expensive and may not scale well to large datasets. Additionally, it suffers from the sparsity problem when there are few ratings or interactions available for most users and items.

### User-Based

The User-based approach considers two users as similar if they rated the same items (here: movies) similarly. The idea is to find users with the same or similar taste and use their ratings to estimate the rating for a movie a user has not yet rated.

#### Normalization of Ratings

We normalize the ratings by subtracting the average user rating. Note that we do this first for all matrix elements (incl. unknown ratings) and then reset the ratings of unrated movies to 0.

In [None]:
# Subtract row mean from ratings (NOTE: this is also applied to zero values!)
R_normalized = R - user_mean_ratings

# Set the zero fields back to zero
R_normalized[R == 0] = 0

#### Computing the User Similarities

The User-based approach defines a user profile as the vector of normalized ratings. Based on this, we can calculate the similarity between users. With this, 2 users are similar, if (a) they rated a similar set of movies and (b) they rated those movies similarly (indicating a similar taste).

With [`cosine_similarity()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html), `scikit-learn` provides a handy method to calculate all similarities for us:

In [None]:
user_similarities = cosine_similarity(R_normalized, R_normalized)

print(user_similarities.shape)

Naturally, the shape of this matrix is `(num_users, num_users)`. The following examples of matrix elements are purely for a very basic sanity check: the similarity between the same user should be 1, and the similarities should be symmetric:

In [None]:
print(user_similarities[0,0]) # Should be 1 (apart from floating point issues) 

print(user_similarities[0,1]) # These two should return the same value
print(user_similarities[1,0]) # due to the symmetry of the similarity matrix

We have now prepared all the required data to calculate the estimate of a rating given a user `user_idx` and movie `movie_idx`. The method `calculate_rating_user()` does this by performing the following two main steps:

* First, find the most similar users to `user_idx` that have also rated `movie_idx`
* Secondly, calculate the rating as the weighted average over the ratings given by the most similar users. The weight is the similarity between the users (i.e., the more similar the higher the weight)
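
As a quick worked example of the second step, the sketch below computes a similarity-weighted average for three neighbors. The similarity and rating values are made up for illustration.

```python
import numpy as np

# Hypothetical values for three neighbors who have rated the movie
neighbor_similarities = np.array([0.9, 0.5, 0.1])
neighbor_ratings = np.array([5.0, 3.0, 1.0])

# Weighted average: (0.9*5 + 0.5*3 + 0.1*1) / (0.9 + 0.5 + 0.1) = 6.1 / 1.5
weighted = np.sum(neighbor_similarities * neighbor_ratings) / np.sum(neighbor_similarities)

# Plain average for comparison: every neighbor counts the same
unweighted = neighbor_ratings.mean()

print(weighted, unweighted)
```

The most similar neighbor gave the highest rating, so the weighted estimate lies above the plain average of 3.0.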

The method contains comments to provide further details.

In [None]:
def calculate_rating_user(user_idx, movie_idx, k=10):
    ##
    ## Step 1: find all neighbors = USERS most similar to user_idx that have also rated movie_idx
    ##

    ## Sort users based on similarity
    neighbors = np.argsort(user_similarities[user_idx])[::-1]

    ## Remove any neighbor who (a) hasn't rated the movie or (b) has a non-positive similarity with user_idx
    neighbors = [ n_idx for i, n_idx in enumerate(neighbors) if R[n_idx][movie_idx] != 0 and user_similarities[user_idx][n_idx] > 0]

    ## Focus only on the top-k neighbors
    topk_neighbor_indices = neighbors[:k]


    ##
    ## Step 2: Calculate the rating as the weighted average over the ratings given by the neighbors
    ##
    
    ## Get the similarity values between the user and each neighbor
    ## (these values are used as the weights; the more similar the neighbor, the more important his/her rating)
    neighbor_similarities = user_similarities[user_idx][topk_neighbor_indices]
    
    ## Get the ratings the neighbors have given the movie
    ## (recall that we consider only neighbors who have indeed rated the movie)
    neighbor_ratings = R[topk_neighbor_indices][:,movie_idx]

    ## Just a fallback to avoid corner cases (e.g., the movie hasn't been rated by anybody)
    if np.sum(neighbor_similarities) == 0:
        return 0.0

    ## Return the weighted average rating as the predicted rating
    return np.round(np.sum(neighbor_similarities * neighbor_ratings) / np.sum(neighbor_similarities), 2)
 
    
## Compute the rating for an example user/movie pair
print(calculate_rating_user(366, 6783))

### Item-Based

The Item-based approach considers two items as similar if they are rated similarly by the same users. The idea is to find items that are similar to an unrated movie and use their ratings to estimate the unknown rating.


#### Normalization of Ratings

First, we normalize the ratings by subtracting the average movie rating. Note that we do this first for all matrix elements (incl. unknown ratings) and then reset the ratings of unrated movies to 0.

In [None]:
# Subtract column mean (= average movie rating) from ratings (NOTE: this is also applied to zero values!)
R_normalized = R - movie_mean_ratings

# Set the zero fields back to zero
R_normalized[R == 0] = 0

#### Computing the Item Similarities

The Item-based approach defines an item profile as the vector of normalized ratings. Based on this, we can calculate the similarity between items. With this, 2 items are similar, if (a) they have been rated by a similar set of users and (b) they have been rated similarly.

With [`cosine_similarity()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html), `scikit-learn` provides a handy method to calculate all similarities for us. Note that -- compared to the User-User approach -- we have to use the transpose of the normalized rating matrix to get the right result.

In [None]:
movie_similarities = cosine_similarity(R_normalized.T, R_normalized.T)

print(movie_similarities.shape)

Naturally, the shape of this matrix is `(num_movies, num_movies)`. The following examples of matrix elements are purely for a very basic sanity check: the similarity between the same movies should be 1, and the similarities should be symmetric:

In [None]:
print(movie_similarities[0,0]) # Should be 1 (apart from floating point issues) 
print(movie_similarities[0,1]) # These two should return the same value
print(movie_similarities[1,0]) # due to the symmetry of the similarity matrix

Similar to above, we can now define a method `calculate_rating_item()` to calculate the estimate of a rating given a user `user_idx` and movie `movie_idx` by performing the following two main steps:

* First, find the most similar movies to `movie_idx` that have been rated by `user_idx`
* Secondly, calculate the rating as the weighted average over the ratings of the most similar movies. The weight is the similarity between the movies (i.e., the more similar the higher the weight)

The method contains comments to provide further details.

In [None]:
def calculate_rating_item(user_idx, movie_idx, k=10):
    ##
    ## Step 1: Find all neighbors = MOVIES most similar to movie_idx that have been rated by user_idx
    ##
   
    ## Sort movies based on similarity
    neighbors = np.argsort(movie_similarities[movie_idx])[::-1]
    
    ## Remove any neighboring movie that (a) hasn't been rated by the user or (b) has a non-positive similarity with movie_idx
    neighbors = [ n_idx for i, n_idx in enumerate(neighbors) if R[user_idx][n_idx] != 0 and movie_similarities[movie_idx][n_idx] > 0]

    ## Focus only on the top-k neighbors
    topk_neighbor_indices = neighbors[:k]

    
    ##
    ## Step 2: Calculate the rating as the weighted average over the user's ratings of the neighboring movies
    ##

    ## Get the similarity values between the movie and each neighboring movie
    ## (these values are used as the weights; the more similar, the more important the rating)
    neighbor_similarities = movie_similarities[movie_idx][topk_neighbor_indices]
    
    ## Get the ratings the user has given the neighboring movies
    ## (recall that we consider only neighboring movies that have indeed been rated by the user)
    neighbor_ratings = R[:,topk_neighbor_indices][user_idx]
    
    ## Just a fallback to avoid corner cases (e.g., the user hasn't rated any movie yet)
    if np.sum(neighbor_similarities) == 0:
        return 0.0
    
    ## Return the weighted average rating as the predicted rating
    return np.round(np.sum(neighbor_similarities * neighbor_ratings) / np.sum(neighbor_similarities), 3)    
            
print(calculate_rating_item(4, 0, k=10))      

### Complete Rating Matrix

With the two methods `calculate_rating_user()` and `calculate_rating_item()`, we now have two means to predict an unknown rating for every (user, movie) pair. The code cell below performs this step; you only have to uncomment your method of choice.

One problem is that this will run quite a while. Firstly, there are many unknown ratings as R is naturally very sparse. And secondly, our implementations of both `calculate_rating_user()` and `calculate_rating_item()` are hardly optimized for performance as this is not the focus here.

You best run this code only once -- or twice, given that we have two means to predict the ratings -- and later load and use the saved results. As a rough estimate, the User-User approach will run for about 1h; the Item-Item approach will run for about 8h on a decent machine. Of course, you can limit yourself to the User-User approach only.

In [None]:
mode = 'user'
#mode = 'item'

## Create a copy of the rating matrix R (just so we don't overwrite R)
R_predicted = np.copy(R)

## Get count of 0 entries in R (i.e., unknown ratings)
num_zero = np.count_nonzero(R == 0)

print('Number of ratings to predict: {}'.format(num_zero))

with tqdm(total=num_zero) as progress_bar:
    for user_idx in range(R_predicted.shape[0]):    
        for movie_idx in range(R_predicted.shape[1]):

            # Ignore known ratings
            if R_predicted[user_idx][movie_idx] > 0:
                continue

            # Calculate the predicted rating for the current user and movie (given their indices)
            if mode == 'user':
                R_predicted[user_idx][movie_idx] = calculate_rating_user(user_idx, movie_idx, k=25)   
            elif mode == 'item':
                R_predicted[user_idx][movie_idx] = calculate_rating_item(user_idx, movie_idx, k=25)

            # Update progress bar
            progress_bar.update(1)
            
            
# Save matrix (as the dataset doesn't change, computing the matrix once is enough)     
if mode == 'user':
    with open('data/datasets/ml-latest-small/cf-user-matrix-movielens-small.npy', 'wb') as f:
        np.save(f, R_predicted)
elif mode == 'item':
    with open('data/datasets/ml-latest-small/cf-item-matrix-movielens-small.npy', 'wb') as f:
        np.save(f, R_predicted)
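
If the long runtime is a concern, the user-based prediction can be approximated with a few matrix operations. The sketch below is a simplified variant and not equivalent to `calculate_rating_user()`: it averages over all positively similar users who rated a movie instead of only the top-k neighbors. The function name `predict_all_user_based()` and the demo matrices are assumptions made for this example.

```python
import numpy as np

def predict_all_user_based(R, S):
    """Fill all unknown ratings (0) in R with a similarity-weighted average
    over ALL positively similar users who rated the movie (no top-k cut-off)."""
    S_pos = np.clip(S, 0, None).astype(float)  # ignore non-positive similarities
    np.fill_diagonal(S_pos, 0)                 # a user is not their own neighbor
    rated = (R > 0).astype(float)              # 1 where a rating is known
    numerator = S_pos @ R                      # sum of similarity * rating
    denominator = S_pos @ rated                # sum of similarities of actual raters
    pred = np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)
    return np.where(R > 0, R, pred)            # keep the known ratings untouched

# Hypothetical ratings (3 users x 3 movies) and user similarities
R_demo = np.array([[5, 0, 3],
                   [4, 2, 0],
                   [0, 1, 5]], dtype=float)
S_demo = np.array([[1.0, 0.8, 0.2],
                   [0.8, 1.0, -0.1],
                   [0.2, -0.1, 1.0]])

R_demo_predicted = predict_all_user_based(R_demo, S_demo)
print(np.round(R_demo_predicted, 2))
```

For user 0 and movie 1, the two raters have similarities 0.8 and 0.2 and ratings 2 and 1, which gives (0.8 * 2 + 0.2 * 1) / 1.0 = 1.8.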


### Making Recommendations

We first load the matrix containing all ratings incl. the estimates we computed using the code above.

In [None]:
with open('data/datasets/ml-latest-small/cf-user-matrix-movielens-small.npy', 'rb') as f:
#with open('data/datasets/ml-latest-small/cf-item-matrix-movielens-small.npy', 'rb') as f:
    R_predicted = np.load(f)

Since we now have a complete rating matrix, making recommendations is pretty straightforward. We define the method `get_recommendation_cf()` for this. The basic steps are explained in the method and should be rather easy to follow:

In [None]:
def get_recommendation_cf(R_predicted, user_id, topk=10, factor=2, remove_rated=True):
    
    ## Get the user index given the user id
    user_idx = user_id2idx[user_id]
    
    ## Start with ranking all movies based on their (estimated) ratings (ascending order)
    recommendations = np.argsort(R_predicted[user_idx].squeeze())
    
    ## Remove movies the user has already rated from the recommendations
    ## (filter by value; np.delete() would remove by position in the sorted array instead)
    if remove_rated:
        already_rated_movies = np.where(R[user_idx] != 0)[0]
        
        recommendations = recommendations[~np.isin(recommendations, already_rated_movies)]
        
    ## Focus only on the top topk*factor recommendations (highest ratings first)
    recommendations = recommendations[::-1][:topk*factor]
    
    ## Pick a random sample of size topk from the topk*factor recommendations
    recommendations = np.random.choice(recommendations, size=topk, replace=False)
    
    ## Sort the recommendations w.r.t. the average movie rating from best to worst
    ## (not really needed)
    #recommendations = sorted(recommendations, key=lambda tup: tup[1], reverse=True) 
    
    ## Return the movie ids (in random order) of all recommended movies
    return np.array([ movie_idx2id[r] for r in recommendations ])
    

Let's look at an example by recommending movies to the user with `user_id=1`. Feel free to pick a different user.

In [None]:
user_id = 1

for rank, movie_id in enumerate(get_recommendation_cf(R_predicted, user_id)):
    
    ## Convert the user and movie id to their respective indices
    user_idx, movie_idx = user_id2idx[user_id], movie_id2idx[movie_id]
    
    ## Get the title and genres of the recommended movies
    title, genres = movie_dict[movie_id]
    
    ## Get the average rating of the movie
    avg_rating = movie_mean_ratings[movie_idx]
    
    ## Get the estimated rating for the movie by the user
    pred_rating = R_predicted[user_idx,movie_idx]
    
    ## Print the results nicely
    print('[Rank {} ({:.2f}/{:.2f})] {} {}'.format(rank+1, pred_rating, avg_rating, title, '/'.join(genres)))

As you can see, the genres between the recommendations can differ quite a lot. This is no surprise as we do not consider the genres while making the recommendations. Thus, a nice extension would be to organize the recommendations regarding the genre to make it easier for the user to browse them.
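
One possible way to approach this extension is sketched below. The list `recommended` and the helper `group_by_genre()` are hypothetical; in the notebook, the titles and genre sets would come from `movie_dict`.

```python
from collections import defaultdict

# Hypothetical recommendations in ranked order: (title, set of genres)
recommended = [
    ('Movie A', {'comedy', 'romance'}),
    ('Movie B', {'thriller'}),
    ('Movie C', {'comedy'}),
]

def group_by_genre(recommendations):
    """Map each genre to the recommended titles carrying that genre,
    preserving the original ranking within each genre."""
    groups = defaultdict(list)
    for title, genres in recommendations:
        for genre in sorted(genres):
            groups[genre].append(title)
    return dict(groups)

by_genre = group_by_genre(recommended)

for genre in sorted(by_genre):
    print('{}: {}'.format(genre, ', '.join(by_genre[genre])))
```

A movie with several genres appears once per genre, which seems reasonable for browsing but is a design choice rather than a requirement.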

---

## Model-Based: Matrix Factorization

Model-based recommender systems are a type of recommendation system that employs machine learning algorithms to build predictive models based on the available user-item data. These models capture the underlying patterns, relationships, and preferences within the data to make recommendations for users. Unlike memory-based collaborative filtering that directly uses the raw user-item interactions or ratings, model-based approaches involve a training phase where the system learns from the data to create a model. This model can then be used to make predictions for new users and items.

Model-based recommender systems typically employ techniques such as matrix factorization, clustering, or classification algorithms to learn the latent features and relationships in the data. Some commonly used algorithms include:

* **Matrix Factorization:** Matrix factorization methods decompose the user-item interaction matrix into lower-dimensional matrices to capture latent features. Popular matrix factorization techniques include Singular Value Decomposition (SVD) as well as related methods such as Probabilistic Matrix Factorization (PMF) and Non-Negative Matrix Factorization (NMF).

* **Bayesian Networks:** Bayesian networks model the relationships between variables and use probabilistic inference to make predictions. They can capture dependencies and correlations between items or users to provide personalized recommendations.

* **Neural Networks:** Deep learning techniques, such as neural networks, can be used to learn complex patterns and representations from the user-item data. They have the ability to model non-linear relationships and capture intricate user preferences.

Model-based recommender systems offer several advantages. They can handle sparsity better than memory-based approaches and can handle large-scale datasets more efficiently. Additionally, they can incorporate additional features, such as item attributes or user demographics, into the models to enhance recommendation accuracy. However, model-based approaches also have some challenges. They require a training phase that involves significant computational resources and time. Moreover, they may suffer from overfitting if the model is too complex or if the dataset is small or noisy.

In this notebook -- in line with the lecture -- we look at the approach of Matrix Factorization to make recommendations. More specifically, we consider [Non-Negative Matrix Factorization](https://en.wikipedia.org/wiki/Non-negative_matrix_factorization) (NMF or NNMF) since all of our ratings are positive. Non-negative matrix factorization (NMF) is often considered easier compared to general matrix factorization due to its unique properties and constraints. Here are a few reasons why NMF is comparatively easier:

* **Non-negativity constraint:** In NMF, both the user-item interaction matrix and the factor matrices are constrained to be non-negative. This non-negativity constraint allows for a more intuitive and interpretable factorization. It can be particularly useful in applications where the values represent non-negative quantities, such as ratings, counts, or probabilities. Non-negativity helps in capturing additive combinations of factors, which can often result in meaningful and sparse representations.

* **Simplicity of interpretation:** The non-negativity constraint in NMF provides a natural way to interpret the factor matrices. The factors can be seen as non-negative components or parts that contribute to the overall observation. For example, in a movie recommendation system, the factors can correspond to genre preferences, and the factor values indicate the importance or relevance of each genre for a particular user.

* **Reduced dimensionality:** NMF typically aims to factorize a high-dimensional matrix into a lower-dimensional representation. This dimensionality reduction property can be advantageous in terms of computational efficiency and memory usage. By representing the original matrix with a reduced number of factors, NMF can simplify the subsequent computations and improve scalability.

* **Sparsity and interpretability:** NMF tends to produce sparse factor matrices, where most entries are zero or close to zero. This sparsity arises naturally from the non-negativity constraint and can be beneficial in recommendation systems. Sparse factor matrices are easier to interpret and can help identify the most influential factors or features driving the recommendations.

* **Algorithms and optimization:** NMF has specialized algorithms that are tailored for non-negativity constraints, such as multiplicative updates and alternating least squares. These algorithms are designed to ensure non-negativity during the factorization process, which simplifies the optimization task compared to general matrix factorization methods.

Despite its advantages, it is important to note that NMF also has its limitations. For example, the non-negativity constraint may not be suitable for all types of data, and it may not capture complex relationships as effectively as more flexible factorization techniques. The choice between NMF and general matrix factorization depends on the specific characteristics of the data and the requirements of the recommendation problem at hand.
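
To put the point about reduced dimensionality into numbers, the short calculation below counts how many values the two factor matrices hold compared to the full rating matrix. It uses the dataset size mentioned earlier (610 users, 9,742 movies); the helper `factor_param_count()` is made up for this illustration.

```python
def factor_param_count(num_users, num_items, k):
    """Number of entries in W (num_users x k) plus H (k x num_items)."""
    return num_users * k + k * num_items

full_matrix_entries = 610 * 9742               # 5,942,620 entries in R

small_k = factor_param_count(610, 9742, k=50)   # 517,600 entries
large_k = factor_param_count(610, 9742, k=500)  # 5,176,000 entries

print(full_matrix_entries, small_k, large_k)
```

The smaller the latent size `k`, the more compact the representation; for large `k`, the factor matrices approach the size of the original matrix.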

### NMF with Toy Dataset

To better see the results of Matrix Factorization by actually printing and comparing the matrices, we create a very small ratings matrix $R$ containing just 4 users and 7 items (e.g., movies). The matrix elements are values from 1, 2, ..., 5 representing the users' ratings of a movie. As usual, 0 means that a user has not rated that movie.

#### Create Toy Data

In [None]:
R_toy = np.array([
    [4, 0, 0, 5, 1, 0, 0],
    [5, 5, 4, 0, 0, 0, 0],
    [0, 0, 0, 2, 4, 5, 0],
    [0, 3, 0, 0, 0, 0, 3]
], dtype=float)

print(R_toy)

#### Perform NMF

Recall that we want to find 2 matrices $W$ and $H$ such that

$$R \approx W H$$

where $W$ is of shape `(num_users, k)` containing the latent representation of the users, and $H$ is of shape `(k, num_movies)` containing all the latent representations of the movies. $k$ specifies the size of the latent representations. Both $W$ and $H$ are initialized randomly, and then later refined using Gradient Descent or similar methods.

The NMF implementation in `scikit-learn` follows the standard NMF formulation, where a given matrix is decomposed into two non-negative matrices: the basis matrix (also known as the $W$ matrix) and the coefficient matrix (also known as the $H$ matrix). The goal is to approximate the input matrix by the product of these two matrices, where the elements are non-negative. `scikit-learn`'s NMF implementation supports two optimization algorithms, coordinate descent (`solver='cd'`, the default) and multiplicative updates (`solver='mu'`) -- which generally work much better than basic Gradient Descent. You can specify the algorithm to use by setting the `solver` parameter when creating the NMF instance.
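
To give an idea of what a multiplicative-update solver does under the hood, here is a minimal NumPy sketch of the classic update rules for the squared reconstruction error. This is an illustration only and not `scikit-learn`'s implementation; the function name `nmf_multiplicative()`, the parameter defaults, and the small constant `eps` (to avoid division by zero) are assumptions.

```python
import numpy as np

def nmf_multiplicative(R, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate R (non-negative) by W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    num_rows, num_cols = R.shape
    W = rng.random((num_rows, k)) + eps  # random non-negative initialization
    H = rng.random((k, num_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative
        H *= (W.T @ R) / (W.T @ W @ H + eps)
        W *= (R @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Same shape and values as the toy rating matrix in this section
R_demo = np.array([
    [4, 0, 0, 5, 1, 0, 0],
    [5, 5, 4, 0, 0, 0, 0],
    [0, 0, 0, 2, 4, 5, 0],
    [0, 3, 0, 0, 0, 0, 3]
], dtype=float)

W_first, H_first = nmf_multiplicative(R_demo, k=2, n_iter=1)
W_demo, H_demo = nmf_multiplicative(R_demo, k=2, n_iter=200)

error_first = np.linalg.norm(R_demo - W_first @ H_first)
error_final = np.linalg.norm(R_demo - W_demo @ H_demo)
print(error_first, error_final)
```

Because no subtraction is involved, no explicit projection step is needed to keep the factors non-negative.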

The code cell below uses `scikit-learn` to perform NMF with latent representations of size `k=100`.

In [None]:
k = 100

nmf = NMF(n_components=k, init='random', random_state=0)

W = nmf.fit_transform(R_toy)
H = nmf.components_

Let's have a look at how the model performs by checking the dot product of $W$ and $H$, which predicts the rating matrix $R$.

In [None]:
R_toy_predicted = np.around(np.dot(W, H), decimals=2)

print(R_toy_predicted)

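As you can see, for this simple toy data, the predicted ratings are almost identical to the true ratings. Note that with `k=100` the latent size far exceeds the dimensions of the 4x7 matrix, so the model can reproduce $R$ almost exactly, including the 0 entries of unrated movies.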

### NMF with MovieLens Dataset

Now we can perform NMF to build a recommender system for our MovieLens dataset. In the code cell below, we again use `scikit-learn`'s implementation of NMF to compute the matrices $W$ and $H$. With the given parameters, you should see good results, but feel free to change them and observe their effects on the results.

**Side note:** Compared to the toy dataset, the code cell below will take several seconds or even minutes to run. To see some feedback, we set `verbose=1` to print the *violation* after each iteration. Note that the violation is NOT the objective function. It is the sum of the absolute values of the projected gradient, and it is used only as a stopping criterion. The objective function is costly to compute, so it is not computed at each iteration.

In [None]:
%%time

k = 500

nmf = NMF(n_components=k, init='random', random_state=0, max_iter=20, verbose=1)

W = nmf.fit_transform(R)
H = nmf.components_
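
Since the objective function is not reported during fitting, it can be evaluated once afterwards. The following is a minimal sketch, assuming the default Frobenius loss without regularization; the helper name `frobenius_objective` and the `_small` matrices are hypothetical and only show the call.

```python
import numpy as np

def frobenius_objective(R, W, H):
    """Return 0.5 * ||R - W H||_F^2 (unregularized Frobenius loss)."""
    residual = R - W @ H
    return 0.5 * np.sum(residual ** 2)

# Tiny hand-made matrices (not MovieLens data), just to demonstrate the helper
R_small = np.array([[1.0, 2.0], [3.0, 4.0]])
W_small = np.array([[1.0, 0.0], [0.0, 1.0]])
H_small = np.array([[1.0, 2.0], [3.0, 3.0]])

# Only one entry differs (4 vs. 3), so the loss is 0.5 * 1^2
print(frobenius_objective(R_small, W_small, H_small))  # prints 0.5
```

In the notebook, the same helper could be called as `frobenius_objective(R, W, H)` with the matrices from the cell above. The fitted model also exposes `nmf.reconstruction_err_`, which for the Frobenius loss is documented as the Frobenius norm of the difference between the input and the reconstruction.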

With $W$ and $H$ computed, we can again compute the matrix with all predicted ratings.

In [None]:
R_predicted = np.around(np.dot(W, H), decimals=2)

with open('data/datasets/ml-latest-small/cf-nmf-movielens-small.npy', 'wb') as f:
    np.save(f, R_predicted)

Since the rating matrices are too large to print in full, let's just look at and compare a few entries.

In [None]:
print(R[0][:20])
print(R_predicted[0][:20])

As with the toy dataset, the true ratings and predicted ratings should again be very similar (assuming the default values for the input parameters).
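
Instead of comparing entries by eye, the fit can be summarized with a single number. The sketch below computes the RMSE over the observed ratings only (entries greater than 0), since the zeros merely mark missing ratings. The helper name `rmse_observed` and the small demo matrices are hypothetical and not part of the notebook's data.

```python
import numpy as np

def rmse_observed(R_true, R_pred):
    """RMSE between true and predicted ratings, restricted to rated entries."""
    mask = R_true > 0                      # True where a rating exists
    diff = R_true[mask] - R_pred[mask]     # errors on rated entries only
    return float(np.sqrt(np.mean(diff ** 2)))

# Small made-up example: three observed ratings with errors 1, 0, and -1
R_true_demo = np.array([[4.0, 0.0, 5.0], [0.0, 3.0, 0.0]])
R_pred_demo = np.array([[3.0, 1.2, 5.0], [0.4, 4.0, 2.5]])

print(rmse_observed(R_true_demo, R_pred_demo))
```

In the notebook this could be called as `rmse_observed(R, R_predicted)`. Keep in mind that this measures how well the model fits the ratings it was trained on, not how well it predicts unseen ratings.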


### Making Recommendations

We first load the matrix of predicted ratings that we computed and saved using the code above.

In [None]:
with open('data/datasets/ml-latest-small/cf-nmf-movielens-small.npy', 'rb') as f:
    R_predicted = np.load(f)

With the matrix of predicted ratings, we can now make movie recommendations the same way as we did above.

In [None]:
user_id = 1

for rank, movie_id in enumerate(get_recommendation_cf(R_predicted, user_id)):
    
    ## Convert the user and movie id to their respective indices
    user_idx, movie_idx = user_id2idx[user_id], movie_id2idx[movie_id]
    
    ## Get the title and genres of the recommended movies
    title, genres = movie_dict[movie_id]
    
    ## Get the average rating of the movie
    avg_rating = movie_mean_ratings[movie_idx]
    
    ## Get the estimated rating for the movie by the user
    pred_rating = R_predicted[user_idx,movie_idx]
    
    ## Print the results nicely
    print('[Rank {} ({:.2f}/{:.2f})] {} {}'.format(rank+1, pred_rating, avg_rating, title, '/'.join(genres)))

In this example, we print the estimated ratings just to show that they do not have to be within the range of 1-5.
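
If the estimates should be displayed on the original rating scale, one simple option is to clip them. The sketch below uses `np.clip` on a small made-up matrix; the variable names are hypothetical.

```python
import numpy as np

# Made-up predicted ratings, some of them outside the valid 1-5 range
R_pred_raw = np.array([[0.3, 4.8, 5.6], [2.1, 6.2, 0.0]])

# Limit all values to the interval [1, 5]
R_pred_clipped = np.clip(R_pred_raw, 1.0, 5.0)
print(R_pred_clipped)
```

Clipping is best used for presentation only. For ranking, the raw estimates are preferable, because clipping maps all values above 5 to the same score and thereby creates ties.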

---

## Summary

A recommender system based on collaborative filtering is a type of information filtering system that predicts users' preferences and interests by collecting and analyzing their behavior and feedback, as well as the behavior and feedback of similar users. Collaborative filtering techniques leverage the collective intelligence of a community to provide personalized recommendations to individuals. There are two main approaches to collaborative filtering: memory-based recommender systems and model-based recommender systems.

Memory-based recommender systems use the actual data from the user-item interactions to make recommendations. These systems typically employ two types of collaborative filtering: user-based and item-based. User-based collaborative filtering recommends items to a user based on the interests and preferences of similar users. It finds users who have similar patterns of item ratings or purchases and suggests items that those similar users have liked. Item-based collaborative filtering, on the other hand, recommends items to a user based on the similarities between the items themselves. It identifies items that are similar to the ones the user has rated positively and suggests those similar items.

Model-based recommender systems, also known as algorithmic or latent factor models, build a mathematical model from the user-item interactions to generate recommendations. These systems use matrix factorization techniques such as NMF or singular value decomposition to identify latent factors or hidden features that capture the underlying patterns and correlations in the data. By learning these latent factors, model-based recommender systems can make predictions and provide recommendations based on the relationships between users and items. These models are trained on large datasets and can handle sparse and noisy data, making them more scalable and robust compared to memory-based approaches.

Both memory-based and model-based recommender systems have their advantages and limitations. Memory-based methods are simple and intuitive, easy to implement, and can handle new users and items without retraining the model. However, they suffer from scalability issues as the dataset grows, and they struggle with the sparsity of data. Model-based approaches, on the other hand, offer better scalability and can provide more accurate recommendations, especially for sparse data. However, they require a training phase and are more complex to implement and maintain.

In summary, collaborative filtering-based recommender systems leverage user behavior and feedback to make personalized recommendations. Memory-based systems directly analyze user-item interactions, while model-based systems use mathematical models to capture underlying patterns and make predictions. Both approaches have their strengths and weaknesses, and the choice between them depends on factors such as dataset size, sparsity, scalability requirements, and implementation complexity.