# BLU11 - Learning Notebook - Part 3 of 3 - Collaborative Filtering

In [1]:
import os

import numpy as np
np.seterr(divide='ignore', invalid='ignore')

from scipy.sparse import coo_matrix, csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# 1 Collaborative-Filtering RS

Collaborative filtering (CF, henceforth) is another type of memory-based RS.

The central premise of this type of recommender is that, given enough Ratings, no other information is necessary.

So, to be clear, this is what you need. (A lot of it, though.)

![collaborative_filtering](../media/recommender_systems_framework_collaborative_filtering.png)

This idea couldn't be more different from most approaches we've studied, where exploring and extracting features was front and center.

In CF, a typical pipeline goes as follows, and we can think of it as filling in the blank spaces in our Ratings.

![collaborative_filtering](../media/collaborative_filtering.png)

We start by computing similarities between users or items; then we select the best neighbors.

Then, we predict the user preference as a weighted average of the neighbors and filter the best results.

## 1.1 Ratings

As we've seen, Ratings are at the core of CF. 

As usual, the first step is to build our User-Item matrix $R$, containing the available Ratings.

In [2]:
def read_ratings():
    path = os.path.join('..', 'data', 'ml-latest-small', 'ratings.csv')
    
    users = np.genfromtxt(path, dtype='int', skip_header=True, usecols=[0], delimiter=',')
    movies = np.genfromtxt(path, dtype='int', skip_header=True, usecols=[1], delimiter=',')
    ratings = np.genfromtxt(path, skip_header=True, usecols=[2], delimiter=',')
    
    return users, movies, ratings


users, movies, ratings = read_ratings()

We reuse the functions from the last notebook to do it.

In [3]:
def make_ratings(users, movies, ratings):
    
    cols = movies - 1
    rows = users - 1
    
    nrows = rows.max() + 1
    ncols = cols.max() + 1
    shape = (nrows, ncols)
    
    data = ratings
    
    coo = coo_matrix((data, (rows, cols)), shape=shape)
    
    return coo.tocsr()

    
R = make_ratings(users, movies, ratings)
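
To make the construction concrete, here is a minimal sketch on made-up data (the `toy_*` arrays are hypothetical, not taken from `ratings.csv`). It follows the same steps as `make_ratings`: shift the 1-based IDs to 0-based indices, then let `coo_matrix` place each rating at its (user, movie) position.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data: 1-based user and movie IDs, one entry per rating.
toy_users = np.array([1, 1, 2, 3])
toy_movies = np.array([1, 3, 2, 3])
toy_ratings = np.array([4.0, 5.0, 3.0, 2.0])

# Shift the IDs to 0-based row and column indices.
rows = toy_users - 1
cols = toy_movies - 1
shape = (rows.max() + 1, cols.max() + 1)

# Build the User-Item matrix from (rating, (row, col)) triplets.
R_toy = coo_matrix((toy_ratings, (rows, cols)), shape=shape).tocsr()

print(R_toy.toarray())
```

Unrated (user, movie) pairs are not stored at all and read back as zeros, which is why a rating of 0 and a missing rating are indistinguishable in this representation.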

# 2 Compute Similarities

As in content-based filtering, the default similarity measure is the cosine similarity.

What is different are the vectors we want to compare. We want the similarity between User vectors or Item vectors, i.e., the rows or the columns $r_j$ of $R$, respectively.

There are two main types of CF, user-user and item-item filtering.

In user-user CF, we compute how similar the users, i.e., the row-vectors $r^T$ of the ratings matrix $R \in \mathbb{R}^{\space m \space \times \space n}$, are to each other.

Item-item approaches care about the similarity across all pairs of items, i.e., the column-vectors $r$ of the ratings matrix.

## 2.1 User-User CF

To best understand this type of RS we use the tagline "users who are similar to you also liked".

We compute similarities across all possible pairs of rows $(u, v) \in U \times U$. 

$$\begin{bmatrix}1 & cosine(u_1, u_2) & ... & cosine(u_1, u_m) \\ cosine(u_2, u_1) & 1 & ... & cosine(u_2, u_m) \\ ...  & ... & ... & ...\\ cosine(u_m, u_1) & cosine(u_m, u_2) & ... & 1\end{bmatrix}$$

Regardless of the similarity metric used, the result is a matrix in $\mathbb{R}^{\space m \space \times \space m}$. 

Again, we use the cosine from `sklearn`.

In [4]:
def make_user_similarities(R):
    return cosine_similarity(R, dense_output=False)


user_similarities = make_user_similarities(R)
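
As a sanity check, the following sketch compares `cosine_similarity` against the textbook definition (dot product divided by the product of the norms) on a small made-up matrix. `R_toy` and the variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: 3 users (rows) by 3 items (columns), 0 = unrated.
R_toy = csr_matrix(np.array([
    [4.0, 0.0, 5.0],
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
]))

# Pairwise cosine similarity between rows, kept sparse as in the notebook.
S_user = cosine_similarity(R_toy, dense_output=False)

# The same quantity by hand for users 0 and 1: dot product over norms.
u, v = R_toy.toarray()[0], R_toy.toarray()[1]
manual = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(S_user.shape)                      # one row and one column per user
print(np.isclose(S_user[0, 1], manual))  # sklearn agrees with the definition
print(S_user[0, 2])                      # users 0 and 2 share no rated items
```

Users with no rated items in common get a similarity of zero, so in sparse data most entries of the similarity matrix carry no signal.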

## 2.2 Item-Item CF

Alternatively, the tagline for item-item CF is "users who liked this item also liked".

We compute similarities across all possible pairs of columns $(i, j) \in I \times I$. 

$$\begin{bmatrix}1 & cosine(i_1, i_2) & ... & cosine(i_1, i_n) \\ cosine(i_2, i_1) & 1 & ... & cosine(i_2, i_n) \\ ...  & ... & ... & ...\\ cosine(i_n, i_1) & cosine(i_n, i_2) & ... & 1\end{bmatrix}$$

The result is a matrix in $\mathbb{R}^{\space n \space \times \space n}$. (A rather big one.)

In [5]:
def make_item_similarities(R):
    return cosine_similarity(R.T, dense_output=False)


item_similarities = make_item_similarities(R)
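
The sketch below, again on a hypothetical `R_toy`, illustrates three properties of the item-item matrix: it has one row and column per item, it is symmetric, and an item that nobody rated gets an all-zero row, including the diagonal.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: 2 users by 4 items; the last item has no ratings.
R_toy = csr_matrix(np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 0.0, 3.0, 0.0],
]))

# Transposing turns the item columns into rows, which is what sklearn compares.
S_item = cosine_similarity(R_toy.T, dense_output=False).toarray()

print(S_item.shape)                  # (n, n), with n the number of items
print(np.allclose(S_item, S_item.T)) # cosine similarity is symmetric
print(S_item[3])                     # the unrated item is similar to nothing
```

The all-zero row for the unrated item matters later: its weights sum to zero, so normalizing by the sum of weights divides by zero.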

# 3 Prediction

The prediction step, again, is purely algebra.

We compute the predicted ratings as a weighted average of the ratings of other users, or items.

## 3.1 User-user CF

Given the subset of users $U_i$ that rated the item $i \in I$, with cardinality $|U_i| = K$, and $v_k \in U_i$:

$$\hat{r}_{u, i} = \frac{\sum\limits_{k=1}^{K} sim(u, v_k)r_{v_k, i}}{\sum\limits_{k=1}^{K} |sim(u, v_k)|}$$

The predicted rating $\hat{r}_{u, i}$ is the average of all ratings by other users that rated the item, weighted by user similarity.

We can implement the weighted sum as a dot-product between the similarities $\in \mathbb{R}^{\space m \space \times \space m}$ and ratings $\in \mathbb{R}^{\space m \space \times \space n}$.

Then, we use broadcasting to normalize the results using the sum of weights.

In [6]:
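def make_user_predictions(similarities, R):
    # Weighted sum of ratings: (m x m) similarities times (m x n) ratings.
    weighted_sum = similarities.dot(R)
    # Per-user normalizer of shape (m, 1), broadcast across the item columns.
    sum_of_weights = np.abs(similarities).sum(axis=1)
    
    return np.array(weighted_sum / sum_of_weights)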

 
L_user = make_user_predictions(user_similarities, R)
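
Note that `make_user_predictions` normalizes by the sum of all similarities of a user, whereas the formula sums only over the users who rated item $i$. The sketch below is one possible variant that follows the formula more literally. The helper name `make_user_predictions_rated_only` and the toy data are hypothetical, and, like the cell above, it counts the user's own rating when there is one.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: 3 users by 3 items, 0 = unrated.
R_toy = csr_matrix(np.array([
    [5.0, 4.0, 0.0],
    [4.0, 0.0, 3.0],
    [0.0, 2.0, 5.0],
]))
S = cosine_similarity(R_toy, dense_output=False)


def make_user_predictions_rated_only(similarities, R):
    """Hypothetical variant: normalize only over users who rated each item."""
    # Numerator: similarity-weighted sum of ratings, shape (m, n).
    weighted_sum = similarities.dot(R).toarray()
    # Indicator matrix: 1.0 where a rating exists, 0.0 elsewhere.
    rated = (R != 0).astype(float)
    # Denominator: for each (user, item), sum |sim| over the item's raters.
    sum_of_weights = abs(similarities).dot(rated).toarray()
    # Divide only where the denominator is positive; leave 0 elsewhere.
    pred = np.zeros_like(weighted_sum)
    np.divide(weighted_sum, sum_of_weights, out=pred, where=sum_of_weights > 0)
    return pred


P = make_user_predictions_rated_only(S, R_toy)

# Check one entry by hand: user 2 and item 0, which users 0 and 1 rated.
manual = (S[2, 0] * 5.0 + S[2, 1] * 4.0) / (S[2, 0] + S[2, 1])
print(np.isclose(P[2, 0], manual))
```

With this normalization each prediction is a proper weighted average, so it stays within the range of the ratings the item actually received. The price is a second sparse product for the denominator.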

## 3.2 Item-item CF

Given $I_u$, the set of items rated by the user $u \in U$, with cardinality $|I_u| = W$, and $j_w \in I_u$:

$$\hat{r}_{u, i} = \frac{\sum\limits_{w=1}^{W} sim(i, j_w)r_{u, j_w}}{\sum\limits_{w=1}^{W} |sim(i, j_w)|}$$

The predicted rating $\hat{r}_{u, i}$ is the average of all ratings by the same user to other items, weighted by item similarity.

Again, we can implement the weighted sum as a dot-product between the ratings $\in \mathbb{R}^{\space m \space \times \space n}$ and the similarities $\in \mathbb{R}^{\space n \space \times \space n}$.

In [7]:
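def make_item_predictions(similarities, R):
    # Weighted sum of ratings: (m x n) ratings times (n x n) similarities.
    weighted_sum = R.dot(similarities)
    # Per-item normalizer of shape (1, n), broadcast across the user rows.
    sum_of_weights = np.abs(similarities).sum(axis=0)
    
    pred = np.array(weighted_sum / sum_of_weights)
    # Items with no ratings yield 0 / 0 = NaN; replace those with 0.
    pred[np.isnan(pred)] = 0
    
    return pred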

 
L_item = make_item_predictions(item_similarities, R)

We replace the missing values that result from division by zero with zeros, to avoid problems downstream. This happens for items that nobody rated, since their similarities sum to zero.
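
The following sketch reproduces the division by zero on a small made-up matrix, to show where the NaNs appear. `R_toy` is hypothetical and its last item has no ratings.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: 2 users by 4 items; the last item has no ratings.
R_toy = csr_matrix(np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 0.0, 3.0, 0.0],
]))
S_item = cosine_similarity(R_toy.T, dense_output=False)

# Same algebra as make_item_predictions, with dense intermediates.
weighted_sum = R_toy.dot(S_item).toarray()
sum_of_weights = np.asarray(abs(S_item).sum(axis=0))  # shape (1, 4)

# Silence the 0 / 0 warning locally instead of globally.
with np.errstate(divide='ignore', invalid='ignore'):
    pred = weighted_sum / sum_of_weights

# Columns that are entirely NaN correspond to items with zero total weight.
nan_columns = np.isnan(pred).all(axis=0)
print(nan_columns)

pred[np.isnan(pred)] = 0
```

Filling with zero means an unrated item never outranks an item with any positive predicted score, which is a reasonable default when there is no evidence about it.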

And, just like that, we were able to perform predictions using just ratings and linear algebra. Now, onto the usual filtering.

# 4 Filtering

## 4.1 Removing Rated Items

We remove previously rated items, like we did in the previous notebook.

In [8]:
def mask_rated_items(R, L):
    L = L.copy()
    L[R.nonzero()] = -1
    return L


L_user_ = mask_rated_items(R, L_user)
L_item_ = mask_rated_items(R, L_item)
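
A small sketch of the masking step on made-up inputs (`R_toy` and `L_toy` are hypothetical): `R.nonzero()` returns the row and column indices of the stored ratings, and those positions are overwritten with -1 in a copy of the predictions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy inputs: known ratings and dense predicted scores.
R_toy = csr_matrix(np.array([
    [5.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
]))
L_toy = np.array([
    [4.8, 3.1, 2.0],
    [1.5, 2.9, 4.2],
])

# Work on a copy so the original predictions stay untouched.
L_masked = L_toy.copy()
# nonzero() yields (row_indices, col_indices) of the rated pairs.
L_masked[R_toy.nonzero()] = -1

print(L_masked)
```

The value -1 works as a mask here because the predicted scores are non-negative, so a masked item can never win over an unmasked one.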

## 4.2 Best-item

We use `argmax` to get the best item. Note that we have been working with dense arrays, not sparse matrices, for a while now. Adding 1 converts the 0-based column index back to the movie ID.

In [9]:
def get_best_item(L):
    return L.argmax(axis=1) + 1


best_item_user = get_best_item(L_user_)
best_item_item = get_best_item(L_item_)
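
For illustration, here is the same selection on a hypothetical masked score array `L_toy`, where -1 marks items the user has already rated.

```python
import numpy as np

# Hypothetical masked scores: 2 users by 3 items, -1 = already rated.
L_toy = np.array([
    [-1.0, 3.1, 2.0],
    [1.5, -1.0, 4.2],
])

# Index of the highest score per row, shifted back to 1-based item IDs.
best_item = L_toy.argmax(axis=1) + 1

print(best_item)
```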

## 4.3 Top-N

Moreover, as usual, we use `argsort` on the array to retrieve the top-*N* list.

In [10]:
def get_top_n(L, n):
    return np.negative(L).argsort()[:, :n] + 1


top_5_user = get_top_n(L_user_, 5)
top_5_item = get_top_n(L_item_, 5)
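
The sketch below walks through the same `argsort` trick on a hypothetical score array: negating the scores turns the ascending sort into a descending one, and slicing keeps the first `n` columns.

```python
import numpy as np

# Hypothetical masked scores: 2 users by 4 items, -1 = already rated.
L_toy = np.array([
    [-1.0, 3.1, 2.0, 0.5],
    [1.5, -1.0, 4.2, -1.0],
])
n = 2

# Sort by descending score, keep n columns, shift to 1-based item IDs.
top_n = np.negative(L_toy).argsort()[:, :n] + 1

print(top_n)

# The first column of the top-N list matches the best item from argmax.
print(np.array_equal(top_n[:, 0], L_toy.argmax(axis=1) + 1))
```

One caveat: if a user has fewer than `n` unrated items, masked items (score -1) end up in the list, so a caller may want to drop entries whose score is negative.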