![](../img/330-banner.png)

Lecture 16: Recommender Systems
-------------

UBC 2022-23 W2

Instructor: Amir Abdi

## Imports

In [None]:
import os
import random
import sys
import time

import numpy as np

sys.path.append("../code/.")
import matplotlib.pyplot as plt
from plotting_functions import *
from plotting_functions_unsup import *
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

plt.rcParams["font.size"] = 16
import matplotlib.cm as cm

# plt.style.use("seaborn")
%matplotlib inline
pd.set_option("display.max_colwidth", 0)

## Learning outcomes <a name="lo"></a>

From this lecture, students are expected to be able to:

- State the problem of **recommender systems**. 
- Describe components of a **utility matrix**. 
- Create a utility matrix given ratings data. 
- Describe a common approach to **evaluate recommender systems**. 
- Implement some baseline approaches to complete the utility matrix. 
- Explain the idea of **collaborative filtering**. 
- Explain some serious consequences of recommendation systems. 

## Announcements

- HW6 was due last night
- HW7, due Mar 22, 11:59pm

<br><br>

## Recommender systems motivation

### What is a recommender system? 

- A recommender system (or recommendation system) **recommends** products or services that a user is likely to consume.

![](../img/recommendation_system.png)

<!-- <img src="img/recommendation_system.png" alt="" height="900" width="900">  -->


### Example: Recommender Systems
- A client goes to Amazon to buy products. 
- Amazon has some information about the client. They also have information about other clients buying similar products. 
- What should they recommend to the client, so that they buy more products? 
- There's no "right" answer (**no actual ground-truth label**). 
- The whole idea is to **understand user behavior** and **similarities across users** in order to recommend them products they are likely to consume. 

<img src="../img/utility_matrix.png" alt="" width="500"> 

### Why should we care about recommendation systems? 

- Almost everything we buy or consume today is in some way or the other influenced by recommendation systems. 
    - Music (Spotify), videos (YouTube), news, books and products (Amazon), movies (Netflix), jokes, restaurants, dating, friends (Facebook), professional connections (LinkedIn)
- Recommendation systems are at the core of the success of many companies. 
    - Amazon
    - [Netflix](https://help.netflix.com/en/node/100639)


### What kind of data do we need to build recommendation systems? 

- **User ratings data** (most common)
- **Features related to items or users** 
- Customer purchase history data

### Main approaches

- Collaborative filtering 
  - "Unsupervised" learning 
  - We only have labels $y_{ij}$ (rating of user $i$ for item $j$). 
  - We learn features.  
- Content-based recommenders 
    - Supervised learning
    - Extract features $x_i$ of users and/or items and build a model to predict rating $y_i$ given $x_i$. 
    - Apply **model.predict()** to predict for new users/items. 
- Hybrid 
    - Combining collaborative filtering with content-based filtering
    

### The Netflix prize

<img src="../img/netflix.png" width="600">

[Source](https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429)

### The Netflix prize

- 100M ratings from 0.5M users on 18k movies.
- Grand prize was **\$1M for first team to reduce squared error at least by 10%**.
- Winning entry (and most entries) used collaborative filtering:
    - Methods that only look at ratings, not features of movies/users.
- A simple collaborative filtering method that does really well:
   - Now adopted by many companies.

<br><br><br><br>

## Recommender systems problem 

### Problem formulation

- Most often the data for recommender systems come in as **ratings** for a set of items from a set of users. 
- We have two entities: $N$ **users** and $M$ **items**. 
- **Users** are consumers. 
- **Items** are the products or services offered.  
    - E.g., movies (Netflix), books (Amazon), songs (Spotify), people (Tinder)  
    

<img src="../img/utility_matrix.png" alt="" height="900" width="900"> 


### Utility matrix 

- A **utility matrix** is the matrix that captures **interactions** between $N$ **users** and $M$ **items**. 
- The interaction may come in different forms: 
    - ratings, clicks, purchases

<!-- ![](../img/utility_matrix.png) -->

<!-- <img src="../img/utility_mat.png" alt="" height="900" width="900">  -->

### Utility matrix

- Below is a toy utility matrix. Here $N$ = 6 and $M$ = 5.  
- Each entry $y_{ij}$ ($i^{th}$ row and $j^{th}$ column) denotes the rating given by the user $i$ to item $j$. 
- We represent users in terms of items and items in terms of users. 
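
To make the notation concrete, here is a minimal sketch with a made-up 3-by-4 utility matrix (the numbers and the name `Y_example` are hypothetical, not the toy matrix from the figure). Note that NumPy indices start at 0, so `Y_example[i, j]` plays the role of $y_{ij}$.

```python
import numpy as np

# Hypothetical toy ratings: 3 users x 4 items, NaN marks a missing rating.
Y_example = np.array(
    [
        [5.0, np.nan, 1.0, np.nan],
        [np.nan, 4.0, np.nan, 2.0],
        [4.0, np.nan, np.nan, 1.0],
    ]
)

i, j = 0, 2
rating_ij = Y_example[i, j]    # y_ij: rating given by user i to item j
user_vector = Y_example[i, :]  # user i represented in terms of items
item_vector = Y_example[:, j]  # item j represented in terms of users

print(rating_ij, user_vector.shape, item_vector.shape)
print("Missing entries:", int(np.isnan(Y_example).sum()))
```

Each row is one user's view of the items and each column is one item's view of the users, which is the sense in which users and items represent each other.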

<!-- ![](../img/utility_matrix.png) -->

<!-- <img src="../img/utility_matrix.png" alt="" height="900" width="900">  -->

### Sparsity of utility matrix

- The utility matrix **can be very sparse** because **usually users only interact with a few items**. 
- For example: 
    - all Netflix users will have rated only a small percentage of content available on Netflix
    - all Amazon clients will have rated only a small fraction of the items available on Amazon

### What do we predict? 
Given a utility matrix of $N$ users and $M$ items, **complete the utility matrix**. In other words, **predict missing values in the matrix**. 


<img src="../img/utility_matrix.png" alt="" height="900" width="900"> 

- Once we have predicted ratings, we can recommend to each user the items they are likely to rate highly. 

### Example dataset: [Jester 1.7M jokes ratings dataset](https://www.kaggle.com/vikashrajluhaniwal/jester-17m-jokes-ratings-dataset?select=jester_ratings.csv)

- We'll use a sample of [Jester 1.7M jokes ratings dataset](https://www.kaggle.com/vikashrajluhaniwal/jester-17m-jokes-ratings-dataset) to demonstrate different recommendation systems. 

The dataset comes with two CSVs
- A CSV containing ratings (-10.0 to +10.0) of 150 jokes from 59,132 users. 
- A CSV containing joke IDs and the actual text of jokes. 

> Some jokes might be offensive. Please do not look too much into the actual text data if you are sensitive to such language.

- Recommendation systems are most effective when you have a large amount of data.
- But we are only taking a sample here for speed.

In [None]:
filename = "../data/jester_ratings.csv"
ratings_full = pd.read_csv(filename)

print('total number of users: ', len(ratings_full.userId.unique()))

In [None]:
# limit to at most 4000 users
ratings = ratings_full[ratings_full["userId"] <= 4000]
print('total number of users: ', len(ratings.userId.unique()))

In [None]:
ratings.head()

In [None]:
user_key = "userId"
item_key = "jokeId"

### Dataset stats 

In [None]:
ratings.info()

In [None]:
def get_stats(ratings, item_key="jokeId", user_key="userId"):
    print("Number of ratings:", len(ratings))
    print("Average rating:  %0.3f" % (np.mean(ratings["rating"])))
    N = len(np.unique(ratings[user_key]))
    M = len(np.unique(ratings[item_key]))
    print("Number of users (N): %d" % N)
    print("Number of items (M): %d" % M)
    print("Fraction non-nan ratings: %0.3f" % (len(ratings) / (N * M)))
    return N, M

N, M = get_stats(ratings)

### Creating utility matrix

- Let's construct a utility matrix with `number of users` rows and `number of items` columns from the ratings data. 

> Note we are constructing a non-sparse matrix for demonstration purpose here. In real life it's recommended that you work with sparse matrices. 
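
For reference, here is a rough sketch of what a sparse version could look like with `scipy.sparse`. The triples below are made up, and the variable names (`Y_sparse`, `observed`) are only for illustration. One caveat: sparse formats treat unstored entries as 0, and 0 is a valid rating on a -10 to +10 scale, so this sketch keeps a separate mask of which entries were actually observed.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical (user index, item index, rating) triples.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 3])
vals = np.array([5.0, -3.5, 4.0, 0.0, 2.5])

n_users, n_items = 3, 4

# Ratings matrix: only the listed entries are stored.
Y_sparse = csr_matrix((vals, (rows, cols)), shape=(n_users, n_items))

# Mask of observed entries, so a rating of 0.0 is not confused with "missing".
observed = csr_matrix((np.ones_like(vals), (rows, cols)), shape=(n_users, n_items))

density = observed.nnz / (n_users * n_items)
print("Fraction of observed ratings: %0.3f" % density)
```

The memory used grows with the number of ratings rather than with `N * M`, which is what makes this practical for large catalogues.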

In [None]:
user_mapper = dict(zip(np.unique(ratings[user_key]), list(range(N))))
item_mapper = dict(zip(np.unique(ratings[item_key]), list(range(M))))
user_inverse_mapper = dict(zip(list(range(N)), np.unique(ratings[user_key])))
item_inverse_mapper = dict(zip(list(range(M)), np.unique(ratings[item_key])))

In [None]:
# Function to create a utility matrix
def create_Y_from_ratings(
    data, N, M, user_mapper, item_mapper, user_key="userId", item_key="jokeId"
):  
    Y = np.zeros((N, M))
    Y.fill(np.nan)
    for index, val in data.iterrows():
        n = user_mapper[val[user_key]]
        m = item_mapper[val[item_key]]
        Y[n, m] = val["rating"]

    return Y

### Utility matrix for the example problem
- Rows represent users.
- Columns represent items (jokes in our case).
- Each cell gives the rating given by the user to the corresponding joke. 
- Users are features for jokes and jokes are features for users.
- We want to predict the missing entries. 

In [None]:
Y_mat = create_Y_from_ratings(ratings, N, M, user_mapper, item_mapper)
Y_mat.shape

In [None]:
pd.DataFrame(Y_mat)

<br><br><br><br>

## Simple Solutions (Baselines)


- Recall that our goal is to predict missing entries in the utility matrix. 

### Evaluation

- We'll try a number of methods to do this. 
- Although there is no notion of "accurate" recommendations, we need a way to **evaluate** our predictions so that we'll be able to compare different methods.
- Although we are doing unsupervised learning, we'll split the data and evaluate our predictions as follows.  

### Data splitting 

- We split the ratings into train and validation sets. 
- **It's easier to split the ratings data instead of splitting the utility matrix.**

In [None]:
X = ratings.copy()
y = ratings[user_key]
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train.shape, X_valid.shape

Now we will create utility matrices for train and validation splits. 

In [None]:
train_mat = create_Y_from_ratings(X_train, N, M, user_mapper, item_mapper)
valid_mat = create_Y_from_ratings(X_valid, N, M, user_mapper, item_mapper)

In [None]:
train_mat.shape, valid_mat.shape

Notice that both matrices have the **same shape** (same number of users, same number of items).

- `train_mat` has only ratings from the train set and `valid_mat` has only ratings from the valid set.
- During training we assume that we do not have access to some of the available ratings. We predict these ratings and evaluate them against ratings in the validation set. 
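
The same idea on a tiny made-up example (the triples and the helper `to_matrix` are hypothetical, a sketch rather than the code used in this lecture): we hold out some ratings, then place each subset into its own full-size matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings as (user index, item index, rating) triples.
triples = np.array(
    [(0, 0, 5.0), (0, 2, 1.0), (1, 1, 4.0), (1, 3, 2.0), (2, 0, 4.0), (2, 3, 1.0)]
)
n_users, n_items = 3, 4


def to_matrix(rows):
    """Place the given ratings in an n_users x n_items matrix of NaNs."""
    mat = np.full((n_users, n_items), np.nan)
    mat[rows[:, 0].astype(int), rows[:, 1].astype(int)] = rows[:, 2]
    return mat


# Hold out 2 of the 6 ratings for validation; the rest are used for training.
perm = rng.permutation(len(triples))
valid_example = to_matrix(triples[perm[:2]])
train_example = to_matrix(triples[perm[2:]])

print(train_example)
print(valid_example)
```

Both matrices cover every user and every item, but no cell is filled in both of them.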

### Questions for you

- How do train and validation utility matrices differ? 
- Why are the utility matrices for the train and validation sets of the same shape?
<br><br>

**Answer:** ???

<!-- - The training matrix `train_mat` is of shape N by M but only has ratings from `X_train` and all other ratings missing.  -->
<!-- - The validation matrix `valid_mat` is also of shape N by M but it only has ratings `X_valid` and all other ratings missing.  -->
<!-- - They have the same shape because both have the same number of users and items; that's how we have constructed them.  -->

### Evaluation

- Now that we have train and validation sets, how do we evaluate our predictions?
- You can calculate the **error between actual ratings and predicted ratings** with metrics of your choice. 
    - Most common ones are MSE or RMSE. 

- The `error` function below calculates RMSE and `evaluate` function prints train and validation RMSE.  
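
Written out, with $\hat{y}_{ij}$ denoting the predicted rating and $\mathcal{O}$ the set of observed (non-missing) entries of the matrix we compare against (this notation is assumed here for illustration):

$$
\text{RMSE} = \sqrt{\frac{1}{|\mathcal{O}|} \sum_{(i, j) \in \mathcal{O}} (y_{ij} - \hat{y}_{ij})^2}
$$

Missing entries are left out of both the sum and the count, which is what `np.nanmean` does in the `error` function.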

In [None]:
def error(X1, X2):
    """
    Returns the root mean squared error.
    """
    return np.sqrt(np.nanmean((X1 - X2) ** 2))


def evaluate(pred_X, train_X, valid_X, model_name="Global average"):
    print("%s train RMSE: %0.2f" % (model_name, error(pred_X, train_X)))
    print("%s valid RMSE: %0.2f" % (model_name, error(pred_X, valid_X)))

### Baselines

Let's first try some simple approaches to predict missing entries. 

1. Global average baseline
2. [$k$-Nearest Neighbours imputation](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html)    
    

### Global average baseline

- Let's examine RMSE of the global average baseline. 
- In this baseline we predict everything as the global average rating

In [None]:
# Global mean of all observed ratings: np.nanmean ignores NaNs and, with no axis given, averages over the whole matrix
avg = np.nanmean(train_mat)

# predict everything as average
pred_g = np.zeros(train_mat.shape) + avg

pd.DataFrame(pred_g).head()

In [None]:
evaluate(pred_g, train_mat, valid_mat, model_name="Global average")

<br><br><br><br>

We can improve on this a little by predicting each item's average rating (the per-joke average) instead:

In [None]:
avg = np.nanmean(train_mat, axis=0)
pred_g = np.zeros(train_mat.shape) + avg
pd.DataFrame(pred_g).head()

In [None]:
evaluate(pred_g, train_mat, valid_mat, model_name="Per-joke average")

### [$k$-nearest neighbours imputation](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html)

- Can we try $k$-nearest neighbours type imputation? 
- Impute missing values using the mean value from **$k$ nearest neighbours found in the training set**. 
- Calculate distances between examples using features where neither value is missing. 


<img src="../img/utility_matrix.png" alt="" height="900" width="900"> 

KNNImputer:

> Each sample’s missing values are imputed using the mean value from n_neighbors nearest neighbors found in the training set.   
> **Two samples are close if the features that neither is missing are close.**

Note:
- This is similar to **"User-based Collaborative Filtering"**, which we will discuss shortly.
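
How can we measure distance when ratings are missing? To my understanding, `KNNImputer` defaults to a NaN-aware Euclidean distance that uses only the coordinates present in both vectors and rescales by the fraction of coordinates used. Below is a minimal sketch of that idea; the function `nan_euclidean` and the two users are hypothetical and written here only for illustration.

```python
import numpy as np


def nan_euclidean(x, y):
    """Euclidean distance over coordinates present in both x and y,
    rescaled by (total coordinates / present coordinates)."""
    present = ~np.isnan(x) & ~np.isnan(y)
    if not present.any():
        return np.nan  # no shared coordinates, so the distance is undefined
    sq_dist = np.sum((x[present] - y[present]) ** 2)
    weight = x.size / present.sum()
    return np.sqrt(weight * sq_dist)


# Two hypothetical users; they have both rated only items 0 and 3.
user_a = np.array([3.0, np.nan, np.nan, 6.0])
user_b = np.array([1.0, np.nan, 4.0, 5.0])

dist_ab = nan_euclidean(user_a, user_b)
print(dist_ab)
```

Here the squared differences on the two shared items are 4 and 1, and the weight is 4 / 2, so the distance is $\sqrt{2 \times 5} = \sqrt{10}$.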

In [None]:
from sklearn.impute import KNNImputer

imputer = KNNImputer(n_neighbors=10)
train_mat_imp = imputer.fit_transform(train_mat)

In [None]:
pd.DataFrame(train_mat_imp)

In [None]:
evaluate(train_mat_imp, train_mat, valid_mat, model_name="KNN imputer")

<br><br>

Alternative Approaches:
- You can also use the [nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html) module of scikit-learn.
- We can look at the nearest neighbours of a query item (here, a joke) instead of a user.
  - In this approach, items are the examples and users are the features of jokes, so we find nearest neighbours of column vectors (as opposed to the previous approach, where we found nearest neighbours of row vectors).
  - In other words, we apply KNN on the transpose of the original matrix.
  - This is similar to **"Item-based Collaborative Filtering"**, which we will discuss shortly.

### (Optional) $k$-nearest neighbours on a query joke
- Let's transpose the matrix.  

In [None]:
item_user_mat = train_mat_imp.T

In [None]:
jokes_df = pd.read_csv("../data/jester_items.csv")
jokes_df.head()

In [None]:
id_joke_map = dict(zip(jokes_df.jokeId, jokes_df.jokeText))

In [None]:
from sklearn.neighbors import NearestNeighbors


def get_topk_recommendations(X, query_ind=0, metric="cosine", k=5):
    # Map the row index in X back to the original joke ID
    query_joke_id = item_inverse_mapper[query_ind]
    # Ask for k + 1 neighbours because the query joke is its own nearest neighbour
    model = NearestNeighbors(n_neighbors=k + 1, metric=metric)
    model.fit(X)
    neigh_ind = model.kneighbors([X[query_ind]], return_distance=False).flatten()
    # Drop the query joke itself and keep the k closest remaining jokes
    neigh_ind = neigh_ind[neigh_ind != query_ind][:k]
    recs = [id_joke_map[item_inverse_mapper[i]] for i in neigh_ind]
    print("Query joke: ", id_joke_map[query_joke_id])

    return pd.DataFrame(data=recs, columns=["top recommendations"])


get_topk_recommendations(item_user_mat, query_ind=8, metric="cosine", k=5)

**Question**
- Instead of imputation, what would be the consequences if we replace `nan` with zeros so that we can calculate distances between vectors? 

<br><br>
**Answer**???

<!-- It's not a good idea replace ratings with 0, because 0 can be an actual rating value in our case.  -->

### What to do with predictions? 

Once we have predictions, we can 
- **sort them based on ratings** and 
- **recommend items with highest ratings** to users.
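
A minimal sketch of these two steps, assuming we have a completed prediction matrix and the original matrix of observed ratings (the function `top_n_for_user` and the numbers are hypothetical). Items the user has already rated are skipped, since recommending them again is usually not useful.

```python
import numpy as np


def top_n_for_user(pred_mat, observed_mat, user_ind, n=2):
    """Indices of the n highest-predicted items the user has not rated yet."""
    scores = pred_mat[user_ind].copy()
    scores[~np.isnan(observed_mat[user_ind])] = -np.inf  # skip rated items
    ranked = np.argsort(-scores)  # highest predicted rating first
    n_unrated = int(np.isnan(observed_mat[user_ind]).sum())
    return ranked[: min(n, n_unrated)]


# Hypothetical observed ratings (NaN = missing) and completed predictions.
observed_example = np.array([[5.0, np.nan, np.nan, 1.0, np.nan]])
pred_example = np.array([[4.8, 2.0, 7.5, 1.2, 6.1]])

recs = top_n_for_user(pred_example, observed_example, user_ind=0, n=2)
print(recs)
```

For this user, items 2 and 4 have the highest predicted ratings among the unrated items, so they are the ones recommended.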

<br><br><br><br>

## Break (5 min)

![](../img/eva-coffee.png)

## Collaborative filtering 

### Collaborative filtering 
- One of the most popular approaches for recommendation systems. 
- Approach used by the winning entry (and most of the entries) in the Netflix competition. 
- An **unsupervised** approach
    - Only uses the user-item interactions given in the ratings matrix. 
- **Intuition**
    - We may have **similar users** and **similar items** which can help us predict missing entries. 
    - Leverage social information to provide recommendations. 

### Problem 

- Given a utility matrix with many missing entries, how can we predict missing ratings?  

$$
\begin{bmatrix} 
? & ? & \checkmark  & ? & \checkmark\\
\checkmark & ? & ?  & ? & ?\\
? & \checkmark & \checkmark  & ? & \checkmark\\
? & ? & ?  & ? & ?\\
? & ? & ? & \checkmark & ?\\
? & \checkmark & \checkmark  & ? & \checkmark
\end{bmatrix}
$$

> Note: rating prediction $\neq$ Classification or regression 

### Classification or regression

- We have $X$ and targets for some rows in $X$. 
- We want to predict the target column.  

$$
\begin{bmatrix} 
\checkmark & \checkmark & \checkmark  & \checkmark & \checkmark\\
\checkmark & \checkmark & \checkmark  & \checkmark & \checkmark\\
\checkmark & \checkmark & \checkmark  & \checkmark & \checkmark\\
\checkmark & \checkmark & \checkmark  & \checkmark & ?\\
\checkmark & \checkmark & \checkmark  & \checkmark & ?\\
\checkmark & \checkmark & \checkmark  & \checkmark & ?\\
\end{bmatrix}
$$

### Rating prediction 

- Ratings data has many missing values in the utility matrix. There is no special target column. We want to predict the missing entries in the matrix. 
- Since our goal is to **predict** ratings, usually the utility matrix is referred to as $Y$ matrix. 

$$
\begin{bmatrix} 
? & ? & \checkmark  & ? & \checkmark\\
\checkmark & ? & ?  & ? & ?\\
? & \checkmark & \checkmark  & ? & \checkmark\\
? & ? & ?  & ? & ?\\
? & ? & ? & \checkmark & ?\\
? & \checkmark & \checkmark  & ? & \checkmark
\end{bmatrix}
$$


### Types?

- **User-based Collaborative Filtering:** find similarities between users
- **Item-based Collaborative Filtering:** find similarities between items

<br><br>

- We don't have sufficient background to understand how collaborative filtering works under-the-hood.
- Let's look at an example to understand this at a high level. 

In [None]:
toy_ratings = pd.read_csv("../data/toy-movie-ratings.csv")
toy_ratings

In [None]:
N_toy = len(np.unique(toy_ratings["user_id"]))
M_toy = len(np.unique(toy_ratings["movie_id"]))
print("Number of users (N)                : %d" % N_toy)
print("Number of movies (M)               : %d" % M_toy)

In [None]:
user_mapper_toy = dict(zip(np.unique(toy_ratings["user_id"]), list(range(N_toy))))
item_mapper_toy = dict(zip(np.unique(toy_ratings["movie_id"]), list(range(M_toy))))
user_inverse_mapper_toy = dict(
    zip(list(range(N_toy)), np.unique(toy_ratings["user_id"]))
)
item_inverse_mapper_toy = dict(
    zip(list(range(M_toy)), np.unique(toy_ratings["movie_id"]))
)

In [None]:
Y_toy = create_Y_from_ratings(
    toy_ratings, N_toy, M_toy, user_mapper_toy, item_mapper_toy, user_key="user_id", item_key="movie_id"
)
utility_mat_toy = pd.DataFrame(
    Y_toy, columns=item_mapper_toy.keys(), index=user_mapper_toy.keys()
)
utility_mat_toy

- In this toy example, we see clear groups of movies and users.
    - For movies: Children movies and documentaries 
    - For users: Children movie lovers and documentary lovers  
- There are some unsupervised models which identify such latent features. 
- I'll show you how to use a package which implements a popular algorithm of this kind for collaborative filtering. 

### Rating prediction using the surprise package

- We'll be using a package called [Surprise](https://surprise.readthedocs.io/en/stable/index.html). 
  - https://github.com/NicolasHug/Surprise
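- The collaborative filtering algorithm we use from this package is called `SVD`. Despite the name, it is a matrix factorization method in the style popularized by Simon Funk during the Netflix Prize, fit only on the observed ratings, rather than a classical Singular Value Decomposition. 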

```
pip install scikit-surprise
```

In [None]:
!pip install scikit-surprise

<br><br><br><br>
An even more comprehensive library, which also serves as a reference for other libraries, is Recommenders by Microsoft:
- https://github.com/microsoft/recommenders    

Try it out.

<br><br><br><br>
**(Optional)**

**What is Singular Value Decomposition (SVD)?**

- It's a factorization approach for matrices. 
- It is one of the most general-purpose linear algebra tools.
- Used for dimensionality reduction (similar to PCA)
  - It reduces the data into its **main correlations**
- It is a data-driven generalization of the Fast Fourier Transform (FFT)
  - It finds a transformation tailored to our data (as opposed to the FFT, which uses sines and cosines that may not be the best fit)
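
As a small illustration of the factorization and of keeping only the "main correlations", here is a sketch using `np.linalg.svd` on a made-up, fully observed matrix (the matrix `A` is hypothetical and built so that every row is a multiple of the same pattern).

```python
import numpy as np

# Hypothetical fully observed 4 x 3 matrix of rank 1:
# every row is a multiple of the pattern [2, 0, 1].
A = np.outer([1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 1.0])

# Factorize A = U @ diag(s) @ Vt, with singular values s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1  # number of singular values to keep
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s, 3))
print(np.allclose(A, A_k))
```

Only one singular value is meaningfully non-zero here, so a single component reconstructs the whole matrix. Note that the classical SVD needs a matrix without missing values, which a utility matrix does not give us.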


The best tutorial I've found on SVD is this four-part mini-series:
- https://www.youtube.com/watch?v=gXbThCXjZFM
- https://www.youtube.com/watch?v=nbBvuuNVfco
- https://www.youtube.com/watch?v=xy3QyyhiuY4
- https://www.youtube.com/watch?v=WmDnaoY2Ivs

-------------

Collaborative filtering with SVD belongs to the family of **"Matrix Factorization-based Collaborative Filtering"**
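
To give a rough feel for what "matrix factorization" means here, below is a minimal sketch that learns user factors `P` and item factors `Q` with stochastic gradient descent so that `P @ Q.T` matches the observed ratings. Everything in it (the toy matrix, the learning rate, the penalty, the number of epochs) is an arbitrary choice for illustration. It is not Surprise's implementation, which, among other things, also learns bias terms.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 4 users x 3 items utility matrix; NaN marks a missing rating.
Y_small = np.array(
    [
        [5.0, 4.0, np.nan],
        [4.0, np.nan, 1.0],
        [np.nan, 1.0, 5.0],
        [1.0, 2.0, 4.0],
    ]
)
n_users, n_items = Y_small.shape
k = 2  # number of latent factors
P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
observed = [
    (i, j) for i in range(n_users) for j in range(n_items) if not np.isnan(Y_small[i, j])
]


def observed_rmse(P, Q):
    """RMSE between P @ Q.T and Y_small on the observed entries only."""
    errs = [Y_small[i, j] - P[i] @ Q[j] for i, j in observed]
    return float(np.sqrt(np.mean(np.square(errs))))


rmse_before = observed_rmse(P, Q)
lr, reg = 0.01, 0.02  # learning rate and L2 penalty (arbitrary choices)
for epoch in range(1000):
    for i, j in observed:
        err = Y_small[i, j] - P[i] @ Q[j]
        p_old = P[i].copy()
        P[i] += lr * (err * Q[j] - reg * P[i])
        Q[j] += lr * (err * p_old - reg * Q[j])
rmse_after = observed_rmse(P, Q)

Y_hat = P @ Q.T  # completed matrix: a prediction for every user-item pair
print("RMSE before: %0.2f, after: %0.2f" % (rmse_before, rmse_after))
```

Only the observed entries drive the updates, yet `Y_hat` has a value in every cell, including the ones that were missing.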

Let's try it out on our Jester dataset utility matrix.  

In [None]:
import surprise
from surprise import SVD, Dataset, Reader, accuracy

In [None]:
reader = Reader(rating_scale=(-10, 10))  # Jester ratings range from -10 to +10; Reader's default scale is (1, 5)
data = Dataset.load_from_df(ratings, reader)  # Load the data
trainset, validset = surprise.model_selection.train_test_split(
    data, test_size=0.2, random_state=42
)

In [None]:
k = 5
svd_alg = SVD(n_factors=k, random_state=42)
svd_alg.fit(trainset)

svd_preds = svd_alg.test(validset)
accuracy.rmse(svd_preds, verbose=True)

- Improvement but not a big improvement over the global baseline (RMSE=5.77). 

Make predictions for a single user and a single item:

In [None]:
svd_alg.predict(uid=1825, iid=49)

<br><br><br>
Surprise comes with many other collaborative filtering and matrix factorization algorithms. 

For example,  `KNNBasic` (https://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNBasic):

In [None]:
from surprise import KNNBasic

k = 10
algo = KNNBasic(k=k)  # pass k explicitly; otherwise the library default is used
algo.fit(trainset)

knn_preds = algo.test(validset)
accuracy.rmse(knn_preds, verbose=True)

### Cross-validation for recommender systems

- We can also carry out cross-validation and grid search with this package. 
- Let's look at an example of cross-validation. 

In [None]:
from surprise.model_selection import cross_validate

pd.DataFrame(cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True))

----------
**[Study on your own]**

#### Other built-in datasets shipped with the Surprise library

- The Jester dataset is also available as one of the built-in datasets in this package. You can load it and run cross-validation as follows. 

In [None]:
data = Dataset.load_builtin("jester")

pd.DataFrame(cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True))

**[Study on your own]**

----------

<br><br><br><br>

## Content-based filtering

### What is content-based filtering? 

Going back here: https://help.netflix.com/en/node/100639

- Supervised machine learning approach
- In collaborative filtering we assumed that we only have ratings data. 
- Usually there is some information on items and users available. 
- Examples
    - Netflix can describe movies as action, romance, comedy, documentaries. 
    - Amazon could describe books according to topics: math, languages, history. 
    - Tinder could describe people according to age, location, employment.
- Can we use this information to predict ratings in the utility matrix?   
    - Yes!

### Toy example: Movie recommendation

- Let's consider a movie recommendation problem with the following toy data.

### Ratings data

In [None]:
toy_ratings = pd.read_csv("../data/toy_ratings.csv")
toy_ratings

In [None]:
N = len(np.unique(toy_ratings["user_id"]))
M = len(np.unique(toy_ratings["movie_id"]))
print("Number of users (N)                : %d" % N)
print("Number of movies (M)               : %d" % M)

In [None]:
user_key = "user_id"
item_key = "movie_id"

In [None]:
user_mapper = dict(zip(np.unique(toy_ratings[user_key]), list(range(N))))
item_mapper = dict(zip(np.unique(toy_ratings[item_key]), list(range(M))))
user_inverse_mapper = dict(zip(list(range(N)), np.unique(toy_ratings[user_key])))
item_inverse_mapper = dict(zip(list(range(M)), np.unique(toy_ratings[item_key])))

### Utility matrix

Let's create a dense utility matrix for our toy dataset. 

In [None]:
def create_Y_from_ratings(data, N, M):
    Y = np.zeros((N, M))
    Y.fill(np.nan)
    for index, val in data.iterrows():
        n = user_mapper[val[user_key]]
        m = item_mapper[val[item_key]]
        Y[n, m] = val["rating"]

    return Y

### Utility matrix

In [None]:
Y = create_Y_from_ratings(toy_ratings, N, M)
utility_mat = pd.DataFrame(Y, columns=item_mapper.keys(), index=user_mapper.keys())
utility_mat

In [None]:
avg = np.nanmean(Y)
avg

**Goal**: Predict missing entries in the utility matrix. 

In [None]:
import surprise
from surprise import SVD, Dataset, Reader, accuracy

Let's predict ratings with collaborative filtering.

In [None]:
reader = Reader()
data = Dataset.load_from_df(toy_ratings, reader)  # Load the data

trainset, validset = surprise.model_selection.train_test_split(
    data, test_size=0.01, random_state=42
)  # Split the data

In [None]:
k = 2
algo = SVD(n_factors=k, random_state=42)
algo.fit(trainset)
preds = algo.test(trainset.build_testset())

In [None]:
preds

### Movie features

- Suppose we also have movie features. 

In [None]:
movie_feats_df = pd.read_csv("../data/toy_movie_feats.csv", index_col=0)
movie_feats_df

In [None]:
Z = movie_feats_df.to_numpy()
Z.shape

- How can we use these features to predict missing ratings? 

### Overall idea

- Using the ratings data and movie features, we'll build **profiles for different users**. 
- Let's consider an example user **Pat**. 

### Pat's ratings

- We don't know anything about Pat but we know her ratings of movies. 

In [None]:
utility_mat.loc["Pat"]

- We also know about movies and their features. 
- If Pat gave a high rating to _Lion King_, it means that she liked the features of the movie. 

In [None]:
movie_feats_df.loc["Lion King"]

### Supervised approach to rating prediction 

- We treat the rating prediction problem as a set of regression problems. 
- Given movie information, **we create a user profile for each user**.
- We build a regression model for each user and learn regression weights for each user. 

- We build a **profile** for users based on 
    - the movies they have watched
    - their rating for the movies
    - the features of the movies


### Supervised approach to rating prediction 

For each user $i$ create a user profile as follows. 

- Consider all movies rated by $i$ and create `X` and `y` for the user: 
    - Each row in `X` contains the **movie features** of movie $j$ **rated by $i$**. 
    - Each value in `y` is the corresponding rating given to the movie $j$ **by user $i$**. 
- Fit a regression model using `X` and `y`. 
- Apply the model to predict ratings for new items! 
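
In symbols, with $z_j$ denoting the feature vector of movie $j$ and $w_i$, $b_i$ the weights and intercept learned for user $i$ (notation assumed here for illustration), a linear model predicts

$$
\hat{y}_{ij} = w_i^T z_j + b_i
$$

If the per-user model is ridge regression with regularization strength $\alpha$, the parameters for user $i$ are fit only on the set $\mathcal{R}_i$ of movies that user $i$ has rated:

$$
\min_{w_i, b_i} \sum_{j \in \mathcal{R}_i} (w_i^T z_j + b_i - y_{ij})^2 + \alpha \lVert w_i \rVert_2^2
$$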

### Let's build user profiles 

- Build `X` and `y` for all users. 

In [None]:
from collections import defaultdict


def get_lr_data_per_user(ratings_df, d):
    lr_y = defaultdict(list)
    lr_X = defaultdict(list)
    lr_items = defaultdict(list)

    for index, val in ratings_df.iterrows():
        n = user_mapper[val[user_key]]
        m = item_mapper[val[item_key]]
        lr_X[n].append(Z[m])
        lr_y[n].append(val["rating"])
        lr_items[n].append(m)

    for n in lr_X:
        lr_X[n] = np.array(lr_X[n])
        lr_y[n] = np.array(lr_y[n])

    return lr_X, lr_y, lr_items

In [None]:
d = movie_feats_df.shape[1]
X_train_usr, y_train_usr, rated_items = get_lr_data_per_user(toy_ratings, d)

- What's the shape of each `X` and `y`?

In [None]:
X_train_usr

In [None]:
y_train_usr

### Examine user profiles 

- Let's examine some user profiles. 

In [None]:
def get_user_profile(user_name):
    X = X_train_usr[user_mapper[user_name]]
    y = y_train_usr[user_mapper[user_name]]
    items = rated_items[user_mapper[user_name]]
    movie_names = [item_inverse_mapper[item] for item in items]
    print("Profile for user: ", user_name)
    profile_df = pd.DataFrame(X, columns=movie_feats_df.columns, index=movie_names)
    profile_df["ratings"] = y
    return profile_df

### Pat's profile

In [None]:
get_user_profile("Pat")

- Pat seems to like Children's movies and movies with Comedy. 
- Seems like she's not so much into romantic movies.  


### Eva's profile

In [None]:
get_user_profile("Eva")

- Eva hasn't rated many movies. There are not many rows. 
- Eva seems to like documentaries and action movies. 
- Seems like she's not so much into romantic movies.  

### Regression models for users

We can train **a regression model for each user's ratings**.

In [None]:
from sklearn.linear_model import Ridge


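def train_for_usr(user_name, model=None):
    # Create a fresh model on each call. A default of `model=Ridge()` is built
    # once and shared, so every user would get (and refit) the same object.
    if model is None:
        model = Ridge()
    X = X_train_usr[user_mapper[user_name]]
    y = y_train_usr[user_mapper[user_name]]
    model.fit(X, y)
    return model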


def predict_for_usr(model, movie_names):
    feat_vecs = movie_feats_df.loc[movie_names].values
    preds = model.predict(feat_vecs)
    return preds

### Regression model for Pat 

- What are the regression weights learned for Pat? 

In [None]:
user_name = "Pat"
pat_model = train_for_usr(user_name)
col = "Coefficients for %s" % user_name
pd.DataFrame(pat_model.coef_, index=movie_feats_df.columns, columns=[col])

### Predictions for Pat
- How would Pat rate some movies she hasn't seen? 

In [None]:
movies_to_pred = ["Roman Holidays", "Malcolm x"]
pred_df = movie_feats_df.loc[movies_to_pred].copy()  # copy so that adding columns later does not touch movie_feats_df
pred_df

In [None]:
user_name = "Pat"
preds = predict_for_usr(pat_model, movies_to_pred)
pred_df[user_name + "'s predicted ratings"] = preds
pred_df

### Regression model for Eva 

- What are the regression weights learned for Eva? 

In [None]:
user_name = "Eva"
eva_model = train_for_usr(user_name)
col = "Coefficients for %s" % user_name
pd.DataFrame(eva_model.coef_, index=movie_feats_df.columns, columns=[col])

### Predictions for Eva

- What are the predicted ratings for Eva for a list of movies?

In [None]:
user_name = "Eva"
preds = predict_for_usr(eva_model, movies_to_pred)
pred_df[user_name + "'s predicted ratings"] = preds
pred_df

### Completing the utility matrix with content-based filtering

Here is the original utility matrix.  

In [None]:
utility_mat

- Using predictions per user, we can fill in missing entries in the utility matrix. 

In [None]:
from sklearn.linear_model import Ridge

models = dict()
pred_lin_reg = np.zeros((N, M))

# features of each movie
Z = movie_feats_df.to_numpy()

# iterate over all users
for n in range(N):
    
    # train a model for each user
    models[n] = Ridge()
    models[n].fit(X_train_usr[n], y_train_usr[n])
    
    # predict 
    pred_lin_reg[n] = models[n].predict(Z)

In [None]:
pd.DataFrame(pred_lin_reg, columns=item_mapper.keys(), index=user_mapper.keys())

### More comments on content-based filtering

- The feature matrix for movies can contain different types of features.
    - Example: Plot of the movie (text features), actors (categorical features), year of the movie, budget and revenue of the movie (numerical features). 
    - You'll apply our usual preprocessing techniques to these features. 
- If you have enough data, you could also carry out hyperparameter tuning with cross-validation for each model.
- Finally, although we have been talking about linear models above, **you can use any regression model of your choice**. 

### Advantages of content-based filtering 

- We don't need many users to provide ratings for an item. 
- **Each user is modeled separately, so you might be able to capture personalization**. 
- Since you can obtain the features of the items, you can immediately recommend new items. 
    - This would not have been possible with collaborative filtering (**what should collaborative filtering do in the cold-start case, when a new item has just been added?**)
- Recommendations are interpretable.
    - You can explain to the user why you are recommending an item because you have learned weights. 
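
As a sketch of what such an explanation could look like for a linear model, we can split a predicted rating into per-feature contributions (weight times feature value). The weights, intercept, and movie features below are made up for illustration and are not the ones learned earlier in this lecture.

```python
import numpy as np

# Hypothetical learned weights for one user and binary features of one movie.
feature_names = ["Action", "Comedy", "Romance"]
weights = np.array([0.2, 1.5, -0.8])
intercept = 3.0
movie_features = np.array([0, 1, 1])

# Contribution of each feature to the predicted rating
contributions = weights * movie_features
predicted_rating = intercept + contributions.sum()

for name, c in zip(feature_names, contributions):
    print(f"{name}: {c:+.2f}")
print(f"Predicted rating: {predicted_rating:.2f}")
```

Here the comedy feature pushes the prediction up and the romance feature pulls it down, which is the kind of statement we can show to a user.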
    

### Disadvantages of content-based filtering 

- Feature acquisition and feature engineering
    - What features should we use to explain the difference in ratings? 
    - Obtaining those features for each item might be very expensive. 
- Less diversity: the system will rarely recommend an item outside the user's profile. 
- **Cold start: When a new user shows up, you don't have any information about them.**

### Hybrid filtering

- Combining advantages of collaborative filtering and content-based filtering
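
There are many ways to combine the two approaches. The sketch below shows one simple possibility, a weighted blend that leans on the content-based prediction for items with few ratings and on the collaborative filtering prediction for items with many. The function name `hybrid_prediction`, the weighting formula, and the constant `k` are assumptions made for illustration, not a method from this lecture.

```python
def hybrid_prediction(cf_pred, content_pred, n_item_ratings, k=10):
    """Blend two predicted ratings for the same (user, item) pair.

    The weight on collaborative filtering grows with the number of ratings
    the item has received; k controls how quickly (hypothetical choice).
    """
    weight_cf = n_item_ratings / (n_item_ratings + k)
    return weight_cf * cf_pred + (1 - weight_cf) * content_pred


# A brand-new item has no ratings, so only the content-based prediction is used.
new_item = hybrid_prediction(cf_pred=3.0, content_pred=4.0, n_item_ratings=0)

# A heavily rated item relies almost entirely on collaborative filtering.
popular_item = hybrid_prediction(cf_pred=3.0, content_pred=4.0, n_item_ratings=990)
print(new_item, popular_item)
```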

<br><br><br><br>

## Final comments and summary <a name="1"></a>

### Formulating the problem of recommender systems 

- We are given ratings data. 
- We use this data to create a **utility matrix**, which encodes interactions between users and items. 
- The utility matrix has many missing entries. 
- We defined the recommendation systems problem as a **matrix completion problem**. 
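
The steps above can be sketched on toy data. The ratings, the helper dictionaries `user_idx` and `item_idx`, and the global-average fill are illustrative assumptions; the global average is just one simple baseline for completing the matrix.

```python
import numpy as np

# Hypothetical ratings data: (user, item, rating) triples.
ratings = [
    ("Ann", "Lion King", 5),
    ("Ann", "Jerry Maguire", 3),
    ("Bob", "Lion King", 4),
    ("Eva", "Roman Holiday", 2),
]

# Map each user and item to a row / column index.
users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})
user_idx = {u: n for n, u in enumerate(users)}
item_idx = {i: m for m, i in enumerate(items)}

# Utility matrix: one row per user, one column per item, NaN where unrated.
Y = np.full((len(users), len(items)), np.nan)
for u, i, r in ratings:
    Y[user_idx[u], item_idx[i]] = r

# Baseline matrix completion: fill every missing entry with the global average.
global_avg = np.nanmean(Y)
Y_completed = np.where(np.isnan(Y), global_avg, Y)
print(Y_completed)
```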

### What did we cover? 

- There is a big world of recommendation systems out there. We talked about some basic traditional approaches to recommender systems. 
    - collaborative filtering 
    - content-based filtering 

If you want to learn about more advanced approaches to recommender systems, watch this 4-hour summer school tutorial by Xavier Amatriain, Research/Engineering Director @ Netflix.  

- [Part1](https://www.youtube.com/watch?v=bLhq63ygoU8)
- [Part2](https://www.youtube.com/watch?v=mRToFXlNBpQ)


### Evaluation 

- We split the data as we do in supervised learning. 
- We evaluate recommendation systems using traditional regression metrics such as MSE or RMSE. 
- But **real evaluation of a recommender system can be very tricky because there is no ground truth**. 
- We have been using RMSE due to the lack of a better measure.  
- What we actually want to measure is the **interest that our user has in the recommended items**. 
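
For reference, here is a minimal sketch of the RMSE computation on held-out ratings. The values in `valid_ratings` and `valid_preds` are made up; in practice they would be the validation ratings and the model's predictions for the same (user, item) pairs.

```python
import math

# Hypothetical held-out ratings and the model's predictions for the same pairs.
valid_ratings = [4.0, 2.0, 5.0, 3.0]
valid_preds = [3.5, 2.5, 4.0, 3.0]


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


score = rmse(valid_ratings, valid_preds)
print(round(score, 3))
```

Only the (user, item) pairs that actually have a rating contribute to the score; missing entries of the utility matrix are ignored.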

### Beyond error rate in recommendation systems 

- If a system gives the best RMSE, it doesn't necessarily mean that it's going to give the best recommendations. 
- In recommendation systems we do not have ground truth.
- Just training your model and evaluating it offline is not ideal. 
- Other aspects such as simplicity and **interpretability** are as important as (if not more important than) the best validation error. 
- The winning system of the Netflix Challenge was never adopted.
    - **Big mess of ensembles was not really maintainable**  

### Other issues important in recommender systems

### Are these good recommendations? 

You are looking for water shoes and at the moment you are looking at [VIFUUR Water Sports Shoes](https://www.amazon.ca/VIFUUR-Barefoot-Quick-Dry-Blue-38-39/dp/B0753DL15Y), are these good recommendations? 

![](../img/reco-diversity.png)


Now suppose you've recently bought VIFUUR Water Sports Shoes and rated them highly. Are these good recommendations now? 
- Not really. Even though you really liked them, you don't need them anymore. You want some non-Water Sports Shoes recommendations.
- **Diversity** is about how different the recommendations are. 
    - Another example: Even if you really really like Star Wars, you might want non-Star-Wars suggestions.    
- **We need a balance between Exploration and Exploitation**

<br><br><br><br><br>
**We need a balance between Exploration and Exploitation**  
What are they?
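
Roughly speaking, exploitation means recommending the items the model already predicts the user will like most, while exploration means occasionally showing other items to learn about the user's wider interests. One simple and widely known way to balance the two is an epsilon-greedy rule, sketched below. The function name, the item names, and the predicted scores are hypothetical, and this is only one of many possible strategies.

```python
import random


def epsilon_greedy_recommend(predicted_scores, k, epsilon, rng):
    """Pick k distinct items.

    For each slot, explore (pick a random remaining item) with probability
    epsilon; otherwise exploit (pick the best-scored remaining item).
    """
    remaining = sorted(predicted_scores, key=predicted_scores.get, reverse=True)
    chosen = []
    while remaining and len(chosen) < k:
        if rng.random() < epsilon:
            pick = rng.choice(remaining)  # explore
        else:
            pick = remaining[0]  # exploit
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


# Hypothetical predicted ratings for one user.
scores = {
    "water shoes A": 4.8,
    "water shoes B": 4.7,
    "sandals": 3.9,
    "hiking boots": 3.5,
    "rain jacket": 3.1,
}

rng = random.Random(0)
only_exploit = epsilon_greedy_recommend(scores, k=3, epsilon=0.0, rng=rng)
mixed = epsilon_greedy_recommend(scores, k=3, epsilon=0.3, rng=rng)
print(only_exploit)
print(mixed)
```

With `epsilon=0.0` the list is simply the top-3 by predicted score; larger values of `epsilon` trade some predicted relevance for variety.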

<br><br><br><br><br>

### Are these good recommendations? 

![](../img/freshness.png)

- Some of these books don't have many ratings but it might be a good idea to recommend "fresh" things. 
- **Freshness**: people tend to get more excited about new/surprising things.    

- But again you need a balance here. What would happen if you keep surprising the user all the time? 
- There might be **trust** issues. 
- Another aspect of trust is **explaining** your recommendation, i.e., telling the user why you made a recommendation. This gives the user an opportunity to understand why your recommendations could be interesting to them.   
  - **We are recommending you THIS item because you liked ANOTHER item**
    


**Persistence**: how long should recommendations last?
- If you keep not clicking on a recommendation, should it remain a recommendation?

**Social recommendation**: what did your friends watch?
- Many recommenders are now connected to social networks.
- "Log in using your Facebook account".
- Often, people like similar movies to their friends.
- If we get a new user, then recommendations are based on their friends' preferences. 

### Types of data 

- Explicit data: ratings, thumbs up, etc. 
- Implicit data: collected from the users' behaviour (e.g., mouse clicks, purchases, time spent doing something)
- Trust implicit data that costs something, like time or even money. 
    - This makes the data harder to fake. 

### Some thoughts on recommendation systems  
- Be mindful of the consequences of recommendation systems. 
    - Recommendation systems can have terrible consequences. 
- **All for-profit companies (small tech / big tech)**, which extensively use recommendation systems, are **profit-driven**
  - They make decisions to continually increase the profit margins of the company
  - All actions of a company have roots in profit
  - Product design in a for-profit company, including its recommendation systems, is no exception; it is intended to maximize user attention/engagement
- There are tons of news and research articles on serious consequences of recommendation systems.  

### Some thoughts on recommendation systems  

- Some weird stories which got media attention.   
    - [How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did](https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/?sh=3171af136668)
- More serious consequences are in political contexts. 
    - [Facebook Admits It Was Used to Incite Violence in Myanmar](https://www.nytimes.com/2018/11/06/technology/myanmar-facebook.html)
    - [YouTube Extremism and the Long Tail](https://www.theatlantic.com/politics/archive/2018/03/youtube-extremism-and-the-long-tail/555350/)    

### My advice

- Ask **yourself (and your employer, where possible)** hard and uncomfortable questions before implementing and deploying such systems.  



### Resources 

- [Collaborative filtering for recommendation systems in Python, by N. Hug](https://www.youtube.com/watch?v=z0dx-YckFko)
- [An interesting talk: The paradox of choice](https://www.ted.com/talks/barry_schwartz_the_paradox_of_choice)
- [How Netflix’s Recommendations System Works](https://help.netflix.com/en/node/100639)
- [Hands on Recommendation Systems with Python](https://learning.oreilly.com/library/view/hands-on-recommendation-systems/9781788993753/)

<br><br><br><br>