-----------
# Outline of Notebook
- ### Recommender Systems
- ### Collaborative Filtering
- ### Recommender Systems Implementation Detail
- ### Content Based Filtering Algorithm
- ### Recommending from a Large Catalogue
-----------

# Recommender Systems

## Collaborative Filtering

Let's say we have users who have rated some movies, and we also have some features for each movie:
![](2022-07-29-15-27-58.png)
- Here, $x_1$ = how much romance is in the movie and $x_2$ = how much action is in the movie
- Ex. *Love at Last* is a very romantic movie as per the table

$n_u$ = # of users = 4

$n_m$ = # of movies / # of items = 5

$n$ = # of features = 2

$x^{(1)} = [0.9, 0]$

For user 1: Predict rating for movie $i$ as: $w^{(1)} \cdot x^{(i)} + b^{(1)}$ (like linear regression)
- Ex. Let's say $w^{(1)} = [5, 0]$ and $b^{(1)} = 0$ and $x^{(3)} = [0.99, 0]$
- Prediction rating of Alice for third movie: $w^{(1)} \cdot x^{(3)} + b^{(1)} = 4.95$
- This rating seems plausible because Alice has given good ratings to 2 highly romantic movies

<u>For user $j$:</u> Predict rating for movie $i$ as: $w^{(j)} \cdot x^{(i)} + b^{(j)}$
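
As a quick check of the arithmetic, the prediction for Alice can be computed directly. This is a minimal sketch using the example values above; the function and variable names are illustrative and not part of the original notes.

```python
import numpy as np

# Values from the worked example; names are hypothetical.
w_1 = np.array([5.0, 0.0])   # w^(1): Alice's weights for [romance, action]
b_1 = 0.0                    # b^(1): Alice's bias
x_3 = np.array([0.99, 0.0])  # x^(3): features of the third movie

def predict_rating(w, x, b):
    """Predicted rating of one user for one movie: w . x + b."""
    return float(np.dot(w, x) + b)

prediction = predict_rating(w_1, x_3, b_1)
print(prediction)  # about 4.95, matching the example
```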

### Learning the Parameters

Notation:
- $r(i, j) = 1$ if user $j$ has rated movie $i$ (0 otherwise)
- $y^{(i, j)}$ = rating given by user $j$ on movie $i$ (if defined)
- $w^{(j)}, b^{(j)}$ = parameters for user $j$
- $x^{(i)}$ = feature vector for movie $i$
- $m^{(j)}$ = # of movies rated by user $j$

For user $j$ and movie $i$, predict rating: $w^{(j)} \cdot x^{(i)} + b^{(j)}$

To learn $w^{(j)}$ and $b^{(j)}$ (very much like linear regression): 
$$J(w^{(j)}, b^{(j)}) = \frac{1}{2}\sum_{i:r(i, j) = 1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{k = 1}^{n} (w_k^{(j)})^2$$
- The subscript $i:r(i, j) = 1$ under the sum means that we only sum over the movies $i$ that user $j$ has actually rated

To learn parameters $w^{(1)}, b^{(1)}, w^{(2)}, b^{(2)}, \ldots, w^{(n_u)}, b^{(n_u)}$ for all users: 
$$J(w^{(1)}, b^{(1)}, \ldots, w^{(n_u)}, b^{(n_u)}) = \sum_{j = 1}^{n_u} J(w^{(j)}, b^{(j)})$$
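
The cost over all users can be sketched in vectorized NumPy. The matrix layout here is an assumption made for illustration: `Y` is an $n_m \times n_u$ ratings matrix with 0 as a placeholder for unrated entries, `R` is the matching 0/1 mask $r(i, j)$, `X` stacks the movie features row by row, and `W` stacks the user weights row by row.

```python
import numpy as np

def user_cost(X, W, b, Y, R, lam):
    """Regularized squared-error cost summed over all users (features X fixed).

    X: (n_m, n) movie features     W: (n_u, n) user weights
    b: (n_u,)  user biases         Y: (n_m, n_u) ratings, 0 where unrated
    R: (n_m, n_u) mask, 1 if user j rated movie i
    """
    pred = X @ W.T + b            # entry (i, j) is w^(j) . x^(i) + b^(j)
    err = (pred - Y) * R          # keep only the rated (i, j) pairs
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * np.sum(W ** 2)

# Tiny made-up example: 2 movies, 2 users, 2 features.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[5.0, 0.0], [0.0, 5.0]])
b = np.zeros(2)
Y = np.array([[5.0, 0.0], [0.0, 4.0]])
R = np.ones((2, 2))

cost_no_reg = user_cost(X, W, b, Y, R, lam=0.0)
cost_reg = user_cost(X, W, b, Y, R, lam=1.0)
```

Multiplying by `R` zeroes out the unrated pairs, which plays the role of the $i:r(i, j) = 1$ subscript in the formula.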

### What if you don't have features?

![](2022-07-29-16-08-56.png)

Let's assume that we already have the parameters we need:
- $w^{(1)} = [5, 0]$, $b^{(1)} = 0$
- $w^{(2)} = [5, 0]$, $b^{(2)} = 0$
- $w^{(3)} = [0, 5]$, $b^{(3)} = 0$
- $w^{(4)} = [0, 5]$, $b^{(4)} = 0$

Using $w^{(j)} \cdot x^{(i)} + b^{(j)}$:
- $w^{(1)} \cdot x^{(1)} \approx 5$
- $w^{(2)} \cdot x^{(1)} \approx 5$
- $w^{(3)} \cdot x^{(1)} \approx 0$
- $w^{(4)} \cdot x^{(1)} \approx 0$
- Therefore, $x^{(1)} = [1, 0]$ is a reasonable choice, since it reproduces all four of these ratings
- Repeating this for every movie lets us figure out the features of each movie
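
The same reasoning can be checked numerically. The sketch below assumes the four target ratings for movie 1 are exactly 5, 5, 0, 0 (the approximate values listed above) and solves for $x^{(1)}$ with ordinary least squares. Least squares is swapped in here as a shortcut for this tiny example; it is not the regularized gradient-descent procedure used elsewhere in these notes.

```python
import numpy as np

# User parameters from the example above, stacked one user per row.
W = np.array([[5.0, 0.0],
              [5.0, 0.0],
              [0.0, 5.0],
              [0.0, 5.0]])
b = np.zeros(4)

# Assumed ratings of movie 1 by users 1..4.
y_1 = np.array([5.0, 5.0, 0.0, 0.0])

# Solve W x = y - b in the least-squares sense to recover x^(1).
x_1, *_ = np.linalg.lstsq(W, y_1 - b, rcond=None)
print(x_1)
```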

### Learning Features

Given $w^{(1)}, b^{(1)}, w^{(2)}, b^{(2)}, \ldots, w^{(n_u)}, b^{(n_u)}$, to learn $x^{(i)}$:
$$J(x^{(i)}) = \frac{1}{2}\sum_{j:r(i, j) = 1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{k = 1}^{n} (x_k^{(i)})^2$$

To learn $x^{(1)}, x^{(2)}, \ldots, x^{(n_m)}$:
$$J(x^{(1)}, \ldots, x^{(n_m)}) = \sum_{i = 1}^{n_m} (J(x^{(i)}))$$

### Collaborative Filtering Algorithm (Merging the Learning of the Features and the Parameters)

<u>Final Cost Function</u> $$J(w, b, x) = \frac{1}{2}\sum_{(i, j):r(i, j) = 1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{j = 1}^{n_u}\sum_{k = 1}^{n} (w_k^{(j)})^2 + \frac{\lambda}{2}\sum_{i = 1}^{n_m}\sum_{k = 1}^{n} (x_k^{(i)})^2$$

<u>Gradient Descent</u>

repeat {

$\quad w^{(j)}_k = w^{(j)}_k - \alpha\frac{\partial}{\partial w^{(j)}_k}J(w, b, x)$

$\quad b^{(j)} = b^{(j)} - \alpha\frac{\partial}{\partial b^{(j)}}J(w, b, x)$

$\quad x^{(i)}_k = x^{(i)}_k - \alpha\frac{\partial}{\partial x^{(i)}_k}J(w, b, x)$

}
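
A minimal NumPy sketch of this loop is shown below, with the partial derivatives of the squared-error cost written out by hand. The matrix layout (`Y`, `R`, `X`, `W`), the hyperparameter values, and the toy ratings are all assumptions for illustration; a real implementation would more likely rely on automatic differentiation.

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative filtering cost J(w, b, x) over rated (i, j) pairs."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(W ** 2) + np.sum(X ** 2))

def cofi_gradient_descent(Y, R, n_features=2, alpha=0.005, lam=0.1,
                          n_iters=2000, seed=0):
    """Learn X (movie features), W and b (user parameters) simultaneously."""
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    X = rng.normal(scale=0.1, size=(n_m, n_features))
    W = rng.normal(scale=0.1, size=(n_u, n_features))
    b = np.zeros(n_u)
    history = []
    for _ in range(n_iters):
        err = (X @ W.T + b - Y) * R      # prediction error on rated pairs
        grad_W = err.T @ X + lam * W     # dJ/dw_k^(j)
        grad_b = err.sum(axis=0)         # dJ/db^(j)
        grad_X = err @ W + lam * X       # dJ/dx_k^(i)
        # All gradients are computed before any parameter is updated.
        W -= alpha * grad_W
        b -= alpha * grad_b
        X -= alpha * grad_X
        history.append(cofi_cost(X, W, b, Y, R, lam))
    return X, W, b, history

# Made-up ratings: 5 movies x 4 users, 0 is a placeholder where R == 0.
Y = np.array([[5, 5, 0, 0],
              [5, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)

X, W, b, history = cofi_gradient_descent(Y, R)
print(history[0], history[-1])  # cost after the first and the last update
```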

### Binary Labels (Favs, Likes, Clicks) -- Collaborative Filtering

Applications:
- Did user $j$ purchase an item after being shown?
- Did user $j$ fav/like an item?
- Did user $j$ spend at least 30 sec with an item?
- Did user $j$ click on an item?

Meaning of ratings:
- 1 - engaged after being shown item
- 0 - did not engage after being shown item
- ? - item not yet shown

![](2022-07-29-16-44-28.png)

Previously:
 
$\quad$ Predict $y^{(i, j)}$ as $w^{(j)} \cdot x^{(i)} + b^{(j)}$ (like linear regression)

For binary labels:

$\quad$ Predict that the probability of $y^{(i, j)} = 1$ is given by: 

$\quad\quad g(w^{(j)} \cdot x^{(i)} + b^{(j)})$ where $g(z) = \frac{1}{1 + e^{-z}}$ (like logistic regression)

Previous Cost Function: $$J(w, b, x) = \frac{1}{2}\sum_{(i, j):r(i, j) = 1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{j = 1}^{n_u}\sum_{k = 1}^{n} (w_k^{(j)})^2 + \frac{\lambda}{2}\sum_{i = 1}^{n_m}\sum_{k = 1}^{n} (x_k^{(i)})^2$$

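Cost for binary labels $y^{(i, j)}$: $$J(w, b, x) = \sum_{(i, j):r(i, j) = 1} L(f_{w, b, x}(x), y^{(i, j)})$$

where $f_{w, b, x}(x) = g(w^{(j)} \cdot x^{(i)} + b^{(j)})$ and $L$ is the logistic regression loss function:
$$L(f_{w, b, x}(x), y^{(i, j)}) = -y^{(i, j)}\log(f_{w, b, x}(x)) - (1 - y^{(i, j)})\log(1 - f_{w, b, x}(x))$$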
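
The binary-label cost can be sketched as follows, using the standard logistic (binary cross-entropy) loss. The matrix layout is the same assumed one as for ratings (`Y` holds 0/1 labels, `R` masks which items were shown), and the clipping constant is an implementation detail added here to avoid `log(0)`.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cofi_cost(X, W, b, Y, R):
    """Logistic loss summed over the (i, j) pairs where the item was shown."""
    p = sigmoid(X @ W.T + b)                 # predicted P(y^(i,j) = 1)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)       # numerical safety for the logs
    loss = -(Y * np.log(p) + (1.0 - Y) * np.log(1.0 - p))
    return np.sum(loss * R)

# With all-zero parameters every prediction is 0.5, so each shown pair
# contributes log(2) to the cost regardless of its label.
X0 = np.zeros((2, 2))
W0 = np.zeros((3, 2))
b0 = np.zeros(3)
Y0 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
R0 = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
cost0 = binary_cofi_cost(X0, W0, b0, Y0, R0)
```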

## Recommender Systems Implementation Detail

<u>Mean Normalization</u>
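- Take the average rating $\mu_i$ of each movie (row), using only the users who rated it
- Subtract $\mu_i$ from every rating in row $i$, for each row
- Then, train the algorithm on the normalized ratings
- To predict, add the corresponding mean back: $w^{(j)} \cdot x^{(i)} + b^{(j)} + \mu_i$. This returns the prediction to the original rating scale
- This makes the algorithm run a little faster and gives more reasonable predictions for users with very few ratings: a new user with no ratings is predicted roughly each movie's mean rating instead of a rating near 0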
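
The normalization step can be sketched as below. The mask-based layout (`Y` with placeholder zeros, `R` marking rated entries) and the toy numbers are assumptions for illustration.

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only."""
    counts = R.sum(axis=1)
    # Guard against movies with no ratings to avoid dividing by zero.
    mu = (Y * R).sum(axis=1) / np.maximum(counts, 1)
    Y_norm = (Y - mu[:, None]) * R   # unrated entries stay at 0
    return Y_norm, mu

# Made-up ratings: 2 movies x 4 users.
Y = np.array([[5.0, 5.0, 0.0, 0.0],
              [5.0, 0.0, 0.0, 4.0]])
R = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 1.0]])

Y_norm, mu = mean_normalize(Y, R)

# For a brand-new user with w = 0 and b = 0, the prediction w . x + b + mu
# reduces to the movie's mean rating.
new_user_pred = 0.0 + mu
```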

<u>How does Collaborative Filtering find related items?</u>
- Let's say you have an item $i$ with learned features $x^{(i)}$
- To find other items related to it, find item $k$ with $x^{(k)}$ similar to $x^{(i)}$
- i.e. with the smallest squared distance: $\sum_{l = 1}^{n} (x_l^{(k)} - x_l^{(i)})^2 = ||x^{(k)} - x^{(i)}||^2$
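
Finding related items then amounts to a nearest-neighbour search over the learned feature vectors. A small sketch, where the feature matrix is made up and the function name is hypothetical:

```python
import numpy as np

def most_similar(X, i, top_k=3):
    """Indices of the top_k items whose features are closest to item i."""
    d = np.sum((X - X[i]) ** 2, axis=1).astype(float)  # squared distances
    d[i] = np.inf                                      # exclude the item itself
    return np.argsort(d)[:top_k]

# Made-up learned features [romance, action] for 5 movies.
X = np.array([[0.90, 0.00],
              [1.00, 0.01],
              [0.99, 0.00],
              [0.10, 1.00],
              [0.00, 0.90]])

related = most_similar(X, i=0, top_k=2)
print(related)
```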

<u>Limitations of Collaborative Filtering</u>
- Doesn't handle the cold start problem well: how to
    - rank new items that few users have rated?
    - show something reasonable to new users who have rated few items?
- Doesn't allow for the use of side information about items or users:
    - Item: Genre, movie stars, studio, etc.
    - User: Demographics, expressed preferences, etc.

## Content Based Filtering Algorithm

Collaborative Filtering:
- Recommend items to you based on the ratings of users who gave similar ratings to yours

Content-based Filtering:
- Recommend items to you based on the features of the user and the item, to find a good match

What we want to do is: Predict rating of user $j$ on movie $i$ as:
- $v_u \cdot v_m$
    - Where $v_u$ is a vector computed from the user's features ($x_u$)
    - Where $v_m$ is a vector computed from the movie's features ($x_m$)
    - NOTE: Computing $v_u$ and $v_m$ condenses the user's and the movie's features into vectors of the same size that keep the important information, so we can take their dot product to predict the rating the user would give a certain movie

### How to find $v_u$ and $v_m$

Using 2 Neural Networks to convert $x_u$ to $v_u$ and $x_m$ to $v_m$

![](2022-07-29-17-50-56.png)

NOTE: The output layers of both neural networks need to have the same number of neurons so that we can take the dot product of $v_u$ and $v_m$

### How to apply Content-Based Filtering to Binary Labels?

Instead of having the prediction as: $v_u \cdot v_m$

Prediction should be: $g(v_u \cdot v_m)$ where $g(z) = \frac{1}{1 + e^{-z}}$

$g(v_u \cdot v_m)$ = the probability that $y^{(i, j)}$ is 1

### Training both the User and Movie Networks

<u>Cost Function Training Both User and Movie Networks</u> 
$$J = \sum_{(i, j):r(i, j) = 1} (v_u^{(j)} \cdot v_m^{(i)} - y^{(i, j)})^2 + \text{NN Regularization Term}$$
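
To make the two-network setup concrete, here is a forward-pass-only sketch in plain NumPy. Everything about the architecture is an assumption for illustration: one hidden ReLU layer per network, the layer sizes, the random inputs, and the helper names. Training (backpropagation and the regularization term) is omitted; in practice a deep learning framework would handle it.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_tower(n_in, n_hidden, n_out):
    """Random parameters for a small one-hidden-layer network."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def tower(params, x):
    """Map raw features x to an embedding vector (v_u or v_m)."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

# The input sizes may differ, but both output layers must have the same size.
user_net = init_tower(n_in=6, n_hidden=16, n_out=8)
movie_net = init_tower(n_in=10, n_hidden=16, n_out=8)

# Three made-up (user, movie) pairs with known ratings.
x_u = rng.normal(size=(3, 6))    # user features, one row per pair
x_m = rng.normal(size=(3, 10))   # movie features, one row per pair
y = np.array([4.0, 1.0, 5.0])

v_u = tower(user_net, x_u)
v_m = tower(movie_net, x_m)
pred = np.sum(v_u * v_m, axis=1)       # row-wise dot product v_u . v_m
cost = np.sum((pred - y) ** 2)         # squared-error part of J
```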

## Recommending from a Large Catalogue

If a movie streaming site has thousands of movies, running the neural networks on every movie each time a user comes onto the site is very computationally expensive, often to the point of being infeasible.

<u>Two Steps: Retrieval & Ranking</u>

Retrieval:
- Generate a large list of plausible item candidates
    - For each of the last 10 movies watched by the user, add the 10 most similar movies to the list (smallest $||v_m^{(k)} - v_m^{(i)}||^2$)
    - For the user's 3 most viewed genres, add the top 10 movies to the list
    - Add the top 20 movies in the country to the list
- Combine the retrieved items into one list, removing duplicates and items already watched/purchased

Ranking:
- Take the retrieved list and use the learned model to find which movies the user is predicted to rate highest
- Display top items to user
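
The two steps can be sketched with plain Python data structures. The lookup tables (`similar`, `genre_top`, `country_top`) and the `score` function are hypothetical stand-ins: in a real system the similar-movie lists would be precomputed from $v_m$ distances and the score would come from the learned model.

```python
def retrieve(watched, similar, genre_top, country_top):
    """Retrieval: gather plausible candidates, dropping duplicates and watched items."""
    candidates = []
    for movie in watched[-10:]:                 # last 10 movies watched
        candidates.extend(similar.get(movie, [])[:10])
    for movies in genre_top[:3]:                # user's 3 most viewed genres
        candidates.extend(movies[:10])
    candidates.extend(country_top[:20])         # top 20 in the country

    seen = set(watched)                         # excludes already-watched items
    unique = []
    for movie in candidates:
        if movie not in seen:
            seen.add(movie)
            unique.append(movie)
    return unique

def rank(candidates, score, top_n=5):
    """Ranking: sort candidates by predicted rating, highest first."""
    return sorted(candidates, key=score, reverse=True)[:top_n]

# Made-up catalogue data for illustration.
watched = ["a", "b"]
similar = {"a": ["c", "b"], "b": ["d", "c"]}
genre_top = [["e", "a"]]
country_top = ["f", "c"]
predicted = {"c": 3.1, "d": 4.5, "e": 2.0, "f": 4.9}

candidates = retrieve(watched, similar, genre_top, country_top)
top_items = rank(candidates, score=predicted.get, top_n=2)
```

Retrieval only uses cheap lookups, so the expensive model is evaluated on a short candidate list rather than on the whole catalogue.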