# Supervised Learning
## Recommenders

Author: Bingchen Wang

Last Updated: 10 Sep, 2022

---
<nav>
    <a href="../Machine%20Learning.ipynb">Machine Learning</a> |
    <a href="./Supervised Learning.ipynb">Supervised Learning</a>
</nav>

---

### Overview

Concepts:
- [Recommendation Systems](#RS)
- [Collaborative Filtering](#CF)
- [Content-based Filtering](#CBF)
- [Finding Similar Items/Users](#FSIU)

Implementation:
- [TensorFlow](./Recommenders/Tensorflow%20Implementation.ipynb)
- PyTorch

<a name = "RS"></a>
### Recommendation Systems
<blockquote>
    A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user.
    -- Wikipedia
</blockquote>

<a name = "CF"></a>
### Collaborative Filtering

#### Notation

<table style="width:50%">
    <tr>
        <th style="text-align:left">Concept</th>
        <th style="text-align:left">Notation</th>
    </tr>
    <tr>
        <td>number of items</td>
        <td>$n_m$</td>
    </tr>    
    <tr>
        <td>number of users</td>
        <td>$n_u$</td>
    </tr>
    <tr>
        <td>number of features for each item</td>
        <td>$n$</td>
    </tr>
    <tr>
        <td>item $i$ is rated by user $j$</td>
        <td>$r(i,j) = 1$</td>
    </tr>
    <tr>
        <td>rating of item $i$ given by user $j$</td>
        <td>$y(i,j)$</td>
    </tr>
    <tr>
        <td>feature rating of item $i$</td>
        <td>$\mathbf{x}^{(i)}$</td>
    </tr>
    <tr>
        <td>preference of user $j$</td>
        <td>$\mathbf{w}^{(j)}, b^{(j)}$</td>
    </tr>
</table>

#### Algorithm
Target: ratings by users, clicks by users, likes by users, etc. <br>
Prediction is given by:
$$
\hat y(i,j) = \mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}
$$
The cost function is given by:
$$
J(\mathbf{W}, \mathbf{b}, \mathbf{X})= \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)\,\bigl(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y(i,j)\bigr)^2
+\underbrace{\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2}_{\text{regularisation}}
$$
Trainable parameters: $\mathbf{W}, \mathbf{b}, \mathbf{X}$.
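The cost above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `cofi_cost`, the matrix layout (items as rows, users as columns), and the toy values are assumptions made for the example.

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lam):
    """Regularised collaborative-filtering cost (assumed layout below).

    X: (n_m, n) item features     W: (n_u, n) user weights
    b: (n_u,)   user biases       Y: (n_m, n_u) ratings
    R: (n_m, n_u) indicator, 1 where item i was rated by user j
    """
    pred = X @ W.T + b            # (n_m, n_u); b broadcasts over items
    err = (pred - Y) * R          # zero out entries without a rating
    data_term = 0.5 * np.sum(err ** 2)
    reg_term = 0.5 * lam * (np.sum(W ** 2) + np.sum(X ** 2))
    return data_term + reg_term

# Toy data (hypothetical): 2 items, 2 users, 1 feature
X = np.array([[1.0], [2.0]])
W = np.array([[1.0], [0.5]])
b = np.array([0.0, 1.0])
Y = np.array([[1.0, 2.0], [3.0, 0.0]])
R = np.array([[1, 1], [1, 0]])

cost = cofi_cost(X, W, b, Y, R, lam=0.0)
```

Multiplying the error by `R` before squaring plays the role of $r(i,j)$ in the formula, so unrated entries contribute nothing to the data term. As in the formula, the biases $b^{(j)}$ are not regularised.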

#### Mean Normalisation
Intuition:
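- For a new user who has not rated any item, only the regularisation term depends on that user's parameters, so it drives $\mathbf{w}^{(j)}$ to 0 (and $b^{(j)}$ stays at 0 if it is initialised as 0). Without normalisation, the model therefore predicts a rating of 0 for every item.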

Procedure:
- For each item $i$, compute the average rating $\mu_i$ across the columns, i.e. over the users who have rated it.
- Subtract the average from each available rating: $\tilde y(i,j) = y(i,j) - \mu_i$.
- Train the model using $\mathbf{X}$ and $\mathbf{\tilde y}$.
- For predictions, add the average back: $\hat y(i,j) = \hat{\tilde{y}}(i,j) + \mu_i$.

$0/0$ problem:
- An item with no ratings has an average of $0/0$. This can be solved by adding a small number to the denominator.
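The procedure and the $0/0$ guard can be sketched together in NumPy. The function name `mean_normalise`, the `eps` value, and the toy matrices are assumptions for illustration; rows are items and columns are users.

```python
import numpy as np

def mean_normalise(Y, R, eps=1e-12):
    """Return (Y_norm, mu), using per-item means over rated entries only."""
    counts = R.sum(axis=1, keepdims=True)              # ratings per item
    # eps keeps the division defined for items with no ratings (0/0)
    mu = (Y * R).sum(axis=1, keepdims=True) / (counts + eps)
    Y_norm = (Y - mu) * R                              # unrated entries stay 0
    return Y_norm, mu

# Toy data (hypothetical): 3 items, 3 users; item 1 has no ratings
Y = np.array([[5.0, 3.0, 0.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 2.0]])
R = np.array([[1, 1, 0],
              [0, 0, 0],
              [1, 0, 1]])

Y_norm, mu = mean_normalise(Y, R)
```

With this normalisation, a new user whose parameters are all 0 gets the prediction $0 + \mu_i$ for each item, i.e. the item's average rating rather than 0.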

Benefits:
- Better predictions for users who have not rated any item or have only rated a few items.
- Makes the optimisation algorithm run a little faster.

<a name = "CBF"></a>
### Content-based Filtering

#### Notation

<table style="width:50%">
    <tr>
        <th style="text-align:left">Concept</th>
        <th style="text-align:left">Notation</th>
    </tr>
    <tr>
        <td>item content vector</td>
        <td>$\mathbf{x}_m^{(i)}$</td>
    </tr>    
    <tr>
        <td>user content vector</td>
        <td>$\mathbf{x}_u^{(j)}$</td>
    </tr>
    <tr>
        <td>item/user feature dimension</td>
        <td>$n$</td>
    </tr>
    <tr>
        <td>item feature vector (post NN)</td>
        <td>$\mathbf{v}_m^{(i)} \in \mathbb{R}^n$</td>
    </tr>
    <tr>
        <td>user feature vector (post NN)</td>
        <td>$\mathbf{v}_u^{(j)} \in \mathbb{R}^n$</td>
    </tr>
    <tr>
        <td>rating of item $i$ given by user $j$</td>
        <td>$y(i,j)$</td>
    </tr>
</table>
<div class = "alert alert-block alert-info"><b>Note:</b> Item features can also be engineered using the target, e.g. average user ratings per item. </div>

#### Algorithm
<figure>
    <center> <img src="./Recommenders/content-based filtering.pdf"   style="width:100%" ></center>
</figure>

Target: ratings by users, clicks by users, likes by users, etc. <br>
Prediction is given by:
$$
\hat y(i,j) = \mathbf{v}_u^{(j)} \cdot \mathbf{v}_m^{(i)}
$$
The cost function is given by:
$$
J(\mathbf{W}, \mathbf{b})= \sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)\,\bigl(\mathbf{v}_u^{(j)} \cdot \mathbf{v}_m^{(i)} - y(i,j)\bigr)^2
+\text{regularisation of neural network weights}
$$
Trainable parameters: $\mathbf{W}, \mathbf{b}$ (neural network parameters).
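A forward pass through the two networks might look like the NumPy sketch below. It only illustrates how the prediction is formed. The layer sizes, the ReLU activation, the random untrained weights, and the helper names `init_mlp` and `mlp` are all assumptions, not taken from the linked implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a small fully connected network (illustrative sizes)."""
    return [(rng.normal(scale=0.1, size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """ReLU hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

n = 4                               # shared dimension of v_u and v_m
user_net = init_mlp([6, 8, n])      # 6 raw user features (hypothetical)
item_net = init_mlp([5, 8, n])      # 5 raw item features (hypothetical)

x_u = rng.normal(size=(3, 6))       # content vectors for 3 users
x_m = rng.normal(size=(2, 5))       # content vectors for 2 items

v_u = mlp(user_net, x_u)            # (3, n) user feature vectors
v_m = mlp(item_net, x_m)            # (2, n) item feature vectors
y_hat = v_m @ v_u.T                 # (2, 3): y_hat[i, j] = v_u^(j) . v_m^(i)
```

The raw user and item content vectors may have different lengths, but both networks must end in the same output dimension $n$ so that the dot product is defined.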

#### Preprocessing
##### Input user content and item content vectors
- Apply the **standard scaler** along each feature (for neural network inputs)
$$
z = \frac{(x- \mu)}{s}
$$
##### Target ratings
- Apply the **min-max scaler** to scale $y$ to be between $-1$ and $1$.
$$
z = \frac{y - y_{\min}}{y_{\max} - y_{\min}} \cdot (1) + \left(1- \frac{y- y_{\min}}{y_{\max} - y_{\min}}\right) \cdot (-1)
$$
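Both scalers can be written in a few lines of NumPy. This is a sketch under stated assumptions: the function names and toy data are hypothetical, $s$ is taken to be the population standard deviation (`ddof=0`), and no feature column is constant (which would give a zero denominator).

```python
import numpy as np

def standard_scale(X):
    """Scale each feature (column) to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)                 # population standard deviation (ddof=0)
    return (X - mu) / s

def minmax_scale_symmetric(y):
    """Map y linearly onto [-1, 1] using the formula for the target ratings."""
    t = (y - y.min()) / (y.max() - y.min())   # in [0, 1]
    return t * 1 + (1 - t) * (-1)             # algebraically equal to 2 * t - 1

# Toy data (hypothetical)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y = np.array([0.5, 2.75, 5.0])

Xz = standard_scale(X)
yz = minmax_scale_symmetric(y)
```

When new data arrive at prediction time, the statistics ($\mu$, $s$, minimum, maximum) computed on the training set should be reused rather than recomputed.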

<a name = "FSIU"></a>
### Finding Similar Items/Users
Similar users have similar user feature vectors, while similar items have similar item feature vectors.

Measure similarity by the squared distance between the two vectors:
- User
$$
\left \Vert v_u^{(j)} - v_u^{(k)} \right \Vert^2 = \sum_{l=1}^{n}(v_{u_l}^{(j)} - v_{u_l}^{(k)})^2
$$
- Item
$$
\left \Vert x_m^{(i)} - x_m^{(h)} \right \Vert^2 = \sum_{l=1}^{n}(x_{m_l}^{(i)} - x_{m_l}^{(h)})^2 \;\;\;\text{(collaborative filtering)}
$$
$$
\left \Vert v_m^{(i)} - v_m^{(h)} \right \Vert^2 = \sum_{l=1}^{n}(v_{m_l}^{(i)} - v_{m_l}^{(h)})^2 \;\;\;\text{(content-based filtering)}
$$
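As a rough illustration, the nearest items (or users) under the squared distance can be looked up as below. The function name `most_similar` and the toy feature matrix are assumptions; each row of `V` stands for one learned feature vector.

```python
import numpy as np

def most_similar(V, idx, k=1):
    """Indices of the k rows of V closest to row idx in squared distance."""
    d2 = np.sum((V - V[idx]) ** 2, axis=1)   # squared distance to every row
    d2[idx] = np.inf                         # exclude the query row itself
    return np.argsort(d2)[:k]

# Toy feature vectors (hypothetical): 4 items, 2 features each
V = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.1, 0.1],
              [5.0, 5.0]])

nearest = most_similar(V, idx=0, k=2)
```

Here item 2 is closest to item 0 (squared distance 0.02), followed by item 1 (squared distance 1.0). The same function applies to user vectors $\mathbf{v}_u$ without change.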