# Recommending Products to Users/Customers

References:<br>
https://www.coursera.org/learn/machine-learning

The same approach could be used to recommend markets to businesses, either by identifying demand in similar markets or by finding products similar to ones that are already popular.

---

# Content-based recommendations

Recommend movies that a user has not yet viewed or rated, based on that user's previous ratings of other movies and on known features of each movie.

<img src="images/rec1.png">

Here,<br>
$ n_u = $ number of users <br>
$ n_m = $ number of movies <br>
$ n = $ number of features <br>
$ r(i,j) = 1$ if user $j$ has rated movie $i$ (0 otherwise) <br>
$y^{(i,j)}$ = rating by user $j$ on movie $i$ (if defined) <br>
$ m^{(j)}$ = number of movies rated by user $j$


Each movie is represented by a feature vector with an intercept term $x_0 = 1$, for example,
$$ x^{(1)} = \begin{bmatrix} 1 \\ 0.9 \\ 0 \end{bmatrix} $$


For each user $j$, learn a parameter vector $\theta^{(j)} \in \mathbb{R}^{n+1}$ (here $\mathbb{R}^3$, since $n = 2$). Predict the rating of user $j$ for movie $i$ as $(\theta^{(j)})^T x^{(i)}$.
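As a minimal sketch of this prediction rule, the snippet below computes the inner product for the example feature vector $x^{(1)}$. The user parameters `theta1` are hypothetical values chosen only for illustration, and `predict_rating` is a helper name introduced here.

```python
def predict_rating(theta, x):
    """Return the predicted rating theta^T x for one user and one movie."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    return sum(t * f for t, f in zip(theta, x))


# Feature vector of movie 1 from the example (x_0 = 1 is the intercept term).
x1 = [1.0, 0.9, 0.0]
# Hypothetical parameters for one user, chosen only for illustration.
theta1 = [0.0, 5.0, 0.0]

rating = predict_rating(theta1, x1)
```

With these assumed parameters the user weights the first real feature heavily, so a movie scoring 0.9 on that feature gets a predicted rating of about 4.5.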


The parameters $\theta^{(j)}$ that model each user's ratings can be learned by least-squares minimization with regularization.

<img src="images/rec2.png" width="600">
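A rough sketch of this objective for a single user, assuming it takes the usual form of half the summed squared error over rated movies plus $\frac{\lambda}{2} \sum_{k=1}^{n} (\theta_k^{(j)})^2$ (the intercept parameter is assumed to be left out of the penalty). The function name, data and parameter values are hypothetical.

```python
def user_cost(theta, movies, ratings, lam):
    """Regularized squared-error cost for one user.

    theta   : n + 1 parameters (theta[0] pairs with the intercept x_0 = 1)
    movies  : feature vectors of the movies this user has rated
    ratings : the user's ratings for those movies, in the same order
    lam     : regularization strength lambda
    """
    squared_error = 0.0
    for x, y in zip(movies, ratings):
        prediction = sum(t * f for t, f in zip(theta, x))
        squared_error += (prediction - y) ** 2
    # Assumption: the intercept parameter theta[0] is not penalized.
    penalty = sum(t ** 2 for t in theta[1:])
    return 0.5 * squared_error + 0.5 * lam * penalty


# Hypothetical data: two rated movies and one candidate parameter vector.
movies = [[1.0, 0.9, 0.0], [1.0, 0.1, 1.0]]
ratings = [5.0, 0.0]
theta = [0.0, 5.0, 0.0]
cost = user_cost(theta, movies, ratings, lam=1.0)
```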

Apply gradient descent to minimize this function. The update differs for the intercept parameter ($k = 0$), which is not regularized.

<img src="images/rec3.png" width="600">
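One gradient descent update for a single user might look like the sketch below. It assumes the regularized squared-error objective without a $1/m$ factor and an unregularized intercept parameter; `gradient_step`, `alpha` and the sample data are names and values introduced here for illustration.

```python
def gradient_step(theta, movies, ratings, lam, alpha):
    """Return theta after one gradient descent update for one user.

    movies and ratings cover only the movies this user has rated;
    alpha is the learning rate and lam the regularization strength.
    """
    grad = [0.0] * len(theta)
    for x, y in zip(movies, ratings):
        error = sum(t * f for t, f in zip(theta, x)) - y
        for k in range(len(theta)):
            grad[k] += error * x[k]
    # Regularization applies to k >= 1 only; the intercept (k = 0) is skipped.
    for k in range(1, len(theta)):
        grad[k] += lam * theta[k]
    return [t - alpha * g for t, g in zip(theta, grad)]


# Hypothetical data: two rated movies, starting from theta = 0.
movies = [[1.0, 0.9, 0.0], [1.0, 0.1, 1.0]]
ratings = [5.0, 0.0]
theta = gradient_step([0.0, 0.0, 0.0], movies, ratings, lam=0.0, alpha=0.1)
```

Calling `gradient_step` repeatedly until the parameters stop changing gives a simple batch gradient descent loop.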

---

# Collaborative filtering

If features describing movies do not exist, this algorithm can learn what features to use.

<img src="images/rec4.png">

Assume users have indicated their personal preference for different types of movies. Thus users have provided information that allows us to parameterize their potential ratings.

<img src="images/rec5.png">

From this information, it is possible to infer the feature values for each movie. Given an estimate of the parameters $\theta$, the features $x$ can be estimated; the parameters can then be updated from those features, and the process repeated iteratively.

### Algorithm

#### Optimization objective

The cost function $J$ is,

$$ J = \frac{1}{2} \sum_{(i,j):r(i,j)=1} \big( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \big)^2 +
\frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2 +
\frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2 $$

1. Initialize $x$ and $\theta$ to small random values
1. Minimize the cost function $J$ over all $x^{(i)}$ and $\theta^{(j)}$ simultaneously, using gradient descent or another optimization procedure
1. For a user with parameters $\theta$ and a movie with learned features $x$, predict a star rating of $\theta^T x$
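The steps above can be sketched in plain Python as follows. This is an illustrative batch gradient descent on the cost $J$, not a tuned implementation: the rating table `Y` (rows are movies, columns are users, `None` marks a missing rating), the learning rate, the iteration count and the function names are all assumptions made for the example.

```python
import random


def cf_cost(Y, X, Theta, lam):
    """Cost J: squared error over rated entries plus both penalty terms."""
    error = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            if y is not None:  # r(i, j) = 1
                prediction = sum(a * b for a, b in zip(X[i], Theta[j]))
                error += (prediction - y) ** 2
    penalty = sum(v ** 2 for row in X for v in row)
    penalty += sum(v ** 2 for row in Theta for v in row)
    return 0.5 * error + 0.5 * lam * penalty


def collaborative_filtering(Y, n_features, lam=0.1, alpha=0.01,
                            iterations=2000, seed=0):
    """Learn movie features X and user parameters Theta by gradient descent."""
    rng = random.Random(seed)
    n_m, n_u = len(Y), len(Y[0])
    # Step 1: small random initial values break the symmetry between features.
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
         for _ in range(n_m)]
    Theta = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
             for _ in range(n_u)]
    # Step 2: update every x^(i) and theta^(j) simultaneously.
    for _ in range(iterations):
        grad_X = [[lam * v for v in row] for row in X]
        grad_Theta = [[lam * v for v in row] for row in Theta]
        for i in range(n_m):
            for j in range(n_u):
                if Y[i][j] is None:
                    continue
                err = sum(a * b for a, b in zip(X[i], Theta[j])) - Y[i][j]
                for k in range(n_features):
                    grad_X[i][k] += err * Theta[j][k]
                    grad_Theta[j][k] += err * X[i][k]
        for i in range(n_m):
            for k in range(n_features):
                X[i][k] -= alpha * grad_X[i][k]
        for j in range(n_u):
            for k in range(n_features):
                Theta[j][k] -= alpha * grad_Theta[j][k]
    return X, Theta


# Hypothetical ratings: 5 movies (rows) by 4 users (columns).
Y = [
    [5, 5, 0, 0],
    [5, None, None, 0],
    [None, 4, 0, None],
    [0, 0, 5, 4],
    [0, 0, 5, None],
]
X, Theta = collaborative_filtering(Y, n_features=2)
# Step 3: predicted rating of user 0 for movie 2, which that user has not rated.
predicted = sum(a * b for a, b in zip(X[2], Theta[0]))
```

There is no intercept term here, so every component of $x$ and $\theta$ is regularized, matching the sums from $k = 1$ to $n$ in $J$.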

---

# Low rank matrix factorization

Find related products using ratings.

<img src="images/rec6.png" width="600">

The predicted ratings can also be written as $X \Theta^T$, where,

$$ X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(n_m)})^T \end{bmatrix}, \quad
\Theta = \begin{bmatrix} (\theta^{(1)})^T \\ (\theta^{(2)})^T \\ \vdots \\ (\theta^{(n_u)})^T \end{bmatrix} $$
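A small sketch of the product $X \Theta^T$ without any matrix library: entry $(i, j)$ of the result is the predicted rating of user $j$ for movie $i$. The values in `X` and `Theta` are made up for illustration, and `predict_all` is a helper name introduced here.

```python
def predict_all(X, Theta):
    """Return the n_m x n_u matrix X Theta^T as a list of rows."""
    return [[sum(a * b for a, b in zip(x_i, theta_j)) for theta_j in Theta]
            for x_i in X]


# Hypothetical learned values: 2 movies and 2 users, 2 features each.
X = [[0.9, 0.0], [0.1, 1.0]]
Theta = [[5.0, 0.0], [0.0, 5.0]]
predictions = predict_all(X, Theta)
```

Row `predictions[i]` holds the predicted ratings of every user for movie `i`.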

How can we find movies $j$ related to movie $i$?

Find the movies with the smallest distance between their feature vectors, $\| x^{(j)} - x^{(i)} \|$.
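As an illustration, the sketch below ranks the other movies by Euclidean distance to a chosen movie's feature vector. The feature matrix `X` holds hypothetical learned features and `most_related` is a helper name introduced here.

```python
import math


def most_related(X, i, count=1):
    """Return indices of the `count` movies closest to movie i in feature space."""
    distances = []
    for j, x_j in enumerate(X):
        if j == i:
            continue  # a movie is trivially closest to itself
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_j, X[i])))
        distances.append((d, j))
    distances.sort()
    return [j for _, j in distances[:count]]


# Hypothetical learned features for four movies.
X = [[0.9, 0.0], [1.0, 0.01], [0.1, 1.0], [0.0, 0.9]]
related_to_first = most_related(X, 0)
```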

### Note

If a user has not yet rated any movies, regularization drives that user's parameters to $\theta^{(j)} = 0$, so the algorithm predicts a rating of zero for every movie, which is not realistic. To avoid this, apply ***mean normalization*** to the rating data: subtract each movie's mean rating $\mu_i$ from its ratings, so that every movie has an average rating of zero.

In this case the prediction for user $j$ on movie $i$ is

$$ (\theta^{(j)})^T x^{(i)} + \mu_i $$

For a user $j$ who has not rated any movies, the first term is zero (because $\theta^{(j)} = 0$), so the prediction is just the average rating for that movie over the users who rated it.
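A minimal sketch of mean normalization, assuming ratings are stored with movies as rows, users as columns, and `None` for a missing rating. The rating table and the helper name `mean_normalize` are hypothetical; the last column is a user who has rated nothing.

```python
def mean_normalize(Y):
    """Subtract each movie's mean rating, computed over rated entries only.

    Returns the normalized ratings and the list of per-movie means mu.
    """
    means = []
    Y_norm = []
    for row in Y:
        rated = [y for y in row if y is not None]
        mu = sum(rated) / len(rated) if rated else 0.0
        means.append(mu)
        Y_norm.append([None if y is None else y - mu for y in row])
    return Y_norm, means


# Hypothetical ratings: 5 movies by 5 users; the last user has no ratings.
Y = [
    [5, 5, 0, 0, None],
    [5, None, None, 0, None],
    [None, 4, 0, None, None],
    [0, 0, 5, 4, None],
    [0, 0, 5, 0, None],
]
Y_norm, mu = mean_normalize(Y)

# For the new user theta = 0, so the prediction reduces to mu_i for each movie.
theta_new = [0.0, 0.0]
x_any = [0.3, 0.7]  # any learned feature vector gives the same result
prediction_movie0 = sum(t * f for t, f in zip(theta_new, x_any)) + mu[0]
```

The model is then trained on `Y_norm` instead of `Y`, and `mu[i]` is added back when predicting.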

The same idea can be adapted to a movie $i$ that has no ratings, giving it a predicted rating based on the individual user $j$, though the utility of that procedure is questionable.