## Lecture 7: Recommender Systems

These notes cover lecture 7 on recommender systems. The lecture presents two main ideas:

1. Using a naive approach based on simple similarity measures to tackle the problem of recommendation
2. Using a technique called matrix factorization

**Introduction and Notation**

The user ratings and the corresponding items are stored in a user-item rating matrix:

$$ Y = 
\begin{bmatrix}
y_{11} & y_{12} & y_{13}\\
y_{21} & y_{22} & y_{23}
\end{bmatrix}
$$


Here $y_{ij}$ is the rating given by user $i$ to item $j$.

There are some peculiarities about this matrix:

1. Most entries in this matrix will be missing
2. This is because users generally interact with only a small subset of the items

Check the tab **User Item Rating Matrix** for a detailed example.
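As a small illustration of what "mostly missing" looks like in practice, here is a minimal sketch that stores a rating matrix in NumPy with `np.nan` marking the missing entries. The 3 x 4 matrix and its values are made up for this example and are not taken from the lecture.

```python
import numpy as np

# Hypothetical 3-user x 4-item rating matrix; np.nan marks a missing rating.
Y = np.array([
    [5.0, np.nan, np.nan, 1.0],
    [np.nan, 4.0, np.nan, np.nan],
    [np.nan, np.nan, 2.0, np.nan],
])

observed = ~np.isnan(Y)            # boolean mask of the entries we actually have
sparsity = 1.0 - observed.mean()   # fraction of entries that are missing

print(f"{observed.sum()} observed ratings, {sparsity:.0%} missing")
```

Keeping a boolean mask next to the matrix is one convenient way to restrict later computations to the observed entries only.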

**K-Nearest Neighbour Method**

In this method we do the following:

1. We decide how to compute the similarity between users. Common metrics are Euclidean distance, cosine distance, and the correlation coefficient.
2. We then decide how many similar users (the $k$ nearest neighbours) to consider when predicting a rating for the target user.

See the tab **User Item Rating Matrix** for a detailed numeric example.
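The two steps above can be sketched in code. This is only an illustration under stated assumptions: similarity is the cosine similarity over the items both users have rated, and the prediction is the plain average of the ratings given by the $k$ most similar users who rated the item. The function names and the toy matrix are hypothetical, and the lecture's own example may weight neighbours differently.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two users, using only items both have rated."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    x, y = a[both], b[both]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def knn_predict(Y, user, item, k):
    """Predict Y[user, item] as the mean rating of the k most similar users
    who have rated the item (an assumed, unweighted aggregation rule)."""
    candidates = [
        (cosine_similarity(Y[user], Y[other]), Y[other, item])
        for other in range(Y.shape[0])
        if other != user and not np.isnan(Y[other, item])
    ]
    if not candidates:
        return np.nan
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return float(np.mean([rating for _, rating in candidates[:k]]))

# Toy data: user 0 has not rated item 2.
Y = np.array([
    [5.0, 4.0, np.nan],
    [5.0, 4.0, 2.0],
    [1.0, 5.0, 5.0],
])

pred_k1 = knn_predict(Y, user=0, item=2, k=1)  # uses only the closest user (user 1)
pred_k2 = knn_predict(Y, user=0, item=2, k=2)  # averages users 1 and 2
```

With $k=1$ the prediction copies the single most similar user, while a larger $k$ averages over more, less similar, users.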

This approach has two downsides:

1. The similarity measures fail to account for users whose tastes span unrelated dimensions (for example, an ML academic who also likes gardening books)
2. The hidden structure of the data is not revealed

**Collaborative Filtering: The naive approach**

The basic problem setup is given in the tab called **Collaborative Filtering 1**. Both the $Y$ and $X$ matrices are described there.

The estimation problem can be posed in much the same way as a regression problem, with the following cost function, where $D$ is the set of observed (user, item) pairs:

$Cost(X)=\sum_{(a,i)\in D}\frac{(Y_{ai}-X_{ai})^2}{2}+\frac{\lambda}{2}\sum_{(a,i)}X^2_{ai}$

The cost is minimised at the following value of $X_{ai}$:

If $(a,i) \in D$ then $X_{ai} = \frac{Y_{ai}}{1+\lambda}$

else

$X_{ai}=0$
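This result can be checked with a short derivation, assuming $\lambda > 0$. No term in the cost couples two different entries of $X$, so each $X_{ai}$ can be minimised on its own:

$$
\frac{\partial\, Cost}{\partial X_{ai}} =
\begin{cases}
-(Y_{ai}-X_{ai}) + \lambda X_{ai}, & (a,i)\in D\\
\lambda X_{ai}, & (a,i)\notin D
\end{cases}
$$

Setting the derivative to zero gives $(1+\lambda)X_{ai} = Y_{ai}$ for observed entries and $\lambda X_{ai} = 0$ for unobserved ones, which matches the two cases above. In other words, the naive approach only shrinks the observed ratings towards zero and predicts 0 for every missing entry, so it tells us nothing about the ratings we actually want to predict.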

**Collaborative Filtering: Matrix Factorization**

The intuition behind the approach is explained in the **Algorithms Demos.pptx** deck. The main idea is to factorize the matrix $Y_{m \times n}$ into two matrices $U_{m \times k}$ and $V_{k \times n}$ such that $Y \approx UV$ on the observed entries.

There are two takeaways from this:

1. The rank $k$ lets us choose the number of factors that we believe drive the rating decisions
2. The number of parameters to be estimated is now $k(m+n)$ instead of $m \times n$ (that is, $m+n$ when $k=1$).
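To make the shapes and the parameter count concrete, here is a minimal sketch that counts the entries of $U$ and $V$ directly. The sizes and the random factors are hypothetical. With $k=1$ the count reduces to $m+n$.

```python
import numpy as np

m, n, k = 4, 5, 2                   # hypothetical sizes: 4 users, 5 items, 2 factors
rng = np.random.default_rng(0)
U = rng.normal(size=(m, k))         # one k-dimensional row per user
V = rng.normal(size=(k, n))         # one k-dimensional column per item
X = U @ V                           # full m x n matrix of predicted ratings

params_factorized = U.size + V.size  # m*k + k*n parameters to estimate
params_full = m * n                  # one free parameter per entry
rank = np.linalg.matrix_rank(X)      # cannot exceed k

print(params_factorized, params_full, rank)
```

The saving looks small on a toy matrix, but with thousands of users and items and a small $k$ the factorized form has far fewer parameters than the full matrix.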


**Alternating Minimization**

Refer to the official video; the numerical example there is quite clear.
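For readers who want to experiment, the following is a rough sketch of alternating minimization, not a reproduction of the video's example. It assumes a regularized squared-error objective over the observed entries, $\sum_{(a,i)\in D}\frac{(Y_{ai}-[UV]_{ai})^2}{2}+\frac{\lambda}{2}\|U\|_F^2+\frac{\lambda}{2}\|V\|_F^2$ with $\lambda > 0$, which these notes do not state explicitly. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def objective(Y, mask, U, V, lam):
    """Assumed cost: squared error on observed entries plus L2 penalties."""
    resid = np.where(mask, Y - U @ V, 0.0)
    return 0.5 * np.sum(resid**2) + 0.5 * lam * (np.sum(U**2) + np.sum(V**2))

def alternating_minimization(Y, mask, k, lam, n_iters=20, seed=0):
    """Alternate between solving for U with V fixed and for V with U fixed."""
    m, n = Y.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(k, n))
    history = [objective(Y, mask, U, V, lam)]
    for _ in range(n_iters):
        # V fixed: each user's row of U is a small ridge regression.
        for a in range(m):
            cols = mask[a]
            Vd = V[:, cols]                     # factors of the items user a rated
            U[a] = np.linalg.solve(Vd @ Vd.T + lam * np.eye(k), Vd @ Y[a, cols])
        # U fixed: each item's column of V is a small ridge regression.
        for i in range(n):
            rows = mask[:, i]
            Ud = U[rows]                        # factors of the users who rated item i
            V[:, i] = np.linalg.solve(Ud.T @ Ud + lam * np.eye(k), Ud.T @ Y[rows, i])
        history.append(objective(Y, mask, U, V, lam))
    return U, V, history

# Toy 4-user x 3-item matrix with missing ratings (values are made up).
Y = np.array([
    [5.0, np.nan, 7.0],
    [np.nan, 2.0, np.nan],
    [4.0, np.nan, np.nan],
    [np.nan, 3.0, 6.0],
])
mask = ~np.isnan(Y)
U, V, history = alternating_minimization(Y, mask, k=1, lam=1.0)
X_hat = U @ V   # predictions for every entry, including the missing ones
```

Each half-step solves its sub-problem exactly, so the recorded cost in `history` should never increase from one iteration to the next. The procedure can still converge to different solutions depending on the initialization.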