# Recommender Systems

Recommender systems predict a user's interest in items they have not yet consumed, such as music, movies, products, or social network posts.

Our goal is to predict the rating a user would give to an item, in order to decide whether to suggest it. This makes it a regression problem: we recommend the item with the highest predicted rating.

If we model recommendation as linear regression, we assume that the rating of an item depends linearly on the item's features, whose meaning depends on context. For movies, for example, the features $x$ can encode the movie categories, while $\theta$ holds the user's preference for each category. **Purpose**: Minimizing the sum of squared errors over the items that were already rated.

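**Content-based filtering:** Items are described by known features, and each user's preferences $\theta_j$ are learned from the ratings that user has already given, so predicted ratings depend only on those features.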

$$\min_{\theta_1,\dots,\theta_{n_u}} \frac{1}{2} \sum_{j=1}^{n_u}\sum_{i=1}^{n} r_{i,j}(\theta_j^Tx_i-y_{i,j})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\theta_j^T\theta_j$$

Where:

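* $n$ is the number of items and $n_u$ is the number of users.
* $\theta_j$ is the preference vector of user $j$, with one weight per feature (e.g. per category).
* $r_{i,j}$ is 1 if user $j$ has rated item $i$ and 0 otherwise, so only items that were already rated are taken into account.
* $x_i$ is the feature vector of item $i$.
* $y_{i,j}$ is the rating given by user $j$ to item $i$.
* $\lambda$ is the regularization factor.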
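Since the objective decouples across users, each $\theta_j$ can be fitted on its own. The following is a minimal sketch for a single user, assuming NumPy and a closed-form ridge solution (one possible solver, not the only one). The function name `fit_user_preferences` and the toy data are hypothetical.

```python
import numpy as np

def fit_user_preferences(X, y, rated, lam=1.0):
    """Minimize 1/2 * sum_i rated[i] * (theta @ X[i] - y[i])**2 + lam/2 * theta @ theta."""
    Xr = X[rated]          # features of the items this user has rated
    yr = y[rated]          # the corresponding ratings
    p = X.shape[1]
    # Setting the gradient to zero gives (Xr^T Xr + lam * I) theta = Xr^T yr
    A = Xr.T @ Xr + lam * np.eye(p)
    b = Xr.T @ yr
    return np.linalg.solve(A, b)

# Hypothetical data: 4 movies, 2 features (e.g. "action", "romance")
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
y = np.array([5.0, 0.0, 1.0, 0.0])            # entries where rated is False are ignored
rated = np.array([True, False, True, False])

theta = fit_user_preferences(X, y, rated, lam=0.1)
predictions = X @ theta
# Recommend the unrated item with the highest predicted rating
best_unrated = int(np.argmax(np.where(rated, -np.inf, predictions)))
```

Here the user liked the action movie, so the unrated action-heavy movie (index 1) gets the highest predicted rating among the unrated items.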

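**Collaborative filtering**: Knowing the item features $x_i$ (e.g. categories) and the ratings $y_{i,j}$, we can learn the user preferences $\theta_j$. Conversely, knowing the user preferences $\theta_j$ and the ratings, we can learn the item features $x_i$. In this way, users implicitly collaborate to characterize content.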

$$\min_{x_1,\dots,x_n} \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n_u} r_{i,j}(\theta_j^Tx_i-y_{i,j})^2 + \frac{\lambda}{2}\sum_{i=1}^{n}x_i^Tx_i$$

If we assume $p$ features per item, we are also assuming that user preferences are $p$-dimensional, since $\theta_j^Tx_i$ requires both vectors to have the same length.

We can learn features and preferences together:

$$\min_{x_1,\dots,x_n;\theta_1,\dots,\theta_{n_u}} \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n_u} r_{i,j}(\theta_j^Tx_i-y_{i,j})^2 + \frac{\lambda}{2}\sum_{i=1}^{n}x_i^Tx_i + \frac{\lambda}{2}\sum_{j=1}^{n_u}\theta_j^T\theta_j$$
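One way to minimize this joint objective is plain gradient descent on both sets of variables. The sketch below assumes NumPy, a small random initialization, and a fixed learning rate. The names `factorize` and `objective`, the hyperparameters, and the toy ratings (where 0 marks a missing rating) are all hypothetical choices for illustration.

```python
import numpy as np

def objective(Y, R, X, Theta, lam):
    """Joint objective: squared error on rated entries plus both regularizers."""
    err = R * (X @ Theta.T - Y)
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))

def factorize(Y, R, p=2, lam=0.1, lr=0.01, steps=2000, seed=0):
    """Learn item features X (n, p) and user preferences Theta (n_u, p) together."""
    rng = np.random.default_rng(seed)
    n, n_u = Y.shape
    X = 0.1 * rng.standard_normal((n, p))
    Theta = 0.1 * rng.standard_normal((n_u, p))
    for _ in range(steps):
        E = R * (X @ Theta.T - Y)            # errors on rated entries only
        grad_X = E @ Theta + lam * X         # gradient with respect to item features
        grad_Theta = E.T @ X + lam * Theta   # gradient with respect to user preferences
        X -= lr * grad_X
        Theta -= lr * grad_Theta
    return X, Theta

# Hypothetical ratings: rows are items, columns are users, 0 means "not rated"
Y = np.array([[5.0, 4.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0],
              [0.0, 2.0, 4.0]])
R = (Y > 0).astype(float)

X, Theta = factorize(Y, R)
predicted = X @ Theta.T   # predicted rating for every (item, user) pair
```

The entries of `predicted` where `R` is 0 are the ratings the model fills in, and they are the candidates for recommendation.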

For a new user, the part of the summation concerning already rated items is zero, so the only term left to minimize is the regularization term on that user's preferences. Minimizing it sets all of the user's preferences to zero, which is consistent with the context, since we do not know their preferences. This is called the **cold start problem**: with no ratings, the model cannot distinguish the new user from any other new user. To avoid this situation, we can ask the user to state some preferences up front. The same goes for new items.
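The cold-start argument can be written out explicitly. As a sketch, assume a new user $j$ with $r_{i,j}=0$ for every item $i$ and $\lambda > 0$. The terms of the joint objective that depend on $\theta_j$ then reduce to the regularizer alone:

$$\frac{\lambda}{2}\theta_j^T\theta_j \quad\Rightarrow\quad \nabla_{\theta_j} = \lambda\theta_j = 0 \quad\Rightarrow\quad \theta_j = 0 \quad\Rightarrow\quad \theta_j^Tx_i = 0 \text{ for every item } i$$

Every item therefore receives the same predicted rating, so the model has no basis for ranking items for this user.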

The learned features have no predefined meaning. Their meaning can be inferred by plotting them and looking at which items score highly on each feature.

We can also see recommendation as a binary classification problem (to recommend or not to recommend?). We can recommend the single item predicted to be the user's favorite (**accuracy** is 1 if the recommended item is also the preferred one, 0 if it is not), or the top $k$ predicted items (**accuracy at K** is 1 if the preferred item is among the top $k$ recommended, 0 otherwise).
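The accuracy at K described above can be sketched in a few lines, assuming NumPy and a vector of predicted scores, one per item. The function name `accuracy_at_k` and the example scores are hypothetical.

```python
import numpy as np

def accuracy_at_k(scores, preferred_item, k):
    """Return 1 if preferred_item is among the k highest-scoring items, else 0."""
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k largest scores
    return int(preferred_item in top_k)

scores = np.array([0.2, 0.9, 0.5, 0.7])    # predicted scores for items 0..3
preferred = 3                              # index of the item the user actually prefers

acc_at_1 = accuracy_at_k(scores, preferred, k=1)   # plain accuracy is the k = 1 case
acc_at_2 = accuracy_at_k(scores, preferred, k=2)
```

With these scores the top-ranked item is item 1, so `acc_at_1` is 0, while item 3 is ranked second, so `acc_at_2` is 1.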

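Another possible approach is **information retrieval**, which is closely related to classification, since its metrics are built from the same counts as the false positive rate (FPR) and the false negative rate (FNR). Recall, for instance, equals $1 - \text{FNR}$.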

**Precision**: The fraction of recommended documents which are also relevant. If precision is 1 and recall is close to 0, we are recommending almost nothing.

$$\text{Precision} = \frac{|\text{Relevant} \cap \text{Recommended}|}{|\text{Recommended}|}$$

**Recall**: The fraction of relevant documents which are also recommended. If precision is close to 0 and recall is 1, we are recommending everything.

$$\text{Recall} = \frac{|\text{Relevant} \cap \text{Recommended}|}{|\text{Relevant}|}$$

The ideal case is where both are 1.
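As a small illustration, both metrics can be computed from two sets of item identifiers. The function name `precision_recall` and the example items are hypothetical. Returning 1.0 when a denominator is empty is a convention assumed here, not something fixed by the definitions.

```python
def precision_recall(recommended, relevant):
    """Compute precision and recall from two collections of item ids."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)   # items that are both recommended and relevant
    # Assumed convention: an empty denominator yields 1.0
    precision = hits / len(recommended) if recommended else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

recommended = ["a", "b", "c", "d"]
relevant = ["b", "d", "e"]
precision, recall = precision_recall(recommended, relevant)
```

Two of the four recommended items are relevant, so precision is 2/4, and two of the three relevant items were recommended, so recall is 2/3.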

Some interesting concepts for recommender systems:

* **Diversity**: We should avoid always recommending the same items, so that recommendations do not become boring.

* **Serendipity**: Generating a positive surprise in the user, which sometimes requires going beyond what the mathematical model suggests.

* **Revenue**: A good measure of the system's effectiveness is the revenue it generates. It can be estimated by comparing sales before and after the system was introduced.

