# Collaborative Filtering for Implicit Feedback Datasets

### Cost function for squared error with regularization

Find the user vectors $x_u$ and item vectors $y_i$ that minimize the sum below, which runs over all user-item pairs $(u, i)$.

$\underset{x,y}{\min} \sum_{u,i}
c_{ui} (p_{ui} - x_u^Ty_i)^2 + \lambda
\left( \sum_u \parallel x_u \parallel ^2
+ \sum_i \parallel y_i \parallel ^2 \right)$

##### Where:

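$x_u$ is the factor vector for user $u$,
$y_i$ is the factor vector for item $i$.

$p_{ui} = 1$ if user $u$ interacted with item $i$,
$p_{ui} = 0$ if there was no interaction.

$c_{ui} = 1 + \alpha r_{ui}$ is the confidence, where
$r_{ui}$ is the number of times user $u$ interacted with item $i$.

$\lambda$ is the regularization parameter.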

#### Explanation of cost function

We take the squared error of each prediction, weight it by the confidence $c_{ui}$, and add a penalty on the squared norms of the $x$ and $y$ vectors, scaled by $\lambda$, to discourage overfitting. Larger values of $\lambda$ penalize large factor values more strongly.

$\alpha$ controls how quickly confidence grows with repeated interactions. Clearly, our confidence increases when a producer samples the same artist multiple times, but by how much? $\alpha$ determines how much weight those repeated samples carry.

We add 1 so that non-interactions ($r_{ui} = 0$) still have confidence $c_{ui} = 1$ and are not lost during the cost calculation.
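As a rough illustration of the cost function, here is a minimal NumPy sketch that evaluates it on a tiny made-up interaction matrix. The function name `implicit_cost`, the toy matrix `R`, and the values `alpha=40.0` and `lam=0.1` are assumptions chosen for the example, not values taken from these notes.

```python
import numpy as np

def implicit_cost(R, X, Y, alpha=40.0, lam=0.1):
    """Confidence-weighted squared error plus regularization.

    R: (m, n) interaction counts r_ui
    X: (m, f) user vectors, Y: (n, f) item vectors
    alpha, lam: illustrative defaults (assumptions)
    """
    P = (R > 0).astype(float)   # p_ui = 1 if any interaction, else 0
    C = 1.0 + alpha * R         # c_ui = 1 + alpha * r_ui
    err = P - X @ Y.T           # p_ui - x_u^T y_i for every (u, i) pair
    reg = lam * ((X ** 2).sum() + (Y ** 2).sum())
    return float((C * err ** 2).sum() + reg)

# Toy data: 3 users, 4 items, 2 latent factors (all values made up).
R = np.array([[3, 0, 0, 1],
              [0, 2, 0, 0],
              [1, 0, 5, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(3, 2))
Y = rng.normal(scale=0.1, size=(4, 2))
print(implicit_cost(R, X, Y))
```

Note that every one of the 12 user-item pairs contributes to the sum, including the zeros in `R`, which enter with confidence 1.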

### ALS Algorithm

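However, minimizing the cost function above directly is impractical because of the size of the dataset: the sum has $m \cdot n$ terms, one for every user-item pair ($m$ users, $n$ items), not only the observed interactions.

Therefore we optimize it with Alternating Least Squares (ALS). When either the user vectors or the item vectors are held constant, the cost function is quadratic in the other set, so its global minimum can be calculated in closed form. ALS then alternates to the other set of vectors and repeats.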

#### Recompute user factors

$x_u = (Y^T C^u Y + \lambda I)^{-1}  Y^T C^u p(u)$

##### Where:

$Y$ is the $n \times f$ matrix of item factors ($n$ items, $f$ factors).

$C^u$ is the $n \times n$ diagonal matrix for user $u$ where $C^u_{ii} = c_{ui}$. This is our confidence matrix for the $n$ items.

$p(u)$ is the vector of preferences for user $u$, containing all $n$ values $p_{ui}$.
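The user update can be sketched directly from the formula. This is a minimal, dense NumPy version for illustration only: `recompute_user_factors`, the toy data, and the defaults for `alpha` and `lam` are assumptions, and the explicit $n \times n$ diagonal matrix is only practical for tiny examples.

```python
import numpy as np

def recompute_user_factors(R, Y, alpha=40.0, lam=0.1):
    """x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u) for every user u."""
    m, n = R.shape
    f = Y.shape[1]
    P = (R > 0).astype(float)    # preferences p_ui
    C = 1.0 + alpha * R          # confidences c_ui
    X = np.zeros((m, f))
    for u in range(m):
        Cu = np.diag(C[u])                   # n x n diagonal confidence matrix
        A = Y.T @ Cu @ Y + lam * np.eye(f)   # f x f system matrix
        b = Y.T @ Cu @ P[u]                  # length-f right-hand side
        X[u] = np.linalg.solve(A, b)         # solve A x = b, no explicit inverse
    return X

# Toy data (made up): 3 users, 4 items, 2 factors.
R = np.array([[3, 0, 0, 1],
              [0, 2, 0, 0],
              [1, 0, 5, 0]], dtype=float)
Y = np.random.default_rng(0).normal(scale=0.1, size=(4, 2))
X = recompute_user_factors(R, Y)
print(X.shape)
```

Using `np.linalg.solve` rather than forming the inverse gives the same $x_u$ and is the usual numerically safer choice.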


#### Recompute item factors

$y_i = (X^T C^i X + \lambda I)^{-1} X^T C^i p(i)$

##### Where:

$X$ is the $m \times f$ matrix of user factors ($m$ users, $f$ factors).

$C^i$ is the $m \times m$ diagonal matrix for item $i$ where $C_{uu}^i = c_{ui}$.

$p(i)$ is the vector of preferences for item $i$, containing all $m$ values $p_{ui}$.
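Putting the two updates together, the alternation can be sketched as below. Since the item update has the same form as the user update with the roles of rows and columns swapped, one helper serves both. The names `solve_factors`, `cost`, and `als`, the toy data, and all hyperparameter values are assumptions for illustration. The dense loop is meant to show the structure, not to scale.

```python
import numpy as np

def solve_factors(C, P, F, lam):
    """For each row r: (F^T C^r F + lam I)^{-1} F^T C^r p(r)."""
    k = F.shape[1]
    out = np.zeros((C.shape[0], k))
    for r in range(C.shape[0]):
        FtC = F.T * C[r]   # broadcasting: same result as F.T @ np.diag(C[r])
        out[r] = np.linalg.solve(FtC @ F + lam * np.eye(k), FtC @ P[r])
    return out

def cost(C, P, X, Y, lam):
    """Confidence-weighted squared error plus regularization."""
    return float((C * (P - X @ Y.T) ** 2).sum()
                 + lam * ((X ** 2).sum() + (Y ** 2).sum()))

def als(R, f=2, alpha=40.0, lam=0.1, sweeps=5, seed=0):
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    X = rng.normal(scale=0.1, size=(m, f))
    Y = rng.normal(scale=0.1, size=(n, f))
    history = [cost(C, P, X, Y, lam)]
    for _ in range(sweeps):
        X = solve_factors(C, P, Y, lam)       # items fixed, recompute users
        Y = solve_factors(C.T, P.T, X, lam)   # users fixed, recompute items
        history.append(cost(C, P, X, Y, lam))
    return X, Y, history

# Toy data (made up): 3 users, 4 items.
R = np.array([[3, 0, 0, 1],
              [0, 2, 0, 0],
              [1, 0, 5, 0]], dtype=float)
X, Y, history = als(R)
print(history)
```

Because each half-step exactly minimizes the cost with the other set of vectors held fixed, the values in `history` should never increase from one sweep to the next.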

# Explaining the recommendation

Start from the equation for the user factors:

$x_u = (Y^T C^u Y + \lambda I)^{-1}  Y^T C^u p(u)$



The predicted preference of user $u$ for item $i$ is

$\hat{p}_{ui} = y_i^Tx_u$

Substituting the expression for $x_u$ gives

$\hat{p}_{ui} =  y_i^T(Y^T C^u Y + \lambda I)^{-1}  Y^T C^u p(u)$

Denote the $f \times f$ matrix $(Y^T C^u Y + \lambda I)^{-1}$ as $W^u$, so that $\hat{p}_{ui} = y_i^T W^u Y^T C^u p(u)$.

$W^u$ can be considered a weighting matrix for user $u$.
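Expanding the matrix product term by term gives $\hat{p}_{ui} = \sum_j (y_i^T W^u y_j) \, c_{uj} \, p_{uj}$, so the prediction splits into one contribution per item $j$ the user interacted with. The sketch below checks this numerically. The name `explain`, the toy data, and the values of `alpha` and `lam` are assumptions for illustration, and reading $y_i^T W^u y_j$ as a user-specific item similarity is one interpretation of the algebra rather than something stated in these notes.

```python
import numpy as np

def explain(R, Y, u, i, alpha=40.0, lam=0.1):
    """Return the predicted preference and its per-item contributions."""
    f = Y.shape[1]
    p_u = (R[u] > 0).astype(float)   # p(u)
    c_u = 1.0 + alpha * R[u]         # diagonal of C^u
    YtC = Y.T * c_u                  # same result as Y.T @ np.diag(c_u)
    W_u = np.linalg.inv(YtC @ Y + lam * np.eye(f))   # f x f weighting matrix
    x_u = W_u @ YtC @ p_u            # user factors from the update formula
    sims = Y[i] @ W_u @ Y.T          # y_i^T W^u y_j for every item j
    contributions = sims * c_u * p_u # zero wherever p_uj = 0
    return float(Y[i] @ x_u), contributions

# Toy data (made up): 3 users, 4 items, 2 factors.
R = np.array([[3, 0, 0, 1],
              [0, 2, 0, 0],
              [1, 0, 5, 0]], dtype=float)
Y = np.random.default_rng(0).normal(size=(4, 2))
p_hat, contributions = explain(R, Y, u=0, i=2)
print(p_hat, contributions.sum())
```

The two printed numbers should agree up to floating-point error, and only the items user 0 interacted with (items 0 and 3 in this toy matrix) carry a non-zero contribution.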
