# Alternating Least Squares

## References

http://cs229.stanford.edu/proj2017/final-posters/5147271.pdf

## Intro 

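The ALS algorithm factorizes a matrix by alternating between its rows and columns: it holds the column factors fixed while solving for the row factors, then holds the row factors fixed while solving for the column factors.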

Stochastic Gradient Descent 

- Flexible
- Parallelizable
- Slower
- Hard to handle unobserved interactions (sparsity)

Alternating Least Squares 

- Each step is just an ordinary least squares problem
- Parallelizable
- Faster
- Easy to handle sparsity


## Algorithm

1. Initialize the row factors U and the column factors V
2. Repeat until convergence
    1. for i = 1 to m do   (iterating over rows)
        
        $u_i = \left(\sum_{j : r_{ij} \in r_{i*}} v_j v_j^T + \lambda I_k\right)^{-1} \sum_{j : r_{ij} \in r_{i*}} r_{ij} v_j$
        
       end for  [solving for row factors, with column factors as features]
    
    2. for j = 1 to n do   (iterating over columns)
       
        $v_j = \left(\sum_{i : r_{ij} \in r_{*j}} u_i u_i^T + \lambda I_k\right)^{-1} \sum_{i : r_{ij} \in r_{*j}} r_{ij} u_i$
       
       end for [solving for column factors, with row factors as features]
       
       
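Where\
    U = row factor matrix\
    V = column factor matrix\
    r = ratings ($r_{i*}$ = observed ratings in row i, $r_{*j}$ = observed ratings in column j)\
    k = number of latent factors ($I_k$ is the k x k identity matrix)\
    $\lambda$ = regularization strength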
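
The alternating updates above can be sketched in NumPy as follows. This is a minimal illustration, not a tuned implementation: the names `als`, `regularized_loss`, `mask`, `lam` and the small ratings matrix are made up for the example, zeros in `R` are assumed to mean "not rated", and V is stored with one row per column factor so that R ≈ U Vᵀ.

```python
import numpy as np

def als(R, mask, k=2, lam=0.1, n_iters=20, seed=0):
    """Factorize R ~ U @ V.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))   # row factors, one row per u_i
    V = rng.normal(scale=0.1, size=(n, k))   # column factors, one row per v_j
    reg = lam * np.eye(k)
    for _ in range(n_iters):
        # Fix V: each u_i is a regularized least squares solution.
        for i in range(m):
            Vi = V[mask[i]]                   # v_j for columns observed in row i
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ R[i, mask[i]])
        # Fix U: each v_j is a regularized least squares solution.
        for j in range(n):
            Uj = U[mask[:, j]]                # u_i for rows observed in column j
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ R[mask[:, j], j])
    return U, V

def regularized_loss(R, mask, U, V, lam):
    """Squared error on observed entries plus the L2 penalty on both factors."""
    err = (R - U @ V.T)[mask]
    return np.sum(err ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

# Hypothetical ratings; 0 marks an unobserved entry.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

U, V = als(R, mask)
print("regularized loss:", regularized_loss(R, mask, U, V, lam=0.1))
```

Because every inner solve exactly minimizes the regularized loss with respect to one factor vector, the loss cannot increase from one sweep to the next, which is why a fixed number of sweeps is often enough in practice.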
    
    
    
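Similarity between ALS and OLS with L2 regularization: each ALS update has the same form as the closed-form ridge regression solution, with the fixed factors playing the role of the feature matrix X and the observed ratings playing the role of the target Y.

$\theta = (X^T X + \lambda I)^{-1} X^T Y$

$v_j = \left(\sum_{i : r_{ij} \in r_{*j}} u_i u_i^T + \lambda I_k\right)^{-1} \sum_{i : r_{ij} \in r_{*j}} r_{ij} u_i$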
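
As a quick numerical check of this correspondence, the sketch below solves one column update both ways on random data. The variable names (`U_obs`, `r_obs`) and the sizes are invented for the example; `U_obs` stands for the factors of the rows that have an observed rating in the column being updated.

```python
import numpy as np

rng = np.random.default_rng(0)
k, lam = 3, 0.5
U_obs = rng.normal(size=(6, k))   # factors u_i of the 6 rows observed in this column
r_obs = rng.normal(size=6)        # the corresponding ratings r_ij

# OLS with L2 regularization, taking X = U_obs and Y = r_obs.
theta = np.linalg.solve(U_obs.T @ U_obs + lam * np.eye(k), U_obs.T @ r_obs)

# ALS column update written as explicit sums over the observed rows.
A = sum(np.outer(u, u) for u in U_obs) + lam * np.eye(k)
b = sum(r * u for r, u in zip(r_obs, U_obs))
v_j = np.linalg.solve(A, b)

print(np.allclose(theta, v_j))
```

The two agree because $X^T X = \sum_i u_i u_i^T$ and $X^T Y = \sum_i r_{ij} u_i$ when the rows of X are the vectors $u_i$.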

## Approach


```sql
    
              movies (n)                         k                  n
            +---------------------+        +------+   +----------------------+
            |                     |        |      | x |      movie           | k
            |                     |        | user |   |      factor          |      V
            |                     |        |factor|   +----------------------+
  users(m)  |                     |        |      |
            |     RATINGS         |    ~   |      |
            |                     |    ~   |      | m
            |                     |        |      |
            |                     |        |      |
            |                     |        |      |
            +---------------------+        +------+
                                                U
```

Iterative Algorithm 

* fix V, compute U
* fix U, compute V

## Compare with SVD


SVD is another matrix factorization method.

More here: https://machinelearningexploration.readthedocs.io/en/latest/MathExploration/SingularValueDecomposition.html

```sql

                                        +----------------------+
                                      K |______________________| 
                                        |                      |  
  items                            K    +----------------------+        
+---------------------+        +-------+   
|                     |        |   |   |   
|u                    |        |   |   |   
|s                    |        |   |   |
|e                    |    ~   |   |   |
|r                    |    ~   |   |   | 
|s                    |        |   |   |
|                     |        |   |   |
|                     |        |   |   |
+---------------------+        +-------+
           A                      U       x       VT


                 row factors (user embeddings)
               /
Factorization 
               \ 
                column factors (item embeddings)
                
                
```


In SVD, the missing observations have to be filled in with zeros, so the loss is computed over every entry of the matrix.

$\| A - U V^T \|_F^2$

```sql
        item1  item2  item3   item4
       +------+------+------+------+
 user1 |  1   |  0   |  0   |  1   |
       +------+------+------+------+
 user2 |  0   |  1   |  0   |  0   |
       +------+------+------+------+
 user3 |  0   |  1   |  0   |  0   |
       +------+------+------+------+
 user4 |  0   |  0   |  1   |  0   |
       +------+------+------+------+
   
```
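
A small sketch of this on the zero-filled matrix above, using NumPy's SVD and keeping `k = 2` factors. The choice of `k` and the convention of folding the singular values into the user embeddings are assumptions made for the example.

```python
import numpy as np

# Zero-filled interaction matrix from the table above (users x items).
A = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)

k = 2
U_full, s, Vt = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :k] * s[:k]    # user embeddings, singular values folded in
V = Vt[:k].T                 # item embeddings
A_hat = U @ V.T              # rank-k approximation of the zero-filled matrix

# Squared Frobenius norm of the residual, counted over every cell.
error = np.linalg.norm(A - A_hat) ** 2
print("reconstruction error:", error)
```

Every zero counts toward this error exactly like an observed value, so the factorization is pushed to reproduce the filled-in zeros as well.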


In ALS we use rows and columns alternately as features, so there is no need to fill in missing values: the loss is summed over the observed entries only.

$\sum_{(i,j) \in obs} (A_{ij} - U_i V_j^T)^2$

```sql
        item1  item2  item3   item4
       +------+------+------+------+
 user1 |  1   |      |      |  1   |
       +------+------+------+------+
 user2 |      |  1   |      |      |
       +------+------+------+------+
 user3 |      |  1   |      |      |
       +------+------+------+------+
 user4 |      |      |  1   |      |
       +------+------+------+------+
   
```
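
The difference between the two losses can be seen on the matrix above. In this sketch, missing cells are stored as NaN and the factors are a deliberately trivial rank-1 guess (all ones); both choices, and the names `obs` and `observed_loss`, are assumptions made for illustration.

```python
import numpy as np

nan = np.nan
A = np.array([[1,   nan, nan, 1  ],
              [nan, 1,   nan, nan],
              [nan, 1,   nan, nan],
              [nan, nan, 1,   nan]])
obs = ~np.isnan(A)                 # True where a value was observed

def observed_loss(A, obs, U, V):
    """Squared error summed over observed entries only."""
    pred = U @ V.T
    return np.sum((A[obs] - pred[obs]) ** 2)

# Trivial rank-1 factors that predict 1 for every cell.
U = np.ones((4, 1))
V = np.ones((4, 1))

loss_obs = observed_loss(A, obs, U, V)

# The same factors scored against the zero-filled matrix.
A_zero = np.where(obs, A, 0.0)
loss_zero_filled = np.sum((A_zero - U @ V.T) ** 2)

print(loss_obs, loss_zero_filled)
```

The observed-only loss ignores the 11 empty cells, while the zero-filled loss penalizes each of them for not being predicted as zero.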


# Weighted Alternating Least Squares (WALS)

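\begin{align}
    \sum_{(i,j) \in obs} (A_{ij} - U_i V_j^T)^2 + w_0 \times \sum_{(i,j) \notin obs} (0 - U_i V_j^T)^2
\end{align}

Here $w_0$ is a hyperparameter that weights the penalty on unobserved entries, which are treated as zeros.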
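
A minimal sketch of evaluating this kind of weighted objective is shown below. It assumes the unobserved term is added as a penalty, as in the usual WALS formulation, and it calls the weight `w0`; the function name `wals_loss`, the NaN encoding of missing cells, and the all-ones factors are illustrative choices.

```python
import numpy as np

def wals_loss(A, obs, U, V, w0):
    """Observed squared error plus w0 times the squared predictions on unobserved cells."""
    pred = U @ V.T
    observed_term = np.sum((A[obs] - pred[obs]) ** 2)
    unobserved_term = np.sum((0.0 - pred[~obs]) ** 2)
    return observed_term + w0 * unobserved_term

nan = np.nan
A = np.array([[1,   nan, nan, 1  ],
              [nan, 1,   nan, nan],
              [nan, 1,   nan, nan],
              [nan, nan, 1,   nan]])
obs = ~np.isnan(A)

U = np.ones((4, 1))   # trivial factors that predict 1 everywhere
V = np.ones((4, 1))

loss_unweighted = wals_loss(A, obs, U, V, w0=0.0)   # observed-only loss
loss_weighted = wals_loss(A, obs, U, V, w0=0.1)     # unobserved cells count a little
print(loss_unweighted, loss_weighted)
```

Setting `w0 = 0` recovers the observed-only ALS loss, and `w0 = 1` recovers the zero-filled loss, so the weight interpolates between the two treatments of missing data.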