# Latent Factor Models

## I. Matrix Factorization

### What is Matrix Factorization? 

Matrix factorization is a dimensionality reduction technique which can be expressed as the following general optimization problem:

$$ \min_{\textbf{U}, \textbf{V}} \quad \mathcal{L} \ \big( \ \textbf{X}, \textbf{U} \textbf{V}^\top \ \big) $$

where:

$$
\begin{aligned}
\textbf{X} & \quad \text{is an } m \times n \text{ input matrix}, \\
\textbf{U} & \quad \text{is an } m \times k \text{ output matrix}, \\
\textbf{V} & \quad \text{is an } n \times k \text{ output matrix}, \\
k & \quad \text{is the number of latent dimensions}, \\
\mathcal{L} & \quad \text{is a loss function}.
\end{aligned}
$$

Minimizing the loss **factors** the input matrix into the product of two *thin* output matrices. The input matrix is typically very sparse: most of the elements $x_{ij}$, $i \in \{1, \dots, m\}$, $j \in \{1, \dots, n\}$, are empty (unobserved). The information that *is* present in the input matrix is encoded in a latent space $\mathbb{R}^k$ by giving both output matrices $k$ columns, with $k \ll \min(m, n)$. Row $i$ of $\textbf{U}$ and row $j$ of $\textbf{V}$ are then $k$-dimensional latent vectors, and their dot product, entry $(i, j)$ of $\textbf{U}\textbf{V}^\top$, is what the loss compares against $x_{ij}$.
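To make the shapes concrete, the sketch below uses small, arbitrary sizes (assumed for illustration, not taken from the dataset) and checks that $\textbf{U}\textbf{V}^\top$ has the shape of $\textbf{X}$ while the factors store far fewer numbers than the dense matrix.

```python
import numpy as np

# Small illustrative sizes (assumptions); k is the latent dimension, k << min(m, n)
m, n, k = 1_000, 200, 8

rng = np.random.default_rng(0)
U = rng.normal(size=(m, k))  # one k-dimensional latent vector per row i
V = rng.normal(size=(n, k))  # one k-dimensional latent vector per column j

X_hat = U @ V.T              # reconstruction of the full m x n matrix
print(X_hat.shape)           # (1000, 200)

# Entry (i, j) of U V^T is the dot product of the two latent vectors
i, j = 3, 7
assert np.isclose(X_hat[i, j], U[i] @ V[j])

# Numbers stored by the factors vs. entries in the dense matrix
print((m + n) * k, m * n)    # 9600 200000
```

The saving grows with the matrix: the factors need $(m + n)k$ numbers instead of $mn$.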

The choice of loss function depends on the problem space represented by the non-empty elements of the input matrix. For example, the winners of the famous [Netflix Prize](https://en.wikipedia.org/wiki/Netflix_Prize) factored an input matrix whose non-empty elements represent **ratings**; in that application $x_{ij} \in \{1, \dots, 5\}$.
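For ratings, a common (but not the only) choice of $\mathcal{L}$ is squared error summed over the observed entries only, usually with an L2 penalty on the factors. The sketch below is a generic version of that loss, not a reproduction of any Prize entry; it assumes the observed ratings are stored as parallel arrays `rows`, `cols`, `vals`, and the function name and regularization weight `lam` are illustrative.

```python
import numpy as np

def masked_squared_loss(rows, cols, vals, U, V, lam=0.1):
    """Squared error over observed (i, j) pairs plus an L2 penalty on U and V."""
    # Predicted rating for each observed pair: dot product of latent vectors
    preds = np.sum(U[rows] * V[cols], axis=1)
    data_term = np.sum((vals - preds) ** 2)
    reg_term = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return data_term + reg_term

# Three observed ratings in a 2 x 2 matrix (toy data); entry (1, 0) is empty
rows = np.array([0, 0, 1])
cols = np.array([0, 1, 1])
vals = np.array([5.0, 3.0, 1.0])
U = np.array([[1.0, 2.0], [0.0, 1.0]])
V = np.array([[1.0, 2.0], [1.0, 1.0]])
print(masked_squared_loss(rows, cols, vals, U, V, lam=0.0))  # 0.0
```

The empty entry never enters the sum, which is what lets the factorization "fill it in" with the prediction $\mathbf{u}_1^\top \mathbf{v}_0$.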

We now consider a problem space in which the elements $x_{ij} \in \{0, 1\}$ represent **interactions**: $x_{ij} = 1$ when row $i$ has interacted with column $j$.
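For binary interactions, squared error is often replaced by binary cross-entropy on $\sigma(\mathbf{u}_i^\top \mathbf{v}_j)$, read as the predicted probability of an interaction. Because usually only the 1s are recorded, the 0s are commonly obtained by sampling columns a row has not interacted with (negative sampling). The sketch below assumes such labelled pairs are already available; it shows one common formulation, not necessarily the loss used later in this notebook, and the function name is illustrative.

```python
import numpy as np

def bce_loss(rows, cols, labels, U, V, eps=1e-12):
    """Mean binary cross-entropy between labels in {0, 1} and sigmoid(u_i . v_j)."""
    logits = np.sum(U[rows] * V[cols], axis=1)
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

# One observed pair (label 1) and one sampled negative pair (label 0), toy values
rows = np.array([0, 0])
cols = np.array([0, 1])
labels = np.array([1.0, 0.0])
U = np.array([[1.0, 0.0]])
V = np.array([[0.0, 0.0], [0.0, 0.0]])
print(bce_loss(rows, cols, labels, U, V))  # log(2), about 0.693
```

With all-zero column vectors every logit is 0, so every prediction is 0.5 and the loss equals $\log 2$, the value of an uninformed guess.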

### Interaction Dataset

We will build a synthetic interaction dataset in which $m$ and $n$ are large ($m = 10^6$ rows, $n = 10^3$ columns). Each row $i$ will interact with up to $4$ columns $j$:


```python
from src.utils import make_interactions_train_test

m = 1_000_000    # number of rows i
n = 1_000        # number of columns j
max_j_per_i = 4  # maximum number of interactions per row

interactions_train, interactions_test = make_interactions_train_test(m, n, max_j_per_i)

n_train, n_test = len(interactions_train), len(interactions_test)
print(f'There are {n_train} train interactions and {n_test} test interactions.')
print(f'The sparsity of the input matrix is {1 - (n_train + n_test) / (m * n)}.')
```

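The internals of `make_interactions_train_test` are not shown in this notebook. A minimal, hypothetical stand-in, assuming interactions are returned as integer arrays of $(i, j)$ pairs and that the split is a plain random hold-out (the function name, `test_frac`, and `seed` are illustrative), could look like this:

```python
import numpy as np

def sample_interactions_train_test(m, n, max_j_per_i, test_frac=0.2, seed=0):
    """Hypothetical stand-in for `make_interactions_train_test`.

    Each row i draws between 1 and max_j_per_i column indices uniformly with
    replacement; duplicate (i, j) pairs are then dropped, so every row keeps
    between 1 and max_j_per_i distinct columns. A random fraction is held out.
    """
    rng = np.random.default_rng(seed)
    draws_per_i = rng.integers(1, max_j_per_i + 1, size=m)  # upper bound is exclusive
    i = np.repeat(np.arange(m), draws_per_i)
    j = rng.integers(0, n, size=i.size)
    pairs = np.unique(np.stack([i, j], axis=1), axis=0)     # drop duplicate (i, j)

    perm = rng.permutation(len(pairs))
    n_test = round(test_frac * len(pairs))
    return pairs[perm[n_test:]], pairs[perm[:n_test]]

m, n, max_j_per_i = 10_000, 1_000, 4  # smaller m than above, so it runs quickly
train, test = sample_interactions_train_test(m, n, max_j_per_i)

# Per-row interaction counts lie in [1, max_j_per_i]
per_i = np.bincount(np.concatenate([train, test])[:, 0], minlength=m)
print(per_i.min(), per_i.max())
print(1 - (len(train) + len(test)) / (m * n))  # sparsity
```

With the notebook's settings, at most $4m = 4 \times 10^6$ of the $mn = 10^9$ cells are filled, so the sparsity is at least $1 - 0.004 = 0.996$. A purely random split can leave a row with all of its interactions in the test set; holding out one interaction per row (leave-one-out) is a common alternative when every row must appear in training.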

Next, we build a dictionary `D` from the training interactions with `build_dict`:

```python
from src.utils import build_dict

D = build_dict(interactions_train)
```

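The notebook does not show what `build_dict` returns. If, as the name suggests, it maps each row index $i$ to the columns $j$ it has interacted with, a minimal version might look like the hypothetical `build_index` below. Such a lookup makes it cheap to sample columns a row has *not* interacted with, for example as negatives for a binary loss; `sample_negative` is likewise illustrative.

```python
import random
from collections import defaultdict

def build_index(pairs):
    """Hypothetical: map each row index i to the set of columns j it interacts with."""
    index = defaultdict(set)
    for i, j in pairs:
        index[i].add(j)
    return dict(index)

def sample_negative(index, i, n, rng=random):
    """Draw a column j in [0, n) that row i has NOT interacted with.

    Uses rejection sampling, so it assumes row i interacts with fewer than n columns.
    """
    while True:
        j = rng.randrange(n)
        if j not in index.get(i, ()):
            return j

pairs = [(0, 1), (0, 3), (1, 2)]
index = build_index(pairs)
print(index)                                     # {0: {1, 3}, 1: {2}}
print(sample_negative(index, 0, n=4) in {0, 2})  # True
```

Rejection sampling is a good fit here because each row touches at most a handful of the $n$ columns, so almost every draw is accepted on the first try.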

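We also check that PyTorch works and select a compute device, preferring Apple's Metal (`mps`) backend when it is available:

```python
import torch

# Quick sanity check that PyTorch is installed and working
print(torch.rand(5))

# Prefer the Apple-silicon GPU (MPS backend); fall back to the CPU otherwise
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
device
```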

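A minimal PyTorch sketch of the optimization problem for interaction data is shown below. It assumes binary cross-entropy on the logits $\mathbf{u}_i^\top \mathbf{v}_j$, one uniformly sampled negative column per observed pair, and the Adam optimizer; the class and function names, toy data, and hyperparameters are illustrative rather than the notebook's. It runs on the CPU; moving the model and tensors to a selected device with `.to(device)` is omitted for brevity.

```python
import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    """Scores (i, j) pairs with the dot product of learned latent vectors u_i and v_j."""

    def __init__(self, m, n, k):
        super().__init__()
        self.U = nn.Embedding(m, k)  # rows of U: one k-dim vector per i
        self.V = nn.Embedding(n, k)  # rows of V: one k-dim vector per j
        nn.init.normal_(self.U.weight, std=0.1)
        nn.init.normal_(self.V.weight, std=0.1)

    def forward(self, i, j):
        return (self.U(i) * self.V(j)).sum(dim=1)  # logits u_i . v_j

def train_mf(pos_i, pos_j, m, n, k=8, epochs=200, lr=0.05, seed=0):
    """Fit U and V with binary cross-entropy: observed pairs are labelled 1 and
    uniformly sampled columns are labelled 0 (a sample may occasionally be a positive)."""
    torch.manual_seed(seed)
    model = MatrixFactorization(m, n, k)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        neg_j = torch.randint(0, n, pos_j.shape)  # one random negative per positive
        i = torch.cat([pos_i, pos_i])
        j = torch.cat([pos_j, neg_j])
        y = torch.cat([torch.ones(len(pos_i)), torch.zeros(len(pos_i))])
        opt.zero_grad()
        loss = loss_fn(model(i, j), y)
        loss.backward()
        opt.step()
    return model, loss.item()

# Toy data: rows 0-49 interact with columns 0-4, rows 50-99 with columns 5-9
pos_i = torch.arange(100).repeat_interleave(5)
pos_j = torch.cat([torch.arange(5).repeat(50), torch.arange(5, 10).repeat(50)])
model, final_loss = train_mf(pos_i, pos_j, m=100, n=10)
print(final_loss)
```

`nn.Embedding` is used here as a lookup table for the rows of $\textbf{U}$ and $\textbf{V}$, so each step only touches the latent vectors of the sampled pairs rather than materializing the full $m \times n$ product.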