# Hierarchical Matrix Factorization

## What is Matrix Factorization? 

Matrix factorization is a dimensionality reduction technique that can be expressed as the following general optimization problem:

$$ \min_{\textbf{U}, \textbf{V}} \quad \mathcal{L} \ \big( \ \textbf{X}, \textbf{U} \textbf{V}^\top \ \big) $$

where:

$$
\begin{aligned}
\quad\textbf{X} & \quad \text{is an} \  m \times n \ \text{input matrix}, \\
\quad\textbf{U} & \quad \text{is an} \  m \text{-row output matrix}, \\
\quad\textbf{V} & \quad \text{is an} \  n \text{-row output matrix}, \\
\quad\mathcal{L} & \quad \text{is a loss function}.
\end{aligned}
$$
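For illustration, one common concrete instance (an assumption here, not necessarily the loss used later in this notebook) takes $\mathcal{L}$ to be the squared error over the observed entries of $\textbf{X}$ plus a Frobenius-norm penalty. Write $\Omega$ for the set of observed index pairs $(i, j)$, $\textbf{u}_i$ and $\textbf{v}_j$ for the $i$-th row of $\textbf{U}$ and the $j$-th row of $\textbf{V}$, and $\lambda \ge 0$ for a regularization weight. Since $(\textbf{U}\textbf{V}^\top)_{i,j} = \textbf{u}_i^\top \textbf{v}_j$, the problem becomes:

$$ \min_{\textbf{U}, \textbf{V}} \quad \sum_{(i,j) \in \Omega} \big( x_{i,j} - \textbf{u}_i^\top \textbf{v}_j \big)^2 + \lambda \big( \lVert \textbf{U} \rVert_F^2 + \lVert \textbf{V} \rVert_F^2 \big) $$

Each reconstructed entry is therefore the inner product of a latent vector for row $i$ and a latent vector for column $j$.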

The input matrix is typically very sparse: most of the elements $x_{i,j}$, with $i \in \{1, \dots, m \}$ and $j \in \{1, \dots, n \}$, are missing (unobserved). The information that *is* contained in the input matrix is encoded in a latent space. Both output matrices have the same number of columns, denoted $k$, where $k \ll \min(m, n)$, so every row of $\textbf{U}$ and every row of $\textbf{V}$ is a $k$-dimensional latent vector, and the product $\textbf{U}\textbf{V}^\top$ approximates the observed entries of $\textbf{X}$. The choice of loss function depends on the problem space represented by the input matrix.
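A minimal sketch of the general problem in PyTorch, assuming squared error over the observed entries as the loss $\mathcal{L}$, a small Frobenius-norm penalty, and Adam as the optimizer. The helper names `masked_sq_error` and `factorize`, the synthetic rank-3 data, and all hyperparameters (`lam`, `steps`, `lr`) are illustrative choices, not part of the original notebook. A boolean `mask` marks which entries of `X` are observed; every other entry is ignored by the loss.

```python
import torch


def masked_sq_error(X, X_hat, mask):
    """Sum of squared errors over observed entries only (where mask is True)."""
    return ((X - X_hat)[mask] ** 2).sum()


def factorize(X, mask, k, lam=1e-3, steps=2000, lr=0.05, seed=0):
    """Fit X ~ U @ V.T on the observed entries of X using Adam.

    X:    (m, n) tensor; values where mask is False are ignored.
    mask: (m, n) bool tensor marking observed entries.
    k:    latent dimension (number of columns of U and V).
    lam:  weight of the Frobenius-norm penalty (hypothetical default).
    """
    gen = torch.Generator().manual_seed(seed)
    m, n = X.shape
    # Small random initialisation of the two factor matrices.
    U = (0.1 * torch.randn(m, k, generator=gen)).requires_grad_()
    V = (0.1 * torch.randn(n, k, generator=gen)).requires_grad_()
    opt = torch.optim.Adam([U, V], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        penalty = lam * (U.pow(2).sum() + V.pow(2).sum())
        loss = masked_sq_error(X, U @ V.T, mask) + penalty
        loss.backward()
        opt.step()
    return U.detach(), V.detach()


# Toy usage: a synthetic rank-3 matrix with roughly 30% of entries observed.
torch.manual_seed(0)
m, n, k = 50, 40, 3
X = torch.randn(m, k) @ torch.randn(n, k).T
mask = torch.rand(m, n) < 0.3

U, V = factorize(X, mask, k)
print(U.shape, V.shape)  # torch.Size([50, 3]) torch.Size([40, 3])

# Relative reconstruction error on the observed entries.
rel_err = masked_sq_error(X, U @ V.T, mask) / (X[mask] ** 2).sum()
print(f"relative error on observed entries: {rel_err.item():.4f}")
```

Because only masked entries enter the loss, the values stored at unobserved positions do not matter; estimates for those positions are read off `U @ V.T`. A different problem space would call for a different loss, which in this sketch means replacing `masked_sq_error`.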

This technique has a variety of practical applications. Matrix factorization was used by the winners of the famous [Netflix Prize](https://en.wikipedia.org/wiki/Netflix_Prize) and, in the recommendation system domain, is one of the most widely used approaches to *collaborative filtering*.

## What is Hierarchical Matrix Factorization? 

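```python
import torch
```

```python
# Five samples drawn uniformly from [0, 1)
torch.rand(5)
```

```
tensor([0.2761, 0.6389, 0.5799, 0.1393, 0.7162])
```

```python
# Device handle for PyTorch's Metal Performance Shaders (MPS) backend on Apple silicon
torch.device('mps')
```

```
device(type='mps')
```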
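Constructing `torch.device('mps')` only creates a device handle; it does not check that the MPS backend is actually usable on the current machine. A minimal sketch of picking a device with a CPU fallback, assuming a PyTorch release recent enough to provide `torch.backends.mps.is_available()` (1.12 or later); the helper name `pick_device` is hypothetical:

```python
import torch


def pick_device() -> torch.device:
    """Prefer the MPS backend when it is usable, otherwise fall back to the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
# Allocate a tensor directly on the chosen device.
x = torch.rand(5, device=device)
print(x.device.type)
```

The same notebook then runs on Apple silicon and on machines without MPS support, since tensors are created on whichever device `pick_device` returns.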