import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");

# Singular Value Decomposition

Singular value decomposition (SVD) factorizes a matrix into the product of three matrices. It is used in applications such as data compression, denoising, and solving linear systems of equations. In scientific machine learning, it underlies principal component analysis (PCA), the Karhunen-Loève transform, dynamic mode decomposition, and proper orthogonal decomposition.

More details on the theory can be found in the book [Data-driven Science and Engineering](https://databookuw.com/).

Let $\mathbf{X}$ be an $n \times m$ matrix.
Think of $\mathbf{X}$ as the matrix you obtain by running $n$ experiments and measuring $m$ different quantities in each one.
The SVD of $\mathbf{X}$ is given by

$$
\mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T
$$

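where $\mathbf{U}$ is an $n \times n$ orthogonal matrix, $\mathbf{\Sigma}$ is an $n \times m$ matrix with non-negative real numbers on the diagonal and zeros elsewhere, and $\mathbf{V}$ is an $m \times m$ orthogonal matrix.
The diagonal entries of $\mathbf{\Sigma}$ are the singular values of $\mathbf{X}$. By convention, they are sorted in decreasing order, $\sigma_1 \geq \sigma_2 \geq \dots \geq 0$.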
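The shapes in this definition can be checked numerically. The following is a minimal sketch using `numpy.linalg.svd` on a random matrix. The sizes `n = 6`, `m = 4`, the seed, and the variable names are arbitrary choices made for illustration. Note that NumPy returns the singular values as a 1-D array and returns $\mathbf{V}^T$ rather than $\mathbf{V}$, so $\mathbf{\Sigma}$ has to be assembled by hand.

```python
import numpy as np

# Illustrative data: a random 6 x 4 matrix (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
n, m = 6, 4
X = rng.standard_normal((n, m))

# Full SVD: U is n x n, s holds the min(n, m) singular values, Vt is V^T (m x m).
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Assemble the n x m matrix Sigma with the singular values on its diagonal.
r = min(n, m)
Sigma = np.zeros((n, m))
Sigma[:r, :r] = np.diag(s)

# Reconstruct X from the three factors.
X_rec = U @ Sigma @ Vt

print("shapes:", U.shape, Sigma.shape, Vt.shape)
print("reconstruction matches X:", np.allclose(X, X_rec))
print("U is orthogonal:", np.allclose(U.T @ U, np.eye(n)))
print("V is orthogonal:", np.allclose(Vt @ Vt.T, np.eye(m)))
```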

## Economy-size SVD
Assume that $n \geq m$.
Then, $\mathbf{\Sigma}$ has the form:

$$
\mathbf{\Sigma} = \begin{bmatrix}
\hat{\mathbf{\Sigma}} \\
\mathbf{0}
\end{bmatrix},
$$

where $\hat{\mathbf{\Sigma}}$ is an $m \times m$ diagonal matrix with non-negative real numbers on the diagonal, and $\mathbf{0}$ is an $(n-m) \times m$ matrix of zeros.
The last $n-m$ columns of $\mathbf{U}$ multiply only the zero block, so they contribute nothing to the product: only the first $m$ columns of $\mathbf{U}$ are needed to represent $\mathbf{X}$.
We write the **economy-size** SVD as:

$$
\mathbf{X} = \mathbf{U}_m \hat{\mathbf{\Sigma}} \mathbf{V}^T,
$$

where $\mathbf{U}_m$ is an $n \times m$ matrix with the first $m$ columns of $\mathbf{U}$.
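As a rough illustration, NumPy computes the economy-size factorization when `full_matrices=False` is passed to `numpy.linalg.svd`. The sketch below, on an arbitrary random $6 \times 4$ matrix, also checks that dropping the last $n-m$ columns of the full $\mathbf{U}$ still reproduces $\mathbf{X}$. Variable names such as `U_m` and `s_hat` are chosen here to mirror the notation above.

```python
import numpy as np

# Illustrative data: a random 6 x 4 matrix with n >= m (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
n, m = 6, 4
X = rng.standard_normal((n, m))

# Full SVD for comparison: U is n x n.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Economy-size SVD: U_m is n x m, s_hat has m entries, Vt_hat is m x m.
U_m, s_hat, Vt_hat = np.linalg.svd(X, full_matrices=False)
X_econ = U_m @ np.diag(s_hat) @ Vt_hat

# Keeping only the first m columns of the full U also reproduces X,
# because the remaining columns multiply the zero block of Sigma.
X_from_full = U[:, :m] @ np.diag(s) @ Vt

print("U_m shape:", U_m.shape)
print("economy SVD reproduces X:", np.allclose(X, X_econ))
print("first m columns of full U suffice:", np.allclose(X, X_from_full))
```

For tall matrices with $n \gg m$, the economy-size form avoids storing the $n \times n$ matrix $\mathbf{U}$, which is usually the main practical reason to prefer it.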

## Truncated SVD

The truncated SVD is a low-rank approximation of $\mathbf{X}$.
It is given by:

$$
\mathbf{X} \approx \mathbf{U}_k \hat{\mathbf{\Sigma}}_k \mathbf{V}_k^T,
$$

where $\mathbf{U}_k$ is an $n \times k$ matrix with the first $k$ columns of $\mathbf{U}$, $\hat{\mathbf{\Sigma}}_k$ is a $k \times k$ diagonal matrix with the first $k$ singular values of $\mathbf{\Sigma}$, and $\mathbf{V}_k$ is an $m \times k$ matrix with the first $k$ columns of $\mathbf{V}$.

We can also write:

$$
\mathbf{X} \approx \mathbf{X}_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T,
$$

where $\sigma_i$ is the $i$-th singular value, and $\mathbf{u}_i$ and $\mathbf{v}_i$ are the $i$-th columns of $\mathbf{U}$ and $\mathbf{V}$, respectively.
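The two expressions for the truncated SVD can be compared directly. The sketch below builds $\mathbf{X}_k$ once as a product of the truncated factors and once as a sum of rank-one outer products. The matrix, the seed, and the choice `k = 2` are arbitrary and serve only as an illustration.

```python
import numpy as np

# Illustrative data: a random 6 x 4 matrix (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

# Economy-size SVD; the rows of Vt are the columns of V.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # truncation rank, chosen arbitrarily for this example

# Matrix form: U_k (n x k) times Sigma_k (k x k) times V_k^T (k x m).
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Sum form: sigma_i * u_i * v_i^T for the first k singular triplets.
X_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

print("both forms agree:", np.allclose(X_k, X_k_sum))
print("rank of X_k:", np.linalg.matrix_rank(X_k))
```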

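One can show that, with the singular values sorted in decreasing order, $\mathbf{X}_k$ is the best approximation of $\mathbf{X}$ among all matrices of rank at most $k$ in the Frobenius norm. This result is known as the Eckart-Young theorem.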
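A small numerical experiment can make this optimality statement concrete, although it is of course not a proof. The sketch below relies on the standard identity $\lVert \mathbf{X} - \mathbf{X}_k \rVert_F = \sqrt{\sum_{i>k} \sigma_i^2}$ and compares the truncated SVD against one alternative approximation of rank at most $k$, obtained by projecting $\mathbf{X}$ onto a random $k$-dimensional subspace. The alternative, the matrix sizes, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Illustrative data: a random 6 x 4 matrix (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
n, m, k = 6, 4, 2
X = rng.standard_normal((n, m))

# Truncated SVD of rank k.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error of the truncated SVD and its closed form
# in terms of the discarded singular values.
err_k = np.linalg.norm(X - X_k, "fro")
err_formula = np.sqrt(np.sum(s[k:] ** 2))

# A competing approximation of rank at most k: project X onto the span
# of k random orthonormal directions (Q has orthonormal columns).
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
X_alt = Q @ (Q.T @ X)
err_alt = np.linalg.norm(X - X_alt, "fro")

print("truncated SVD error:", err_k)
print("error from discarded singular values:", err_formula)
print("random-projection error:", err_alt)
```

In this experiment the random projection should never beat the truncated SVD, and the first two printed numbers should agree up to floating-point error.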