# Lecture 3: Singular Value Decomposition

Unlike eigendecomposition, SVD can be performed on any matrix $A$, not just on square ones.

It exposes useful and interesting properties of $A$:

 * rank
 * null-space
 * orthonormal bases for the column and row spaces
 
Applications of SVD:

 * computing pseudo-inverse
 * least-squares fitting of data
 * matrix approximation
 * determining rank, range, and null space

![svd](img/svd.png)

## Properties of SVD

1. $U, V$ orthogonal (rows and cols are orthonormal vectors)
2. $U \in \mathbb{R}^{M \times M}, U^TU = UU^T = I$; likewise $V \in \mathbb{R}^{N \times N}, V^TV = VV^T = I$
3. $D \in \mathbb{R}^{M \times N}$, diagonal
 * padded with 0-rows at the bottom if M > N
 * padded with 0-columns right if N > M
4. Elements on $\mathbf{D}$'s diagonal are the **singular values $\sigma_i$**
5. Singular values are ordered from largest to smallest (convention)
6. All singular values are non-negative. This is inherent, not a clamp: $\sigma_i = \sqrt{\lambda_i(A^TA)}$ and $A^TA$ is positive semidefinite, so any sign is absorbed into the singular vectors $u_i$ or $v_i$.
7. $\sigma_i = d_{ii} = 0 \quad \forall i > \operatorname{rank}(A)$

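If M = N and A is symmetric positive semidefinite, then the SVD is identical to the eigendecomposition. For a general symmetric A, the singular values are the absolute values of the eigenvalues.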
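
The properties listed above can be checked numerically. The following is a minimal sketch using `numpy.linalg.svd`; the random test matrix and the variable names are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 3
A = rng.standard_normal((M, N))      # arbitrary non-square test matrix

# Full SVD: U is M x M, Vt (= V^T) is N x N, s holds min(M, N) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the M x N diagonal matrix D (zero rows at the bottom since M > N).
K = len(s)
D = np.zeros((M, N))
D[:K, :K] = np.diag(s)

print(U.shape, D.shape, Vt.shape)
print("U orthogonal:     ", np.allclose(U.T @ U, np.eye(M)))
print("V orthogonal:     ", np.allclose(Vt @ Vt.T, np.eye(N)))
print("sigma >= 0:       ", bool(np.all(s >= 0)))
print("sorted descending:", bool(np.all(np.diff(s) <= 0)))
print("A == U D V^T:     ", np.allclose(U @ D @ Vt, A))
```

Note that NumPy returns $V^T$ rather than $V$, and the singular values as a 1-D array rather than as the matrix $D$.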

### Explicit representations of range, null-space, and rank

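 * For the right singular vectors (in V): $\operatorname{Null}(A) =
 \operatorname{ker}(A)  =
 \operatorname{span}\left(\{ v_i \> | \> i > \operatorname{rank}(A) \}\right) $ (the $v_i$ whose singular value is zero, plus those with no singular value at all when N > M)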
 * For the left singular vectors (in U): $\operatorname{Range}(A) =
 \operatorname{span}\left( \{ u_i \> | \> \sigma_i > 0 \} \right)$
 * $\operatorname{rank}(A) = $ number of non-zero singular values.
 * $\operatorname{nullity}(A) = N - \operatorname{rank}(A)$, which equals the number of zero singular values when $M \geq N$.
 * rank + nullity = N, the number of columns of $A$ (rank-nullity theorem).
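
As a sketch of how these representations can be read off in practice, the example below extracts rank, a range basis, and a null-space basis from the SVD of a small hand-made matrix. The matrix and the tolerance rule (chosen to mirror the default of `numpy.linalg.matrix_rank`) are assumptions for illustration.

```python
import numpy as np

# 3 x 4 matrix of rank 2: the third row is the sum of the first two.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])
M, N = A.shape

U, s, Vt = np.linalg.svd(A)          # full SVD

# Singular values below this tolerance are treated as zero (floating point).
tol = s.max() * max(M, N) * np.finfo(A.dtype).eps
rank = int(np.sum(s > tol))
nullity = N - rank

range_basis = U[:, :rank]            # u_i with sigma_i > 0
null_basis = Vt[rank:].T             # v_i with i > rank(A)

print("rank:", rank, "nullity:", nullity)
print("A maps the null basis to 0:", np.allclose(A @ null_basis, 0))
```

Here N > M, so one of the null-space vectors has no corresponding singular value at all; it still belongs to the kernel.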

In [7]:
# TODO(andrei): Short note on compact SVD + example from numpy.linalg
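
A possible starting point for this note: `numpy.linalg.svd` switches between the full and the reduced factorization through its `full_matrices` flag. The sketch below uses a random tall matrix as an assumed example. Strictly speaking, `full_matrices=False` keeps $\min(M, N)$ columns; a compact SVD in the narrow sense would keep only $\operatorname{rank}(A)$ of them.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))      # tall matrix, M = 6, N = 3

U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=True)
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)

print("full U:", U_full.shape)       # M x M
print("thin U:", U_thin.shape)       # M x min(M, N)

# The dropped columns of U only ever multiply zero rows of D,
# so the thin factors still reproduce A exactly.
A_rebuilt = U_thin @ np.diag(s_thin) @ Vt_thin
print("reconstruction ok:", np.allclose(A_rebuilt, A))
```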

## Matrix Approximation via SVD

### Frobenius norm

$$
\|A\|_F := \sqrt{\sum_{i=1}^M\sum_{j=1}^Na_{ij}^2} = \sqrt{\operatorname{trace}(A^TA)}
 = \sqrt{\sum_{i=1}^K\sigma_i^2}, \quad K = \min(M, N)
$$

The Frobenius norm can be expressed as the $\ell_2$ norm of the vector of singular values, i.e. the square root of the sum of squared singular values.

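Proof follows from the definition of trace (sum of diagonal elements), and its cyclic nature, $\operatorname{trace}(XYZ) = \operatorname{trace}(ZXY)$.

With $A = UDV^T$ we get $A^TA = VD^TDV^T$, since $U^TU = I$. Cycling the trace and using $V^TV = I$ gives $\operatorname{trace}(VD^TDV^T) = \operatorname{trace}(V^TVD^TD) = \operatorname{trace}(D^TD) = \sum_{i=1}^K \sigma_i^2$.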
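
The three expressions for the Frobenius norm (entries, trace, singular values) can be compared numerically. This is a small sanity check on an assumed random matrix, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)    # singular values only

fro_entries = np.sqrt(np.sum(A**2))       # sqrt of sum of squared entries
fro_trace = np.sqrt(np.trace(A.T @ A))    # sqrt(trace(A^T A))
fro_sv = np.sqrt(np.sum(s**2))            # l2 norm of the singular values
fro_numpy = np.linalg.norm(A, 'fro')      # NumPy's built-in Frobenius norm

print(fro_entries, fro_trace, fro_sv, fro_numpy)
```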


### Singular Values and Matrix Norms

Matrix 2-norm = largest singular value:

$$ \| A \|_2 = \sup\left\{ \|Ax\|_2 : \|x\|_2 = 1 \right\} = \sigma_1 $$

From Wikipedia: In the special case of p = 2 (the Euclidean norm), the induced matrix norm is the spectral norm. The spectral norm of a matrix A is the largest singular value of A i.e. the square root of the largest eigenvalue of the positive-semidefinite matrix $A^TA$.
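
A quick numerical illustration of the spectral norm, assuming a random test matrix: the largest singular value matches `numpy.linalg.norm(A, 2)` and the square root of the largest eigenvalue of $A^TA$, and the supremum is attained at the first right singular vector $v_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A)

sigma_1 = s[0]
spectral = np.linalg.norm(A, 2)                          # induced 2-norm
sqrt_eig = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())    # sqrt of largest eigenvalue

# ||A x||_2 reaches sigma_1 at x = v_1 (first row of V^T).
attained = np.linalg.norm(A @ Vt[0])

# Random unit vectors give values of ||A x||_2 that stay below sigma_1.
X = rng.standard_normal((4, 1000))
X /= np.linalg.norm(X, axis=0)
sampled_max = np.linalg.norm(A @ X, axis=0).max()

print(sigma_1, spectral, sqrt_eig, attained, sampled_max)
```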

### Eckart-Young Theorem

**Reduced SVD** (truncated to the K largest singular values) provides an optimal low-rank approximation in the sense of the Frobenius norm.

Define:
$$
\mathbf{A}_K := \sum_{i=1}^K \sigma_i u_i v_i^T, \quad \operatorname{rank}(\mathbf{A}_K )= K
$$

Then $A_K$ is the best approximation of $A$ among all matrices $B$ of rank $K$:

$$
\min_{\operatorname{rank}(B)=K} \| A - B \|_F^2 = \| A - A_K \|_F^2
= \sum_{k=K+1}^{\operatorname{rank}(A)}\sigma_k^2
$$

In other words, for a fixed rank K approximation, the minimum squared error we can get is the sum of squares of the (rank(A) - K) **smallest non-zero singular values**. I.e. the best rank K approximation is given by the K **largest singular values** and their singular vectors.

It can also be shown that this approximation is not only optimal in the sense of the Frobenius norm, but also in the sense of the **matrix 2-norm**.
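
The theorem can be illustrated with a short sketch. The helper `truncated_svd` and the random matrices are assumptions made for this example; comparing against one random rank-K matrix only illustrates optimality and does not prove it.

```python
import numpy as np

def truncated_svd(A, K):
    """Rank-K approximation A_K = sum_{i<=K} sigma_i u_i v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :K] * s[:K]) @ Vt[:K]

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
K = 2

s = np.linalg.svd(A, compute_uv=False)
A_K = truncated_svd(A, K)

# Squared Frobenius error equals the sum of the discarded sigma_k^2.
err_fro_sq = np.linalg.norm(A - A_K, 'fro')**2
tail_sq = np.sum(s[K:]**2)

# 2-norm error equals the first discarded singular value, sigma_{K+1}.
err_2 = np.linalg.norm(A - A_K, 2)

# Some other rank-K matrix B cannot do better than A_K.
B = rng.standard_normal((8, K)) @ rng.standard_normal((K, 6))
err_B = np.linalg.norm(A - B, 'fro')

print(err_fro_sq, tail_sq, err_2, s[K], err_B)
```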

## SVD for Image Compression

Represent image as matrix of pixels -> SVD -> compressed (lower-rank) approximation.

Q: For a given MxN image, what is the compression ratio, when the compressed image is rank R?

Note: the following matrices should have a bar (for compact SVD notation).
Rank R => keep R singular values => $U$ is M x R, $D$ is R x R, $V^T$ is R x N.

Original = N \* M

Compressed = (M \* R + R \* R + R \* N)

In [10]:
n = 1080
m = 1920
R = 256   # Judging by examples in the slides, this should be an OK quality.

full_size = n * m
compressed_size = m * R + R * R + R * n
ratio = compressed_size * 1.0 / full_size
print("Full size: {0}\nCompressed: {1}\nRatio: {2:.2f}% of original"
      .format(full_size, compressed_size, ratio * 100))

Full size: 2073600
Compressed: 833536
Ratio: 40.20% of original
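
The count above stores $D$ as a full R x R matrix. Since $D$ is diagonal, only its R diagonal entries are actually needed. The sketch below compares both conventions and computes the largest rank at which the factorized form is still smaller than the raw image; the helper `svd_storage` is a hypothetical name introduced for this illustration.

```python
def svd_storage(m, n, r, diagonal_as_vector=True):
    """Number of values stored for a rank-r truncated SVD of an m x n matrix."""
    d_size = r if diagonal_as_vector else r * r
    return m * r + d_size + r * n

m, n, R = 1920, 1080, 256
full_size = m * n

dense_d = svd_storage(m, n, R, diagonal_as_vector=False)   # D stored as R x R
vector_d = svd_storage(m, n, R)                            # D stored as R values

print("D as matrix: {0} values ({1:.2f}% of original)"
      .format(dense_d, 100 * dense_d / full_size))
print("D as vector: {0} values ({1:.2f}% of original)"
      .format(vector_d, 100 * vector_d / full_size))

# Largest rank r with r * (m + n + 1) < m * n, i.e. where compression still pays off.
break_even = (m * n - 1) // (m + n + 1)
print("Largest useful rank:", break_even)
```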


## Eigendecomposition via SVD

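Columns of U are eigenvectors of $AA^T$.
Columns of V (i.e. rows of $V^T$) are eigenvectors of $A^TA$.
In both cases the eigenvalues are the squared singular values $\sigma_i^2$.

If A is symmetric, then $AA^T = A^TA = A^2$, so U and V contain eigenvectors of the same matrix.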
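
These eigenvector relations can be verified directly from the output of `numpy.linalg.svd`. This is a minimal check on an assumed random matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# (A A^T) u_i = sigma_i^2 u_i  and  (A^T A) v_i = sigma_i^2 v_i
left_ok = np.allclose(A @ A.T @ U, U * s**2)
right_ok = np.allclose(A.T @ A @ V, V * s**2)

# Eigenvalues of A^T A (sorted descending) are the squared singular values.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

print(left_ok, right_ok, np.allclose(eigvals, s**2))
```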

### Computing SVD

![svd-instructions](img/svd-instructions.png)

Note: if A is symmetric, the resulting $U = AVD^{-1}$ should in the end be the same as the matrix of eigenvectors of $AA^T$. TODO(andrei): Double check.
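
The instructions in the figure are not reproduced here, so the following sketch assumes the usual recipe: eigendecompose $A^TA$ to get $V$ and $\sigma_i^2$, then recover $U = AVD^{-1}$. The function `svd_via_eig` is a hypothetical helper and requires $A$ to have full column rank, so that $D$ is invertible.

```python
import numpy as np

def svd_via_eig(A):
    """Reduced SVD of a full-column-rank A via the eigendecomposition of A^T A."""
    eigvals, V = np.linalg.eigh(A.T @ A)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder: largest first
    eigvals, V = eigvals[order], V[:, order]
    s = np.sqrt(np.clip(eigvals, 0.0, None))     # sigma_i = sqrt(lambda_i)
    U = (A @ V) / s                              # U = A V D^{-1}, column by column
    return U, s, V.T

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))
U, s, Vt = svd_via_eig(A)

print("reconstruction ok:", np.allclose((U * s) @ Vt, A))
print("matches numpy:    ", np.allclose(s, np.linalg.svd(A, compute_uv=False)))
```

This route is fine for small exam-style matrices. Forming $A^TA$ squares the condition number, so it is less accurate than a dedicated SVD routine on ill-conditioned inputs.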

In [13]:
# TODO(andrei): Small primer on how to compute eigenpairs for small matrices.
# Necessary for exam! (Good reference available on Khan Academy.)
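
As a possible seed for this primer, here is a sketch of the by-hand procedure for a 2 x 2 matrix: solve the characteristic polynomial $\lambda^2 - \operatorname{trace}(A)\lambda + \det(A) = 0$, then solve $(A - \lambda I)v = 0$ for each root. The example matrix is an assumption, and the shortcut for $v$ requires $a_{12} \neq 0$ and real eigenvalues.

```python
import numpy as np

# Assumed example: symmetric 2 x 2 matrix, so the eigenvalues are real.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Step 1: roots of lambda^2 - trace(A) * lambda + det(A) = 0.
tr = A[0, 0] + A[1, 1]
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
disc = np.sqrt(tr**2 - 4 * det)
lambdas = np.array([(tr + disc) / 2, (tr - disc) / 2])

# Step 2: for each lambda, v = (a_12, lambda - a_11) solves (A - lambda I) v = 0.
vecs = np.array([[A[0, 1], lam - A[0, 0]] for lam in lambdas]).T
vecs /= np.linalg.norm(vecs, axis=0)     # normalize each column

print("eigenvalues: ", lambdas)
print("eigenvectors (columns):\n", vecs)
print("A v = lambda v:", np.allclose(A @ vecs, vecs * lambdas))
```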