# Singular Value Decomposition

Singular value decomposition (SVD) is one of the most useful matrix factorizations in applied linear algebra.

We start with symmetric matrices, which form the foundation for the singular value decomposition.

Symmetric matrices are quite common in real-world applications, and they play an important role in various mathematical theories. Their importance comes from two key properties: 

**Diagonalization:** For any real symmetric matrix $A$, there exist a diagonal matrix $D$ and an invertible matrix $P$ such that

$$
A = PDP^{-1},
$$

where the diagonal entries of $D$ are the (real) eigenvalues of $A$ and the columns of $P$ are corresponding eigenvectors.

**Orthogonality:** Furthermore, $P$ can be chosen to be orthogonal, meaning its inverse is its transpose, $P^{-1} = P^T$. When this holds, $A$ is called __orthogonally diagonalizable__.

In fact, an $n \times n$ matrix $A$ is orthogonally diagonalizable if and only if it is symmetric.
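As a quick numerical check of orthogonal diagonalization, here is a minimal sketch assuming the notebook's code cells are Python with NumPy. The matrix `A` is a hypothetical example; `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices and returns orthonormal eigenvectors as the columns of `P`.

```python
import numpy as np

# A hypothetical symmetric 3x3 matrix chosen for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
assert np.allclose(A, A.T)  # A is symmetric

# eigh returns real eigenvalues in ascending order and
# orthonormal eigenvectors as the columns of P.
eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

# P is orthogonal, so its inverse is its transpose.
print(np.allclose(P.T @ P, np.eye(3)))   # True
# Orthogonal diagonalization: A = P D P^T.
print(np.allclose(P @ D @ P.T, A))       # True
print(eigenvalues)                       # approximately [1, 3, 3]
```

Even with the repeated eigenvalue 3, `eigh` still returns an orthonormal set of eigenvectors, which is exactly what the "if and only if" statement promises for symmetric matrices.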


Now we are interested in the geometric interpretation of the action of a symmetric matrix $A$ on a vector $\vec{x}$. This does not seem to have a straightforward answer, but by orthogonally diagonalizing $A$ we can arrive at a reasonable explanation. Let's start with the geometric interpretation of the action of a diagonal matrix.

**Example 1:** Give a non-trivial example of a $3 \times 3$ diagonal matrix and explore its effect on an arbitrary vector in $\mathbb{R}^3$. Consider the effect of your matrix on the standard basis vectors $\vec{e}_1, \vec{e}_2, \vec{e}_3$.

```python
# Try out a few examples to see it for yourself.
# D =
```
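One possible answer to Example 1, as a minimal sketch assuming Python with NumPy; the diagonal entries `3, 0.5, -2` and the test vector `x` are arbitrary illustrative choices.

```python
import numpy as np

# One possible non-trivial diagonal matrix (an illustrative choice).
D = np.diag([3.0, 0.5, -2.0])

# D scales each standard basis vector by the matching diagonal entry:
# D e_i = d_i e_i, so the coordinate axes are stretched, shrunk, or flipped.
E = np.eye(3)  # columns are e_1, e_2, e_3
for i in range(3):
    print(f"D e_{i + 1} =", D @ E[:, i])

# For an arbitrary vector, each coordinate is scaled independently.
x = np.array([1.0, 2.0, 3.0])
print("D x =", D @ x)  # expected: [3, 1, -6]
```

The standard basis vectors are eigenvectors of any diagonal matrix, so the action of $D$ is a pure axis-aligned scaling, with a reflection along any axis whose entry is negative.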

**Example 2:**

Give a non-trivial example of a $3 \times 3$ orthogonal matrix and explore its effect on an arbitrary vector in $\mathbb{R}^3$. Consider the effect of your matrix on the standard basis vectors $\vec{e}_1, \vec{e}_2, \vec{e}_3$.

```python
# code here
# P =
```
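One possible answer to Example 2, as a minimal sketch assuming Python with NumPy. The choice of a rotation about the $\vec{e}_3$ axis by $\theta = \pi/4$ is illustrative; any matrix with orthonormal columns would do.

```python
import numpy as np

# One possible orthogonal matrix: a rotation by theta about the e_3 (z) axis.
theta = np.pi / 4
P = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# Orthogonality: P^T P = I, so P^{-1} = P^T.
print(np.allclose(P.T @ P, np.eye(3)))  # True

# P e_i is the i-th column of P: e_1 and e_2 are rotated in the xy-plane,
# while e_3 (the rotation axis) is left fixed.
E = np.eye(3)
for i in range(3):
    print(f"P e_{i + 1} =", np.round(P @ E[:, i], 3))

# An orthogonal matrix preserves lengths (and angles) of arbitrary vectors.
x = np.array([1.0, 2.0, 3.0])
print(np.isclose(np.linalg.norm(P @ x), np.linalg.norm(x)))  # True
```

Unlike a diagonal matrix, an orthogonal matrix never stretches anything: it only rotates (or, if its determinant is $-1$, also reflects).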

Now we are ready to answer our question. Let 
$$
A = PDP^T.
$$

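The transformation defined by $A$ can then be read as a sequence of simpler geometric steps, since $A\vec{x} = P\left(D\left(P^T\vec{x}\right)\right)$: first $P^T$ rotates $\vec{x}$ (possibly combined with a reflection) into the coordinate system given by the eigenvectors of $A$, then $D$ stretches or compresses each coordinate by the corresponding eigenvalue (reflecting it when the eigenvalue is negative), and finally $P$ undoes the first rotation, returning to standard coordinates. In $\mathbb{R}^2$, if $P^T$ rotates by $\theta$, then $P$ rotates by $-\theta$.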
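The three-step reading of $A = PDP^T$ can be checked numerically. This is a minimal sketch assuming Python with NumPy; the $2 \times 2$ matrix `A` and the vector `v` are hypothetical examples, and the intermediate names `step1`, `step2`, `step3` are only for illustration.

```python
import numpy as np

# A hypothetical symmetric 2x2 matrix used for illustration.
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
lam, P = np.linalg.eigh(A)  # eigenvalues [2, 4], orthonormal eigenvectors
D = np.diag(lam)

v = np.array([1.0, 0.0])

# Apply the three factors one at a time.
step1 = P.T @ v    # coordinates of v in the eigenvector basis (rotation/reflection)
step2 = D @ step1  # stretch each coordinate by its eigenvalue
step3 = P @ step2  # map back to standard coordinates

print(np.allclose(step3, A @ v))  # True
# P^T and P do not change lengths; only D does.
print(np.isclose(np.linalg.norm(step1), np.linalg.norm(v)))  # True
```

The signs of the eigenvectors returned by `eigh` are not fixed, so $P$ may have determinant $-1$; this is why the first step is described as a rotation possibly combined with a reflection.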

Plot $\vec{v}$ and $A\vec{v}$ to visualize it better.   

```python
# code here
```

What can we do with matrices that are not symmetric, or not even square? What effect does an $m \times n$ matrix have on a vector? Our intuition suggests that it should involve stretching, rotation, and possibly a change in dimension. But how can we break down such a matrix into operations that reflect these transformations?

### Singular Values

Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric, since $(A^TA)^T = A^TA$, and is therefore orthogonally diagonalizable. Let $\{\vec{v_1}, \dots, \vec{v_n}\}$ be an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$ (such a basis always exists), and let $\lambda_1, \dots, \lambda_n$ be the associated eigenvalues. Then, since $\vec{v_i}$ is a unit eigenvector,

$$\|A\vec{v_i}\|^2 = (A\vec{v_i})^T(A\vec{v_i}) = \vec{v_i}^T A^TA\vec{v_i} = \lambda_i\,\vec{v_i}^T\vec{v_i} = \lambda_i\quad \text{for all } i \leq n \quad (*)$$

This shows that these eigenvalues are non-negative. For convenience, let's assume that we've arranged the eigenvalues as follows:

$$\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \geq 0.$$

The __singular values__ of $A$, denoted as $\sigma_1, \dots, \sigma_n$, are the square roots of these eigenvalues:

$$\sigma_i = \sqrt{\lambda_i} \quad \text{for all } i \leq n.$$

It's important to note that, based on $(*)$, the length of $A\vec{v_i}$ is $\sigma_i$ for all $i \leq n$.
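The definition above translates directly into code. This minimal sketch, assuming Python with NumPy and a hypothetical random $4 \times 3$ matrix, computes the singular values from the eigenvalues of $A^TA$, checks $(*)$, and compares with NumPy's built-in `np.linalg.svd`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # an arbitrary 4x3 matrix for illustration

# Orthonormal eigenvectors of the symmetric matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]    # sort eigenvalues in descending order
lam, V = lam[order], V[:, order]
lam = np.clip(lam, 0.0, None)    # remove tiny negative round-off

sigma = np.sqrt(lam)

# (*): ||A v_i||^2 = lambda_i, i.e. ||A v_i|| = sigma_i.
norms = np.linalg.norm(A @ V, axis=0)  # length of each column A v_i
print(np.allclose(norms**2, lam))      # True

# Agrees with NumPy's singular values (also returned in descending order).
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))  # True
```

Forming $A^TA$ explicitly squares the condition number, so library SVD routines avoid it; here it is used only because it mirrors the definition.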

The first singular value $\sigma_1$ of $A$ is the maximum of $\|A\vec{x}\|$ over all unit vectors $\vec{x}$, and this maximum is attained at $\vec{x} = \vec{v_1}$. Similarly, the second singular value $\sigma_2$ is the maximum of $\|A\vec{x}\|$ over all unit vectors orthogonal to $\vec{v_1}$, attained at $\vec{v_2}$, and so on.
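The maximization property can be probed empirically by sampling many random unit vectors. A minimal sketch, assuming Python with NumPy; the random $3 \times 3$ matrix, the sample size, and the tolerance `1e-9` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
v1, v2 = Vt[0], Vt[1]  # rows of V^T are the right singular vectors

# Sample many random unit vectors (as columns of X).
X = rng.standard_normal((3, 10000))
X /= np.linalg.norm(X, axis=0)

# No unit vector is stretched more than sigma_1 ...
print(np.all(np.linalg.norm(A @ X, axis=0) <= s[0] + 1e-9))  # True
# ... and v_1 attains it.
print(np.isclose(np.linalg.norm(A @ v1), s[0]))  # True

# Restricting to unit vectors orthogonal to v_1 caps the stretch at sigma_2.
Y = X - np.outer(v1, v1 @ X)  # remove the v_1 component of each column
Y /= np.linalg.norm(Y, axis=0)
print(np.all(np.linalg.norm(A @ Y, axis=0) <= s[1] + 1e-9))  # True
print(np.isclose(np.linalg.norm(A @ v2), s[1]))  # True
```

Random sampling only illustrates the bound; it is not a proof, which follows from expanding $\vec{x}$ in the orthonormal basis $\{\vec{v_i}\}$.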

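__Theorem:__ The number of nonzero singular values is equal to the rank of $A$. More precisely, if $A$ has exactly $r$ nonzero singular values, then $\{A\vec{v_1}, A\vec{v_2}, \dots, A\vec{v_r}\}$ is an orthogonal basis for the column space of $A$, denoted $\operatorname{Col}(A)$, and $\operatorname{rank}(A) = r$. Since $\|A\vec{v_i}\| = \sigma_i$, dividing each $A\vec{v_i}$ by $\sigma_i$ turns this into an orthonormal basis.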
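A numerical illustration of the theorem, as a minimal sketch assuming Python with NumPy. The $4 \times 3$ matrix is a hypothetical example built to have rank 2 (its third column is the sum of the first two); "nonzero" is decided with the same tolerance rule that `np.linalg.matrix_rank` uses by default.

```python
import numpy as np

# A hypothetical 4x3 matrix of rank 2: column 3 = column 1 + column 2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0],
              [2.0, 0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
# Treat singular values below this tolerance as zero (matrix_rank's default rule).
tol = s[0] * max(A.shape) * np.finfo(float).eps
r = int(np.sum(s > tol))
print(r, np.linalg.matrix_rank(A))  # 2 2

AV = A @ Vt.T  # columns are A v_1, A v_2, A v_3
# A v_1, ..., A v_r are mutually orthogonal (their Gram matrix is diagonal) ...
G = AV[:, :r].T @ AV[:, :r]
print(np.allclose(G, np.diag(np.diag(G))))  # True
# ... with lengths sigma_i, while A v_i = 0 for i > r.
print(np.allclose(np.linalg.norm(AV[:, :r], axis=0), s[:r]))  # True
print(np.allclose(AV[:, r:], 0))  # True
```

In floating point the "zero" singular value comes out as a tiny number rather than exactly 0, which is why a tolerance is needed to count the nonzero ones.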


### SVD (Singular Value Decomposition)

Consider an $m \times n$ matrix $A$ of rank $r$. It can be decomposed into three matrices: an $m \times n$ "diagonal-like" matrix $\Sigma$, and two orthogonal matrices $U$ and $V$ such that:

$$
A = U_{m \times m} \Sigma_{m \times n} V^{T}_{n \times n}
$$

More precisely, the matrix $\Sigma$ takes the following form:

$$
\Sigma = \begin{bmatrix} D_{r \times r} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)}\end{bmatrix}
$$

Additionally, it holds that:

$$
UU^T = I_m \quad \text{and} \quad VV^T = I_n
$$
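The factorization and the orthogonality conditions can be checked with NumPy's `np.linalg.svd`. This minimal sketch assumes Python with NumPy and a hypothetical random $3 \times 5$ matrix; NumPy returns the singular values as a vector, so the $m \times n$ matrix $\Sigma$ has to be assembled by hand.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 5
A = rng.standard_normal((m, n))  # arbitrary example matrix

# full_matrices=True returns square U (m x m) and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the m x n "diagonal-like" Sigma from the vector of singular values.
Sigma = np.zeros((m, n))
k = len(s)  # min(m, n)
Sigma[:k, :k] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))     # True: A = U Sigma V^T
print(np.allclose(U @ U.T, np.eye(m)))    # True: U U^T = I_m
print(np.allclose(Vt.T @ Vt, np.eye(n)))  # True: V V^T = I_n
```

NumPy places all $\min(m, n)$ singular values on the diagonal; any that are zero simply contribute zeros, so this $\Sigma$ has the block form above with $D$ holding the $r$ nonzero ones.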

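Intuitively, when the rows of $A$ are data points and each column (feature) has been centered to mean zero, the SVD expresses the data in a new coordinate system, given by the columns of $V$, in which the sample covariance matrix $\frac{1}{m-1}A^TA$ is diagonal.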
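The covariance interpretation can be illustrated on synthetic data. A minimal sketch, assuming Python with NumPy; the data set (200 correlated samples in $\mathbb{R}^3$, generated with an arbitrary mixing matrix) is hypothetical, and `Xc` plays the role of the centered matrix $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical data: 200 observations (rows) of 3 correlated features (columns).
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.standard_normal((200, 3)) @ mixing
Xc = X - X.mean(axis=0)  # center each column

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T  # the data expressed in the basis v_1, ..., v_n

C_new = np.cov(Z, rowvar=False)  # sample covariance in the new coordinates

print(np.allclose(C_new, np.diag(np.diag(C_new))))       # True: diagonal
print(np.allclose(np.diag(C_new), s**2 / (len(X) - 1)))  # True
```

Since $Z = X_c V = U\Sigma$, the new covariance is $\Sigma^T\Sigma/(m-1)$, which is why its diagonal holds the squared singular values divided by $m - 1$. This is the connection between the SVD and principal component analysis.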

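Computing the SVD by hand involves $A^TA$ and $AA^T$. The columns of $V$ are orthonormal eigenvectors $\vec{v_1}, \dots, \vec{v_n}$ of $A^TA$, and the columns of $U$ are orthonormal eigenvectors of $AA^T$. To keep the signs of $U$ and $V$ consistent, the first $r$ columns of $U$ are taken to be $\vec{u_i} = \frac{1}{\sigma_i}A\vec{v_i}$ for $i \leq r$, which form an orthonormal basis of $\operatorname{Col}(A)$; if $r < m$, they are extended to an orthonormal basis of $\mathbb{R}^m$. The diagonal entries of $D$ are the nonzero singular values $\sigma_1 \geq \dots \geq \sigma_r > 0$ of $A$, arranged in descending order.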
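The recipe above can be followed step by step in code. This is a minimal sketch assuming Python with NumPy; `svd_via_eigh` is a hypothetical helper written for illustration, not a library function, and the relative tolerance used to decide the rank is an arbitrary choice. Completing $U$ with a full QR factorization is one convenient way (among several) to extend $\vec{u_1}, \dots, \vec{u_r}$ to an orthonormal basis of $\mathbb{R}^m$.

```python
import numpy as np

def svd_via_eigh(A, tol=1e-12):
    """Hypothetical helper: build A = U Sigma V^T from the eigenvectors of A^T A.

    For illustration only; np.linalg.svd is the numerically robust choice.
    """
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)  # columns of V: orthonormal eigenvectors
    order = np.argsort(lam)[::-1]     # descending eigenvalues
    lam, V = lam[order], V[:, order]
    # Rank = number of eigenvalues above a relative tolerance.
    r = int(np.sum(lam > tol * lam[0]))
    sigma = np.sqrt(np.clip(lam, 0.0, None))

    # First r columns of U: u_i = A v_i / sigma_i (signs consistent with V).
    U_r = (A @ V[:, :r]) / sigma[:r]
    # Extend to an orthonormal basis of R^m: the last m - r columns of a
    # complete QR factorization of U_r span the orthogonal complement of Col(U_r).
    Q, _ = np.linalg.qr(U_r, mode="complete")
    U = np.hstack([U_r, Q[:, r:]])

    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    return U, Sigma, V

# A hypothetical rank-2 example (column 3 = column 1 + column 2).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0],
              [2.0, 0.0, 2.0]])
U, Sigma, V = svd_via_eigh(A)
print(np.allclose(U @ Sigma @ V.T, A))  # True
print(np.allclose(U.T @ U, np.eye(4)))  # True
```

Choosing $U$'s columns independently as eigenvectors of $AA^T$ could flip the sign of some $\vec{u_i}$ relative to $\vec{v_i}$ and break $A = U\Sigma V^T$; defining $\vec{u_i} = A\vec{v_i}/\sigma_i$ avoids that.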

**Example: Singular Value Decomposition (SVD) and Geometric Interpretation**

Consider the matrix:

$$
A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}
$$


1. Construct the singular value decomposition (SVD) of $A$.
2. Use this construction to explain why the mapping $\vec{x} \mapsto A\vec{x}$ sends the unit sphere $\{\vec{x}\in \mathbb{R}^3 : \|\vec{x}\| = 1\}$ onto an ellipse in $\mathbb{R}^2$ (more precisely, onto the ellipse together with its interior).

![Screenshot%202023-10-08%20at%205.07.27%20PM.png](attachment:Screenshot%202023-10-08%20at%205.07.27%20PM.png)

```python
# your code
```
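One possible numerical companion to this example, as a minimal sketch assuming Python with NumPy. Here $A^TA = \begin{bmatrix} 80 & 100 & 40 \\ 100 & 170 & 140 \\ 40 & 140 & 200 \end{bmatrix}$ has eigenvalues $360, 90, 0$, so $\sigma_1 = \sqrt{360} = 6\sqrt{10}$ and $\sigma_2 = \sqrt{90} = 3\sqrt{10}$. Writing $\vec{y} = V^T\vec{x}$ (still a unit vector) and $\vec{w} = U^TA\vec{x} = (\sigma_1 y_1, \sigma_2 y_2)$ gives $(w_1/\sigma_1)^2 + (w_2/\sigma_2)^2 = y_1^2 + y_2^2 = 1 - y_3^2 \leq 1$. The random sample of sphere points and the tolerance are arbitrary choices.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# Eigenvalues of A^T A (descending) are the squared singular values.
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.round(lam, 6))  # approximately [360, 90, 0]

U, s, Vt = np.linalg.svd(A)  # full SVD: U is 2x2, Vt is 3x3
print(np.round(s, 4))        # approximately [18.9737, 9.4868]

# Sample points on the unit sphere in R^3 and map them through A.
rng = np.random.default_rng(5)
X = rng.standard_normal((3, 5000))
X /= np.linalg.norm(X, axis=0)
W = U.T @ (A @ X)  # images in the (u_1, u_2) coordinates

# (w1/sigma_1)^2 + (w2/sigma_2)^2 = y1^2 + y2^2 <= 1, where y = V^T x.
q = (W[0] / s[0])**2 + (W[1] / s[1])**2
print(np.all(q <= 1 + 1e-9))  # True: every image lies in the filled ellipse

# Unit vectors orthogonal to v_3 (i.e. y_3 = 0) land exactly on the boundary.
t = np.linspace(0, 2 * np.pi, 100)
X_eq = np.outer(Vt[0], np.cos(t)) + np.outer(Vt[1], np.sin(t))
W_eq = U.T @ (A @ X_eq)
print(np.allclose((W_eq[0] / s[0])**2 + (W_eq[1] / s[1])**2, 1.0))  # True
```

The ellipse has semi-axes of lengths $\sigma_1$ and $\sigma_2$ along $\vec{u_1}$ and $\vec{u_2}$; the direction $\vec{v_3}$, which spans the null space of $A$, is collapsed to the origin, which is why the interior is filled in as well.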