using LinearAlgebra, RowEchelon, LaTeXStrings, Plots, SymPy
include("LAcodes.jl");  ##] dev --local "."
LAcodes.title( "The Singular Value Decomposition", sz=30, color="darkred")

# 1. Motivation

## 1.1 Generalize the Idea of an Eigendecomposition

The Eigendecomposition of a matrix $A$ has shortcomings: 
* A square matrix $A$ may or may not have a **complete eigenvector basis**
* There is no eigendecomposition for **matrices that are not square.**

> The case $A$ of size $M \times N$ with $M \ne N$ exposes a restriction<br> we had imposed to find the eigendecomposition:<br>
$\quad\quad$ **we used the same basis vectors** $s_1, s_2, \dots s_n$ in both the domain and the codomain of $y = A x$.


<div style="float:left;width:15cm;border:1px solid black;">

$\;$**Idea:** use different bases for $y = A x.$
$$\left. \begin{align}
x &= V \tilde{x} \\
    y &= U \tilde{y}
\end{align}\right\}\quad \Rightarrow \quad U \tilde{y} = A V \tilde{x}
\quad \Rightarrow \quad \tilde{y} = U^{-1} A V \tilde{x} = \Sigma \tilde{x},
$$

$\quad\quad$ where we have set $\Sigma = U^{-1} A V \Leftrightarrow A = U \Sigma V^{-1}.$
</div>
<div style="float:right;border:1px solid black;width:9cm;height:3.1cm;">
$\;$ Better yet, let us try for orthonormal bases:

$$
U^{-1} = U^t, \; \text{ and }\; V^{-1} = V^t.
$$

$\;$**Remark: the matrix sizes are**<br>
$\quad\quad A_{M\times N}, \Sigma_{M\times N}, V_{N \times N}, U_{M \times M}$. 
</div>

> What should $\Sigma$ look like? We would like a diagonal matrix, but $\Sigma$ is not square in general.
Let us try for
$$
\Sigma = \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix},
$$
where $\Sigma_r$ is a square diagonal matrix of size $r \times r$ with $r$ non-zero entries on the diagonal,<br>
and zero entries to fill out the remaining entries in a matrix of size $M \times N$.

> **Examples:**
>
> $\quad\quad
\left( \begin{array}{cc|c} \color{red}5 & \color{red}0 & 0 \\ \color{red}0 & \color{red}1 & 0 \\ \hline 0 & 0 & 0 \end{array}\right),\quad
\left( \begin{array}{cc|c} \color{red}5 & \color{red}0 & 0 \\ \color{red}0 & \color{red}1 & 0\end{array}\right), \quad
\left( \begin{array}{cc} \color{red}5 & \color{red}0 \\ \color{red}0 & \color{red}1 \\ \hline 0 & 0 \end{array}\right),\quad
$
where the $\color{red}{\Sigma_r}$ entries are shown in red.
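
The three shapes above can be built with a few lines of Julia. This is a minimal sketch; `build_sigma` is a hypothetical helper introduced only for illustration, not part of `LAcodes.jl` or any library.

```julia
using LinearAlgebra

# Hypothetical helper: place the entries of `sigmas` on the leading diagonal
# of an M×N matrix and fill everything else with zeros.
function build_sigma(sigmas::AbstractVector, M::Integer, N::Integer)
    r = length(sigmas)
    @assert r <= min(M, N) "at most min(M, N) diagonal entries fit"
    Σ = zeros(eltype(sigmas), M, N)
    for i in 1:r
        Σ[i, i] = sigmas[i]
    end
    return Σ
end

Σ1 = build_sigma([5, 1], 3, 3)   # square, with a zero row and a zero column
Σ2 = build_sigma([5, 1], 2, 3)   # wide: only a zero column is appended
Σ3 = build_sigma([5, 1], 3, 2)   # tall: only a zero row is appended
```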

## 1.2 Is this Feasible?

<div style="float:left;width:15cm;">

Consider $A = U \Sigma V^t.$<br>
We can manipulate this equation in various ways:

$$\begin{align}
A = U \Sigma V^t & \quad \Leftrightarrow \quad &  A V & = U \Sigma \label{eqn1}\tag{1} \\
A = U \Sigma V^t & \quad \Rightarrow \quad  & A^t A & = V \Sigma^t \Sigma V^t \label{eqn2}\tag{2} \\
A = U \Sigma V^t & \quad \Rightarrow \quad  & A A^t & = U \Sigma \Sigma^t U^t \label{eqn3}\tag{3} \\
\end{align}
$$
</div>

**Remarks:**
* $\Sigma^t \Sigma = \begin{pmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{pmatrix}$ is a diagonal matrix of size $N \times N.$
* $\Sigma \Sigma^t = \begin{pmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{pmatrix}$ is a diagonal matrix of size $M \times M.$

* Eqn 2:  $\;A^tA = V ( \Sigma^t \Sigma) V^t \quad$ is an orthogonal eigendecomposition  of the symmetric matrix $A^t A.$<br>
$\quad\quad$  An orthogonal matrix $V$ and a diagonal matrix $\Sigma^t \Sigma$ do exist
* Eqn 3:  $\;AA^t = U ( \Sigma \Sigma^t) U^t \quad$ is an orthogonal eigendecomposition  of the symmetric matrix $A A^t.$<br>
$\quad\quad$  An orthogonal matrix $U$ and a diagonal matrix $\Sigma \Sigma^t$ do exist

> What is not clear is whether the diagonal matrices $\Sigma^t \Sigma$ and $\Sigma \Sigma^t$  are related:
> * **do they share the same non-zero entries $\Sigma_r^2?$**

* Eqn 1: $\; A V = U \Sigma \quad$ further posits a relationship between $U$ and $V.$
> **Is this relationship satisfied?**
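
Before working through the argument, a numerical spot check can make the three relations concrete. The sketch below uses Julia's built-in `svd` on an arbitrary $2 \times 3$ matrix (the matrix is an assumption chosen for illustration); it checks one example and proves nothing in general.

```julia
using LinearAlgebra

A = [1.0 2.0 0.0;
     0.0 1.0 1.0]            # arbitrary example with M = 2, N = 3
M, N = size(A)

F = svd(A; full = true)      # F.U is M×M, F.V is N×N, F.S holds the singular values
Σ = zeros(M, N)
for i in eachindex(F.S)
    Σ[i, i] = F.S[i]         # embed the singular values in an M×N matrix
end

eqn1 = A * F.V ≈ F.U * Σ                   # A V = U Σ
eqn2 = A' * A ≈ F.V * (Σ' * Σ) * F.V'      # AᵗA = V ΣᵗΣ Vᵗ
eqn3 = A * A' ≈ F.U * (Σ * Σ') * F.U'      # AAᵗ = U ΣΣᵗ Uᵗ
```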

# 2. The Gram Matrix $A^t A$

Some reminders for matrices $A$ of size $M \times N$

##### **Rank and the Dimension of the Null Spaces**

<div style="float:left;width:12cm;height:4cm;border:1px solid black;">

$\;\;$ We have previously seen that $\mathscr{N}(A^t A)\ =\ \mathscr{N}(A).$<br>
$\;\;$ Therefore
* $dim\ \mathscr{N}(A^t A)\ =\ dim\ \mathscr{N}(A) = N -rank(A)$
* $rank\ (A) = rank\ (A^t A)$<br>since both matrices have the same number of columns $N$.
</div>
<div style="float:right;width:12cm;height:4cm;border:1px solid black;">

$\;\;$ Similarly, $\mathscr{N}(A A^t)\ =\ \mathscr{N}(A^t).$<br>
$\;\;$ Therefore
* $dim\ \mathscr{N}(A A^t)\ =\ dim\mathscr{N}(A^t)\ =\ M - rank\ (A^t)$
* $rank\ (A^t)\ =\ rank\ (A A^t)$<br>since both matrices have the same number of columns $M$.
</div>

##### **Dimension of the Eigenspaces for $\lambda = 0$**

<div style="float:left;width:12cm;height:4.5cm;border:1px solid black;">
$\;\;$ We also know that $rank(A) = rank(A^t)\;$<br>
$\;\;$ and that non-zero nullspace vectors are eigenvectors for $\lambda =0$.

* $A, A^t, A^t A$ and $A A^t$ all have the same rank.
* Eigenspace $E_0$ of $A^t A$ has $dim\ \mathscr{N}(A^t A) = N - rank(A)$
* Eigenspace $E_0$ of $A A^t$ has $dim\ \mathscr{N}(A A^t) = M - rank(A)$
* $A^t A$ and $A A^t$ have the same number<br>$rank(A)$ non-zero eigenvalues
</div>
<img src="SVD_ranks.svg" width=400 style="float:right;">
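
The rank and null space counts can be checked numerically. A minimal sketch, assuming an arbitrary rank-1 matrix and a relative tolerance `tol` picked for this example:

```julia
using LinearAlgebra

A = [1.0 2.0 3.0;
     2.0 4.0 6.0]                  # arbitrary rank-1 example with M = 2, N = 3
M, N = size(A)
tol = 1e-10                        # assumed relative tolerance for the numerical rank

r = rank(A; rtol = tol)
ranks = (r,
         rank(Matrix(A'); rtol = tol),
         rank(A' * A; rtol = tol),
         rank(A * A'; rtol = tol))

# dim E_0 equals the number of columns of an orthonormal null space basis
dimE0_AtA = size(nullspace(A' * A; rtol = tol), 2)   # expected N - r
dimE0_AAt = size(nullspace(A * A'; rtol = tol), 2)   # expected M - r
```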

##### **Relationship of Eigenpairs for $A^t A$ and $A A^t$**

<div style="float:left;width:13cm;height:4.5cm;border:1px solid black;">

$\;\;$ Let $(\lambda, x)$ be an eigenpair of $A^t A$. Observe
$$
(A^t A)\ x = \lambda x \Rightarrow (A A^t) (A x) = \lambda (A x)
$$

$\;\;\therefore$ If $A x \ne 0,$ it is an eigenvector of $A A^t.$ But is $A x \ne 0$?<br><br>

$\;\;$ **$(\lambda \ne 0, x)$ is an eigenpair of $A^t A \Rightarrow (\lambda, A x)$ is an eigenpair of $A A^t.$**<br>
$\;\;$ **$(\lambda \ne 0, x)$ is an eigenpair of $A A^t \Rightarrow (\lambda, A^t x)$ is an eigenpair of $A^t A.$**
</div>
<div style="float:right;width:11cm;height:4.5cm;border:1px solid black;">

$\begin{align}
\left( \lVert A x \rVert^2 \right)&  = \left( (A x) \cdot (A x) \right)\\
& = (A x)^t (A x) \\
& = x^t A^t A x \\
& = \lambda x^t x \\
& = \lambda \left( \lVert x \rVert^2 \right).
\end{align}$

$\;\;$ Since $x$ is an eigenvector $\lVert x \rVert \ne 0 . \quad$
$\therefore \color{red}{A x \ne 0 \;\text{ iff } \lambda \ne 0}.$
</div>
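
The transfer of eigenpairs from $A^t A$ to $A A^t$ can be observed on a small example. A sketch, assuming an arbitrary $2 \times 3$ matrix whose Gram matrix $A^t A$ has eigenvalues $0, 1, 6$:

```julia
using LinearAlgebra

A = [1.0 2.0 0.0;
     0.0 1.0 1.0]

E = eigen(Symmetric(A' * A))    # eigenvalues are returned in ascending order
λ = E.values[end]               # largest eigenvalue, nonzero here
x = E.vectors[:, end]           # unit eigenvector of AᵗA for λ

y = A * x                       # candidate eigenvector of AAᵗ
residual = norm((A * A') * y - λ * y)        # ≈ 0 when (λ, Ax) is an eigenpair
norm_identity = norm(y)^2 ≈ λ * norm(x)^2    # ‖Ax‖² = λ‖x‖²

x0 = E.vectors[:, 1]            # eigenvector for the eigenvalue λ = 0
image_of_x0 = norm(A * x0)      # ≈ 0: null space vectors are mapped to zero
```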

##### **So $A^t A$ and $A A^t$ Have the Same Non-zero Eigenvalues and Corresponding Eigenvectors**

* For the eigenspaces for eigenvalue $\lambda = 0$, we have
    * for $A^t A$, we have $dim\ E_0 = N - rank(A)$
    * for $A A^t$, we have $dim\ E_0 = M - rank(A)$
* Both matrices have the **same dimension for the eigenspaces $dim\ E_\lambda$ for $\lambda \ne 0$**

* **Both $A^t A$ and $A A^t$ share the diagonal matrix $\Sigma_r^2.$**
* **The size of this matrix $r = rank(A).$**

* The eigenvectors of $A^t A$ for the non-zero eigenvalues are **the first $r$ vectors in $V:$**<br>
* The eigenvectors of $A A^t$ for the non-zero eigenvalues are **the first $r$ vectors in $U$.**

* Since the remaining vectors are bases for the null spaces of $A^t A$ and $A A^t$ respectively,<br>
these first $r$ vectors are **bases for the respective row spaces** (equivalently, for the row space and the column space of $A$).

##### **What About Orthogonality?**

Let $(\lambda_1, x_1)$ and $(\lambda_2, x_2)$ be eigenpairs of $A^t A$.<br>
$\quad\quad$ If $x_1 \perp x_2$, we find
$\quad
(A x_1) \cdot (A x_2) = x_1^t A^t A x_2 = \lambda_2\; x_1 \cdot x_2 = 0.
$

$\quad\quad$ Since $A x_1$,$A x_2$ are eigenvectors of $A A^t$ provided $\lambda_1 \lambda_2 \ne 0,$ we find

* **Given eigenpairs $(\lambda_1 \ne 0, x_1)$ and $(\lambda_2 \ne 0, x_2)$ of $A^t A$,
  then $(\lambda_1, A x_1)$ and $(\lambda_2, A x_2)$ are eigenpairs of $A A^t.$<br>
  $\quad\quad$ Further, $x_1 \perp x_2 \Rightarrow A x_1 \perp A x_2.$**
* **Given eigenpairs $(\lambda_1 \ne 0, x_1)$ and $(\lambda_2 \ne 0, x_2)$ of $A A^t$,
  then $(\lambda_1, A^t x_1)$ and $(\lambda_2, A^t x_2)$ are eigenpairs of $A^t A.$<br>
  $\quad\quad$ Further, $x_1 \perp x_2 \Rightarrow A^t x_1 \perp A^t x_2.$**
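
A short numerical illustration of the orthogonality claim, again on an arbitrary example matrix (an assumption for illustration only):

```julia
using LinearAlgebra

A = [1.0 2.0 0.0;
     0.0 1.0 1.0]

E = eigen(Symmetric(A' * A))    # orthonormal eigenvectors, ascending eigenvalues
x1 = E.vectors[:, 2]            # eigenvector for a nonzero eigenvalue
x2 = E.vectors[:, 3]            # eigenvector for another nonzero eigenvalue

inner_x  = dot(x1, x2)          # ≈ 0: the eigenvectors are orthogonal
inner_Ax = dot(A * x1, A * x2)  # ≈ 0: their images under A stay orthogonal
```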

##### **The Reduced Decomposition (Compact Decomposition)**

Let's look at the decomposition again, where we partition the $U$ and $V$ matrices to separate out the first $r$ vectors:
$\quad\quad U = \begin{pmatrix} U_r & \tilde{U}_r \end{pmatrix}, \quad V = \begin{pmatrix} V_r & \tilde{V}_r \end{pmatrix}.$<br>

$
\quad\quad A = \begin{pmatrix} U_r & \tilde{U}_r \end{pmatrix}
    \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} V_r & \tilde{V}_r \end{pmatrix}^t
  = \color{red}{U_r\ \Sigma_r\ V_r^t}.
$

**The null space basis vectors have no effect on this decomposition!**

> As long as we use a basis for the null space of $A^t A$ for $\tilde{V}_r$<br>
and a basis for the null space of $A A^t$ for $\tilde{U}_r$,<br>
we have established that **a decomposition $A V = U \Sigma$ exists!**
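
The claim that the null space blocks drop out can be checked with Julia's built-in `svd`. A sketch, assuming an arbitrary rank-2 matrix and an assumed cutoff of `1e-10` for counting non-zero singular values:

```julia
using LinearAlgebra

A = [1.0 2.0 3.0;
     2.0 4.0 6.0;
     1.0 0.0 1.0]                 # arbitrary 3×3 example of rank 2

F = svd(A; full = true)
r = count(>(1e-10), F.S)          # numerical rank; the cutoff is an assumption

Ur = F.U[:, 1:r]                  # first r left singular vectors
Vr = F.V[:, 1:r]                  # first r right singular vectors
Σr = Diagonal(F.S[1:r])

compact_ok = A ≈ Ur * Σr * Vr'    # the remaining columns of U and V are not needed
```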

But what about orthogonality?

##### **Non-negative Eigenvalues, Orthogonal Eigenvectors**

We need $\Sigma_r$, not $\Sigma_r^2$. Let's look again at the eigenvalues of $A^t A.$

Given an eigenpair $(\lambda, x)$ of $A^t A:$

$$
\begin{align}
A^t A x = \lambda x
& \Rightarrow x^t A^t A x = \lambda x^t x \\
& \Rightarrow \lVert A x \rVert^2 = \lambda \lVert x \rVert^2 \\
& \Rightarrow \lambda \ge 0\quad\text{ since } x \ne 0.
\end{align}$$

$\quad\quad$ **The matrices $A^t A$ and $A A^t$ are positive semidefinite.**

> $\quad\quad \Sigma_r =\begin{pmatrix} \sigma_1  & 0         & \dots & 0 \\
                           0         & \sigma_2  & \dots & 0 \\
                           \dots     & \ldots    & \dots & \dots \\
    0         & 0         & \dots & \sigma_r \end{pmatrix},$
>
> $\quad\quad$ where $\sigma_i\ =\ \sqrt{ \lambda_i },\ i=1,2, \dots r\quad$ <span>are the square roots<br>$\quad\quad$  of the nonzero (and hence positive) eigenvalues of $A^t A.$</span>

> $\quad\quad$ A small refinement: since the $\sigma_i$ are positive reals,<br>
$\quad\quad$ we will **order them by decreasing magnitude:**$\quad\quad
\sigma_1 \ge \sigma_2 \dots \ge \sigma_r.
$
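
As a sanity check, the square roots of the eigenvalues of $A^t A$ can be compared with the values returned by `svdvals`. A sketch on an arbitrary example whose Gram matrix has eigenvalues $6, 1, 0$:

```julia
using LinearAlgebra

A = [1.0 2.0 0.0;
     0.0 1.0 1.0]

λ = eigvals(Symmetric(A' * A))     # ascending order
λ_desc = max.(reverse(λ), 0.0)     # descending; clamp tiny negative round-off to 0
σ_from_eig = sqrt.(λ_desc)

σ = svdvals(A)                     # min(M, N) = 2 singular values, descending
```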

##### **Obtain $U_r$ from $V_r$**

The SVD requires unit column vectors in $U$ and $V$.

Consider an eigenpair $(\lambda = \sigma^2 \ne 0, x)$ of $A^t A$ <br>
with the eigenvector scaled to be a unit vector:
$\quad \lVert x \rVert = 1$.

Observe $\quad$
$x^t A^t A x = \sigma^2 x^t x \Rightarrow \lVert A x \rVert^2 = \sigma^2$.

The corresponding unit eigenvector for $A A^t$ is therefore given by $$u = \frac{1}{\sigma} A x.$$
Combining all the eigenvectors into a matrix as columns, we see that
$$
\color{red}{U_r = A V_r \Sigma_r^{-1} \Leftrightarrow A V_r = U_r \Sigma_r}.
$$

# 3. The Singular Value Decomposition

## 3.1 SVD Existence Theorem

<div style="background-color:#F2F5A9;">

**Definition:** Given a matrix $A \in \mathbb{R}^{M \times N}$.<br>
    $\quad\quad$ $A = U \Sigma V^t$ is a **singular value decomposition** of $A$ iff<br>
    $\quad\quad$ $U$ and $V$ are orthogonal matrices, and $\Sigma = \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix}$,<br>
    $\quad\quad$ where $\Sigma_r$ is a diagonal matrix of size $r\times r$, with diagonal entries $\sigma_1 \ge \sigma_2 \dots \ge \sigma_r >0.$
    
$\quad\quad$ The $\sigma_i$ are **singular values** of $A.$<br>
$\quad\quad$ The columns of $V$ are **right singular vectors** of $A.$<br>
$\quad\quad$ The columns of $U$ are **left singular vectors** of $A.$
    
</div>

<div style="background-color:#F2F5A9;">

**Theorem:** Every matrix $A \in \mathbb{R}^{M \times N}$ has a **singular value decomposition** such that<br>
    $\quad\quad$ $A = U \Sigma V^t  = U_r \Sigma_r V_r^t$
<div style="background-color:#F2F5A9;margin:30px;">

* $\Sigma = \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix}$,
    where $\Sigma_r$ is a diagonal matrix of size $r\times r$, with positive diagonal entries $\sigma_1 \ge \sigma_2 \dots \ge \sigma_r >0.$
* $r = rank(A)$
* $V = \begin{pmatrix} V_r & \tilde{V}_r \end{pmatrix}$
    * the $r$ columns of $V_r$ form an orthonormal basis for the row space $\mathscr{R}(A)$
    * the $N-r$ columns of $\tilde{V}_r$ form an orthonormal basis for $\mathscr{N}(A)$
* $U = \begin{pmatrix} U_r & \tilde{U}_r \end{pmatrix}$
    * the $r$ columns of $U_r$ form an orthonormal basis for the column space $\mathscr{C}(A)$
    * the $M-r$ columns of $\tilde{U}_r$ form an orthonormal basis for $\mathscr{N}(A^t)$
<br><br>
</div></div>

## 3.2 SVD Computation

The derivation shows one way of computing the SVD:
* Start with either $A^t A$ (size $N \times N$) or $A A^t$ (size $M \times M$).<br>
  We typically choose the smaller matrix.<br>

<div style="margin:30px;border:1px solid black;">

* **Compute the orthogonal eigendecomposition of $A^t A.$**<br>
  $\;\;$ this results in $\Sigma_r$, $V_r$<br>
  $\;\;$ and (optionally) an orthonormal basis for $\mathscr{N}(A)$.
  * If the full SVD is required, <br>
    we need to compute an orthonormal basis for $\mathscr{N}(A),$<br>
    and obtain $\Sigma$ by augmenting $\Sigma_r$ with zeros to the same size as $A.$
* **Compute $U_r = A V_r \Sigma_r^{-1} \Leftrightarrow u_i = \frac{1}{\sigma_i} A v_i$**
* If the full SVD is required, **compute an orthonormal basis** $\tilde{u}_i, i=1,2, \dots M-r$<br>
    $\;\;$ for $\mathscr{N}(A^t) = \mathscr{N}(A A^t) = span\{u_1, u_2, \dots u_r \}^\perp.$
</div>
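
The steps in the box can be sketched in Julia as follows. `svd_via_gram` is a hypothetical helper written for this illustration, and the relative cutoff `tol` used to decide which eigenvalues count as non-zero is an assumption. Forming $A^t A$ squares the condition number, so this mirrors the derivation and is not meant to replace the built-in `svd`.

```julia
using LinearAlgebra

# Hypothetical helper that follows the three steps above.
function svd_via_gram(A::AbstractMatrix{<:Real}; tol::Real = 1e-10)
    M, N = size(A)

    # Step 1: orthogonal eigendecomposition of AᵗA, eigenvalues in decreasing order
    E = eigen(Symmetric(A' * A))
    order = sortperm(E.values; rev = true)
    λ = E.values[order]
    V = E.vectors[:, order]           # trailing columns span N(AᵗA) = N(A)
    r = count(>(tol * λ[1]), λ)       # numerical rank (assumed cutoff)
    σ = sqrt.(λ[1:r])

    # Step 2: U_r = A V_r Σ_r⁻¹, i.e. column i of A V_r divided by σ_i
    Ur = (A * V[:, 1:r]) ./ σ'

    # Step 3: complete U with an orthonormal basis of span{u_1, ..., u_r}^⊥ = N(Aᵗ)
    U = hcat(Ur, nullspace(Matrix(Ur')))

    Σ = zeros(M, N)
    for i in 1:r
        Σ[i, i] = σ[i]
    end
    return U, Σ, V
end

A = [1.0 2.0 3.0;
     2.0 4.0 6.0;
     1.0 0.0 1.0]                     # arbitrary rank-2 test matrix
U, Σ, V = svd_via_gram(A)
```

With these outputs, `U * Σ * V'` should reproduce `A` up to round-off, and `U` and `V` should have orthonormal columns.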

#### **Example**

Let $A = \begin{pmatrix}  -3 & -1 & -1 \\
-3 & -1 & -1 \\
1 & 3 & -1 \\
-1 & -3 & 1
 \end{pmatrix}$

##### **Step 1: Orthonormal Eigendecomposition of $A^t A$**

**Eigenvalues:**

> $A^t A = \begin{pmatrix} 20 & 12 & 4 \\
12 & 20 & -4 \\
4 & -4 & 4
\end{pmatrix}$ has characteristic polynomial
$p(\lambda) = - \lambda ( \lambda^2 - 44 \lambda + 384 )$
>
> $\therefore \lambda  = 32, 12,0$.
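
The Gram matrix and its eigenvalues can be reproduced numerically. A sketch for the example matrix; the function `p` encodes $-\lambda(\lambda^2 - 44\lambda + 384)$, where $44$ is the trace of $A^t A$ and $384$ is the sum of its $2 \times 2$ principal minors.

```julia
using LinearAlgebra

A = [-3 -1 -1;
     -3 -1 -1;
      1  3 -1;
     -1 -3  1]

G = A' * A                             # exact integer arithmetic
λ = eigvals(Symmetric(float.(G)))      # ascending order

p(t) = -t * (t^2 - 44t + 384)          # characteristic polynomial of G
roots_ok = p(0) == 0 && p(12) == 0 && p(32) == 0
```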



**Eigenvector Basis**

> Bases for the null spaces $\mathscr{N}(A^t A - \lambda I)$ are shown in the table below.<br>
**Caveat:** the eigenvalues must be entered in decreasing order

<div style="float:left;margin:30px;width:40%;">
<table border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse" width="300px">
<tr>
    <td height="19" width="100px">$\color{blue}{\sigma=\sqrt{\lambda}}$</td>
    <td height="19" width="100px">$4 \sqrt{2}$</td>
    <td height="19" width="100px">$2 \sqrt{3}$</td>
    <td height="19" width="100px">$\quad\quad 0$</td>
</tr>
<tr>
    <td height="19" width="100px">$\color{blue}\lambda$</td>
    <td height="19" width="100px">$32$</td>
    <td height="19" width="100px">$12$</td>
    <td height="19" width="100px">$\quad\quad 0$</td>
</tr>
<tr>
    <td height="16" width="100px" ><span  style="color:blue;">(m)</span></td>
    <td height="16" width="100px"><span  style="justify:right;">$\quad$ (1)</span></td>
    <td height="16" width="100px">$\quad$ (1)</td>
    <td height="16" width="100px">$\quad\;$ (1)</td>
</tr>
<tr>
    <td height="19" width="100px"><span  style="color:blue;">Basis for $E_\lambda$</span></td>
    <td height="19" width="100px">$\;\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}\;$</td>
    <td height="19" width="100px">$\;\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}\;$</td>
    <td height="19" width="100px">$\quad\;\begin{pmatrix} 1 \\ -1 \\ -2 \end{pmatrix}\;$</td>
</tr>
<tr>
    <td height="19" width="100px"><span  style="color:blue;">Orthonormal Basis for $E_\lambda$</span></td>
    <td height="19" width="100px">$\;\begin{pmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \\ 0 \end{pmatrix}\;$</td>
    <td height="19" width="100px">$\;\begin{pmatrix} \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{3} \end{pmatrix}\;$</td>
    <td height="19" width="100px">$\quad\;\begin{pmatrix} \frac{\sqrt{6}}{6} \\ -\frac{\sqrt{6}}{6} \\ -\frac{\sqrt{6}}{3} \end{pmatrix}\;$</td>
</tr>
</table>
</div><div style="float:right;margin:30px;width:40%;">
Therefore $\quad \color{red}{rank(A) = 2}$

$V = \frac{1}{6} \left( \begin{array}{cc|c} 3 \sqrt{2} &  2 \sqrt{3} &\sqrt{6}\\
                                            3 \sqrt{2} & -2 \sqrt{3} & -\sqrt{6} \\
                                            0          & 2 \sqrt{3} & -2 \sqrt{6} \end{array} \right)$

$\Sigma = \left(  \begin{array}{cc|c} \color{red}{4 \sqrt{2}} & 0 & 0\\
                                      0 & \color{red}{2 \sqrt{3}} & 0\\ \hline
                                      0 & 0 & 0 \\
                                      0 & 0 & 0
\end{array}\right)$
</div>

##### **Step 2: $U_r$**

$U_r = A V_r \Sigma_r^{-1} = \frac{1}{2}\begin{pmatrix} -1 & -1 \\
-1 & -1 \\
1 & -1 \\
-1 & 1
\end{pmatrix}$
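
This step is easy to reproduce in floating point. The sketch below enters $V_r$ and $\Sigma_r$ from Step 1 by hand and computes $U_r = A V_r \Sigma_r^{-1}$; the variable names are chosen for this illustration.

```julia
using LinearAlgebra

A = Float64[-3 -1 -1;
            -3 -1 -1;
             1  3 -1;
            -1 -3  1]

# V_r and Σ_r as obtained in Step 1
Vr = [3*sqrt(2)  2*sqrt(3);
      3*sqrt(2) -2*sqrt(3);
      0.0        2*sqrt(3)] / 6
Σr = Diagonal([4*sqrt(2), 2*sqrt(3)])

Ur = A * Vr / Σr                 # right division by the diagonal matrix Σ_r

Ur_expected = [-1 -1;
               -1 -1;
                1 -1;
               -1  1] / 2
compact_ok = A ≈ Ur * Σr * Vr'   # compact SVD reproduces A
```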

##### **Verify the Compact SVD**


$$A = U_r \Sigma_r V_r^t = \frac{1}{2}\begin{pmatrix} -1 & -1 \\
-1 & -1 \\
1 & -1 \\
-1 & 1
    \end{pmatrix}\; \begin{pmatrix}
4 \, \sqrt{2} & 0 \\
0 & 2 \, \sqrt{3}
    \end{pmatrix}\; \frac{1}{6} \begin{pmatrix}
3 \, \sqrt{2} & 2 \, \sqrt{3} \\
3 \, \sqrt{2} & -2 \, \sqrt{3} \\
0 & 2 \, \sqrt{3}
\end{pmatrix}^t \quad = \begin{pmatrix}
-3 & -1 & -1 \\
-3 & -1 & -1 \\
1 & 3 & -1 \\
-1 & -3 & 1
\end{pmatrix}
$$

##### **Step 3: Obtain $\tilde{U}_r$ and Complete $U$ for the Full SVD**

A basis for $\mathscr{N}(A^t)$ is given by $\left\{\; 
 \begin{pmatrix}-1 \\ 1 \\ 0 \\ 0 \end{pmatrix},\
 \begin{pmatrix}-1 \\ 1 \\ 1 \\ 1 \end{pmatrix} 
\;\right\}$; using QR, an orthonormal basis is $\left\{\; 
 \begin{pmatrix}-\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \\ 0 \\ 0 \end{pmatrix},\
 \begin{pmatrix}0 \\ 0 \\ \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix} 
\;\right\}$

Finally $U = \left( U_r \; \tilde{U}_r \right)$:

$$U = \frac{1}{2}\begin{pmatrix}
-1 & -1 & -\sqrt{2} & 0 \\
-1 & -1 &  \sqrt{2} & 0 \\
 1 & -1 & 0 &  \sqrt{2}\\
-1 &  1 & 0 &  \sqrt{2}
\end{pmatrix}
$$
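
The full decomposition can be verified numerically by entering $U$, $\Sigma$ and $V$ by hand. A sketch; it also compares the singular values with Julia's built-in `svdvals`.

```julia
using LinearAlgebra

A = Float64[-3 -1 -1;
            -3 -1 -1;
             1  3 -1;
            -1 -3  1]
s2, s3, s6 = sqrt(2), sqrt(3), sqrt(6)

U = [-1 -1 -s2  0;
     -1 -1  s2  0;
      1 -1  0   s2;
     -1  1  0   s2] / 2

Σ = [4*s2  0     0;
     0     2*s3  0;
     0     0     0;
     0     0     0]

V = [3*s2  2*s3  s6;
     3*s2 -2*s3 -s6;
     0     2*s3 -2*s6] / 6

full_ok  = U * Σ * V' ≈ A              # A = U Σ Vᵗ
U_orth   = norm(U' * U - I) < 1e-12    # U is orthogonal
V_orth   = norm(V' * V - I) < 1e-12    # V is orthogonal
sigma_ok = svdvals(A) ≈ [4*s2, 2*s3, 0.0]
```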

# Stuff

BTW, I like
https://towardsdatascience.com/svd-8c2f72e264f