In [1]:
import sympy as sp

<hr style="border-width:4px; border-color:coral"></hr>

# Singular Value Decomposition (SVD)

<hr style="border-width:4px; border-color:coral"></hr>

Central to numerical linear algebra is the "singular value decomposition".  This is one of the most general of the matrix decompositions, in that it can be applied to any matrix (singular, non-singular, square, or non-square).  

But before continuing, we do a brief review of the eigenvalue decomposition to see where it can fail to be completely general. 

## Review : Eigenvalue decomposition

<hr style="border-width:3px; border-color:black"></hr>

Eigenvalue/eigenvector pairs $(\lambda, \mathbf v)$ of a square matrix $A \in \mathbb R^{m \times m}$ satisfy

\begin{equation}
A\mathbf v = \lambda \mathbf v
\end{equation}

where $\lambda \in \mathbb C$, and $\mathbf v \in \mathbb C^m$ with $\mathbf v \ne 0$.  Because eigenvalues are the roots of the *characteristic polynomial* $p(\lambda) = \det(A - \lambda I)$, which has degree $m$, we will always have $m$ eigenvalues when they are counted with multiplicity. Eigenvalues can have multiplicities greater than 1.  
 

The **eigenvectors** associated  with each distinct eigenvalue satisfy

\begin{equation}
(A - \lambda I)\mathbf v = 0.
\end{equation}

The nullspace of $A - \lambda I$ is the *eigenspace* associated with $\lambda$. 

#### Algebraic multiplicity and geometric multiplicity

* The *algebraic multiplicity* of an eigenvalue $\lambda$ is the multiplicity of $\lambda$ as a root of the characteristic polynomial $p(\lambda)$. 


* The *geometric multiplicity* of an eigenvalue $\lambda$ is the dimension of the eigenspace associated with $\lambda$. 


In general, the geometric multiplicity is less than or equal to the algebraic multiplicity.

#### Defective eigenvalues

An eigenvalue is said to be *defective* if its geometric multiplicity is less than its algebraic multiplicity.  A matrix with a defective eigenvalue is a *defective matrix*. 




<hr style="border-width:2px; border-color:black"></hr>

#### Example : Defective matrix

The matrix

\begin{equation}
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\end{equation}

is defective. We can see immediately that it has the single eigenvalue $\lambda=0$ with algebraic multiplicity 2.  But the nullspace of $A - \lambda I = A$ is spanned by the single vector $\mathbf v = (1,0)^T$, so the geometric multiplicity of $\lambda$ is 1. 
<hr style="border-width:2px; border-color:black"></hr>
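
The multiplicities in the example above can be checked symbolically. The following is a minimal sketch using SymPy's `eigenvects()`, which returns tuples of the form (eigenvalue, algebraic multiplicity, basis of the eigenspace). The variable names are chosen here for illustration only.

```python
import sympy as sp

A = sp.Matrix([[0, 1], [0, 0]])

# The matrix has a single eigenvalue, so eigenvects() returns one tuple:
# (eigenvalue, algebraic multiplicity, list of eigenspace basis vectors)
eigval, alg_mult, basis = A.eigenvects()[0]

# The geometric multiplicity is the dimension of the eigenspace
geo_mult = len(basis)

print("eigenvalue             :", eigval)
print("algebraic multiplicity :", alg_mult)
print("geometric multiplicity :", geo_mult)
print("diagonalizable         :", A.is_diagonalizable())
```

Since the geometric multiplicity is smaller than the algebraic multiplicity, the eigenvalue is defective.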


#### Eigenvectors associated with distinct eigenvalues

Eigenvectors associated with distinct eigenvalues are *linearly independent*. 

#### Proof

We prove the statement for two eigenvectors associated with distinct eigenvalues.

Start : Assume $A \mathbf x =\lambda \mathbf x$, and $A \mathbf y =\mu \mathbf y$, where $\lambda \ne \mu$ and $\mathbf x, \mathbf y \ne 0$.  Suppose, to the contrary, that $\mathbf x$ and $\mathbf y$ are linearly dependent, so that $\mathbf x = \beta \mathbf y$ for some $\beta \ne 0$.  We have 
$A\mathbf x = A (\beta \mathbf y) = \beta A\mathbf y = \beta \mu \mathbf y$.  But we also have 
$A \mathbf x = \lambda \mathbf x = \beta \lambda \mathbf y$.  Combining these two equalities, we have $\beta \lambda \mathbf y =
\beta \mu \mathbf y$. Since $\beta \ne 0$ and $\mathbf y \ne 0$, we conclude that $\lambda = \mu$, which is a contradiction. So our assumption that $\mathbf x = \beta \mathbf y$ is wrong, and so $\mathbf x$ and $\mathbf y$ are linearly independent. 

**Note:** The converse of this statement is not true!  Linearly independent eigenvectors need not be associated with distinct eigenvalues.

**Homework problem:** prove that eigenvectors associated with distinct eigenvalues of a Hermitian matrix are orthogonal. 

### Diagonalizable matrix

A matrix $A \in \mathbb C^{m \times m}$ is *diagonalizable* if there exists a matrix $R \in \mathbb C^{m \times m}$ so that 

\begin{equation}
R^{-1}AR = \Lambda = \mbox{diag}\left(\lambda_1, \lambda_2, \dots, \lambda_m\right)
\end{equation}

A matrix $A \in \mathbb C^{m \times m}$ is diagonalizable if and only if $A$ has no defective eigenvalues.  (Golub and Van Loan, Corollary 7.1.8).  

<hr style="border-width:2px; border-color:black"></hr>

#### Example : Distinct eigenvalues

An $m \times m$ matrix with $m$ distinct eigenvalues is diagonalizable.  

The converse does not follow! A diagonalizable matrix need not have distinct eigenvalues. (Think of the identity matrix).  $ A  = I$; $A = I^{-1} A I$ (:-))
<hr style="border-width:2px; border-color:black"></hr>

### Eigen-decomposition

A diagonalizable matrix $A \in \mathbb C^{m \times m}$ can be *decomposed* as 

\begin{equation}
A = R \Lambda R^{-1}
\end{equation}

where the columns of $R$ are the eigenvectors of $A$. 
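
As a small illustration, the sketch below uses SymPy's `diagonalize()`, which returns a pair of matrices playing the roles of $R$ and $\Lambda$. The test matrix is an arbitrary choice with distinct eigenvalues 1 and 3. It is not symmetric, so its eigenvectors are not expected to be orthogonal.

```python
import sympy as sp

# Upper triangular, so the eigenvalues (1 and 3) can be read off the diagonal
A = sp.Matrix([[1, 2], [0, 3]])

# diagonalize() returns (R, Lam) with A = R * Lam * R^{-1}
R, Lam = A.diagonalize()

reconstructed = R * Lam * R.inv()

# R^T R is diagonal only if the eigenvectors are mutually orthogonal
gram = R.T * R

print("R = ", R)
print("Lambda = ", Lam)
print("R * Lambda * R^{-1} == A :", reconstructed == A)
print("eigenvectors orthogonal  :", gram.is_diagonal())
```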



<hr style="border-width:2px; border-color:black"></hr>

#### Example : Spectral Theorem


All real symmetric matrices $A \in \mathbb R^{m \times m}$ can be decomposed as 

\begin{equation}
A = Q \Lambda Q^T
\end{equation}

where both $Q$ and $\Lambda$ are real matrices in $\mathbb R^{m \times m}$.  The matrix $Q$ is an orthogonal matrix whose columns are the normalized eigenvectors of $A$, and $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues of $A$.

The Spectral Theorem states that not only is a symmetric matrix diagonalizable, it can be *unitarily* diagonalized. Its eigenvectors can be chosen to form an orthonormal set.   
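
The statement can be checked on a small symmetric matrix. This is a sketch only: the $2 \times 2$ matrix is an arbitrary choice, and normalizing the eigenvectors one by one is enough here only because its eigenvalues are distinct. With repeated eigenvalues, the eigenvectors within an eigenspace would first have to be orthogonalized.

```python
import sympy as sp

A = sp.Matrix([[2, 1], [1, 2]])     # real symmetric, eigenvalues 1 and 3

# Normalize each eigenvector and use them as the columns of Q
cols = [v / v.norm() for _, _, basis in A.eigenvects() for v in basis]
Q = sp.Matrix.hstack(*cols)

# Q^T Q is the identity if Q is orthogonal
QtQ = (Q.T * Q).applyfunc(sp.simplify)

# Q^T A Q is diagonal, with the eigenvalues of A on the diagonal
D = (Q.T * A * Q).applyfunc(sp.simplify)

print("Q^T Q   = ", QtQ)
print("Q^T A Q = ", D)
```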

#### Proof

It is a homework exercise to show that the eigenvalues of a symmetric matrix are real and that eigenvectors associated with *distinct* eigenvalues of $A$ are orthogonal.  It is more challenging to prove that there are no defective eigenvalues and that the eigenvectors associated with the same eigenvalue can be chosen to be orthogonal.  

**Outline of the proof:**  The proof relies on the *Real Schur Decomposition*.  Every real square matrix can be written in Real Schur form as $A = Q T Q^T$, where $T$ is a block upper triangular matrix with $1 \times 1$ or $2 \times 2$ diagonal blocks and $Q$ is an orthogonal matrix.  Since $A$ is symmetric, $T = Q^T A Q$ is also symmetric, so $T$ can be written as the *direct sum* of $1 \times 1$ or symmetric $2 \times 2$ blocks, with $2 \times 2$ blocks corresponding to complex conjugate pairs of eigenvalues of $A$. But since a symmetric $2 \times 2$ block has real eigenvalues, $T$ has only $1 \times 1$ blocks, and so $A$ can be diagonalized as 

\begin{equation}
Q^T A Q = \mbox{diag}(\lambda_1, \lambda_2, \dots, \lambda_m)
\end{equation}

For a complete proof, see Golub and Van Loan, 7.4.1 (Real Schur Decomposition) and 8.1.1 (Eigenvalues of symmetric matrices). 

**Note:** An *orthogonal* matrix $Q$ is a real matrix satisfying $Q^T Q = I$.   The term *unitary* is reserved for matrices with general  complex entries. 

**Note:** A *direct sum* $A \oplus B \oplus \dots$  is a block diagonal matrix whose diagonal blocks are the matrices $A$,  $B$, $\dots$. See [direct sum of matrices](https://en.wikipedia.org/wiki/Direct_sum#Direct_sum_of_matrices).

<hr style="border-width:2px; border-color:black"></hr>

<hr style="border-width:4px; border-color:coral"></hr>

## Singular Value Decomposition

<hr style="border-width:4px; border-color:coral"></hr>

From the above, it is clear that not every matrix can be decomposed as $R\Lambda R^{-1}$.  First, we have restricted the eigen-decomposition to only square matrices.  Second, the eigen-decomposition is only available for diagonalizable, square matrices.  And third, the eigenvectors are in general not orthogonal. 

A more general matrix decomposition that overcomes these limitations is the *Singular Value Decomposition*. 


### Singular values

<hr style="border-width:2px; border-color:black"></hr>

The *singular values* of a matrix $A \in \mathbb C^{m \times n}$ with rank $r$ are values $\sigma_i$ satisfying

\begin{equation}
A\mathbf v_i = \sigma_i \mathbf u_i, \qquad i = 1,2,\dots r
\end{equation}

where $\mathbf u_i$ and $\mathbf v_i$ are left and right singular vectors, respectively. 

A few key properties of the singular values/vectors : 

* The singular values $\sigma \in \mathbb R$ are non-negative, 

* The right singular vectors $\mathbf v \in \mathbb C^{n}$ are in the *row space* of $A$, and 

* The left singular vectors $\mathbf u \in \mathbb C^{m}$ are in the *column space* of $A$.   

The singular values need not be distinct.  

By convention, singular values are numbered in descending order so that $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$, with $\sigma_{r+1} = \sigma_{r+2} = \dots = \sigma_{\min(m,n)} = 0$.
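
To make the defining relation concrete, the sketch below checks $A\mathbf v_i = \sigma_i \mathbf u_i$ for a small rank-2 matrix whose singular values and vectors can be read off by inspection. The companion relation $A^T\mathbf u_i = \sigma_i \mathbf v_i$ (for real $A$) is not stated above, but it also holds and is checked as well. The list name `triplets` is used here for illustration only.

```python
import sympy as sp

A = sp.Matrix([[0, 1, 0],
               [0, 0, 0],
               [0, 0, 2]])

# Singular triplets (sigma_i, u_i, v_i), listed in descending order of sigma_i
triplets = [
    (2, sp.Matrix([0, 0, 1]), sp.Matrix([0, 0, 1])),
    (1, sp.Matrix([1, 0, 0]), sp.Matrix([0, 1, 0])),
]

for sigma, u, v in triplets:
    assert A * v == sigma * u       # A v_i = sigma_i u_i
    assert A.T * u == sigma * v     # A^T u_i = sigma_i v_i

# The number of non-zero singular values equals the rank of A
print("rank of A :", A.rank())
```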

### Singular value decomposition

<hr style="border-width:2px; border-color:black"></hr>

If we assume that the singular vectors are normalized, we can collect the singular values and singular vectors into semi-unitary and diagonal matrices $U$, $V$ and $\Sigma$. Then $A$ can be written as 

\begin{equation}
A = U\Sigma V^*
\end{equation}

where $U \in \mathbb C^{m \times r}$ and $V \in \mathbb C^{n \times r}$ have orthonormal columns, and $\Sigma \in \mathbb R^{r \times r}$ is a diagonal matrix.  

The above decomposition is referred to as a "reduced" decomposition, since we might have $r < \min(m,n)$.  



<hr style="border-width:2px; border-color:black"></hr>

#### Example : SVD of an outer product matrix


What is the singular value decomposition of the matrix formed as an outer product?  

#### Solution

Suppose that $A = \mathbf u \mathbf v^*$, where $\mathbf u \in \mathbb C^m$ and $\mathbf v \in \mathbb C^n$ are nonzero. In this case, $A$ has rank 1, and we have only one non-zero singular value.  The decomposition is 

\begin{equation}
U = \frac{\mathbf u}{\Vert \mathbf u\Vert},  \qquad V = \frac{\mathbf v}{\Vert \mathbf v\Vert}, \qquad \Sigma = \left[\sigma_1\right].
\end{equation}

where $\sigma_1 = \Vert \mathbf u\Vert \Vert \mathbf v \Vert$,  and the decomposition looks like 

<br><br>

<center>
<img width=800px src="./images/svd_01.png"></img>
</center>   

<hr style="border-width:2px; border-color:black"></hr>
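
The outer product formulas can be verified on a small real example, where $\mathbf v^* = \mathbf v^T$. The vectors below are arbitrary choices made for this sketch.

```python
import sympy as sp

u = sp.Matrix([3, 3, 3])
v = sp.Matrix([-1, -1])
A = u * v.T                     # 3 x 2 outer product, rank 1

# The single non-zero singular value and the normalized singular vectors
sigma1 = u.norm() * v.norm()
U = u / u.norm()
V = v / v.norm()

# U * sigma1 * V^T should reproduce A exactly
residual = (U * sigma1 * V.T - A).applyfunc(sp.simplify)

print("rank(A) :", A.rank())
print("sigma_1 :", sigma1)
print("residual:", residual)
```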

<hr style="border-width:2px; border-color:black"></hr>

#### Example : SVD as sum of outer products

The general SVD can be written as the sum of outer products  as

\begin{equation}
A = \sum_{i=1}^r \sigma_i \mathbf u_i \mathbf v_i^*
\end{equation}

<br><br>

<center>
<img width=900px src="./images/svd_02.png"></img>
</center>   

This particular view of the SVD is what makes it a powerful tool.  If only a few singular values are large, and the remaining ones are negligible or zero, we can "compress" $A$ by representing it using only a few of the 
singular values/vectors.  

\begin{equation}
A \approx \sum_{i=1}^s \sigma_i \mathbf u_i \mathbf v_i^*, \qquad s \ll r
\end{equation}

This is a key idea behind image compression.  Suppose the entries of $A$ represent pixel values in $[0,1]$ in a color channel (e.g. red, green or blue) in a $1024 \times 1024$ image.   For each channel, the full image would require $1024^2$ bytes (or about 1MB of storage) at one byte per pixel.  But if we can approximate the same image well with only 10 singular vectors/values, we get an image compression ratio of $(2 \times 10 \times 1024 + 10)/1024^2 \approx 0.02$, or about 2% of the original storage for each color channel.

<hr style="border-width:2px; border-color:black"></hr>
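
A truncated sum of outer products can be sketched numerically. The example below uses NumPy rather than SymPy (a choice made here because the computation is floating point), and it builds an artificial test matrix with singular values $1, 1/2, 1/4, \dots$ instead of a real image. The sizes `m`, `n` and the truncation level `s` are arbitrary. The 2-norm error of the truncated sum equals the first discarded singular value, a standard result that is not derived above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 64, 48, 5

# Build a test matrix with known, rapidly decaying singular values 2^0, 2^-1, ...
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))   # m x n, orthonormal columns
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # n x n, orthogonal
A = (U0 * 2.0 ** -np.arange(n)) @ V0.T

# Reduced SVD; NumPy returns V^* (here V^T) as the third output
U, sigma, Vh = np.linalg.svd(A, full_matrices=False)

# Keep only the s largest singular values and their singular vectors
A_s = (U[:, :s] * sigma[:s]) @ Vh[:s, :]

error = np.linalg.norm(A - A_s, 2)     # 2-norm error of the truncation
stored = s * (m + n + 1)               # numbers kept: s columns of U and V, plus s values

print("first discarded singular value :", sigma[s])
print("2-norm error of truncation     :", error)
print("storage ratio                  :", stored / (m * n))
```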

<hr style="border-width:2px; border-color:black"></hr>

### Example : Connection to eigenvalues

Suppose we have a diagonalizable matrix $A$.  Is there a connection between the eigenvalues/vectors and the singular values/vectors of $A$?

#### Solution

Given the SVD of a matrix $A$, we can write

\begin{equation}
A^*A = (U \Sigma V^*)^*(U \Sigma V^*) = (V \Sigma^* U^*)(U \Sigma V^*) = V (\Sigma^* \Sigma)V^*
\end{equation}

and 

\begin{equation}
AA^* = (U \Sigma V^*)(U \Sigma V^*)^* = (U \Sigma V^*)(V \Sigma^* U^*) = U (\Sigma \Sigma^*)U^*
\end{equation}

The right singular vectors (the columns of $V$) are the eigenvectors of $A^*A$ and the left singular vectors (the columns of $U$) are the eigenvectors of $AA^*$.    The non-zero singular values are the square roots of the non-zero eigenvalues of $A^*A$ or $AA^*$.  

If $A^*A = AA^*$, i.e. $A$ is a normal matrix, then $A$ is unitarily diagonalizable as $A = QDQ^*$. If $A$ is also Hermitian positive definite, then the eigenvalue decomposition and the singular value decomposition are the same.

<hr style="border-width:2px; border-color:black"></hr>
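
The connection can be checked on a small real matrix, for which $A^* = A^T$. In this sketch the $2 \times 2$ matrix is an arbitrary choice, and `eigenvals()` returns a dictionary mapping each eigenvalue to its multiplicity.

```python
import sympy as sp

A = sp.Matrix([[2, 2], [-1, 1]])

# Eigenvalues of A^T A and A A^T, as {eigenvalue: multiplicity}
evals_AtA = (A.T * A).eigenvals()
evals_AAt = (A * A.T).eigenvals()

# Singular values are the square roots of these eigenvalues, in descending order
sing_vals = sorted((sp.sqrt(lam) for lam in evals_AtA), reverse=True)

print("eigenvalues of A^T A :", evals_AtA)
print("eigenvalues of A A^T :", evals_AAt)
print("singular values of A :", sing_vals)
```

For a square matrix such as this one, $A^TA$ and $AA^T$ have the same eigenvalues. For a non-square matrix they share only the non-zero ones.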


<hr style="border-width:2px; border-color:black"></hr>

### Proof of the existence of the SVD

See Theorem 4.1 (Lecture 4, TB, page 29).  The proof is by construction. 

* The singular values are unique

* If $A$ is square and the singular values are distinct, the singular vectors are unique up to complex constants of magnitude 1, e.g. $z = e^{i\theta}$.  

<hr style="border-width:2px; border-color:black"></hr>

### Computation (by hand) of an SVD

To compute the singular values/vectors of a matrix $A \in \mathbb R^{m \times n}$ : 

1. Find the eigenvectors and eigenvalues of $A^*A$.  The normalized eigenvectors are the right singular vectors $\mathbf v$ of $A$, and the eigenvalues are the squares of the singular values, i.e. $\sigma^2 = \lambda$.   

2.  For $\sigma \ne 0$, compute left singular vectors $\mathbf u$ from $\mathbf u = A\mathbf v/\sigma$. 

The above can be used to construct a "reduced" SVD.  To construct a full SVD, we would also need to complete $U$ and $V$ with additional orthonormal vectors from the orthogonal complements of the spaces spanned by the singular vectors found above. 

#### Example : Compute the SVD 

The Python function below computes a reduced SVD of an input matrix $A$. 

In [2]:
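def SVD(A):
    """Compute a reduced SVD (U, S, V) of a real SymPy matrix A, so that A = U*S*V^T."""

    AtA = A.transpose()*A
    S2 = AtA.eigenvects()  # Tuples : (eval, multiplicity, list of evecs)
    print("AtA.eigenvects() = ")
    display(S2)
    print("")

    m,n = A.shape

    U = sp.Matrix(m,0,[])
    V = sp.Matrix(n,0,[])
    r = A.rank()  # Number of non-zero singular values
    S = sp.zeros(r,r)
    k = 0
    # Singular values are stored in the order returned by eigenvects(),
    # which is not necessarily descending.
    for t in S2:
        s = sp.sqrt(t[0])

        # Only store non-zero singular values
        if s > 0:
            # Orthonormalize the eigenvectors within this eigenspace
            for v in sp.GramSchmidt(t[2], True):
                S[k,k] = s  # One diagonal entry per eigenvector (repeated eigenvalues)
                V = sp.Matrix.hstack(V,v)

                u = A*v/s   # Has unit norm, since ||A v|| = s
                U = sp.Matrix.hstack(U,u)
                k += 1
    return U,S,V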

In [3]:
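# Several test matrices follow.  Each assignment overwrites the previous one,
# so only the last definition of A is used below.
A = sp.Matrix(2,2,[2,2,-1,1])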

# Problem 4.1 (a)
A = sp.Matrix(2,2,[3, 0, 0, -2])

# Problem 4.1 (c)
A = sp.Matrix(3,2,[0,2,0,0,0,0])

# Problem 4.1 (e)
A = sp.Matrix(4,3,[1]*12)

# More examples
u = sp.Matrix(3,1,[3]*3)
v = sp.Matrix(2,1,[-1]*2)
A = u*v.transpose()  # An outer product

# Examples using symbolic values. 
a,b = sp.symbols('a b')
A = sp.Matrix(3,3,[a,1,0,0,a,0,0,0,b])
A = A.subs({a : 0, b : 2})


display(A)

Matrix([
[0, 1, 0],
[0, 0, 0],
[0, 0, 2]])

In [4]:
U, S, V = SVD(A)
    
print("")
print("U = ")
display(U)
print("")

print("Sigma = ")
display(S)
print("")

print("V = ")
display(V)
print("")
print("U*S*V^T = ")
display(U*S*V.transpose())
    

AtA.eigenvects() = 


[(0,
  1,
  [Matrix([
   [1],
   [0],
   [0]])]),
 (1,
  1,
  [Matrix([
   [0],
   [1],
   [0]])]),
 (4,
  1,
  [Matrix([
   [0],
   [0],
   [1]])])]



U = 


Matrix([
[1, 0],
[0, 0],
[0, 1]])


Sigma = 


Matrix([
[1, 0],
[0, 2]])


V = 


Matrix([
[0, 0],
[1, 0],
[0, 1]])


U*S*V^T = 


Matrix([
[0, 1, 0],
[0, 0, 0],
[0, 0, 2]])