# Singular Value Decomposition

Recall from last time that a _singular value decomposition_ (SVD) of a matrix $M$ is a matrix factorization of the form:

$$M = U \Sigma V^*,$$

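where both $U$ and $V$ are unitary, and $\Sigma$ is diagonal with nonnegative real entries $\sigma_1\geq \sigma_2\geq \cdots \geq 0$, the _singular values_ of $M$. For $M\in\mathbb{C}^{m\times n}$, $U$ is $m\times m$, $V$ is $n\times n$, and $\Sigma$ is $m\times n$.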

![SVD](https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Singular-Value-Decomposition.svg/512px-Singular-Value-Decomposition.svg.png)

In this lecture we want to investigate what the SVD tells us about linear transformations.

### The anatomy of the SVD

> **THEOREM.**  
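- The columns of $U$ form an orthonormal basis of $\mathbb{C}^m = C(M)\oplus N(M^*)$: the first $r=\mathrm{rank}(M)$ columns span $C(M)$ and the remaining $m-r$ columns span $N(M^*)$.
- The columns of $V$ form an orthonormal basis of $\mathbb{C}^n = C(M^*)\oplus N(M)$: the first $r$ columns span $C(M^*)$ and the remaining $n-r$ columns span $N(M)$.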

**Proof.**

For any $\mathbf{b}\in \mathbb{C}^m$ and any $\mathbf{x}\in \mathbb{C}^n$, we write these in terms of the singular vector bases:

$$\mathbf{b}^\prime = U^* \mathbf{b},\quad \mathbf{x}^\prime =V^* \mathbf{x}.$$

Observe:
$$\mathbf{b} = M \mathbf{x}\ \Longleftrightarrow\ U^*\mathbf{b}=U^*M\mathbf{x} = U^*U\Sigma V^* \mathbf{x} \ \Longleftrightarrow \ \mathbf{b}^\prime = \Sigma\mathbf{x}^\prime.$$

This shows that $M$ reduces to the diagonal matrix $\Sigma$ when the range is expressed in the basis of columns of $U$ and the domain is expressed in the basis of columns of $V$.
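
As a numerical sanity check of this change of basis, here is a minimal sketch. It assumes Julia 1.x, where `svd` lives in the `LinearAlgebra` standard library and accepts `full=true`; that is newer than the release that produced the outputs in these notes. The example matrix and the tolerance `1e-10` are illustrative choices.

```julia
using LinearAlgebra

# Illustrative 3x4 matrix of rank 2; any matrix would do.
M = [1.0 2 3 4; 5 6 7 8; 9 10 11 12]

# full=true returns square factors: U is 3x3 and V is 4x4.
F = svd(M; full=true)
U, s, V = F.U, F.S, F.V

# Build the 3x4 "diagonal" matrix Σ from the vector of singular values.
Σ = zeros(size(M))
for i in eachindex(s)
    Σ[i, i] = s[i]
end

# In the singular vector bases, M becomes Σ: U' * M * V ≈ Σ.
diag_err = norm(U' * M * V - Σ)

# Estimate the rank by counting singular values above a relative tolerance.
tol = 1e-10 * s[1]
r = count(x -> x > tol, s)

# The trailing columns of V are sent to zero by M,
# and the trailing columns of U are sent to zero by M'.
null_err = norm(M * V[:, r+1:end])
conull_err = norm(M' * U[:, r+1:end])

println((diag_err, r, null_err, conull_err))
```

All three error norms should be at the level of rounding error, which is consistent with the trailing columns of $V$ lying in $N(M)$ and the trailing columns of $U$ lying in $N(M^*)$.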

### Rank-Nullity Theorem

> **COROLLARY.** 
- The rank of $M$ is the number of nonzero singular values (counted with multiplicity).
- The nullity of $M$ is the number of columns of $V$ minus the rank of $M$.

> **COROLLARY. [_Rank-Nullity_]** 
For a linear map $T: V\to W$ between finite-dimensional spaces, with rank $r_T$ and nullity $n_T$, we have: $$r_T+n_T=\dim V.$$

**Proof.**

We can choose bases such that $T$ is represented by the diagonal matrix $\Sigma$.  By the previous Corollary:

$$r_T+n_T =  \text{number of columns of }\Sigma = \dim V.$$ 

![RankNullity](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/The_four_subspaces.svg/600px-The_four_subspaces.svg.png)

In [1]:
M = [1 2 3 4; 5 6 7 8; 9 10 11 12]
svd(M), rank(M), nullspace(M)

((
3x3 Array{Float64,2}:
 -0.206736  -0.889153   0.408248
 -0.518289  -0.254382  -0.816497
 -0.829842   0.38039    0.408248,

[25.436835633480243,1.7226122475210637,5.140375154629761e-16],
4x3 Array{Float64,2}:
 -0.403618   0.732866   0.445272 
 -0.464744   0.28985   -0.831432 
 -0.525871  -0.153167   0.327048 
 -0.586997  -0.596183   0.0591117),2,
4x2 Array{Float64,2}:
  0.445272    0.318956 
 -0.831432   -0.0933893
  0.327048   -0.770091 
  0.0591117   0.544523 )

The third singular value is zero up to rounding error ($\approx 5\times 10^{-16}$), so the rank is $2$ and the nullity is $4-2=2$, which matches the two columns returned by `nullspace(M)`.

### Norms and singular values

> **THEOREM.** $\|M\|_2=\sigma_1$ and $\|M\|_F = \sqrt{\sigma_1^2+\cdots+ \sigma_r^2}$.

> **Proof.** Since $M=U\Sigma V^*$ with $U, V$ unitary, and both norms are invariant under multiplication by unitary matrices, we see:

\begin{align}
\|M\|_2 &= \|U\Sigma V^*\|_2  = \|\Sigma\|_2 = \max_{1\leq i\leq r}\{|\sigma_i|\}=\sigma_1.\\
\|M\|_F &= \|U\Sigma V^*\|_F =  \|\Sigma\|_F = \sqrt{\sigma_1^2+\cdots +\sigma_r^2}.
\end{align}

In [2]:
M = [1 2 3 4; 5 6 7 8]
norm(M), vecnorm(M)

(14.22740741263374,14.2828568570857)

In [3]:
U,S,V=svd(M)
sqrt(S[1]^2+S[2]^2)

14.2828568570857
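
The cells above rely on an older Julia release, where `norm(M)` is the operator 2-norm of a matrix and `vecnorm(M)` is the Frobenius norm. A rough equivalent for Julia 1.x (an assumption about the reader's setup) uses `opnorm` and `norm` from the `LinearAlgebra` standard library:

```julia
using LinearAlgebra

M = [1.0 2 3 4; 5 6 7 8]
s = svdvals(M)                 # singular values, largest first

two_norm = opnorm(M)           # operator 2-norm; should equal s[1]
frob_norm = norm(M)            # Frobenius norm for matrices in Julia 1.x

println((two_norm, s[1]))
println((frob_norm, sqrt(sum(abs2, s))))
```

Here the squared entries of `M` sum to $204$, so both Frobenius expressions should agree with $\sqrt{204}$.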

### Relationship between eigenvalues and singular values

> **THEOREM.** The nonzero singular values of $M$ are the square roots of the nonzero eigenvalues of $M^*M$ or $MM^*$.

**Proof.**

$$M^*M= (U\Sigma V^*)^* (U\Sigma V^*) = V\Sigma^* U^* U \Sigma V^* = V(\Sigma^* \Sigma)V^*.$$

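Thus $M^*M$ is unitarily similar to the diagonal matrix $\Sigma^*\Sigma$, whose diagonal entries are the $\sigma_i^2$ (followed by zeros if $M$ has more columns than nonzero singular values). The same computation gives $MM^* = U(\Sigma\Sigma^*)U^*$.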

In [4]:
eigvals(M*M'), eigvals(M'*M)

([1.5808783149344627,202.41912168506553],[-3.579818844880902e-15,9.908090085244154e-15,1.5808783149344634,202.41912168506553])

In [5]:
svdvals(M)*svdvals(M)'   # outer product: the diagonal entries are the squared singular values

2x2 Array{Float64,2}:
 202.419   17.8885 
  17.8885   1.58088
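
A more direct comparison takes square roots of the eigenvalues instead of forming an outer product. This is a minimal sketch assuming Julia 1.x with the `LinearAlgebra` standard library; wrapping the products in `Symmetric` is a choice made here so that `eigvals` returns real eigenvalues in ascending order.

```julia
using LinearAlgebra

M = [1.0 2 3 4; 5 6 7 8]
s = svdvals(M)                          # descending order

# M*M' is 2x2 with two positive eigenvalues.
λ = eigvals(Symmetric(M * M'))          # ascending order
σ_from_eigs = sqrt.(reverse(λ))         # reorder to match svdvals
println(maximum(abs.(σ_from_eigs .- s)))

# M'*M is 4x4: the same two eigenvalues plus two that are zero up to rounding.
μ = eigvals(Symmetric(M' * M))
println(μ)
```

The two smallest entries of `μ` may come out slightly negative because of rounding, which is why the square roots are taken from `M * M'` here.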

> **THEOREM.** If $M=M^*$ is Hermitian, then the singular values of $M$ are the absolute values of the eigenvalues of $M$.

**Proof.**

Since $M$ is Hermitian, the Spectral Theorem applies:

$$M=Q\cdot \Lambda \cdot Q^* = Q\cdot |\Lambda|\cdot \left(\text{sign}(\Lambda)\cdot Q^*\right)$$

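where $|\Lambda|$ denotes the diagonal matrix with $|\lambda_i|$ as its $i$-th entry, and $\text{sign}(\Lambda)$ is the diagonal matrix with $\text{sign}(\lambda_i)$ as its $i$-th entry (taking $\text{sign}(0)=1$, so that $\text{sign}(\Lambda)$ is unitary).  Notice that if $Q$ is unitary so is $\text{sign}(\Lambda)\cdot Q^*$. After reordering so that the diagonal entries of $|\Lambda|$ decrease, this is an SVD of $M$ with singular values $|\lambda_i|$.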

In [6]:
M = [1 0 1; 0 2 1; 1 1 0]
e, s = eigvals(M), svdvals(M)

([-0.879385241571817,1.347296355333861,2.532088886237955],[2.5320888862379562,1.3472963553338608,0.8793852415718167])
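
The construction in the proof can also be carried out numerically. The sketch below assumes Julia 1.x with the `LinearAlgebra` standard library; the sorting step and the convention $\text{sign}(0)=1$ are choices made here so that the result has the usual SVD ordering.

```julia
using LinearAlgebra

M = [1.0 0 1; 0 2 1; 1 1 0]              # real symmetric, hence Hermitian

E = eigen(Symmetric(M))
λ, Q = E.values, E.vectors

# Reorder the eigenpairs so that |λ| is decreasing, as in an SVD.
p = sortperm(abs.(λ); rev=true)
λ, Q = λ[p], Q[:, p]

# Diagonal sign matrix, with sign(0) taken to be 1.
sgn = [x < 0 ? -1.0 : 1.0 for x in λ]

U = Q
Σ = Diagonal(abs.(λ))
V = Q * Diagonal(sgn)                    # so that V' = sign(Λ) * Q'

println(norm(U * Σ * V' - M))            # reconstruction error
println(norm(abs.(λ) - svdvals(M)))      # |eigenvalues| vs singular values
```

Both printed norms should be at the level of rounding error.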

> **THEOREM.** For a square matrix $M$, $|\det(M)|=\prod_{i=1}^m \sigma_i$.

**Proof.**

$$|\det(M)| = |\det(U)|\cdot|\det(\Sigma)|\cdot |\det(V^*)| =|\det(\Sigma)|=\prod_{i=1}^m \sigma_i.$$

In [7]:
det(M), prod(s)   # determinant vs. product of the singular values

(-3.0,3.0000000000000004)

### Low-Rank Approximations

> **THEOREM.**  $M$ is a sum of $r=\mathrm{rank}(M)$ rank-one matrices: $$M=\sum_{j=1}^r \sigma_j \mathbf{u}_j \mathbf{v}_j^*.$$
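
One way to see this is to expand $\Sigma$ entry by entry. Write $\mathbf{e}_j$ for the $j$-th standard basis vector (of $\mathbb{C}^m$ when it stands on the left, of $\mathbb{C}^n$ when it stands on the right; this notation is introduced here only for illustration), so that $U\mathbf{e}_j$ and $V\mathbf{e}_j$ are the $j$-th columns of $U$ and $V$. With $r$ the number of nonzero singular values:

$$M = U\Sigma V^* = U\Big(\sum_{j=1}^r \sigma_j\, \mathbf{e}_j \mathbf{e}_j^{T}\Big)V^* = \sum_{j=1}^r \sigma_j\, (U\mathbf{e}_j)(V\mathbf{e}_j)^* = \sum_{j=1}^r \sigma_j\, \mathbf{u}_j \mathbf{v}_j^*.$$

Each term is an outer product of two nonzero vectors and therefore has rank one.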

> **THEOREM.** For any $v$ with $0\leq v \leq r$, define: 
$$M_v=  \sum_{j=1}^v \sigma_j \mathbf{u}_j\mathbf{v}_j^*;$$ 
if $v=p=\min\{m,n\}$, define $\sigma_{v+1}=0$.  Then: 
- $$\|M-M_v\|_2 = \inf_{\begin{array}{c}N\in \mathbb{C}^{m\times n} \\\mathrm{rank}(N)\leq v\end{array}} \|M-N\|_2 =\sigma_{v+1}.$$
- $$\|M-M_v\|_F = \inf_{\begin{array}{c}N\in \mathbb{C}^{m\times n} \\\mathrm{rank}(N)\leq v\end{array}} \|M-N\|_F =\sqrt{\sigma_{v+1}^2+\cdots + \sigma_r^2}.$$
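
Both error formulas can be checked numerically on a small example. This is a minimal sketch assuming Julia 1.x with the `LinearAlgebra` standard library; `truncated_svd` is a hypothetical helper name and the matrix is an arbitrary illustration.

```julia
using LinearAlgebra

# Hypothetical helper: keep only the v leading singular triples of M.
function truncated_svd(M, v)
    F = svd(M)
    return F.U[:, 1:v] * Diagonal(F.S[1:v]) * F.Vt[1:v, :]
end

M = [2.0 0 1 3; 1 4 0 1; 0 1 5 2]        # illustrative 3x4 matrix
s = svdvals(M)

for v in 1:2
    Mv = truncated_svd(M, v)
    err2 = opnorm(M - Mv)                 # compare with s[v+1]
    errF = norm(M - Mv)                   # compare with sqrt(s[v+1]^2 + ...)
    println((v, err2, s[v + 1], errF, sqrt(sum(abs2, s[v+1:end]))))
end

# Another matrix of rank at most 2: M with its last row zeroed.
# By the theorem it cannot beat the truncated SVD in the 2-norm.
N = copy(M)
N[3, :] .= 0
println(opnorm(M - N) >= s[3])
```

The comparison with `N` only illustrates the infimum for one competitor; it is not a proof of optimality.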

> *Question:* What is the best approximation of a hyperellipsoid by a line segment?

> *Answer:* Take the line segment to be the longest axis.

> *Question:* What is the best approximation by a two-dimensional ellipse?

> *Answer:* Take the ellipse spanned by the longest and second-longest axes.

### Applications: Topic Modeling

Topic modeling aims to learn the thematic structure of a text corpus automatically. In the table below, each row is an article and each entry counts how often the column's word occurs in that article.

$$\begin{array}{ccccccccc|l}\text{singer} & \text{GDP} & \text{Trump} & \text{election} & \text{moron} &\text{stock}& \text{bass} & \text{market} & \text{band} & \text{Articles}\\\hline
6 & 1 & 1 & 0 & 0 & 1 & 9 & 0 & 8 & a\\
1 & 0 & 9 & 5 & 8 & 1 & 0 & 1 & 0 & b\\
8 & 1 & 0 & 1 & 0 & 0 & 9 & 1 & 7 & c\\
0 & 7 & 1 & 0 & 0 & 9 & 1 & 7 & 0 & d\\
0 & 5 & 6 & 7 & 5 & 6 & 0 & 7 & 2 & e\\
1 & 0 & 8 & 5 & 9 & 2 & 0 & 0 & 1 & f\end{array}$$

In [8]:
M = [6 1 1 0 0 1 9 0 8; 1 0 9 5 8 1 0 1 0; 8 1 0 1 0 0 9 1 7; 0 7 1 0 0 9 1 7 0; 0 5 6 7 5 6 0 7 2; 1 0 8 5 9 2 0 0 1]

6x9 Array{Int64,2}:
 6  1  1  0  0  1  9  0  8
 1  0  9  5  8  1  0  1  0
 8  1  0  1  0  0  9  1  7
 0  7  1  0  0  9  1  7  0
 0  5  6  7  5  6  0  7  2
 1  0  8  5  9  2  0  0  1

In [9]:
m = mean(M)             # mean of all 54 entries
T = M - m*ones(6,9)     # center the data by subtracting the overall mean from every entry

6x9 Array{Float64,2}:
  2.90741  -2.09259  -2.09259  -3.09259  …   5.90741  -3.09259   4.90741
 -2.09259  -3.09259   5.90741   1.90741     -3.09259  -2.09259  -3.09259
  4.90741  -2.09259  -3.09259  -2.09259      5.90741  -2.09259   3.90741
 -3.09259   3.90741  -2.09259  -3.09259     -2.09259   3.90741  -3.09259
 -3.09259   1.90741   2.90741   3.90741     -3.09259   3.90741  -1.09259
 -2.09259  -3.09259   4.90741   1.90741  …  -3.09259  -3.09259  -2.09259

In [10]:
U,S,V = svd(T)

(
6x6 Array{Float64,2}:
 -0.508265   0.191223   0.142117   0.777555   -0.0338885   0.281359
  0.395648   0.451622   0.271247   0.0639653   0.727227    0.181591
 -0.535702   0.193766  -0.088862  -0.107872    0.390873   -0.70934 
  0.114155  -0.686015   0.584552   0.245328    0.193597   -0.277464
  0.377549  -0.195262  -0.686929   0.511127    0.184207   -0.228637
  0.377597   0.461717   0.291134   0.241337   -0.495782   -0.505408,

[19.315869814492267,14.464044158410662,4.985585627382442,2.7743977246887277,1.674875102363549,0.931173205656085],
9x6 Array{Float64,2}:
 -0.375099   0.160469  -0.177133   -0.44946     0.0996694  -0.451276
  0.049672  -0.462056  -0.175873   -0.148511   -0.211938    0.404602
  0.402228   0.332009  -0.0425048   0.447445    0.510826    0.121673
  0.273866   0.145454  -0.736676   -0.129077   -0.0899298  -0.041578
  0.402151   0.380416  -0.0464869  -0.0415399  -0.42472    -0.374751
  0.168352  -0.488743   0.109861    0.448482   -0.261977   -0.565623
 -0.515894   0.1

In [11]:
U[:,1:3], V[:,1:3]

(
6x3 Array{Float64,2}:
 -0.508265   0.191223   0.142117
  0.395648   0.451622   0.271247
 -0.535702   0.193766  -0.088862
  0.114155  -0.686015   0.584552
  0.377549  -0.195262  -0.686929
  0.377597   0.461717   0.291134,

9x3 Array{Float64,2}:
 -0.375099   0.160469  -0.177133 
  0.049672  -0.462056  -0.175873 
  0.402228   0.332009  -0.0425048
  0.273866   0.145454  -0.736676 
  0.402151   0.380416  -0.0464869
  0.168352  -0.488743   0.109861 
 -0.515894   0.102953  -0.104993 
  0.13556   -0.471052  -0.425538 
 -0.381383   0.115291  -0.43227  )

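Interpreting these singular vectors as topics is difficult because their entries have mixed signs: a negative weight on a word or an article has no natural meaning in terms of counts. A _nonnegative matrix factorization_, which approximates the data by a product of matrices with nonnegative entries, is easier to interpret.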
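
To make the idea concrete, here is a minimal sketch of one common heuristic for nonnegative matrix factorization, the Lee–Seung multiplicative updates. It is a swapped-in illustration, not necessarily the algorithm the lecture has in mind. It assumes Julia 1.x and is applied to the raw count matrix `M`, since the centered matrix `T` has negative entries. The helper name `nmf_multiplicative`, the iteration count, and the fixed random seed are all illustrative assumptions.

```julia
using LinearAlgebra, Random

# Hypothetical helper: approximate M ≈ W*H with W, H entrywise nonnegative,
# using Lee–Seung multiplicative updates for the Frobenius-norm objective.
function nmf_multiplicative(M, k; iters=500, rng=MersenneTwister(0))
    m, n = size(M)
    W = rand(rng, m, k) .+ 0.1            # strictly positive random start
    H = rand(rng, k, n) .+ 0.1
    ϵ = 1e-12                             # guards against division by zero
    for _ in 1:iters
        H .*= (W' * M) ./ (W' * W * H .+ ϵ)
        W .*= (M * H') ./ (W * H * H' .+ ϵ)
    end
    return W, H
end

# Word counts from the table (rows: articles a-f, columns: words).
M = Float64[6 1 1 0 0 1 9 0 8;
            1 0 9 5 8 1 0 1 0;
            8 1 0 1 0 0 9 1 7;
            0 7 1 0 0 9 1 7 0;
            0 5 6 7 5 6 0 7 2;
            1 0 8 5 9 2 0 0 1]

W, H = nmf_multiplicative(M, 3)
println(round.(W; digits=2))              # article-by-topic weights
println(round.(H; digits=2))              # topic-by-word weights
println(norm(M - W * H) / norm(M))        # relative approximation error
```

Because $W H$ has rank at most $3$, its Frobenius error can never be smaller than that of the rank-3 truncated SVD; the trade-off is that every entry of `W` and `H` is nonnegative and can be read as a weight.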

### K-means Clustering via SVD

In [12]:
using Clustering, RDatasets

xclara = dataset("cluster", "xclara")
names!(xclara, [symbol(i) for i in ["x", "y"]])

In [None]:
using Gadfly
plot(xclara, x="x", y="y", Geom.point)

In [None]:
xclara2 = convert(Array, xclara);
xclara2 = xclara2'

In [None]:
initseeds(:rand, xclara2, 3)

In [None]:
xclara_kmeans = kmeans(xclara2, 3)

In [None]:
plot(xclara, x = "x", y = "y", color = xclara_kmeans.assignments, Geom.point)

### Another Clustering Example

In [None]:
iris = dataset("datasets", "iris")
head(iris)

features = convert(Array,iris[:, 1:4])'
result = kmeans( features, 3 ) 

plot(iris, x = "PetalLength", y = "PetalWidth", color = result.assignments, Geom.point)

### Image Compression with SVD

In [None]:
using TestImages, Images, ImageView
img = testimage("cameraman")

In [None]:
A = convert(Array, img)
U,s,V = svd(A)
S = diagm(s)
n = 100                                 # number of singular values to keep
IMG = U[:,1:n]*S[1:n,1:n]*V[:,1:n]'     # rank-n approximation of A
image = shareproperties(img,IMG)

In [None]:
using Gadfly
plot(x=collect(1:512),y=s, Geom.line)
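
The cells above depend on image packages and were not executed in these notes. The following minimal sketch, assuming Julia 1.x with only the `LinearAlgebra` standard library, illustrates the same idea on a small synthetic "image". The function `pixel`, the size $64\times 64$, and the choice $n=10$ are illustrative assumptions standing in for a real photograph.

```julia
using LinearAlgebra

# Synthetic 64x64 grayscale "image": a smooth blob plus a gentle oscillation.
pixel(i, j) = exp(-((i - 32)^2 + (j - 20)^2) / 300) + 0.5 * sin(i * j / 200)
A = [pixel(i, j) for i in 1:64, j in 1:64]

F = svd(A)
n = 10                                          # singular values kept
A_n = F.U[:, 1:n] * Diagonal(F.S[1:n]) * F.Vt[1:n, :]

# Relative Frobenius error of the rank-n approximation.
rel_err = norm(A - A_n) / norm(A)

# Storage: n columns of U, n columns of V, and n singular values.
stored = n * (size(A, 1) + size(A, 2) + 1)
ratio = stored / length(A)

# Fraction of the squared Frobenius norm captured by the leading values.
energy = cumsum(F.S .^ 2) ./ sum(F.S .^ 2)

println((rel_err, ratio, energy[n]))
```

The same count applies to a $512\times 512$ image with $n=100$: the truncated factors hold $100\cdot(512+512+1)=102{,}500$ numbers instead of $262{,}144$, about $39\%$ of the original storage. How much visual quality survives depends on how quickly the singular values decay.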