# Singular value decomposition

The singular value decomposition is based on the following geometric observation:

> The image of the unit sphere in $\mathbb{R}^n$ under an $m\times n$ matrix is a hyperellipse in $\mathbb{R}^m$.

![svd](https://upload.wikimedia.org/wikipedia/commons/e/e9/Singular_value_decomposition.gif)

![svdstatic](https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Singular-Value-Decomposition.svg/512px-Singular-Value-Decomposition.svg.png)

### Singular value decomposition

Recall that an $m\times n$ matrix represents a linear transformation from an $n$-dimensional space $V$ to an $m$-dimensional space $W$: $$T:V\to W$$

We would like to make a choice of bases for both $V$ and $W$, such that the matrix representing $T$ is as simple as possible.

When $m=n$ and the matrix is diagonalizable, the eigenvalue decomposition achieves this with a single basis.  We mimic this process, allowing different bases for $V$ and $W$.

> **DEFINITION.** A _singular value decomposition_ of an $m\times n$ matrix $M$ is a factorization: $$M=U\Sigma V^*$$ where $U$ is an $m\times m$ unitary matrix, $V$ is an $n\times n$ unitary matrix and $\Sigma$ is an $m\times n$ diagonal matrix with nonnegative real diagonal entries.

The columns of $U$ are called the _left singular vectors_ of $M$.  The columns of $V$ are called the _right singular vectors_ of $M$.  The diagonal entries of $\Sigma$ are called the _singular values_ of $M$.

In [1]:
svd([1 2 3 4; 5 6 7 8])

(
2x2 Array{Float64,2}:
 -0.376168  -0.926551
 -0.926551   0.376168,

[14.227407412633742,1.2573298353791105],
4x2 Array{Float64,2}:
 -0.352062   0.758981
 -0.443626   0.321242
 -0.53519   -0.116498
 -0.626754  -0.554238)
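
The factorization can be checked numerically. The following is a minimal sketch, assuming Julia 1.x with the `LinearAlgebra` standard library (the cells in these notes use an older Julia in which `svd` returns a tuple). The output above is the reduced form, where $V$ is $4\times 2$; the `full = true` keyword requests the square unitary factors from the definition. The names `M_rebuilt`, `G` and `Sigma` are illustrative only.

```julia
using LinearAlgebra

M = [1.0 2 3 4; 5 6 7 8]

# Reduced SVD: U is 2x2, S holds the two singular values, V is 4x2.
F = svd(M)
U, S, V = F.U, F.S, F.V

# Rebuild M from the three factors; the difference should be rounding error.
M_rebuilt = U * Diagonal(S) * V'
println(maximum(abs.(M - M_rebuilt)))

# Full SVD: U is 2x2 and V is 4x4, so Sigma must be the 2x4 "diagonal" matrix.
G = svd(M; full = true)
Sigma = zeros(size(M))
for i in eachindex(G.S)
    Sigma[i, i] = G.S[i]
end
println(size(G.V))
println(maximum(abs.(M - G.U * Sigma * G.V')))
```

Both printed differences should be on the order of machine precision.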

### ED vs SVD

> __*Question.*__ If $M$ is normal (hence square), is its singular value decomposition equal to its eigenvalue decomposition?

In [3]:
svd([2 1; 1 2])

(
2x2 Array{Float64,2}:
 -0.707107  -0.707107
 -0.707107   0.707107,

[2.9999999999999996,1.0000000000000002],
2x2 Array{Float64,2}:
 -0.707107  -0.707107
 -0.707107   0.707107)

In [1]:
eig([2 1; 1 2])

([1.0,3.0],
2x2 Array{Float64,2}:
 -0.707107  0.707107
  0.707107  0.707107)

### Computing singular value decompositions

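If $M$ is $m\times n$, then $M^*M$ is $n\times n$ and $MM^*$ is $m\times m$.  Both are Hermitian (hence normal) and positive semi-definite, so by the Spectral Theorem:
\begin{align} MM^* &= U D_1 U^*\\ M^*M &= V D_2 V^*\end{align}
where $U$ and $V$ are unitary, and $D_1$ and $D_2$ are diagonal matrices with the same nonzero entries counted with multiplicity.  These nonzero entries are the squares of the nonzero singular values of $M$.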
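
The relation between the two products and the singular values can be observed numerically. This is a sketch only, assuming Julia 1.x and the `LinearAlgebra` standard library; the variable names `ev_small` and `ev_large` are made up for illustration.

```julia
using LinearAlgebra

M = [1.0 2 3 4; 5 6 7 8]
S = svdvals(M)   # singular values, largest first

# Eigenvalues of the Hermitian products M*M' (2x2) and M'*M (4x4), largest first.
ev_small = sort(eigvals(Symmetric(M * M')); rev = true)
ev_large = sort(eigvals(Symmetric(M' * M)); rev = true)

println(S .^ 2)     # squared singular values
println(ev_small)   # same two values
println(ev_large)   # same two values, followed by two (numerically) zero ones
```

The larger product carries extra eigenvalues, but they are zero up to rounding, so the nonzero spectra agree.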

> **THEOREM.** Every matrix has a singular value decomposition.  Furthermore, the singular values are uniquely determined.

This states that _every_ linear transformation is a rotation or reflection, followed by scalings along the coordinate axes, followed by another rotation or reflection.

**Nota Bene:** The theorem does _not_ say that the singular value decomposition is unique, only that it exists _and_ that the singular values are unique.

> **THEOREM.** The singular value decomposition of a normal matrix is equivalent to the eigenvalue decomposition if and only if the matrix is positive semi-definite.

**Proof.**

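Let $M$ be normal.  By the Spectral Theorem $M=Q\Lambda Q^*$ with $Q$ unitary and $\Lambda$ diagonal, so the left and right singular vectors can be chosen to be eigenvectors (up to unit-modulus scalars), and the singular values are the positive square roots of the squares of the eigenvalues, that is, $\sigma_i=|\lambda_i|$.  The two decompositions therefore agree (up to ordering) exactly when $\lambda_i=|\lambda_i|$ for every $i$.  Since a normal matrix is positive semi-definite if and only if its eigenvalues are nonnegative, the claim follows.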
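
A small numerical illustration of the theorem, written as a sketch under the assumption of Julia 1.x with `LinearAlgebra`. The matrix `N` is a hypothetical example chosen to be symmetric but indefinite; `P` is the positive definite matrix used earlier in this section.

```julia
using LinearAlgebra

# Symmetric (hence normal) and positive definite: eigenvalues 1 and 3.
P = [2.0 1.0; 1.0 2.0]
# Symmetric (hence normal) but indefinite: eigenvalues -1 and 3.
N = [1.0 2.0; 2.0 1.0]

for A in (P, N)
    lambda = eigvals(Symmetric(A))   # ascending order
    sigma = svdvals(A)               # descending order
    println("eigenvalues     ", lambda)
    println("singular values ", sigma)
    # Singular values are the absolute values of the eigenvalues.
    println(sort(abs.(lambda); rev = true) ≈ sigma)
end
```

For `P` the two lists contain the same numbers, while for `N` the eigenvalue $-1$ shows up as the singular value $1$, so the decompositions cannot coincide.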

In [14]:
M = [1 1 1; 1 0 1]
svd(M)

(
2x2 Array{Float64,2}:
 -0.788205  -0.615412
 -0.615412   0.788205,

[2.135779205069857,0.6621534468619564],
3x2 Array{Float64,2}:
 -0.657192   0.260956
 -0.369048  -0.92941 
 -0.657192   0.260956)

Observe that the singular vectors are orthonormal, and the $2$-norm of $M$ is always the largest singular value:

In [15]:
norm(M)

2.135779205069857
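
Both observations can be checked directly. The sketch below assumes Julia 1.x with `LinearAlgebra`; in that version `norm(M)` applied to a matrix returns the Frobenius norm, so the operator 2-norm is computed with `opnorm` instead.

```julia
using LinearAlgebra

M = [1.0 1 1; 1 0 1]
F = svd(M)

# Operator 2-norm (largest stretch of a unit vector) versus the largest singular value.
println(opnorm(M, 2))
println(F.S[1])

# Frobenius norm, for contrast: the square root of the sum of squared entries.
println(norm(M))

# Columns of U and of V are orthonormal, so both Gram matrices are the 2x2 identity.
Id = Matrix{Float64}(I, 2, 2)
println(F.U' * F.U ≈ Id)
println(F.V' * F.V ≈ Id)
```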

The proof of existence of a singular value decomposition proceeds by induction on the dimensions of the matrix.

**Proof of Existence of SVD**

Let $\sigma_1=\|A\|_2$.  By compactness of the unit sphere, there are vectors $\mathbf{v}_1\in\mathbb{C}^n$ and $\mathbf{u}_1\in\mathbb{C}^m$ satisfying: $$\|\mathbf{v}_1\|_2 = \|\mathbf{u}_1\|_2=1,\quad A \mathbf{v}_1 = \sigma_1 \mathbf{u}_1.$$

Extend both of these sets to orthonormal bases of $\mathbb{C}^n$ and $\mathbb{C}^m$, and let $V_1$ and $U_1$ denote the corresponding unitary matrices with these vectors as columns.  Then: $$U_1^*A V_1 = S = \begin{bmatrix} \sigma_1 & \mathbf{w}^* \\ \mathbf{0} & B\end{bmatrix}.$$

We check: $$\left\|\begin{bmatrix} \sigma_1 & \mathbf{w}^*\\ \mathbf{0} & B\end{bmatrix}\begin{bmatrix} \sigma_1\\ \mathbf{w}\end{bmatrix}\right\|_2 \geq \sigma_1^2 + \mathbf{w}^*\mathbf{w} = \left(\sigma_1^2+\mathbf{w}^*\mathbf{w}\right)^{1/2}\left\|\begin{bmatrix} \sigma_1\\ \mathbf{w}\end{bmatrix}\right\|_2,$$

Thus $\|S\|_2\geq \left(\sigma_1^2 + \mathbf{w}^*\mathbf{w}\right)^{1/2}$.  Since $U_1$ and $V_1$ are unitary, we know $\|S\|_2=\|A\|_2=\sigma_1$, hence $\mathbf{w}=0$.
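
To spell out the two steps above: the first entry of the product is $\sigma_1^2+\mathbf{w}^*\mathbf{w}$, and the $2$-norm of a vector is at least the absolute value of any one of its entries, $$\begin{bmatrix} \sigma_1 & \mathbf{w}^*\\ \mathbf{0} & B\end{bmatrix}\begin{bmatrix} \sigma_1\\ \mathbf{w}\end{bmatrix} = \begin{bmatrix} \sigma_1^2+\mathbf{w}^*\mathbf{w}\\ B\mathbf{w}\end{bmatrix},\qquad \left\|\begin{bmatrix} \sigma_1\\ \mathbf{w}\end{bmatrix}\right\|_2 = \left(\sigma_1^2+\mathbf{w}^*\mathbf{w}\right)^{1/2}.$$ Combining $\|S\|_2\geq\left(\sigma_1^2+\mathbf{w}^*\mathbf{w}\right)^{1/2}$ with $\|S\|_2=\sigma_1$ and squaring gives $$\sigma_1^2\geq \sigma_1^2+\mathbf{w}^*\mathbf{w}\quad\Longrightarrow\quad \mathbf{w}^*\mathbf{w}=\|\mathbf{w}\|_2^2\leq 0,$$ which forces $\mathbf{w}=\mathbf{0}$.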

If either $n$ or $m$ is equal to $1$, we are done.   Otherwise, $B$ describes the action of the matrix $A$ on the subspace orthogonal to $\mathbf{v}_1$.  

By induction, $B=U_2 \Sigma_2 V_2^*$, and: $$A=U_1 \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & U_2\end{bmatrix} \begin{bmatrix} \sigma_1 & \mathbf{0}\\ \mathbf{0} & \Sigma_2\end{bmatrix}\begin{bmatrix} 1 & \mathbf{0}\\ \mathbf{0} & V_2\end{bmatrix}^* V_1^*.$$

**Proof of Uniqueness of SVD**

The uniqueness of $\sigma_1$ is clear by maximality, i.e. $\sigma_1 = \|A\|_2$.  We then proceed inductively: once $\sigma_1,\mathbf{u}_1$ and $\mathbf{v}_1$ are determined, the remainder of the SVD is determined by the action of $A$ on the space orthogonal to $\mathbf{v}_1$, so $\sigma_2$ is the norm of $A$ restricted to that space, and so on.  When the singular values are distinct, the singular vectors themselves are determined up to a unit-modulus scalar (a sign in the real case). Continuing, one can show that: $$\sigma_1\geq \sigma_2 \geq \cdots \geq \sigma_k,$$ where $k=\min(m,n)$.


### Image compression

Singular value decompositions can be used to represent data efficiently. Suppose, for instance, that we wish to transmit the following image:

![pixel](http://www.ams.org/featurecolumn/images/august2009/svd.O.gif)

which consists of an array of $25\times 15$ black or white pixels.

In [2]:
M = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1; 
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1; 
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 0 0 0 0 0 0 0 0 0 0 0 1 1;
     1 1 0 0 0 0 0 0 0 0 0 0 0 1 1;
     1 1 0 0 0 0 0 0 0 0 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 1 1 1 1 1 0 0 0 1 1;
     1 1 0 0 0 0 0 0 0 0 0 0 0 1 1;
     1 1 0 0 0 0 0 0 0 0 0 0 0 1 1;
     1 1 0 0 0 0 0 0 0 0 0 0 0 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1; 
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1; 
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
]

25x15 Array{Int64,2}:
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  0  0  0  0  0  0  0  0  0  0  0  1  1
 1  1  0  0  0  0  0  0  0  0  0  0  0  1  1
 1  1  0  0  0  0  0  0  0  0  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  1  1  1  1  1  0  0  0  1  1
 1  1  0  0  0  0  0  0  0  0  0  0  0  1  1
 1  1  0  0  0  0  0  0  0  0  0  0  0  1  1
 1  1  0  0  0  0  0  0  0  0  0  0  0  1  1
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 1  1  1  1  1  1  1  1  1  1  1 

In [4]:
U, s, V = svd(M)

(
25x15 Array{Float64,2}:
 -0.254727   -0.183563  -0.0376665  …   6.66134e-17  -2.498e-17  
 -0.254727   -0.183563  -0.0376665     -9.07659e-18   3.01407e-17
 -0.254727   -0.183563  -0.0376665      8.27064e-18   4.66705e-17
 -0.254727   -0.183563  -0.0376665      8.27064e-18  -2.09394e-17
 -0.254727   -0.183563  -0.0376665     -2.57652e-17   3.39131e-17
 -0.0872463   0.192717  -0.349163   …  -6.85833e-18  -4.0722e-17 
 -0.0872463   0.192717  -0.349163      -1.42425e-16   2.71824e-17
 -0.0872463   0.192717  -0.349163      -1.55019e-16   1.47039e-17
 -0.184232    0.22116    0.168101       0.00349757   -1.80873e-16
 -0.184232    0.22116    0.168101      -0.161402      2.14801e-16
 -0.184232    0.22116    0.168101   …   0.934561     -6.56966e-16
 -0.184232    0.22116    0.168101      -0.129443      0.563142   
 -0.184232    0.22116    0.168101      -0.129443      0.591329   
 -0.184232    0.22116    0.168101      -0.129443     -0.288618   
 -0.184232    0.22116    0.168101      -0.129443  

In [11]:
u = [U[:,1] U[:,2] U[:,3]]

25x3 Array{Float64,2}:
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665
 -0.0872463   0.192717  -0.349163 
 -0.0872463   0.192717  -0.349163 
 -0.0872463   0.192717  -0.349163 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.184232    0.22116    0.168101 
 -0.0872463   0.192717  -0.349163 
 -0.0872463   0.192717  -0.349163 
 -0.0872463   0.192717  -0.349163 
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665
 -0.254727   -0.183563  -0.0376665

In [14]:
v = [V[:,1] V[:,2] V[:,3]]

15x3 Array{Float64,2}:
 -0.321159   0.251333   -0.28929 
 -0.321159   0.251333   -0.28929 
 -0.172998  -0.351882   -0.113656
 -0.172998  -0.351882   -0.113656
 -0.172998  -0.351882   -0.113656
 -0.285607   0.0296757   0.342853
 -0.285607   0.0296757   0.342853
 -0.285607   0.0296757   0.342853
 -0.285607   0.0296757   0.342853
 -0.285607   0.0296757   0.342853
 -0.172998  -0.351882   -0.113656
 -0.172998  -0.351882   -0.113656
 -0.172998  -0.351882   -0.113656
 -0.321159   0.251333   -0.28929 
 -0.321159   0.251333   -0.28929 

In [24]:
D = diagm(s)
d = [D[:,1] D[:,2] D[:,3]]
f = [d[1,:]; d[2,:]; d[3,:]]

3x3 Array{Float64,2}:
 14.7243  0.0      0.0    
  0.0     5.21662  0.0    
  0.0     0.0      3.31409

In [26]:
*(u,*(f,transpose(v)))

25x15 Array{Float64,2}:
 1.0  1.0  1.0          1.0          …  1.0          1.0          1.0  1.0
 1.0  1.0  1.0          1.0             1.0          1.0          1.0  1.0
 1.0  1.0  1.0          1.0             1.0          1.0          1.0  1.0
 1.0  1.0  1.0          1.0             1.0          1.0          1.0  1.0
 1.0  1.0  1.0          1.0             1.0          1.0          1.0  1.0
 1.0  1.0  4.996e-16    2.77556e-16  …  3.88578e-16  3.88578e-16  1.0  1.0
 1.0  1.0  4.996e-16    2.77556e-16     3.88578e-16  3.88578e-16  1.0  1.0
 1.0  1.0  4.996e-16    2.77556e-16     3.88578e-16  3.88578e-16  1.0  1.0
 1.0  1.0  9.29812e-16  9.57567e-16     8.60423e-16  8.60423e-16  1.0  1.0
 1.0  1.0  9.29812e-16  9.57567e-16     8.60423e-16  8.60423e-16  1.0  1.0
 1.0  1.0  9.29812e-16  9.57567e-16  …  8.60423e-16  8.60423e-16  1.0  1.0
 1.0  1.0  9.29812e-16  9.57567e-16     8.60423e-16  8.60423e-16  1.0  1.0
 1.0  1.0  9.29812e-16  9.57567e-16     8.60423e-16  8.60423e-16  1.0  1.0
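
The reconstruction above is exact (up to rounding) because the image has only three distinct kinds of rows, so $M$ has rank $3$ and only three singular values are nonzero. The sketch below, assuming Julia 1.x with `LinearAlgebra`, rebuilds the same matrix with slice assignments (a shortcut that is not in the original notes) and counts how many numbers need to be transmitted.

```julia
using LinearAlgebra

# Rebuild the 25x15 image: border of ones, frame of zeros, central block of ones.
M = ones(25, 15)
M[6:20, 3:13] .= 0
M[9:17, 6:10] .= 1

F = svd(M)
k = rank(M)   # number of singular values that are not numerically zero

# Keep only the first k singular values and their singular vectors.
Mk = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.V[:, 1:k]'

println(k)
println(F.S[1:k])
println(maximum(abs.(M - Mk)))

# Numbers to transmit: k columns of U, k columns of V and k singular values,
# compared with all 25*15 pixel values.
println((k * (25 + 15 + 1), 25 * 15))
```

With $k=3$ this is $123$ numbers instead of $375$.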
 

### Collaborative filtering

Amy, Bob, Charlie and David rate the following six movies from 1 to 5: $$\begin{array}{ccccl} \text{Amy} & \text{Bob} & \text{Charlie} & \text{David} & \\ 1 & 1 & 5 & 4 & \text{The Dark Knight}\\ 2 & 1 & 4 & 5 & \text{The Amazing Spiderman}\\ 4 & 5 & 2 & 1 & \text{Love Actually}\\ 5 & 4 & 2 & 1 & \text{Bridget Jones's Diary}\\ 4 & 5 & 1 & 2 & \text{Pretty Woman}\\ 1 & 2 & 5 & 5 & \text{Superman 2}\end{array}$$

In [27]:
A = [1 1 5 4; 2 1 4 5; 4 5 2 1; 5 4 2 1; 4 5 1 2; 1 2 5 5]

6x4 Array{Int64,2}:
 1  1  5  4
 2  1  4  5
 4  5  2  1
 5  4  2  1
 4  5  1  2
 1  2  5  5

In [29]:
M = reshape(A, 1, 24)

1x24 Array{Int64,2}:
 1  2  4  5  4  1  1  1  5  4  5  2  5  4  2  2  1  5  4  5  1  1  2  5

In [31]:
m = mean(M)

3.0

We compute the mean-centered ratings matrix:

In [33]:
B = m*ones(6,4)
C = A-B

6x4 Array{Float64,2}:
 -2.0  -2.0   2.0   1.0
 -1.0  -2.0   1.0   2.0
  1.0   2.0  -1.0  -2.0
  2.0   1.0  -1.0  -2.0
  1.0   2.0  -2.0  -1.0
 -2.0  -1.0   2.0   2.0

We compute the singular value decomposition of the mean-centered ratings matrix:

In [35]:
p, q, r = svd(C)

(
6x4 Array{Float64,2}:
 -0.446401  -0.371748     -0.420224   -0.601501   
 -0.390517  -4.68375e-15   0.562549    3.60822e-16
  0.390517   4.68375e-15  -0.562549   -2.77556e-17
  0.384996  -0.601501      0.0833694   0.371748   
  0.384996   0.601501      0.0833694  -0.371748   
 -0.446401   0.371748     -0.420224    0.601501   ,

[7.785085687110693,1.6180339887498958,1.5467517073998112,0.6180339887498946],
4x4 Array{Float64,2}:
  0.478046  -0.371748   0.52103   0.601501
  0.52103    0.601501  -0.478046  0.371748
 -0.478046  -0.371748  -0.52103   0.601501
 -0.52103    0.601501   0.478046  0.371748)

We observe that the first singular value is significantly larger than the rest.   This indicates that the mean-centered ratings matrix is well-approximated by the rank-one matrix formed from the first singular value and its singular vectors.  Adding the mean back gives the following approximation of the original ratings:

In [48]:
B+q[1]*p[:,1]*transpose(r[:,1])

6x4 Array{Float64,2}:
 1.33866  1.18928  4.66134  4.81072
 1.54664  1.41596  4.45336  4.58404
 4.45336  4.58404  1.54664  1.41596
 4.43281  4.56164  1.56719  1.43836
 4.43281  4.56164  1.56719  1.43836
 1.33866  1.18928  4.66134  4.81072

The first left-singular vector is:

In [49]:
p[:,1]

6-element Array{Float64,1}:
 -0.446401
 -0.390517
  0.390517
  0.384996
  0.384996
 -0.446401

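Movies with similar entries in this vector received similar centered scores, which indicates a similar genre: the three action movies share one sign and the three romantic comedies share the other.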

The first right-singular vector is:

In [50]:
r[:,1]

4-element Array{Float64,1}:
  0.478046
  0.52103 
 -0.478046
 -0.52103 

Users with similar entries in this vector have similar tastes.  Thus Amy and Bob form one cluster (both prefer romantic comedies over action), and Charlie and David form another.
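
The whole computation can be collected into one short script. This is a sketch under the assumption of Julia 1.x with the `LinearAlgebra` and `Statistics` standard libraries; the names `C1`, `A1` and `likes` are illustrative. Because the overall sign of a pair of singular vectors is arbitrary, the grouping is read off the sign of the rank-one matrix rather than the sign of either vector alone.

```julia
using LinearAlgebra, Statistics

A = [1 1 5 4; 2 1 4 5; 4 5 2 1; 5 4 2 1; 4 5 1 2; 1 2 5 5]

m = mean(A)    # overall mean rating
C = A .- m     # mean-centered ratings
F = svd(C)

# Rank-one approximation of the centered ratings, then add the mean back.
C1 = F.S[1] * F.U[:, 1] * F.V[:, 1]'
A1 = m .+ C1

# The 2-norm error of this approximation equals the second singular value.
println(opnorm(C - C1, 2), "  ", F.S[2])

# true where the rank-one model predicts an above-average rating
# (rows are movies, columns are Amy, Bob, Charlie, David).
likes = C1 .> 0
println(likes)
```

The rows of `likes` fall into two patterns, one for the action movies and one for the romantic comedies, and the columns split into the same two user clusters.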

### Principal components analysis

Consider the following data:
![PCA](http://www.ams.org/featurecolumn/images/august2009/random.data.gif)

In [56]:
M = [-1.03 0.74 -0.02 0.51 -1.31 0.99 0.69 -0.12 -0.72 1.11; 
     -2.23 1.61 -0.02 0.88 -2.39 2.02 1.62 -0.35 -1.67 2.46]
L, Q, R = svd(M)

(
2x2 Array{Float64,2}:
 -0.430715  -0.902488
 -0.902488   0.430715,

[6.0397087855043186,0.21867278363335854],
10x2 Array{Float64,2}:
  0.406673    -0.141455 
 -0.293348     0.117118 
  0.00441479   0.0431487
 -0.167865    -0.371511 
  0.450549     0.698989 
 -0.372441    -0.107093 
 -0.291276     0.34317  
  0.0608567   -0.194134 
  0.300887    -0.317841 
 -0.446746     0.264312 )

With one singular value so much larger than the other, it may be safe to assume that the second singular value is nonzero only because of noise in the data.   So we reconstruct the matrix using only the first singular value and its singular vectors.

In [57]:
Q[1]*L[:,1]*transpose(R[:,1])

2x10 Array{Float64,2}:
 -1.05792  0.763113  -0.0114846  0.436682  …  -0.158312  -0.782726  1.16216
 -2.21668  1.59897   -0.024064   0.914991     -0.331715  -1.64006   2.43511
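
To quantify how much is lost by dropping the second singular value, the following sketch (assuming Julia 1.x with `LinearAlgebra`; `energy`, `M1` and `slope` are illustrative names) computes the share of the squared singular values carried by the first one and the slope of the line spanned by the first left singular vector.

```julia
using LinearAlgebra

M = [-1.03 0.74 -0.02 0.51 -1.31 0.99 0.69 -0.12 -0.72 1.11;
     -2.23 1.61 -0.02 0.88 -2.39 2.02 1.62 -0.35 -1.67 2.46]

F = svd(M)

# Share of the sum of squared singular values carried by the first one.
energy = F.S[1]^2 / sum(F.S .^ 2)

# Rank-one approximation: every column is a multiple of the first left singular vector.
M1 = F.S[1] * F.U[:, 1] * F.V[:, 1]'

# Slope of the line through the origin spanned by that vector
# (the ratio is unaffected by the arbitrary sign of the vector).
slope = F.U[2, 1] / F.U[1, 1]

println(energy)
println(slope)
println(opnorm(M - M1, 2) ≈ F.S[2])
```

All columns of `M1` lie on a single line through the origin, which is the principal direction of the data.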

![line](http://www.ams.org/featurecolumn/images/august2009/random.data.svd.gif)