+ This notebook is part of lecture 29 *Singular value decomposition* in the OCW MIT course 18.06 by Prof Gilbert Strang [1]
+ Created by me, Dr Juan H Klopper
    + Head of Acute Care Surgery
    + Groote Schuur Hospital
    + University Cape Town
    + <a href="mailto:juan.klopper@uct.ac.za">Email me with your thoughts, comments, suggestions and corrections</a> 
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/InteractiveResource" property="dct:title" rel="dct:type">Linear Algebra OCW MIT18.06</span> <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">IPython notebook [2] study notes by Dr Juan H Klopper</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.

+ [1] <a href="http://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/index.htm">OCW MIT 18.06</a>
+ [2] Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: http://ipython.org

In [None]:
from IPython.display import HTML, Image
css_file = 'style.css'
HTML(open(css_file, 'r').read())

In [None]:
from sympy import init_printing, Matrix, symbols, sqrt, Rational
from warnings import filterwarnings

In [None]:
init_printing(use_latex = 'mathjax')
filterwarnings('ignore')

+ This chapter starts with explanations using sympy
+ A proper method using numpy and scipy is described in the example problem at the end

# Singular value decomposition (SVD)

## Derivation

* This is the final form of matrix factorization
* The factors are an orthogonal matrix U, a diagonal matrix &Sigma;, and an orthogonal matrix V
$$ {A}={U}{\Sigma}{V}^{T} $$

+ In case the matrix A is symmetric positive definite, the decomposition is akin to the following
$$  {A}={Q}{\Lambda}{Q}^{T} $$
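
+ As a rough numerical illustration of this special case (a sketch using numpy; the matrix `S_pd` is a made-up symmetric positive definite example, not one from the lecture), the singular values should coincide with the eigenvalues

```python
import numpy as np

# Hypothetical symmetric positive definite matrix (eigenvalues 1 and 3)
S_pd = np.array([[2.0, 1.0], [1.0, 2.0]])

# eigvalsh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues = np.linalg.eigvalsh(S_pd)

# Singular values are returned in descending order
singular_values = np.linalg.svd(S_pd, compute_uv=False)

print(eigenvalues, singular_values)
```

+ For a symmetric matrix that is not positive definite this would no longer hold exactly, because singular values are never negative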

* Consider a vector *v*<sub>1</sub> in &#8477;<sup>n</sup> row space, transformed into a vector *u*<sub>1</sub> in &#8477;<sup>m</sup> column space by the matrix A
$$ {u}_{1}={A}{v}_{1} $$

* What we are looking for is an orthogonal basis in &#8477;<sup>n</sup> row space, transformed into an orthogonal basis in &#8477;<sup>m</sup> column space
$$ {u}_{1}={A}{v}_{1} \\ { v }_{ 1 }\bot { v }_{ 2 };{ u }_{ 1 }\bot { u }_{ 2 }$$

* It's easy to calculate an orthogonal basis in the row space using Gram-Schmidt
* Now, though, we need something special in A that would ensure that the basis *u*<sub>i</sub> in the &#8477;<sup>m</sup> column space is also orthogonal (and at the same time make it orthonormal, so that A*v*<sub>i</sub> ends up as &sigma;<sub>i</sub>*u*<sub>i</sub>)
* The two nullspaces are not required
* So, we are looking for the following
$$ A\begin{bmatrix} \vdots  & \vdots  & \vdots  & \vdots  \\ { v }_{ 1 } & { v }_{ 2 } & \cdots  & { v }_{ r } \\ \vdots  & \vdots  & \vdots  & \vdots  \end{bmatrix}=\begin{bmatrix} \vdots  & \vdots  & \vdots  & \vdots  \\ { u }_{ 1 } & u_{ 2 } & \cdots  & { u }_{ r } \\ \vdots  & \vdots  & \vdots  & \vdots  \end{bmatrix}\begin{bmatrix} { \sigma  }_{ 1 } & \quad  & \quad  & \quad  & \quad  \\ \quad  & { \sigma  }_{ 2 } & \quad  & \quad  & \quad  \\ \quad  & \quad  & \quad \ddots  & \quad  & \quad  \\ \quad  & \quad  & \quad  & { \sigma  }_{ r }\quad  & \quad  \\ \quad  & \quad  & \quad  & \quad  & \left( 0 \right)  \end{bmatrix} \\ {A}{V}={U}{\Sigma}$$

+ In the case that we are not changing spaces V and U would be the same matrix Q (and then Q<sup>-1</sup>)
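
+ The relation AV = U&Sigma; can be checked numerically; below is a minimal sketch using numpy, with a made-up 3&times;2 matrix (not from the lecture) so that U and V have different sizes

```python
import numpy as np

# Hypothetical 3x2 example matrix
A = np.array([[3.0, 0.0], [4.0, 5.0], [0.0, 1.0]])

# full_matrices=True returns U (3x3) and V^T (2x2); s holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# Build the 3x2 Sigma with the singular values on its main diagonal
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

left = A @ V       # A V
right = U @ Sigma  # U Sigma
print(np.allclose(left, right))
```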

### Example problem explaining the derivation

+ Look at the next matrix A that is square and invertible (i.e. rank 2)

In [None]:
A = Matrix([[4, 4], [-3, 3]])
A

* We are looking for *v*<sub>1</sub> and *v*<sub>2</sub> in the &#8477;<sup>2</sup> rowspace and *u*<sub>1</sub> and *u*<sub>2</sub> in the &#8477;<sup>2</sup> columnspace, as well as the scaling factors *&sigma;*<sub>1</sub>>0 and *&sigma;*<sub>2</sub>>0

+ Just to be complete, we extend V up to *v*<sub>n</sub> with an orthonormal basis of the nullspace of A and U up to *u*<sub>m</sub> with an orthonormal basis of the nullspace of A<sup>T</sup>, and pad &Sigma; with zeros to match

+ Now A is not symmetric, so its eigenvectors are not orthogonal (they do not form a Q), and we can't go that route

+ From above we have the following and because V is square and orthogonal we have
$$ {A}={U}{\Sigma}{V}^{-1} \\ {A}={U}{\Sigma}{V}^{T} $$

+ Multiplying both sides on the left by A<sup>T</sup> we will have a left-hand side that is square, symmetric and positive (semi)definite
$$ A=U\Sigma { V }^{ T }\\ { A }^{ T }A=V{ \Sigma  }^{ T }{ U }^{ T }U\Sigma { V }^{ T }\\ \because \quad { U }^{ T }U=I\\ \because \quad { \Sigma  }^{ T }\Sigma =\dots { \sigma  }_{ i }^{ 2 }\dots \\ { A }^{ T }A=V{ \Sigma  }^{ T }\Sigma { V }^{ T } $$

+ Because A<sup>T</sup>A is now positive (semi)definite, we have a perfect situation akin to being able to use Q&Lambda;Q<sup>T</sup>
+ The eigenvalues are the squares of the &sigma;<sub>i</sub> values
+ To get U we use AA<sup>T</sup> and use its eigenvalues and eigenvectors
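
+ A quick numerical check of these claims for the matrix A of this example (a sketch using numpy instead of sympy): the eigenvalues of A<sup>T</sup>A and of AA<sup>T</sup> should both equal the squared singular values

```python
import numpy as np

A = np.array([[4.0, 4.0], [-3.0, 3.0]])  # the matrix from this example

# eigvalsh is meant for symmetric matrices; eigenvalues come back in ascending order
eig_AtA = np.linalg.eigvalsh(A.T @ A)
eig_AAt = np.linalg.eigvalsh(A @ A.T)

# Singular values come back in descending order, so sort their squares ascending
s = np.linalg.svd(A, compute_uv=False)
sigma_squared = np.sort(s**2)

print(eig_AtA, eig_AAt, sigma_squared)
```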

* All of this is easy to accomplish with the svd function of mpmath (older versions of sympy bundled it as sympy.mpmath; since sympy 1.0 it is a separate package)

In [None]:
from mpmath import svd

In [None]:
U, S, V = svd(A)

In [None]:
U # Tiny entries are floating-point round-off; treat them as zero

In [None]:
S # Not the final Sigma matrix

In [None]:
V

+ svd() works numerically, so the square roots appear as floating-point values instead of symbols

* Now let's do it step-by-step

In [None]:
A.transpose() * A

In [None]:
(A.transpose() * A).eigenvals()

In [None]:
(A.transpose() * A).eigenvects()

* These are not normalized, though
* Also remember to take the square roots of the eigenvalues
* ... and to add zeros to incorporate the correct size for *m* and *n*
* ... and to take the transpose

* Now let's tackle U

In [None]:
A * A.transpose()

In [None]:
(A * A.transpose()).eigenvals()

* The nonzero eigenvalues of AA<sup>T</sup> are always the same as those of A<sup>T</sup>A

In [None]:
(A * A.transpose()).eigenvects()

* Also remember to normalize (see example problem below)

* We now have U, &Sigma; and V<sup>T</sup> (although &Sigma; must still be constructed; see below)
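
* The steps above can be sketched numerically as follows (using numpy rather than sympy, and assuming every &sigma;<sub>i</sub> is positive so that the division is safe); taking *u*<sub>i</sub> = A*v*<sub>i</sub> / &sigma;<sub>i</sub> instead of the eigenvectors of AA<sup>T</sup> is a choice made here to keep the signs of U and V consistent

```python
import numpy as np

A = np.array([[4.0, 4.0], [-3.0, 3.0]])  # the matrix from this example

# Eigen-decomposition of the symmetric matrix A^T A; eigh returns unit eigenvectors as columns
lam, V = np.linalg.eigh(A.T @ A)

# Reorder so that the largest eigenvalue comes first, as is conventional for the SVD
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Singular values are the square roots of the eigenvalues
sigma = np.sqrt(lam)
Sigma = np.diag(sigma)

# u_i = A v_i / sigma_i; broadcasting divides column i by sigma[i]
U = (A @ V) / sigma

A_rebuilt = U @ Sigma @ V.T
print(A_rebuilt)
```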

### Example problem to explain the derivation for dependent rows, columns

+ Let's consider this rank=1, 2&times;2 singular matrix
+ The rowspace is just a line (the second row is a constant multiple of the first)
+ The nullspace of this row picture is a line perpendicular to this
+ The columnspace is also on a line, with the nullspace of A<sup>T</sup> being a line perpendicular to this

In [None]:
A = Matrix([[4, 3], [8, 6]])
A

+ Let's use *svd*() first

In [None]:
U, S, V = svd(A, full_matrices = True, compute_uv = True)

* This is likely to be different to the value you calculate for U
* We are talking unit basis vectors, though, which can be in a different direction depending on your choice
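
* To illustrate the point about direction, here is a small sketch (using numpy's svd rather than the one imported above): flipping the sign of a left singular vector together with its matching right singular vector leaves the product unchanged

```python
import numpy as np

A = np.array([[4.0, 3.0], [8.0, 6.0]])  # the rank 1 matrix from this example
U, s, Vt = np.linalg.svd(A)
Sigma = np.diag(s)

# Flip the first left singular vector AND the first right singular vector
U_flipped = U.copy()
Vt_flipped = Vt.copy()
U_flipped[:, 0] *= -1   # first column of U
Vt_flipped[0, :] *= -1  # first row of V^T

same_product = np.allclose(U_flipped @ Sigma @ Vt_flipped, A)
print(same_product)
```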

In [None]:
U

In [None]:
S

+ Note that the size of our &Sigma; matrix is wrong
+ It has to be 2&times;2 and we have to create it from this info
+ Since A has rank = 1 and all off-diagonal entries must be zero, we will only have a value in the first row, first column position
+ Below I show you how to correct this
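
+ As a numpy-only sketch of the same idea, the list of singular values can be placed on the diagonal of an m&times;n zero matrix; `sigma_matrix` is a hypothetical helper name introduced here, and the 2&times;3 matrix `B` is a made-up example to show a non-square &Sigma;

```python
import numpy as np

def sigma_matrix(s, m, n):
    """Hypothetical helper: put the 1-D singular values s on the diagonal of an m x n zero matrix."""
    Sigma = np.zeros((m, n))
    k = len(s)
    Sigma[:k, :k] = np.diag(s)
    return Sigma

A = np.array([[4.0, 3.0], [8.0, 6.0]])  # the rank 1 matrix from this example
U, s, Vt = np.linalg.svd(A)
Sigma = sigma_matrix(s, *A.shape)

B = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])  # made-up 2x3 matrix, so Sigma must be 2x3
Ub, sb, Vbt = np.linalg.svd(B)
Sigma_b = sigma_matrix(sb, *B.shape)

print(Sigma)
print(Sigma_b)
```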

In [None]:
V

In [None]:
A.transpose() * A # Which will be symmetric positive semidefinite and of rank = 1

In [None]:
(A.transpose() * A).eigenvals() # One eigenvalue will be zero and the other must then be the trace

In [None]:
(A.transpose() * A).eigenvects()

In [None]:
A * A.transpose()

In [None]:
(A * A.transpose()).eigenvals()

In [None]:
(A * A.transpose()).eigenvects()

+ Inserted below are the three resultant matrices from our calculations above (normalized, etc)

In [None]:
Matrix([[1 / sqrt(5), 2 / sqrt(5)], [2 / sqrt(5), -1 / sqrt(5)]]), Matrix([[sqrt(125), 0], [0, 0]]), Matrix([[0.8, 0.6], [0.6, -0.8]])

+ Now let me show you how to correct the *svd*() solutions

In [None]:
U

In [None]:
S

In [None]:
S = Matrix([[11.1803398874989, 0], [0, 0]])
S # Composed by hand (proper method further below)

In [None]:
V

In [None]:
V = Matrix([[-0.8, -0.6], [-0.6, 0.8]])
V # Remember that this is actually V transpose

+ Let's calculate U&Sigma;V<sup>T</sup>

In [None]:
U * S * V

+ Compensating for rounding, this is the original matrix A
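
+ Because this A has rank 1, only the first singular triple contributes; the sketch below (numpy, not the sympy workflow above) rebuilds A from the single outer product &sigma;<sub>1</sub>*u*<sub>1</sub>*v*<sub>1</sub><sup>T</sup>

```python
import numpy as np

A = np.array([[4.0, 3.0], [8.0, 6.0]])  # the rank 1 matrix from this example
U, s, Vt = np.linalg.svd(A)

# sigma_1 * u_1 * v_1^T, with u_1 the first column of U and v_1^T the first row of V^T
rank_one = s[0] * np.outer(U[:, 0], Vt[0, :])

print(rank_one)
print(s)  # the second singular value is zero up to round-off
```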

## Summary

+ The orthonormal basis for the rowspace is
$$ {v}_{1},{v}_{2},\dots,{v}_{r} $$
+ The orthonormal basis for the columnspace is
$$ {u}_{1},{u}_{2},\dots,{u}_{r} $$
+ The orthonormal basis for the nullspace is
$$ {v}_{r+1},{v}_{r+2},\dots,{v}_{n} $$
+ The orthonormal basis for the nullspace of A<sup>T</sup> is
$$ {u}_{r+1},{u}_{r+2},\dots,{u}_{m} $$
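
+ The four bases in this summary can be read off a full SVD; the following is a sketch with numpy for the rank 1 matrix used earlier, where the tolerance `tol` for deciding the rank is an assumption made here

```python
import numpy as np

A = np.array([[4.0, 3.0], [8.0, 6.0]])  # rank 1 matrix from the earlier example
U, s, Vt = np.linalg.svd(A)

tol = 1e-10                # assumed threshold for treating a singular value as zero
r = int(np.sum(s > tol))   # numerical rank

row_space = Vt[:r, :].T       # v_1 ... v_r as columns
null_space = Vt[r:, :].T      # v_{r+1} ... v_n as columns
col_space = U[:, :r]          # u_1 ... u_r as columns
left_null_space = U[:, r:]    # u_{r+1} ... u_m as columns

print(r)
print(A @ null_space)          # approximately zero
print(A.T @ left_null_space)   # approximately zero
```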

## Example problem

### Example problem 1

* Find the singular value decomposition of the matrix
$$ \begin{bmatrix}5&5\\-1&7\end{bmatrix} $$

#### Solution

+ First off, I'll show you how to make proper use of numpy and scipy (as opposed to sympy) to solve singular value decomposition problems

In [None]:
from numpy import matrix, transpose # Importing the matrix object and the 
# transpose object from numerical python (numpy)
from numpy.linalg import svd, det # Importing the svd and determinant
# methods from the linalg submodule 
from scipy.linalg import diagsvd

In [None]:
type(transpose) # Type tells us what 'something' is (sometimes)

In [None]:
C = matrix([[5, 5], [-1, 7]]) # Using the numpy matrix object
C

* We can see from the determinant that the rows and columns are independent

In [None]:
det(C) # Notice the difference in syntax

+ Let's start on V by looking at C<sup>T</sup>C

In [None]:
transpose(C) * C # Notice the difference in syntax

+ This is symmetric, positive definite (C is invertible)
+ The eigenvalues must sum to the trace (100) and multiply to the determinant (1600); here they are 20 and 80, so neither is zero
+ Remember that the eigenvalues are the squares of the *&sigma;*<sub>i</sub> values
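
+ A quick numerical check of the trace and determinant rules for C<sup>T</sup>C (a sketch using plain numpy arrays rather than the matrix object above)

```python
import numpy as np

C = np.array([[5.0, 5.0], [-1.0, 7.0]])
CTC = C.T @ C

# eigvalsh is meant for symmetric matrices; eigenvalues come back in ascending order
eigenvalues = np.linalg.eigvalsh(CTC)

print(eigenvalues)
print(eigenvalues.sum(), np.trace(CTC))         # sum of eigenvalues vs trace
print(eigenvalues.prod(), np.linalg.det(CTC))   # product of eigenvalues vs determinant
print(np.sqrt(eigenvalues))                     # the singular values of C
```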

+ Now let's put numpy and scipy to good use

In [None]:
U, S, VT = svd(C) # I use the computer variable VT to remind us that
# this is the transpose of V

+ S will only list the singular values and must be converted to the correct sized matrix

In [None]:
M, N = C.shape # Shape returns a tuple (two values), indicating
# row and column size
M, N

In [None]:
Sig = diagsvd(S, M, N) # Creating a m times n matrix from S
Sig

In [None]:
VT

+ Let's check if it worked!

In [None]:
U * Sig * VT

+ Now, let's use good old sympy

In [None]:
C = Matrix([[5, 5], [-1, 7]])
C

+ We need to work with a positive (semi)definite matrix

In [None]:
CTC = C.transpose() * C # Using the computer variable CTC to remind that
# it is C transpose times C
CTC, CTC.det()

+ Let's look at the eigenvalues

In [None]:
CTC.eigenvals()

+ &Sigma; will contain along its main diagonal the square roots of these eigenvalues

In [None]:
Sig = Matrix([[sqrt(20), 0], [0, sqrt(80)]])
Sig

+ For V we require the eigenvectors of C<sup>T</sup>C
+ We need to remember to normalize each vector (dividing each component by the length (norm) of that vector)

In [None]:
CTC.eigenvects()

+ Let's normalize each *v*<sub>i</sub> by calculating the length (norm) of each

In [None]:
v1 = Matrix([-3, 1])
v1.norm()

In [None]:
v2 = Matrix([Rational(1, 3), 1])
v2.norm()

+ We'll get each element of V by dividing by these norms

In [None]:
-3 / v1.norm(), 1 / v1.norm()

In [None]:
Rational(1, 3) / v2.norm(), 1 / v2.norm()
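
+ As an alternative sketch of the same normalization, sympy matrices have a `normalized()` method that divides a vector by its norm in one step; the names `v1_unit`, `v2_unit` and `expected_v1` are introduced here only for illustration

```python
from sympy import Matrix, Rational, simplify, sqrt

v1 = Matrix([-3, 1])
v2 = Matrix([Rational(1, 3), 1])

# normalized() returns the vector divided by its norm
v1_unit = v1.normalized()
v2_unit = v2.normalized()

# Compare with the element-by-element division done above
expected_v1 = Matrix([-3, 1]) / sqrt(10)
difference = (v1_unit - expected_v1).applyfunc(simplify)

print(v1_unit, v2_unit, difference)
```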

In [None]:
V = Matrix([[-3 / sqrt(10), 1 / sqrt(10)], [1 / sqrt(10), 3 / sqrt(10)]])
# Just remember to put the elements of V in the correct place
V

+ In this example V happens to be symmetric, so it is equal to its own transpose

In [None]:
V == V.transpose()

+ Now for U using CC<sup>T</sup>

In [None]:
CCT = C * C.transpose() # Using the computer variable CCT
CCT

+ The eigenvalues will be the same

In [None]:
CCT.eigenvals()

In [None]:
CCT.eigenvects()

In [None]:
u1 = Matrix([-1, 1])
u2 = Matrix([1, 1])

In [None]:
-1 / u1.norm(), 1 / u1.norm()

In [None]:
1 / u2.norm(), 1 / u2.norm()

In [None]:
U = Matrix([[-sqrt(2) / 2, sqrt(2) / 2], [sqrt(2) / 2, sqrt(2) / 2]])
# Just remember to put the elements of U in the correct place
U

+ Let's see if it worked!

In [None]:
U * Sig * V.transpose() # V is symmetric here, so plain V gives the same result
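
+ A slightly more explicit check, sketched with the same hand-built matrices: the product (with V transposed) should reproduce C exactly, and both U and V should be orthogonal; `residual`, `U_gram` and `V_gram` are names introduced here for illustration

```python
from sympy import Matrix, sqrt, eye, zeros, simplify

C = Matrix([[5, 5], [-1, 7]])
U = Matrix([[-sqrt(2) / 2, sqrt(2) / 2], [sqrt(2) / 2, sqrt(2) / 2]])
Sig = Matrix([[sqrt(20), 0], [0, sqrt(80)]])
V = Matrix([[-3 / sqrt(10), 1 / sqrt(10)], [1 / sqrt(10), 3 / sqrt(10)]])

# U * Sigma * V^T - C should simplify to the zero matrix
residual = (U * Sig * V.T - C).applyfunc(simplify)

# U^T U and V^T V should both simplify to the identity
U_gram = (U.T * U).applyfunc(simplify)
V_gram = (V.T * V).applyfunc(simplify)

print(residual, U_gram, V_gram)
```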