# Chapter 16: Singular Value Decomposition (SVD)

- content: p. 471 - 502
- exercises: p. 503 - 520

## 16.1 Singular Value Decomposition

- Singular Value Decomposition (SVD) is closely related to eigendecomposition.
- In fact, eigendecomposition can be seen as a special case of the SVD, with SVD being the generalized algorithm.
  - i.e. eigendecomposition works only on square matrices, SVD works on all matrices.

**Core idea of SVD:**
- provide a set of basis vectors called *singular vectors* for the 4 matrix subspaces (row space, null space, column space, left-null space).
- provide scalar *singular values* that encode the "importance" of each singular vector.
  - (Singular vectors are similar to eigenvectors, singular values are similar to eigenvalues.)

**Equation for Singular Value Decomposition (SVD):**
$$A = U \Sigma V^T$$

$A$ = The MxN matrix to be decomposed.  It can be square or rectangular, and any rank.

$U$ = The *left singular vectors matrix* (MxM), which provides an orthonormal basis for $\mathbb{R}^M$.  This includes the column space of $A$ and its complementary left-null space.
- The size of $U$ corresponds to the number of rows in $A$ (recall that, counter-intuitively, the column space lives in $\mathbb{R}^M$: each column has $M$ elements, one per row).

$\Sigma$ = The *singular values matrix* (MxN), which is diagonal and contains the singular values (the ith singular value is indicated $\sigma_i$).  All singular values are non-negative (that is, positive or zero) and real-valued.
- The size of $\Sigma$ is the same as A.

$V$ = The *right singular vectors matrix* (NxN), which provides an orthonormal basis for $\mathbb{R}^N$.  That includes the row space of $A$ and its complementary null space.
- The size of $V$ corresponds to the number of columns in $A$ (recall that, counter-intuitively, the row space lives in $\mathbb{R}^N$: each row has $N$ elements, one per column).
- Notice that the decomposition contains $V^T$; hence, although the right singular vectors are in the *columns* of $V$, it is usually more convenient to speak of the right singular vectors as being the *rows* of $V^T$.

**Sizes of SVD matrices**

<img src='img/16/SVD-sizes.jpg' alt='SVD sizes' width=500>
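
The sizes described above can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd`; the matrix sizes and the random seed are arbitrary choices for illustration. Note that NumPy returns $V^T$ rather than $V$, and returns the singular values as a vector rather than as the full $\Sigma$ matrix, so $\Sigma$ is rebuilt by hand here.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
M, N = 4, 6                      # illustrative sizes (a wide matrix)
A = rng.standard_normal((M, N))

# Full SVD: U is MxM, Vt is NxN, s holds min(M, N) singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the MxN Sigma matrix from the vector of singular values
Sigma = np.zeros((M, N))
k = len(s)
Sigma[:k, :k] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)

sizes_ok = U.shape == (M, M) and Sigma.shape == (M, N) and Vt.shape == (N, N)
reconstruction_ok = np.allclose(A, U @ Sigma @ Vt)
orthonormal_ok = (np.allclose(U.T @ U, np.eye(M))
                  and np.allclose(Vt @ Vt.T, np.eye(N)))
nonnegative_ok = bool(np.all(s >= 0))
```

Swapping `M` and `N` gives the tall case; the shapes of `U` and `Vt` follow the row and column counts of `A` accordingly.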

## 16.2 Computing the SVD

- You may think that computing the SVD is very difficult, but the truth is that once you know eigendecomposition, the SVD is almost trivial to compute.
- Start by considering a matrix $A$ of size $M \times N$ with $M \neq N$.
  - eigendecomposition is not defined for a non-square matrix; however, $A^TA$ is square and symmetric, so it is eigendecomposable.
  - Replacing $A^TA$ with the SVD matrices gives us the following:

$$A^TA = (U \Sigma V^T)^T(U \Sigma V^T)$$
$$A^TA = V \Sigma^T U^TU \Sigma V^T$$
$U$ is orthogonal, ergo $U^TU=I$.  Also $\Sigma$ is diagonal, so $\Sigma^T \Sigma = \Sigma^2$ (an $N \times N$ diagonal matrix holding the squared singular values).
$$A^TA = V \Sigma^2 V^T$$

- you can immediately see why the singular values are non-negative: the eigenvalues of $A^TA$ are the squared singular values, and any real number squared is non-negative.
- we're missing the U matrix, but we can obtain it via the eigendecomposition of matrix $AA^T$:

$$AA^T = (U \Sigma V^T)(U \Sigma V^T)^T$$
$$AA^T = U \Sigma V^T V \Sigma^T U^T$$
$V$ is orthogonal, ergo $V^TV=I$.  Also $\Sigma$ is diagonal, so $\Sigma \Sigma^T = \Sigma^2$ (here an $M \times M$ diagonal matrix holding the squared singular values).
$$AA^T = U \Sigma^2 U^T$$
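
Both identities can be confirmed numerically. This is a small sketch, assuming a random tall matrix (sizes and seed are arbitrary); it takes the SVD from NumPy and checks that $A^TA = V\Sigma^2V^T$ and $AA^T = U\Sigma^2U^T$, where the two $\Sigma^2$ matrices differ in size.

```python
import numpy as np

rng = np.random.default_rng(1)     # arbitrary seed
M, N = 5, 3                        # illustrative tall matrix
A = rng.standard_normal((M, N))

U, s, Vt = np.linalg.svd(A)        # full SVD; s has N entries here
V = Vt.T

# Sigma^T Sigma is NxN
StS = np.diag(s**2)

# Sigma Sigma^T is MxM, with zeros on the extra diagonal entries
SSt = np.zeros((M, M))
SSt[:N, :N] = np.diag(s**2)

ata_ok = np.allclose(A.T @ A, V @ StS @ V.T)
aat_ok = np.allclose(A @ A.T, U @ SSt @ U.T)
print(ata_ok, aat_ok)
```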

- So now we see that the way to compute the SVD of any rectangular matrix is to apply the following steps...

**Steps to compute SVD:**
1. Compute the eigendecomposition of $A^TA$ to get $V$ (and $\Sigma$).
2. Compute the eigendecomposition of $AA^T$ to obtain $U$ (and $\Sigma$).

- note: it's actually not necessary to complete both steps to obtain the SVD.  After completing one step, we can compute the missing matrix by using one of the following formulas (which assume the singular values are nonzero, so that the diagonal of $\Sigma$ can be inverted):
$$A V \Sigma^{-1} = U$$
$$\Sigma^{-1}U^TA = V^T$$
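
The procedure above can be sketched in code. This is a minimal illustration and not how production SVD routines work internally: it assumes a random tall matrix with full column rank (so every singular value is nonzero), uses `np.linalg.eigh` because $A^TA$ is symmetric, and recovers only the first $N$ columns of $U$ (the "economy" part) via $U = AV\Sigma^{-1}$. Individual singular vectors may differ in sign from those returned by `np.linalg.svd`.

```python
import numpy as np

rng = np.random.default_rng(2)        # arbitrary seed
A = rng.standard_normal((5, 3))       # tall; assumed full column rank

# Step 1: eigendecomposition of the symmetric matrix A^T A
evals, V = np.linalg.eigh(A.T @ A)    # eigh returns eigenvalues in ascending order

# Sort descending to follow the usual SVD ordering
order = np.argsort(evals)[::-1]
evals, V = evals[order], V[:, order]

# Singular values are the square roots of the eigenvalues of A^T A
s = np.sqrt(evals)

# U = A V Sigma^{-1}: dividing by s scales column j by 1/s[j]
U = (A @ V) / s

# Compare with NumPy's own SVD
s_ref = np.linalg.svd(A, compute_uv=False)
values_match = np.allclose(s, s_ref)
reconstructs = np.allclose(A, U @ np.diag(s) @ V.T)
U_orthonormal = np.allclose(U.T @ U, np.eye(3))
print(values_match, reconstructs, U_orthonormal)
```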

- quick aside: how do we know that $U$ and $V$ are orthogonal matrices?
  - because they come from the eigendecomposition of a symmetric matrix.  Look back at ch. 15 eigendecomposition of symmetric matrices for more details.

- When first computing the SVD by hand (which the author recommends doing at least a few times to solidify the concept), you should first decide whether to apply step 1 and then solve for U, or apply step 2 and then solve for V.
- The best strategy depends on the size of the matrix, because you want to compute the eigendecomposition of whichever of $A^TA$ or $AA^T$ is smaller.
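
The size-based choice can be expressed as a small helper. The function below, `svd_via_smaller_gram`, is a hypothetical name introduced only for this sketch; it assumes a full-rank input (no zero singular values) and returns the economy-sized factors. For a wide matrix it eigendecomposes the smaller $AA^T$ and then solves for $V$ using the transpose of $\Sigma^{-1}U^TA = V^T$.

```python
import numpy as np

def svd_via_smaller_gram(A):
    """Economy SVD from the eigendecomposition of the smaller of
    A^T A and A A^T. Hypothetical helper; assumes A has full rank."""
    M, N = A.shape
    if N <= M:
        # A^T A (NxN) is the smaller matrix: get V first, then solve for U
        evals, V = np.linalg.eigh(A.T @ A)
        order = np.argsort(evals)[::-1]
        s = np.sqrt(evals[order])
        V = V[:, order]
        U = (A @ V) / s                # U = A V Sigma^{-1}
    else:
        # A A^T (MxM) is the smaller matrix: get U first, then solve for V
        evals, U = np.linalg.eigh(A @ A.T)
        order = np.argsort(evals)[::-1]
        s = np.sqrt(evals[order])
        U = U[:, order]
        V = (A.T @ U) / s              # transpose of V^T = Sigma^{-1} U^T A
    return U, s, V.T

rng = np.random.default_rng(3)         # arbitrary seed
wide = rng.standard_normal((3, 8))     # here A A^T is 3x3, A^T A would be 8x8
U, s, Vt = svd_via_smaller_gram(wide)

wide_ok = np.allclose(wide, U @ np.diag(s) @ Vt)
values_ok = np.allclose(s, np.linalg.svd(wide, compute_uv=False))
print(U.shape, s.shape, Vt.shape, wide_ok, values_ok)
```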

## 16.3 Singular values and eigenvalues

## 16.4 SVD of a symmetric matrix

## 16.5 SVD and the four subspaces

## 16.6 SVD and matrix rank

## 16.7 SVD spectral theory

## 16.8 SVD and low-rank approximations

## 16.9 Normalizing singular values

## 16.10 Condition number of a matrix

## 16.11 SVD and the matrix inverse

## 16.12 The MP Pseudoinverse, part 2

## 16.13 - 16.14 Code Challenges

1. In Chapter 13, you learned about "economy" QR decomposition, which can be useful for large tall matrices. There is a comparable "economy" version of the SVD. Your goal here is to figure out what that means. First, generate three random matrices: square, wide, and tall. Then run the full SVD to confirm that the sizes of the SVD matrices match your expectations (e.g., Figure 16.1). Finally, run the economy SVD on all three matrices and compare the sizes to the full SVD.

2. Obtain the three SVD matrices from eigendecomposition, as described in section 16.2. Then compute the SVD of that matrix using the `svd()` function, to confirm that your results are correct. Keep in mind the discussions of sign-indeterminacy.

3. Write code to reproduce panels B and C in Figure 16.5. Confirm that the reconstructed matrix (third matrix in panel C) is equal to the original matrix. (Note: The matrix was populated with random numbers, so don't expect your results to look exactly like those in the figure.)

4. Create a random-numbers matrix with a specified condition number. For example, create a $6 \times 16$ random matrix with a condition number of $\kappa=42$. Do this by creating random $U$ and $V$ matrices, an appropriate $\Sigma$ matrix, and then create $A = U \Sigma V^T$. Finally, compute the condition number of $A$ to confirm that it matches what you specified (42).

5. This and the next two challenges involve taking the SVD of a picture. A picture is represented as a matrix, with the matrix values corresponding to grayscale intensities of the pixels. We will use a picture of Einstein. You can download the file at https://upload.wikimedia.org/wikipedia/en/8/86/Einstein_tongue.jpg (of course, you can replace this with any other picture, such as a selfie). However, you may need to apply some image processing to reduce the image matrix from 3D to 2D (thus, grayscale instead of RGB), and the datatype must be double (MATLAB) or floats (Python).

After importing the image, construct a low-rank approximation using various numbers of singular values. Show the original and low-rank approximations side-by-side. Test various numbers of components and qualitatively evaluate the results. Tip: You don't need to include the top components!


6. Create a scree plot of the percent-normalized singular values. Then test various thresholds for reconstructing the picture (e.g., including all components that explain at least $4 \%$ of the variance). What threshold seems reasonable?

7. The final challenge for this picture-SVD is to make the assessments of the number of appropriate components more quantitative. Compute the error between the reconstruction and the original image. The error can be operationalized as the RMS (root mean square) of the difference map. That is, create a difference image as the subtraction of the original and low-rank reconstructed image, then square all matrix elements (which are pixels), average over all pixels, and take the square root of that average. Make a plot of the RMS as a function of the number of components you included. How does that function compare to the scree plot?

8. What is the pseudoinverse of a column vector of constants? That is, the pseudoinverse of $k\mathbf{1}$. It obviously doesn't have a full inverse, but it is clearly a full column-rank matrix. First, work out your answer on paper, then confirm it in MATLAB or Python.

9. The goal here is to implement the series of equations on page 505 and confirm that you get the same result as with the `pinv()` function. Start by creating a $4 \times 2$ matrix of random integers between 1 and 6. Then compute its SVD (Equation 16.29). Then implement each of the next four equations in code. Finally, compute the MP pseudoinverse of the tall matrix. You will now have five versions of the pseudoinverse; make sure they are all equal.

10. This challenge follows up on the first code challenge from the previous chapter (about generalized eigendecomposition implemented as two matrices vs. the product of one matrix and the other's inverse). The goal is to repeat the exploration of differences between `eig(A, B)` and `eig(inv(B)*A)`. Use only $10 \times 10$ matrices, but now vary the condition number of the random matrices between $10^{1}$ and $10^{10}$. Do you come to different conclusions from the previous chapter?

11. This isn't a specific code challenge, but instead a general suggestion: Take any claim or proof I made in this chapter (or any other chapter), and demonstrate that concept using numerical examples in code. Doing so (1) helps build intuition, (2) improves your skills at translating math into code, and (3) gives you opportunities to continue exploring other linear algebra principles (I can't cover everything in one book!).
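
As one possible starting point for challenge 4, the construction it describes can be sketched as follows. This is an illustration under stated assumptions, not the book's solution: the random orthogonal matrices are taken from QR decompositions of random matrices, and the singular values are spaced linearly between $\kappa$ and 1 (any set of values whose largest-to-smallest ratio is $\kappa$ would do). The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)       # arbitrary seed
M, N, kappa = 6, 16, 42

# Random orthogonal matrices: the Q factor of a random square matrix
U, _ = np.linalg.qr(rng.standard_normal((M, M)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))

# Singular values from kappa down to 1, so that max / min = kappa
# (linear spacing is an arbitrary choice for this sketch)
s = np.linspace(kappa, 1, min(M, N))

# Place them on the diagonal of an MxN Sigma matrix
Sigma = np.zeros((M, N))
Sigma[:len(s), :len(s)] = np.diag(s)

A = U @ Sigma @ V.T

# The default 2-norm condition number is the ratio of the largest
# to the smallest singular value
cond = np.linalg.cond(A)
print(A.shape, cond)
```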