# SVD in action

What happens after calculating the singular value decomposition (SVD) of a matrix? In this topic, you'll explore the main applications of this decomposition. You'll finally get a geometric interpretation of the transpose and learn how to read off orthonormal bases of the spaces closely related to the matrix.

You'll also develop an alternative form of the SVD that allows you to progressively rebuild any matrix and accurately approximate it.

Throughout this topic, you'll be working with an $m\times n$ matrix $A$ of rank $r$ with SVD given by

$$
A = U\Sigma V^T.
$$

As before, the columns of its factors are

$$
U = [\,u_1 \;\; u_2 \;\; \cdots \;\; u_m\,],
\qquad
V = [\,v_1 \;\; v_2 \;\; \cdots \;\; v_n\,].
$$

The singular values are ordered non-increasingly:

$$
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.
$$

## The geometry of the inverse and the transpose

Here comes the first application of singular values. When $A$ is square and all of its singular values are different from zero, $\Sigma$ is invertible: $\Sigma^{-1}$ is the diagonal matrix whose entries are the reciprocals $1/\sigma_j$ of the singular values. Since $U$ and $V$ are orthogonal, $U^{-1}=U^T$ and $V^{-1}=V^T$. As a result, $A$ is invertible and its inverse is

$$
A^{-1}=V\Sigma^{-1}U^T.
$$

All the pieces of this decomposition are already known. Therefore, finding $A^{-1}$ is reduced to computing $\Sigma^{-1}$ and performing two matrix multiplications.
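As a quick numerical check of this formula, here is a minimal NumPy sketch. The matrix `A` below is a hypothetical invertible example chosen only for illustration; note that `np.linalg.svd` returns the singular values as a 1-D array `s` and returns $V^T$ (here `Vt`) rather than $V$.

```python
import numpy as np

# Hypothetical invertible 3x3 matrix, chosen only for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

# s holds the singular values; Vt is V transposed.
U, s, Vt = np.linalg.svd(A)

# Sigma^{-1} is diagonal with entries 1/sigma_j, so dividing the columns
# of V by s is the same as computing V @ Sigma^{-1}.
A_inv = (Vt.T / s) @ U.T

print(np.allclose(A_inv, np.linalg.inv(A)))  # agrees with NumPy's inverse?
print(np.allclose(A @ A_inv, np.eye(3)))     # is A @ A_inv the identity?
```

Dividing by `s` relies on broadcasting, which avoids building the full diagonal matrix $\Sigma^{-1}$.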

Have you ever noticed that we still do not have a geometric interpretation of the transpose? Transposing a product reverses the order of its factors, so $(U\Sigma V^T)^T=(V^T)^T\Sigma^TU^T$, and it becomes clear that

$$
A^T=V\Sigma^TU^T.
$$

Compare the decompositions of $A^{-1}$ and $A^T$. They are very similar. Geometrically, both retrace the steps of the transformation

$$
A=U\Sigma V^T
$$

but in the opposite order:

1. First, they apply $U^T$ to neutralize the effect of $U$.
2. Then, they stretch the resulting space.
3. Finally, they counteract $V^T$ by applying $V$.


![svd_geometrie.png](img/svd_geometrie.png)

The second step is where $A^{-1}$ and $A^T$ actually differ. While the former undoes the stretch of $A$ (it scales by $1/\sigma_j$), the latter simply stretches by the same factors $\sigma_j$ as $A$. Thus, you can roughly think of $A^T$ as rotating in the opposite direction to $A$ but stretching in the same way.
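The comparison can be made concrete with a small NumPy sketch. The $2\times 2$ matrix below is a hypothetical example; the only thing that changes between the two reconstructions is the diagonal factor in the middle.

```python
import numpy as np

# Hypothetical invertible, non-symmetric matrix used only for illustration.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
V = Vt.T

# Same outer factors V and U^T; only the middle (stretch) factor differs.
A_T_from_svd = V @ np.diag(s) @ U.T          # stretch by sigma_j, like A
A_inv_from_svd = V @ np.diag(1.0 / s) @ U.T  # stretch by 1/sigma_j, undoing A

print(np.allclose(A_T_from_svd, A.T))
print(np.allclose(A_inv_from_svd, np.linalg.inv(A)))
```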

## The four fundamental spaces

The relationship between $A$, its transpose, and the SVD is deeper than it may first appear. The four fundamental spaces of $A$ are

$$
\ker(L_A),\quad \operatorname{Im}(L_A),\quad \ker(L_{A^T}),\quad \operatorname{Im}(L_{A^T}).
$$

Once the SVD is known, all of these spaces can be reconstructed.


### Fundamental subspaces from the SVD

If $A=U\Sigma V^T$ has rank $r$, then

- $\{u_1,u_2,\dots,u_r\}$ is an orthonormal basis of $\operatorname{Im}(L_A)$,
- $\{u_{r+1},u_{r+2},\dots,u_m\}$ is an orthonormal basis of $\ker(L_{A^T})$,
- $\{v_1,v_2,\dots,v_r\}$ is an orthonormal basis of $\operatorname{Im}(L_{A^T})$,
- $\{v_{r+1},v_{r+2},\dots,v_n\}$ is an orthonormal basis of $\ker(L_A)$.
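The four bullet points above translate directly into column slices of $U$ and $V$. The following is a minimal sketch, not a definitive implementation: the helper name `fundamental_subspaces`, the tolerance `tol`, and the rank-1 test matrix are all assumptions made for illustration.

```python
import numpy as np

def fundamental_subspaces(A, tol=1e-12):
    """Return orthonormal bases (as matrix columns) of the four fundamental spaces."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    r = int(np.sum(s > tol))  # numerical rank: number of singular values above tol
    V = Vt.T
    return {
        "Im(L_A)": U[:, :r],     # u_1, ..., u_r
        "ker(L_A^T)": U[:, r:],  # u_{r+1}, ..., u_m
        "Im(L_A^T)": V[:, :r],   # v_1, ..., v_r
        "ker(L_A)": V[:, r:],    # v_{r+1}, ..., v_n
    }

# Hypothetical 2x3 matrix of rank 1 (the second row is twice the first).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
spaces = fundamental_subspaces(A)

for name, basis in spaces.items():
    print(name, basis.shape)

# Kernel vectors must be sent to zero by A and by A^T respectively.
print(np.allclose(A @ spaces["ker(L_A)"], 0))
print(np.allclose(A.T @ spaces["ker(L_A^T)"], 0))
```

`full_matrices=True` matters here: the reduced SVD would drop exactly the columns that span the two kernels.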


### Example

Consider the matrix

$$
A=
\begin{pmatrix}
1 & 1 & 1 & 1\\
2 & 2 & -2 & -2\\
3 & -3 & 3 & -3
\end{pmatrix}.
$$

An SVD of $A$ is given by

$$
U=
\begin{pmatrix}
0 & 0 & 1\\
0 & 1 & 0\\
1 & 0 & 0
\end{pmatrix},
\qquad
\Sigma=
\begin{pmatrix}
6 & 0 & 0 & 0\\
0 & 4 & 0 & 0\\
0 & 0 & 2 & 0
\end{pmatrix},
$$

and

$$
V=\frac12
\begin{pmatrix}
1 & 1 & 1 & 1\\
-1 & 1 & 1 & -1\\
1 & -1 & 1 & -1\\
-1 & -1 & 1 & 1
\end{pmatrix}.
$$

Since there are three positive singular values, the rank is $r=3$.

### Fundamental spaces

From $U$ we obtain

$$
\left\{
\begin{pmatrix}0\\0\\1\end{pmatrix},
\begin{pmatrix}0\\1\\0\end{pmatrix},
\begin{pmatrix}1\\0\\0\end{pmatrix}
\right\}
$$

as an orthonormal basis of $\operatorname{Im}(L_A)$. Since $r=m=3$, no columns of $U$ are left over, so

$$
\ker(L_{A^T})=\{0\}.
$$

From $V$ we obtain

$$
\left\{
\frac12\begin{pmatrix}1\\-1\\1\\-1\end{pmatrix},
\frac12\begin{pmatrix}1\\1\\-1\\-1\end{pmatrix},
\frac12\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
\right\}
$$

as an orthonormal basis of $\operatorname{Im}(L_{A^T})$, and

$$
\left\{
\frac12\begin{pmatrix}1\\-1\\-1\\1\end{pmatrix}
\right\}
$$

as an orthonormal basis of $\ker(L_A)$.
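The bases read off in this example can be checked numerically. The sketch below hard-codes the vectors listed above (the variable names `u`, `v`, and `sigma` are just illustrative) and verifies that $Av_j=\sigma_j u_j$ for $j\le 3$, that the fourth vector lies in the kernel, and that the $v_j$ are orthonormal.

```python
import numpy as np

A = np.array([[1, 1, 1, 1],
              [2, 2, -2, -2],
              [3, -3, 3, -3]], dtype=float)

# Basis vectors and singular values as listed in the example.
u = [np.array([0.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 0.0])]
v = [0.5 * np.array([1.0, -1.0, 1.0, -1.0]),
     0.5 * np.array([1.0, 1.0, -1.0, -1.0]),
     0.5 * np.array([1.0, 1.0, 1.0, 1.0]),
     0.5 * np.array([1.0, -1.0, -1.0, 1.0])]
sigma = [6.0, 4.0, 2.0]

# A v_j = sigma_j u_j for the first r = 3 right singular vectors.
image_ok = all(np.allclose(A @ v[j], sigma[j] * u[j]) for j in range(3))
# The remaining vector v_4 spans ker(L_A).
kernel_ok = np.allclose(A @ v[3], 0)
# The four v_j form an orthonormal set.
V = np.column_stack(v)
orthonormal_ok = np.allclose(V.T @ V, np.eye(4))

print(image_ok, kernel_ok, orthonormal_ok)
```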

## An alternative form of the SVD

The SVD can be written as a sum of rank-1 matrices:

$$
A
=\sigma_1 u_1 v_1^T+\sigma_2 u_2 v_2^T+\cdots+\sigma_r u_r v_r^T
=\sum_{j=1}^r \sigma_j\,u_j v_j^T.
$$

Each term $\sigma_j u_j v_j^T$ is a **latent component** of $A$.
Since every column of $u_j v_j^T$ is a multiple of $u_j$, every term has rank $1$.
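To see the rank-1 expansion at work, here is a small NumPy sketch. The random $4\times 6$ matrix is a hypothetical test case; each latent component is built with `np.outer`, and the components are then summed back up.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))  # hypothetical 4x6 test matrix

# Reduced SVD: U is 4x4, s has 4 entries, Vt is 4x6.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Latent component j is the outer product sigma_j * u_j v_j^T.
components = [s[j] * np.outer(U[:, j], Vt[j, :]) for j in range(len(s))]

# Every component has rank 1, and together they rebuild A.
print(all(np.linalg.matrix_rank(C) == 1 for C in components))
print(np.allclose(sum(components), A))
```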

### Proof

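Assume $m<n$ (the case $m\ge n$ is analogous). Then $\Sigma$ is $m\times n$, with $n-m$ zero columns on the right:

$$
A=U\Sigma V^T
=[u_1\;u_2\;\cdots\;u_m]
\begin{pmatrix}
\sigma_1&&&&0&\cdots&0\\
&\sigma_2&&&0&\cdots&0\\
&&\ddots&&\vdots&&\vdots\\
&&&\sigma_m&0&\cdots&0
\end{pmatrix}
[v_1\;v_2\;\cdots\;v_n]^T
$$

$$
=
[\sigma_1u_1\;\sigma_2u_2\;\cdots\;\sigma_m u_m\;0\;\cdots\;0]
[v_1\;v_2\;\cdots\;v_n]^T
=\sum_{j=1}^m \sigma_j u_j v_j^T.
$$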

Since $\sigma_j=0$ for $j>r$, this reduces to

$$
A=\sum_{j=1}^r \sigma_j u_j v_j^T.
$$

### Example

Let

$$
A=
\begin{pmatrix}
1&0&-2\\
0&1&-1\\
2&1&1
\end{pmatrix}.
$$

An SVD is

$$
U=
\begin{pmatrix}
0&\frac{2}{\sqrt5}&-\frac{1}{\sqrt5}\\
0&\frac{1}{\sqrt5}&\frac{2}{\sqrt5}\\
1&0&0
\end{pmatrix},
\quad
\Sigma=
\begin{pmatrix}
\sqrt6&0&0\\
0&\sqrt6&0\\
0&0&1
\end{pmatrix},
\quad
V=
\begin{pmatrix}
\frac{2}{\sqrt6}&\frac{2}{\sqrt{30}}&-\frac{1}{\sqrt5}\\
\frac{1}{\sqrt6}&\frac{1}{\sqrt{30}}&\frac{2}{\sqrt5}\\
\frac{1}{\sqrt6}&-\frac{5}{\sqrt{30}}&0
\end{pmatrix}.
$$

Then

$$
\sigma_1u_1v_1^T=
\begin{pmatrix}
0&0&0\\
0&0&0\\
2&1&1
\end{pmatrix},
\quad
\sigma_2u_2v_2^T=
\begin{pmatrix}
\frac45&\frac25&-2\\
\frac25&\frac15&-1\\
0&0&0
\end{pmatrix},
\quad
\sigma_3u_3v_3^T=
\begin{pmatrix}
\frac15&-\frac25&0\\
-\frac25&\frac45&0\\
0&0&0
\end{pmatrix}.
$$

Adding them gives

$$
A=\sum_{j=1}^3 \sigma_j u_j v_j^T.
$$


### Extra: SVD of the linear map

For any vector $x$,

$$
L_A(x)=Ax=U\Sigma V^Tx
=U
\begin{pmatrix}
\sigma_1(v_1\cdot x)\\
\sigma_2(v_2\cdot x)\\
\vdots\\
\sigma_r(v_r\cdot x)\\
0\\
\vdots\\
0
\end{pmatrix}
=\sum_{j=1}^r \sigma_j (v_j\cdot x)u_j.
$$

Once the SVD is known, computing $Ax$ reduces to inner products with $v_j$ and weighted sums of $u_j$.
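A minimal NumPy sketch of this formula, using the $3\times 4$ matrix from the fundamental-spaces example and an arbitrary test vector `x` (an assumption made for illustration):

```python
import numpy as np

A = np.array([[1, 1, 1, 1],
              [2, 2, -2, -2],
              [3, -3, 3, -3]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])  # arbitrary test vector

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))  # numerical rank

# L_A(x) = sum_j sigma_j (v_j . x) u_j, using only the first r triplets.
Ax = sum(s[j] * (Vt[j] @ x) * U[:, j] for j in range(r))

print(np.allclose(Ax, A @ x))
```

Each term needs one inner product with a row of `Vt` and one scaled copy of a column of `U`, which is exactly the structure of the sum above.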

## Truncated SVD

The alternative form of the SVD is the source of most applications of this decomposition. The more latent components you add, the closer you get to the matrix. Each of these partial sums is known as a truncated singular value decomposition: for every $k\in\{1,\dots,r\}$ we define

$$
A_k=\sum_{j=1}^k \sigma_j\,u_j v_j^T
$$

The key fact is that, for every $1\le k\le r$, no matrix of rank at most $k$ is closer to $A$ than $A_k$, whether the distance is measured in the spectral norm or in the Frobenius norm (this is the Eckart–Young theorem). This is the main reason why the SVD is used in real applications. You can interpret it as the SVD arranging $A$ into its “most important” and “least important” pieces: the largest singular values describe the broad strokes of $A$, whilst the smallest singular values take care of the finer details.

The best way to approximate a high-rank matrix by a low-rank one is by discarding the pieces of its singular value decomposition which have the smallest singular values.
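How much is lost by discarding components can be read off the singular values: the Frobenius error of $A_k$ equals $\sqrt{\sigma_{k+1}^2+\cdots+\sigma_r^2}$, and the spectral-norm error equals $\sigma_{k+1}$. The sketch below checks both identities on the $3\times 4$ matrix from the fundamental-spaces example; the helper name `truncated_svd` is an assumption made for illustration.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k truncation A_k: the sum of the first k latent components."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A = np.array([[1, 1, 1, 1],
              [2, 2, -2, -2],
              [3, -3, 3, -3]], dtype=float)
s = np.linalg.svd(A, compute_uv=False)  # singular values only

for k in range(1, len(s) + 1):
    A_k = truncated_svd(A, k)
    frob_err = np.linalg.norm(A - A_k)         # Frobenius norm of the error
    spec_err = np.linalg.norm(A - A_k, ord=2)  # spectral norm of the error
    predicted_frob = np.sqrt(np.sum(s[k:] ** 2))
    print(k, frob_err, predicted_frob, spec_err)
```

For each `k`, the second and third printed numbers should agree up to rounding, and the last one should match the first discarded singular value (or be numerically zero when nothing is discarded).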

Let's compute the truncated SVD for the $3\times 3$ matrix from the example in the previous section:

$$
A=
\begin{pmatrix}
1&0&-2\\
0&1&-1\\
2&1&1
\end{pmatrix}
$$

Its latent components are:

$$
\sigma_1u_1v_1^T=
\begin{pmatrix}
0&0&0\\
0&0&0\\
2&1&1
\end{pmatrix}
$$

$$
\sigma_2u_2v_2^T=
\begin{pmatrix}
\frac45&\frac25&-2\\
\frac25&\frac15&-1\\
0&0&0
\end{pmatrix}
$$

$$
\sigma_3u_3v_3^T=
\begin{pmatrix}
\frac15&-\frac25&0\\
-\frac25&\frac45&0\\
0&0&0
\end{pmatrix}
$$

Then, the best approximations (of rank $1$, $2$ and $3$ respectively) for $A$ are:

$$
A_1=
\begin{pmatrix}
0&0&0\\
0&0&0\\
2&1&1
\end{pmatrix}
$$

$$
A_2=
\begin{pmatrix}
\frac45&\frac25&-2\\
\frac25&\frac15&-1\\
2&1&1
\end{pmatrix}
$$

$$
A_3=
\begin{pmatrix}
1&0&-2\\
0&1&-1\\
2&1&1
\end{pmatrix}
$$

## Image compression

Truncated singular value decomposition often retains a stunningly large level of accuracy even when the values of
$k$ are much smaller than $r$. This is because, in real-world matrices, only a minuscule proportion of singular values are large. As a result,
$A_k$ serves as an accurate approximation of $A$.

This is particularly useful for image compression. A grayscale image can be represented as a matrix with values from $0$ to $255$, where $0$ is pure black and $255$ is pure white. As the numbers increase, lighter and lighter shades are obtained. Let's see truncated SVD in action with this cute panda:

In [None]:
from PIL import Image

Image.open("img/panda/panda_org.webp")

This image corresponds to a $350\times 634$ matrix $A$. Since its rows are linearly independent, as is typical for a photograph, the rank of $A$ is $350$, the largest possible for a matrix with $350$ rows. This implies that there are $350$ latent components. The first singular value is the largest, and the first latent component is the best rank-$1$ approximation to the image:

In [None]:
from PIL import Image

img = Image.open("img/panda/panda_k1.webp")
img.thumbnail((634, 350))
img

It may not look like a good approximation, but keep in mind that it has rank $1$: every row is a multiple of one fixed row vector, and every column is a multiple of one fixed column vector. Now look at the approximation with $k=5$:

In [None]:
from PIL import Image

img = Image.open("img/panda/panda_k5.webp")
img.thumbnail((634, 350))
img

It is already getting better with only $5$ singular values. But when $k=10,20,50$, the results are amazing.

In [None]:
from PIL import Image

img = Image.open("img/panda/panda_k10.webp")
img.thumbnail((634, 350))
img

In [None]:
from PIL import Image

img = Image.open("img/panda/panda_k20.webp")
img.thumbnail((634, 350))
img

In [None]:
from PIL import Image

img = Image.open("img/panda/panda_k50.webp")
img.thumbnail((634, 350))
img

Using $50$ singular values already gives an excellent result, and note that this is much less than $350$. Since the next singular values are negligible, when $k=100,200$ the approximation is so good that the difference can no longer be distinguished:

In [None]:
from PIL import Image

img = Image.open("img/panda/panda_k100.webp")
img.thumbnail((634, 350))
img

In [None]:
from PIL import Image

img = Image.open("img/panda/panda_k200.webp")
img.thumbnail((634, 350))
img
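The approximations shown above can be produced along the following lines. This is only a sketch under stated assumptions: the helper names `compress` and `stored_numbers` are hypothetical, and a synthetic $350\times 634$ array stands in for the panda so that the example runs without the image file. Storing $A_k$ takes $k(m+n+1)$ numbers ($k$ singular values plus the vectors $u_j$ and $v_j$) instead of $mn$; for $k=50$ that is $49{,}250$ numbers versus $221{,}900$.

```python
import numpy as np

def compress(image, k):
    """Rank-k truncated SVD of a 2-D grayscale array, clipped back to [0, 255]."""
    U, s, Vt = np.linalg.svd(image.astype(float), full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return np.clip(approx, 0, 255)

def stored_numbers(m, n, k):
    """Numbers needed to keep k singular triplets (sigma_j, u_j, v_j)."""
    return k * (m + n + 1)

# Synthetic stand-in for the photo (an assumption): a smooth pattern plus noise.
# A real image could be loaded with
# np.asarray(Image.open(path).convert("L")) from PIL instead.
m, n = 350, 634
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:m, 0:n]
image = 127.5 * (1 + np.sin(xx / 40.0) * np.cos(yy / 25.0))
image = np.clip(image + rng.normal(0.0, 5.0, size=(m, n)), 0, 255)

errors = {}
for k in (1, 5, 50):
    approx = compress(image, k)
    errors[k] = np.linalg.norm(image - approx) / np.linalg.norm(image)
    print(k, stored_numbers(m, n, k), m * n, errors[k])
```

The printed relative error shrinks as `k` grows, while the storage cost stays well below the full `m * n` entries.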

## Conclusion

When $A$ is square and all $n$ of its singular values are positive (that is, $r=n$), the matrix is invertible and

$$
A^{-1}=V\Sigma^{-1}U^T.
$$

The geometry of $A^{-1}$ and $A^T$ is closely related to that of $A$.

The four fundamental spaces of $A$ are

$$
\ker(L_A),\quad \operatorname{Im}(L_A),\quad \ker(L_{A^T}),\quad \operatorname{Im}(L_{A^T}),
$$

and the SVD of $A$ gives an orthonormal basis for each of them.

The alternative form of the SVD of $A$ is the sum of its latent components:

$$
A=\sum_{j=1}^r \sigma_j\,u_j v_j^T.
$$

The best way to approximate a rank-$r$ matrix $A$ by a rank-$k$ one ($k\le r$) is its truncated SVD:

$$
A_k=\sum_{j=1}^k \sigma_j\,u_j v_j^T.
$$

The singular values are ordered non-increasingly:

$$
\sigma_1\ge \sigma_2\ge \cdots\ge \sigma_r>0.
$$

## SVD in Python

In [None]:
import numpy as np

# Matrix from the fundamental-spaces example
A = np.array([
    [1,  1,  1,  1],
    [2,  2, -2, -2],
    [3, -3,  3, -3]
], dtype=float)

# SVD: A = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=True)

print("A shape:", A.shape)
print("U shape:", U.shape)
print("s (singular values):", s)
print("Vt shape:", Vt.shape)

# Build Sigma with same shape as A (m x n)
m, n = A.shape
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# Verify reconstruction
A_reconstructed = U @ Sigma @ Vt
print("\nReconstruction error (Frobenius norm):", np.linalg.norm(A - A_reconstructed))

# Optional: rank (numerical)
tol = 1e-12
rank = np.sum(s > tol)
print("Numerical rank:", rank)

# Optional: truncated SVD approximation (choose k)
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
print(f"\nTruncated rank-{k} approximation error:", np.linalg.norm(A - A_k))

## Exact (symbolic) SVD with SymPy

In [None]:
import sympy as sp

A = sp.Matrix([
    [1, 0, -2],
    [0, 1, -1],
    [2, 1,  1]
])

A.singular_values()

In [None]:
U, S, V = A.singular_value_decomposition()

U, S, V

In [None]:
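# On a mutable Matrix, .simplify() works in place and returns None,
# so simplify entrywise instead; the result should be the zero matrix.
(U * S * V.T - A).applyfunc(sp.simplify)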