# Understanding the Singular Value Decomposition (SVD)

## 1. The Basic Idea

For any real $m \times n$ matrix $A$, there exist matrices $U$, $\Sigma$, and $V$ such that
$$
A = U\,\Sigma\,V^{\mathsf T}
$$

where
- $U$ is an $m \times m$ **orthogonal** matrix,
- $V$ is an $n \times n$ **orthogonal** matrix,
- $\Sigma$ is an $m \times n$ **diagonal** matrix with non-negative entries.

Orthogonal means
- Columns are orthonormal vectors
- $U^{\mathsf T}U = I, \qquad V^{\mathsf T}V = I$
- Multiplying by $U$ or $V$ acts like a rotation, possibly combined with a reflection: it preserves lengths and angles

For a rectangular matrix $\Sigma$, "diagonal" means that every entry off the main diagonal is zero: only positions $(1,1), (2,2), \dots, (k,k)$ with $k = \min(m,n)$ may be nonzero.

SVD decomposes any linear map into:
1. Rotate/reflect the input: $V^{\mathsf T}$
2. Scale along axes: $Σ$
3. Rotate/reflect the output: $U$

In other words:
$$
\text{rotation} \;\rightarrow\; \text{scaling} \;\rightarrow\; \text{rotation}.
$$
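The statements above can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd`; the $3 \times 2$ matrix `A` is an arbitrary example chosen for illustration, not one taken from the text.

```python
import numpy as np

# Arbitrary 3x2 example matrix (assumed for illustration).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
m, n = A.shape

# full_matrices=True returns U as m x m and Vt (= V^T) as n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the rectangular m x n "diagonal" matrix Sigma from the singular values.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U.T @ U, np.eye(m)))    # checks that U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(n)))  # checks that V is orthogonal
print(np.allclose(U @ Sigma @ Vt, A))     # checks A = U Sigma V^T
```

Note that `np.linalg.svd` returns the singular values as a 1-D array and returns $V^{\mathsf T}$ rather than $V$, so $\Sigma$ has to be assembled by hand when the full rectangular form is wanted.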

## 2. What does this mean geometrically?

Think of $A$ as a transformation that takes vectors in $\mathbb{R}^n$ to vectors in $\mathbb{R}^m$.

SVD says: you can always write that transformation as three simpler steps:

$$
\mathbb{R}^n
\;\xrightarrow{\;V^{T}\;}\;
\mathbb{R}^n
\;\xrightarrow{\;\Sigma\;}\;
\mathbb{R}^m
\;\xrightarrow{\;U\;}\;
\mathbb{R}^m.
$$

1. Apply $V^{T}$:

    Rotate/reflect the input space (no stretching yet).

2. Apply $\Sigma$:

    Stretch/compress along coordinate axes by non-negative factors $\sigma_1, \sigma_2, \dots$ (the diagonal entries). Some directions may be collapsed to zero.

3. Apply $U$:

    Rotate/reflect the result in the output space.

So: SVD decomposes any linear map into rotation → axis-aligned scaling → rotation.
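To make the three steps concrete, here is a small sketch that applies them one at a time to a single vector and compares the result with `A @ x`. The matrix `A` and the vector `x` are arbitrary choices made for illustration.

```python
import numpy as np

# Arbitrary 2x2 example matrix and input vector (assumed for illustration).
A = np.array([[2.0, 1.0],
              [0.0, 1.5]])
x = np.array([1.0, -2.0])

U, s, Vt = np.linalg.svd(A)

step1 = Vt @ x      # rotate/reflect the input: the length is unchanged
step2 = s * step1   # scale coordinate i by sigma_i (Sigma is square and diagonal here)
step3 = U @ step2   # rotate/reflect in the output space

print(np.isclose(np.linalg.norm(step1), np.linalg.norm(x)))
print(np.allclose(step3, A @ x))
```

Because `A` is square in this example, multiplying by $\Sigma$ reduces to an elementwise product with the vector of singular values.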

## 3. The pieces: $U$, $\Sigma$, $V$

### 3.1 $\Sigma$: singular values

The nonzero part of $\Sigma$ is the $r \times r$ diagonal block

$$
D = \begin{pmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_r \end{pmatrix},
$$

and the full $m \times n$ matrix pads this block with zero rows and columns:

$$
\Sigma =
\begin{pmatrix}
D & 0 \\
0 & 0
\end{pmatrix},
$$

where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$, all remaining entries are $0$, and $r = \text{rank}(A)$.

These $\sigma_i$ are called the singular values of A.

The key fact:

Singular values of A are the non-negative square roots of the eigenvalues of $A A^{T}$ (or equivalently of $A^{T} A$; both have the same non-zero eigenvalues).

Why?
- $A A^{T}$ is $m \times m$, symmetric and positive semi-definite.
- $A^{T} A$ is $n \times n$, also symmetric and positive semi-definite.
- Symmetric positive semi-definite matrices have real, non-negative eigenvalues and an orthonormal basis of eigenvectors.

If $\lambda$ is an eigenvalue of $A^{T}A$, then $\lambda \ge 0$, and we define
$\sigma = \sqrt{\lambda}$ to be a singular value.

So:
$$
\sigma_i = \sqrt{\lambda_i}, \quad i = 1, 2, \dots, r,
$$
that is,
$$
\text{eigenvalues of } A^{T}A \quad \Rightarrow \quad \text{singular values of } A.
$$
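The relation $\sigma_i = \sqrt{\lambda_i}$ can be checked numerically. The sketch below uses an arbitrary $3 \times 2$ matrix (an assumption for illustration) and compares the square roots of the eigenvalues of $A^{T}A$ with the singular values returned by `np.linalg.svd`. It also shows that $AA^{T}$ has the same nonzero eigenvalues plus one extra eigenvalue that is zero up to round-off.

```python
import numpy as np

# Arbitrary 3x2 example matrix (assumed for illustration).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# eigvalsh is intended for symmetric matrices and returns eigenvalues in ascending order.
lam_small = np.linalg.eigvalsh(A.T @ A)[::-1]   # 2 eigenvalues, descending
lam_big = np.linalg.eigvalsh(A @ A.T)[::-1]     # 3 eigenvalues, descending

# Clip tiny negative values caused by round-off before taking square roots.
sigma_from_eig = np.sqrt(np.clip(lam_small, 0.0, None))
sigma = np.linalg.svd(A, compute_uv=False)      # singular values, descending

print(np.allclose(sigma_from_eig, sigma))
print(np.allclose(lam_big[:2], lam_small))
print(abs(lam_big[2]) < 1e-9)
```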

### 3.2 Columns of $U$: left singular vectors

The columns of $U$ are called the left singular vectors of $A$.

The key fact:

Left singular vectors of A are the eigenvectors of $A A^{T}$.

That is, if

$$
A A^{T} u_i = \lambda_i u_i,
$$
then
$$
u_i = \text{column } i \text{ of } U, \quad
\sigma_i = \sqrt{\lambda_i}.
$$

So the $u_i$'s form an orthonormal basis of $\mathbb{R}^m$ aligned with how $A$ acts in the output space.


### 3.3 Columns of $V$: right singular vectors

The columns of V are the right singular vectors of $A$.

The key fact:

Right singular vectors of A are the eigenvectors of $A^{T} A$.

So if

$A^{T} A v_i = \lambda_i v_i$,

then
$$
v_i = \text{column } i \text{ of } V, \quad
\sigma_i = \sqrt{\lambda_i}.
$$
The $v_i$’s form an orthonormal basis of $\mathbb{R}^n$ aligned with the important directions in the input space.
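Both eigenvector relations can be verified at once. This is a minimal sketch on an arbitrary $3 \times 2$ matrix (an assumption for illustration), using the reduced SVD so that each column of `U` and `V` pairs with one singular value.

```python
import numpy as np

# Arbitrary 3x2 example matrix (assumed for illustration).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# full_matrices=False gives the reduced SVD: U is 3x2, Vt is 2x2.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Column i should satisfy (A A^T) u_i = sigma_i^2 u_i and (A^T A) v_i = sigma_i^2 v_i.
# Multiplying by s**2 scales column i by sigma_i^2 via broadcasting.
print(np.allclose(A @ A.T @ U, U * s**2))
print(np.allclose(A.T @ A @ V, V * s**2))
```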


## 4. How the pieces relate: the core SVD identity

For each nonzero singular value $\sigma_i$, we have a pair of unit vectors $u_i$ and $v_i$ such that
$$
A v_i = \sigma_i u_i.
$$
This is the essence of SVD:
- $v_i$ is a direction in the input space.
- $A$ sends that direction to the output direction $u_i$, scaled by $\sigma_i$.

In matrix form, you can write:
$$
A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{T}.
$$
This is a sum of rank-1 matrices, each stretching one input direction into one output direction.
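Both the identity $A v_i = \sigma_i u_i$ and the rank-1 expansion can be checked with a few lines of NumPy. The matrix below is an arbitrary example (an assumption for illustration). Note that row `i` of `Vt` is $v_i^{T}$.

```python
import numpy as np

# Arbitrary 2x2 example matrix (assumed for illustration).
A = np.array([[2.0, 1.0],
              [0.0, 1.5]])
U, s, Vt = np.linalg.svd(A)

# Check A v_i = sigma_i u_i for every singular triple.
pair_ok = [np.allclose(A @ Vt[i, :], s[i] * U[:, i]) for i in range(len(s))]
print(pair_ok)

# Rebuild A as a sum of rank-1 matrices sigma_i * u_i v_i^T.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A_sum, A))
```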


## 5. Why this is “the best decomposition”

People often say **SVD is the “best” decomposition** because:

1. **Works for any matrix**

   Square, rectangular, singular, fat, tall — SVD always exists.

2. **Reveals the rank**

   The number of nonzero singular values

   $$
   \sigma_i \neq 0
   $$
   is exactly the **rank** of $A$.

3. **Gives the optimal low-rank approximation**

   If you keep only the largest $k$ singular values and their singular vectors,
   you obtain the **best possible rank-$k$** approximation of $A$ in the sense of minimizing
   $$
   \|A - A_k\|_2 \quad \text{and} \quad \|A - A_k\|_F.
   $$

   This is the **Eckart–Young–Mirsky theorem**.
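Points 2 and 3 can be illustrated with a short sketch. The rank-2 matrix below is built from two outer products purely for illustration, and the tolerance used to decide which singular values count as nonzero is a choice, not a canonical value. The error formulas in the comments are the standard statement of the Eckart–Young–Mirsky theorem.

```python
import numpy as np

# Rank-2 example built from two outer products (assumed for illustration).
A = (np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0])
     + np.outer([0.0, 1.0, 0.0, -1.0], [2.0, 1.0, 0.0]))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Numerical rank: count singular values above a small relative tolerance.
tol = 1e-10 * s[0]
rank = int(np.sum(s > tol))
print(rank, np.linalg.matrix_rank(A))

# Rank-k truncation: keep only the k largest singular values and their vectors.
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Spectral-norm error equals sigma_{k+1};
# Frobenius-norm error equals sqrt(sigma_{k+1}^2 + sigma_{k+2}^2 + ...).
err_2 = np.linalg.norm(A - A_k, 2)
err_F = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err_2, s[k]))
print(np.isclose(err_F, np.sqrt(np.sum(s[k:] ** 2))))
```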


## Computing $AA^T$ and $A^T A$

As a worked example, we compute the products of a specific $2 \times 2$ matrix with its transpose.

$$
A = \begin{pmatrix} 3 & 0 \\ -1 & 2 \end{pmatrix}, \quad
A^T = \begin{pmatrix} 3 & -1 \\ 0 & 2 \end{pmatrix}
$$

**1. Compute $A A^T$:**

$$
A A^T =
\begin{pmatrix} 3 & 0 \\ -1 & 2 \end{pmatrix}
\begin{pmatrix} 3 & -1 \\ 0 & 2 \end{pmatrix}
= \begin{pmatrix} 9 & -3 \\ -3 & 5 \end{pmatrix}
$$

**2. Compute $A^T A$:**

$$
A^T A =
\begin{pmatrix} 3 & -1 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ -1 & 2 \end{pmatrix}
= \begin{pmatrix} 10 & -2 \\ -2 & 4 \end{pmatrix}
$$

In [None]:
import numpy as np

A = [[3, 0], [-1, 2]]

In [None]:
A = np.array(A)

In [None]:
A_At = A @ A.transpose()

In [None]:
A_At

In [None]:
At_A = A.transpose() @ A
At_A

## Finding the Eigenvalues

Since $A A^T$ and $A^T A$ share the same non-zero eigenvalues, we can solve the characteristic equation for one of them, say $A^T A$.

$$
A^T A = \begin{pmatrix} 10 & -2 \\ -2 & 4 \end{pmatrix}
$$

**1. Characteristic Equation:**

We find $\lambda$ such that $\det(A^T A - \lambda I) = 0$:

$$
\det \begin{pmatrix} 10 - \lambda & -2 \\ -2 & 4 - \lambda \end{pmatrix} = 0
$$

**2. Compute Determinant:**

$$
\begin{aligned}
(10 - \lambda)(4 - \lambda) - (-2)(-2) &= 0 \\
40 - 14\lambda + \lambda^2 - 4 &= 0 \\
\lambda^2 - 14\lambda + 36 &= 0
\end{aligned}
$$

**3. Solve Quadratic:**

Using the quadratic formula $\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ with $a = 1$, $b = -14$, $c = 36$:

$$
\lambda = \frac{14 \pm \sqrt{196 - 144}}{2} = \frac{14 \pm \sqrt{52}}{2} = 7 \pm \sqrt{13}
$$

So the exact eigenvalues are:

$$
\lambda_1 = 7 + \sqrt{13} \approx 10.606
$$
$$
\lambda_2 = 7 - \sqrt{13} \approx 3.394
$$
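As a quick sanity check (an addition to the derivation above, using the standard fact that the eigenvalues of a matrix sum to its trace and multiply to its determinant):

$$
\lambda_1 + \lambda_2 = (7 + \sqrt{13}) + (7 - \sqrt{13}) = 14 = 10 + 4 = \operatorname{tr}(A^T A),
$$
$$
\lambda_1 \lambda_2 = (7 + \sqrt{13})(7 - \sqrt{13}) = 49 - 13 = 36 = 10 \cdot 4 - (-2)(-2) = \det(A^T A).
$$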


In [None]:
# Compute eigenvalues for both matrices
# Note: They should be identical (except for numerical precision issues)
evals_A_At = np.linalg.eigvals(A_At)
evals_At_A = np.linalg.eigvals(At_A)

print("Eigenvalues of A A^T:", np.sort(evals_A_At)[::-1])
print("Eigenvalues of A^T A:", np.sort(evals_At_A)[::-1])

We found that the eigenvalues of both $A A^{T}$ and $A^{T} A$ are
$$
\lambda_1 = 7 + \sqrt{13} \approx 10.60555128, \qquad
\lambda_2 = 7 - \sqrt{13} \approx 3.39444872.
$$

The singular values are the non‑negative square roots of these eigenvalues:
$$
\sigma_1 = \sqrt{\lambda_1} = \sqrt{7 + \sqrt{13}}, \qquad
\sigma_2 = \sqrt{\lambda_2} = \sqrt{7 - \sqrt{13}}.
$$

Numerically,
$$
\sigma_1 \approx \sqrt{10.60555128} \approx 3.2566,
\qquad
\sigma_2 \approx \sqrt{3.39444872} \approx 1.8424.
$$

Putting these into the diagonal of $\Sigma$, we get
$$
\Sigma =
\begin{pmatrix}
\sigma_1 & 0 \\
0 & \sigma_2
\end{pmatrix}
=
\begin{pmatrix}
\sqrt{7 + \sqrt{13}} & 0 \\
0 & \sqrt{7 - \sqrt{13}}
\end{pmatrix}
\approx
\begin{pmatrix}
3.2566 & 0 \\
0 & 1.8424
\end{pmatrix}.
$$
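The closed-form singular values can be cross-checked against NumPy. The sketch below recomputes them from the exact expressions and compares with `np.linalg.svd`. It also uses the fact that, for a square matrix, the product of the singular values equals $|\det A|$, which is $6$ for this $A$.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [-1.0, 2.0]])

# Exact expressions derived above: sigma = sqrt(7 +/- sqrt(13)).
sigma_exact = np.sqrt([7 + np.sqrt(13), 7 - np.sqrt(13)])

# Singular values as computed by NumPy, in descending order.
sigma_numpy = np.linalg.svd(A, compute_uv=False)

print(sigma_exact)
print(np.allclose(sigma_exact, sigma_numpy))

# For a square matrix, the product of the singular values equals |det(A)|.
print(np.isclose(np.prod(sigma_numpy), abs(np.linalg.det(A))))
```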

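In [None]:
# Sort the eigenvalues in descending order so that sigma_1 >= sigma_2
# (np.linalg.eigvals does not guarantee any particular order).
sigma = np.sqrt(np.sort(evals_A_At)[::-1])
Σ = np.diag(sigma)
Σ

In [None]:
# eigh is meant for symmetric matrices and returns eigenvalues in ascending order,
# so reverse both outputs to match the ordering used in Σ.
eigvals_U, U = np.linalg.eigh(A_At)
eigvals_U, U = eigvals_U[::-1], U[:, ::-1]

print("Eigenvalues:\n", eigvals_U)
print("\nEigenvectors (columns):\n", U)


In [None]:
eigvals_V, V = np.linalg.eigh(At_A)
eigvals_V, V = eigvals_V[::-1], V[:, ::-1]

print("Eigenvalues:\n", eigvals_V)
print("\nEigenvectors (columns):\n", V)

In [None]:
# Eigenvectors are only determined up to sign, so U and V computed independently
# need not satisfy A v_i = sigma_i u_i. Flip the columns of U where needed.
signs = np.sign(np.sum((A @ V) * U, axis=0))
U = U * signs

A_ = U @ Σ @ V.transpose()
A_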
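Because eigenvectors are only determined up to sign, a $U$ and a $V$ obtained from two separate eigendecompositions do not automatically pair up. One common way around this is to compute only $V$ from $A^{T}A$ and then define each $u_i$ through $A v_i = \sigma_i u_i$. The sketch below does that; the helper name `svd_via_eig` is hypothetical, and it assumes every singular value is nonzero (full column rank), so it returns the reduced form with $U$ of shape $m \times n$.

```python
import numpy as np

def svd_via_eig(A):
    """Sketch: reduced SVD of a full-column-rank matrix via eig of A^T A."""
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns).
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]             # reorder to descending
    lam, V = lam[order], V[:, order]
    # Clip round-off negatives before the square root.
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    # Define u_i = A v_i / sigma_i so that U and V have consistent signs.
    # This division assumes all sigma_i are nonzero.
    U = (A @ V) / sigma
    return U, sigma, V

A = np.array([[3.0, 0.0],
              [-1.0, 2.0]])
U, sigma, V = svd_via_eig(A)

print(np.allclose(U @ np.diag(sigma) @ V.T, A))
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))
```

In practice `np.linalg.svd` is the more robust choice, since forming $A^{T}A$ squares the condition number; the eigenvalue route is shown here only because it mirrors the derivation.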