# SINGULAR VALUE DECOMPOSITION (SVD)

# Content
- four fundamental spaces of linear algebra
- under-determined and over-determined least squares regressions
- principal components analysis (PCA)

The singular value decomposition (SVD) is a workhorse in applications of least squares projection, which underlie many statistical and machine learning methods. After defining the SVD, we’ll describe how it connects to the four fundamental spaces of linear algebra, under-determined and over-determined least squares regressions, and principal components analysis (PCA).

## The Setting

Let $X$ be an $m\times n$ matrix of rank $p$.

Necessarily, $p\leq \min(m,n)$. In much of this notebook, we’ll think of $X$ as a matrix of data in which
- each column is an **individual** – a time period or person, depending on the application
- each row is a **random variable** describing an attribute of a time period or a person, depending on the application

We’ll be interested in two situations
- A short and fat case in which $m \ll n$, so that there are many more columns (individuals) than rows (attributes).
- A tall and skinny case in which $m \gg n$, so that there are many more rows (attributes) than columns (individuals).
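To make the two cases concrete, here is a minimal sketch that builds one matrix of each shape and checks that its rank $p$ does not exceed $\min(m,n)$. The dimensions, the random data, and the names `X_fat` and `X_tall` are illustrative assumptions, not part of the lecture's data.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Short and fat: m = 3 attributes (rows), n = 50 individuals (columns)
X_fat = rng.standard_normal((3, 50))

# Tall and skinny: m = 50 attributes (rows), n = 3 individuals (columns)
X_tall = rng.standard_normal((50, 3))

for name, X in [("short and fat", X_fat), ("tall and skinny", X_tall)]:
    m, n = X.shape
    p = np.linalg.matrix_rank(X)  # numerical rank of X
    print(f"{name}: m={m}, n={n}, rank p={p}, min(m, n)={min(m, n)}")
```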

We’ll apply a **singular value decomposition** of $X$ in both situations. 

In the $m \ll n$ case in which there are many more individuals $n$ than attributes $m$, we can calculate sample moments of a joint distribution by taking averages across observations of functions of the observations.
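As a rough illustration of "taking averages across observations", the sketch below computes a sample mean and a sample covariance matrix by averaging across the columns of $X$. The simulated data and the choice to divide by $n$ rather than $n-1$ are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 10_000                 # few attributes, many individuals
X = rng.standard_normal((m, n))  # each column is one observation

# First moment: average each attribute (row) across individuals (columns)
mean = X.mean(axis=1, keepdims=True)   # shape (m, 1)

# Second moment: average of outer products of demeaned observations
X_c = X - mean
cov = X_c @ X_c.T / n                  # shape (m, m), divides by n

print(mean.ravel())
print(cov)
```

With `bias=True`, `np.cov(X)` uses the same divide-by-$n$ convention and can serve as a cross-check.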

In this $m \ll n$ case, we’ll look for **patterns** by using a **singular value decomposition** to do a **principal components analysis** (PCA).

In the $m \gg n$ case, in which there are many more attributes $m$ than individuals $n$, and in a time-series setting in which $n$ equals the number of time periods covered in the data set $X$, we’ll proceed in a different way. We’ll again use a singular value decomposition, but now to construct a **dynamic mode decomposition** (DMD).

## Singular Value Decomposition

A **singular value decomposition** of an $m\times n$ matrix $X$ of rank $p\leq \min(m,n)$ is
$$
X = U\Sigma V^T \tag{5.1}
$$
where
$$
UU^T = I, U^TU = I
$$
$$
VV^T = I, V^TV = I
$$
and
- $U$ is an $m\times m$ orthogonal matrix of **left singular vectors** of $X$
- Columns of $U$ are eigenvectors of $XX^T$
- $V$ is an $n\times n$ orthogonal matrix of **right singular vectors** of $X$
- Columns of $V$ are eigenvectors of $X^TX$
- $\Sigma$ is an $m\times n$ matrix in which the first $p$ places on its main diagonal are positive numbers $\sigma_1, \sigma_2, \ldots, \sigma_p$ called **singular values**; remaining entries of $\Sigma$ are all zero
- The $p$ singular values are the positive square roots of the nonzero eigenvalues of the $m\times m$ matrix $XX^T$, which coincide with the nonzero eigenvalues of the $n\times n$ matrix $X^TX$
- We adopt a convention that when $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that $U_{ij}^T$ is the complex conjugate of $U_{ji}$.
- Similarly, when $V$ is a complex valued matrix, $V^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $V$
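The properties listed above can be checked numerically. The following is a minimal sketch using `np.linalg.svd` on a small random real matrix; the matrix, its dimensions, and the variable names are assumptions for illustration. Note that NumPy returns the singular values as a one-dimensional array `s` and returns $V^T$ rather than $V$, so the $m\times n$ matrix $\Sigma$ is assembled by hand.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 6
X = rng.standard_normal((m, n))

# Full SVD: U is (m, m), s holds min(m, n) singular values, Vt is V^T with shape (n, n)
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Assemble the m x n matrix Sigma with the singular values on its main diagonal
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# X = U Sigma V^T
recon_ok = np.allclose(U @ Sigma @ Vt, X)

# U and V are orthogonal
U_orth = np.allclose(U.T @ U, np.eye(m)) and np.allclose(U @ U.T, np.eye(m))
V_orth = np.allclose(Vt @ Vt.T, np.eye(n)) and np.allclose(Vt.T @ Vt, np.eye(n))

# Singular values are square roots of the largest eigenvalues of X X^T and X^T X
eig_XXT = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
eig_XTX = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1][:len(s)]
eig_ok = np.allclose(np.sqrt(eig_XXT), s) and np.allclose(np.sqrt(eig_XTX), s)

print(recon_ok, U_orth, V_orth, eig_ok)
```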

The matrices $U, \Sigma, V$ entail linear transformations that reshape vectors in the following ways:

- multiplying vectors by the orthogonal (in the complex case, unitary) matrices $U$ and $V$ **rotates** them (possibly with a reflection), but leaves **angles between vectors** and **lengths of vectors** unchanged.

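- multiplying vectors by the diagonal matrix $\Sigma$ **rescales** each coordinate of a vector by the corresponding singular value; **angles between vectors** are unchanged only when all coordinates are rescaled by the same factor.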
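The difference between the two kinds of matrices can be checked on a small example. The sketch below is only illustrative: the rotation angle `theta`, the diagonal entries, the vectors `a` and `b`, and the helper `angle` are all assumptions chosen for this example. It shows that an orthogonal matrix preserves lengths and angles, while a diagonal matrix with unequal entries changes lengths and, in this example, the angle as well.

```python
import numpy as np

def angle(u, v):
    """Angle in radians between vectors u and v (hypothetical helper)."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

theta = 0.7  # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: Q.T @ Q = I
D = np.diag([2.0, 0.5])                          # diagonal, unequal entries

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

print("original: ", np.linalg.norm(a), np.linalg.norm(b), angle(a, b))
print("rotated:  ", np.linalg.norm(Q @ a), np.linalg.norm(Q @ b), angle(Q @ a, Q @ b))
print("rescaled: ", np.linalg.norm(D @ a), np.linalg.norm(D @ b), angle(D @ a, D @ b))
```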

Thus, representation (5.1) asserts that multiplying an $n\times 1$ vector $y$ by the $m\times n$ matrix $X$ amounts to performing the following three multiplications of $y$ sequentially:

- **rotating** $y$ by computing $V^Ty$
- **rescaling** $V^Ty$ by multiplying it by $\Sigma$
- **rotating** $\Sigma V^Ty$ by multiplying it by $U$
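The three steps can be traced in code. This is a minimal sketch on a random matrix and vector (both assumptions for illustration), confirming that applying $V^T$, then $\Sigma$, then $U$ to $y$ reproduces $Xy$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 5
X = rng.standard_normal((m, n))
y = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(X, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

step1 = Vt @ y         # rotate y: an n-vector
step2 = Sigma @ step1  # rescale: maps the n-vector to an m-vector
step3 = U @ step2      # rotate again: an m-vector

print(np.allclose(step3, X @ y))
```

When $m \neq n$, the middle step also changes the dimension of the vector, because $\Sigma$ is $m\times n$.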

This structure of the $m\times n$ matrix $X$ opens the door to constructing systems of data **encoders** and **decoders**.

Thus,

- $V^T$ is an encoder that maps $y$ to the encoded data $V^Ty$
- $\Sigma$ is an operator to be applied to the encoded data
- $U$ is a decoder to be applied to the output from applying operator $\Sigma$ to the encoded data

We’ll apply this circle of ideas later in this notebook when we study Dynamic Mode Decomposition.

**Road Ahead**

What we have described above is called a **full** SVD.

In a full SVD, the shapes of $U, \Sigma$, and $V$ are $(m,m)$, $(m,n)$, $(n,n)$, respectively. Later we’ll also describe an **economy** or **reduced** SVD.
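For reference, the shapes can be inspected directly in NumPy. The sketch below assumes a small tall and skinny random matrix and uses the `full_matrices` argument of `np.linalg.svd` to switch between the full and the reduced factorization; NumPy returns the singular values as a one-dimensional array rather than as the $m\times n$ matrix $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 2  # a tall and skinny example
X = rng.standard_normal((m, n))

# Full SVD: U is (m, m) and V^T is (n, n)
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=True)

# Reduced SVD: U is (m, k) and V^T is (k, n), with k = min(m, n)
U_red, s_red, Vt_red = np.linalg.svd(X, full_matrices=False)

print("full:   ", U_full.shape, s_full.shape, Vt_full.shape)
print("reduced:", U_red.shape, s_red.shape, Vt_red.shape)

# The reduced factors still reproduce X
print(np.allclose(U_red @ np.diag(s_red) @ Vt_red, X))
```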

Before we study a **reduced** SVD we’ll say a little more about properties of a **full** SVD.