# Overview of Gilbert Strang's 2018 Matrix Methods course

- Linear algebra => Optimisation => Deep learning
- Linear Algebra => Statistics => Deep learning
- ["Learning from data" book](https://math.mit.edu/learningfromdata)

## Lecture 01: The column space of $A$ contains all vectors $A x$

- Think of the product $A x$ as a linear combination of the columns of $A$: $A x = A_{:,1} x_1 + A_{:,2} x_2 + \ldots + A_{:,n} x_n$
- $A = C R$ where the columns of $C$ form a basis for $C(A)$, and each column in $C$ is a column of $A$; then $R$ is the first $rank(A)$ rows of (a column-permutation of) $rref(A)$.
- Given $C$, a matrix formed from $r = rank(A)$ linearly independent columns of $A$, and $R$, a matrix formed from $r$ linearly independent rows of $A$, there is an invertible $r \times r$ matrix $U$ such that $A = C U R$
  - Question: Are there any more properties of $U$?
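
The two factorisations above can be checked numerically. The sketch below uses a small rank-2 matrix chosen for illustration (it is not from the lecture) and NumPy. For $A = C U R$ it takes $U$ to be the inverse of the $r \times r$ block where the chosen rows and columns intersect; this is one standard choice that works, and it gives a partial answer to the question above.

```python
import numpy as np

# Illustrative rank-2 matrix: column 3 = 2 * column 2 - column 1.
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
r = np.linalg.matrix_rank(A)

# A = C R: C holds the first r independent columns of A,
# and R is found by solving C R = A in the least-squares sense.
C = A[:, :r]
R = np.linalg.lstsq(C, A, rcond=None)[0]

# A = C U R_rows: R_rows holds r independent rows of A, and U is the
# inverse of the block W where those rows and columns intersect.
R_rows = A[:r, :]
W = A[:r, :r]
U = np.linalg.inv(W)

print(np.round(R, 10))                 # the non-zero rows of rref(A)
print(np.allclose(C @ R, A))
print(np.allclose(C @ U @ R_rows, A))
```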

## Lecture 02: Multiplying and factoring matrices

### 5 key factorisations

- $A = L U$ -- Elimination
- $A = Q R$ -- Gram-Schmidt decomposition
- $S = Q \Lambda Q^T$ -- Spectral theorem (for symmetric matrices $S$)
- $A = X \Lambda X^{-1}$ -- Eigenvalue decomposition; only works when $A$ has $n$ independent eigenvectors (the columns of $X$)
- $A = U \Sigma V^T$ -- Singular Value Decomposition; works for all matrices; orthogonal * diagonal * orthogonal

### LU decomposition in rank-1 picture

- $A = l_1 u_1^T + \begin{pmatrix}0 & 0 \\ 0 & l_2 u_2^T \end{pmatrix} + \begin{pmatrix}0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & l_3 u_3^T \end{pmatrix} + \ldots $
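
A minimal sketch of this picture, assuming no row exchanges are needed (the function name and the example matrix are made up for illustration). Each elimination step peels off one rank-1 piece $l_i u_i^T$, and what remains has one more zero row and column.

```python
import numpy as np

def lu_rank_one_pieces(A):
    """Elimination without row exchanges; returns the pieces l_i u_i^T."""
    A = np.array(A, dtype=float)       # work on a copy
    pieces = []
    for i in range(A.shape[0]):
        pivot = A[i, i]
        if np.isclose(pivot, 0.0):
            raise ValueError("zero pivot: this sketch does no row exchanges")
        l = A[:, i] / pivot            # column i of L (zeros above row i)
        u = A[i, :].copy()             # row i of U (zeros left of column i)
        piece = np.outer(l, u)
        pieces.append(piece)
        A -= piece                     # row i and column i are now zero
    return pieces

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
pieces = lu_rank_one_pieces(A)
print(np.allclose(sum(pieces), A))
```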

### Orthogonality of fundamental spaces of a matrix $A$

- $C(A^T)$ is orthogonal to $N(A)$ 
  - because $A x = 0$ says that each row of $A$ is orthogonal to every vector $x$ in $N(A)$
- $C(A)$ is orthogonal to $N(A^T)$
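
Both orthogonality statements can be checked numerically. The sketch below reads bases for the four subspaces off NumPy's SVD (a tool from a later lecture, used here only for convenience); the matrix and the `1e-10` rank tolerance are illustrative choices.

```python
import numpy as np

# Illustrative rank-2 matrix (row 2 = 2 * row 1).
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))     # numerical rank

row_space = Vt[:r].T           # basis for C(A^T)
null_space = Vt[r:].T          # basis for N(A)
col_space = U[:, :r]           # basis for C(A)
left_null = U[:, r:]           # basis for N(A^T)

print(np.allclose(A @ null_space, 0))              # null space vectors solve A x = 0
print(np.allclose(row_space.T @ null_space, 0))    # C(A^T) is orthogonal to N(A)
print(np.allclose(col_space.T @ left_null, 0))     # C(A) is orthogonal to N(A^T)
```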



## Lecture 03: Orthonormal columns in $Q$ give $Q^T Q = I$

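$Q$ is used to denote a matrix with orthonormal columns, that is, $q_{:,i}^T q_{:,j} = \delta_{i,j}$.

Thus, for $Q$ of size $m \times n$ (with $m \ge n$):
- $Q^T Q = I_n$, and
- $Q Q^T$ is the $m \times m$ projection onto $C(Q)$; in a basis whose first $n$ vectors are the columns of $Q$ it is $\begin{pmatrix}I_n & 0 \\ 0 & 0\end{pmatrix}$, so it equals $I$ only when $m = n$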

If $Q^T Q = Q Q^T = I$, then $Q$ is 'orthogonal'.
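
A quick numerical illustration of the difference between $Q^T Q$ and $Q Q^T$ for a tall $Q$. The $4 \times 2$ matrix here is just the $Q$ factor of a random matrix, an arbitrary choice for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Q factor of a random 4x2 matrix: two orthonormal columns in R^4.
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))

gram = Q.T @ Q     # 2x2
proj = Q @ Q.T     # 4x4

print(np.allclose(gram, np.eye(2)))      # Q^T Q is the identity
print(np.allclose(proj, np.eye(4)))      # Q Q^T is not the identity
print(np.allclose(proj @ proj, proj))    # Q Q^T is a projection
```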

### Orthogonal matrices preserve length under $l_2$

i.e. $|Q x| = |x|$

**proof**: $|Q x|^2 = (Q x)^T (Q x) = x^T (Q^T Q) x = x^T x = |x|^2$

### Examples of orthogonal matrices

#### rotation matrices
$\begin{pmatrix} \cos{\theta} & -\sin{\theta} \\ \sin{\theta} & \cos{\theta}\end{pmatrix}$

Rotates anti-clockwise by $\theta$ around the origin in 2-d
  
#### reflection matrices
$\begin{pmatrix} \cos{\theta} & \sin{\theta} \\ \sin{\theta} & -\cos{\theta}\end{pmatrix}$

Reflects the plane across the line through the origin at angle $\theta/2$

#### "Householder reflections"
Given a unit vector $u$, $H = I - 2 u u^T$ is symmetric and orthogonal; it reflects across the hyperplane perpendicular to $u$.
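
A small sketch of a Householder reflection. The helper name `householder` and the test vectors are illustrative; the helper normalises its input so that any non-zero vector can be passed.

```python
import numpy as np

def householder(u):
    """Return H = I - 2 u u^T after normalising u to unit length."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    return np.eye(u.size) - 2.0 * np.outer(u, u)

u = np.array([1.0, 2.0, 2.0])
H = householder(u)
x = np.array([3.0, -1.0, 4.0])

print(np.allclose(H.T @ H, np.eye(3)))                         # orthogonal
print(np.allclose(H, H.T))                                     # symmetric
print(np.isclose(np.linalg.norm(H @ x), np.linalg.norm(x)))    # length preserved
print(np.allclose(H @ u, -u))                                  # u is flipped
```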

#### "Hadamard" matrices

$H_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$

$H_{2^n} = \frac{1}{\sqrt{2}} \begin{pmatrix} H_{2^{n-1}} & H_{2^{n-1}} \\ H_{2^{n-1}} & -H_{2^{n-1}} \end{pmatrix}$

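**Conjecture**: For every $n$ that is a multiple of $4$, there is an $n \times n$ matrix with entries $1$ and $-1$ whose columns are orthogonal (orthonormal after scaling by $1/\sqrt{n}$). Constructions are known for every such $n$ below $668$.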
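
The recursion for $H_{2^n}$ can be sketched directly with `np.block`. The function name `hadamard` and its argument `k` (the exponent in $2^k$) are choices made for this example.

```python
import numpy as np

def hadamard(k):
    """Normalised Hadamard matrix of size 2^k x 2^k, built by the recursion."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]]) / np.sqrt(2.0)
    return H

H8 = hadamard(3)
print(np.allclose(H8.T @ H8, np.eye(8)))            # orthonormal columns
print(np.allclose(np.abs(H8) * np.sqrt(8), 1.0))    # entries are +-1 before scaling
```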

#### Wavelets

$W_4 = \begin{pmatrix}
1 & 1 & 1 & 0 \\
1 & 1 &-1 & 0 \\
1 &-1 & 0 & 1 \\
1 &-1 & 0 &-1
\end{pmatrix}$

(with some scaling on the columns to make them orthonormal)

Haar invented these in 1910; in 1988 Ingrid Daubechies found families of wavelets with entries that were not just 1 and -1.

#### Eigenvectors of a symmetric matrix

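Example: the discrete Fourier transform matrix is the matrix of eigenvectors of $Q = P_{2,3,\ldots,n-1,n,1}$ (i.e. $Q$ is the cyclic permutation matrix that puts row 2 in row 1, row 3 in row 2, etc., and row 1 in row $n$). $Q$ is orthogonal rather than symmetric ($Q^T Q = I$), but its eigenvectors are still orthogonal.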

*I didn't understand this bit*
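
A numerical check may help here. For a permutation matrix $Q^T Q = I$, so the statement is about the eigenvectors of the cyclic shift $Q$ itself. The sketch below builds $Q$ and the Fourier matrix $F$ with entries $\omega^{jk}$, $\omega = e^{2 \pi i / n}$ (the names `F` and `omega` are notation for this example), and checks that column $k$ of $F$ is an eigenvector of $Q$ with eigenvalue $\omega^k$.

```python
import numpy as np

n = 4
# Cyclic shift: row 2 moves to row 1, row 3 to row 2, ..., row 1 to row n.
Q = np.roll(np.eye(n), -1, axis=0)

omega = np.exp(2j * np.pi / n)
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = omega ** (j * k)                  # Fourier matrix, F[j, k] = omega^(j k)
eigenvalues = omega ** np.arange(n)   # 1, omega, omega^2, ...

print(np.allclose(Q @ F, F @ np.diag(eigenvalues)))    # Q F = F Lambda
print(np.allclose(F.conj().T @ F, n * np.eye(n)))      # columns are orthogonal
```

Shifting the entries of column $k$ up by one multiplies every entry by $\omega^k$, which is why each column is an eigenvector.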

## Lecture 04: Eigenvalues and Eigenvectors

Useful because they allow you to work with powers of matrices.

The eigenvectors of a general matrix $A$ are not necessarily orthogonal to each other. (*Find an example*)
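
One small example, chosen here for illustration: the triangular matrix below has eigenvalues $1$ and $2$ with eigenvectors $(1, 0)$ and $(1, 1)$, which meet at 45 degrees rather than 90.

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 2.]])
x1 = np.array([1., 0.])    # eigenvector for eigenvalue 1
x2 = np.array([1., 1.])    # eigenvector for eigenvalue 2

print(np.allclose(A @ x1, 1 * x1))
print(np.allclose(A @ x2, 2 * x2))

# Cosine of the angle between the eigenvectors; it would be 0 if they were orthogonal.
cosine = (x1 @ x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))
print(cosine)
```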

### Similar matrices
**Definition**: $B$ is *similar* to $A$ if there exists an invertible matrix $M$ such that $B = M^{-1} A M$.

If $B$ is similar to $A$, then they have the same eigenvalues. (Easy to prove)

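Corollary: if $B$ is invertible, $A B$ and $B A$ are similar and so have the same eigenvalues: $A B = B^{-1} (B A) B$, so just use $M = B$. (More generally, $A B$ and $B A$ always share their non-zero eigenvalues.)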

Computing eigenvalues of $A$ --- usually involves picking better and better values of $M$ to find a triangular matrix similar to $A$. (*How does this work?*)
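
One way to make this concrete is the unshifted QR iteration, sketched below. It is an assumption that this is the method the lecture has in mind, and practical libraries use refined variants with shifts. Each step factors $T = Q R$ and forms $R Q = Q^T T Q$, which is similar to $T$; the accumulated product of the $Q$s plays the role of $M$. The function name and the example matrix (eigenvalues $5$ and $2$) are illustrative.

```python
import numpy as np

def qr_iteration(A, steps=100):
    """Unshifted QR iteration: returns T and M with T = M^T A M."""
    T = np.array(A, dtype=float)
    M = np.eye(T.shape[0])
    for _ in range(steps):
        Q, R = np.linalg.qr(T)
        T = R @ Q          # equals Q^T T Q, so it is similar to the previous T
        M = M @ Q          # accumulate the orthogonal change of basis
    return T, M

A = np.array([[4., 1.],
              [2., 3.]])
T, M = qr_iteration(A)
print(np.round(T, 6))      # close to upper triangular, eigenvalues on the diagonal
```

For this matrix the entry below the diagonal shrinks by a factor of roughly $2/5$ (the ratio of the eigenvalues) at each step.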

### (Real) Symmetric matrices
- Have real eigenvalues (*prove this*)
- Have orthogonal eigenvectors (*prove this*)
- And thus $Q$, the matrix with the normalised eigenvectors as columns, is orthogonal, and
- $S = Q \Lambda Q^T$
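
Both items marked *prove this* follow from short standard arguments, sketched here. The notation $\bar{x}$ for the complex conjugate of $x$ is introduced for this sketch. For real eigenvalues, take $S x = \lambda x$ with $x \ne 0$ possibly complex; since $S$ is real and symmetric, conjugating and transposing gives $\bar{x}^T S = \bar{\lambda} \bar{x}^T$, so

$$
\lambda \, \bar{x}^T x = \bar{x}^T S x = \bar{\lambda} \, \bar{x}^T x
\quad \Rightarrow \quad (\lambda - \bar{\lambda}) \, \bar{x}^T x = 0
$$

and because $\bar{x}^T x = \sum_i |x_i|^2 > 0$, we get $\lambda = \bar{\lambda}$, i.e. $\lambda$ is real. For orthogonality, take $S x = \lambda x$ and $S y = \mu y$ with $\lambda \ne \mu$:

$$
\lambda \, x^T y = (S x)^T y = x^T S y = \mu \, x^T y
\quad \Rightarrow \quad (\lambda - \mu) \, x^T y = 0
\quad \Rightarrow \quad x^T y = 0
$$

For a repeated eigenvalue, an orthonormal basis can be chosen within its eigenspace, so a full orthonormal set of eigenvectors still exists.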

## Lecture 05: Positive definite and semidefinite matrices

### Symmetric positive definite matrices
Equivalent definitions:
 1. All eigenvalues are real and positive ($\lambda_i > 0$)
 2. Energy $x^T S x > 0$ for any $x \ne 0$
 3. $S = A^T A$ (independent columns in $A$)
 4. All leading determinants are > 0
 5. All pivots in elimination are > 0
 
**Example:**
$S = \begin{pmatrix}3 & 4 \\ 4 & 5\end{pmatrix}$ is *not* positive definite (its determinant is $|S| = 15 - 16 = -1$).

$x^T S x$ is a quadratic form: if $S$ is positive definite, then $f(x) = x^T S x$ is strictly convex with a unique minimum at $x = 0$.
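
Several of the equivalent tests can be run side by side. The sketch below checks the eigenvalue test, the leading-determinant test, and whether a Cholesky factorisation exists (which corresponds to $S = A^T A$ with independent columns). The helper name and the second matrix `S_good` are illustrative and not from the lecture.

```python
import numpy as np

def positive_definite_tests(S):
    """Return (eigenvalue test, leading-determinant test, Cholesky test)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    eigs_ok = bool(np.all(np.linalg.eigvalsh(S) > 0))
    minors_ok = all(np.linalg.det(S[:k, :k]) > 0 for k in range(1, n + 1))
    try:
        np.linalg.cholesky(S)      # raises LinAlgError if S is not positive definite
        chol_ok = True
    except np.linalg.LinAlgError:
        chol_ok = False
    return eigs_ok, minors_ok, chol_ok

S_bad = np.array([[3., 4.], [4., 5.]])     # the example above
S_good = np.array([[3., 4.], [4., 6.]])    # determinant 18 - 16 = 2

print(positive_definite_tests(S_bad))
print(positive_definite_tests(S_good))

# A direction with negative energy for S_bad: x^T S x = -3 for x = (4, -3).
x = np.array([4., -3.])
print(x @ S_bad @ x)
```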

### Symmetric positive semi-definite matrices
Equivalent definitions:
 1. All eigenvalues are real and positive or zero ($\lambda_i \ge 0$)
 2. Energy $x^T S x \ge 0$ for any $x \ne 0$
 3. $S = A^T A$ (dependent columns allowed in $A$)
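 4. All principal minors are $\ge 0$ (the leading determinants alone are not enough: $\begin{pmatrix}0 & 0 \\ 0 & -1\end{pmatrix}$ has leading determinants $0, 0$ but is not semi-definite)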
 5. All pivots in elimination are $\ge 0$

## Lecture 06: Singular Value Decomposition

Like eigenvalues, but works for rectangular and singular matrices

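For a symmetric matrix, the eigenvalues are real and there is a complete set of orthogonal eigenvectors.

For a general square matrix this can fail: the eigenvalues may be complex, and there may be too few independent eigenvectors.

For a rectangular matrix, eigenvalues are not defined at all ($A x$ and $x$ have different sizes).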

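Under the singular value decomposition, $A = U \Sigma V^T$, where $\Sigma = \textrm{diag}(\sigma_1, \ldots, \sigma_n)$ holds the 'singular values' $\sigma_i$, which are all non-negative ($\sigma_1 \ge \ldots \ge \sigma_r > 0$ with $r = rank(A)$, and the rest are zero).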

### The details

1. The key observation is that $A^T A$ is symmetric, square, and positive semi-definite.
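2. Thus $A^T A$ can be decomposed as $A^T A = V \Lambda V^T$, with $V$ orthogonal and $\Lambda$ diagonal with non-negative entries.
3. Likewise $A A^T$ is symmetric and positive semi-definite, so $A A^T = U \Lambda' U^T$ with $U$ orthogonal; $\Lambda'$ has the same non-zero eigenvalues as $\Lambda$ (the two differ in size when $A$ is rectangular).
4. Now look for $A v_i = \sigma_i u_i$, where the $v_i$ and the $u_i$ are each sets of orthonormal vectors.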

"We're looking for one set of orthogonal vectors in the 'input space' of $A$, and a set of orthogonal vectors in the 'output space' of $A$ that transform to each other via $A$".

We then have $A V = U \Sigma$, which then gives us $A = U \Sigma V^T$.

Now... what are the $V$s and what are the $U$s?

We can see that if $A = U \Sigma V^T$ then $A^T A = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T = V \Lambda V^T$ --- that is, the $V$s are the eigenvectors of $A^T A$, and the $\Lambda = \Sigma^2$ are the eigenvalues of $A^T A$. Similarly for $A A^T$.

Haven't quite finished: need to deal with the case of repeated eigenvalues --- and hence have 'eigenspaces'. Need to pick the appropriate eigenvectors from these spaces to satisfy $A v_i = \sigma_i u_i$.

We do this by fixing the $v_i$ to be a particular orthonormal set of eigenvectors of $A^T A$, and then setting $u_i = A v_i / \sigma_i$ for each $\sigma_i > 0$. Whatever choice we make for the $v_i$, this gives matching $u_i$ in the degenerate cases.

Finally, we just need to show that the $u_i$ picked in this way are orthonormal: $u_i^T u_j = \frac{v_i^T A^T A v_j}{\sigma_i \sigma_j} = v_i^T v_j \frac{\sigma_j^2}{\sigma_i \sigma_j} = \delta_{i,j}$
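
The construction above can be followed step by step in NumPy. This is a sketch for illustration, not how libraries compute the SVD in practice: it takes the $v_i$ from `np.linalg.eigh(A.T @ A)`, keeps only the $\sigma_i > 0$ (so it returns the reduced factors), and sets $u_i = A v_i / \sigma_i$. The function name, the tolerance and the example matrix are assumptions made here.

```python
import numpy as np

def svd_via_eigh(A, tol=1e-12):
    """Reduced SVD built from the eigenvectors of A^T A."""
    lam, V = np.linalg.eigh(A.T @ A)      # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]         # reorder so the largest comes first
    lam, V = lam[order], V[:, order]
    keep = lam > tol * lam.max()          # drop (numerically) zero eigenvalues
    sigma = np.sqrt(lam[keep])
    V_r = V[:, keep]
    U_r = (A @ V_r) / sigma               # column i is u_i = A v_i / sigma_i
    return U_r, sigma, V_r

A = np.array([[3., 0.],
              [4., 5.]])
U_r, sigma, V_r = svd_via_eigh(A)

print(sigma)                                          # sqrt(45) and sqrt(5)
print(np.allclose(U_r.T @ U_r, np.eye(len(sigma))))   # the u_i are orthonormal
print(np.allclose(U_r @ np.diag(sigma) @ V_r.T, A))   # A = U Sigma V^T
```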

### Geometry of the SVD

"Every matrix factors into a rotation, then 'stretch', then rotation"

![test](./images/svd_geometry.png)