# Matrix Spaces 

This text and code are derived from Mike X Cohen's course on linear algebra. For more information, see https://www.udemy.com/linear-algebra-theory-and-implementation/?couponCode=LINALGPX7

## Column Space of a Matrix

Notation: column space is denoted as $C(\mathbf{A})$ 

$C(\mathbf{A})$ is the vector space that is spanned by all of the columns in a matrix. However, these columns **don't have to be a basis**.

Formally, the column space is defined as: 

$$C(\mathbf{A}) = \{\beta_1 \mathbf{a_1} + \dots + \beta_n \mathbf{a_n} \}, \quad \beta_i \in \mathbb{R}$$

$$C(\mathbf{A}) = span(\mathbf{a_1}, \dots, \mathbf{a_n})$$

Interpretation: all possible linear combinations of all the columns in the matrix. 

## Is v in C(A)?

One of the most important questions in linear algebra is whether a vector $\mathbf{v}$ is in the column space of a matrix $\mathbf{A}$. The answer leads to one of two follow-up questions: 

### If $\mathbf{v}$ is in $C(\mathbf{A})$, then what are the coefficients?   

This question can be answered with an equation: 

$$\mathbf{A w} = \mathbf{v}$$

It tells us that $\mathbf{v}$ can be constructed from the columns of $\mathbf{A}$ using the weights $\mathbf{w}$.
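One way to test membership numerically is to compare ranks: if appending $\mathbf{v}$ to $\mathbf{A}$ as an extra column does not raise the rank, then $\mathbf{v}$ is already in $C(\mathbf{A})$. The sketch below assumes NumPy; the helper name `in_column_space` and the example vectors are made up for illustration.

```python
import numpy as np


def in_column_space(A, v):
    """Return True if v is in C(A): appending v to A must not raise the rank."""
    augmented = np.column_stack([A, v])
    return bool(np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A))


# Illustrative rank-1 matrix: both columns point along [1, 2].
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
v_in = np.array([3.0, 6.0])   # 3 times the first column, so it is in C(A)
v_out = np.array([1.0, 0.0])  # not a multiple of [1, 2]

print(in_column_space(A, v_in))
print(in_column_space(A, v_out))

# When v is in C(A), lstsq returns one valid set of weights w with A w = v.
w, *_ = np.linalg.lstsq(A, v_in, rcond=None)
print(w, np.allclose(A @ w, v_in))
```

Because this particular $\mathbf{A}$ is rank-deficient, the weights are not unique; `lstsq` simply picks one valid choice.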

### If $\mathbf{v}$ is not in $C(\mathbf{A})$, then how to get as close as possible to that column space? 

In other words, what coefficients will allow us to minimize the distance? 

For starters, we can write it as follows: 

$$\mathbf{A w} - \mathbf{v} = \mathbf{z}$$,

where $\mathbf{z}$ is some non-zero vector. We can take the magnitude of both sides: 

$$||\mathbf{A w} - \mathbf{v}|| = ||\mathbf{z}||$$,

and then set the objective to minimize that. 
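As a minimal sketch of that objective, NumPy's `np.linalg.lstsq` finds the weights that minimize $||\mathbf{A w} - \mathbf{v}||$. The matrix and vector here are illustrative values, not taken from the course.

```python
import numpy as np

# Columns of A span the plane {x3 = x1 + x2} in R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([1.0, 2.0, 0.0])  # 1 + 2 != 0, so v is not in C(A)

# Least-squares weights: minimize ||A w - v||.
w, *_ = np.linalg.lstsq(A, v, rcond=None)
z = A @ w - v
best_distance = np.linalg.norm(z)

# Any other choice of weights should give a larger distance.
w_other = w + np.array([0.1, -0.1])
other_distance = np.linalg.norm(A @ w_other - v)

print("w:", w)
print("||z||:", best_distance)
print("perturbed distance:", other_distance)
```

The minimal distance is not zero here, which is exactly what it means for $\mathbf{v}$ to lie outside the column space.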

## Row Space of a Matrix


$R(\mathbf{A})$ is the subspace spanned by the rows of A. 
Notation: $R(\mathbf{A})$ OR $C(\mathbf{A^T})$ 

Column space: $\mathbf{A w} = \mathbf{v}$

Row space: $\mathbf{w A} = \mathbf{v}$ (of course, $\mathbf{w}$ and $\mathbf{v}$ are both row vectors in this case). 

**Example**. 

Let's say we have data from N sensors measured at certain time intervals, so there are M observations (each containing data from all sensors - so, N values). That gives us an M by N matrix. 
If we take the column space - $\mathbf{A w}$, we will have a combination of sensors (and if $\mathbf{w}$ is a filter, then we will have spatial filtering). 
If we take the row space - $\mathbf{w A}$ and $\mathbf{w}$ is a filter, then we will have temporal filtering. 
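The two products can be sketched with synthetic data. The sizes, the random data, and the simple averaging weights below are all assumptions chosen only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 3                       # 6 time points, 3 sensors (made-up sizes)
A = rng.standard_normal((M, N))   # synthetic data: rows are observations

# Column-space view: one weight per sensor, combined at every time point.
w_spatial = np.full(N, 1.0 / N)   # hypothetical filter: plain average over sensors
spatial = A @ w_spatial           # shape (M,): one value per time point

# Row-space view: one weight per time point, combined for every sensor.
w_temporal = np.full(M, 1.0 / M)  # hypothetical filter: plain average over time
temporal = w_temporal @ A         # shape (N,): one value per sensor

print(spatial.shape, temporal.shape)
```

With these averaging weights, `spatial` equals the mean over sensors and `temporal` equals the mean over time; a real filter would use non-uniform weights.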

## Null Space and Left-Null Space of a Matrix

Definition: 
$N(\mathbf{A})$ is the set of all vectors $\{ \mathbf{v} \}$ such that $\mathbf{A v = 0}$ and $\mathbf{v \neq 0}$ (the trivial solution $\mathbf{v = 0}$ always exists, so this convention leaves it out)   

**Examples** 


${\begin{bmatrix}
1&1\\
2&2\\
\end{bmatrix}}$, $r(\mathbf{A}) = 1$, $N(\mathbf{A}) = \{ \lambda {\begin{bmatrix}
1\\
-1\\
\end{bmatrix}} \}$

${\begin{bmatrix}
1&1\\
2&1\\
\end{bmatrix}}$, $r(\mathbf{A}) = 2$, $N(\mathbf{A}) = \{ \}$ (no non-zero vector satisfies $\mathbf{A v = 0}$, so the null space is the empty set)

${\begin{bmatrix}
0&0\\
0&0\\
\end{bmatrix}}$, $r(\mathbf{A}) = 0$, $N(\mathbf{A}) = \{ \lambda_1 {\begin{bmatrix}
0\\
1\\
\end{bmatrix}} + \lambda_2 {\begin{bmatrix}
1\\
0\\
\end{bmatrix}} \}$ (every vector in $\mathbb{R}^2$)
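The three examples above can be checked numerically. This sketch assumes NumPy and uses the SVD: the right singular vectors whose singular values are (numerically) zero form a basis of the null space. The helper name `null_space_basis` and the tolerance are illustrative choices.

```python
import numpy as np


def null_space_basis(A, tol=1e-10):
    """Return an orthonormal basis of N(A) as columns, computed from the SVD."""
    _, s, vt = np.linalg.svd(A)   # vt holds all right singular vectors as rows
    rank = int(np.sum(s > tol))   # number of non-zero singular values
    return vt[rank:].T            # the remaining rows span the null space


A1 = np.array([[1.0, 1.0], [2.0, 2.0]])  # rank 1
A2 = np.array([[1.0, 1.0], [2.0, 1.0]])  # rank 2
A3 = np.zeros((2, 2))                    # rank 0

for A in (A1, A2, A3):
    basis = null_space_basis(A)
    # The number of basis vectors is the dimension of the null space.
    print(basis.shape[1], np.allclose(A @ basis, 0))
```

For the first matrix the single basis vector is a scaled version of $[1, -1]^T$ (possibly with flipped sign), since the SVD returns unit-length vectors.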



Left-null space definition: $N(\mathbf{A^T})$ is the set of all vectors $\{\mathbf{v}\}$ such that $\mathbf{v^T A = 0^T}$ and $\mathbf{v^T \neq 0^T}$

**Interpretation**

Let $A = {\begin{bmatrix}
a&b\\
c&d\\
\end{bmatrix}}$

If $\mathbf{v} \notin N(\mathbf{A})$, then left-multiplying this vector by some compatible transformation matrix will produce some other vector in that space (provided it's not in the null space of any of these transformation matrices).

If $\mathbf{v} \in N(\mathbf{A})$, then $\mathbf{A}$ maps it to $\mathbf{0}$, and no matter what transformation matrices we apply afterwards, the result stays $\mathbf{0}$. So, it's like a black hole! (Special thanks to Mike X Cohen for that awesome analogy). :) 

## Orthogonality of Column/Left-null Space and Row/Null Spaces

Quick recap: 

Two vectors are orthogonal when their dot product is equal to 0 (this means the angle between them is 90°, so the cosine is 0): 

$$ \alpha = \mathbf{a^T b} = ||\mathbf{a}|| ||\mathbf{b}|| \cos(\theta_{\mathbf{ab}}) = 0 \implies \mathbf{a \perp b}$$

When we say that a vector is orthogonal to a column space (i.e. $\mathbf{v} \perp C(\mathbf{A})$), we mean that: 

$$\mathbf{v^T} (\alpha_1 \mathbf{a_1} + \alpha_2 \mathbf{a_2} + \dots + \alpha_n \mathbf{a_n}) = 0$$

In other words, vector $\mathbf{v}$ is orthogonal to any possible linear combination of the columns of A.

Now, recall the definition of the left-null space:

$$\mathbf{v^T A = 0^T}$$

We can transpose both sides and rewrite it as: 

$$\mathbf{A^T v = 0}$$

Vector $\mathbf{v}$ is orthogonal to each and every column in $\mathbf{A}$ (or row in $\mathbf{A^T}$). So, we can say that the left-null space is orthogonal to the column space of $\mathbf{A}$.

To summarize:

$$N(\mathbf{A^T}) \perp C(\mathbf{A})$$    

$$N(\mathbf{A}) \perp R(\mathbf{A})$$    
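Both orthogonality statements can be spot-checked with NumPy. The matrix below is an illustrative rank-1 example, and the random coefficients are just one way to sample vectors from the column and row spaces.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
left_null = np.array([-2.0, 1.0])  # satisfies left_null @ A = 0
null = np.array([1.0, -1.0])       # satisfies A @ null = 0

rng = np.random.default_rng(1)
coeffs = rng.standard_normal((2, 5))

col_combos = A @ coeffs    # each column is a random vector in C(A)
row_combos = coeffs.T @ A  # each row is a random vector in R(A)

# Every sampled vector should have a zero dot product with the matching null vector.
print(np.allclose(left_null @ col_combos, 0))
print(np.allclose(row_combos @ null, 0))
```

A finite sample does not prove orthogonality, but it follows from linearity once the dot product with every column (or row) of $\mathbf{A}$ is zero.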

## Dimensions of Column/Row/Null Spaces 

**Example**

$\mathbf{A} = {\begin{bmatrix}
1&1\\
2&2\\
\end{bmatrix}}$, $\mathbf{A^T} = {\begin{bmatrix}
1&2\\
1&2\\
\end{bmatrix}}$ 

The ambient dimensionality is 2 (we take N M-dimensional vectors)
$N(\mathbf{A^T}) = \{ \lambda {\begin{bmatrix}
-2\\
1\\
\end{bmatrix}} \}$, $N(\mathbf{A^T}) \perp C(\mathbf{A})$

The column space and the left-null space together must span the whole ambient space (full dimensionality) $\implies$

$$\dim(C(\mathbf{A})) + \dim(N(\mathbf{A^T})) = M$$

$$\dim(C(\mathbf{A^T})) + \dim(N(\mathbf{A})) = N$$

for $\mathbf{A} \in \mathbb{R}^{M \times N}$. Note that columns are in $\mathbb{R}^M$ and rows are in $\mathbb{R}^N$
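The two dimension formulas can be checked on a matrix of known rank. In this sketch the matrix is built as a product of random factors so that its rank is 2 (almost surely); the helper `count_null_directions` is a hypothetical name, and it counts the right singular vectors that the matrix sends to zero.

```python
import numpy as np


def count_null_directions(A, tol=1e-10):
    """Count orthonormal directions of the input space that A maps to zero."""
    _, _, vt = np.linalg.svd(A)  # rows of vt: orthonormal basis of the input space
    mapped_norms = np.linalg.norm(A @ vt.T, axis=0)
    return int(np.sum(mapped_norms < tol))


rng = np.random.default_rng(42)
M, N, r = 4, 5, 2
# Product of an (M x r) and an (r x N) random matrix: rank r by construction.
A = rng.standard_normal((M, r)) @ rng.standard_normal((r, N))

dim_col = int(np.linalg.matrix_rank(A))      # dim C(A)
dim_row = int(np.linalg.matrix_rank(A.T))    # dim C(A^T) = dim R(A)
dim_null = count_null_directions(A)          # dim N(A)
dim_left_null = count_null_directions(A.T)   # dim N(A^T)

print(dim_col + dim_left_null == M)
print(dim_row + dim_null == N)
```

The null-space dimensions are counted independently of the rank, so the two sums are a genuine check rather than a restatement of the formula.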


## Example of the Four Subspaces

Let $\mathbf{A} \in \mathbb{R}^{M \times N}$

There are two ambient spaces for $\mathbf{A}$: 

- $\mathbb{R}^M: C(\mathbf{A}) \oplus N(\mathbf{A^T})$ (together they span the whole of $\mathbb{R}^M$)
- $\mathbb{R}^N: R(\mathbf{A}) \oplus N(\mathbf{A})$ (together they span the whole of $\mathbb{R}^N$)

**Example**  

$\mathbf{A} = {\begin{bmatrix}
1&2&0\\
3&3&-3\\
\end{bmatrix}}$

*Column space of $\mathbf{A}$* 

$$ C(\mathbf{A}) = \{ {\begin{bmatrix}
1\\
3\\
\end{bmatrix}}, {\begin{bmatrix}
2\\
3\\
\end{bmatrix}}\} \subset \mathbb{R}^2$$ (these two columns form a basis; any two of the three columns would do)

$$\dim(C(\mathbf{A}))=2$$

Other ways to define the column space: 

$$ C(\mathbf{A}) = span( \{ {\begin{bmatrix}
1\\
3\\
\end{bmatrix}}, {\begin{bmatrix}
2\\
3\\
\end{bmatrix}} \}) $$

$$ C(\mathbf{A}) = \lambda {\begin{bmatrix}
1\\
3\\
\end{bmatrix}} + \beta {\begin{bmatrix}
2\\
3\\
\end{bmatrix}}, \lambda, \beta \in \mathbb{R}$$

*Row space of $\mathbf{A}$*

$$R(\mathbf{A}) = \{ {\begin{bmatrix}
1\\
2\\
0\\
\end{bmatrix}}^T, {\begin{bmatrix}
3\\
3\\
-3\\
\end{bmatrix}}^T\} \subset \mathbb{R}^3$$


$$\dim(R(\mathbf{A}))=2$$


*Left-null space of $\mathbf{A}$*

$$N(\mathbf{A^T}) = \{ \} \in \mathbb{R}^2$$ (an empty set)


$$\dim(N(\mathbf{A^T}))=0$$

*Null Space of $\mathbf{A}$*

We know that the null space should be orthogonal to the row space - to every row vector. So we need to find a vector such that the dot product of this vector with any of the vectors in the row space produces zero. 
We can find the first two coordinates from the first row: let them be 2 and -1. Now, to produce zero with the second row, we need to set the third coordinate to 1. Thus, the null space is: 

$$N(\mathbf{A}) = \{ \lambda {\begin{bmatrix}
2\\
-1\\
1\\
\end{bmatrix}} \} \subset \mathbb{R}^3$$


$$\dim(N(\mathbf{A}))=1$$
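The claims in this example can be verified with a few lines of NumPy. This is only a sanity check of the worked example: it confirms the rank, that any pair of columns is linearly independent, and that $[2, -1, 1]^T$ is sent to zero.

```python
from itertools import combinations

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 3.0, -3.0]])
M, N = A.shape
rank = int(np.linalg.matrix_rank(A))

# Rank of every pair of columns: rank 2 means the pair spans all of C(A).
pair_ranks = {
    pair: int(np.linalg.matrix_rank(A[:, list(pair)]))
    for pair in combinations(range(N), 2)
}

n = np.array([2.0, -1.0, 1.0])  # candidate null-space vector from the text

print("rank:", rank)
print("pair ranks:", pair_ranks)
print("A @ n:", A @ n)
print("dim N(A^T):", M - rank, "dim N(A):", N - rank)
```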

## Ax = b and Ax = 0 Revisited 

Some alternative notations: 
- We can safely rewrite $\mathbf{A x = b}$ as $\mathbf{A X = B}$, where $\mathbf{X}$ is an Nx1 matrix and $\mathbf{B}$ is an Mx1 matrix (for $\mathbf{A} \in \mathbb{R}^{M \times N}$). 
- In statistics it's often $\mathbf{X} \beta = \mathbf{y}$, where we look for $\beta$ rather than $\mathbf{X}$. Here, $\mathbf{X}$ is the design matrix, $\mathbf{y}$ is the observed data and $\beta$ are the regression coefficients. 


$\mathbf{A x = b}$ has a solution if and only if $\mathbf{b} \in C(\mathbf{A})$. When $\mathbf{b} \notin C(\mathbf{A})$, we're interested in finding the closest solution: 

$$\mathbf{A \hat{x} = \hat{b}}$$,
($\hat{\mathbf{b}}$ is chosen such that $\hat{\mathbf{b}} \in C(\mathbf{A})$), and we're minimizing $||\mathbf{\hat{b}-b}||$.
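One common way to obtain $\hat{\mathbf{x}}$ is through the normal equations $\mathbf{A^T A \hat{x} = A^T b}$, which is a swapped-in technique not spelled out above and which assumes $\mathbf{A}$ has full column rank. The sketch below uses illustrative numbers and shows that the leftover $\mathbf{b} - \hat{\mathbf{b}}$ is orthogonal to every column of $\mathbf{A}$, i.e. it lies in the left-null space.

```python
import numpy as np

# Illustrative tall matrix with full column rank.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])  # not in C(A)

# Normal equations: (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
b_hat = A @ x_hat              # closest point to b inside C(A)
residual = b - b_hat

print("x_hat:", x_hat)
print("b_hat:", b_hat)
print("A^T residual:", A.T @ residual)  # numerically zero
```

For ill-conditioned problems, a routine such as `np.linalg.lstsq` is usually a safer choice than forming $\mathbf{A^T A}$ explicitly.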

As for $\mathbf{A x = 0}$, we often want to solve: 
$(\mathbf{A} - \lambda \mathbf{I}) \mathbf{x = 0}$, where $\lambda$ is an eigenvalue and $\mathbf{x}$ is the corresponding eigenvector. The solution to this equation is used for PCA, SVD, FDA, etc. 
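In null-space terms, an eigenvector is a non-zero vector in the null space of the shifted matrix $\mathbf{A} - \lambda \mathbf{I}$. A minimal check with NumPy, using an illustrative symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)  # eigenvectors are the columns of eigvecs

checks = []
for lam, x in zip(eigvals, eigvecs.T):
    shifted = A - lam * np.eye(2)
    # x is sent to zero, so it lies in the null space of the shifted matrix.
    in_null_space = np.allclose(shifted @ x, 0)
    # A zero singular value means the shifted matrix is rank-deficient.
    smallest_sv = np.linalg.svd(shifted, compute_uv=False).min()
    checks.append((in_null_space, smallest_sv < 1e-10))
    print(lam, in_null_space, smallest_sv < 1e-10)
```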