_Goal: Explore Linear Algebra and its relations to deeper math: groups, representations, graphs, probability, analysis, geometry, algebras,  ... ???_

# Multiplication

### Inner/Dot product

The dot product measures the similarity between two vectors: how much are they pointing in the same direction?

$$
a \cdot b = \| a \|_2 \| b \|_2 \cos (\phi) \tag{proof?!?}
$$
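As a quick numerical illustration of the identity above, here is a minimal sketch in Julia. The vectors `a` and `b` are made-up examples, and the names `cos_phi` and `phi` are only for this sketch.

```julia
using LinearAlgebra

a = [1.0, 0.0]
b = [1.0, 1.0]

# Rearranging a ⋅ b = ‖a‖ ‖b‖ cos(φ) gives the angle between the two vectors.
cos_phi = dot(a, b) / (norm(a) * norm(b))
phi = acos(cos_phi)   # b sits 45 degrees (π/4 radians) away from a
```

Scaling either vector changes the dot product but not `cos_phi`, which is why the normalised version is the cleaner measure of "pointing in the same direction".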

### Matmul

> ??? what are we doing here?

If $C = AB$ then each element $C_{ij}$ is the dot product of row $i$ of A with column $j$ of B. So each element of C measures the similarity between a row of A and a column of B. For example, the elements of the first row of C are the similarities between the first row of A and each column of B.
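A small sketch of this row-by-column reading, using made-up matrices (`C_dots` is a name chosen for this example only):

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]      # 3×2
B = [1.0 0.0 2.0; 0.0 1.0 3.0]       # 2×3
C = A * B

# Rebuild C entry by entry as dot products of rows of A with columns of B.
C_dots = [dot(A[i, :], B[:, j]) for i in 1:size(A, 1), j in 1:size(B, 2)]
```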


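Multiplying a matrix by a vector gives a linear combination of the matrix's columns, so the result always lies in the column space. This implies that $rank(A) \leq rank(M)$ whenever $A = MX$, as each column of $A$ is $M$ times a column of $X$ (proof?). For a single column $\vec x$: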

$$
\begin{align}
A = M\vec x &= 
\begin{bmatrix}
\vec c_{1} & \vec c_{2} & \vec c_{3} & \vec c_{4} \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2  \\
x_3 \\
x_4 \\
\end{bmatrix} \\
&= \begin{bmatrix}
x_1\vec c_{1} + x_2\vec c_{2} + x_3\vec c_{3} + x_4\vec c_{4} \\
\end{bmatrix}
\end{align}
$$
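The column-combination view and the rank bound can be checked numerically. This is only a sketch with made-up matrices; `combo` and `ranks` are names invented here.

```julia
using LinearAlgebra

M = [1.0 2.0 3.0; 2.0 4.0 6.0; 0.0 1.0 1.0]   # rank 2: row 2 is twice row 1
x = [1.0, -1.0, 2.0]

# M * x written as an explicit weighted sum of the columns of M.
combo = sum(x[k] * M[:, k] for k in 1:size(M, 2))

# Every column of M * X is such a combination, so the rank cannot grow.
X = [1.0 0.0 2.0; 0.0 1.0 1.0; 1.0 1.0 0.0]
ranks = (rank(M * X), rank(M))
```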


### Examples and play

ABC = ?

## Kronecker product

> This is what we mean when we combine two independent states.

$$A\otimes B = 
\begin{bmatrix}
a_{11}B & \cdots & a_{1n}B \\
\vdots & \ddots & \vdots \\
a_{m1}B & \cdots & a_{mn}B \\
\end{bmatrix}
$$

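```julia
# Kronecker product of an n×m matrix A and a p×q matrix B (the result is np×mq).
function kronecker(A::AbstractMatrix, B::AbstractMatrix)
    n, m = size(A)
    p, q = size(B)
    # Use a common element type so integer or complex inputs are not forced to Float64.
    C = zeros(promote_type(eltype(A), eltype(B)), n*p, m*q)
    for i in 1:n
        for j in 1:m
            # Block (i, j) of C is the whole of B scaled by A[i, j].
            C[p*(i-1)+1:p*i, q*(j-1)+1:q*j] .= A[i, j] .* B
        end
    end
    return C
end
```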

#### Exercises

Show that
* $(A \otimes B)(C \otimes D) = AC \otimes BD$
* $vec(ABC) = (C^T \otimes A ) vec(B)$
* $AX=B \implies (I\otimes A)vec(X) = vec(B)$
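* $A\otimes B = U_A\Sigma_A V_A^T \otimes U_B\Sigma_B V_B^T = (U_A \otimes U_B)(\Sigma_A \otimes \Sigma_B )(V_A^T \otimes V_B^T)$, up to reordering the singular values on the diagonal of $\Sigma_A \otimes \Sigma_B$ into descending order
* $tr(x \otimes y) = x \cdot y$ if x and y are vectors of the same length and $x \otimes y$ is read as the outer product $xy^T$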
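Before attempting the proofs, the identities can be sanity-checked numerically. The sketch below uses the built-in `kron` from Julia's `LinearAlgebra` and random matrices whose shapes were picked only so that the products are defined. A numerical check is not a proof, and the variable names are made up.

```julia
using LinearAlgebra, Random

Random.seed!(1)
A, B = randn(2, 3), randn(3, 2)
C, D = randn(3, 2), randn(2, 4)

# Mixed-product property: (A ⊗ B)(C ⊗ D) = AC ⊗ BD
mixed_ok = kron(A, B) * kron(C, D) ≈ kron(A * C, B * D)

# vec(ACD) = (Dᵀ ⊗ A) vec(C), the vec identity with C in the middle
vec_ok = vec(A * C * D) ≈ kron(Matrix(D'), A) * vec(C)

# AX = B  ⟹  (I ⊗ A) vec(X) = vec(B), here with X = C
solve_ok = kron(Matrix(1.0I, 2, 2), A) * vec(C) ≈ vec(A * C)

# Trace of the outer product x yᵀ equals the dot product
x, y = randn(4), randn(4)
trace_ok = tr(x * y') ≈ dot(x, y)
```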

# Eigen vecs/vals

Now that we have the covariances between variables/features of X, we can find their eigenvalues/eigenvectors. Eigenvectors of a matrix $A$ are directions that are not rotated when multiplied by $A$: $A\vec{x} = a\vec{x}$. So, the eigenvectors of our covariance matrix are directions that ???

Why does it even make sense that there are m eigen vectors? It seems weird that there must be directions that are invariant to rotation. Why should there be eigenvectors? It seems kinda strange.

$Ax = ax \implies (A-aI)x = 0$

* Hmm. Eigen vectors are not unique??
* What does it mean when eigenvalues are negative?
    * Given small positive diagonals and large off-diagonals.
    * Given negative diagonals. And small off-diagonals.
* or imaginary?
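A few concrete cases may help with these questions. This is a sketch with made-up 2×2 matrices, not a general answer: a negative eigenvalue flips the direction of its eigenvector, and complex eigenvalues show up when no real direction is left unrotated.

```julia
using LinearAlgebra

# Small positive diagonal, large off-diagonal: one eigenvalue comes out negative.
A = [1.0 3.0; 3.0 1.0]
F = eigen(A)
neg_vals = F.values                 # -2 and 4

# A 90 degree rotation leaves no real direction unrotated, so the eigenvalues are ±i.
rot_vals = eigvals([0.0 -1.0; 1.0 0.0])

# Non-uniqueness: any non-zero multiple of an eigenvector is still an eigenvector.
v = F.vectors[:, 1]
still_eigen = A * (5 * v) ≈ F.values[1] * (5 * v)
```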

Why does a covariance matrix have non-negative eigenvalues, when symmetric matrices in general don't necessarily?


$$
\begin{align}
Av &= \lambda v\\
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
\end{bmatrix}
\begin{bmatrix}
v_{11} \\ v_{21} \\
\end{bmatrix} &=
\lambda
\begin{bmatrix}
v_{11} \\ v_{21} \\
\end{bmatrix}\tag{let A = ...}\\
\implies 0 &= Av-\lambda v \\
&= (A-\lambda I)v \\
\implies 0 &= \det(A-\lambda I) \tag{needs a non-zero v, so the matrix is singular} \\
0& = det(\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
\end{bmatrix}
-\begin{bmatrix}
\lambda & 0 \\ 
0 & \lambda \\
\end{bmatrix})\\
0&= (a_{11}-\lambda)(a_{22}-\lambda) - a_{12}a_{21} \\
&= a_{11}a_{22} - a_{11}\lambda - a_{22}\lambda + \lambda^2 - a_{12}a_{21} \\
&= \lambda^2 + (-a_{11} - a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21})\\
\lambda&={\frac {-b\pm {\sqrt {b^{2}-4ac\ }}}{2a}} \tag{quadratic formula}\\
&={\frac {-(-a_{11} - a_{22})\pm {\sqrt {(-a_{11} - a_{22})^{2}-4(a_{11}a_{22} - a_{12}a_{21})\ }}}{2}}\\
&={\frac {a_{11} + a_{22}\pm {\sqrt {a_{11}^2 -2a_{11}a_{22}+ a_{22}^2+4 a_{12}a_{21}\ }}}{2}}\\
\end{align}
$$
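The closed form on the last line of the derivation can be compared against a library eigenvalue routine. A minimal sketch, assuming a made-up symmetric example so that the eigenvalues are real (`disc` and `lambdas` are names invented here):

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 2.0]
a11, a12, a21, a22 = A[1, 1], A[1, 2], A[2, 1], A[2, 2]

# Discriminant and roots, copied from the last line of the derivation.
disc = sqrt(a11^2 - 2 * a11 * a22 + a22^2 + 4 * a12 * a21)
lambdas = [(a11 + a22 - disc) / 2, (a11 + a22 + disc) / 2]   # 1 and 3

# Library answer for comparison (sorted ascending for a symmetric matrix).
lib_vals = eigvals(A)
```

When the quantity under the square root is negative the eigenvalues are complex, and `sqrt` on a real argument would throw, so this sketch only covers the real case.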

But what does this mean intuitively? What is the geometric interpretation?
* Well, just diagonal entries scale the vector.
* Off-diagonal entries add in information from other dimensions.

What about some matrix transforms?

# Whitening data

Let:
* E be a matrix whose columns are the eigenvectors of M
* V be the diagonal matrix of eigenvalues
* M be some symmetric matrix, e.g. a covariance matrix

We can write the diagonalized covariance as: $V = E^TME$. 
So the variances of the decorrelated variables are the eigenvalues. Why/how does this make sense? Isn't this almost exactly what PCA is doing???

Therefore if we set $y= E^Tx$ then y is a decorrelated representation of x, since $cov(y) = E^TME = V$ is diagonal.
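A minimal sketch of decorrelating and then whitening, assuming synthetic data (1000 samples of 3 correlated features, generated here purely for illustration). The final rescaling by $V^{-1/2}$ is the usual whitening step and goes one step beyond the decorrelation described above.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(0)
# Hypothetical data: rows are samples, columns are correlated features.
X = randn(1000, 3) * [1.0 0.5 0.0; 0.0 1.0 0.5; 0.0 0.0 1.0]
Xc = X .- mean(X, dims=1)

M = cov(Xc)                         # 3×3 covariance matrix
F = eigen(Symmetric(M))
E, V = F.vectors, Diagonal(F.values)

Y = Xc * E                          # each row is y = Eᵀx
covY = cov(Y)                       # diagonal, equal to V up to rounding

W = Y * Diagonal(1 ./ sqrt.(F.values))   # rescale each axis to unit variance
covW = cov(W)                       # identity up to rounding
```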

##### Questions
* Can all matrices be decorrelated? Does this depend on rank?
* Where do the damned eigenvectors come from? Why is it that there (must??) exist a set of orthogonal axes that each ...

##### Resources
* http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
* https://theclevermachine.wordpress.com/2013/03/30/the-statistical-whitening-transform/

# Singular value decomposition
Used as an alternative way to find eigen vectors? Why?

From before. $Av = \lambda v \therefore V\Lambda V^{-1} = A$

##### Positive semidefinite normal
For the SVD to coincide with the eigendecomposition, the matrix to be factored must be a positive semidefinite normal matrix. What guarantee do we have that our matrix, M, will satisfy this requirement? Well, we are doing SVD on a correlation matrix. A correlation matrix is symmetric by definition, and ??? (which is why we always have positive values for our principal components/eigenvalues??)

##### M = USV

> * The columns of V (right-singular vectors) are eigenvectors of M∗M.
> * The columns of U (left-singular vectors) are eigenvectors of MM∗.
> * The non-zero elements of Σ (non-zero singular values) are the square roots of the non-zero eigenvalues of M∗M or MM∗.

Show/prove? this

Let $Ax = ax$, i.e. x is an eigenvector of A. Then let A = USV; then Ux = ? = xV = x? As eigenvectors are only scaled, not rotated. 
What do U and V mean? I know they are rotations (why??), but where do they rotate us?

### How to find U,S,V

How do you find these?? Maybe that will shed some light on what they are.

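$\mathbf {M} =\sum _{i}\mathbf {A} _{i}=\sum _{i}\sigma _{i}\mathbf {U} _{i}\otimes \mathbf {V} _{i}^{\dagger }$ where each $\mathbf {A} _{i} = \sigma _{i}\mathbf {U} _{i}\otimes \mathbf {V} _{i}^{\dagger }$ is a rank-one outer product of the $i$-th left and right singular vectors.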
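The sum-of-outer-products form can be checked with a library SVD. This sketch leans on Julia's `svd` rather than showing how the factors are actually computed, and the matrix `M` is a made-up 4×3 example.

```julia
using LinearAlgebra

M = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 10.0; 2.0 0.0 1.0]   # 4×3
F = svd(M)

# One rank-one term σᵢ uᵢ vᵢᵀ per singular value.
terms = [F.S[i] * F.U[:, i] * F.V[:, i]' for i in eachindex(F.S)]

# Adding the terms back together recovers M.
M_rebuilt = sum(terms)
```

Dropping the terms with the smallest $\sigma_i$ gives a low-rank approximation of M, which is one reason this form is handy.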


### Relation to eigenvalue decomposition

<i>
> Given an SVD of M, as described above, the following two relations hold:
$$\begin{aligned}\mathbf {M} ^{*}\mathbf {M} &=\mathbf {V} {\boldsymbol {\Sigma }}^{*}\mathbf {U} ^{*}\,\mathbf {U} {\boldsymbol {\Sigma }}\mathbf {V} ^{*}=\mathbf {V} ({\boldsymbol {\Sigma }}^{*}{\boldsymbol {\Sigma }})\mathbf {V} ^{*}\\\mathbf {M} \mathbf {M} ^{*}&=\mathbf {U} {\boldsymbol {\Sigma }}\mathbf {V} ^{*}\,\mathbf {V} {\boldsymbol {\Sigma }}^{*}\mathbf {U} ^{*}=\mathbf {U} ({\boldsymbol {\Sigma }}{\boldsymbol {\Sigma }}^{*})\mathbf {U} ^{*}\end{aligned} $$
The right-hand sides of these relations describe the eigenvalue decompositions of the left-hand sides. Consequently:
* The columns of V (right-singular vectors) are eigenvectors of M∗M.
* The columns of U (left-singular vectors) are eigenvectors of MM∗.
* The non-zero elements of Σ (non-zero singular values) are the square roots of the non-zero eigenvalues of M∗M or MM∗.</i>

Ahh. That makes some more sense?

$\therefore W = M^TM = V_M \Sigma_M^T \Sigma_M V_M^T = U_W\Sigma_W V^T_W$ which means $V_M = U_W, V_M^T = V^T_W$. I guess this makes sense, as the covariance matrix is symmetric. But now I am confused, as we are using the eigenvectors as rotations, but normally they are only scaled by the matrix; in fact that is their definition.
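A numerical check of this relation, as a sketch on a made-up real matrix (so $M^*$ is just $M^T$): the right-singular vectors of M should be eigenvectors of $W = M^TM$, with eigenvalues equal to the squared singular values.

```julia
using LinearAlgebra

M = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 10.0; 2.0 0.0 1.0]
F = svd(M)
W = M' * M

# Residual of W v = σ² v for each right-singular vector v.
residuals = [norm(W * F.V[:, i] - F.S[i]^2 * F.V[:, i]) for i in 1:3]

# Eigenvalues of W, sorted descending to match the ordering of the singular values.
eig_W = sort(eigvals(Symmetric(W)), rev=true)
```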

https://www.ling.ohio-state.edu/~kbaker/pubs/Singular_Value_Decomposition_Tutorial.pdf


$$
USV^T = \\
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
u_{21} & u_{22} & u_{23} & u_{24} \\
u_{31} & u_{32} & u_{33} & u_{34} \\
u_{41} & u_{42} & u_{43} & u_{44} \\
\end{bmatrix}
\begin{bmatrix}
s_{11} & 0 & 0 \\
0 & s_{22} & 0 \\
0 & 0 & s_{33} \\
0 & 0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
v_{11} & v_{12} & v_{13} \\
v_{21} & v_{22} & v_{23} \\
v_{31} & v_{32} & v_{33} \\
\end{bmatrix} \\
=\begin{bmatrix}
s_{11}u_{11} & s_{22}u_{12} & s_{33}u_{13} \\
s_{11}u_{21} & s_{22}u_{22} & s_{33}u_{23} \\
s_{11}u_{31} & s_{22}u_{32} & s_{33}u_{33} \\
s_{11}u_{41} & s_{22}u_{42} & s_{33}u_{43} \\
\end{bmatrix}
\begin{bmatrix}
v_{11} & v_{12} & v_{13} \\
v_{21} & v_{22} & v_{23} \\
v_{31} & v_{32} & v_{33} \\
\end{bmatrix} \\
= \begin{bmatrix}
v_{11}s_{11}u_{11} + v_{21}s_{22}u_{12} + v_{31}s_{33}u_{13} & v_{12}s_{11}u_{11} + v_{22}s_{22}u_{12} + v_{32}s_{33}u_{13} & v_{13}s_{11}u_{11} + v_{23}s_{22}u_{12} + v_{33}s_{33}u_{13} \\
v_{11}s_{11}u_{21} + v_{21}s_{22}u_{22} + v_{31}s_{33}u_{23} & v_{12}s_{11}u_{21} + v_{22}s_{22}u_{22} + v_{32}s_{33}u_{23} & v_{13}s_{11}u_{21} + v_{23}s_{22}u_{22} + v_{33}s_{33}u_{23} \\
v_{11}s_{11}u_{31} + v_{21}s_{22}u_{32} + v_{31}s_{33}u_{33} & v_{12}s_{11}u_{31} + v_{22}s_{22}u_{32} + v_{32}s_{33}u_{33} & v_{13}s_{11}u_{31} + v_{23}s_{22}u_{32} + v_{33}s_{33}u_{33} \\
v_{11}s_{11}u_{41} + v_{21}s_{22}u_{42} + v_{31}s_{33}u_{43} & v_{12}s_{11}u_{41} + v_{22}s_{22}u_{42} + v_{32}s_{33}u_{43} & v_{13}s_{11}u_{41} + v_{23}s_{22}u_{42} + v_{33}s_{33}u_{43} \\
\end{bmatrix} \\
= \begin{bmatrix}
v_{11}s_{11}u_{11} & v_{12}s_{11}u_{11} & v_{13}s_{11}u_{11} \\
v_{11}s_{11}u_{21} & v_{12}s_{11}u_{21} & v_{13}s_{11}u_{21} \\
v_{11}s_{11}u_{31} & v_{12}s_{11}u_{31} & v_{13}s_{11}u_{31} \\
v_{11}s_{11}u_{41} & v_{12}s_{11}u_{41} & v_{13}s_{11}u_{41} \\
\end{bmatrix}
+ \begin{bmatrix}
v_{21}s_{22}u_{12} & v_{22}s_{22}u_{12} & v_{23}s_{22}u_{12} \\
v_{21}s_{22}u_{22} & v_{22}s_{22}u_{22} & v_{23}s_{22}u_{22} \\
v_{21}s_{22}u_{32} & v_{22}s_{22}u_{32} & v_{23}s_{22}u_{32} \\
v_{21}s_{22}u_{42} & v_{22}s_{22}u_{42} & v_{23}s_{22}u_{42} \\
\end{bmatrix}
+\begin{bmatrix}
v_{31}s_{33}u_{13} & v_{32}s_{33}u_{13} & v_{33}s_{33}u_{13} \\
v_{31}s_{33}u_{23} & v_{32}s_{33}u_{23} & v_{33}s_{33}u_{23} \\
v_{31}s_{33}u_{33} & v_{32}s_{33}u_{33} & v_{33}s_{33}u_{33} \\
v_{31}s_{33}u_{43} & v_{32}s_{33}u_{43} & v_{33}s_{33}u_{43} \\
\end{bmatrix}
$$

## Questions

* When can a matrix not be decomposed into eigenvectors? (existence)
* Uniqueness. When do 
* Why is the decomposition so popular? And so useful?

# Rank and Linear independence

If we have two vectors, $x,y \in \mathbb{R}^2$, such that $x = [1,4], y = [2,8]$ then these vectors are _linearly dependent_ as y = 2x. Therefore, by composing x and y we can only ever get vectors on a line. However, 


* Do that proof thingy of rows = columns.
* Random init = full rank with probability 1 ?!?
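These points can be poked at numerically before proving anything. A sketch, where `rank` from `LinearAlgebra` computes a numerical rank from singular values and the random matrix stands in for a "random init":

```julia
using LinearAlgebra, Random

x = [1.0, 4.0]
y = [2.0, 8.0]
rank_xy = rank([x y])                  # 1: y = 2x, so the columns span only a line

A = [1.0 2.0 3.0; 4.0 5.0 6.0]
row_col = (rank(A), rank(Matrix(A')))  # row rank and column rank agree

Random.seed!(0)
rank_rand = rank(randn(5, 5))          # Gaussian entries: full rank with probability 1
```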

# Projection

##### Projection
Let $A = USV^T$ with $U$ ($m \times m$), $S$ ($m \times n$), $V^T$ ($n \times n$)
$$
\begin{align}
P_A &= A(A^TA)^{-1}A^T \\
A &= USV^T \\
P_A &= (USV^T) ((USV^T)^T(USV^T))^{-1} (USV^T)^T \\
P_A &= USV^T (VS^TU^TUSV^T)^{-1} (VS^TU^T) \\
P_A &= USV^T (VS^TSV^T)^{-1} (VS^TU^T) \\
\end{align}
$$
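Carrying the algebra one step further (assuming A has full column rank, so the inverse exists) collapses the projector to $U_nU_n^T$, where $U_n$ holds the first n columns of U. The sketch below checks this on a made-up 4×2 matrix; Julia's `svd` returns the thin factorisation, so `F.U` already plays the role of $U_n$.

```julia
using LinearAlgebra

A = [1.0 0.0; 1.0 1.0; 1.0 2.0; 1.0 3.0]   # 4×2, full column rank
P = A * inv(A' * A) * A'                   # projector from the first line

F = svd(A)                                 # thin SVD: F.U is 4×2
P_svd = F.U * F.U'                         # projector built from the SVD
```

The second form avoids inverting $A^TA$, which can be badly conditioned.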

# Orthogonal matrix

$A^T = A^{-1}$ Proof?

Why are the eigenvectors of a covariance matrix orthogonal? (The covariance matrix itself is symmetric, not orthogonal in general.)

> _"Orthogonality and statistical independence are not synonyms."_ [SE](http://stats.stackexchange.com/questions/110508/questions-on-pca-when-are-pcs-independent-why-is-pca-sensitive-to-scaling-why)

### Orthogonal projections 

Let x be some vector and L be a subspace such that $L = span(v) = \{cv : c \in \mathbb{R}\}$. Then the projection of x onto L, $proj_L(x) = \frac{x\cdot v}{v \cdot v}v$
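A worked instance of the formula, with `v` and `x` chosen arbitrarily for illustration:

```julia
using LinearAlgebra

v = [2.0, 1.0]            # spans the line L
x = [3.0, 4.0]

# proj_L(x) = (x ⋅ v / v ⋅ v) v
proj = (dot(x, v) / dot(v, v)) * v     # [4.0, 2.0]

# What is left over after projecting is orthogonal to L.
residual = x - proj
```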

$P_A = P_A^2$ <- proof??

$P_A = A(A^TA)^{-1}A^T$
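One way to see the idempotence, taking the projector onto the column space of A to be $A(A^TA)^{-1}A^T$ and assuming $A^TA$ is invertible:

$$
\begin{align}
P_A^2 &= A(A^TA)^{-1}A^T \, A(A^TA)^{-1}A^T \\
&= A(A^TA)^{-1}(A^TA)(A^TA)^{-1}A^T \\
&= A(A^TA)^{-1}A^T \\
&= P_A
\end{align}
$$

Geometrically: once a vector is in the subspace, projecting it again does nothing.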

# Positive semi definite
***
> _"positive definiteness is a sufficient condition for strict convexity"_ [SE](http://math.stackexchange.com/questions/210187/relation-between-positive-definite-matrix-and-strictly-convex-function)

Convexity of what? Prove!

***
Prove that a covariance matrix is always positive semi-definite.
http://math.stackexchange.com/questions/114072/what-is-the-proof-that-covariance-matrices-are-always-semi-definite
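A sketch of the standard argument, assuming $x$ is a random vector with mean $\mu$ and covariance $\Sigma = \mathbb{E}[(x-\mu)(x-\mu)^T]$, and $w$ is any fixed vector (these symbols are introduced here, not taken from the notes above):

$$
\begin{align}
w^T \Sigma w &= w^T\,\mathbb{E}[(x-\mu)(x-\mu)^T]\,w \\
&= \mathbb{E}[w^T(x-\mu)(x-\mu)^Tw] \\
&= \mathbb{E}[(w^T(x-\mu))^2] \\
&\geq 0
\end{align}
$$

So $w^T\Sigma w$ is the variance of the scalar $w^Tx$, which cannot be negative. It is zero exactly when $w^Tx$ is constant, which is why the result is semi-definite rather than definite.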

***

# Norms

What if we want to know how big a matrix is? We need a scale to compare different vectors/matrices/tensors. How should it work? Well, it should be;
* transitive. If $\parallel A \parallel > \parallel B \parallel$ and $\parallel B \parallel > \parallel C \parallel$ then $\parallel A \parallel > \parallel C \parallel$

It's a distance metric?

$$
\begin{align}
\parallel v \parallel &= \sqrt{v_1^2 + v_2^2 + \dots + v_d^2} \tag{from pythagoras} \\
&= \sqrt{v\cdot v} \\
\therefore \parallel v \parallel^2 &= v\cdot v \\
\end{align}
$$

$\left\|\mathbf {x} \right\|_{p}:={\bigg (}\sum _{i=1}^{n}\left|x_{i}\right|^{p}{\bigg )}^{1/p}$
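The p-norm formula is short enough to write out directly and compare with the library `norm`. In this sketch `pnorm` is a helper name made up for the example.

```julia
using LinearAlgebra

# p-norm written straight from the formula above.
pnorm(x, p) = sum(abs.(x) .^ p)^(1 / p)

v = [3.0, -4.0]
one_norm = pnorm(v, 1)        # |3| + |-4| = 7
two_norm = pnorm(v, 2)        # sqrt(9 + 16) = 5
via_dot  = sqrt(dot(v, v))    # same as the 2-norm
```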

# Permutation matrices



# Inverses

So, given a set of matrices in $\mathbb R^{d \times d}$, can we make a group under matrix multiplication? $G = (\mathbb R^{d \times d}, \times)$. We need an identity, $I$, and inverses, $A^{-1}$


Invertible if A is square and $\exists A^{-1}:I = AA^{-1} = A^{-1}A$. 

Rank deficient case -- https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse


##### Proofs and questions

* Singular iff det(A) = 0
* What if $I = AA^{-1} \neq A^{-1}A$
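Some small cases to experiment with, as a sketch using made-up matrices. The last case suggests that one-sided inverses only arise for non-square matrices.

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 1.0]
A_inv = inv(A)                     # exists because det(A) = 1 is non-zero

S = [1.0 2.0; 2.0 4.0]             # rank 1, so det(S) = 0 and S has no inverse
S_pinv = pinv(S)                   # the Moore-Penrose pseudoinverse still exists

T = [1.0 0.0; 0.0 1.0; 0.0 0.0]    # 3×2: a left inverse exists, a right inverse does not
T_left = pinv(T)
left_ok  = T_left * T ≈ Matrix(1.0I, 2, 2)
right_ok = T * T_left ≈ Matrix(1.0I, 3, 3)   # false
```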

### Generalised inverse


### Generalised 



## Schur complement

$$
\begin{align}
Mx &= y\\
\begin{bmatrix}
A & B \\
C & D \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
\end{bmatrix} &= 
\begin{bmatrix}
y_1\\
y_2\\
\end{bmatrix} \\
\therefore Ax_1 + Bx_2 &= y_1 \\
Cx_1 + Dx_2 &= y_2 \\
x_2 &= D^{-1}(y_2-Cx_1) \tag{solve for xs}\\
Ax_1 + BD^{-1}(y_2-Cx_1) &= y_1 \tag{substitute}\\
x_1 &= (A-BD^{-1}C)^{-1}(y_1-BD^{-1}y_2) \\
x_2 &= D^{-1}\Big(y_2-C(A-BD^{-1}C)^{-1}(y_1-BD^{-1}y_2) \Big)\\
x &= M^{-1}y\\
\end{align}
$$
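The block elimination can be checked against a direct solve. A minimal sketch with made-up 2×2 blocks, chosen so that both $D$ and the Schur complement $A - BD^{-1}C$ are invertible:

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]; B = [1.0 0.0; 0.0 1.0]
C = [0.0 1.0; 1.0 0.0]; D = [2.0 0.0; 0.0 5.0]
M = [A B; C D]
y1 = [1.0, 2.0]; y2 = [3.0, 4.0]

S = A - B * (D \ C)                # Schur complement of D in M
x1 = S \ (y1 - B * (D \ y2))       # solve the reduced system for x1
x2 = D \ (y2 - C * x1)             # back-substitute for x2

x_direct = M \ [y1; y2]            # reference: solve the full system at once
```

Using `\` instead of forming explicit inverses is the usual choice, since it avoids computing $D^{-1}$ itself.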

* What about a greater number of blocks? E.g. 3x3?

* Let A,B,C be singular/non-invertible.
    * Can we invert the Schur complement of D in M?
    * I.e. What is $(A - BD^{-1}C)^{-1}$

### Similarity matrix

If $B = P^{-1}AP$ then A and B have the same;

* rank,
* multi
... ??
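A numerical look at a few quantities that similar matrices are known to share (rank, trace, determinant and eigenvalues). The matrices `A` and `P` are made-up examples, and this checks one case rather than proving anything.

```julia
using LinearAlgebra

A = [2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0]
P = [1.0 1.0 0.0; 0.0 1.0 1.0; 1.0 0.0 1.0]     # invertible: det(P) = 2
B = inv(P) * A * P

same_rank  = rank(A) == rank(B)
same_trace = tr(A) ≈ tr(B)
same_det   = det(A) ≈ det(B)
# B is not symmetric, so take real parts before sorting for the comparison.
same_eigs  = sort(real.(eigvals(B))) ≈ eigvals(A)
```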

# TODO

* Subspaces?!?
* Pictures!!!
* Span
* Column space
* Range (a general idea for functions but has specific definition for vector spaces??)
