+ This notebook is part of lecture 15 *Projections onto subspaces* in the OCW MIT course 18.06 by Prof Gilbert Strang [1]
+ Created by me, Dr Juan H Klopper
    + Head of Acute Care Surgery
    + Groote Schuur Hospital
    + University Cape Town
    + <a href="mailto:juan.klopper@uct.ac.za">Email me with your thoughts, comments, suggestions and corrections</a> 
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/InteractiveResource" property="dct:title" rel="dct:type">Linear Algebra OCW MIT18.06</span> <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">IPython notebook [2] study notes by Dr Juan H Klopper</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.

+ [1] <a href="http://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/index.htm">OCW MIT 18.06</a>
+ [2] Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: http://ipython.org

In [None]:
from IPython.display import HTML, Image
css_file = 'style.css'
HTML(open(css_file, 'r').read())

In [None]:
from sympy import init_printing, Matrix, symbols
from IPython.display import Image
from warnings import filterwarnings

In [None]:
init_printing(use_latex = 'mathjax')
filterwarnings('ignore')

# Projections onto subspaces

## Geometry in the plane

* Projection of a vector onto another (in the plane)
* Consider the orthogonal projection of **b** onto **a**

In [None]:
Image(filename = 'Orthogonal projection in the plane.png')

* Note that **p** falls on a line, which is a subspace of the plane &#8477;<sup>2</sup>
* Remember from the previous lecture that two vectors are orthogonal when their dot product is zero (as in A**x** = **0**, where every row of A is orthogonal to **x**)
* Note that **p** is some scalar multiple of **a**
* The error vector **e** = **b** - **p** = **b** - x**a** is perpendicular to **a**
* Thus we have **a**<sup>T</sup>(**b** - x**a**) = 0 and x**a**<sup>T</sup>**a** = **a**<sup>T</sup>**b**
* Since **a**<sup>T</sup>**a** is a number we can simplify
$$ x=\frac { { \underline { a }  }^{ T }\underline { b }  }{ { \underline { a }  }^{ T }\underline { a }  }  $$

* We also have **p** = **a**x
$$ \underline { p } =\underline { a } x=\underline { a } \frac { { \underline { a }  }^{ T }\underline { b }  }{ { \underline { a }  }^{ T }\underline { a }  }  $$

* This equation shows how **p** responds to scaling
    * Doubling **b** (or multiplying it by any other scalar) doubles **p** (or multiplies it by that scalar)
    * Doubling **a** (or multiplying it by any other nonzero scalar) has no effect on **p**
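
The scaling behaviour above can be checked with a small sketch. The vectors `a` and `b` and the helper `project_onto_line` are hypothetical, chosen only for illustration; they are not part of the lecture.

```python
from sympy import Matrix

def project_onto_line(a, b):
    """Return the projection p of b onto the line through a (hypothetical helper)."""
    x = (a.T * b)[0, 0] / (a.T * a)[0, 0]  # the scalar x = a^T b / a^T a
    return a * x                           # p = a x

a = Matrix([2, 1])   # hypothetical direction vector
b = Matrix([1, 3])   # hypothetical vector to project

p = project_onto_line(a, b)
p_double_b = project_onto_line(a, 2 * b)   # scaling b scales p by the same factor
p_double_a = project_onto_line(2 * a, b)   # scaling a leaves p unchanged
e = b - p                                  # error vector, perpendicular to a
```

Comparing `p_double_b` with `2 * p`, and `p_double_a` with `p`, confirms both bullet points for this particular pair of vectors.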

* Eventually we are looking for **p** = P**b**, where P is the projection matrix
$$ \underline { p } =P\underline { b } \\ P=\frac { 1 }{ { \underline { a }  }^{ T }\underline { a }  } \underline { a } { \underline { a }  }^{ T } $$

* Properties of the projection matrix P
    * The columnspace of P (C(P)) is the line which contains **a**
    * The rank is 1, rank(P) = 1
    * P is symmetric, i.e. P<sup>T</sup> = P
    * Applying the projection matrix a second time (i.e. P<sup>2</sup>) changes nothing, thus P<sup>2</sup> = P
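
As a minimal sketch, these properties can be verified with sympy for one concrete case. The vectors `a` and `b` are hypothetical values picked for illustration.

```python
from sympy import Matrix

a = Matrix([2, 1])   # hypothetical direction vector
b = Matrix([1, 3])   # hypothetical vector to project

P = (a * a.T) / (a.T * a)[0, 0]   # P = a a^T / (a^T a), a 2x2 matrix

rank_P = P.rank()                 # the column space is the line through a
is_symmetric = (P.T == P)         # P^T = P
is_idempotent = (P * P == P)      # P^2 = P
p = P * b                         # projection of b onto the line through a
```

Note that **a** **a**<sup>T</sup> is a matrix (column times row), while **a**<sup>T</sup>**a** is a number, which is why only the latter can be divided out.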

## Why project?

(projecting onto more than a one-dimensional line)

* Because A**x** = **b** may not have a solution
    * **b** may not be in the columnspace
    * May have more equations than unknowns
* Solve for the closest vector in the columnspace
    * This is done by solving for **p** instead, where **p** is the projection of **b** onto the columnspace of A
$$ A\hat { x } =\underline { p }  $$

* Now we have to project **b** orthogonally (as **p**) onto the column(sub)space
* This is done by finding two basis vectors for the plane that contains **p**, i.e. **a**<sub>1</sub> and **a**<sub>2</sub>, the columns of A

* Going way back to the graph up top we note that **e** is perpendicular to the plane
* So, we have:
$$ A\hat { x } =\underline { p } $$
* We know that both **a**<sub>1</sub> and **a**<sub>2</sub> are perpendicular to **e**, so:
$$ { a }_{ 1 }^{ T }\underline { e } =0;\quad { a }_{ 2 }^{ T }\underline { e } =0\\ \because \quad \underline { e } =\underline { b } -\underline { p } \\ \because \quad \underline { p } =A\hat { x } \\ { a }_{ 1 }^{ T }\left( \underline { b } -A\hat { x }  \right) =0;\quad { a }_{ 2 }^{ T }\left( \underline { b } -A\hat { x }  \right) =0 $$

* Writing both equations as a single matrix equation ...
$$ \begin{bmatrix} { a }_{ 1 }^{ T } \\ { a }_{ 2 }^{ T } \end{bmatrix}\left( \underline { b } -A\hat { x }  \right) =\begin{bmatrix} 0 \\ 0 \end{bmatrix}\\ { A }^{ T }\left( \underline { b } -A\hat { x }  \right) =0 $$
* ... shows that **e** must be in the nullspace of A<sup>T</sup>
* Which is right because from the previous lecture the nullspace of A<sup>T</sup> is orthogonal to the columnspace of A

* Simplifying the last equations we have
$$ {A}^{T}{A} \hat{x} = {A}^{T}{b} $$

* Just look back at the example in the plane &#8477;<sup>2</sup> that we started with
* Replacing the matrix A with a single column vector **a** in this last equation does give us what we had in &#8477;<sup>2</sup>
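
A short symbolic sketch of this reduction follows. The component symbols `a1`, `a2`, `b1`, `b2` are hypothetical names introduced for illustration, and **a** is assumed to be nonzero so that **a**<sup>T</sup>**a** can be inverted.

```python
from sympy import Matrix, symbols, simplify

a1, a2, b1, b2 = symbols('a1 a2 b1 b2', real=True)  # hypothetical components
a = Matrix([a1, a2])      # A reduced to a single column
b = Matrix([b1, b2])

x_normal = ((a.T * a).inv() * a.T * b)[0, 0]   # (A^T A)^{-1} A^T b with A = a
x_line = (a.T * b)[0, 0] / (a.T * a)[0, 0]     # a^T b / a^T a from the plane example

difference = simplify(x_normal - x_line)       # zero if the two formulas agree
```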

* Solving this we have
$$ \hat { x } ={ \left( { A }^{ T }A \right)  }^{ -1 }{ A }^{ T }\underline { b }  $$

* Which leaves us with
$$ \underline { p } =A\hat { x } \\ \underline { p } =A{ \left( { A }^{ T }A \right)  }^{ -1 }{ A }^{ T }\underline { b }  $$

* Making the projection matrix P
$$ P=A{ \left( { A }^{ T }A \right)  }^{ -1 }{ A }^{ T } $$

* Just note that for a square invertible matrix A, P is the identity matrix
* Most of the time A is not square (and thus not invertible), so we have to leave the equation as it is
* Also, note that P<sup>T</sup> = P and P<sup>2</sup> = P
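
The following sketch checks these statements on one example. The matrices `A` and `S`, the vector `b` and the helper `projection_matrix` are hypothetical and chosen for illustration; the helper assumes the columns of its argument are independent, so that A<sup>T</sup>A is invertible.

```python
from sympy import Matrix, eye, zeros

def projection_matrix(A):
    """Return P = A (A^T A)^{-1} A^T; assumes the columns of A are independent."""
    return A * (A.T * A).inv() * A.T

A = Matrix([[1, 0], [1, 1], [1, 2]])   # hypothetical 3x2 matrix, independent columns
b = Matrix([6, 0, 0])                  # hypothetical right-hand side

P = projection_matrix(A)
p = P * b          # projection of b onto the column space of A
e = b - p          # error vector

symmetric = (P.T == P)                           # P^T = P
idempotent = (P * P == P)                        # P^2 = P
e_in_left_nullspace = (A.T * e == zeros(2, 1))   # A^T e = 0

S = Matrix([[2, 1], [1, 1]])            # hypothetical square invertible matrix
is_identity = (projection_matrix(S) == eye(2))   # P reduces to the identity
```

Because sympy works with exact rational numbers, the equality tests are exact and not subject to rounding.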

## Applications

### Least squares

* Given a set of data points in two dimensions, i.e. with variables (*t*,*b*)
* We need to fit them onto the best line
* So, as an example consider the points (1,1), (2,2), (3,2)

* A best line in this instance means a straight line in the form
$$ {b}={C}+{D}{t} $$
* Using the three points above we get three equations
$$ {C}+{D}=1 \\ {C}+{2D} = 2 \\ {C}+{3D}=2 $$

* If the line went through all three points, the system would have a solution
* Instead we have the following
$$ \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} C \\ D \end{bmatrix}=\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} $$
* Three equations, two unknowns, no solution, **so** solve ...
$$ { A }^{ T }A\hat { x } ={ A }^{ T }b $$
* ... whose solution is
$$ \hat { x } ={ \left( { A }^{ T }A \right)  }^{ -1 }{ A }^{ T }b $$

In [None]:
A = Matrix([[1, 1], [1, 2], [1, 3]])
A

In [None]:
b = Matrix([1, 2, 2])
b

In [None]:
(A.transpose() * A).inv() * A.transpose() * b

* Thus, the solution is:
$$ b=\frac { 2 }{ 3 } +\frac { 1 }{ 2 } t $$
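
As a hedged follow-up sketch, the fit can be checked by computing the projection **p** and the error **e** = **b** - **p** for the same three points. Using `LUsolve` on the normal equations in place of the explicit inverse is a choice made here, not the lecture's method; both give the same result for this example.

```python
from sympy import Matrix, zeros

A = Matrix([[1, 1], [1, 2], [1, 3]])
b = Matrix([1, 2, 2])

x_hat = (A.T * A).LUsolve(A.T * b)   # solve A^T A x = A^T b without forming the inverse
C, D = x_hat                         # intercept and slope of the fitted line

p = A * x_hat                        # heights of the fitted line at t = 1, 2, 3
e = b - p                            # errors at the three data points
orthogonal = (A.T * e == zeros(2, 1))   # the error is orthogonal to the columns of A
squared_error = (e.T * e)[0, 0]         # the quantity that least squares minimises
```

Here `p` holds the values of the fitted line at the three *t* values, and `orthogonal` confirms that **e** lies in the nullspace of A<sup>T</sup>.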