In [1]:
using LinearAlgebra, RowEchelon, LaTeXStrings, Plots, SymPy
include("LAcodes.jl");
LAcodes.title( "Orthogonality", sz=30, color="darkred")

# 1. Adding Vector Length to Vector Spaces

## 1.1 Basic Definitions

The definitions below are carefully written to generalize to arbitrary vector spaces.

#### **Allow Both Real and Complex Numbers**

Adding the notion of the **length of a vector** to a vector space proceeds in two steps.<br>
In the following, we allow the scalars to be $\mathbb{F} = \mathbb{R}, \mathbb{Q}$ or $\mathbb{C}$.

To allow complex numbers, we need complex conjugation:<br>
a bar over an expression signifies complex conjugation.

**Example:**  $\overline{3 + 2 i} = 3 - 2 i.$<br>
$\quad\quad\quad\;$ Note that for real numbers $\overline{x} = x.$

#### **Inner Products (Dot Product)**

<div style="background-color:#F2F5A9">
    
**Definition:** An **inner product space** is a vector space $V$ over the scalars $\mathbb{F}$
 with a function $\; \cdot : V \times V \rightarrow \mathbb{F}$<br> $\quad\quad$ with the following properties
$\forall x,y,z \in V, \; \forall \alpha\in \mathbb{F}:$
$$
\begin{align}
 &x \cdot y          &=& \;\overline{ y \cdot x } & \text{(conjugate symmetry)             } \\
 &x \cdot (\alpha y)  &=& \;\alpha x \cdot y       & \text{(linearity in the second argument)} \\
 &x \cdot ( y+z )     &=&\; x \cdot y + x \cdot z        & \\
\end{align}
$$
and
$$
x \cdot x\quad\quad\quad  = \left\{ \begin{align}& c > 0 & \quad x \ne 0\\ & 0 \quad & \text{ otherwise} \end{align} \right. \quad\quad \text{ (positive definite)}
$$
</div>

**Remark:** The complex conjugate is required for complex numbers, since the dot product without conjugation is not positive definite:
$$
\begin{align} (\alpha + \beta i ) \cdot (\alpha + \beta i ) & = \alpha^2 - \beta^2 + 2 \alpha \beta\; i  \\
 \overline{(\alpha + \beta i )} \cdot (\alpha + \beta i ) & = \alpha^2 + \beta^2  \\
\end{align}
$$
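
The remark can be checked numerically. The following is a minimal sketch, assuming Julia's standard `LinearAlgebra` library, whose `dot` conjugates its first argument; the vector `z` is a hypothetical example, not taken from the text.

```julia
using LinearAlgebra

z = [3 + 2im, 1 - 1im]        # a hypothetical vector in ℂ²

plain     = sum(z .* z)       # no conjugation: the result is complex in general
hermitian = dot(z, z)         # dot conjugates its first argument

println("without conjugation: ", plain)
println("with conjugation:    ", hermitian)
println("norm(z)^2:           ", norm(z)^2)
```

Only the conjugated version yields a real, non-negative number that can serve as a squared length.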

> **Note** that for modulo 2 arithmetic, i.e., $\mathbb{F} = \mathbb{Z}_2$ the dot product is not positive definite.
> 
> <span style="color:red;"><strong>ALL THEOREMS FROM HERE ON REQUIRE THE POSITIVE DEFINITE PROPERTY of the dot product</strong></span>
>
> This is why the Fundamental theorem is broken into two parts!

#### **Distance**

Inner products can be used to define a **distance** function, i.e.,
<div style="background-color:#F2F5A9">

**Definition:** A **metric** for a set $M$ is a function $d : M \times M \rightarrow \mathbb{R}$<br>
$\quad\quad$ with the following properties
$\forall x,y,z \in M:$
$$
\begin{align}
d(x,y) & = 0 \; \Leftrightarrow \; x = y & \\
d(x,y) & = d(y,x) & \\
d(x,y) & \le d(x,z) + d(z,y) & \quad\quad \text{ ( triangle inequality ) }\\
\end{align}
$$

**Remark**: The axioms for a metric guarantee $$d(x,y) \ge 0$$
</div>

#### **Length of a Vector**

<div style="background-color:#F2F5A9">

**Definition:** The **norm** of a vector $v$ in an inner product space is
    $$\lVert v \rVert = \sqrt{ \overline{v} \cdot v }$$
**Definition:** The **distance** between two vectors $x$ and $y$ in an inner product space is
    $$ d(x,y) = \lVert x-y \rVert $$

</div>

**Remarks:**
* For the dot product in $\mathbb{R}^2$ and $\mathbb{R}^3$, this definition yields the Euclidean length of a vector. E.g.,
$$\lVert \begin{pmatrix}u_1\\u_2 \end{pmatrix} \rVert = \sqrt{ u_1^2 + u_2^2 }$$
* The definition of the norm from the inner product shows that
$$
\lVert \alpha v \rVert = \sqrt{ \overline{\alpha v} \cdot \alpha v  } = \ \lvert \alpha \rvert \ \lVert v \rVert
$$

**Example:**<br>
Let $$u = \begin{pmatrix} 2 \\ 5 \end{pmatrix}, v = \begin{pmatrix} 3 \\ -1 \end{pmatrix}.$$

$$
\lVert u \rVert = \sqrt{ 2^2 + 5^2 } = \sqrt{29}, \quad \lVert v \rVert = \sqrt{ 3^2 + (-1)^2 } = \sqrt{10}.
$$


$$\lVert 2 u \rVert = \sqrt{ \begin{pmatrix} 4\\10 \end{pmatrix} \cdot \begin{pmatrix} 4\\10 \end{pmatrix} } = \sqrt{ 4 \begin{pmatrix} 2\\5 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 5 \end{pmatrix} } = 2 \sqrt{ \begin{pmatrix} 2 \\ 5 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 5 \end{pmatrix} } = 2 \lVert u \rVert$$

<div style="background-color:#F2F5A9">

**Definition:** A **unit vector** is a vector with norm equal to 1.
    
Such a vector may be constructed from any non-zero vector $u$ with
$$
\hat{u} = \frac{1}{\lVert u \rVert} u
$$
</div>

$$
\begin{align}
    u = \begin{pmatrix} 2 \\ 5 \end{pmatrix} & \Rightarrow \lVert u \rVert = \sqrt{29} \\
                                             & \Rightarrow \hat{u} =\frac{1}{\sqrt{29}}
                                                            \begin{pmatrix} 2 \\ 5 \end{pmatrix}.
\end{align}
$$
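
The norm, the scaling rule, and the unit vector from the example above can be reproduced with a short sketch, assuming the standard `LinearAlgebra` functions `norm` and `normalize`; the names `len_u` and `u_hat` are introduced here for illustration only.

```julia
using LinearAlgebra

u = [2, 5]

len_u = norm(u)               # Euclidean length, sqrt(2^2 + 5^2)
u_hat = u / len_u             # unit vector in the direction of u

println("‖u‖     = ", len_u)
println("‖2u‖    = ", norm(2u), "  (twice ‖u‖)")
println("û       = ", u_hat)
println("‖û‖     = ", norm(u_hat))
```

Dividing by the norm is what `normalize(u)` does as well; it is written out here to mirror the definition.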

## 1.2 Inequalities, Angle, Orthogonal Vectors

#### **Orthogonal Vectors**

<div style="float:left;">

**Constructing Orthogonal Vectors in <span style="color:red;">2D and 3D</span>:**

$$
\begin{align}
\lVert u+v \rVert = \lVert u-v \rVert
& \Leftrightarrow \sqrt{ (u+v) \cdot (u+v) } =  \sqrt{ (u-v) \cdot (u-v) } \\
& \Leftrightarrow      { (u+v) \cdot (u+v) } =       { (u-v) \cdot (u-v) } \\
& \Leftrightarrow      { \lVert u \rVert^2 + \lVert v \rVert^2 + 2 u \cdot v } = 
                       { \lVert u \rVert^2 + \lVert v \rVert^2 - 2 u \cdot v } \\
& \Leftrightarrow      { u \cdot v = 0 } \\ 
\end{align}
$$

**Orthogonal Vectors:**
$$u \perp v \Leftrightarrow u \cdot v = 0$$

</div><div style="float:right;"><img src="OrthogonalDirection.svg" width=300></div>
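
The equivalence derived above (equal diagonals exactly when the dot product vanishes) can be tried out on small real vectors. This is a sketch with hypothetical vectors `u`, `v`, `w` and a helper `equal_diagonals` that is not part of the notebook's `LAcodes` module.

```julia
using LinearAlgebra

u = [1, 2]
v = [-2, 1]                   # chosen so that u ⋅ v = 0
w = [1, 1]                    # chosen so that u ⋅ w ≠ 0

# true when the two diagonals u+x and u-x of the parallelogram have equal length
equal_diagonals(x, y) = norm(x + y) ≈ norm(x - y)

println("u ⋅ v = ", dot(u, v), ",  equal diagonals: ", equal_diagonals(u, v))
println("u ⋅ w = ", dot(u, w), ",  equal diagonals: ", equal_diagonals(u, w))
```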

#### **Generalization: Angles Between Vectors**

<div style="background-color:#F2F5A9">

**Theorem: (Cauchy-Schwarz Inequality)** The inner product between two vectors $u$ and $v$ satisfies $$\lvert \overline{u} \cdot v \rvert \le \lVert u \rVert \ \lVert v \rVert$$
</div>

**Remark:** The inequality holds trivially, with equality, if either $u = 0$ or $v = 0$. When neither of the vectors is zero, we can rewrite this (for real vectors) as
$$
-1 \le \frac{ \overline{u} \cdot v }{\lVert u \rVert \ \lVert v \rVert} \le 1
$$

In $\mathbb{R}^2$ this quotient is the cosine of the angle between the vectors $u$ and $v$.<br>
We therefore generalize this to
<div style="background-color:#F2F5A9">

$$
\cos ( \angle (u,v) ) = \frac{ \overline{u} \cdot v }{\lVert u \rVert \ \lVert v \rVert}, \quad \text{ where } u \ne 0, v \ne 0
$$
</div>

<div style="background-color:#F2F5A9">

**Remark:** orthogonal non-zero vectors have $\; \cos 90^\circ = 0$,<br>i.e.,
$$ u \cdot v = 0 \Leftrightarrow u \perp v $$

**Remark:**
* To simplify the previous remark, we define<br>
    **the zero vector to be orthogonal to any other vector**:

    $$
    u ⋅ v = 0 \Leftrightarrow u \perp v \quad \text{ for any two vectors } u \text{ and } v
     \;\textbf{ including the zero vector}.
     $$
</div>

##### **Example:**

Let $u = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix}, \;\; v = \begin{pmatrix} 4 \\ 1 \\ 1 \end{pmatrix}$

The angle between $u$ and $v$  is $\theta = \arccos \frac{u \cdot v}{ \lVert u \rVert\ \lVert v \rVert } 
 = \arccos \frac{12}{ \sqrt{35}\ \sqrt{18} } 
 \approx $ 61.44 degrees

The distance from $u$ to $v$ is $\quad\quad\quad\quad\quad\quad\lVert v - u \rVert =\ \lVert\ \begin{pmatrix} 3 \\ -4 \\ -2 \end{pmatrix}\ \rVert\; = \sqrt{29} \approx 5.39$

A detour via $w = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ increases the distance to $\lVert v-w \rVert + \lVert w-u \rVert = 3 + \sqrt{20} \approx 7.47$
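
The quantities in this example can be recomputed with a few lines of Julia. This is a minimal sketch using `dot` and `norm` from `LinearAlgebra` and `acosd` (arccosine in degrees) from Base; the variable names are chosen here for illustration.

```julia
using LinearAlgebra

u = [1, 5, 3]
v = [4, 1, 1]
w = [1, 1, 1]

cosθ   = dot(u, v) / (norm(u) * norm(v))   # lies in [-1, 1] by Cauchy-Schwarz
θ      = acosd(cosθ)                       # angle in degrees
direct = norm(v - u)                       # distance from u to v
detour = norm(v - w) + norm(w - u)         # distance via w

println("angle    = ", round(θ, digits=2), " degrees")
println("distance = ", round(direct, digits=2))
println("detour   = ", round(detour, digits=2))
```

The detour can never be shorter than the direct distance: this is the triangle inequality of the metric.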

# 2. Fundamental Theorem of Linear Algebra (Part 2)

## 2.1 Main Definitions and Theorem

### 2.1.1 Linear Independence of Orthogonal Vectors

<div style="float:left;background-color:#F2F5A9">

**Theorem:** Non-zero, mutually orthogonal vectors are **linearly independent.**

**Fundamental Theorem:** Given a matrix $A$ in $\mathbb{R}^{M \times N}$
* Any two vectors $r \in \mathscr{R}(A), n \in \mathscr{N}(A)$ are orthogonal, i.e., $r \perp n$
* Any two vectors $c \in \mathscr{C}(A), \tilde{n} \in \mathscr{N}(A^t)$ are orthogonal, i.e., $c \perp \tilde{n}$


---
**Remark:**
* The sketch of the Fundamental Theorem presented previously<br>
    depicts this situation accurately!<br><br>
    **Vectors in the two fundamental spaces in the domain and the codomain are orthogonal**.
</div>
<div style="float:right;"><img src="FundamentalTheorem_0.svg" width=250></div>

### 2.1.2 Orthogonal Spaces

<div style="background-color:#F2F5A9;float:left;">

**Definition:** A vector space $U$ is orthogonal to a vector space $V$ iff $\forall u \in U, \forall v \in V, \; u \perp v$

**Definition:** Let $U$ be a subspace of $V$. The **orthogonal complement** $U^\perp = \left\{ v \in V \mid \forall u \in U, \ v \perp u \right\}$.

**Theorem:** Given a subspace $U$ of a finite-dimensional inner product space, then $(U^\perp)^\perp = U$.<br>
**Theorem:** Given two subspaces $U$ and $V$ of a finite-dimensional inner product space such that $U^\perp = V$, then $V^\perp = U$.

</div>
<div style="float:right;"><img src="FundamentalTheorem_1.svg" width=250></div>

<div style="background-color:#F2F5A9;float:left;">

**Fundamental Theorem:** Given a matrix $A \in \mathbb{R}^{M \times N}$ then $\mathscr{R}(A)^\perp = \mathscr{N}(A)$ in $\mathbb{R}^N$.<br>
**Fundamental Theorem:** Given a matrix $A \in \mathbb{R}^{M \times N}$ then $\mathscr{C}(A)^\perp = \mathscr{N}(A^t)$ in $\mathbb{R}^M$.

**Fundamental Theorem:** Let $A$ be a matrix of size $M \times N.$ The union of the bases for $\mathscr{C}(A)$ and $\mathscr{N}(A^t)$ is a basis for $\mathbb{R}^M$.<br>
**Fundamental Theorem:** Let $A$ be a matrix of size $M \times N.$ The union of the bases for $\mathscr{R}(A)$ and $\mathscr{N}(A)$   is a basis for $\mathbb{R}^N$.
</div>
<div style="float:right;"><img src="FundamentalTheorem_2.svg" width=250></div>

## 2.2 Use the Fundamental Theorem to Decompose a Vector (Naive Method)

### 2.2.1 Introduction

<div style="background-color:#F2F5A9">
    
**Theorem:** Non-zero orthogonal vectors are linearly independent.<br><br>
</div>

Let $u \perp v$ be two non-zero vectors (i.e., $\lVert u \rVert \ne 0$ and $\lVert v \rVert \ne 0$). Let us check linear independence:

$$
\begin{align}
& \alpha\ u\ +\ \beta\ v\ =\ 0\quad & \Rightarrow & \quad\alpha \overline{u} \cdot u + \beta \overline{u} \cdot v = 0 \quad\Rightarrow \quad \alpha \lVert  u \rVert^2 = 0 \quad\Rightarrow \quad \alpha = 0.\\
& \beta\ v \ = \ 0 & \Rightarrow & \quad \beta = 0
\end{align}
$$


----
An immediate consequence (part of the Fundamental theorem, part II) is

<div style="background-color:#F2F5A9">

**Theorem:** The union of the bases for $\mathscr{C}(A)$ and $\mathscr{N}(A^t)$ <br>
$\quad\quad\quad\;$ of a matrix $A$ of size $M \times N$ is a basis for the whole space $\mathbb{R}^M.$
</div>

Why? Any vector $c$ in $\mathscr{C}(A)$ is orthogonal to any vector $\tilde{n}$ in $\mathscr{N}(A^t)$<br>
since
$$
A^t \tilde{n} = 0 \Leftrightarrow \tilde{n}^t A = 0.
$$

The equation to the right clearly shows that the dot product of the vector $\tilde{n}$ with any column of $A$ is 0.

$\therefore \; \tilde{\mathbf{n}}$ **is orthogonal to each of the columns of** $\mathbf{A}$,<br>
$\quad$ and hence to any linear combination of columns of $A:\quad
\tilde{n} \cdot (\ \alpha_1\ a_1\ +\ \alpha_2\ a_2\ + \dots +\ \alpha_N\ a_N\ )\ =\ 0.$

### 2.2.2 A basis for $\mathbb{R}^M$

Let $A$ be a matrix of size $M \times N$ with rank $r$.<br><br>
Let $\left\{ c_1, c_2, \dots c_r \right\}$ be a basis for $\mathscr{C}(A)$,<br>
and $\left\{ ñ_1, ñ_2, \dots ñ_{M-r} \right\}$ be a basis for $\mathscr{N}(A')$.<br>

The combined basis $\quad\left\{\  c_1, c_2, \dots c_r,\;  ñ_1, ñ_2, \dots ñ_{M-r} \ \right\}$ $\quad\quad$
is a basis for $\mathbb{R}^M$.

##### **Example:**

> Let $A = \begin{pmatrix}  1  & 2 \\ 2 & 2 \\ -1 & 1 \end{pmatrix}$.
>
> A quick computation shows
> $$
\text{basis } \mathscr{C}(A) = \left\{\;
c_1= \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \; c_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} \; \right\},\quad\quad
\text{basis } \mathscr{N}(A^t) = \left\{\ \tilde{n}\ = \ \begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix} \; \right\}.
$$

> Combining the two bases yields a basis for $\mathbb{R}^3$:
> $$
\text{basis } \mathbb{R}^3 = \left\{\;
\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \;  \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \;
\begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix} \; \right\}.
$$
>
> $\therefore\quad$ **Any vector $b$ in $\mathbb{R}^3$ can be written as a unique linear combination**
$\quad
b = \alpha_1 c_1 + \alpha_2 c_2 + \alpha_3 \tilde{n}. 
$
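
The two claims of this example (the vector $\tilde{n}$ is orthogonal to the columns of $A$, and the three vectors together form a basis for $\mathbb{R}^3$) can be checked with a short sketch. It assumes `rank` from `LinearAlgebra`; the names `n` and `B` are introduced here for illustration.

```julia
using LinearAlgebra

A = [1 2; 2 2; -1 1]          # columns c1, c2 form a basis for C(A)
n = [-4, 3, 2]                # basis vector for N(Aᵗ)

println("Aᵗ n = ", A' * n)    # zero vector: n is orthogonal to both columns

B = [A n]                     # combined basis written as columns
println("rank of the combined basis = ", rank(B))
```

A rank of 3 means the three columns are linearly independent, so every $b \in \mathbb{R}^3$ has exactly one representation in this basis.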

#### **Conclusion**

##### **Sketch**

<div style="background-color:#F2F5A9;float:left;">

A consequence of the Fundamental Theorem Part II for a matrix $A \in \mathbb{R}^{M \times N}$:
    
Given  $\left\{ c_1, c_2, \dots c_r \right\},$ a basis for $\mathscr{C}(A)$,<br>
and $\left\{ ñ_1, ñ_2, \dots ñ_{M-r} \right\},$ a basis for $\mathscr{N}(A')$.<br>

> Any vector $b \in \mathbb{R}^M$ can be written as a linear combination of these vectors:
$$
\begin{align}
&b           \; = \color{red}{b_{\parallel}} + \color{blue}{b_{\perp}},  &\\
 \text{ where }\quad &\color{red}{b_{\parallel}  = \alpha_1 c_1 + \alpha_2 c_2 \dots + \alpha_r c_r} &\\
 \text{ and }\quad &\color{blue}{b_\perp        = \beta_1 ñ_1 + \beta_2 ñ_2 \dots + \beta_{M-r} ñ_{M-r}}.&
\end{align}
$$
</div>
<div style="float:right;">
The result is depicted in the following Figure:<br>
    (note this is the codomain of $y=A x$ turned on its side)<br><br>
<img src="./NormalEquations.svg"  width="350">
</div>


<div style="float:left;">$\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad$</div>
<div style="float:left;">

* the red vector $b_\parallel$ is the part of $b$<br>
that lies in the $\mathscr{C}(A)$ hyperplane (the linear combination formed with $\alpha_i c_i$)<br>
it is the orthogonal projection $Proj_{\mathscr{C}(A)}^\perp b$ onto the column space $\mathscr{C}(A)$
* the blue vector $b_\perp$ is the part of $b$<br>
that lies in the $\mathscr{N}(A')$ hyperplane (the linear combination formed with $\beta_j ñ_j$)
* these two vector components are orthogonal.
</div>

##### **Example Continued**

> **Decompose a Vector into Two Orthogonal Components ( <span style="color:red;"> Naive Method</span> )** 

> $\quad\quad\;$ Let $A = \begin{pmatrix}  1  & 2 \\ 2 & 2 \\ -1 & 1 \end{pmatrix}$, the matrix from the previous example.
>
> **Step 1:** We had found the bases
>
>
> $$
\text{basis } \mathscr{C}(A) = \left\{\;
c_1= \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \; c_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} \; \right\}, \quad\quad
\text{basis } \mathscr{N}(A^t) = \left\{\ \tilde{n}\ = \ \begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix} \; \right\}.
$$

> **Step 2:** Split the vector $b = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}\;$
into two orthogonal components: $b = \color{red}{b_{//}} + \color{blue}{b_\perp}\;$,
where $\color{red}{b_{//}}$ is in $\mathscr{C}(A)$ and $\color{blue}{b_\perp}$ is in $\mathscr{N}(A^t)$
 

> $\quad\quad$ We need to solve $\quad b = \color{red}{ \alpha_1 c_1 + \alpha_2 c_2 } + \color{blue}{ \alpha_3 \tilde{n}} \; \Leftrightarrow \;
\begin{pmatrix} \color{red}1 & \color{red}2 & \color{blue}{-4} \\ \color{red}2 & \color{red}2 & \color{blue}3 \\  \color{red}{-1} &  \color{red}1 &  \color{blue}2 \end{pmatrix}
\begin{pmatrix} \color{red}{\alpha_1} \\ \color{red}{\alpha_2} \\ \color{blue}{\alpha_3} \end{pmatrix} = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}.
$

> $\quad\quad$ This yields
> $$
\begin{align}
\color{red}{b_{//}}   &= \color{red}{3 c_1 - c_2} &=& \; \color{red}{\begin{pmatrix} 1 \\ 4 \\ -4 \end{pmatrix}} \\
\color{blue}{b_\perp} &= \color{blue}{\tilde{n}} &=& \; \color{blue}{\begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix}} \\
\end{align}
$$

> **Check:** Let's check orthogonality: $\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\color{red}{b_{//}} \cdot \color{blue}{b_\perp} = 0.$

> ---
> This computation was involved: we needed to
> * **Step 1:** find the bases of both $\mathscr{C}(A)$ and $\mathscr{N}(A^t)$
> * **Step 2:** Solve the resulting $A x = b$ type problem
> * **Step 3:** Split the resulting column view decomposition of $b = \color{red}{b_\parallel} + \color{blue}{b_\perp}.$
>
> **It turns out we can do better!**
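
The three steps of the naive method can be sketched in Julia as follows. The bases are typed in by hand from the example, and the linear system is solved with the backslash operator in floating point; this is an illustration under those assumptions, not the notebook's own implementation.

```julia
using LinearAlgebra

# Step 1: bases for C(A) and N(Aᵗ), taken from the example above
c1 = [1, 2, -1]
c2 = [2, 2, 1]
n  = [-4, 3, 2]
b  = [-3, 7, -2]

# Step 2: solve [c1 c2 n] α = b for the coefficients
B = [c1 c2 n]
α = B \ b

# Step 3: split the linear combination into its two parts
b_par  = α[1] * c1 + α[2] * c2      # component in C(A)
b_perp = α[3] * n                   # component in N(Aᵗ)

println("α      = ", α)
println("b_par  = ", b_par)
println("b_perp = ", b_perp)
println("b_par ⋅ b_perp = ", dot(b_par, b_perp))
```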

## 2.3 Lead up: Use the Normal Equation to Decompose a Vector

### 2.3.1 Key Observation 1: Decomposing a Vector into Orthogonal Components

**Key observation:** decomposition of a vector $b$
<div style="float:left;">
> Any vector $b \in \mathbb{R}^M$ can be written as a linear combination:
$$
\begin{align}
                         &b                         \;&=\;& \color{red}{b_{\parallel}} + \color{blue}{b_{\perp}},\\
\text{ where }\quad\quad &\color{red}{b_{\parallel}}  &=\;& \color{red}{\alpha_1 c_1 + \alpha_2 c_2 \dots + \alpha_r c_r} &\\
\text{ and }  \quad\quad &\color{blue}{b_\perp}       &=\;& \color{blue}{\beta_1 ñ_1 + \beta_2 ñ_2 \dots + \beta_{M-r} ñ_{M-r}},&
\end{align}
$$
the $c_i$ vectors form a basis for a hyperplane containing $b_\parallel$,<br>
the $ñ_j$ vectors form a basis for the orthogonal complement of this hyperplane.
</div><div style="float:right;">
<img src="./NormalEquations.svg"  width="350">
</div>

----
> The basic idea is to replace the system of equations for the coefficients $\alpha_i, \beta_j$ by taking dot products with each of the $c_j$ in the column space $\mathscr{C}(A)$:
> $$
\begin{align}
(\xi) & \Leftrightarrow b &\;=\;& \color{red}{\alpha_1 c_1 + \alpha_2 c_2 \dots + \alpha_r c_r} + \color{blue}{b_\perp} \\
      & \Rightarrow \color{red}{c_j} \cdot b &\;=\;& \color{red}{\alpha_1 c_j \cdot c_1 + \alpha_2  c_j \cdot c_2 \dots + \alpha_r  c_j \cdot c_r} + \color{red}{c_j} \cdot \color{blue}{b_\perp}, \quad\quad j=1,2, \dots r
\end{align}
$$
>
> Since $\color{red}{c_j} \perp \color{blue}{b_\perp}$, the $\color{red}{c_j} \cdot \color{blue}{b_\perp} = 0$: **we are left with a set of equations that only involve the unknown coefficients $\color{red}{\alpha_i}$!**

##### **Example**

> Let's return to $A = \begin{pmatrix}  1  & 2 \\ 2 & 2 \\ -1 & 1 \end{pmatrix}$, which has
$
\text{basis } \mathscr{C}(A) = \left\{\;
c_1= \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \; c_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} \; \right\} \quad
$ and the vector $b = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}\;$

> Set $ \alpha_1  \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} + b_\perp = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}.$
>

> Taking the dot product with $c_1$ and $c_2$ yields
> $$ \left.
\begin{align}
c_1 : \quad & 6 \alpha_1 + 5 \alpha_2 =& 13 \\
c_2 : \quad & 5 \alpha_1 + 9 \alpha_2 =& 6 \\
\end{align}
\right\} \Leftrightarrow \left\{ \begin{aligned}
\alpha_1 =&\; 3 \\ \alpha_2 =& -1, \end{aligned} \right.
$$
the same solution we found before.
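
The small system above can be assembled directly from dot products. The sketch below builds the matrix of dot products $c_i \cdot c_j$ (called `G` here, a name not used in the text) and the right-hand side $c_i \cdot b$, then solves for the coefficients in floating point.

```julia
using LinearAlgebra

c = [[1, 2, -1], [2, 2, 1]]          # basis vectors c1, c2 of C(A)
b = [-3, 7, -2]

G   = [dot(ci, cj) for ci in c, cj in c]   # 2×2 matrix of dot products c_i ⋅ c_j
rhs = [dot(ci, b) for ci in c]             # right-hand side c_i ⋅ b

α = G \ rhs

println("G   = ", G)
println("rhs = ", rhs)
println("α   = ", α)
```

Note that the basis of $\mathscr{N}(A^t)$ never appears: only the vectors spanning the column space are needed.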

### 2.3.2 Key Observation 2: No Need to Identify the Column Space of $A$ and the Null Space of $A^t$

**Key observation:** we do **not need to identify the fundamental spaces!**

> Keeping all the columns of $A$ instead of just the pivot columns $c_j, j=1,2,\dots r$<br>
potentially yields an infinite number of ways of expressing the same solution $\color{red}{b_\parallel}:$<br>
We would get the same solution as before by setting the additional free variables equal to zero.

**Key observation:** remember that $b = \color{red}{b_\parallel} + \color{blue}{b_\perp}$.

> We are given vectors $\color{blue}{a_1, a_2, \dots a_n}$ and $b$.
>
> Once we compute $\color{red}{b_\parallel}$, we obtain $\color{blue}{b_\perp} = b - \color{red}{b_\parallel}.$

### 2.3.3 The Final Touch: Rewrite the Equations in Matrix Form

 Finally, observe that we can rewrite our equations in matrix form:
 
 $$
 b \;=\; \color{red}{\alpha_1 a_1 + \alpha_2 a_2 \dots + \alpha_N a_N} + \color{blue}{b_\perp}
 \Leftrightarrow b =  \color{red}{A x} + \color{blue}{b_\perp}.
 $$
 
 Taking the dot products with the columns $ \color{red}{a_j,\ j = 1, 2, \dots N}$ of $A$ yields the **normal equation**
 $$
 \color{red}{A^t A x = A^t b}
 $$
 

----
All we need is any one solution, however: we get $ \color{red}{b_\parallel = A x}$ for some vector $x$:
 thus $\quad b =  \color{red}{A x} +  \color{blue}{b_\perp}$.
  <br><br>
  * Multiplying $b =  \color{red}{A x} +  \color{blue}{b_\perp}$ by $A^t$ from the left still zeroes out the $ \color{blue}{b_\perp}$ term: we are left with the equations
> $$
   \begin{align}
   &A^t A x     &= A^t b &\quad \text{ known as the }\textbf{normal equation} \\
   &b_\parallel &= A x   &
   \end{align}\label{eq1}\tag{1}
$$
> To solve for $b_\perp$, it is sufficient to realize that $b = b_\parallel + b_\perp$, so
> $$
b_\perp  = b - b_\parallel \label{eq2}\tag{2}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad 
$$
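
To see that any one solution of the normal equation suffices, the following sketch uses a hypothetical rank-deficient matrix (its third column is the sum of the first two) and two different solutions `x1` and `x2`, both found by hand. Integer arithmetic keeps the checks exact.

```julia
using LinearAlgebra

A = [1 3 4; 5 3 8; 1 -1 0; 2 2 4]   # third column = first + second
b = [15, 23, -2, 9]

x1 = [0, 2, 2]                       # two different solutions of AᵗA x = Aᵗb
x2 = [2, 4, 0]

for x in (x1, x2)
    println("x = $x solves the normal equation: ", A'A * x == A'b,
            ",  A x = ", A * x)
end

b_par  = A * x1                      # equations (1)
b_perp = b - b_par                   # equation (2)
println("b_perp = $b_perp,  Aᵗ b_perp = ", A' * b_perp)
```

Both solutions produce the same $b_\parallel$, since they differ only by a vector in $\mathscr{N}(A)$.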

# 3. The Normal Equation

## 2.1 Basic Properties of the Normal Equation

<div style="background-color:#F2F5A9">
    
The **normal equation** $$A^t A x = A^t b$$
allows us to decompose vectors $b$ into two orthogonal components $$b = b_\parallel + b_\perp$$
such that $b_\parallel \in \mathscr{C}(A)$ and
$b_\perp \in \mathscr{N}(A^t)$:
$$
   \begin{align}
   &A^t A x     &=&\ A^t b  \\
   &b_\parallel &=&\ A x    \\
   &b_\perp     &=&\ b - b_\parallel
   \end{align}.
$$
</div>

The key observation was that multiplying $b = A x + b_\perp$ by $A^t$ from the left zeros out the $b_\perp$ term.

<div style="background-color:#F2F5A9">
    
**Theorem:** $\quad\quad\quad\quad\quad\mathscr{N}(A) = \mathscr{N}(A^t A)$<br><br>

</div>

which **guarantees that the set of solutions of the normal equation is identical to the set of solutions of $A x = b_\parallel$**.

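The proof is of interest:

* Let $x$ be a solution of $A x = 0.$ Multiplying by $A^t$ from the left yields $A^t A x = 0,$ so $\mathscr{N}(A) \subseteq \mathscr{N}(A^t A).$
* Conversely, let $x$ be a solution of $A^t A x = 0.$ Multiplying by $x^t$ from the left yields $x^t A^t A x = \lVert A x \rVert^2 = 0,$ so $A x = 0$ by the positive definite property, i.e., $\mathscr{N}(A^t A) \subseteq \mathscr{N}(A).$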
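
The identity behind the proof, $x^t A^t A x = \lVert A x \rVert^2$, can be checked on a small integer example. The matrix, the null vector `n`, and the test vector `x` below are hypothetical choices made for this sketch.

```julia
using LinearAlgebra

A = [1 3 4; 5 3 8; 1 -1 0; 2 2 4]   # third column = first + second
n = [1, 1, -1]                       # so A n = 0
x = [2, -1, 3]                       # an arbitrary vector outside N(A)

println("A n    = ", A * n)
println("AᵗA n  = ", A'A * n)

lhs = dot(x, A'A * x)                # xᵗ AᵗA x
rhs = dot(A * x, A * x)              # ‖A x‖²
println("xᵗAᵗAx = $lhs,  ‖Ax‖² = $rhs")
```

Because the two numbers always agree, $A^t A x = 0$ forces $\lVert A x \rVert^2 = 0$ and hence $A x = 0$.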

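A second key observation: since $b_\perp$ is orthogonal to every vector in $\mathscr{C}(A)$, the Pythagorean theorem gives
$\lVert b - A x \rVert^2 = \lVert b_\perp \rVert^2 + \lVert b_\parallel - A x \rVert^2 \ge \lVert b_\perp \rVert^2$ for every $x$, so the solutions to the normal equation also solve
the problem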
<div style="background-color:#F2F5A9">
    
$$
x^* = \arg\min_x { \lVert b - A x \rVert }
$$
</div>
since $b_\perp = b - A x^*$ is the shortest vector from a point in $\mathscr{C}(A)$ to $b$.<br>
Note that the solution need not be unique: as before, we are interested in
$$
\begin{align}
   &b_\parallel &=&\ A x    \\
   &b_\perp     &=&\ b - b_\parallel,
\end{align}
$$
so homogeneous solutions do not enter.
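
The minimization property can be illustrated with exact integer arithmetic. In this sketch the matrix, the vector `b`, the solution `x_star`, and the list of `candidates` are hypothetical choices; for each candidate the squared residual is split into $\lVert b_\perp \rVert^2$ plus the squared length of a vector in $\mathscr{C}(A)$.

```julia
using LinearAlgebra

A = [1 3 4; 5 3 8; 1 -1 0; 2 2 4]
b = [15, 23, -2, 9]
x_star = [0, 2, 2]                   # a solution of the normal equation
b_perp = b - A * x_star

residual_sq(x) = dot(b - A * x, b - A * x)      # ‖b - A x‖², exact in integers

candidates = ([0, 0, 0], [1, 1, 1], [2, 4, 0], [-1, 3, 2])
for x in candidates
    gap = A * (x_star - x)                      # lies in C(A), orthogonal to b_perp
    println("x = $x:  ‖b - A x‖² = ", residual_sq(x),
            " = ", dot(b_perp, b_perp), " + ", dot(gap, gap))
end
```

The candidate `[2, 4, 0]` differs from `x_star` by a homogeneous solution, so its gap is zero and it attains the same minimum.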

### 2.1.1 Split a Vector

In [68]:
A = [ 1  3 4 
      5  3 8
      1 -1 0
      2  2 4
]
b = [15; 23; -2; 9]

LAcodes.title("Find the shortest vector from b to "*L"\mathscr{C}(A)", sz=15,height=8)
LAcodes.title("where", sz=12)
println("A =")
Base.print_matrix(stdout, A)
println("\n\nb = $b")

A =
 1   3  4
 5   3  8
 1  -1  0
 2   2  4

b = [15, 23, -2, 9]


In [69]:
ne_rref = Int64.(round.(rref([A'A A'b]), digits=0))
LAcodes.title("Reduced row echelon form of the normal equation:", sz=15)
LAcodes.ge_layout( ne_rref, col_divs=3)

 1  0  1 | 2
 0  1  1 | 4
 0  0  0 | 0


In [175]:
x_star = [0;2;2]
println("a particular solution is $x_star")
println("\n.  Check by substituting in the normal equation: $(A'A * x_star - A'b)")
println("\nSplit the vector b")
b_parallel = A*x_star
b_perp     = b - b_parallel
println(".  b_parallel = $b_parallel")
println(".  b_perp     = $b_perp")

println("\n\nAdditional check")
println(".  b_parallel and b_perp are indeed orthogonal:  b_parallel ⋅ b_perp = $(dot(b_parallel, b_perp))")

a particular solution is [0, 2, 2]

.  Check by substituting in the normal equation: [0, 0, 0]

Split the vector b
.  b_parallel = [14, 22, -2, 12]
.  b_perp     = [1, 1, 0, -3]


Additional check
.  b_parallel and b_perp are indeed orthogonal:  b_parallel ⋅ b_perp = 0


### 2.1.2 Distance of a vector from a hyperplane

In [181]:
LAcodes.title( "The distance of point b from a span of vectors")

A = [ 1  3 4 
      5  3 8
      1 -1 0
      2  2 4
]
println("Given a set of vectors, write them into a matrix as columns A =")
Base.print_matrix(stdout, A)
println("\n\nThe span of the vectors is { w = A x}")
b = [15; 23; -2; 9]
println("\nThe distance of b=$b to the span of vectors is min || b - A x||,\ni.e., || b_perp ||")

println("\n\nSolving the normal equation for A x = b, we find")
x_star = [0;2;2]
println(".  a particular solution x_star = $x_star")
b_parallel = A*x_star
b_perp     = b - b_parallel
println(".  b_parallel                   = $b_parallel")
println(".  b_perp                       = $b_perp")

dist_b_colspace_A = norm(b_perp)
println("\nThe distance between b and C(A) = $(round(dist_b_colspace_A,digits=2))")

Given a set of vectors, write them into a matrix as columns A =
 1   3  4
 5   3  8
 1  -1  0
 2   2  4

The span of the vectors is { w = A x}

The distance of b=[15, 23, -2, 9] to the span of vectors is min || b - A x||,
i.e., || b_perp ||


Solving the normal equation for A x = b, we find
.  a particular solution x_star = [0, 2, 2]
.  b_parallel                   = [14, 22, -2, 12]
.  b_perp                       = [1, 1, 0, -3]

The distance between b and C(A) = 3.32


In [182]:
LAcodes.title(L"b_\perp"*" is shorter than any other "*L"b_{other} = b - A x")
# lengths of b - A x for 50_000 random vectors x (one column per sample)
lengths = mapslices( norm, b .- 0.5A*randn(3, 50_000),dims= 1)
println( "The minimum random distance was $(minimum(lengths))")
println( "                    compared to $dist_b_colspace_A")

The minimum random distance was 3.334793149958816
                    compared to 3.3166247903554


### 2.1.3 Special Case: the columns of $A$ are mutually orthogonal

The equations simplify considerably when the columns $\{ a_1, a_2, \dots a_N \}$ of $A$ are **mutually orthogonal**, i.e., when
$$
a_i \cdot a_j = \left\{ \begin{align} \lVert a_i \rVert^2 \quad  &\ \text{ when } i = j\\ 0 \quad &\ \text{ otherwise} \end{align} \right.
$$

The normal equation takes the form
$$
A^t A x = A^t b \Leftrightarrow D x = A^t b,
$$
where  $D$ is a diagonal matrix
$$
D = \begin{pmatrix} \lVert a_1 \rVert^2 & 0                   & \dots  & 0 \\
                    0                   & \lVert a_2 \rVert^2 & \dots  & 0 \\
                    \vdots              & \vdots              & \ddots & \vdots \\
                    0                   & 0                   & \dots  & \lVert a_N \rVert^2 \end{pmatrix}
$$

Assuming $a_i \ne 0, i =1,2, \dots N$, we obtain the following solution:
<div style="background-color:#F2F5A9">

For mutually orthogonal vectors $a_i, i=1,\dots N$, the normal equations reduce to
$$
\begin{align}
x_i          =& \frac{ b \cdot a_i }{ a_i \cdot a_i } \\
b_\parallel  =&\sum_{i=1}^{N}{ \frac{ b \cdot a_i }{ a_i \cdot a_i} a_i } \\
b_\perp      =& b - \sum_{i=1}^{N}{ \frac{ b \cdot a_i }{ a_i \cdot a_i} a_i } \\
\end{align}
$$

**Remark**: the equations simplify even further when the $a_i$ are mutually orthonormal, i.e., when $a_i \cdot a_i = 1.$
</div>
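
The formulas for mutually orthogonal columns can be tried on a small example. The vectors below are hypothetical, and rational arithmetic (`//`) is used so the results are exact; the last line compares against a floating-point solve of the full normal equation.

```julia
using LinearAlgebra

a = [[1, 1, 0], [1, -1, 0]]          # mutually orthogonal: a1 ⋅ a2 = 0
b = [3, 1, 5]

# each coefficient is computed independently of the others
x = [dot(b, ai) // dot(ai, ai) for ai in a]

b_par  = sum(x[i] * a[i] for i in eachindex(a))
b_perp = b - b_par

println("x      = ", x)
println("b_par  = ", b_par)
println("b_perp = ", b_perp)

A = [a[1] a[2]]
println("normal equation gives x = ", (A'A) \ (A'b))
```

No linear system has to be solved: the diagonal matrix $D$ turns the normal equation into $N$ independent divisions.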

# 3. Projection Matrices, Orthogonal Matrices and Unitary Matrices

## 3.1 Orthogonal Projection Matrices

### 3.1.1 Theory

Let's take another look at the normal equation for a matrix $A$ of size $M \times N$ that has full column rank, $rank(A) = N$.<br>
(We know that we only need one solution of the normal equation, so we could omit all dependent columns in any given matrix.)

The matrix $A^t A$ is square of size $N \times N$. Since $\mathscr{N}(A^t A) = \mathscr{N}(A) = \{ 0 \}$, it has full rank and is therefore invertible: we therefore have
$$
   \begin{align}
   &A^t A x     &=&\ A^t b  &\Leftrightarrow x = (A^t A)^{-1} A^t b\\
   &b_\parallel &=&\ A (A^t A)^{-1} A^t b  = P b& \\
   &b_\perp     &=&\ b - b_\parallel = (I - P) b,&
   \end{align}
$$
where we have set $P = A (A^t A)^{-1} A^t$.

We see that this matrix $P$ projects the vector $b$ orthogonally onto the column space $\mathscr{C}(A)$.

<div style="background-color:#F2F5A9">

Let $\{ a_1, a_2, \dots a_N \}$
be a set of linearly independent vectors, and let
$A = \left( a_1\ a_2\ \dots \ a_N \right)$.

The orthogonal projection matrix onto the span of the vectors $a_i, i=1,2, \dots N$ is given by
$$
P = A \left( A^t A \right)^{-1} A^t
$$
The orthogonal projection matrix onto the orthogonal complement of the span of the vectors
$a_i, i=1,2, \dots N$ is given by
$$
P_\perp = I - A \left( A^t A \right)^{-1} A^t
$$
    
</div>
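
A short sketch of the formula $P = A (A^t A)^{-1} A^t$ in floating point, assuming full column rank. The helper name `projector` and the test vector `b` are introduced here for illustration; the checks $P^2 = P$ and $P^t = P$ are standard properties of orthogonal projection matrices that are not stated in the text above.

```julia
using LinearAlgebra

# orthogonal projection matrix onto C(A); assumes A has full column rank
projector(A) = A * inv(A' * A) * A'

A = [1.0 1.0; 0.0 1.0; -1.0 0.0]
P = projector(A)
b = [1.0, 2.0, 3.0]

println("P² ≈ P:            ", P * P ≈ P)       # projecting twice changes nothing
println("Pᵗ ≈ P:            ", P' ≈ P)          # symmetric
println("P A ≈ A:           ", P * A ≈ A)       # vectors in C(A) are left unchanged
println("Aᵗ (I - P) b ≈ 0:  ", norm(A' * ((I - P) * b)) < 1e-12)
```

In numerical practice one would usually avoid forming `inv(A' * A)` explicitly; it is written this way only to match the formula.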

### 3.1.2 Examples

In [201]:
LAcodes.title("Projection Matrix Onto a Line")
v = [1; 1; 2]
P = v * inv(v'*v) * v'
# For a single vector, this reduces to
P = v * v' // (v'*v)

LAcodes.title( "The projection matrix onto the line span{$v} is", sz=10 )
LAcodes.ge_layout( P )
LAcodes.title( "The projection matrix onto the plane orthogonal to span{$v} is", sz=10 )
LAcodes.ge_layout( I-P )

println("Check P v = v: $(P*v - v)")
println("Check (I - P) v = 0: $((I-P)*v)")

 1/6   1/6   1/3
 1/6   1/6   1/3
 1/3   1/3   2/3


 5/6  -1/6  -1/3
-1/6   5/6  -1/3
-1/3  -1/3   1/3


Check P v = v: Rational{Int64}[0//1, 0//1, 0//1]
Check (I - P) v = 0: Rational{Int64}[0//1, 0//1, 0//1]


In [227]:
LAcodes.title("Projection Matrix Onto a Plane")
v1  = [1; 0; -1]
v2  = [1; 1;  0]
A   = [v1 v2]

# find the inverse of A'A (over the rationals, to make it easy to see)
AtA    = A'A
invAtA = [ AtA[2,2] -AtA[1,2]; -AtA[2,1] AtA[1,1] ] // (AtA[1,1]*AtA[2,2]-AtA[1,2]*AtA[2,1])
#println("Check the inverse: inv(A'A)*(A'A) == I"); Base.print_matrix(stdout, invAtA*AtA)

P  = A * invAtA * A'

LAcodes.title( "The projection matrix onto the plane spanned by {v1, v2} is", sz=10 )
LAcodes.ge_layout( P )
LAcodes.title( "The projection matrix onto the line orthogonal to the plane spanned by {v1, v2} is", sz=10 )
LAcodes.ge_layout( I-P )

# P must reproduce every vector in the plane, i.e., the columns v1, v2 of A
println("Check P A = A: $(P*A == A)")
println("Check (I - P) A = 0: $(iszero((I-P)*A))")

 2/3   1/3  -1/3
 1/3   2/3   1/3
-1/3   1/3   2/3


 1/3  -1/3   1/3
-1/3   1/3  -1/3
 1/3  -1/3   1/3


Check P A = A: true
Check (I - P) A = 0: true


**Remark:** since the equation for the projection onto a line is simpler,
the problem above might also be approached by computing a single basis vector for $\mathscr{N}(A^t)$
and using it to obtain the orthogonal projection matrix onto this line

In [249]:
LAcodes.title( "Projection matrix onto "*L"$\mathscr{N}(A^t)$")
# GJ for [A I] to get the basis vector
rref_AI = Int64.(rref( [A [1 0 0; 0 1 0; 0 0 1]] ))
v3      = rref_AI[3,3:end]
LAcodes.ge_layout( rref_AI, pivots=[((1,-3), (2,-2))], col_divs=2)

 1  0 |  0   0  -1
 0  1 |  0   1   0
 0  0 |  1  -1   1


In [250]:
# now we have the single vector case
P_N_At = v3 * v3' //  (v3'*v3)
P_C_A  = I - P_N_At
LAcodes.title( "The projection matrix onto the plane spanned by {v1, v2} is", sz=10 )
LAcodes.ge_layout( P_C_A )
LAcodes.title( "The projection matrix onto the line orthogonal to the plane spanned by {v1, v2} is", sz=10 )
LAcodes.ge_layout( P_N_At )

 2/3   1/3  -1/3
 1/3   2/3   1/3
-1/3   1/3   2/3


 1/3  -1/3   1/3
-1/3   1/3  -1/3
 1/3  -1/3   1/3


## 3.2 Orthogonal Matrices and Unitary Matrices

### 3.2.1 A Naive Construction Method for Orthogonal Matrices

Can we find sets of mutually orthogonal vectors?

One way to do this is to use the fundamental theorem of linear algebra:<br>
The span of the columns of a matrix $A$ is orthogonal to the null space $\mathscr{N}(A^t)$

We can proceed one vector at a time:
* starting with a single vector $v_1$, set $A=(v_1)$ and use GE on $(A I)$ to find a vector $v_2 \in \mathscr{N}(A^t)$
* set $A=(v_1\; v_2)$ and use GE on $(A I)$ to find a vector $v_3 \in \mathscr{N}(A^t)$
* set $A=(v_1\; v_2\; v_3)$ and use GE $\dots$
We will have constructed a set of mutually orthogonal vectors

**Remark:** this method works well in exact arithmetic. It is not useful for actual computations (**see Gram-Schmidt for a better method**)
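
The one-vector-at-a-time idea can be sketched in floating point. Note the substitution: instead of Gaussian elimination on $(A\ I)$, this sketch calls `nullspace` from `LinearAlgebra` (which is SVD based) to obtain a vector in $\mathscr{N}(A^t)$; the function name `grow_orthogonal` is hypothetical.

```julia
using LinearAlgebra

# Append one vector from N(Aᵗ) at a time until the matrix is square.
function grow_orthogonal(v1::AbstractVector)
    A = reshape(float.(v1), :, 1)            # start with the single column v1
    while size(A, 2) < size(A, 1)
        N = nullspace(Matrix(A'))            # columns span N(Aᵗ)
        A = hcat(A, N[:, 1])                 # any vector of N(Aᵗ) will do
    end
    return A
end

A = grow_orthogonal([1, 2, -1, 1])
G = A' * A                                   # diagonal iff the columns are orthogonal

println("size(A) = ", size(A))
println("AᵗA ≈ diagonal: ", G ≈ Diagonal(diag(G)))
```

Because `nullspace` returns unit vectors, the added columns differ from the ones found by elimination, but they are mutually orthogonal all the same.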

In [273]:
LAcodes.title("An Example of Mutually Orthogonal Vectors")
v  = [1//1; 2; -1; 1]
A  = [v zeros(Int64,4,3) Matrix(1I,4,4)]
E1 = [1 0 0 0; -2 1 0 0; 1 0 1 0; -1 0 0 1] ; A1 = E1*A
LAcodes.title("first vector", sz=10)
LAcodes.ge_layout( A, [E1 A1], [], col_divs=4)

A[:,2]=A1[2,5:end]; A1=E1*A
E2 = [1//1 0 0 0; 0 1 0 0; 0 2//5 1 0; 0 -2//5 0 1] ; A2 = E2*A1
LAcodes.title("add in second vector, complete GE for this vector and the right hand side (nothing else changes)", sz=10)
LAcodes.ge_layout( A, [E1 A1; E2 A2], [], col_divs=4)

A[:,3]=A2[3,5:end]; A1=E1*A; A2=E2*A1
E3 = [1//1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 1//6 1] ; A3 = E3*A2
A[:,4]=A3[4,5:end]

LAcodes.title("add in the third vector, complete GE for this vector and the right hand side (nothing else changes)", sz=10)
LAcodes.ge_layout( A, [E1 A1; E2 A2; E3 A3], [], col_divs=4)
LAcodes.title("add in the last vector, no more computation required", sz=10)

0,1,2,3,4,5,6,7,8,9,10,11,12,13
,,,,,1,0,0,0,,1,0,0,0
,,,,,2,0,0,0,,0,1,0,0
,,,,,-1,0,0,0,,0,0,1,0
,,,,,1,0,0,0,,0,0,0,1
1.0,0.0,0.0,0.0,,1,0,0,0,,1,0,0,0
-2.0,1.0,0.0,0.0,,0,0,0,0,,-2,1,0,0
1.0,0.0,1.0,0.0,,0,0,0,0,,1,0,1,0
-1.0,0.0,0.0,1.0,,0,0,0,0,,-1,0,0,1


0,1,2,3,4,5,6,7,8,9,10,11,12,13
,,,,,1,-2,0,0,,1,0,0,0
,,,,,2,1,0,0,,0,1,0,0
,,,,,-1,0,0,0,,0,0,1,0
,,,,,1,0,0,0,,0,0,0,1
1.0,0,0.0,0.0,,1,-2,0,0,,1,0,0,0
-2.0,1,0.0,0.0,,0,5,0,0,,-2,1,0,0
1.0,0,1.0,0.0,,0,-2,0,0,,1,0,1,0
-1.0,0,0.0,1.0,,0,2,0,0,,-1,0,0,1
1.0,0,0.0,0.0,,1,-2,0,0,,1,0,0,0
0.0,1,0.0,0.0,,0,5,0,0,,-2,1,0,0


0,1,2,3,4,5,6,7,8,9,10,11,12,13
,,,,,1,-2,1 ⁄ 5,-1 ⁄ 6,,1,0,0,0
,,,,,2,1,2 ⁄ 5,-1 ⁄ 3,,0,1,0,0
,,,,,-1,0,1,1 ⁄ 6,,0,0,1,0
,,,,,1,0,0,1,,0,0,0,1
1.0,0,0,0.0,,1,-2,1 ⁄ 5,0,,1,0,0,0
-2.0,1,0,0.0,,0,5,0,0,,-2,1,0,0
1.0,0,1,0.0,,0,-2,6 ⁄ 5,0,,1,0,1,0
-1.0,0,0,1.0,,0,2,-1 ⁄ 5,0,,-1,0,0,1
1.0,0,0,0.0,,1,-2,1 ⁄ 5,0,,1,0,0,0
0.0,1,0,0.0,,0,5,0,0,,-2,1,0,0


In [275]:
LAcodes.title("Check that the columns of the resulting matrix are orthogonal", sz=10)
A=A[:,1:4]
LAcodes.ge_layout(A'A)

 7  0   0    0
 0  5   0    0
 0  0  6/5   0
 0  0   0   7/6


**Definition:** An **orthogonal matrix** is a square matrix $Q$ with *orthonormal* columns: $Q^t Q = I$.<br>
                A **unitary matrix** is a square matrix $Q$ such that $Q^H Q = I.$ ($Q^H$ is the conjugate transpose of $Q$, aka the hermitian transpose.)

Since $Q$ is square and $Q^t Q = I$, we have $Q^{-1} = Q^t$

**Remark:** the matrix must be square. Otherwise $Q^t$ is a left inverse only!
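
The remark about squareness can be illustrated with two small matrices. In this sketch `Q` is the $2 \times 2$ Haar matrix and `Q_tall` is a hypothetical $3 \times 2$ matrix with orthonormal columns; the identity matrices `I2` and `I3` are built explicitly for the comparisons.

```julia
using LinearAlgebra

Q      = [1 1; 1 -1] / sqrt(2)        # 2×2 Haar matrix: square, orthonormal columns
Q_tall = [1.0 0.0; 0.0 1.0; 0.0 0.0]  # 3×2: orthonormal columns, but not square

I2 = Matrix{Float64}(I, 2, 2)
I3 = Matrix{Float64}(I, 3, 3)

println("Q'Q ≈ I:            ", Q' * Q ≈ I2)
println("Q Q' ≈ I:           ", Q * Q' ≈ I2)
println("inv(Q) ≈ Q':        ", inv(Q) ≈ Q')
println("Q_tall'Q_tall ≈ I:  ", Q_tall' * Q_tall ≈ I2)
println("Q_tall Q_tall' ≈ I: ", Q_tall * Q_tall' ≈ I3)   # false: left inverse only
```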

### 3.2.2 Important Examples

<div style="background-color:#F2F5A9">

* A useful example of orthogonal matrices is the **Haar matrix**. See [wikipedia](https://en.wikipedia.org/wiki/Haar_wavelet)
* A useful example of a unitary matrix is the **Discrete Fourier Transform (DFT) matrix**. See [wikipedia](https://en.wikipedia.org/wiki/DFT_matrix)
    </div>

### 3.2.3 Important Properties of Orthogonal and Unitary Matrices

Applying an orthogonal or unitary matrix to a vector does not change the length of the vector,
and does not change the angle between vectors:<br>
computations with orthogonal matrices therefore do not amplify the size of the vectors involved, which guards against overflow.

**Theorem:** Let $Q$ be an orthogonal (or a unitary) matrix, and let $x$ and $y$ be vectors of consistent length
* $\lVert Q x \rVert =  \lVert x \rVert, \quad\quad$ i.e., lengths are conserved
* $\langle Q x, Q y \rangle = \langle x,y \rangle,\quad$ i.e., inner products, and hence angles, are conserved
* $Q x \perp Q y \Leftrightarrow x \perp y, \quad$ i.e., orthogonality is conserved.
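
The three properties can be checked numerically. In this sketch the rotation `Q`, the unitary matrix `U`, and the test vectors are hypothetical examples; `dot` conjugates its first argument, which is what the complex case requires.

```julia
using LinearAlgebra

θ = 0.3
Q = [cos(θ) -sin(θ); sin(θ) cos(θ)]      # rotation: an orthogonal matrix
U = [1 im; im 1] / sqrt(2)               # a unitary matrix, U'U = I

x, y = [3.0, 4.0], [-4.0, 3.0]           # x ⟂ y
z, w = [1 + 2im, 3 - 1im], [2 - 1im, 1im]

println("lengths:        ", norm(Q * x), " vs ", norm(x))
println("inner products: ", dot(Q * x, Q * y), " vs ", dot(x, y))
println("unitary case:   ", dot(U * z, U * w), " vs ", dot(z, w))
```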

## 3.3 Gram-Schmidt Orthogonalization

# 4. Gram-Schmidt Orthogonalization ...
$\dots$