```julia
using LinearAlgebra, RowEchelon, LaTeXStrings, Plots, SymPy
include("LAcodes.jl");
LAcodes.title( "Orthogonality", sz=30, color="darkred")
```

# 1. Adding Vector Length to Vector Spaces

## 1.1 Basic Definitions

The definitions below are carefully written to generalize to arbitrary vector spaces.

#### **Allow Both Real and Complex Numbers**

Adding the notion of the **length of a vector** to a vector space proceeds in two steps.<br>
In the following, we allow the scalars to be $\mathbb{F} = \mathbb{R}, \mathbb{Q}$ or $\mathbb{C}$.

To allow complex numbers, we need complex conjugation:<br>
a bar over an expression signifies its complex conjugate.

**Example:**  $\overline{3 + 2 i} = 3 - 2 i.$<br>
$\quad\quad\quad\;$ Note that for real numbers $\overline{x} = x.$

#### **Inner Products (Dot Product)**

<div style="background-color:#F2F5A9">
    
**Definition:** An **inner product space** is a vector space $V$ over the scalars $\mathbb{F}$
 with a function $\; \cdot : V \times V \rightarrow \mathbb{F}$<br> $\quad\quad$ with the following properties
$\forall x,y,z \in V, \; \forall \alpha\in \mathbb{F}:$
$$
\begin{align}
 &x \cdot y          &=& \;\overline{ y \cdot x } & \text{(conjugate symmetry)             } \\
 &x \cdot (\alpha y)  &=& \;\alpha x \cdot y       & \text{(linearity in the second argument)} \\
 &x \cdot ( y+z )     &=&\; x \cdot y + x \cdot z        & \\
\end{align}
$$
and
$$
x \cdot x\quad\quad\quad  = \left\{ \begin{align}& c > 0 & \quad x \ne 0\\ & 0 \quad & \text{ otherwise} \end{align} \right. \quad\quad \text{ (positive definite)}
$$
</div>

**Remark:** The complex conjugate is required for complex numbers, since the plain dot product without conjugation is not positive definite.
$$
\begin{align} (\alpha + \beta i ) \cdot (\alpha + \beta i ) & = \alpha^2 - \beta^2 + 2 \alpha \beta\; i  \\
 \overline{(\alpha + \beta i )} \cdot (\alpha + \beta i ) & = \alpha^2 + \beta^2  \\
\end{align}
$$
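
The remark can be checked numerically. The following is a minimal sketch, assuming Julia's standard `LinearAlgebra` library, whose `dot` conjugates its first argument; the one-component vector `z` is an illustrative choice.

```julia
using LinearAlgebra

z = [3 + 2im]            # illustrative one-component complex vector

plain = sum(z .* z)      # no conjugation: (3 + 2i)^2 = 5 + 12i, not a length
inner = dot(z, z)        # first argument conjugated: 9 + 4 = 13

println(plain)
println(inner)
```

Only the conjugated product is real and positive, so only it can serve as a squared length.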

> **Note** that for modulo 2 arithmetic, i.e., $\mathbb{F} = \mathbb{Z}_2$ the dot product is not positive definite.
> 
> <span style="color:red;"><strong>ALL THEOREMS FROM HERE ON REQUIRE THE POSITIVE DEFINITE PROPERTY of the dot product</strong></span>
>
> This is why the Fundamental theorem is broken into two parts!

#### **Distance**

Inner products can be used to define a **distance** function, i.e.,
<div style="background-color:#F2F5A9;float:left;width:18cm;">

**Definition:** A **metric** for a set $M$ is a function $d : M \times M \rightarrow \mathbb{R}$<br>
$\quad\quad$ with the following properties
$\forall x,y,z \in M:$
$$
\begin{align}
d(x,y) & = 0 \; \Leftrightarrow \; x = y & \\
d(x,y) & = d(y,x) & \\
d(x,y) & \le d(x,z) + d(z,y) & \quad\quad \text{ ( triangle inequality ) }\\
\end{align}
$$

**Remark**: The axioms for a metric guarantee $$d(x,y) \ge 0$$
</div><img src="NormAndDistance.svg" style="float:center;" width=250>

#### **Length of a Vector**

<div style="background-color:#F2F5A9;width:18cm;">

**Definition:** The **norm** of a vector $v$ in an inner product space
    $$\lVert v \rVert = \sqrt{ \overline{v} \cdot v }$$
**Definition:** The **distance** between two vectors $x$ and $y$ in an inner product space is
    $$ d(x,y) = \lVert x-y \rVert $$

</div>

##### **Remarks:**

* For the dot product in $\mathbb{R}^2$ and $\mathbb{R}^3$, this definition yields the Euclidean length of a vector. E.g.,
$$\lVert \begin{pmatrix}v_1\\v_2 \end{pmatrix} \rVert = \sqrt{ v_1^2 + v_2^2 }$$
* The definition of the norm from the inner product shows that
$$
\lVert \alpha v \rVert = \sqrt{ \overline{\alpha v} \cdot \alpha v  } = \ \lvert \alpha \rvert \ \lVert v \rVert
$$

##### **Example:**

Let $$u = \begin{pmatrix} 2 \\ 5 \end{pmatrix}, v = \begin{pmatrix} 3 \\ -1 \end{pmatrix}.$$

$$
\lVert u \rVert = \sqrt{ 2^2 + 5^2 } = \sqrt{29}, \quad \lVert v \rVert = \sqrt{ 3^2 + (-1)^2 } = \sqrt{10}.
$$


$$\lVert 2 u \rVert = \sqrt{ \begin{pmatrix} 4\\10 \end{pmatrix} \cdot \begin{pmatrix} 4\\10 \end{pmatrix} } = \sqrt{ 4 \begin{pmatrix} 2\\5 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 5 \end{pmatrix} } = 2 \sqrt{ \begin{pmatrix} 2 \\ 5 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 5 \end{pmatrix} }$$ 

<div style="background-color:#F2F5A9">

**Definition:** A **unit vector** is a vector with norm equal to 1.
    
Such a vector may be constructed from any non-zero vector $u$ with
$$
\hat{u} = \frac{1}{\lVert u \rVert} u
$$
</div>

$$
\begin{align}
    u = \begin{pmatrix} 2 \\ 5 \end{pmatrix} & \Rightarrow \lVert u \rVert = \sqrt{29} \\
                                             & \Rightarrow \hat{u} =\frac{1}{\sqrt{29}}
                                                            \begin{pmatrix} 2 \\ 5 \end{pmatrix}.
\end{align}
$$
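
The norm and unit vector computations above can be reproduced with a short sketch, assuming `norm` from Julia's standard `LinearAlgebra` library; the variable names are illustrative.

```julia
using LinearAlgebra

u = [2, 5]

len_u  = norm(u)         # Euclidean length sqrt(2^2 + 5^2) = sqrt(29)
len_2u = norm(2u)        # scaling the vector by 2 scales the length by |2|
u_hat  = u / norm(u)     # unit vector in the direction of u

println((len_u, len_2u, norm(u_hat)))
```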

## 1.2 Inequalities, Angle, Orthogonal Vectors

#### **Orthogonal Vectors**

<div style="float:left;">

**Constructing Orthogonal Vectors in <span style="color:red;">2D and 3D</span>:**

$$
\begin{align}
\lVert u+v \rVert = \lVert u-v \rVert
& \Leftrightarrow \sqrt{ (u+v) \cdot (u+v) } =  \sqrt{ (u-v) \cdot (u-v) } \\
& \Leftrightarrow      { (u+v) \cdot (u+v) } =       { (u-v) \cdot (u-v) } \\
& \Leftrightarrow      { \lVert u \rVert^2 + \lVert v \rVert^2 + 2 u \cdot v } = 
                       { \lVert u \rVert^2 + \lVert v \rVert^2 - 2 u \cdot v } \\
& \Leftrightarrow      { u \cdot v = 0 } \\ 
\end{align}
$$

**Orthogonal Vectors:**
$$u \perp v \Leftrightarrow u \cdot v = 0$$

</div><div style="float:right;"><img src="OrthogonalDirection.svg" width=300></div>

**Remark:** The construction shown allows the vector $v = 0$.

<div style="background-color:#F2F5A9;">

**Definition:** The **zero vector** is orthogonal to every vector in the inner product space.
</div>

#### **Generalization: Angles Between Vectors**

<div style="background-color:#F2F5A9">

**Theorem: (Cauchy-Schwarz Inequality)** The inner product between two vectors $u$ and $v$ satisfies $$\lvert \overline{u} \cdot v \rvert \le \lVert u \rVert \ \lVert v \rVert$$
</div>

**Remark:** The inequality holds trivially (with equality) if either $u = 0$ or $v = 0$. When neither vector is zero and the inner product is real, we can rewrite this as
$$
-1 \le \frac{ \overline{u} \cdot v }{\lVert u \rVert \ \lVert v \rVert} \le 1
$$

In $\mathbb{R}^2$ this quotient is the cosine of the angle between the vectors $u$ and $v$.<br>
We therefore generalize this to
<div style="background-color:#F2F5A9">

$$
\cos ( \angle (u,v) ) = \frac{ \overline{u} \cdot v }{\lVert u \rVert \ \lVert v \rVert}, \quad \text{ where } u \ne 0, v \ne 0
$$
</div>

**Remarks:**
* orthogonal non-zero vectors have $\; \cos 90^\circ = 0$,<br>so we recover our previous result
$$ u \cdot v = 0 \Leftrightarrow u \perp v $$
* the definition for the angle is frequently rewritten
$$ \overline{u} \cdot v = \lVert u \rVert\ \lVert v \rVert \ \cos ( \angle (u,v) )$$
which holds for all vectors $u, v$ (including zero vectors). 

##### **Example:**

Let $u = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix}, \;\; v = \begin{pmatrix} 4 \\ 1 \\ 1 \end{pmatrix}$

The **angle between $u$ and $v$**  is $\theta = \arccos \frac{u \cdot v}{ \lVert u \rVert\ \lVert v \rVert } 
 = \arccos \frac{12}{ \sqrt{35}\ \sqrt{18} } 
 \approx 61.44^\circ$

The **distance from $u$ to $v$** is $\quad\quad\quad\quad\quad\quad\lVert v - u \rVert =\ \lVert\ \begin{pmatrix} 3 \\ -4 \\ -2 \end{pmatrix}\ \rVert\; \approx 5.39$

A **detour** via $w = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ increases the distance to $\lVert v-w \rVert + \lVert w-u \rVert = 3 + \sqrt{20} \approx 7.47$
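
The angle, the distance, and the detour in this example can be checked with a small sketch, assuming `dot` and `norm` from `LinearAlgebra` and the built-in `acosd`, which returns degrees; the variable names are illustrative.

```julia
using LinearAlgebra

u = [1, 5, 3]
v = [4, 1, 1]
w = [1, 1, 1]

cos_theta = dot(u, v) / (norm(u) * norm(v))   # 12 / (sqrt(35) * sqrt(18))
theta_deg = acosd(cos_theta)                  # angle between u and v in degrees

direct = norm(v - u)                          # sqrt(29)
detour = norm(v - w) + norm(w - u)            # 3 + sqrt(20)

println((theta_deg, direct, detour))
```

The quotient lies in $[-1, 1]$, as the Cauchy-Schwarz inequality requires, and the detour is longer than the direct distance, as the triangle inequality requires.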

# 2. Fundamental Theorem of Linear Algebra (Part 2)

## 2.1 Main Definitions and Theorem

### 2.1.1 Linear Independence of Orthogonal Vectors

<div style="float:left;background-color:#F2F5A9;width:15cm;">

**Theorem:** Non-zero orthogonal vectors are **linearly independent.**

</div><div style="float:center;">
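$\quad\quad$ Let $u \perp v$, i.e., $u \cdot v = 0,$ with $u \ne 0$ and $v \ne 0.$
    
$\quad\quad\begin{align}
\alpha u + \beta v = 0 & \Rightarrow \alpha u \cdot u + \beta u \cdot v = 0 \\
                       & \Rightarrow \alpha \lVert u \rVert^2 = 0 \\
                       & \Rightarrow \alpha = 0 \quad \text{ since } \lVert u \rVert \ne 0.\\
\end{align}$

$\quad\quad$ Taking the dot product with $v$ instead of $u$ shows $\beta = 0$ in the same way.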
</div>

<div style="float:left;background-color:#F2F5A9;width:15cm;">

**Fundamental Theorem:** Given a matrix $A$ in $\mathbb{R}^{M \times N}$
* Any two vectors $r \in \mathscr{R}(A), n \in \mathscr{N}(A)$ are orthogonal, i.e., $r \perp n$
* Any two vectors $c \in \mathscr{C}(A), \tilde{n} \in \mathscr{N}(A^t)$ are orthogonal, i.e., $c \perp \tilde{n}$


---
**Remark:**
* The sketch of the Fundamental Theorem presented previously<br>
    depicts this situation accurately!<br><br>
    **Vectors in the two fundamental spaces<br>in the domain and the codomain are orthogonal**.
</div>
<div style="float:center;">$\quad$<img src="FundamentalTheorem_0.svg" width=250></div>

> Start with a vector in the nullspace $\mathscr{N}(A),$ i.e., $A x = 0:$
> 
> $\quad\quad A x = 0 \Rightarrow \text{rows of } A \cdot x = 0, $ so each row of $A$ is orthogonal to $x$!

> Since any vector in the row space can be written as a linear combination of the rows,<br>
> $\quad\quad (\alpha_1 R_1 + \alpha_2 R_2 + \dots \alpha_M R_M ) \cdot x = 0.$

> $ \therefore \quad$ **any vector in $\mathscr{R}(A)$ is orthogonal to any vector in $\mathscr{N}(A)$**.

> **Remark:**<br>
> Given one or more vectors $a_i, i=1,2, \dots n$, we now know<br>
> **how to find a vector that is orthogonal**
> to the hyperplane $span \{ a_1, a_2, \dots a_n \}:$
>
> * write the $a_i$ into a matrix $A$ as rows, and find a vector in the nullspace (a homogeneous solution of $A x = 0$ ).
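
The recipe in this remark can be sketched as follows, assuming `nullspace` from `LinearAlgebra`, which returns an orthonormal basis of the null space; the two vectors `a1` and `a2` are illustrative choices, not taken from the text above.

```julia
using LinearAlgebra

a1 = [1, 2, -1]          # illustrative vectors spanning a plane in R^3
a2 = [2, 2, 1]

A = [a1'; a2']           # write the vectors into A as rows
N = nullspace(A)         # columns form an orthonormal basis of N(A)
x = N[:, 1]              # one homogeneous solution of A x = 0

println(A * x)           # both entries are zero up to rounding
```

Here `x` is a unit vector, a multiple of $(-4, 3, 2)$, and it is orthogonal to both `a1` and `a2`.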

### 2.1.2 Mutually Orthogonal Vectors

##### **Example**

> Look at the following 3 Vectors<br><br>
> $ a_1 = \begin{pmatrix} -4\\ -8\\  1 \end{pmatrix}, \;
  a_2 = \begin{pmatrix}  7\\ -4\\ -4 \end{pmatrix}, \;
  a_3 = \begin{pmatrix}  4\\ -1\\  8 \end{pmatrix} \quad \Rightarrow \quad
 \left\{ \begin{align}  a_1 \cdot a_1 =  81, &\quad a_2 \cdot a_2 =  81, \quad a_3 \cdot a_3 = 81, \\
                        a_1 \cdot a_2 = \;0, &\quad a_1 \cdot a_3 = \;0, \quad\; a_2 \cdot a_3 = \;0. \end{align} \right.
$

> The dot products show
> * each of the vectors $a_i$ has length $\lVert a_i \rVert = \sqrt{ a_i \cdot a_i } = 9.$
> * each of the vectors $a_i$ is orthogonal to the other two: $a_i \cdot a_j = 0\;$ for $i \ne j$.

<div style="background-color:#F2F5A9;float:left;width:12cm;">

**Definition:** A set of vectors $a_1, a_2, \dots a_N$ is **mutually orthogonal**$\quad$<br>
$\quad\quad$ iff for all $i, j$ in $1,2,\dots N,$<br><br>
$$
a_i \cdot a_j = \left\{ \begin{align} \; \lVert a_i \rVert^2 \ne 0 \quad  &\ \text{ when } i = j\\ 0 \quad &\ \text{ otherwise} \end{align} \right.
$$

</div>
<div style="float:right;">

**Remarks:**
* If the vectors $a_i$ are the columns of a matrix $A$, then<br>
$\quad\quad$  **the matrix $A^t A$ is diagonal**<br>
$\quad\quad$  with entry $i,i$ equal to $a_i \cdot a_i = \lVert a_i \rVert^2 .$
* the matrix $A^t A$ is symmetric

</div>

##### **Example Revisited**

<div style="float:left;width:12cm">
> Let $A = \begin{pmatrix} -4 & 7 & 4 \\ -8 & -4 & -1 \\  1 & -4 & 8 \end{pmatrix} \Rightarrow
A^t A = \begin{pmatrix} 81 & 0 & 0 \\ 0 & 81 & 0 \\ 0 & 0 & 81 \end{pmatrix}
$
</div><div style="float:right;">

**Remark:** the example created a square matrix $A$.<br>$\quad\quad$ We can have more entries in a vector<br>
$\quad\quad$ than vectors, e.g.,<br><br>
$\quad\quad A = ( a_1 \; a_2 )$.
    
The resulting matrix $A^t A$ is always **square.**
</div>

##### **Important Remarks:**

* A set of mutually orthogonal vectors is often called an **orthogonal set** of vectors
* The vectors in an **orthogonal set** are **linearly independent.**

* The vectors in an **orthogonal set form a basis** for their span.
* If the vectors in an **orthogonal set** are **unit vectors**, then $A^t A = I$.<br>
  $\quad\quad$ Think $i,j$ coordinate vectors, possibly in $\mathbb{R}^3$.
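
These remarks can be verified for the example matrix with a brief sketch, assuming `isdiag` and `diag` from `LinearAlgebra`; the names `G` and `Q` are illustrative.

```julia
using LinearAlgebra

A = [-4  7  4;
     -8 -4 -1;
      1 -4  8]

G = A' * A               # entry (i, j) is the dot product a_i ⋅ a_j
Q = A / 9                # each column has length 9, so Q has unit columns

println(G)
println(Q' * Q)          # identity matrix up to rounding
```

The matrix `G` is diagonal with entries $\lVert a_i \rVert^2 = 81$, and after normalizing the columns the product becomes the identity.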

### 2.1.3 Orthogonal Spaces

<div style="background-color:#F2F5A9;float:left;width:15cm">

**Definition:** A vector space $U$ is orthogonal to a vector space $V$<br>
    $\quad\quad$ iff $\forall u \in U, \forall v \in V, \; u \perp v$

**Definition:** Let $U$ be a subspace of $V$.<br>
    $\quad\quad$ The **orthogonal complement** $U^\perp = \left\{ v \in V \mid \forall u \in U, \ v \perp u \right\}$.

**Theorem:** Given a vector space $U$, then $(U^\perp)^\perp = U$.<br>
**Theorem:** Given two vector spaces $U$ and $V$ such that $U^\perp = V$, then $V^\perp = U$.

</div>
<div style="float:right;"><img src="FundamentalTheorem_1.svg" width=250></div>

<div style="background-color:#F2F5A9;float:left;width:15cm">

**Fundamental Theorem:** Given a matrix $A \in \mathbb{R}^{M \times N}$<br>
    $\quad\quad$ then $\mathscr{R}(A)^\perp = \mathscr{N}(A)$ in $\mathbb{R}^N$.<br>
**Fundamental Theorem:** Given a matrix $A \in \mathbb{R}^{M \times N}$<br>
    $\quad\quad$  then $\mathscr{C}(A)^\perp = \mathscr{N}(A^t)$ in $\mathbb{R}^M$.

**Fundamental Theorem:** Let $A$ be a matrix of size $M \times N.$<br>
    $\quad\quad$  The union of the bases for $\mathscr{C}(A)$ and $\mathscr{N}(A^t)$ is a basis for $\mathbb{R}^M$.<br>
**Fundamental Theorem:** Let $A$ be a matrix of size $M \times N.$<br>
    $\quad\quad$  The union of the bases for $\mathscr{R}(A)$ and $\mathscr{N}(A)$   is a basis for $\mathbb{R}^N$.
</div>
<div style="float:right;"><img src="FundamentalTheorem_2.svg" width=250></div>

## 2.2 Use the Fundamental Theorem to Decompose a Vector (Naive Method)

Let $A$ be a matrix of size $M \times N$ with rank $r$.<br><br>
Let $\left\{ c_1, c_2, \dots c_r \right\}$ be a basis for $\mathscr{C}(A)$,<br>
and $\left\{ ñ_1, ñ_2, \dots ñ_{M-r} \right\}$ be a basis for $\mathscr{N}(A^t)$.<br>

The combined basis $\quad\left\{\  c_1, c_2, \dots c_r,\;  ñ_1, ñ_2, \dots ñ_{M-r} \ \right\}$ $\quad\quad$
is a basis for $\mathbb{R}^M$.

##### **Example:**

> Let $A = \begin{pmatrix}  1  & 2 \\ 2 & 2 \\ -1 & 1 \end{pmatrix}$.
>
> A quick computation shows
> $$
\text{basis } \mathscr{C}(A) = \left\{\;
c_1= \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \; c_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} \; \right\},\quad\quad
\text{basis } \mathscr{N}(A^t) = \left\{\ \tilde{n}\ = \ \begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix} \; \right\}.
$$

> Combining the two bases yields a basis for $\mathbb{R}^3$:
> $$
\text{basis } \mathbb{R}^3 = \left\{\;
\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \;  \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \;
\begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix} \; \right\}.
$$
>
> $\therefore\quad$ **Any vector $b$ in $\mathbb{R}^3$ can be written as a unique linear combination**
$\quad
b = \alpha_1 c_1 + \alpha_2 c_2 + \alpha_3 \tilde{n}. 
$

#### **Conclusion**

##### **Sketch**

<div style="background-color:#F2F5A9;float:left;">

A consequence of the Fundamental Theorem Part II for a matrix $A \in \mathbb{R}^{M \times N}$:
    
Given  $\left\{ c_1, c_2, \dots c_r \right\},$ a basis for $\mathscr{C}(A)$,<br>
and $\left\{ ñ_1, ñ_2, \dots ñ_{M-r} \right\},$ a basis for $\mathscr{N}(A^t)$.<br>

> Any vector $b \in \mathbb{R}^M$ can be written as a linear combination of these vectors:
$$
\begin{align}
&b           \; = \color{red}{b_{\parallel}} + \color{blue}{b_{\perp}},  &\\
 \text{ where }\quad &\color{red}{b_{\parallel}  = \alpha_1 c_1 + \alpha_2 c_2 \dots + \alpha_r c_r} &\\
 \text{ and }\quad &\color{blue}{b_\perp        = \beta_1 ñ_1 + \beta_2 ñ_2 \dots + \beta_{M-r} ñ_{M-r}}.&
\end{align}
$$
</div>
<div style="float:right;">
The result is depicted in the following Figure:<br>
    (note this is the codomain of $y=A x$ turned on its side)<br><br>
<img src="./NormalEquations.svg"  width="350">
</div>

<div style="float:left;">$\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad$</div>
<div style="float:left;">

* the red vector $b_\parallel$ is the part of $b$<br>
that lies in the $\mathscr{C}(A)$ hyperplane (the linear combination formed with $\alpha_i c_i$)<br>
it is the orthogonal projection $Proj_{\mathscr{C}(A)}^\perp b$ onto the column space $\mathscr{C}(A)$
* the blue vector $b_\perp$ is the part of $b$<br>
that lies in the $\mathscr{N}(A^t)$ hyperplane (the linear combination formed with $\beta_j ñ_j$)
* these two vector components are orthogonal.
</div>

##### **Example Continued**

> **Decompose a Vector into Two Orthogonal Components ( <span style="color:red;"> Naive Method</span> )** 

> $\quad\quad\;$ Let $A = \begin{pmatrix}  1  & 2 \\ 2 & 2 \\ -1 & 1 \end{pmatrix}$, the matrix from the previous example.
>
> **Step 1:** We had found the bases
>
>
> $$
\text{basis } \mathscr{C}(A) = \left\{\;
c_1= \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \; c_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} \; \right\}, \quad\quad
\text{basis } \mathscr{N}(A^t) = \left\{\ \tilde{n}\ = \ \begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix} \; \right\}.
$$

> **Step 2:** Split the vector $b = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}\;$
into two orthogonal components: $b = \color{red}{b_{//}} + \color{blue}{b_\perp}\;$,
where $\color{red}{b_{//}}$ is in $\mathscr{C}(A)$ and $\color{blue}{b_\perp}$ is in $\mathscr{N}(A^t)$
 

> $\quad\quad$ We need to solve $\quad b = \color{red}{ \alpha_1 c_1 + \alpha_2 c_2 } + \color{blue}{ \alpha_3 \tilde{n}} \; \Leftrightarrow \;
\begin{pmatrix} \color{red}1 & \color{red}2 & \color{blue}{-4} \\ \color{red}2 & \color{red}2 & \color{blue}3 \\  \color{red}{-1} &  \color{red}1 &  \color{blue}2 \end{pmatrix}
\begin{pmatrix} \color{red}{\alpha_1} \\ \color{red}{\alpha_2} \\ \color{blue}{\alpha_3} \end{pmatrix} = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}.
$

> $\quad\quad$ This yields
> $$
\begin{align}
\color{red}{b_{//}}   &= \color{red}{3 c_1 - c_2} &=& \; \color{red}{\begin{pmatrix} 1 \\ 4 \\ -4 \end{pmatrix}} \\
\color{blue}{b_\perp} &= \color{blue}{\tilde{n}} &=& \; \color{blue}{\begin{pmatrix} -4 \\ 3 \\ 2 \end{pmatrix}} \\
\end{align}
$$

> **Check:** Let's check orthogonality: $\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\color{red}{b_{//}} \cdot \color{blue}{b_\perp} = 0.$
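
The naive method can be sketched in a few lines, assuming the backslash operator for solving the square system; the names `B`, `coef`, `b_par`, and `b_perp` are illustrative.

```julia
using LinearAlgebra

c1 = [1, 2, -1]          # basis of C(A)
c2 = [2, 2, 1]
n  = [-4, 3, 2]          # basis of N(A^t)
b  = [-3, 7, -2]

B    = [c1 c2 n]         # combined basis written as columns
coef = B \ b             # coefficients (alpha_1, alpha_2, alpha_3)

b_par  = coef[1] * c1 + coef[2] * c2   # component in C(A)
b_perp = coef[3] * n                   # component in N(A^t)

println((coef, b_par, b_perp, dot(b_par, b_perp)))
```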

> ---
> This computation was involved: we needed to
> * **Step 1:** find the bases of both $\mathscr{C}(A)$ and $\mathscr{N}(A^t)$
> * **Step 2:** Solve the resulting $A x = b$ type problem
> * **Step 3:** Split the resulting column view decomposition of $b = \color{red}{b_\parallel} + \color{blue}{b_\perp}.$
>
> **It turns out we can do better!**

## 2.3 Use the Fundamental Theorem to Decompose a Vector (Refinement)

### 2.3.1 Key Observation 1: Decomposing a Vector into Orthogonal Components

**Key observation:** decomposition of a vector $b$
<div style="float:left;">
> Any vector $b \in \mathbb{R}^M$ can be written as a linear combination:
$$
\begin{align}
                         &b                         \;&=\;& \color{red}{b_{\parallel}} + \color{blue}{b_{\perp}},\\
\text{ where }\quad\quad &\color{red}{b_{\parallel}}  &=\;& \color{red}{\alpha_1 c_1 + \alpha_2 c_2 \dots + \alpha_r c_r} &\\
\text{ and }  \quad\quad &\color{blue}{b_\perp}       &=\;& \color{blue}{\beta_1 ñ_1 + \beta_2 ñ_2 \dots + \beta_{M-r} ñ_{M-r}},&
\end{align}
$$
the $c_i$ vectors form a basis for a hyperplane containing $b_\parallel$,<br>
the $ñ_j$ vectors form a basis for the orthogonal complement of this hyperplane.
</div><div style="float:right;">
<img src="./NormalEquations.svg"  width="350">
</div>

----
> The basic idea is to replace the system of equations for the coefficients $\alpha_i, \beta_j$ by taking dot products with each of the $c_j$ in the column space $\mathscr{C}(A)$:
> $$
\begin{align}
(\xi) & \Leftrightarrow b &\;=\;& \color{red}{\alpha_1 c_1 + \alpha_2 c_2 \dots + \alpha_r c_r} + \color{blue}{b_\perp} \\
      & \Rightarrow \color{red}{c_j} \cdot b &\;=\;& \color{red}{\alpha_1 c_j \cdot c_1 + \alpha_2  c_j \cdot c_2 \dots + \alpha_r  c_j \cdot c_r} + \color{red}{c_j} \cdot \color{blue}{b_\perp}, \quad\quad j=1,2, \dots r
\end{align}
$$
>
> Since $\color{red}{c_j} \perp \color{blue}{b_\perp}$, the $\color{red}{c_j} \cdot \color{blue}{b_\perp} = 0$: **we are left with a set of equations that only involve the unknown coefficients $\color{red}{\alpha_i}$!**

##### **Example**

> Let's return to $A = \begin{pmatrix}  1  & 2 \\ 2 & 2 \\ -1 & 1 \end{pmatrix}$, which has
$
\text{basis } \mathscr{C}(A) = \left\{\;
c_1= \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \; c_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} \; \right\} \quad
$ and the vector $b = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}\;$

> Set $ \alpha_1  \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} + b_\perp = \begin{pmatrix} -3 \\ 7 \\ -2 \end{pmatrix}.$
>

> Taking the dot product with $c_1$ and $c_2$ yields
> $$ \left.
\begin{align}
c_1 : \quad & 6 \alpha_1 + 5 \alpha_2 =& 13 \\
c_2 : \quad & 5 \alpha_1 + 9 \alpha_2 =& 6 \\
\end{align}
\right\} \Leftrightarrow \left\{ \begin{aligned}
\alpha_1 =&\; 3 \\ \alpha_2 =& -1, \end{aligned} \right.
$$
the same solution we found before.
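
The same computation can be sketched in Julia: taking dot products with $c_1$ and $c_2$ amounts to multiplying by the transpose of the matrix whose columns are the $c_j$. The names `C`, `G`, and `rhs` are illustrative.

```julia
using LinearAlgebra

C = [ 1 2;
      2 2;
     -1 1]               # columns are c1 and c2
b = [-3, 7, -2]

G    = C' * C            # dot products c_i ⋅ c_j
rhs  = C' * b            # dot products c_i ⋅ b
coef = G \ rhs           # solves the 2 x 2 system for alpha_1, alpha_2

b_par  = C * coef
b_perp = b - b_par

println((G, rhs, coef, b_perp))
```

Only a $2 \times 2$ system is solved here, and the basis of $\mathscr{N}(A^t)$ is never needed.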

### 2.3.2 Key Observation 2: No Need to Identify the Column Space of $A$ and the Null Space of $A^t$

**Key observation:** we do **not need to identify the fundamental spaces!**

> Keeping all the columns of $A$ instead of just the pivot columns $c_j, j=1,2,\dots r$<br>
potentially yields an infinite number of ways of expressing the same solution $\color{red}{b_\parallel}:$<br>
We would get the same solution as before by setting the additional free variables equal to zero.

**Key observation:** remember that $b = \color{red}{b_\parallel} + \color{blue}{b_\perp}$.

> We are given vectors $\color{blue}{a_1, a_2, \dots a_n}$ and $b$.
>
> Once we compute $\color{red}{b_\parallel}$, we obtain $\color{blue}{b_\perp} = b - \color{red}{b_\parallel}.$

### 2.3.3 The Final Touch: Rewrite the Equations in Matrix Form

 Finally, observe that we can rewrite our equations in matrix form:
 
 $$
 b \;=\; \color{red}{\alpha_1 a_1 + \alpha_2 a_2 \dots + \alpha_N a_N} + \color{blue}{b_\perp}
 \Leftrightarrow b =  \color{red}{A x} + \color{blue}{b_\perp}.
 $$
 
 Taking the dot products with the columns $ \color{red}{a_j, \; j = 1, 2, \dots N}$ of $A$ yields the **normal equation**
 $$
 \color{red}{A^t A x = A^t b}
 $$
 

----
Any one solution will do: we have $ \color{red}{b_\parallel = A x}$ for some vector $x$,
 and thus $\quad b =  \color{red}{A x} +  \color{blue}{b_\perp}$.
  <br><br>
  * Multiplying $b =  \color{red}{A x} +  \color{blue}{b_\perp}$ by $A^t$ from the left still zeroes out the $ \color{blue}{b_\perp}$ term, since $\color{blue}{b_\perp} \in \mathscr{N}(A^t)$: we are left with the equations
> $$
   \begin{align}
   &A^t A x     &= A^t b &\quad \text{ known as the }\textbf{normal equation} \\
   &b_\parallel &= A x   &
   \end{align}\label{eq1}\tag{1}
$$
> To solve for $b_\perp$, it is sufficient to realize that $b = b_\parallel + b_\perp$, so
> $$
b_\perp  = b - b_\parallel \label{eq2}\tag{2}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad 
$$

# 3. The Normal Equation

## 3.1 Basic Properties of the Normal Equation

### 3.1.1 The Normal Equation

<div style="background-color:#F2F5A9;">
    
The **normal equation:** Given a set of vectors $\{ \; a_1, a_2 \dots a_k \; \}$ in $\mathbb{R}^n$,<br>
$\quad\quad$ let $A = ( a_1 \ a_2 \ \dots a_k ).$ The equation
    
$$A^t A x = A^t b$$
allows us to decompose vectors $b$ into two orthogonal components $$b = b_\parallel + b_\perp$$
such that $b_\parallel \in S = span\{  a_1, a_2 \dots a_k \}$ and
$b_\perp \in S^\perp$:
$$
   \begin{align}
   &A^t A x     &=&\ A^t b  \\
   &b_\parallel &=&\ A x    \\
   &b_\perp     &=&\ b - b_\parallel
   \end{align}.
$$
</div>

The key observation was that the multiplication of $b = A x + b_\perp$ by $A^t$ zeros out the $b_\perp$ term.

<div style="background-color:#F2F5A9">
    
**Theorem:** $\quad\quad\quad\quad\quad\mathscr{N}(A) = \mathscr{N}(A^t A)$<br><br>

</div>

which **guarantees that the set of solutions of the normal equation is identical to the set of solutions of $A x = b_\parallel$**.

<div style="float:left;">
The proof is of interest:

* Let $x$ be a solution of $A x = 0.$ Multiplying by $A^t$ from the left yields $A^t A x = 0,$<br>
    and therefore $\mathscr{N}(A) \subseteq \mathscr{N}(A^t A).$
* Let $x$ be a solution of $A^t A x = 0.$<br>
    $$\begin{align}
    A^t A x = 0 \Rightarrow & x^t A^t A x = 0\\
            \Rightarrow & (A x )^t (A x ) = 0 \\
            \Rightarrow & \lVert A x \rVert = 0 \\
            \Rightarrow & A x = 0,
    \end{align}
    $$
    and therefore $\mathscr{N}(A^t A) \subseteq \mathscr{N}(A).$
</div><div  style="float:right;"><img src="VennDiagram_AtA.svg" width=300>
</div>
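
The two steps of the proof can be illustrated numerically. This is a minimal sketch with an illustrative rank-deficient integer matrix (not taken from the text), so that all checks are exact.

```julia
using LinearAlgebra

A = [1 2 3;
     2 4 6]              # illustrative matrix of rank 1

x = [1, 1, -1]           # A x = 0, so x is in N(A)
y = [1, 0, 2]            # an arbitrary vector, not in N(A)

Ax   = A * x             # zero vector
AtAx = (A' * A) * x      # also zero: N(A) is contained in N(A^t A)

quad = y' * (A' * A) * y # the quadratic form used in the proof
sq   = dot(A * y, A * y) # equals the squared norm of A y

println((Ax, AtAx, quad, sq))
```

The identity `quad == sq` is the step $x^t A^t A x = \lVert A x \rVert^2$, which forces $A x = 0$ whenever $A^t A x = 0$.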

### 3.1.2 The Equivalent Minimization Problem

<div style="float:left;">
Since $b_\perp$ is orthogonal to every vector in $\mathscr{C}(A)$, the Pythagorean theorem shows that the solutions to the normal equations also solve
the problem
<div style="background-color:#F2F5A9">
    
$$
x^* = \arg\min_x { \lVert b - A x \rVert }
$$<br>
</div>
since $b_\perp = b - A x^*$ is the shortest vector from a point in $\mathscr{C}(A)$ to $b$.<br><br>
Note that the solution need not be unique: as before, we are interested in
$$
\begin{align}
   &b_\parallel &=&\ A x^*    \\
   &b_\perp     &=&\ b - b_\parallel,
\end{align}
$$
so homogeneous solutions do not enter.

<div style="background-color:#F2F5A9">

**Definition:** The distance $d( b, \mathscr{C}(A) ) = \lVert b_\perp \rVert.$<br><br>
    
</div></div><div style="float:right;"><img src="NormalEqMinDist.svg" width=350></div>

### 3.1.3 Example: Projection Onto a Hyperplane

##### **Problem**

Let $A = \begin{pmatrix}
 1 & 3 &4 \\ 
 5 & 3 &8 \\
 1 &-1 &0 \\
 2 & 2 &4
\end{pmatrix}, \quad$ and
$\; b = \begin{pmatrix} 15\\ 23\\ -2\\ 9 \end{pmatrix}.$

##### **Normal Equation**

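$$A^t A x = A^t b \;\Leftrightarrow\; \left( \begin{array}{rrr|r}
 31 & 21 & 52 & 146\\
 21 & 23 & 44 & 134\\
 52 & 44 & 96 & 280 \end{array} \right), \quad \text{ with particular solution } x =\begin{pmatrix} 0 \\ 2 \\ 2 \end{pmatrix}
$$

The third column of $A$ is the sum of the first two, so the solution is not unique; every solution yields the same $b_\parallel = A x$.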

##### **Split the $b$ Vector**

$b = b_\parallel + b_\perp,$ where

$$
b_\parallel = A x = \begin{pmatrix} 14 \\ 22 \\ -2 \\ 12 \end{pmatrix},\quad \text{ and } \;
b_\perp     = b - b_\parallel = \begin{pmatrix} 1 \\ 1 \\ 0 \\ -3 \end{pmatrix}; \quad\quad \textbf{Check orthogonality: }
b_\parallel \cdot b_\perp = 0.
$$

##### **Distance from $b$ to the hyperplane $\mathscr{C}(A)$**

$$
d(\ b,\ \mathscr{C}(A)\ )\ =\ \lVert b_\perp \rVert\ =\ \sqrt{11}.
$$
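
This example can be reproduced with a short sketch. Because the columns of $A$ are linearly dependent, $A^t A$ is singular; the sketch therefore uses `pinv` from `LinearAlgebra` to pick one solution of the normal equation. The relative tolerance `1e-10` is an assumption chosen for this small integer matrix, not a general recommendation.

```julia
using LinearAlgebra

A = [1  3 4;
     5  3 8;
     1 -1 0;
     2  2 4]
b = [15, 23, -2, 9]

x = pinv(A; rtol = 1e-10) * b   # one solution of A'A x = A'b (minimum norm)

b_par  = A * x                  # projection of b onto C(A)
b_perp = b - b_par              # component in N(A^t)
dist   = norm(b_perp)           # distance from b to C(A)

println((x, b_par, b_perp, dist))
```

The vector `x` returned here differs from $(0, 2, 2)$ by a homogeneous solution, yet `b_par` and `b_perp` agree with the values above.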

## 3.2 Special Case: Projection onto a Line

##### **Problem Statement**

<div style="float:left;width:40%">
Decompose a vector $b = b_\parallel + b_\perp$,  where

* $\quad\quad b_\parallel$ is a vector in the span$\{ a \}$, and $\quad\quad\quad\quad\quad$
* $\quad\quad b_\perp$ is a vector orthogonal to $a$.

**Remark:** Here $A = ( a ),$ $x = ( \alpha ) $
</div>
<img src="NormalProjOntoLine.svg" width=250 style="float:left;">
<div style="float:right;width:6cm;height:3.5cm;border:1px solid black;">
$\quad$ Consider<br>
$$\quad a = \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix}, \quad b =  \begin{pmatrix} 0 \\ 5 \\ 2 \end{pmatrix}.$$
</div>

##### **Step 1: Solve the Normal Equation**

<div style="float:left;width:40%">

$$
\begin{align}
\alpha\ a\ + \ b_\perp =\ b \;& \Rightarrow\;&
\alpha\ a \cdot a      = a \cdot b \; \\
& \Leftrightarrow\;& \alpha = \frac{ a \cdot b }{a \cdot a}.\\
\end{align}
$$
</div><div style="float:left;background-color:#F2F5A9;">

$$
\begin{align}
A x + \ b_\perp =\ b &\; \Rightarrow    \;& A^t A x  = A^t b \\
                     &\; \Leftrightarrow\;& x = \frac{ a \cdot b }{a \cdot a}.\\
\end{align}
$$
</div>
<div style="float:right;width:6cm;height:2cm;border:1px solid black;">
<br>
$$ \alpha =  \frac{4}{5}$$
</div>

##### **Step 2: Decompose the Vector**

<div style="float:left;width:40%;">
$$
\begin{align}
& b_\parallel        = \ \alpha\ a \quad  =  \frac{ a \cdot b }{a \cdot a}\ a\\
& b_\perp            = \ b - b_\parallel  \\
\end{align}
$$
</div><div style="float:left;background-color:#F2F5A9;height:2cm;">
\begin{align}
\quad b_\parallel        =&\ A x  =&  \frac{ a \cdot b }{a \cdot a}\ a &\quad\quad\quad\quad\quad \\
\quad b_\perp            =&\ b - b_\parallel &
\end{align}   
</div><div style="float:right;width:6cm;height:2cm;border:1px solid black;">
$$
b_\parallel = \frac{1}{5}\begin{pmatrix} 12\\16\\0 \end{pmatrix},\; b_\perp = \frac{1}{5}\begin{pmatrix} -12 \\ 9\\ 10 \end{pmatrix}
$$
</div>
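
The two steps for the projection onto a line can be sketched as follows, assuming `dot` from `LinearAlgebra`; the variable names are illustrative.

```julia
using LinearAlgebra

a = [3, 4, 0]
b = [0, 5, 2]

alpha  = dot(a, b) / dot(a, a)   # Step 1: solve the 1 x 1 normal equation
b_par  = alpha * a               # Step 2: component of b along a
b_perp = b - b_par               #         component of b orthogonal to a

println((alpha, b_par, b_perp, dot(a, b_perp)))
```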

## 3.3 Special Case: the Columns of $A$ are Mutually Orthogonal

The equations simplify considerably when the columns $\{ a_1, a_2, \dots a_N \}$ of $A$<br> are **mutually orthogonal vectors**, i.e., when

$$
a_i \cdot a_j = \left\{ \begin{align} \; \lVert a_i \rVert^2 \ne 0 \quad  &\ \text{ when } i = j\\ 0 \quad &\ \text{ otherwise} \end{align} \right.
$$

The normal equation takes the form
$$
A^t A x = A^t b \Leftrightarrow D x = A^t b \Leftrightarrow x = (A^t A)^{-1} A^t b,
$$
where  $D$ is a diagonal matrix
$$
D = A^t A = \begin{pmatrix} \lVert a_1 \rVert^2 & 0                   & \dots & 0 \\
                    0                   & \lVert a_2 \rVert^2 & \dots & 0 \\
                    \vdots              & \vdots              & \ddots & \vdots \\
                    0                   & 0                   & \dots & \lVert a_N \rVert^2 \end{pmatrix}
$$

---
Assuming $a_i \ne 0, i =1,2, \dots N$, we obtain the following solution:
<div style="background-color:#F2F5A9">

For mutually orthogonal non-zero vectors $a_i, i=1,\dots N$, the normal equations reduce to
$$
\begin{align}
x_i          =&\ \frac{ b \cdot a_i }{ a_i \cdot a_i } \\
b_\parallel  =&\ \sum_{i=1}^{N}{ \frac{ b \cdot a_i }{ a_i \cdot a_i} a_i } \\
b_\perp      =&\ b - \sum_{i=1}^{N}{ \frac{ b \cdot a_i }{ a_i \cdot a_i} a_i } \\
\end{align}
$$

**Remark**:
* the **equations simplify even further when the $a_i$ are mutually orthonormal**, i.e., when $a_i \cdot a_i = 1.$
* Orthonormal Coordinate vectors are NICE TO HAVE!
</div>

#### **Example**

Let $A = \begin{pmatrix}
1 & 1 &  1 \\
1 & 1 & -1 \\
1 &-1 &  0 \\
1 &-1 &  0 \\
\end{pmatrix}, \quad b = \begin{pmatrix} 4\\-4\\8\\12 \end{pmatrix}\;\Rightarrow\;\;
A^t A = \begin{pmatrix} \color{red}4&0&0\\ 0&\color{red}4&0\\ 0&0&\color{red}2\end{pmatrix}
,$ so the columns of $A$ are orthogonal.

##### **Step 1: Solve the Normal Equation**

$A^t A x = A^t b\quad $ yields
$\quad
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} \color{red}4&0&0\\ 0&\color{red}4&0\\ 0&0&\color{red}2\end{pmatrix}^{-1} \;
\begin{pmatrix} 20 \\ -20 \\ 8 \end{pmatrix}\;
= \begin{pmatrix} 5 \\ -5 \\ 4 \end{pmatrix}
$

##### **Step 2: Decompose the Vector**

$$
b_\parallel = A x = \begin{pmatrix} 4 \\ -4 \\ 10 \\ 10 \end{pmatrix},\quad
b_\perp  = b - b_\parallel = \begin{pmatrix} 0 \\ 0 \\ -2 \\ 2 \end{pmatrix},\quad
$$
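
With mutually orthogonal columns, each coefficient can be computed on its own. The following sketch assumes `eachcol` (available in Julia 1.1 and later) to iterate over the columns of $A$; the variable names are illustrative.

```julia
using LinearAlgebra

A = [1  1  1;
     1  1 -1;
     1 -1  0;
     1 -1  0]
b = [4, -4, 8, 12]

# x_i = (b ⋅ a_i) / (a_i ⋅ a_i), one independent formula per column
x = [dot(b, a) / dot(a, a) for a in eachcol(A)]

b_par  = A * x
b_perp = b - b_par

println((x, b_par, b_perp))
```

No linear system is solved here: the diagonal structure of $A^t A$ decouples the normal equation into $N$ scalar equations.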

# 4. Take Away

## 4.1 The Fundamental Theorem of Linear Algebra 

<div style="background-color:#F2F5A9;float:left;height:5cm; width:12cm">

**Fundamental Theorem (Part II)**
* $\mathscr{R}(A)^\perp = \mathscr{N}(A)$<br>
* $\mathscr{C}(A)^\perp = \mathscr{N}(A^t)$
    
**Nota Bene:** Part II requires a **positive definite** inner product.
</div>
<img src="FundamentalTheorem.svg" width=300 style="float:center;">

## 4.2 The Normal Equation

<div>
<style type="text/css">
.tftable {font-size:12px;color:#333333;width:100%;border-width: 1px;border-color: #729ea5;border-collapse: collapse;}
.tftable th {font-size:12px;background-color:#acc8cc;border-width: 1px;padding: 8px;border-style: solid;border-color: #729ea5;text-align:left;}
.tftable tr {background-color:#ffffff;}
.tftable td {font-size:12px;border-width: 1px;padding: 8px;border-style: solid;border-color: #729ea5;}
</style>

<table class="tftable" border="1">
<tr><th style="width:6cm;">Equation</th><th style="width:6cm;">Orthogonal Projection</th><th>Comment</th></tr>
<tr><td>$x = \arg\min_x { \lVert b - A x \rVert }$</td><td>$b_\parallel=A x$</td><td>Minimize Distance to $\mathscr{C}(A)$</td></tr>
<tr><td>$A^t A x = A^t b$</td><td>$b_\parallel=A x$</td><td>Remove $\mathscr{N}(A^t)$ component from $b$</td></tr>
<tr><td>$x = \frac{a \cdot b}{a \cdot a}$</td><td>$b_\parallel=\frac{a \cdot b}{a \cdot a}\;a$</td><td>Column Vector Case: $A = a$</td></tr>
<tr><td>$x_i = \frac{a_i \cdot b}{a_i \cdot a_i}$</td><td>$b_\parallel=\sum_i\frac{a_i \cdot b}{a_i \cdot a_i}\;a_i$</td><td>Orthogonal Vectors $a_i$ Case: $A = (a_1 \ a_2 \ \dots)$</td></tr>
<tr><td>$x_i = q_i \cdot b$</td><td>$b_\parallel=\sum_i{q_i \cdot b\;q_i}$</td><td>Orthonormal Vectors $q_i$ Case: $A = (q_1\ q_2 \dots )$</td></tr>
</table>
</div>