In [1]:
using LinearAlgebra, RowEchelon, LaTeXStrings, Plots, SymPy
include("LAcodes.jl")

Main.LAcodes

# 1. Inner Product Spaces and Metrics

## 1.1 Basic Definitions

In the following, a bar over an expression signifies complex conjugation.

<div style="background-color:#F2F5A9">
    
**Definition:** An **inner product space** is a vector space $V$ over the scalars $\mathbb{F}$
 with a function $\; <.,.> : V \times V \rightarrow \mathbb{F}$<br> $\quad\quad$ with the following properties
$\forall x,y,z \in V, \; \forall \alpha\in \mathbb{F}:$
$$
\begin{align}
 &<x,y>          &=& \;\overline{ <y,x> } & \text{(conjugate symmetry)             } \\
 &<x, \alpha y>  &=& \;\alpha <x,y>       & \text{(linearity in the second argument)} \\
 &<x, y+z>     &=& <x, y> + <x, z>        & \\
\end{align}
$$
and
$$
<x,x>\quad = \left\{ \begin{align}& c > 0 & \quad x \ne 0\\ & 0 \quad & \text{ otherwise} \end{align} \right. \quad\quad \text{ (positive definite)}
$$
</div>

These properties are modeled on the dot product:
$$<u,v> = \overline{u} \cdot v$$
is an inner product for $\mathbb{F} = \mathbb{Q}$, $\mathbb{F} = \mathbb{R}$ and $\mathbb{F} = \mathbb{C}.$<br>
  The complex conjugate is required for complex numbers, since without it the dot product is not positive definite.

**Note** that for $\mathbb{F} = \mathbb{Z}_2$ the dot product is not positive definite.
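
Both remarks can be checked numerically. The following is a minimal sketch, assuming only Julia's standard `LinearAlgebra` library; the vectors `u` and `w` are illustrative choices that do not come from the text, and arithmetic in $\mathbb{Z}_2$ is modeled here with ordinary integers reduced mod 2.

```julia
using LinearAlgebra

u = [1, im]                      # a non-zero complex vector (illustrative)

plain = sum(u .* u)              # dot product WITHOUT conjugation: 1 + im^2 = 0
inner = dot(u, u)                # dot conjugates its first argument: 1 + 1 = 2

println("without conjugation: u⋅u = ", plain)
println("with conjugation:    <u,u> = ", inner)

# Z_2 modeled as integers mod 2 (an assumption made for this illustration)
w  = [1, 1]                      # a non-zero vector over Z_2
z2 = mod(sum(w .* w), 2)         # 1 + 1 = 0 in Z_2
println("in Z_2: w⋅w = ", z2)
```

In both failing cases a non-zero vector has a zero "length", which is exactly what positive definiteness rules out.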

Inner products can be used to define a **distance** function, i.e., a metric.
<div style="background-color:#F2F5A9">

**Definition:** A **metric** for a set $M$ is a function $d : M \times M \rightarrow \mathbb{R}$<br>
$\quad\quad$ with the following properties
$\forall x,y,z \in M:$
$$
\begin{align}
d(x,y) = 0 \Leftrightarrow x= y \\
d(x,y) = d(y,x) \\
d(x,y) \le d(x,z) + d(z,y)\\
\end{align}
$$

**Remark**: The axioms for a metric guarantee $$d(x,y) \ge 0$$
</div>
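
The remark follows from the three axioms in one line. As a short derivation, apply the triangle inequality to the pair $(x,x)$ with intermediate point $y$, then use symmetry:
$$
0 = d(x,x) \le d(x,y) + d(y,x) = 2\, d(x,y) \quad \Rightarrow \quad d(x,y) \ge 0
$$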

<div style="background-color:#F2F5A9">

**Definition:** The **norm** of a vector $v$ in an inner product space is
    $$\lVert v \rVert = \sqrt{ <v,v> }$$
**Definition:** The **distance** between two vectors $u$ and $v$ in an inner product space is
    $$ d(u,v) = \lVert u-v \rVert $$

</div>
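
The distance induced by an inner product can be spot-checked against the metric axioms. The sketch below assumes the standard dot product on $\mathbb{R}^4$; the helper `d` and the sample vectors are illustrative names, and a check on one triple of vectors is of course not a proof.

```julia
using LinearAlgebra

d(x, y) = norm(x - y)            # distance induced by the dot product

x = [3.0, 0.0, 4.0, 0.0]
y = [1.0, 2.0, 0.0, 2.0]
z = [0.0, 0.0, 0.0, 0.0]

println("d(x,x) = ", d(x, x))                                  # identity axiom
println("symmetry:            ", d(x, y) ≈ d(y, x))
println("triangle inequality: ", d(x, y) <= d(x, z) + d(z, y))
```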

**Remarks:**
* For the dot product in $\mathbb{R}^2$ and $\mathbb{R}^3$, this definition yields the Euclidean length of a vector. E.g.,
$$\lVert \begin{pmatrix}u_1\\u_2 \end{pmatrix} \rVert = \sqrt{ u_1^2 + u_2^2 }$$
* The definition of the norm from the inner product shows that
$$
\lVert \alpha v \rVert = \sqrt{ < \alpha v, \alpha v > } = \ \lvert \alpha \rvert \ \lVert v \rVert
$$
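
The intermediate steps of this scaling property use only the axioms of the inner product (linearity in the second argument and conjugate symmetry), together with the fact that $<v,v>$ is real:
$$
\begin{align}
<\alpha v, \alpha v> &= \alpha <\alpha v, v> = \alpha\, \overline{<v, \alpha v>} = \alpha\, \overline{\alpha <v,v>} = \alpha \overline{\alpha}\, \overline{<v,v>} = \lvert \alpha \rvert^2 <v,v>
\end{align}
$$
Taking the square root gives $\lVert \alpha v \rVert = \lvert \alpha \rvert \ \lVert v \rVert$.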

A **unit vector** is a vector with norm equal to 1. Such a vector may be constructed from any non-zero vector $u$ with
$$
\hat{u} = \frac{1}{\lVert u \rVert} u
$$

In [2]:
LAcodes.title("Inner product and norms", sz=15)
u = [3 0 4 0 ]
v = [1 2 0 2 ]
println( "The dot product of u=$u and v=$v is u⋅v = $(u ⋅ v)")
println( "The norm of u is              √(u⋅u) = $(sqrt(u ⋅ u))")
println( "The distance from u to v is d(u,v) = $(sqrt( (u-v)⋅(u-v))) ")
println()
u = [3im 0 4 0 ]
v = [1 2 1im 4 ]
println( "The dot product of u=$u and v=$v is u⋅v = $(u ⋅ v)")
println( "The norm of u is              √(u⋅u) = $(sqrt(u ⋅ u))")
println( "The distance from u to v is d(u,v) = $(sqrt( (u-v)⋅(u-v))) ")
println()
println( "Using the norm() function, the norm of u is $(norm(u))")
println( "Using the norm() function, the distance from u to v is $(norm(u-v))")

The dot product of u=[3 0 4 0] and v=[1 2 0 2] is u⋅v = 3
The norm of u is              √(u⋅u) = 5.0
The distance from u to v is d(u,v) = 5.291502622129181 

The dot product of u=Complex{Int64}[0 + 3im 0 + 0im 4 + 0im 0 + 0im] and v=Complex{Int64}[1 + 0im 2 + 0im 0 + 1im 4 + 0im] is u⋅v = 0 + 1im
The norm of u is              √(u⋅u) = 5.0 + 0.0im
The distance from u to v is d(u,v) = 6.855654600401044 + 0.0im 

Using the norm() function, the norm of u is 5.0
Using the norm() function, the distance from u to v is 6.855654600401044


In [3]:
LAcodes.title("Unit Vectors", sz=15)
u=[ 2   6 9]; println( "A unit vector pointing in the same direction as u=$u is 1/11*$(11*u/norm(u))")
u=[ 2im 6 9]; println( "A unit vector pointing in the same direction as u=$u is 1/11*$(11*u/norm(u))")

A unit vector pointing in the same direction as u=[2 6 9] is 1/11*[2.0 6.0 9.0]
A unit vector pointing in the same direction as u=Complex{Int64}[0 + 2im 6 + 0im 9 + 0im] is 1/11*Complex{Float64}[0.0 + 2.0im 6.0 + 0.0im 9.0 + 0.0im]


## 1.2 Inequalities, Angle, Orthogonal Vectors

<div style="background-color:#F2F5A9">

**Theorem: (Cauchy-Schwarz Inequality)** The inner product between two vectors $u$ and $v$ satisfies $$\lvert <u,v> \rvert \le \lVert u \rVert \ \lVert v \rVert$$
</div>

**Remark:** The inequality holds trivially (with equality) if either $u = 0$ or $v = 0$. When neither of the vectors is zero and the inner product is real-valued, we can rewrite this as
$$
-1 \le \frac{ <u,v> }{\lVert u \rVert \ \lVert v \rVert} \le 1
$$

In $\mathbb{R}^2$ with $<u,v> = u \cdot v$, this quotient is the cosine of the angle between the vectors $u$ and $v$.<br>
We therefore generalize this to
<div style="background-color:#F2F5A9">

$$
\cos ( \angle (u,v) ) = \frac{ <u,v> }{\lVert u \rVert \ \lVert v \rVert}, \quad \text{ where } u \ne 0, v \ne 0
$$
</div>
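
The Cauchy-Schwarz bound behind this definition can be checked numerically. The sketch below assumes the standard `LinearAlgebra` functions `dot` and `norm` and reuses the complex vectors from the earlier inner product example; `w` is an illustrative scalar multiple of `u`, chosen to show the case of equality.

```julia
using LinearAlgebra

u = [3im, 0, 4, 0]
v = [1, 2, 1im, 4]

lhs = abs(dot(u, v))             # |<u,v>|; dot conjugates its first argument
rhs = norm(u) * norm(v)          # ‖u‖ ‖v‖
println("|<u,v>| = $lhs  ≤  ‖u‖‖v‖ = $rhs : $(lhs <= rhs)")

# Equality case: w is a scalar multiple of u
w = 2u
println("equality for w = 2u: ", abs(dot(u, w)) ≈ norm(u) * norm(w))
```

For complex vectors the quotient itself is in general complex, so it is the absolute value that is bounded by 1.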

<div style="background-color:#F2F5A9">

**Remark:** Orthogonal non-zero vectors form an angle of $90^\circ$, and $\cos 90^\circ = 0$, i.e.,
$$ u \cdot v = 0 \Leftrightarrow u \perp v $$

**Remark:** To simplify the previous remark, we define the zero vector to be orthogonal to any other vector: $u \cdot v = 0 \Leftrightarrow u \perp v$ for any two vectors $u$ and $v$ (including the zero vector).
</div>

In [4]:
u = [1 5 3]; v = [4 1 2]; w=[1 1 1]
println( "The angle between $u and $v is approximately $(round(acosd( (u ⋅v)/(norm(u)*norm(v))), digits=2)) degrees" )
println( "The distance from u=$u to v=$v is \t\t$(round(norm(u-v),digits=2))")
println( "A detour via w=$w increases the distance to \t$(round(norm(w-u)+norm(w-v),digits=2))")

The angle between [1 5 3] and [4 1 2] is approximately 56.41 degrees
The distance from u=[1 5 3] to v=[4 1 2] is 		5.1
A detour via w=[1 1 1] increases the distance to 	7.63


## 1.3 Fundamental Theorem of Linear Algebra (Part 2)

### 1.3.1 Main Definitions and Theorem

<div style="background-color:#F2F5A9">

**Theorem:** A set of mutually orthogonal non-zero vectors is linearly independent.

**Corollary:** Given a matrix $A$ in $\mathbb{R}^{M \times N}$
* Any two vectors $r \in \mathscr{R}(A), n \in \mathscr{N}(A)$ are orthogonal, i.e., $r \perp n$
* Any two vectors $c \in \mathscr{C}(A), \tilde{n} \in \mathscr{N}(A^t)$ are orthogonal, i.e., $c \perp \tilde{n}$

**Definition:** A vector space $U$ is orthogonal to a vector space $V$ iff $\forall u \in U, \forall v \in V, \; u \perp v$

**Definition:** Let $U$ be a subspace of $V$. The **orthogonal complement** $U^\perp = \left\{ v \in V \mid \forall u \in U, \ v \perp u \right\}$.

**Theorem:** Given a subspace $U$ of a finite-dimensional inner product space, then $(U^\perp)^\perp = U$.<br>
**Theorem:** Given two subspaces $U$ and $V$ of a finite-dimensional inner product space such that $U^\perp = V$, then $V^\perp = U$.

**Corollary:** Given a matrix $A \in \mathbb{R}^{M \times N}$ then $\mathscr{R}(A)^\perp = \mathscr{N}(A)$ in $\mathbb{R}^N$.<br>
**Corollary:** Given a matrix $A \in \mathbb{R}^{M \times N}$ then $\mathscr{C}(A)^\perp = \mathscr{N}(A^t)$ in $\mathbb{R}^M$.

**Theorem:** Let $A$ be a matrix of size $M \times N.$ The union of the bases for $\mathscr{C}(A)$ and $\mathscr{N}(A^t)$ is a basis for $\mathbb{R}^M$.<br>
**Theorem:** Let $A$ be a matrix of size $M \times N.$ The union of the bases for $\mathscr{R}(A)$ and $\mathscr{N}(A)$   is a basis for $\mathbb{R}^N$.
</div>
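
These statements can be checked numerically without computing a null space by hand. The following is a minimal sketch, assuming the `nullspace` and `rank` functions from Julia's standard `LinearAlgebra` library; the matrix `A` and the name `Nb` are illustrative choices.

```julia
using LinearAlgebra

A = [1 2 -1;
     2 2  1]                      # 2×3 matrix of rank 2 (illustrative choice)

Nb = nullspace(A)                 # columns: an orthonormal basis of N(A)
println("dim N(A) = ", size(Nb, 2))

# Every row of A is orthogonal to every null-space basis vector
println("max |A*Nb| = ", maximum(abs, A * Nb))

# The rows of A (columns of A') together with Nb span R^3
B = [A' Nb]
println("rank([A' Nb]) = ", rank(B))
```

Here the row space has dimension 2 and the null space has dimension 1, so the combined set has $N = 3$ vectors; a rank of 3 confirms that they form a basis for $\mathbb{R}^3$.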

In [5]:
LAcodes.title("Fundamental Theorem (Part 2) Example", sz=15)
x,y,z=symbols("x,y,z")
r = [2;6;-1]
p = r[1]*x + r[2]*y +r[3]* z
println("Consider the system    $p = 0")

n1 = [6;-2;0]; n2=[1;0;2]
println("Its nullspace is a plane in R^3: span{ $n1, $n2 }")
println("Its row space is a line in R^3:  span( $r )")
println("Verify orthogonality: $r ⋅ $n1 = $(r ⋅ n1); $r ⋅ $n2 = $(r ⋅ n2)")
println()
println("Combining the bases for R(A) and N(A) yields a basis for R^3 = span( $r, $n1, $n2)")
println("Check that the matrix (r,n1,n2) has full column rank\n   (i.e., we indeed have a basis for R^3).\n   Its reduced row echelon form is:")
Base.print_array( stdout, rref([r n1 n2]));

Consider the system    2*x + 6*y - z = 0
Its nullspace is a plane in R^3: span{ [6, -2, 0], [1, 0, 2] }
Its row space is a line in R^3:  span( [2, 6, -1] )
Verify orthogonality: [2, 6, -1] ⋅ [6, -2, 0] = 0; [2, 6, -1] ⋅ [1, 0, 2] = 0

Combining the bases for R(A) and N(A) yields a basis for R^3 = span( [2, 6, -1], [6, -2, 0], [1, 0, 2])
Check that the matrix (r,n1,n2) has full column rank
   (i.e., we indeed have a basis for R^3).
   Its reduced row echelon form is:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0

### 1.3.2 Use the Fundamental Theorem to Decompose a Vector (Naive Method)

Let $A$ be a matrix of size $M \times N$ with rank $r$.<br>
Let $\left\{ c_1, c_2, \dots c_r \right\}$ be a basis for $\mathscr{C}(A)$, and $\left\{ ñ_1, ñ_2, \dots ñ_{M-r} \right\}$ be a basis for $\mathscr{N}(A^t)$.<br>

The combined basis $\left\{  c_1, c_2, \dots c_r, ñ_1, ñ_2, \dots ñ_{M-r} \right\}$ is a basis for $\mathbb{R}^M$.

> Any vector $b \in \mathbb{R}^M$ can therefore be written as a linear combination of these vectors:
$$
\begin{align}
&b           \; = \color{blue}{b_{\parallel}} + \color{red}{b_{\perp}},  & \text{ where} \\
&\color{blue}{b_{\parallel}  = \alpha_1 c_1 + \alpha_2 c_2 + \dots + \alpha_r c_r} &\\
&\color{red}{b_\perp        = \beta_1 ñ_1 + \beta_2 ñ_2 + \dots + \beta_{M-r} ñ_{M-r}}.&
\end{align}
$$

The result is depicted in the following figure:
* the vector $b_\parallel$ (blue in the equations above) is the part of $b$ that lies in the column space $\mathscr{C}(A)$ (the linear combination formed with $\alpha_i c_i$)<br>
  it is the orthogonal projection $Proj_{\mathscr{C}(A)}^\perp b$ onto the column space $\mathscr{C}(A)$
* the vector $b_\perp$ (red in the equations above) is the part of $b$ that lies in the null space $\mathscr{N}(A^t)$ (the linear combination formed with $\beta_j ñ_j$)
* these two vector components are orthogonal.

<img src="./NormalEquations.svg"  width="500">


In [6]:
LAcodes.title("Expressing a vector in the C(A), N(A^t) basis", sz=15)
A     = [1 2 -1; 2 2 1 ]'
a1    = A[:,1]
a2    = A[:,2]

println("Consider the array A = ")
Base.print_array( stdout, A ); println()

#Base.print_array( stdout, Int64.(2*rref(Float64.(A'))) ); println()

println("\nFind the bases:")
ñ = [-4; 3; 2]
println(".  basis C(A)  = { a1=$a1, a2=$a2 }")
println(".  basis N(A') = { ñ =$ñ }")
println()
println(".  Check ñ is in the null space N(A'):  A' ñ = $(A'*ñ)")

println("\nAny vector b = α1 a1 + α2 a2 + β ñ\n.  To obtain this decomposition, we need to solve: [A ñ][α1;α2;β] = b\n")

b = [-3;7;-2]
println( "Let b = $b" )
coeffs = [a1 a2 ñ] \ b   # These happen to be integers
coeffs = round.(Int64, coeffs)  # Round to integers (robust to floating-point error; prints in a nicer format)
println( ".  Solving, we obtain b = $(coeffs[1]) a1 + $(coeffs[2]) a2 + $(coeffs[3]) ñ")
b_parallel = coeffs[1]*a1 + coeffs[2]*a2
b_perp     = coeffs[3]*ñ
println( ".  b_parallel = $(coeffs[1]) a1 + $(coeffs[2]) a2 = $b_parallel")
println( ".  b_perp     = $(coeffs[3]) ñ          = $b_perp")
println()
println("Check orthogonality:           b_parallel ⋅ b_perp    = $(b_parallel ⋅ b_perp)")
println("Check these vectors sum to b:  b -b_parallel - b_perp = $(b -b_parallel - b_perp)" )

Consider the array A = 
  1  2
  2  2
 -1  1

Find the bases:
.  basis C(A)  = { a1=[1, 2, -1], a2=[2, 2, 1] }
.  basis N(A') = { ñ =[-4, 3, 2] }

.  Check ñ is in the null space N(A'):  A' ñ = [0, 0]

Any vector b = α1 a1 + α2 a2 + β ñ
.  To obtain this decomposition, we need to solve: [A ñ][α1;α2;β] = b

Let b = [-3, 7, -2]
.  Solving, we obtain b = 3 a1 + -1 a2 + 1 ñ
.  b_parallel = 3 a1 + -1 a2 = [1, 4, -4]
.  b_perp     = 1 ñ          = [-4, 3, 2]

Check orthogonality:           b_parallel ⋅ b_perp    = 0
Check these vectors sum to b:  b -b_parallel - b_perp = [0, 0, 0]


---
This computation was involved: we needed to find the bases of both $\mathscr{C}(A)$ and $\mathscr{N}(A^t)$ to perform this decomposition.

It turns out we can do better!

### 1.3.3 Use the Normal Equation to Decompose a Vector

The key observation in the naive method above was the decomposition of a vector $b$:
> Any vector $b \in \mathbb{R}^M$ can therefore be written as a linear combination of these vectors:
$$
\begin{align}
&b           \; = b_{\parallel} + b_{\perp},  & \text{ where} \\
&b_{\parallel}  = \alpha_1 c_1 + \alpha_2 c_2 + \dots + \alpha_r c_r &\\
&b_\perp        = \beta_1 ñ_1 + \beta_2 ñ_2 + \dots + \beta_{M-r} ñ_{M-r},&
\end{align}
$$
where the $c_i$ vectors form a basis for the subspace $\mathscr{C}(A)$ containing $b_\parallel$,<br>
and the $ñ_j$ vectors form a basis for the orthogonal complement of this subspace.

The basic idea is to replace the system of equations for the coefficients $\alpha_i, \beta_j$ with a smaller one, obtained by taking dot products with each of the basis vectors $c_j$ of the column space $\mathscr{C}(A)$:
$$
\begin{align}
(\xi) & \Leftrightarrow b = \alpha_1 c_1 + \alpha_2 c_2 + \dots + \alpha_r c_r + b_\perp \\
      & \Rightarrow c_j \cdot b = \alpha_1 c_j \cdot c_1 + \alpha_2  c_j \cdot c_2 + \dots + \alpha_r  c_j \cdot c_r + \color{red}{c_j \cdot b_\perp}, \quad\quad j=1,2, \dots r
\end{align}
$$

Since $c_j \perp b_\perp$, each term $c_j \cdot b_\perp = 0$: we are left with a set of $r$ equations that involve only the unknown coefficients $\alpha_i$!
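
As a worked instance, take the vectors of the naive example: $c_1 = (1, 2, -1)^t$, $c_2 = (2, 2, 1)^t$ and $b = (-3, 7, -2)^t$. The required dot products are
$$
c_1 \cdot c_1 = 6, \quad c_1 \cdot c_2 = 5, \quad c_2 \cdot c_2 = 9, \quad c_1 \cdot b = 13, \quad c_2 \cdot b = 6,
$$
so the two equations for $j = 1, 2$ read
$$
\begin{align}
6 \alpha_1 + 5 \alpha_2 &= 13 \\
5 \alpha_1 + 9 \alpha_2 &= 6
\end{align}
$$
with solution $\alpha_1 = 3, \ \alpha_2 = -1$, the same coefficients found by the naive method, without ever computing $ñ$.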

**Remark**: We can rewrite the above in matrix form:
* let $\tilde{A} = \begin{pmatrix} c_1 & c_2 & \dots & c_r \end{pmatrix}, \quad x = \begin{pmatrix} \alpha_1 & \alpha_2 & \dots & \alpha_r \end{pmatrix}^t$.<br>
  Our equations are $(\xi) \Leftrightarrow b = \tilde{A} x + b_\perp \Rightarrow \tilde{A}^t \tilde{A} x = \tilde{A}^t b.$
* This would require us to identify a basis for the column space of $A$ to form the $\tilde{A}$ matrix. A little thought shows this is not necessary!
  * If we express $b_\parallel$ as a linear combination of all of the columns of $A$, we no longer have a unique solution if the columns of $A$ are not linearly independent.<br>
  All we need is any one solution, however: we get $b_\parallel = A x$ for some vector $x$: thus $b = A x + b_\perp$.
  * Multiplying $b = A x + b_\perp$ by $A^t$ from the left still zeroes out the $b_\perp$ term: we are left with the equations
   $$
   \begin{align}
   &A^t A x     &= A^t b &\quad \text{ known as the normal equation} \\
   &b_\parallel &= A x   &
   \end{align}\label{eq1}\tag{1}
   $$
* To solve for $b_\perp$, it is sufficient to realize that $b = b_\parallel + b_\perp$, so
$$
b_\perp  = b - b_\parallel \quad\quad\quad\quad\quad\quad \label{eq2}\tag{2}
$$

In [23]:
LAcodes.title("Repeat the previous example with the Normal Equation", sz=15)
A     = [1 2 -1; 2 2 1 ]'
b = [-3;7;-2]
println("Consider the array A = ")
Base.print_array( stdout, A ); println()
println( "\nLet b = $b" )

println("\nCompute any one solution of the normal equation  A_transpose A x = A_transpose b")
x = A'A \ A'b
println( ".  x = $(Int64.( round.(x,digits=0)))")
b_parallel = A * x
println("\nCompute b_parallel = Ax             = $(Int64.(round.(b_parallel, digits=0)))")
b_perp = b - b_parallel
println("\nCompute b_perp     = b - b_parallel = $(Int64.(round.(b_perp, digits=0)))")
println("\nThis is the same solution we obtained before")

Consider the array A = 
  1  2
  2  2
 -1  1

Let b = [-3, 7, -2]

Compute any one solution of the normal equation  A_transpose A x = A_transpose b
.  x = [3, -1]

Compute b_parallel = Ax             = [1, 4, -4]

Compute b_perp     = b - b_parallel = [-4, 3, 2]

This is the same solution we obtained before
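
The remark that any one solution of the normal equation suffices can be illustrated with linearly dependent columns. The sketch below is an illustration under stated assumptions, not part of the example above: the third column of `A` is a hypothetical addition (the sum of the first two), and because $A^t A$ is then singular, one particular solution is picked with the pseudoinverse `pinv` from `LinearAlgebra` (the `rtol` keyword assumes Julia 1.1 or later).

```julia
using LinearAlgebra

A = [ 1.0 2.0 3.0;
      2.0 2.0 4.0;
     -1.0 1.0 0.0]               # third column = first + second, so rank(A) = 2
b = [-3.0, 7.0, -2.0]

println("columns dependent: ", A[:, 3] == A[:, 1] + A[:, 2])

# A'A is singular; the pseudoinverse selects one solution of A'A x = A'b
G = A' * A
x = pinv(G; rtol=1e-10) * (A' * b)

b_parallel = A * x
b_perp     = b - b_parallel

println("b_parallel ≈ ", round.(b_parallel, digits=6))
println("b_perp     ≈ ", round.(b_perp, digits=6))
println("A' b_perp  ≈ ", round.(A' * b_perp, digits=6))
```

The column space is unchanged by the extra column, so `b_parallel` and `b_perp` agree with the values computed above even though `x` is no longer unique.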


# 2. The Normal Equation