import numpy as np
import matplotlib.pyplot as plt

## 3.1 Norms

**Definition of Norm:**
- A norm on a vector space $V$ is a function $\|\cdot \| : V \rightarrow \reals$ which assigns each vector $\mathbb{x}$ its length $\|\mathbb{x}\| \in \reals$, such that the following conditions hold for all $\lambda \in \reals, \ \mathbb{x}, \mathbb{y} \in V$:
    - ***Absolute Homogeneity***: $\|\lambda\mathbb{x}\| = |\lambda|\|\mathbb{x}\|$
        - i.e. scaling a vector by $\lambda$ scales its norm by $|\lambda|$
    - ***Triangle Inequality***: $\|\mathbb{x} + \mathbb{y}\| \le \|\mathbb{x}\| + \| \mathbb{y}\|$
        - i.e. the norm of a sum of vectors is at most the sum of their norms
        - named as such because in geometry, for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side
    - ***Positive Definite***: $\|\mathbb{x}\| \ge 0$ and $\|\mathbb{x}\| = 0 \iff \mathbb{x} = \mathbb{0}$

***$\ell_1$ - "Manhattan" Norm***
$$\|\mathbb{x}\|_1 \coloneqq \sum_{i=1}^n|x_i|$$

***$\ell_2$ - "Euclidean" Norm***
$$\|\mathbb{x}\|_2 \coloneqq \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{\mathbb{x}^\intercal \mathbb{x}}$$
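
As a quick numerical check of the two norms above, here is a minimal NumPy sketch. The vectors `x` and `y` and the scalar `lam` are arbitrary values chosen for illustration; `np.linalg.norm` with `ord=1` and `ord=2` is used only to cross-check the hand-written formulas.

```python
import numpy as np

x = np.array([3.0, -4.0])  # arbitrary example vector

# l1 (Manhattan) norm: sum of the absolute values of the components
l1 = np.sum(np.abs(x))

# l2 (Euclidean) norm: square root of the sum of squared components
l2 = np.sqrt(x @ x)

# np.linalg.norm computes the same quantities via its `ord` argument
assert np.isclose(l1, np.linalg.norm(x, ord=1))
assert np.isclose(l2, np.linalg.norm(x, ord=2))

# Spot-check absolute homogeneity and the triangle inequality (l2 norm)
lam = -2.5
y = np.array([1.0, 2.0])
assert np.isclose(np.linalg.norm(lam * x), abs(lam) * np.linalg.norm(x))
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)

print(l1, l2)  # 7.0 5.0
```

A spot check on one pair of vectors does not prove the norm properties; it only illustrates them.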

## 3.2 Inner Products

**Dot Product:**
- For vectors $x$ and $y$:
$$\mathbb{x}^\intercal \mathbb{y} = \sum_{i=1}^n x_i y_i $$

### Generalized Inner Products

A ***Bilinear Mapping*** $\Omega$ is a mapping with two arguments that is linear in each argument. For a vector space $V$, the following conditions hold for such a mapping:
$$\Omega(\lambda x + \psi y, \ z) = \lambda \Omega(x, z) + \psi\Omega(y, z) \\ \Omega(x, \ \lambda y + \psi z) = \lambda \Omega(x, y) + \psi\Omega(x, z)$$
Look at this closely: $\Omega$ has two arguments. The first condition above states that $\Omega$ is linear in its first argument, while the second condition states that $\Omega$ is linear in its second argument.

#### More Definitions
For a *bilinear mapping* $\Omega:V \times V \rightarrow \reals$ that takes two vectors and maps them onto a real number:
- ***Symmetric***
    - $\Omega$ is symmetric if $\Omega(x,y) = \Omega(y,x) \ \ \forall x,y \in V$
    - i.e. $\Omega$ is symmetric if the order of its arguments does not matter
- ***Positive Definite***
    - $\Omega$ is positive definite if:
    $$ \forall x\in V \setminus \{\mathbb{0}\} \ : \ \Omega(x, x) > 0, \ \Omega(\mathbb{0}, \mathbb{0}) = 0$$
    - i.e. $\Omega$ is positive definite if it maps to only positive numbers in $\reals$ for all vectors in $V$ other than the zero vector $\mathbb{0}$
    - Note that this is only for $\Omega(\bf x, x)$, *not* for $\Omega(\bf x, y), \ x \ne y$

**Definition of Inner Product:**
- A bilinear mapping $\Omega \ : \ V \times V \rightarrow \reals$ that takes two vectors and maps them onto a real number is called an *inner product* if it is both **symmetric** and **positive definite**
    - Inner products on vectors $x$ and $y$ are typically denoted $\langle x, y \rangle$
    - The *Inner Product Space* of a vector space $V$ is denoted $\big(V, \langle\cdot, \cdot\rangle\big)$

### Symmetric, Positive Definite Matrices

For vector space $V$ and inner product $\langle \cdot, \cdot \rangle : \ V \times V \rightarrow \reals$, ordered basis $B=(\mathbf{b}_1, ..., \mathbf{b}_n)$ of $V$, and vectors $\mathbf{x}, \mathbf{y} \in V$:
$$\langle \mathbf{x}, \mathbf{y} \rangle = \bigg\langle \sum_{i=1}^n \psi_i \mathbf{b}_i, \sum_{j=1}^n\lambda_j\mathbf{b}_j \bigg\rangle = \sum_{i=1}^n \sum_{j=1}^n \psi_i \langle \mathbf{b}_i, \mathbf{b}_j \rangle \lambda_j = \hat{\mathbf{x}}^\intercal \mathbf{A} \hat{\mathbf{y}}$$
Where $\psi$ and $\lambda$ are scalars, $A_{i,j} \coloneqq \langle \mathbf{b}_i, \mathbf{b}_j \rangle$, and $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ are the *coordinates* of $\mathbf{x}$ and $\mathbf{y}$ with respect to the basis $B$.

So, the inner product $\langle \cdot, \cdot \rangle$ is *uniquely determined* by the matrix $\mathbf{A}$. Furthermore, $\mathbf{A}$ is a symmetric positive definite matrix, by the definition of an inner product.

**Definition 3.4 - Symmetric, Positive Definite Matrix:**
- A matrix $\mathbf{A}$ is symmetric if:
$$\mathbf{A} = \mathbf{A}^\intercal$$
- A matrix $\mathbf{A}$ is positive definite if:
$$\forall \mathbf{x} \in V \setminus \{\mathbf{0}\} : \mathbf{x}^\intercal \mathbf{A} \mathbf{x} > 0$$
- A matrix $\mathbf{A}$ is symmetric positive definite if it is both symmetric and positive definite
    - A matrix $\mathbf{A}$ is **positive semidefinite** if only $\ge$ holds in the condition for positive definiteness

If $\mathbf{A} \in \reals^{n \times n}$ is symmetric, positive definite, then:
$$\langle \bf x, y \rangle = \hat{x}^\intercal A \hat{y}$$
*defines* an inner product w.r.t. an ordered basis $B$, where $\bf \hat{x}$ and $\bf \hat{y}$ are the coordinate representations of $\mathbf{x}, \mathbf{y} \in V$ w.r.t. $B$.

**Theorem 3.4**:
For a real-valued, finite-dimensional vector space $V$ and an ordered basis $B$ of $V$, $\langle \cdot, \cdot \rangle : V \times V \rightarrow \reals$ is an *inner product* if and only if there exists a symmetric, positive definite matrix $\mathbf{A} \in \reals^{n\times n}$ with:
$$\langle \mathbf{x}, \mathbf{y} \rangle = \hat{\mathbf{x}}^\intercal \mathbf{A} \hat{\mathbf{y}}$$
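
To make the link between symmetric, positive definite matrices and inner products concrete, here is a small sketch. The matrix `A` and the helper `inner` are hypothetical and chosen only for illustration. The check uses the standard fact that a symmetric matrix is positive definite exactly when all of its eigenvalues are positive.

```python
import numpy as np

# Hypothetical symmetric positive definite matrix (illustration only)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def inner(x, y, A=A):
    """Inner product <x, y> = x^T A y for coordinate vectors x and y."""
    return x @ A @ y

# Symmetric: A equals its transpose
assert np.allclose(A, A.T)
# Positive definite: all eigenvalues of the symmetric matrix are positive
assert np.all(np.linalg.eigvalsh(A) > 0)

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 2))  # three random vectors in R^2
lam, psi = 0.7, -1.3

# Symmetry of the induced mapping
assert np.isclose(inner(x, y), inner(y, x))
# Linearity in the first argument (the second follows by symmetry)
assert np.isclose(inner(lam * x + psi * y, z),
                  lam * inner(x, z) + psi * inner(y, z))
# Positivity on a nonzero vector
assert inner(x, x) > 0
```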

Properties of a symmetric and positive definite matrix $\mathbf{A} \in \reals^{n \times n}$:
- The null space of $\bf A$ consists *only* of $\bf 0$
    - This is because $\mathbf{x}^\intercal \mathbf{A} \mathbf{x} > 0, \ \forall \bf x \ne 0$
    - This implies that $\bf Ax \ne 0, \ \forall x \ne 0$
    - So, we cannot lose information by applying the transform $\bf A$
- The diagonal elements $a_{ii}$ of $\bf A$ are positive
    - This follows from the positive definite property of $\bf A$, specifically the fact that $a_{ii} = \mathbf{e}_i^\intercal \mathbf{A} \mathbf{e}_i > 0$
        - Where $\mathbf{e}_i$ is the $i^{th}$ standard basis vector in $\reals^n$
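
Both properties can be spot-checked numerically. The sketch below assumes a hypothetical symmetric positive definite matrix `A`; a trivial null space is checked through the equivalent condition that `A` has full rank.

```python
import numpy as np

# Hypothetical symmetric positive definite matrix (illustration only)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
n = A.shape[0]

# Diagonal entries: a_ii = e_i^T A e_i, with e_i the i-th standard basis vector
E = np.eye(n)
diag_via_basis = np.array([E[i] @ A @ E[i] for i in range(n)])
assert np.allclose(diag_via_basis, np.diag(A))
assert np.all(np.diag(A) > 0)

# Null space is {0} exactly when the square matrix has full rank
rank = np.linalg.matrix_rank(A)
assert rank == n
```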

## 3.3 Lengths and Distances

Inner products and norms are closely related. Inner products *induce* norms. They map vectors to a real number such that the properties of norms hold. Specifically:
$$\|\bf x \| \coloneqq \sqrt{\langle x, x \rangle}$$

However, *not every norm* is induced by an inner product. E.g., the $\ell_1$ or "Manhattan" norm does not have a corresponding inner product.

For an inner product vector space $(V, \langle \cdot, \cdot \rangle)$ the induced norm $\|\cdot\|$ satisfies the ***Cauchy-Schwarz Inequality***:
$$\bf |\langle x, y \rangle| \le \|x\|\|y\|$$
Written out: the absolute value of the inner product of $\mathbf{x}, \mathbf{y} \in V$ is at most the product of the two vectors' norms.

The norm of a vector induced by one inner product *need not be equal* to the norm of the same vector induced by a different inner product (see example 3.5). Whichever inner product is used, though, the Cauchy-Schwarz inequality bounds $|\langle \mathbf{x}, \mathbf{y} \rangle|$ by the product of the corresponding induced norms; in particular, $\langle \mathbf{x}, \mathbf{x} \rangle$ equals the squared norm of $\mathbf{x}$. This means that the *length* of a vector may be different depending on which inner product we use to evaluate it. This is not ground-breaking, since we already have two notions of length ($\ell_1$ vs. $\ell_2$) which evaluate to different values.
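
A short sketch of this point: the same vector gets a different length under two inner products, and the Cauchy-Schwarz inequality holds for each. The matrix `A`, the vectors, and the helpers `inner` and `norm` are assumptions chosen for illustration, with `A` symmetric positive definite (eigenvalues $0.5$ and $1.5$).

```python
import numpy as np

x = np.array([1.0, 1.0])
y = np.array([2.0, -1.0])

I = np.eye(2)                      # dot product
A = np.array([[1.0, -0.5],
              [-0.5, 1.0]])        # hypothetical SPD matrix

def inner(u, v, M):
    """Inner product <u, v> = u^T M v."""
    return u @ M @ v

def norm(u, M):
    """Norm induced by the inner product defined by M."""
    return np.sqrt(inner(u, u, M))

norm_dot = norm(x, I)  # sqrt(2) under the dot product
norm_A = norm(x, A)    # 1.0 under the inner product defined by A

# Cauchy-Schwarz holds for each inner product with its own induced norm
for M in (I, A):
    assert abs(inner(x, y, M)) <= norm(x, M) * norm(y, M)
```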

**Definition 3.6**: Distance and Metric\
Consider an inner product vector space $(V, \langle \cdot, \cdot \rangle)$:
$$d(\bf x, y) \coloneqq \sqrt{\langle x - y, x - y \rangle}$$
$d(\bf x, y )$ is the ***Distance*** between $\bf x$ and $\bf y$ for all $\mathbf{x}, \mathbf{y} \in V$. The Euclidean distance is given when the inner product is the dot-product.

**NOTE:**\
We may compute the distance between vectors without an inner product; having a norm is sufficient. E.g. the Manhattan distance relies on the $\ell_1$ norm and is given by:
$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_1 = \sum_{i=1}^n |x_i - y_i |$$

The *mapping* from $\bf x, y$ to $d(\bf x, y)$ is called a ***Metric***:
$$d : V \times V \rightarrow \reals \\ (\mathbf{x}, \mathbf{y}) \mapsto d(\bf x, y)$$

A metric $d$ satisfies the following properties:
1. $d$ is positive definite
    - $d(\mathbf{x}, \mathbf{y}) \ge 0, \ \forall \mathbf{x}, \mathbf{y} \in V$
    - $d(\mathbf{x}, \mathbf{y}) = 0 \iff \mathbf{x} = \mathbf{y}$
2. $d$ is symmetric
    - $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x}), \ \forall \mathbf{x}, \mathbf{y} \in V$
3. Adheres to the Triangle Inequality
    - $d(\mathbf{x}, \mathbf{z}) \le d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z}), \ \forall \mathbf{x}, \mathbf{y}, \mathbf{z} \in V$
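
The three metric properties can be spot-checked for both the Euclidean and the Manhattan distance. This is a minimal sketch: the function names `d_euclid` and `d_manhattan` and the example vectors are chosen for illustration, and checking a few vectors illustrates the properties rather than proving them.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
z = np.array([0.0, 1.0, -1.0])

def d_euclid(u, v):
    """Distance induced by the dot product: sqrt(<u - v, u - v>)."""
    diff = u - v
    return np.sqrt(diff @ diff)

def d_manhattan(u, v):
    """Distance induced by the l1 norm (no inner product involved)."""
    return np.sum(np.abs(u - v))

for d in (d_euclid, d_manhattan):
    # 1. positive definite
    assert d(x, y) >= 0 and np.isclose(d(x, x), 0.0)
    # 2. symmetric
    assert np.isclose(d(x, y), d(y, x))
    # 3. triangle inequality
    assert d(x, z) <= d(x, y) + d(y, z)
```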

**NOTE**:\
The properties of inner products and of metrics look very similar. However, note that inner products and metrics have opposite behaviors. Specifically, vectors that are similar have larger inner products while they have smaller distances. For example, for vectors of fixed length, $\langle \bf x, y \rangle$ increases as $\bf x$ approaches $\bf y$ while $d(\bf x, y)$ decreases towards $0$ as $\bf x$ approaches $\bf y$.

## 3.4 Angles and Orthogonality

We may define the angle between two nonzero vectors within an inner-product space using the Cauchy-Schwarz inequality, which gives:
$$-1 \le \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\|\| \mathbf{y} \|} \le 1$$
From the unit-circle, there exists a unique $\omega \in [0, \pi]$ such that:
$$\cos\omega = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\|\| \mathbf{y} \|}$$
Or, equivalently:
$$\cos\omega = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\sqrt{\langle \bf x, x \rangle \langle y, y \rangle}}$$

This unique $\omega$ is the *angle* between the vectors $\bf x$ and $\bf y$.
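
The angle formula translates directly into code. In the sketch below, the helper `angle` and its optional matrix argument `A` (defining the inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^\intercal \mathbf{A} \mathbf{v}$) are assumptions made for illustration; with `A=None` the dot product is used.

```python
import numpy as np

def angle(x, y, A=None):
    """Angle in radians between nonzero x and y under <u, v> = u^T A v."""
    if A is None:
        A = np.eye(len(x))  # default: dot product
    cos_omega = (x @ A @ y) / np.sqrt((x @ A @ x) * (y @ A @ y))
    # Clip guards against rounding pushing the value slightly outside [-1, 1]
    return np.arccos(np.clip(cos_omega, -1.0, 1.0))

x = np.array([1.0, 2.0])
y = np.array([2.0, 1.0])

omega = angle(x, y)  # cos(omega) = 4 / 5 under the dot product
```

`np.arccos` returns values in $[0, \pi]$, which matches the range in which $\omega$ is unique.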

**Definition 3.7** Orthogonality:\
Two vectors $\bf x$ and $\bf y$ are ***Orthogonal*** if and only if $\langle \mathbf{x}, \mathbf{y} \rangle = 0$, denoted $\bf x \perp y$. If the vectors are also *unit vectors* (i.e. their magnitudes are both $1$) then they are ***Orthonormal***.
- It follows that the zero vector $\mathbf{0}$ is orthogonal to *every vector* in a vector space.
- We can see that orthogonal vectors are perpendicular from the definition of vector angles above:
$$\cos 90\degree = 0$$

**NOTE**:\
Vectors that are orthogonal w.r.t. one inner product, *may not be orthogonal* w.r.t a different inner product. When dealing with real-valued spaces, we typically deal with dot-products as inner products. However, we may want to use a different inner product for some purpose, in which case we may need to be cautious about assuming or evaluating orthogonality.
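
A small numerical illustration of this caveat, assuming a hypothetical symmetric positive definite matrix `A` that defines a second inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^\intercal \mathbf{A} \mathbf{v}$: the two vectors below are orthogonal under the dot product but not under the inner product defined by `A`.

```python
import numpy as np

x = np.array([1.0, 1.0])
y = np.array([-1.0, 1.0])

# Hypothetical SPD matrix defining a second inner product
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

dot_ip = x @ y    # 0.0: orthogonal under the dot product
A_ip = x @ A @ y  # -1.0: not orthogonal under <u, v> = u^T A v

# Cosine of the angle under the inner product defined by A
cos_omega_A = A_ip / np.sqrt((x @ A @ x) * (y @ A @ y))  # -1/3
```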

**Definition 3.8** Orthogonal Matrix:\
A square matrix $\mathbf{A} \in \reals^{n \times n}$ is an orthogonal matrix if and only if its columns are orthonormal so that:
$$\bf AA^\intercal = I = A^\intercal A$$
This implies that $\bf A^{-1} = A^\intercal$\
Note that these matrices may be more precisely called "orthonormal matrices".\
Transformations with orthogonal matrices preserve distances and angles. In particular, the *length* of a vector $\bf x$ is not changed when transforming it using an orthogonal matrix $\bf A$:
$$\| \bf A x \|_2^2 = (Ax)^\intercal(Ax) = x^\intercal A^\intercal Ax = x^\intercal Ix = x^\intercal x = \|x\|_2^2$$

So, orthogonal matrices describe transformations such as rotations (possibly combined with flips).
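
As an illustration, a 2D rotation matrix is orthogonal. The sketch below uses an arbitrary angle `theta` and arbitrary vectors, and checks the defining identity, the inverse, and the preservation of lengths and dot products (and therefore angles).

```python
import numpy as np

theta = np.pi / 3  # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Defining property: R R^T = I = R^T R, hence R^{-1} = R^T
assert np.allclose(R @ R.T, np.eye(2))
assert np.allclose(R.T @ R, np.eye(2))
assert np.allclose(np.linalg.inv(R), R.T)

x = np.array([3.0, 4.0])
y = np.array([-1.0, 2.0])

# Lengths are preserved
len_before = np.linalg.norm(x)
len_after = np.linalg.norm(R @ x)

# Dot products are preserved, so angles are preserved as well
ip_before = x @ y
ip_after = (R @ x) @ (R @ y)
```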

## 3.5 Orthonormal Basis

An orthonormal basis is a special set of basis vectors in which each vector is a unit vector, and all vectors are *orthogonal* to each other.

**Definition 3.9** Orthonormal Basis:\
A basis $\{\mathbf{b}_1, ..., \mathbf{b}_n\}$ of an $n$-dimensional vector space $V$ is *orthonormal* if:
$$\langle \mathbf{b}_i, \mathbf{b}_j \rangle = 0, \ \forall i \ne j \\ \langle \mathbf{b}_i, \mathbf{b}_i \rangle = 1$$
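
With the dot product as the inner product, both conditions can be checked at once: stacking the basis vectors as the columns of a matrix $\mathbf{B}$, the basis is orthonormal exactly when $\mathbf{B}^\intercal \mathbf{B} = \mathbf{I}$. The basis and the vector `x` below are arbitrary examples. The sketch also uses the fact that, for an orthonormal basis, the coordinates of $\mathbf{x}$ are the inner products $\langle \mathbf{x}, \mathbf{b}_i \rangle$.

```python
import numpy as np

# Columns are the candidate basis vectors b_1 and b_2 of R^2
B = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)

# Entry (i, j) of B^T B is <b_i, b_j>; orthonormal means this is the identity
gram = B.T @ B
assert np.allclose(gram, np.eye(2))

# Coordinates of x w.r.t. an orthonormal basis: lambda_i = <x, b_i>
x = np.array([3.0, 1.0])
coords = B.T @ x
x_rebuilt = B @ coords  # sum_i lambda_i b_i
```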

## 3.6 Orthogonal Complement

Vector spaces may also be orthogonal to each other. Consider an $N$-dimensional vector space $V$ and an $M$-dimensional subspace $U \subseteq V$. The ***Orthogonal Complement*** $U^\perp$ is an $(N-M)$-dimensional subspace of $V$ and contains all vectors in $V$ that are orthogonal to every vector in $U$. Furthermore, $U\cap U^\perp = \{\mathbf{0}\}$, so any vector $\mathbf{x} \in V$ can be uniquely decomposed into:
$$\mathbf{x} = \sum_{m=1}^M \lambda_m\mathbf{b}_m + \sum_{j=1}^{N-M} \psi_j \mathbf{b}_j^\perp, \ \ \lambda_m, \psi_j \in \reals$$
Where $(\mathbf{b}_1, ..., \mathbf{b}_M)$ and $(\mathbf{b}_1^\perp, ..., \mathbf{b}_{N-M}^\perp)$ are bases of $U$ and $U^\perp$ respectively.
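
One way to compute such a decomposition numerically, under the dot product, is sketched below. It assumes the basis vectors of $U$ are the linearly independent columns of a matrix `B`, and it obtains a basis of $U^\perp$ as the null space of $\mathbf{B}^\intercal$ via the singular value decomposition. This is one possible approach among several, and the example subspace is arbitrary.

```python
import numpy as np

# Basis of U as columns: a 2-dimensional subspace (M = 2) of R^3 (N = 3)
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
N, M = B.shape

# U-perp is the null space of B^T. With full_matrices=True, the last
# N - M rows of Vt span that null space (B has full column rank here).
_, _, Vt = np.linalg.svd(B.T, full_matrices=True)
B_perp = Vt[M:].T  # shape (N, N - M)

# Every basis vector of U-perp is orthogonal to every basis vector of U
assert np.allclose(B.T @ B_perp, 0.0)

# Decompose x: solve [B, B_perp] c = x for the coordinates c
x = np.array([1.0, 2.0, 3.0])
c = np.linalg.solve(np.hstack([B, B_perp]), x)
lam, psi = c[:M], c[M:]

x_rebuilt = B @ lam + B_perp @ psi
```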

## 3.7 Inner Product of Functions

An inner product of two functions $u: \reals \rightarrow \reals$ and $v: \reals \rightarrow \reals$ can be defined as a *definite integral*:
$$\langle u, v \rangle \coloneqq \int_a^b u(x)v(x)dx$$
- Orthogonality:
    - if $\langle u, v \rangle = 0$, then $u$ and $v$ are *orthogonal functions*
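
For example, $\sin$ and $\cos$ are orthogonal on $[-\pi, \pi]$. The sketch below approximates the integral with a midpoint Riemann sum; the helper `inner_fn`, the interval, and the number of subintervals `n` are assumptions chosen for illustration, not a recommended quadrature scheme.

```python
import numpy as np

def inner_fn(u, v, a, b, n=100_000):
    """Approximate <u, v> = integral_a^b u(x) v(x) dx by a midpoint sum."""
    dx = (b - a) / n
    x = a + (np.arange(n) + 0.5) * dx  # midpoints of the n subintervals
    return np.sum(u(x) * v(x)) * dx

# sin(x) * cos(x) is odd, so its integral over [-pi, pi] vanishes
ip_sin_cos = inner_fn(np.sin, np.cos, -np.pi, np.pi)

# <sin, sin> is the squared "length" of sin on this interval, which equals pi
ip_sin_sin = inner_fn(np.sin, np.sin, -np.pi, np.pi)
```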

## 3.8 Orthogonal Projections

**Definition 3.10** Projection:\
Let $V$ and $U$ be vector spaces with $U\subseteq V$. A linear mapping $\pi:V\rightarrow U$ is a *projection* if: $$\pi^2 = \pi \circ \pi = \pi$$
A *projection matrix* $\mathbf{P}_\pi$ is then a special kind of transformation matrix that exhibits the property: $$\mathbf{P}_\pi^2 = \mathbf{P}_\pi$$

Intuitively, we may read this as the effect of applying a projection twice is the same as applying it once.

### Projection onto One-Dimensional Subspaces (Lines)

Consider a line through the origin with a basis vector $\mathbf{b} \in \reals^n$. The line is a one-dimensional subspace $U \subseteq \reals^n$ spanned by $\bf b$. The orthogonal projection of a vector $\mathbf{x}\in \reals^n$ onto this line $U$ is the vector $\pi_U(\mathbf{x}) \in U$ that is *closest* to $\bf x$.
- "*Closest*" means that the distance $\|\mathbf{x} - \pi_U(\mathbf{x})\|$ is minimized. The minimum is attained at the point (vector) on $U$ for which the line segment $\pi_U(\mathbf{x}) - \bf x$ is orthogonal to $U$, and therefore to the basis vector $\bf b$.
- The projection $\pi_U(\bf{x})$ of $\bf x$ onto $U$ *must be* an element of $U$ and, therefore, a multiple of $\bf b$, so:
$$\pi_U(\mathbf{x}) = \lambda \bf b, \ \lambda \in \reals$$

1. Finding the coordinate $\lambda$
    $$\text{From orthogonality: } \ \langle \mathbf{x} - \lambda \mathbf{b}, \mathbf{b} \rangle = 0$$
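    $$\text{From bilinearity of the inner product: } \ \langle\mathbf{x}, \mathbf{b} \rangle - \lambda \langle\mathbf{b}, \mathbf{b}\rangle = 0 \iff \lambda = \frac{\langle \mathbf{x}, \mathbf{b} \rangle}{\langle \mathbf{b}, \mathbf{b} \rangle} = \frac{\langle \mathbf{b}, \mathbf{x} \rangle}{\|\mathbf{b}\|^2}$$
    $$\text{With the dot product as the inner product: } \ \lambda = \frac{\mathbf{b}^\intercal \mathbf{x}}{\mathbf{b}^\intercal \mathbf{b}} = \frac{\mathbf{b}^\intercal \mathbf{x}}{\|\mathbf{b}\|_2^2}$$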

2. Finding the projection point $\pi_U(\mathbf{x}) \in U$
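$$\pi_U(\mathbf{x}) = \lambda \mathbf{b} = \frac{\langle \mathbf{x}, \mathbf{b} \rangle}{\|\mathbf{b}\|^2}\mathbf{b} = \frac{\mathbf{b}^\intercal \mathbf{x}}{\|\mathbf{b}\|_2^2}\mathbf{b} \ \text{ (for the dot product)}$$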
It follows that the length of the projection is:
$$\|\pi_U(\mathbf{x})\| = \|\lambda\mathbf{b}\| = |\lambda|\|\bf b\|$$
Which, when the inner product is the dot product, may be expressed as:
$$\|\pi_U(\mathbf{x})\| = |\cos \omega|\|\bf x\|$$
By the definition of the angle between vectors.

3. Finding the projection matrix $\mathbf{P}_\pi$\
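Because $\pi_U$ is a linear mapping, there exists a projection matrix $\mathbf{P}_\pi$ such that $\pi_U(\mathbf{x}) = \mathbf{P}_\pi\bf x$. With the dot product as the inner product:
$$\pi_U(\mathbf{x}) = \lambda \mathbf{b} = \mathbf{b}\lambda = \mathbf{b}\frac{\mathbf{b}^\intercal \mathbf{x}}{\|\mathbf{b}\|_2^2} = \frac{\mathbf{b}\mathbf{b}^\intercal}{\|\mathbf{b}\|_2^2}\mathbf{x}$$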
$$\implies \mathbf{P}_\pi = \frac{bb^\intercal}{\|b\|_2^2}$$
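
The three steps above can be sketched in NumPy, using the dot product as the inner product. The vectors `b` and `x` are arbitrary example values.

```python
import numpy as np

b = np.array([1.0, 2.0, 2.0])  # basis vector spanning the line U
x = np.array([1.0, 1.0, 1.0])  # vector to project

# 1. Coordinate lambda = b^T x / b^T b
lam = (b @ x) / (b @ b)

# 2. Projection point pi_U(x) = lambda * b
proj = lam * b

# 3. Projection matrix P = b b^T / ||b||^2
P = np.outer(b, b) / (b @ b)

assert np.allclose(P @ x, proj)         # the matrix reproduces steps 1 and 2
assert np.allclose(P @ P, P)            # projecting twice changes nothing
assert np.isclose((x - proj) @ b, 0.0)  # the residual is orthogonal to b

# Length of the projection: |cos(omega)| * ||x||
cos_omega = (b @ x) / (np.linalg.norm(b) * np.linalg.norm(x))
proj_len = np.linalg.norm(proj)
```

Note that `P` depends only on `b`, so it can be computed once and reused for any `x`.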

**NOTE**:\
The projection $\pi_U(\mathbf{x}) \in \reals^n$ is still an $n$-dimensional vector and not a scalar. However, we no longer require $n$ coordinates to represent the projection. We only need *a single* coordinate to express it with respect to the basis vector $\bf b$ that spans $U$. This coordinate is $\lambda$.

**NOTE**:\
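It can be shown (and supposedly is explored in ch 4) that $\pi_U(\mathbf{x})$ is an *eigenvector* of $\mathbf{P}_\pi$ with an eigenvalue of $1$. For $\pi_U(\mathbf{x}) \ne \mathbf{0}$ this follows from $\mathbf{P}_\pi^2 = \mathbf{P}_\pi$, since $\mathbf{P}_\pi \pi_U(\mathbf{x}) = \mathbf{P}_\pi^2 \mathbf{x} = \mathbf{P}_\pi \mathbf{x} = \pi_U(\mathbf{x})$.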
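
A quick numerical check of the eigenvector claim, assuming the dot product and arbitrary example vectors `b` and `x`. Since this projection matrix is symmetric, `np.linalg.eigvalsh` is used, which returns the eigenvalues in ascending order.

```python
import numpy as np

b = np.array([1.0, 2.0, 2.0])  # arbitrary basis vector of the line U
x = np.array([1.0, 1.0, 1.0])  # arbitrary vector to project

P = np.outer(b, b) / (b @ b)   # projection matrix onto span{b}
proj = P @ x                   # pi_U(x)

# P maps pi_U(x) to itself: an eigenvector with eigenvalue 1
assert np.allclose(P @ proj, 1.0 * proj)

# For a projection onto a line in R^3, the eigenvalues are 0, 0, and 1
eigvals = np.linalg.eigvalsh(P)
```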

### Projection onto General Subspaces