# Orthogonal and orthonormal bases

You already know that any two vectors that do not lie on the same line form a basis in the plane. However, if you ask a random person to draw two coordinate axes on a checkered piece of paper, they will most likely draw two **perpendicular** lines.

This intuition reflects an important idea: the concept of **orthogonality** complements the notion of choosing a basis in a vector space in a very natural and powerful way.

The combination of **orthogonality** and **bases** is extremely fruitful and appears throughout mathematics and its applications. It plays a fundamental role in areas such as:
- dimensionality reduction in machine learning,
- Fourier analysis,
- signal processing,
- and many other areas of modern science and technology.

The scope of applications of orthogonal bases is so vast that providing a complete overview would be difficult. Therefore, we will focus on understanding the **foundations** of these ideas and where they come from.

## Example: Constructing an Orthogonal and Orthonormal Basis

Consider a two-dimensional Euclidean space
$$
(V, \langle \cdot , \cdot \rangle).
$$
Choose arbitrary vectors $\vec v_1$ and $\vec v_2$ forming a basis of $V$.

Define a new vector
$$
\vec v_2' = \vec v_2 - \frac{\langle \vec v_1, \vec v_2 \rangle}{\langle \vec v_1, \vec v_1 \rangle}\,\vec v_1.
$$

This vector is of particular interest because it is **orthogonal** to $\vec v_1$.

### Orthogonality Check

Indeed,
$$
\begin{aligned}
\langle \vec v_1, \vec v_2' \rangle
&= \left\langle \vec v_1,\; \vec v_2 - \frac{\langle \vec v_1, \vec v_2 \rangle}{\langle \vec v_1, \vec v_1 \rangle}\vec v_1 \right\rangle \\
&= \langle \vec v_1, \vec v_2 \rangle
- \frac{\langle \vec v_1, \vec v_2 \rangle}{\langle \vec v_1, \vec v_1 \rangle}
\langle \vec v_1, \vec v_1 \rangle \\
&= 0.
\end{aligned}
$$

Since $\{\vec v_1, \vec v_2\}$ is a basis and $\vec v_2'$ differs from $\vec v_2$ only by a multiple of $\vec v_1$, the vectors $\vec v_1$ and $\vec v_2'$ are linearly independent. Therefore,
$$
\{\vec v_1, \vec v_2'\}
$$
is also a basis of $V$, now consisting of **orthogonal vectors**.

Such a basis is called an **orthogonal basis**.
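
As a quick sanity check of the computation above, here is a minimal Python sketch. It assumes $V = \mathbb{R}^2$ with the usual dot product as the inner product. The concrete vectors `v1 = (3, 0)` and `v2 = (2, 2)` are chosen only for illustration; they happen to have the same inner products as in the numerical example that follows.

```python
from fractions import Fraction

def dot(u, v):
    """Standard dot product, used here as the inner product on R^2."""
    return sum(a * b for a, b in zip(u, v))

# Illustrative basis of R^2 (an assumption, not taken from the text).
v1 = (Fraction(3), Fraction(0))
v2 = (Fraction(2), Fraction(2))

# v2' = v2 - (<v1, v2> / <v1, v1>) * v1
coeff = dot(v1, v2) / dot(v1, v1)
v2_prime = tuple(b - coeff * a for a, b in zip(v1, v2))

print(coeff)              # 2/3
print(dot(v1, v2_prime))  # 0
```

Exact fractions are used instead of floats so that the inner product comes out as exactly zero rather than a tiny rounding error.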


### Numerical Example

Suppose
$$
\langle \vec v_1, \vec v_1 \rangle = 9, \quad
\langle \vec v_2, \vec v_2 \rangle = 8, \quad
\langle \vec v_1, \vec v_2 \rangle = 6.
$$

Then
$$
\vec v_2' = \vec v_2 - \frac{6}{9}\vec v_1
= \vec v_2 - \frac{2}{3}\vec v_1.
$$

Thus,
$$
\{\vec v_1,\; \vec v_2 - \tfrac{2}{3}\vec v_1\}
$$
is an orthogonal basis.

### Lengths of the Orthogonal Vectors

The norms are:
$$
\|\vec v_1\| = \sqrt{\langle \vec v_1, \vec v_1 \rangle} = 3,
$$
and
$$
\begin{aligned}
\|\vec v_2'\|
&= \sqrt{\langle \vec v_2 - \tfrac{2}{3}\vec v_1,\; \vec v_2 - \tfrac{2}{3}\vec v_1 \rangle} \\
&= \sqrt{
\langle \vec v_2, \vec v_2 \rangle
- \tfrac{4}{3}\langle \vec v_1, \vec v_2 \rangle
+ \tfrac{4}{9}\langle \vec v_1, \vec v_1 \rangle
} \\
&= \sqrt{8 - 8 + 4} = 2.
\end{aligned}
$$
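
The same norm can be computed without knowing the vectors themselves, only their pairwise inner products. The sketch below stores those three numbers in a table `G` (often called a Gram matrix, a name not used in the text) and represents every vector by its coordinates with respect to $\{\vec v_1, \vec v_2\}$. The helper name `inner` is hypothetical.

```python
from fractions import Fraction
from math import sqrt

# G[i][j] = <v_{i+1}, v_{j+1}> for the basis {v1, v2} of the example.
G = [[Fraction(9), Fraction(6)],
     [Fraction(6), Fraction(8)]]

def inner(a, b):
    """Inner product of a1*v1 + a2*v2 and b1*v1 + b2*v2, expanded by bilinearity."""
    return sum(a[i] * b[j] * G[i][j] for i in range(2) for j in range(2))

v1 = (Fraction(1), Fraction(0))            # coordinates of v1
v2_prime = (Fraction(-2, 3), Fraction(1))  # v2' = v2 - (2/3) v1

print(inner(v1, v2_prime))              # 0
print(inner(v2_prime, v2_prime))        # 4
print(sqrt(inner(v2_prime, v2_prime)))  # 2.0
```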


### Normalization

Dividing a vector by its norm produces a **unit vector** (length $1$). This process is called **normalization**.

For any vector $\vec w \neq 0$,
$$
\left\|\frac{\vec w}{\|\vec w\|}\right\| = 1.
$$

Normalize $\vec v_1$ and $\vec v_2'$:
$$
\vec e_1 = \frac{\vec v_1}{\|\vec v_1\|} = \frac{1}{3}\vec v_1,
$$
$$
\vec e_2 = \frac{\vec v_2'}{\|\vec v_2'\|}
= \frac{1}{2}\vec v_2'
= \frac{1}{2}\vec v_2 - \frac{1}{3}\vec v_1.
$$

### Result: Orthonormal Basis

The vectors $\vec e_1$ and $\vec e_2$ form a basis of $V$ that is both **orthogonal** and **normalized**:

$$
\langle \vec e_1, \vec e_1 \rangle = 1, \quad
\langle \vec e_2, \vec e_2 \rangle = 1, \quad
\langle \vec e_1, \vec e_2 \rangle = 0.
$$

These relations are exactly the same as those of the standard basis in $\mathbb{R}^2$ with the usual dot product.

This procedure is the two-dimensional case of the **Gram–Schmidt orthonormalization process**.
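
The three relations above can also be verified mechanically. The following sketch, under the same assumptions as the numerical example (inner products $9$, $8$ and $6$), works with coordinates relative to $\{\vec v_1, \vec v_2\}$; the helper `inner` is a hypothetical name introduced only for this check.

```python
from fractions import Fraction

# G[i][j] = <v_{i+1}, v_{j+1}> for the basis {v1, v2} of the numerical example.
G = [[Fraction(9), Fraction(6)],
     [Fraction(6), Fraction(8)]]

def inner(a, b):
    """Inner product of two vectors given by coordinates in {v1, v2}."""
    return sum(a[i] * b[j] * G[i][j] for i in range(2) for j in range(2))

e1 = (Fraction(1, 3), Fraction(0))      # e1 = (1/3) v1
e2 = (Fraction(-1, 3), Fraction(1, 2))  # e2 = (1/2) v2 - (1/3) v1

print(inner(e1, e1), inner(e2, e2), inner(e1, e2))  # 1 1 0
```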

## Geometric Side of the Story

All the algebraic manipulations introduced earlier may seem somewhat overcomplicated at first. However, when viewed from a **geometric perspective**, they become much more intuitive and natural.

Let us once again consider two vectors
$$
\vec v_1 \quad \text{and} \quad \vec v_2,
$$
but now interpret them as **arrows in the plane**.

![Vectors v1 and v2](img/v1_v2.png)


Now notice that the vector

$$
\frac{\langle \vec v_1, \vec v_2 \rangle}{\langle \vec v_1, \vec v_1 \rangle}
\, \vec v_1
$$

is just the projection

$$
\operatorname{proj}_{\vec v_1}(\vec v_2)
$$

of the vector $\vec v_2$ onto $\vec v_1$.

This projection can be easily illustrated using the following picture:

![Projection of v2 on v1](img/proj_v2_on_v1.png)


Therefore, the vector $\vec v_2'$ is the vector $\vec v_2$ from which you have “removed” its projection onto $\vec v_1$:

$$
\vec v_2'
=
\vec v_2
-
\operatorname{proj}_{\vec v_1}(\vec v_2).
$$
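
In code, "removing the projection" is a two-line operation. The sketch below assumes vectors in $\mathbb{R}^2$ with integer coordinates and the dot product as the inner product; the function name `proj` and the sample vectors are illustrative choices, not part of the text.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(onto, v):
    """Projection of v onto the line spanned by the nonzero vector `onto`.

    Assumes integer coordinates so the coefficient is an exact fraction.
    """
    c = Fraction(dot(onto, v), dot(onto, onto))
    return [c * a for a in onto]

v1 = [2, 1]
v2 = [1, 3]

p = proj(v1, v2)                           # the part of v2 along v1
residual = [b - q for b, q in zip(v2, p)]  # v2' = v2 - proj_{v1}(v2)

print(p == [2, 1])           # True
print(residual == [-1, 2])   # True
print(dot(v1, residual))     # 0
```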

You have already proven that $\vec v_2'$ is perpendicular to $\vec v_1$.
However, now it is also easy to see this:

![Projection Components](img/proj_components.png)


Last but not least, you normalize the vectors so that their lengths are equal to $1$.
In our picture, it looks like this.

![Projection Basis](img/proj_basis.png)


Here $\vec e_1$ and $\vec e_2$ are normalized versions of $\vec v_1$ and $\vec v_2'$, respectively.
They form a basis of this plane, which is very similar to the one you usually choose in a checkered notebook.

![Grid Basis](img/grid_basis.png)


## Higher dimensions

This idea of a basis made of mutually orthogonal vectors of length one can be adapted to an arbitrary dimension $n$.
First, let’s give it a name.

Let $(V,\langle \cdot,\cdot\rangle)$ be a Euclidean space $(\dim(V)=n)$.
A basis
$$
\{\vec e_1, \vec e_2, \ldots, \vec e_n\}
$$
is called **orthogonal** if
$$
\langle \vec e_i, \vec e_j \rangle = 0
$$
for any $i,j \in \{1,2,\ldots,n\}$ such that $i \neq j$.
In other words, each vector of this basis is **orthogonal** to every other vector of the basis.

An orthogonal basis
$$
\{\vec e_1, \vec e_2, \ldots, \vec e_n\}
$$
is called **orthonormal** if
$$
\langle \vec e_i, \vec e_i \rangle = 1
$$
for any $i \in \{1,2,\ldots,n\}$.
This means that besides orthogonality, each vector is a unit vector.

There is a very useful conventional symbol in mathematics, which is called the **Kronecker delta**.
It is defined in the following manner:
$$
\delta_{i,j} =
\begin{cases}
1, & \text{if } i=j, \\
0, & \text{if } i\neq j.
\end{cases}
$$

For example, $\delta_{4,4} = 1$ and $\delta_{3,5} = 0$.
How can it be used?
Well, you can say that a basis
$$
\{\vec e_1, \vec e_2, \ldots, \vec e_n\}
$$
is orthonormal if
$$
\langle \vec e_i, \vec e_j \rangle = \delta_{i,j}
$$
for $i,j \in \{1,2,\ldots,n\}$.
This way, you can combine the two above-mentioned definitions into one.
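
The combined definition translates directly into a check. The sketch below assumes vectors in $\mathbb{R}^n$ given as Python lists and the dot product as the inner product; `kronecker_delta` and `is_orthonormal` are hypothetical helper names, and the tolerance is an arbitrary choice to absorb floating-point rounding.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kronecker_delta(i, j):
    return 1 if i == j else 0

def is_orthonormal(vectors, tol=1e-12):
    """Check <e_i, e_j> == delta_{i,j} for all pairs, up to rounding error."""
    n = len(vectors)
    return all(
        math.isclose(dot(vectors[i], vectors[j]), kronecker_delta(i, j), abs_tol=tol)
        for i in range(n)
        for j in range(n)
    )

s = 1 / math.sqrt(2)
print(is_orthonormal([[1, 0], [0, 1]]))    # True
print(is_orthonormal([[s, s], [-s, s]]))   # True
print(is_orthonormal([[1, 0], [1, 1]]))    # False
```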

You will see later why orthonormal bases are so useful.
But first, note that such a basis can be constructed for any finite-dimensional Euclidean space.
The construction is called the **Gram–Schmidt process**, and it goes like this:

Start with a Euclidean space $V$ (with inner product $\langle \cdot,\cdot\rangle$ and $\dim(V)=n$) and an arbitrary basis $\{\vec v_1, \vec v_2, \ldots, \vec v_n\}$ of it.

Introduce a new basis $\{\vec w_1, \vec w_2, \ldots, \vec w_n\}$ in the following manner.

1. Vector $\vec w_1$ is equal to $\vec v_1$.

2. Each following vector $\vec w_k$ (for $1 < k \le n$) is defined as the vector $\vec v_k$
from which you have “removed” all the projections onto the previously constructed vectors
$\vec w_1, \vec w_2, \ldots, \vec w_{k-1}$:
$$
\vec w_k
=
\vec v_k
-
\operatorname{proj}_{\vec w_1}(\vec v_k)
-
\operatorname{proj}_{\vec w_2}(\vec v_k)
-
\cdots
-
\operatorname{proj}_{\vec w_{k-1}}(\vec v_k).
$$
These vectors turn out to be orthogonal to each other and form a basis of $V$.

3. Finally, normalize vectors $\vec w_k$:
$$\vec e_k = \frac{1}{\|\vec w_k\|}\,\vec w_k.$$
Vectors $\vec e_k$ are still orthogonal, but also have length $1$.
Therefore, $\{\vec e_1, \vec e_2, \ldots, \vec e_n\}$ is an orthonormal basis of $V$.

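The orthogonality of $\{\vec w_1, \vec w_2, \ldots, \vec w_n\}$ can be proven by induction on $k$. Think about it like this: by ‘removing’ all the components of $\vec v_k$ that are co-directed with
$\vec w_1, \vec w_2, \ldots, \vec w_{k-1}$, you end up with a vector that is perpendicular to all of them. Indeed, if $\vec w_1, \ldots, \vec w_{k-1}$ are already pairwise orthogonal, then for each $j < k$ all projections except the $j$-th one drop out of the inner product with $\vec w_j$:
$$
\langle \vec w_j, \vec w_k \rangle
= \langle \vec w_j, \vec v_k \rangle
- \frac{\langle \vec w_j, \vec v_k \rangle}{\langle \vec w_j, \vec w_j \rangle}\,\langle \vec w_j, \vec w_j \rangle
= 0.
$$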
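
The three steps of the process can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: it assumes the vectors live in $\mathbb{R}^n$, are given as lists of numbers, are linearly independent, and that the inner product is the dot product. The function name `gram_schmidt` is a hypothetical choice.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return (ws, es): the orthogonal vectors w_k and the unit vectors e_k.

    Assumes `basis` is a list of linearly independent vectors in R^n.
    """
    ws = []
    for v in basis:
        w = list(v)                             # step 1: w_1 = v_1
        for prev in ws:                         # step 2: subtract proj_{w_j}(v_k)
            c = dot(prev, v) / dot(prev, prev)
            w = [wi - c * pi for wi, pi in zip(w, prev)]
        ws.append(w)
    # step 3: normalize every w_k
    es = [[wi / math.sqrt(dot(w, w)) for wi in w] for w in ws]
    return ws, es

ws, es = gram_schmidt([[1, 1], [0, 2]])
print(ws)                  # [[1, 1], [-1.0, 1.0]]
print(dot(ws[0], ws[1]))   # 0.0
```

In floating-point arithmetic this classical form can lose orthogonality when the input vectors are nearly linearly dependent, so treat it as a way to follow the definition rather than as a numerically robust implementation.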

Here’s an example for a better understanding.
Consider three linearly independent vectors in a Euclidean space $\mathbb{R}^3$
with the choice of dot product as an inner product (meaning that
$\langle \vec v, \vec w \rangle = \vec v \cdot \vec w$):

$$
\vec v_1 =
\begin{pmatrix}
1 \\ 0 \\ 1
\end{pmatrix},
\qquad
\vec v_2 =
\begin{pmatrix}
1 \\ -2 \\ 0
\end{pmatrix},
\qquad
\vec v_3 =
\begin{pmatrix}
1 \\ -1 \\ 1
\end{pmatrix}.
$$

Applying the Gram–Schmidt process:

$$
\vec w_1 = \vec v_1 =
\begin{pmatrix}
1 \\ 0 \\ 1
\end{pmatrix},
\qquad
\vec e_1 =
\begin{pmatrix}
\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}}
\end{pmatrix}.
$$

$$
\vec w_2
=
\vec v_2
-
\frac{\langle \vec w_1, \vec v_2 \rangle}
{\langle \vec w_1, \vec w_1 \rangle}
\, \vec w_1
=
\begin{pmatrix}
\frac{1}{2} \\ -2 \\ -\frac{1}{2}
\end{pmatrix},
\qquad
\vec e_2 =
\begin{pmatrix}
\frac{1}{3\sqrt{2}} \\ -\frac{4}{3\sqrt{2}} \\ -\frac{1}{3\sqrt{2}}
\end{pmatrix}.
$$

$$
\vec w_3
=
\vec v_3
-
\frac{\langle \vec w_2, \vec v_3 \rangle}
{\langle \vec w_2, \vec w_2 \rangle}
\, \vec w_2
-
\frac{\langle \vec w_1, \vec v_3 \rangle}
{\langle \vec w_1, \vec w_1 \rangle}
\, \vec w_1
=
\begin{pmatrix}
-\frac{2}{9} \\ -\frac{1}{9} \\ \frac{2}{9}
\end{pmatrix},
\qquad
\vec e_3 =
\begin{pmatrix}
-\frac{2}{3} \\ -\frac{1}{3} \\ \frac{2}{3}
\end{pmatrix}.
$$

By calculating the dot products, you can check that
$$
\{\vec e_1, \vec e_2, \vec e_3\}
$$
is indeed an orthonormal basis of $\mathbb{R}^3$.
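
If you prefer to let a computer do the checking, the sketch below repeats the process for the three vectors of this example. It uses exact fractions for the vectors $\vec w_k$ and floats for the normalized vectors; the tolerance `1e-12` is an arbitrary choice to absorb rounding.

```python
from fractions import Fraction
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vs = [[Fraction(x) for x in v] for v in ([1, 0, 1], [1, -2, 0], [1, -1, 1])]

# Steps 1 and 2: build the orthogonal vectors w_k exactly.
ws = []
for v in vs:
    w = v
    for prev in ws:
        c = dot(prev, v) / dot(prev, prev)
        w = [wi - c * pi for wi, pi in zip(w, prev)]
    ws.append(w)

for w in ws:
    print([str(x) for x in w])
# ['1', '0', '1']
# ['1/2', '-2', '-1/2']
# ['-2/9', '-1/9', '2/9']

# Step 3: normalize, then compare all pairwise dot products with the Kronecker delta.
es = [[float(wi) / math.sqrt(dot(w, w)) for wi in w] for w in ws]
ok = all(
    math.isclose(dot(es[i], es[j]), 1.0 if i == j else 0.0, abs_tol=1e-12)
    for i in range(3)
    for j in range(3)
)
print(ok)  # True
```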

## Features of an orthonormal basis

The main benefit of choosing an orthonormal basis in $V$ is that it lets you treat $V$ as $\mathbb{R}^n$
(here $n=\dim(V)$) and $\langle \cdot,\cdot\rangle$ as the standard dot product.
Let's see how it works.

If
$$
\{\vec e_1, \vec e_2, \ldots, \vec e_n\}
$$
is an arbitrary basis of $V$ then you can express the inner product of vectors
$$
\vec a
=
a_1 \vec e_1 + a_2 \vec e_2 + \cdots + a_n \vec e_n
$$
and
$$
\vec b
=
b_1 \vec e_1 + b_2 \vec e_2 + \cdots + b_n \vec e_n
$$
using only the products of the form
$\langle \vec e_i, \vec e_j \rangle$ with $i \le j$ (by symmetry, $\langle \vec e_j, \vec e_i \rangle = \langle \vec e_i, \vec e_j \rangle$):

$$
\begin{aligned}
\langle \vec a, \vec b \rangle
&=
a_1 b_1 \langle \vec e_1, \vec e_1 \rangle
+
a_2 b_2 \langle \vec e_2, \vec e_2 \rangle
+
\cdots
+
a_n b_n \langle \vec e_n, \vec e_n \rangle
\\
&\quad+
(a_1 b_2 + a_2 b_1)\langle \vec e_1, \vec e_2 \rangle
+
(a_1 b_3 + a_3 b_1)\langle \vec e_1, \vec e_3 \rangle
+
\cdots
+
(a_{n-1} b_n + a_n b_{n-1})\langle \vec e_{n-1}, \vec e_n \rangle.
\end{aligned}
$$

This is one monstrous expression.
The problem with it is that, starting with $n=4$, the number of inner products with $i<j$, which is $n(n-1)/2$,
is greater than the number with $i=j$, which is $n$, and it grows quadratically with $n$.
That means that computing inner products in an arbitrary basis is much harder than in
$\mathbb{R}^n$ with the standard dot product, in which the number of terms is just $n$.

But notice that if
$$
\{\vec e_1, \vec e_2, \ldots, \vec e_n\}
$$
is an orthonormal basis, this problem vanishes.
Why?
Because all the inner products $\langle \vec e_i, \vec e_j \rangle$ with $i<j$ in this enormous formula are equal to $0$.
Moreover, all the inner products $\langle \vec e_i, \vec e_i \rangle$ are equal to $1$.
This results in a very elegant formula

$$
\langle \vec a, \vec b \rangle
=
a_1 b_1 + a_2 b_2 + \cdots + a_n b_n.
$$

Not only is this formula much simpler, but it is also very familiar.
It is just the dot product of the vectors
$$
(a_1, a_2, \ldots, a_n)^T \in \mathbb{R}^n
$$
and
$$
(b_1, b_2, \ldots, b_n)^T \in \mathbb{R}^n.
$$
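
To see the difference in practice, the following sketch computes $\langle \vec a, \vec b \rangle$ from coordinates in two ways. The function `inner_in_basis` sums over all pairs $(i, j)$, which is the same as the long formula above once symmetric terms are grouped. The table `skewed` is a made-up example of inner products $\langle \vec e_i, \vec e_j \rangle$ for a basis that is not orthonormal; both names are assumptions introduced for illustration.

```python
def inner_in_basis(a, b, gram):
    """<a, b> for coordinate lists a and b, where gram[i][j] = <e_i, e_j>."""
    n = len(a)
    return sum(a[i] * b[j] * gram[i][j] for i in range(n) for j in range(n))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [1, 2, 3]
b = [4, 5, 6]

# Orthonormal basis: <e_i, e_j> is the Kronecker delta.
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

# Hypothetical non-orthonormal basis: off-diagonal inner products are nonzero.
skewed = [[2, 1, 0],
          [1, 2, 1],
          [0, 1, 2]]

print(inner_in_basis(a, b, identity), dot(a, b))  # 32 32
print(inner_in_basis(a, b, skewed))               # 104
```

With an orthonormal basis the double sum collapses to the $n$ diagonal terms, which is why the plain dot product gives the same answer.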

Note also that writing down
$$
\begin{pmatrix}
a_1 \\ a_2 \\ \vdots \\ a_n
\end{pmatrix}
+
\begin{pmatrix}
b_1 \\ b_2 \\ \vdots \\ b_n
\end{pmatrix}
=
\begin{pmatrix}
a_1+b_1 \\ a_2+b_2 \\ \vdots \\ a_n+b_n
\end{pmatrix},
\qquad
\lambda
\begin{pmatrix}
a_1 \\ a_2 \\ \vdots \\ a_n
\end{pmatrix}
=
\begin{pmatrix}
\lambda a_1 \\ \lambda a_2 \\ \vdots \\ \lambda a_n
\end{pmatrix}
$$

is just another way of writing

$$
(a_1 \vec e_1 + a_2 \vec e_2 + \cdots + a_n \vec e_n)
+
(b_1 \vec e_1 + b_2 \vec e_2 + \cdots + b_n \vec e_n)
=
((a_1+b_1)\vec e_1 + (a_2+b_2)\vec e_2 + \cdots + (a_n+b_n)\vec e_n),
$$

$$
\lambda (a_1 \vec e_1 + a_2 \vec e_2 + \cdots + a_n \vec e_n)
=
\lambda a_1 \vec e_1 + \lambda a_2 \vec e_2 + \cdots + \lambda a_n \vec e_n.
$$

You can conclude that writing down vectors in an orthonormal basis of a Euclidean space $V$
is basically the same as working with $\mathbb{R}^n$ with the standard dot product.

There is one more way to express this property.
Let $\vec x$ be a vector in a Euclidean space $(V,\langle \cdot,\cdot\rangle)$.
Let
$$
\{\vec e_1, \ldots, \vec e_n\}
$$
be an orthonormal basis of this space and
$$
\vec x = x_1 \vec e_1 + \cdots + x_n \vec e_n.
$$

Now let’s calculate the following sum

$$
\sum_{i=1}^n \langle \vec x, \vec e_i \rangle \cdot \vec e_i
=
\langle \vec x, \vec e_1 \rangle \cdot \vec e_1
+
\cdots
+
\langle \vec x, \vec e_n \rangle \cdot \vec e_n.
$$

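Since $\vec x = x_1 \vec e_1 + \cdots + x_n \vec e_n$, linearity gives $\langle \vec x, \vec e_i \rangle = x_1 \langle \vec e_1, \vec e_i \rangle + \cdots + x_n \langle \vec e_n, \vec e_i \rangle$. If you throw out all the inner products that are equal to $0$ (those with different indices) and use $\langle \vec e_i, \vec e_i \rangle = 1$, you get $\langle \vec x, \vec e_i \rangle = x_i$, so you end up with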

$$
\langle \vec x, \vec e_1 \rangle \cdot \vec e_1
+
\cdots
+
\langle \vec x, \vec e_n \rangle \cdot \vec e_n
=
x_1 \vec e_1 + \cdots + x_n \vec e_n.
$$

This literally means that

$$
\sum_{i=1}^n \langle \vec x, \vec e_i \rangle \cdot \vec e_i = \vec x.
$$

Therefore, each coordinate $x_i = \langle \vec x, \vec e_i \rangle$ is just the scalar projection of $\vec x$ onto $\vec e_i$.
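
Here is a small numerical illustration of this identity. It assumes $\mathbb{R}^2$ with the dot product and uses the standard basis rotated by $45^\circ$ as the orthonormal basis; the vector `x` is an arbitrary choice.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
e = [[s, s], [-s, s]]   # an orthonormal basis of R^2
x = [3.0, 1.0]

# Coordinates of x in the basis e: x_i = <x, e_i>
coords = [dot(x, ei) for ei in e]

# Rebuild x as the sum of <x, e_i> * e_i
rebuilt = [sum(c * ei[k] for c, ei in zip(coords, e)) for k in range(2)]

print(coords)   # approximately [2.828, -1.414]
print(rebuilt)  # approximately [3.0, 1.0]
```

No linear system has to be solved to find the coordinates: one inner product per basis vector is enough.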

## Conclusion

Let $(V,\langle \cdot,\cdot\rangle)$ be a Euclidean space and $\dim(V)=n$.

The normalization of a nonzero vector $\vec v$ is the unit vector
$$
\frac{1}{\|\vec v\|}\cdot \vec v.
$$

A basis
$$
\{\vec e_1, \vec e_2, \ldots, \vec e_n\}
$$
is orthogonal if
$$
\langle \vec e_i, \vec e_j \rangle = 0
$$
for all $i \ne j$.

A basis
$$
\{\vec e_1, \vec e_2, \ldots, \vec e_n\}
$$
is orthonormal if
$$
\langle \vec e_i, \vec e_j \rangle = \delta_{i,j}.
$$

If you write the vectors of $V$ in an orthonormal basis, then $V$ can be thought of as $\mathbb{R}^n$ and $\langle \cdot,\cdot\rangle$ as the dot product.

Given any basis of a Euclidean space, you can always obtain an orthonormal basis of this space by the Gram–Schmidt process.

For an orthonormal basis $\{\vec e_1, \ldots, \vec e_n\}$ and any vector $\vec x \in V$, the following identity holds:
$$
\vec x
=
\langle \vec x, \vec e_1 \rangle \cdot \vec e_1
+
\cdots
+
\langle \vec x, \vec e_n \rangle \cdot \vec e_n.
$$