# Projections and Orthogonalisation

## Projections

Say that we have a 2-dimensional vector a and a point b that does not lie in the column space of that vector, that is, on the line spanned by a. What is the best estimate of b that does lie in the column space of a?

The first thing we can note is that the best estimate is a scaled copy of the vector, and that the error vector running from this estimate to the point is perpendicular to the original vector, so their dot product is 0.

$$a^T(b-\beta a) = 0$$

Here a is the original vector, b is the point, and beta is the scaling factor applied to a to give the best estimate of b. We then solve for beta to find this best scaling coefficient.

$$ \beta = \frac{a^Tb}{a^Ta}$$

In [1]:
import numpy as np
import matplotlib.pyplot as plt

plt.style.use("seaborn-v0_8")  # the "seaborn" style name was renamed in Matplotlib 3.6

# Define vector a and point b
a = np.array([1, 2])
b = np.array([2, 2])

a, b

(array([1, 2]), array([2, 2]))

There is no scalar we can multiply a by to get b, hence b is not in the column space of a.

In [2]:
# find the best scaling coefficient beta
beta = a.dot(b)/a.dot(a)

beta

1.2

In [3]:
beta * a

array([1.2, 2.4])

Hence beta * a is the closest point to b on the line spanned by a, so we've optimally estimated b using a.
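As a quick sanity check on the cell above, the sketch below confirms that the error vector is perpendicular to a, and that no other scaling on a coarse grid gets closer to b. The names `residual` and `candidate_betas`, and the grid itself, are illustrative assumptions rather than part of the original notebook.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([2, 2])

# Best scaling coefficient from the formula beta = a^T b / a^T a
beta = a.dot(b) / a.dot(a)

# Error vector between the point and its estimate
residual = b - beta * a

# The defining condition a^T (b - beta a) = 0 should hold up to rounding
orthogonality = a.dot(residual)
print(residual, orthogonality)

# Illustrative grid of alternative scalings: none should beat beta
candidate_betas = np.linspace(0, 2, 21)
distances = [np.linalg.norm(b - t * a) for t in candidate_betas]
best_on_grid = candidate_betas[np.argmin(distances)]
print(best_on_grid)
```

The grid search is only there to make the "closest point" claim tangible; the closed-form beta is what you would use in practice.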

### Projections in $R^N$
Now, how do we manage this in higher dimensions? We need to turn a into a matrix A, whose columns are the vectors we project onto, and b into a vector with as many entries as A has rows. The scaling factor beta becomes a vector of weights x.

$$A^T(b-Ax) = 0$$

$$A^TAx = A^Tb$$

To find x we need to take the inverse of $A^TA$. This is possible whenever the columns of A are linearly independent, because $A^TA$ is then a full-rank square matrix.

$$x = (A^TA)^{-1}A^Tb$$

Thus, we have found our ideal weights in x in the same way.
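One caveat worth checking before inverting: $A^TA$ is only invertible when the columns of A are linearly independent. The sketch below compares a matrix with independent columns against one whose second column is a multiple of the first. The matrices `A_full` and `A_deficient` and the vector `b` are made-up examples, not data from this notebook.

```python
import numpy as np

# Independent columns: A^T A is 2x2 with rank 2, so it can be inverted
A_full = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])

# Second column is 2x the first: A^T A is singular
A_deficient = np.array([[1.0, 2.0],
                        [2.0, 4.0],
                        [3.0, 6.0]])

rank_full = np.linalg.matrix_rank(A_full.T @ A_full)
rank_deficient = np.linalg.matrix_rank(A_deficient.T @ A_deficient)
print(rank_full, rank_deficient)

# With independent columns the normal equations A^T A x = A^T b have a unique solution
b = np.array([1.0, 2.0, 3.0])
x = np.linalg.solve(A_full.T @ A_full, A_full.T @ b)
print(x)
```

Here `np.linalg.solve` is used instead of forming the inverse explicitly; it solves the same normal equations and is usually the more numerically stable choice.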

In [4]:
X = np.random.randint(0, 100, size=(3, 2))

X

array([[21, 72],
       [25, 82],
       [73, 19]])

We can think of the columns of X as variables that are combined with some weights to give the best estimate of a vector that does not lie in the column space of X.

In [5]:
y = np.random.randint(0, 100, size=(3, 1))

y

array([[77],
       [ 1],
       [39]])

In [6]:
weights = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

weights

array([[0.43665522],
       [0.34281468]])

In [7]:
y_pred = X.dot(weights)

y_pred

array([[33.8524166 ],
       [39.02718428],
       [38.3893102 ]])
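To check that these weights really are the least-squares solution, the sketch below hard-codes the X and y values printed above (since they came from a random draw), compares the normal-equation weights against `np.linalg.lstsq`, and confirms that the residual is orthogonal to every column of X. The variable names `weights_normal`, `weights_lstsq`, and `column_check` are assumptions introduced for this example.

```python
import numpy as np

# Values copied from the random draws shown above
X = np.array([[21, 72],
              [25, 82],
              [73, 19]], dtype=float)
y = np.array([[77],
              [1],
              [39]], dtype=float)

# Normal equations: solve X^T X w = X^T y
weights_normal = np.linalg.solve(X.T @ X, X.T @ y)

# NumPy's built-in least-squares solver should agree
weights_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The residual must be orthogonal to each column of X: X^T (y - X w) = 0
residual = y - X @ weights_normal
column_check = X.T @ residual

print(weights_normal.ravel(), weights_lstsq.ravel())
print(column_check.ravel())
```

The entries of `column_check` will not be exactly zero because of floating-point rounding, but they should be negligible next to the size of the entries in X and y.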

## Orthogonalisation

One application of projections is decomposing a vector into components that are parallel and perpendicular to another vector.

If we want to decompose a vector w relative to a vector v, then we can think of w as the sum of a component parallel to v and a component perpendicular to v.

$$w = w_{\parallel v} + w_{\perp v}$$

The component that is parallel to v is simply the projection of w onto v:

$$w_{\parallel v} = \frac{w^Tv}{v^Tv}v$$

The perpendicular component is whatever remains of w once the parallel component has been subtracted:

$$w_{\perp v} = w - w_{\parallel v}$$

We can see an example of this below:

In [8]:
w = np.array([2, 3])
v = np.array([4, 0])

v, w

(array([4, 0]), array([2, 3]))

In [9]:
w_para = w.dot(v)/v.dot(v) * v

w_para

array([2., 0.])

In [10]:
w_perp = w - w_para

w_perp

array([0., 3.])

If we take the dot product of the two components, we get 0, which confirms that they are orthogonal.

In [11]:
w_para.dot(w_perp)

0.0
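The same two steps work in any dimension, so they can be wrapped in a small helper. The function `decompose` and the 3-dimensional example vectors below are hypothetical, introduced only to illustrate the formulas above; the helper assumes v is non-zero.

```python
import numpy as np

def decompose(w, v):
    """Split w into components parallel and perpendicular to v (v must be non-zero)."""
    w = np.asarray(w, dtype=float)
    v = np.asarray(v, dtype=float)
    w_para = (w.dot(v) / v.dot(v)) * v  # projection of w onto v
    w_perp = w - w_para                 # what is left over
    return w_para, w_perp

# Illustrative 3-dimensional example
w_para, w_perp = decompose([2, 3, 1], [1, 1, 0])

print(w_para, w_perp)
print(w_para.dot(w_perp))   # should be 0 up to rounding
print(w_para + w_perp)      # should recover the original w
```

Repeating this subtraction against each vector in a set is the basic idea behind orthogonalising a whole collection of vectors.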