In [None]:
from IPython.display import HTML
HTML("""
<style>
@import "https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css";
</style>
""")

# Projection and Least Squares Tutorial
This tutorial will guide you through the basics of linear projections and their relation to least squares.


In [None]:
# importing libraries
import numpy as np
import matplotlib.pyplot as plt

## Projections
Recall from the reading material that an orthogonal projection is a transformation that maps vectors onto a subspace such that the distance between each original point and its projection is minimal. 
**Example**
Define a set of points $X=\begin{bmatrix}|&&|\\x_1&\dots&x_n\\|&&|\end{bmatrix} \in \mathbb{R}^{2\times n}$ (`X` in the code) and a line $l$ defined by the function $f(x)=0.5x$. The line passes through the origin, so it is a one-dimensional subspace of $\mathbb{R}^2$. The task is to project $X$ onto $l$:


In [None]:
# Three points
X = np.array([
    [1, 2],
    [2, 1.5],
    [3, 1.2]
]).T

# Plot the points
plt.scatter(X[0, :], X[1, :], c="r")

# Make line points (remember Numpy broadcasting)
x = np.linspace(0, 4)
f_x = x * 0.5

# Plot line
plt.plot(x, f_x);
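To make the definition of a projection concrete, here is a brute-force sketch (an illustration only, not how projections are computed in practice): for each point, search a fine grid of points $t\,(1, 0.5)$ on the line for the one at the smallest Euclidean distance. The grid range $[0, 4]$ and its resolution are assumptions chosen to cover this example, and `direction`, `ts`, `candidates` and `closest` are names introduced here for illustration. The closest points found this way should agree, up to the grid resolution, with the projections computed with $P$ below.

```python
import numpy as np

# Points from the example above (one point per column) and the line f(x) = 0.5x,
# described by its direction vector (1, 0.5).
X = np.array([[1, 2], [2, 1.5], [3, 1.2]]).T
direction = np.array([1.0, 0.5])

# Candidate points t * direction on the line for a fine grid of parameters t
# (the range [0, 4] is an assumption that covers the example points).
ts = np.linspace(0, 4, 40001)
candidates = np.outer(direction, ts)  # shape (2, len(ts))

closest = []
for i in range(X.shape[1]):
    # Euclidean distance from point i to every candidate point on the line
    dists = np.linalg.norm(candidates - X[:, [i]], axis=0)
    closest.append(candidates[:, np.argmin(dists)])
closest = np.array(closest).T  # back to one point per column

print("closest points found by search:\n", closest)
```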

A point $x_i$ is projected onto the line $l$ by computing $Px_i$, where $P$ is the projection matrix. Projecting all points in $X$ onto $l$ at once is therefore $X^{\prime}=PX$.
The projection matrix $P$ is given by:

$$
P = A(A^TA)^{-1}A^T,
$$

where $A$ is the _design matrix_. For the line $l$, $A=\begin{bmatrix}1\\0.5\end{bmatrix}$, i.e., every point on $l$ can be written as $\begin{bmatrix}x\\y\end{bmatrix} = At$ for some scalar $t$. 
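One way to see where the formula for $P$ comes from (a standard derivation, sketched here): the projected point $x_i^{\prime}$ lies on the line, so $x_i^{\prime} = At_i$ for some $t_i$, and the error $x_i - At_i$ must be orthogonal to the columns of $A$. Solving for $t_i$ gives

$$
A^T(x_i - At_i) = 0 \;\Rightarrow\; A^TA\,t_i = A^Tx_i \;\Rightarrow\; t_i = (A^TA)^{-1}A^Tx_i,
\qquad
x_i^{\prime} = At_i = A(A^TA)^{-1}A^Tx_i = Px_i.
$$

For the line $l$ we have $A^TA = 1^2 + 0.5^2 = 1.25$, so

$$
P = \frac{1}{1.25}\begin{bmatrix}1 & 0.5\\0.5 & 0.25\end{bmatrix} = \begin{bmatrix}0.8 & 0.4\\0.4 & 0.2\end{bmatrix}.
$$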
**ANTON: Needs a proper explanation (possibly a figure).**
<article class="message is-info">
  <div class="message-header">Info</div>
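  <div class="message-body">- The columns of $A$ span the subspace we want to project the point $x_i$ onto; the result of the projection is $x^{\prime}_i$.
- The design matrix for $l$ has a single column, so the subspace it spans is one-dimensional (a line through the origin).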

</div>
</article>

The code cell below calculates $P$ and projects $X$ onto $l$:


In [None]:
# The line l written as the design matrix
A = np.array([[1, 0.5]]).T  # has to be a column vector (shape 2x1)
# Construct the projection matrix
P = A @ np.linalg.inv(A.T @ A) @ A.T
print("P:\n", P)

# Project the points with matrix multiplication
x_prime = P @ X
print("projected points:\n", x_prime)
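Two standard properties of an orthogonal projection matrix can be checked numerically: it is symmetric ($P^T = P$), and projecting twice is the same as projecting once ($PP = P$). Its rank equals the dimension of the subspace, here $1$. A quick sketch, recomputing `A` and `P` so the cell runs on its own; for this line $P$ works out to $\begin{bmatrix}0.8&0.4\\0.4&0.2\end{bmatrix}$:

```python
import numpy as np

# Design matrix for the line f(x) = 0.5x (a single column) and its projection matrix
A = np.array([[1.0, 0.5]]).T
P = A @ np.linalg.inv(A.T @ A) @ A.T

# An orthogonal projection matrix is symmetric ...
print("symmetric:", np.allclose(P, P.T))
# ... and idempotent: projecting an already projected point changes nothing
print("idempotent:", np.allclose(P @ P, P))
# Its rank equals the dimension of the subspace (1 for a line)
print("rank:", np.linalg.matrix_rank(P))
```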

The projection process is visualized below:


In [None]:
# Creating a square figure (makes it easier to visually confirm projection)
plt.figure(figsize=(8, 8))

plt.scatter(X[0, :], X[1, :], label="Original points")  # Old points
plt.scatter(x_prime[0, :], x_prime[1, :], label="Projected points")  # Projected points
plt.plot(x, f_x, label="Line")  # Line
plt.legend()

# Plot projection/error lines from each original point to its projection
for i in range(X.shape[1]):
    plt.plot([X[0, i], x_prime[0, i]], [X[1, i], x_prime[1, i]], "g--")

# Set axes limits to be the same for equal aspect ratio
plt.xlim(0, 3.5)
plt.ylim(0, 3.5);

<article class="message is-warning">
  <div class="message-header">Observe</div>
  <div class="message-body">The projection lines (dashed lines) are orthogonal to $l$. 
</div>
</article>
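The orthogonality can also be checked numerically. A minimal sketch: each error vector $x_i - x_i^{\prime}$ should have zero dot product with the column of $A$, i.e. $A^T(X - PX) = 0$ up to floating point error. The name `E` for the matrix of error vectors is introduced here for illustration.

```python
import numpy as np

A = np.array([[1.0, 0.5]]).T
X = np.array([[1, 2], [2, 1.5], [3, 1.2]]).T
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Error vectors from each projected point back to its original point (one per column)
E = X - P @ X
# Dot product of the line direction with each error vector; all should be ~0
print("A^T (X - PX):", A.T @ E)
```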

## Least squares
This section is about fitting a straight line to the matrix of points $X$ using projections (least squares). The goal is to find the line that minimizes the error between the points and the fitted line.


In [None]:
# Define the example points
X = np.array([
    [1, 1],
    [2, 2],
    [3, 2]
]).T

plt.scatter(X[0, :], X[1, :]);

This requires that we look at the problem differently than before. In the previous section we projected a set of points onto an existing line. Now, we want to find the line that minimises the error between the points and the line. The heart of the problem is the linear equation

$$
A\mathbf{w} = \mathbf{y},
$$

where $\mathbf{w}=\begin{bmatrix}a\\b\end{bmatrix}$ holds the model parameters and represents a line. This might be confusing, because in the previous section the line was fixed and given by $A$, while the vectors we operated on were points. One way to understand the new roles is that a linear function of the form $f(x) = ax + b$ can be written in matrix form as $y=f(x;\mathbf{w}) = \begin{bmatrix}x& 1\end{bmatrix}\mathbf{w}$. 
With multiple points, the function can be rewritten

$$
\begin{bmatrix}x_1 & 1\\\vdots & \vdots \\x_n&1\end{bmatrix} \mathbf{w} = A\mathbf{w} = \mathbf{y} = \begin{bmatrix}y_1\\ \vdots \\y_n\end{bmatrix},
$$

which is a general linear system. If $A$ were square and invertible, it could be solved directly as $\mathbf{w}=A^{-1}\mathbf{y}$. However, $A$ has $n$ rows but only two columns, so the system is _overdetermined_ (more equations than unknowns) and will often have no solution. This is the case whenever the points $(x_i, y_i)$ do not all lie on a straight line. Instead, we find an approximate solution $\mathbf{\hat{w}}$ that minimizes the sum of squared differences between the predicted values $\mathbf{\hat{y}}=A\mathbf{\hat{w}}$ and the original values $\mathbf{y}$. 
<article class="message is-warning">
  <div class="message-header">Tip</div>
  <div class="message-body">To get an intuition for why the system likely lacks a solution, try to visualize a line in the plot above. Clearly, there is no line that intersects all points, unless you modify them to be on a straight line.
</div>
</article>
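Whether $A\mathbf{w}=\mathbf{y}$ has an exact solution can also be tested algebraically: it does exactly when $\mathbf{y}$ lies in the column space of $A$, i.e. when appending $\mathbf{y}$ as an extra column does not increase the rank. A small sketch for the three points of this section (the names `rank_A` and `rank_Ay` are introduced here for illustration). Here the rank grows from 2 to 3, so no line passes through all three points.

```python
import numpy as np

# Example points from this section
x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([1.0, 2.0, 2.0])
A = np.column_stack([x_vals, np.ones_like(x_vals)])

# A w = y is solvable only if y lies in the column space of A,
# i.e. appending y as an extra column does not increase the rank.
rank_A = np.linalg.matrix_rank(A)
rank_Ay = np.linalg.matrix_rank(np.column_stack([A, y_vals]))
print("rank(A) =", rank_A, " rank([A | y]) =", rank_Ay)
```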
 
The previous section demonstrated how projections minimize the orthogonal distance between the original points and their projections onto a subspace (the line in that case). Here, however, it is the vertical distances, i.e. $\mathbf{\hat{y}}-\mathbf{y}$, that are of interest. Viewed in $\mathbb{R}^n$, minimizing these vertical distances is again an orthogonal projection, this time of $\mathbf{y}$ onto the column space of $A$.
In the system above, $a$ and $b$ are the unknown parameters we want to find. The column of $1$'s is what allows this compact notation: it multiplies $b$, so $b$ acts as a constant offset. The column vectors of $A$ are elements of $\mathbb{R}^n$, but together they only span a two-dimensional subspace (a plane through the origin) of this space. This plane contains every vector $A\mathbf{w}$, i.e. the predictions of every possible line. Since $\mathbf{y}$ might not lie in this plane, we cannot solve the equations directly.
As demonstrated in the book and lectures, we first have to project $\mathbf{y}$ onto the plane spanned by the columns of $A$. This leads to a new equation $A\mathbf{\hat{w}}=\mathbf{\hat{y}}$, which **can** be solved, because $\mathbf{\hat{y}}$ lies in the column space of $A$. 
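Requiring the residual $\mathbf{y}-A\mathbf{\hat{w}}$ to be orthogonal to both columns of $A$ gives the _normal equations_ $A^TA\mathbf{\hat{w}} = A^T\mathbf{y}$, a small $2\times 2$ system. A minimal sketch that solves them with `np.linalg.solve` (instead of forming an inverse) and confirms that the residual is orthogonal to the columns of $A$; the names `w_hat` and `y_hat` are introduced here for illustration:

```python
import numpy as np

x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([1.0, 2.0, 2.0])
A = np.column_stack([x_vals, np.ones_like(x_vals)])

# Normal equations: A^T A w = A^T y, solved as a linear system
w_hat = np.linalg.solve(A.T @ A, A.T @ y_vals)
y_hat = A @ w_hat  # projection of y onto the column space of A

print("w_hat:", w_hat)
# The residual is orthogonal to both columns of A (values ~0)
print("A^T (y - y_hat):", A.T @ (y_vals - y_hat))
```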
### Design matrix
The design matrix $A$:


In [None]:
x_vals = X[0, :]
y_vals = X[1, :]

A = np.vstack((x_vals, np.ones(x_vals.shape))).T
print("A\n", A)
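To see that the design matrix evaluates $f(x)=ax+b$ for all points at once, multiply it by a parameter vector. The values $a=2$ and $b=-1$ below are hypothetical, chosen only for illustration; each row $\begin{bmatrix}x_i & 1\end{bmatrix}$ of $A$ times $\mathbf{w}$ gives $ax_i + b$.

```python
import numpy as np

x_vals = np.array([1.0, 2.0, 3.0])
A = np.vstack((x_vals, np.ones(x_vals.shape))).T

# Hypothetical parameters, for illustration only
a, b = 2.0, -1.0
w = np.array([a, b])

print(A @ w)           # evaluated via the design matrix
print(a * x_vals + b)  # evaluated directly from f(x) = ax + b
```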

In [None]:
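from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection in older Matplotlib versions

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# Draw the two columns of A as arrows from the origin in R^3 (normalized to unit length).
# quiver takes the x-, y- and z-components of all arrows, i.e. the rows of A.
ax.quiver([0, 0], [0, 0], [0, 0], A[0, :], A[1, :], A[2, :], length=1, arrow_length_ratio=0.2, normalize=True)
ax.set_xlim([-0.2, 1])
ax.set_ylim([-0.2, 1])
ax.set_zlim([-0.2, 1])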

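Each axis of this 3-dimensional space corresponds to one of the $n=3$ points, so a vector in it holds one $y$-value per point. The two arrows are the (normalized) columns of $A$; together they span the plane of all vectors $A\mathbf{w}$, i.e. all predictions $\mathbf{\hat{y}}$ that some straight line can produce.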
### Projection
Rather than computing the projected vector $\mathbf{\hat{y}} = A(A^TA)^{-1}A^T\mathbf{y}$ in $\mathbb{R}^n$ ($n$ is the number of rows of the design matrix, one for each point), we compute the line parameters $\mathbf{\hat{w}}$ directly, i.e. the coordinates of $\mathbf{\hat{y}}$ along the columns of $A$. This uses the same method as in the last section, just without the leading $A$ and with slightly different notation: $\mathbf{\hat{w}} = (A^TA)^{-1}A^T \mathbf{y}$.


In [None]:
Pm_y = np.linalg.inv(A.T @ A) @ A.T
# Applying the transformation
params = Pm_y @ y_vals
print("params:", params)
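In practice, forming $(A^TA)^{-1}$ explicitly is usually avoided for numerical reasons. As a cross-check (a sketch, not the method used in this tutorial), NumPy's `np.linalg.lstsq` and `np.linalg.pinv` should give the same parameters for this small, well-conditioned example. The variable names `params_lstsq`, `params_pinv`, `residuals`, `rank` and `sing_vals` are introduced here for illustration.

```python
import numpy as np

x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([1.0, 2.0, 2.0])
A = np.column_stack([x_vals, np.ones_like(x_vals)])

# Explicit formula from the text: (A^T A)^{-1} A^T y
params = np.linalg.inv(A.T @ A) @ A.T @ y_vals

# NumPy's least squares solver; rcond=None selects NumPy's default cutoff for small singular values
params_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(A, y_vals, rcond=None)

# The Moore-Penrose pseudo-inverse equals (A^T A)^{-1} A^T when A has full column rank
params_pinv = np.linalg.pinv(A) @ y_vals

print(params, params_lstsq, params_pinv)
```

For this data all three give $a = 0.5$ and $b = 2/3$, and `residuals` holds the sum of squared residuals, $1/6$.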

<article class="message is-warning">
  <div class="message-header">Observe</div>
  <div class="message-body">When applying projections, the resulting vectors are described in the original vector space, e.g. in the projection example of the previous section both the original and the projected points are in $\mathbb{R}^2$. Often we are only interested in the points expressed in the subspace, called $x_{sub}$. In that example $x_{sub}$ is simply a scalar, because $A$ has a single column. The points in the original vector space are found by multiplying $x_{sub}$ by the design matrix $A$. Mathematically, $x_i^{\prime}=Ax_{sub}$, and since also $x_i^{\prime} = A(A^TA)^{-1}A^T x_i$, it follows (because the columns of $A$ are linearly independent) that $x_{sub} =(A^TA)^{-1}A^Tx_i$ is the formula that recovers the points expressed in the dimension of the subspace. It is intuitive to think of $x_{sub}$ as coordinates: each coordinate tells you how far to go along a certain direction, with each direction being a column vector of the design matrix. In the least squares setting, $\mathbf{\hat{w}}$ plays exactly this role for $\mathbf{\hat{y}}$.
</div>
</article>
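A minimal sketch of recovering $x_{sub}$ for the projection example of the previous section. To avoid overwriting `A` and `X` used in this section, the hypothetical names `A_line` and `X_pts` hold the line's design matrix and the three original points.

```python
import numpy as np

A_line = np.array([[1.0, 0.5]]).T              # design matrix of the line f(x) = 0.5x
X_pts = np.array([[1, 2], [2, 1.5], [3, 1.2]]).T  # original points, one per column

# Coordinates of the projected points in the subspace: one scalar per point
x_sub = np.linalg.inv(A_line.T @ A_line) @ A_line.T @ X_pts  # shape (1, 3)
print("x_sub:", x_sub)

# Mapping the coordinates back with the design matrix gives the projected points in R^2
P_line = A_line @ np.linalg.inv(A_line.T @ A_line) @ A_line.T
print(np.allclose(A_line @ x_sub, P_line @ X_pts))
```

Each scalar in `x_sub` says how far to go along $(1, 0.5)$: for instance $1.6 \cdot (1, 0.5) = (1.6, 0.8)$ is the projection of $(1, 2)$.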

The `params` vector is of the form $(a, b)$ and the line formula is $f(x)=ax+b$. Below, we calculate a number of points on the line for visualization purposes and compare them with both the original and the projected points:


In [None]:
x = np.linspace(0, 5)  # Create range of values
y = x * params[0] + params[1]  # Calculate f(x)

plt.figure(figsize=(5, 5))

plt.plot(x, y)  # Plot line
plt.scatter(X[0, :], X[1, :])  # Plot original points

y_hat = A @ params  # Fitted values: each original point moved vertically onto the line
plt.scatter(X[0, :], y_hat)  # Plot the points
plt.title('Least squares linear regression 3 points')

### Measuring the error
Remember that both $\mathbf{y}$ and $\mathbf{\hat{y}}$ are vectors. The projection error is:

$$
e = \|\mathbf{y}-\mathbf{\hat{y}}\| = \sqrt{\sum_{i=1}^n (y_i - \hat{y}_i)^2} = \sqrt{(\mathbf{y}-\mathbf{\hat{y}})^\top(\mathbf{y}-\mathbf{\hat{y}})}.
$$
 


In [None]:
# Calculating the projected y-values
y_hat = A @ params
# Calculating the error
diff = y_vals - y_hat
e = np.sqrt(diff @ diff)  # same as np.linalg.norm(diff)
print("e", e)
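Why is this the smallest possible error? Since the residual $\mathbf{y}-\mathbf{\hat{y}}$ is orthogonal to every column of $A$, changing the parameters by any $\mathbf{d}\neq\mathbf{0}$ gives $\|\mathbf{y}-A(\mathbf{\hat{w}}+\mathbf{d})\|^2 = \|\mathbf{y}-\mathbf{\hat{y}}\|^2 + \|A\mathbf{d}\|^2$, which is strictly larger as long as the columns of $A$ are linearly independent. A quick empirical sketch; the helper `error`, the perturbation size $0.1$ and the 1000 trials are arbitrary choices for illustration.

```python
import numpy as np

x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([1.0, 2.0, 2.0])
A = np.column_stack([x_vals, np.ones_like(x_vals)])
params = np.linalg.inv(A.T @ A) @ A.T @ y_vals

def error(w):
    """Euclidean norm of the residual y - A w for line parameters w = (a, b)."""
    return np.linalg.norm(y_vals - A @ w)

e_best = error(params)
# Random perturbations of the parameters (seeded for reproducibility)
rng = np.random.default_rng(0)
e_perturbed = [error(params + 0.1 * rng.standard_normal(2)) for _ in range(1000)]

print("least squares error:", e_best)
print("smallest perturbed error:", min(e_perturbed))
```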

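To get the root mean squared (RMS) error, we divide the sum of squares by the number of points $n$ before taking the square root: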

$$
f_{RMS}(\mathbf{y}, \mathbf{\hat{y}}) = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2} = \sqrt{\frac{1}{n}(\mathbf{y}-\mathbf{\hat{y}})^\top(\mathbf{y}-\mathbf{\hat{y}})}.
$$



In [None]:
diff = y_vals - y_hat
# Mean of the squared residuals, then the square root
rmse = np.sqrt(np.mean(diff ** 2))
print("root mean squared error", rmse)
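Since both quantities are built from the same residual vector, they differ only by a constant factor: $f_{RMS} = e/\sqrt{n}$. A short check for the three points of this section, where $e = \sqrt{1/6} \approx 0.408$ and $f_{RMS} = \sqrt{1/18} \approx 0.236$:

```python
import numpy as np

x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([1.0, 2.0, 2.0])
A = np.column_stack([x_vals, np.ones_like(x_vals)])
y_hat = A @ (np.linalg.inv(A.T @ A) @ A.T @ y_vals)

diff = y_vals - y_hat
n = diff.size
e = np.linalg.norm(diff)            # Euclidean norm of the residual
rmse = np.sqrt(np.mean(diff ** 2))  # root mean squared error

# The two are related by rmse = e / sqrt(n)
print(e, rmse, e / np.sqrt(n))
```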