On the last sheet, we looked at a formulation of regression that worked on single points (a 'pointwise' description). In this sheet we will look at how to describe linear regression in terms of vectors instead.


Recall the formula for simple linear regression:

$$
y = w_0 + w_1 x
$$
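For comparison with the vector forms that follow, here is a minimal pointwise sketch of this formula in Python. The function name `predict_point` and the example weights are illustrative assumptions, not part of the original sheet.

```python
def predict_point(w0: float, w1: float, x: float) -> float:
    """Simple linear regression for a single input: y = w0 + w1 * x."""
    return w0 + w1 * x

# With w0 = 5 and w1 = 0.5, the input x = 10 gives 5 + 0.5 * 10
print(predict_point(5, 0.5, 10))  # 10.0
```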


It's also important to remember how matrix multiplication works. Let's start with a very simple example:
$$
\begin{bmatrix}
    a & 0 \\ 0 & b\\
\end{bmatrix}
\begin{bmatrix}
    x\\
    y\\
\end{bmatrix}
$$

What does that mean? One useful way to read it: each entry of the result comes from one row of the matrix, multiplied entry-by-entry with the vector and then summed. Expanding this out gives:


$$
\begin{bmatrix}
    ax + 0y\\
    0x + by
\end{bmatrix} = 
\begin{bmatrix}
    ax\\
    by\\
\end{bmatrix}
$$
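We can check this expansion numerically with NumPy. The values chosen for $a$, $b$, $x$ and $y$ below are arbitrary assumptions for illustration.

```python
import numpy as np

a, b = 2.0, 3.0
x, y = 4.0, 5.0

M = np.array([[a, 0.0],
              [0.0, b]])   # the diagonal matrix from above
v = np.array([x, y])       # the vector [x, y]

result = M @ v             # each entry is one row of M dotted with v
print(result)              # [ 8. 15.]  i.e. [a*x, b*y]
```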

So every row in the matrix corresponds to a row in the resulting vector. An interesting consequence: if the matrix has only a single row, the result is a single number. Let's look at another example:


$$
\begin{bmatrix}
    a & b
\end{bmatrix}
\begin{bmatrix}
    x\\
    y\\
\end{bmatrix} = ax + by
$$
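The same calculation in NumPy, again with arbitrary example numbers: a $1 \times 2$ matrix times a $2 \times 1$ vector gives a $1 \times 1$ result, which holds a single number.

```python
import numpy as np

a, b = 2.0, 3.0
x, y = 4.0, 5.0

row = np.array([[a, b]])      # shape (1, 2): a matrix with a single row
col = np.array([[x], [y]])    # shape (2, 1): a column vector

product = row @ col           # shape (1, 1)
print(product.shape)          # (1, 1)
print(product.item())         # 23.0, i.e. a*x + b*y = 8 + 15
```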

Now, that might look familiar. If we rename the variables a little, something interesting happens:


$$
\begin{bmatrix}
    w_0 & w_1
\end{bmatrix}
\begin{bmatrix}
    1\\
    x\\
\end{bmatrix} = w_0 + w_1 x
$$

This is exactly the same as our definition of simple linear regression! We have represented regression with vectors!
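A quick numerical sanity check of this claim, assuming example weights $w_0 = 5$, $w_1 = 0.5$ and input $x = 10$:

```python
import numpy as np

w0, w1, x = 5.0, 0.5, 10.0

row_of_weights = np.array([[w0, w1]])  # [w0  w1], shape (1, 2)
data_column = np.array([[1.0], [x]])   # [1, x] as a column, shape (2, 1)

vectorised = (row_of_weights @ data_column).item()
pointwise = w0 + w1 * x

print(vectorised, pointwise)  # 10.0 10.0
```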

We typically give these vectors names; for example:

$$
\text{parameters} = w = \begin{bmatrix}
    w_0\\
    w_1
\end{bmatrix}
$$

and 

$$
\text{data} = x = \begin{bmatrix}
1\\
x
\end{bmatrix}
$$


Note that $w$ is written as a column vector, so to get the multiplication to work as above (a row times a column) we need its transpose $w^\intercal$. Thus we can rewrite our model as:

$$
y = w^\intercal x
$$
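To see why the transpose is needed, here is a sketch that stores $w$ and $x$ as explicit $2 \times 1$ column vectors in NumPy (the example values are assumptions). Multiplying the two column vectors directly fails because the shapes do not line up, while $w^\intercal x$ gives a $1 \times 1$ result.

```python
import numpy as np

w = np.array([[5.0], [0.5]])   # parameters as a column vector, shape (2, 1)
x = np.array([[1.0], [10.0]])  # data [1, x] as a column vector, shape (2, 1)

try:
    w @ x                      # (2, 1) @ (2, 1): inner dimensions 1 and 2 differ
except ValueError as err:
    print("w @ x fails:", err)

y = w.T @ x                    # (1, 2) @ (2, 1) -> (1, 1)
print(y.item())                # 10.0
```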


This is quite useful, but can we go further? What if we tried to compute all our predictions at once? First, let's think about the format we want our answer to be in:

$$
Y = \begin{bmatrix}
    y_1\\
    y_2\\
    y_3\\
    \vdots\\
    y_n
\end{bmatrix}
$$

We can stack all our predictions one on top of the other. So how should our input data look?

$$
X = \begin{bmatrix}
    1 & x_1\\
    1 & x_2\\
    1 & x_3\\
    \vdots & \vdots\\
    1 & x_n
\end{bmatrix}
$$

Then, if we multiply this by $w$ from above:

$$
Xw = \begin{bmatrix}
    w_0 + w_1x_1\\
    w_0 + w_1x_2\\
    w_0 + w_1x_3\\
    \vdots\\
    w_0 + w_1x_n\\
\end{bmatrix}
$$

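Which is exactly what we wanted. Note the difference from our earlier definition: it's $Xw$, not $w^\intercal x$. This is because each row of $X$ is one data vector written as a row, $x_i^\intercal = \begin{bmatrix} 1 & x_i \end{bmatrix}$, so entry $i$ of $Xw$ is $x_i^\intercal w = w^\intercal x_i$. If this still isn't clear, try writing out some small matrices and doing the calculation by hand.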
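The sketch below checks this numerically, assuming a handful of example inputs and weights: computing $w^\intercal x_i$ one point at a time in a loop gives the same vector as the single product $Xw$.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])   # example inputs x_1 ... x_n
w = np.array([10.0, 2.0])             # example parameters [w0, w1]

# Build X by stacking the rows [1, x_i]
X = np.column_stack([np.ones_like(xs), xs])

# Pointwise: w^T x_i for each data point separately
loop_preds = np.array([w @ np.array([1.0, xi]) for xi in xs])

# Vectorised: all predictions at once
vec_preds = X @ w

print(vec_preds)                           # [12. 14. 16. 18.]
print(np.allclose(loop_preds, vec_preds))  # True
```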

# Vectorising loss

How about loss?

Well, vectorising error is quite easy; remember, error for a single point is given by:

$$t - f(x)$$

For our vector model, the error for the $i$-th data point, with target $t_i$ and data vector $x_i$, is:

$$
e_i = t_i - w^\intercal x_i
$$

To compute the mean error, we sum the errors over all the data points and divide by the number of points $|X|$ (the number of rows of $X$):

$$
ME = \frac{1}{|X|} \sum_{i=1}^{|X|} (t_i - w^\intercal x_i)
$$
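In NumPy the mean error can be computed without a loop; the inputs, targets and weights below are made-up example values. Note that positive and negative errors can cancel out in this sum, which is one reason the squared error is usually preferred.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
t = np.array([13.0, 13.0, 17.0])     # example targets
w = np.array([10.0, 2.0])            # example parameters [w0, w1]

X = np.column_stack([np.ones_like(xs), xs])

errors = t - X @ w                   # e_i = t_i - w^T x_i for every point
mean_error = errors.sum() / len(X)   # (1/|X|) * sum of e_i

print(errors)       # [ 1. -1.  1.]
print(mean_error)   # 0.333... (the +1 and -1 errors partly cancel)
```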

But something very cool happens when we look at mean squared error. First, note the following identity:

$$
\sum_{i=1}^{D} a_i^2 = a^\intercal a
$$

where $a$ is a column vector of length $D$ and $a_i$ is its $i$-th entry.
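A quick numerical check of this identity, using an arbitrary example vector:

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])      # example vector with D = 3 entries

sum_of_squares = np.sum(a ** 2)     # 1 + 4 + 9
inner_product = a @ a               # a^T a for a 1-D array

print(sum_of_squares, inner_product)  # 14.0 14.0
```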

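We can exploit this identity in our case by taking $a = t - Xw$, where $t$ is the column vector whose $i$-th entry is the target $t_i$ (so the $i$-th entry of $a$ is exactly the error $t_i - w^\intercal x_i$):

$$
MSE = \frac{1}{|X|} \sum_{i=1}^{|X|} (t_i - w^\intercal x_i)^2 = \frac{1}{|X|} (t - Xw)^\intercal (t - Xw)
$$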
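The two sides of this equation can be compared directly in NumPy. The data, targets and weights below are illustrative assumptions; the loop follows the left-hand sum and the last lines follow the right-hand matrix form.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])
t = np.array([11.0, 15.0, 15.0, 19.0])   # example targets
w = np.array([10.0, 2.0])                # example parameters [w0, w1]

X = np.column_stack([np.ones_like(xs), xs])
n = len(X)                               # |X|, the number of data points

# Left-hand side: average of squared pointwise errors
mse_loop = sum((t[i] - w @ X[i]) ** 2 for i in range(n)) / n

# Right-hand side: (t - Xw)^T (t - Xw) / |X|
r = t - X @ w
mse_matrix = (r @ r) / n

print(mse_loop, mse_matrix)              # 1.0 1.0
```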

# Vectorising Regression

So, hopefully you remember at least some of the vectorisation material we just went over; feel free to refer back to it if you need to. Let's start nice and easy: given some parameters, put them into the parameter vector $w$.

```python
import numpy as np

def makeW(w0: float, w1: float) -> np.ndarray:
    """Pack the intercept w0 and slope w1 into the parameter vector w = [w0, w1]."""
    return np.array([w0, w1])
```

Hopefully you didn't have any trouble there. Now let's use that to predict a value for a given data point.

Hint: NumPy uses `@` for matrix multiplication (and, when both operands are 1-D arrays, for the vector dot product).

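```python
data = np.array([1, 10])   # the data vector [1, x] for the input x = 10
params = makeW(5, 0.5)     # w0 = 5, w1 = 0.5

# w^T x: for 1-D arrays `@` is the dot product, so no transpose is needed
params @ data
```

```
10.0
```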

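So we've handled a very simple case. Let's get a bit more advanced and work with several data points at once.

First, we need to format the data so that our model can work with it. Write a function that takes in some 1-dimensional data (a plain list of inputs such as `[1, 2, 3, 4, 5]`) and produces the matrix $X$ from earlier, with a column of ones next to the data values.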

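```python
def formatData(data) -> np.ndarray:
    """Build the matrix X: one row [1, x_i] per input value."""
    ones = np.ones(len(data))
    return np.array([ones, data]).T   # stack as two rows, then transpose to columns

X = formatData([1, 2, 3, 4, 5])
X
```

```
array([[1., 1.],
       [1., 2.],
       [1., 3.],
       [1., 4.],
       [1., 5.]])
```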
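As a design note, NumPy's `np.column_stack` builds the same matrix without the explicit transpose. This alternative is a sketch, and the name `format_data_stacked` is hypothetical, not part of the exercise.

```python
import numpy as np

def format_data_stacked(data) -> np.ndarray:
    """Turn 1-D inputs into the matrix X with a leading column of ones."""
    data = np.asarray(data, dtype=float)
    return np.column_stack([np.ones_like(data), data])

print(format_data_stacked([1, 2, 3]))
# [[1. 1.]
#  [1. 2.]
#  [1. 3.]]
```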

Now, predict the values for all of those at once!

```python
w = makeW(10, 2)
X @ w
```

```
array([12., 14., 16., 18., 20.])
```

Nice! Now let's compute the mean squared error loss for those predictions, given a vector of targets $t$:

```python
def loss(w, X, t):
    """Mean squared error: (t - Xw)^T (t - Xw) / |X|."""
    error = t - X @ w
    return (error @ error) / len(X)
```
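A usage sketch for `loss`, assuming some hypothetical targets for the five inputs used above (the sheet itself does not give targets). It also cross-checks the result against `np.mean` of the squared errors.

```python
import numpy as np

def loss(w, X, t):
    # Same mean squared error as the cell above, repeated so this block runs on its own
    error = t - X @ w
    return (error @ error) / len(X)

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(xs), xs])   # same X as formatData produces
w = np.array([10.0, 2.0])                     # predictions: 12, 14, 16, 18, 20

t = np.array([12.0, 15.0, 16.0, 17.0, 20.0])  # hypothetical targets

print(loss(w, X, t))                          # 0.4
print(np.mean((t - X @ w) ** 2))              # 0.4
```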