# 4. Linear algebra operations

In [1]:
import numpy

## Basic operations

Linear algebra is a branch of mathematics dealing with vector spaces. Linear algebra operations such as transposes, dot products, matrix multiplications and others are often very useful when manipulating numeric datasets. Using these operations often allows us to avoid writing explicit loops, and thus make our code more readable, more concise and faster to execute.

In the following steps we'll implement linear regression using `numpy` linear algebra operations. We will start with the functional form of the regression: how the predicted target values depend on the regression coefficients of the features (aka predictors or regressors).

Once we have the coefficients $\mathrm{\beta}$ and the matrix with the predictors $\mathrm{X}$ we can calculate predicted targets $\mathrm{\hat{y}}$ according to:

$$
\mathrm{\hat{y}} = \beta_0 + \mathrm{X}\mathrm{\beta}
$$

If we add an extra column with all $1$s to the matrix $\mathrm{X}$, we can write this without the intercept $\beta_0$:

$$
\mathrm{\hat{y}} = \mathrm{X}\mathrm{\beta}
$$
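
As a small sketch of why the column of ones works, the snippet below computes the predictions both ways on made-up toy values (the numbers and the names `X_ones` and `beta_full` are illustrative assumptions, not part of the winequality data) and checks that they agree:

```python
import numpy

# Hypothetical toy data: 3 datapoints, 2 predictors
X = numpy.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
beta0 = 0.5                            # intercept
beta = numpy.array([[2.0], [-1.0]])    # column vector of coefficients

# Version with an explicit intercept
y_hat_intercept = beta0 + X.dot(beta)

# Version with the intercept absorbed: prepend a column of ones to X
# and put beta0 in front of the other coefficients
X_ones = numpy.hstack([numpy.ones((X.shape[0], 1)), X])
beta_full = numpy.vstack([[[beta0]], beta])
y_hat_absorbed = X_ones.dot(beta_full)

print(numpy.allclose(y_hat_intercept, y_hat_absorbed))
```

Here the ones are placed in the first column, so the intercept becomes the first entry of the coefficient vector.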

This operation is a matrix multiplication. In this case the first matrix is $N\times M$, where $N$ is the number of datapoints (rows) and $M$ is the number of predictors (including the extra column of ones). The second matrix is $M\times 1$, so it is a column vector.

In this particular case, we multiply each row of $\mathrm{X}$ elementwise with $\mathrm{\beta}$, and the sum of these products is the predicted $\hat{y}$ for that row. This can be written explicitly as:

$$
\mathrm{\hat{y}_i} = \sum_{m=1}^M X_{i,m} \beta_m
$$

We can see that the matrix multiplication version is much more concise. The same is true in numpy code: it's more concise to implement this using the matrix multiplication function `numpy.dot` than to write a bunch of loops.
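
To make the comparison concrete, here is a minimal sketch that computes the same predictions with explicit loops and with `numpy.dot`. The values in `X` and `beta` are made up for illustration, and `beta` is a 1-D array here, so the result is 1-D as well:

```python
import numpy

# Hypothetical toy values: N=2 rows, M=3 columns (first column is ones)
X = numpy.array([[1.0, 7.4, 0.70],
                 [1.0, 7.8, 0.88]])
beta = numpy.array([0.5, 0.1, -1.0])

# Explicit loops, following the summation formula
N, M = X.shape
y_hat_loop = numpy.zeros(N)
for i in range(N):
    for m in range(M):
        y_hat_loop[i] += X[i, m] * beta[m]

# The same computation as a single matrix-vector product
y_hat_dot = numpy.dot(X, beta)

print(numpy.allclose(y_hat_loop, y_hat_dot))
```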


### Matrix multiplication / dot product

In numpy the dot product (aka scalar product) is treated as a special case of matrix multiplication. The numpy function `dot` is used for the simple dot product between vectors, for multiplying a matrix by a vector, and for multiplying two matrices.

The definition of dot product between two vectors $u$ and $v$ is:

$$\langle u, v\rangle = \sum_{i=1}^N u_i v_i$$

Other notations that you will come across for this operation are:
$u \cdot v$, $u^T v$.

In numpy we write
```python
numpy.dot(u, v)
```
or
```python
u.dot(v)
```
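
As a quick illustration with made-up vectors, the sketch below checks that `numpy.dot` matches the summation in the definition, and that the function and method forms give the same value:

```python
import numpy

u = numpy.array([1.0, 2.0, 3.0])
v = numpy.array([4.0, 5.0, 6.0])

# Sum of elementwise products, as in the definition: 1*4 + 2*5 + 3*6
by_definition = numpy.sum(u * v)

print(by_definition)       # 32.0
print(numpy.dot(u, v))     # 32.0
print(u.dot(v))            # 32.0
```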

### Exercise 4.1

Create two vectors of random values between -10 and 10 of size 100. Compute:
- elementwise product between them
- dot (scalar) product between them.


When multiplying two matrices, the number of columns in the first one needs to be equal to the number of rows in the second one. For matrices $A_{m\times n}$ and $B_{n \times p}$, the resulting matrix will be $C_{m \times p}$. It is defined as:

$$
C_{i,j} = \sum_{k=1}^n A_{i,k}B_{k,j}
$$
![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/eb/Matrix_multiplication_diagram_2.svg/470px-Matrix_multiplication_diagram_2.svg.png)

In `numpy` we simply use `dot`:

```python
numpy.dot(A, B)
```
or

```python
A.dot(B)
```
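
The following sketch, using small made-up matrices, shows how the shapes combine and checks one entry of the product against the summation formula. It also uses the `@` operator, which for 2-D arrays in Python 3.5+ gives the same result as `dot`:

```python
import numpy

A = numpy.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])    # 2 x 3
B = numpy.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [2.0, 2.0]])         # 3 x 2

C = numpy.dot(A, B)
print(C.shape)                        # (2, 2)

# One entry computed from the definition: C[0, 1] = sum_k A[0, k] * B[k, 1]
c01 = sum(A[0, k] * B[k, 1] for k in range(A.shape[1]))
print(numpy.isclose(C[0, 1], c01))

# The @ operator performs the same matrix multiplication for 2-D arrays
print(numpy.allclose(A @ B, C))
```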

### Exercise 4.2

- Create a random matrix $A_{3\times 4}$ and another random matrix $B_{4 \times 2}$. Multiply AB.  

- Create a random matrix $C_{3\times 3}$ and $D_{3\times3}$. Multiply CD. Multiply DC. Is matrix multiplication commutative?

- Create an identity matrix $I_{3\times 3}$.  Multiply IC, CI, DI, ID. What do you notice?

- What will be the result of multiplying a matrix $Z_{m\times n}$ by a matrix $O_{n \times p}$ whose entries are all zero? Check your answer using some examples in `numpy`.

### Exercise 4.3

- Create a random matrix $A_{3\times 4}$ and another random matrix $B_{2 \times 4}$. Can you transform one of them so that they can be multiplied? Try this in `numpy`.

## Transpose

We have already encountered matrix transpose. The mathematical notation for the transpose of matrix $A$ is $A^T$. Transposing a matrix simply means making the rows into columns and columns into rows. If $A$ is $m \times n$ then $A^T$ is $n \times m$. The values are:

$$A^T_{i,j} = A_{j,i}$$

In `numpy` the transpose is simply written `A.T`.
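
A minimal sketch on a small made-up array shows how the shape changes. It also shows that `A.T` shares memory with `A` rather than copying the data:

```python
import numpy

A = numpy.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]

print(A.shape)                        # (2, 3)
print(A.T.shape)                      # (3, 2)

# Rows have become columns: the first column of A.T is the first row of A
print(A.T[:, 0])                      # [0 1 2]

# A.T is a view of the same data, not a copy
print(numpy.shares_memory(A, A.T))    # True
```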

### Exercise 4.4

- Create a random $4 \times 5$ matrix and verify that the above equality holds for it.
- What would be the outcome of $(A^T)^T$? Check this in `numpy`.

## Inverse

For scalar numbers the multiplicative inverse (aka reciprocal) of number $n$ is $\frac{1}{n}$, also written as $n^{-1}$. This inverse has certain properties, like:
- $n^{-1}n = 1$
- $(n^{-1})^{-1} = n$

There is an analogous concept for matrices. For a square matrix $A_{m \times m}$, its inverse is written $A^{-1}$ and it satisfies:

- $A^{-1}A = I$ where $I$ is the $m \times m$ identity matrix
- $(A^{-1})^{-1} = A$
- $(A^T)^{-1} = (A^{-1})^T$

Not all matrices are invertible: a matrix needs to be square, and its [determinant](https://en.wikipedia.org/wiki/Determinant) needs to be non-zero. There is a function to invert matrices in `scipy.linalg` called `inv`.


In [10]:
from scipy.linalg import inv
A = numpy.random.uniform(0,1,(3,3))
print(A)
print(inv(A))

[[ 0.28220008  0.13801978  0.15833848]
 [ 0.45325715  0.17665677  0.47649004]
 [ 0.88278238  0.55563422  0.22141151]]
[[ 52.63763471 -13.39479568  -8.81650803]
 [-74.71544028  18.03171689  14.62614014]
 [-22.37065785   8.15517988   2.96404637]]
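
To see what happens with a non-invertible matrix, the sketch below builds a made-up matrix whose second row is a multiple of the first, so its determinant is zero. To avoid extra dependencies it uses `numpy.linalg.det` and `numpy.linalg.inv` instead of the `scipy.linalg` function used above; this is a substitution made for the example only.

```python
import numpy

# Hypothetical singular matrix: the second row is twice the first
S = numpy.array([[1.0, 2.0],
                 [2.0, 4.0]])

print(numpy.isclose(numpy.linalg.det(S), 0.0))

try:
    numpy.linalg.inv(S)
    inverted = True
except numpy.linalg.LinAlgError as err:
    # numpy refuses to invert an exactly singular matrix
    inverted = False
    print("not invertible:", err)
```

Matrices that are almost singular may still be inverted without an error, but the result can be numerically unreliable.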


### Exercise 4.5

Verify the three properties of the matrix inverse operations listed above for a random $m \times m$ numpy matrix. 

## Ordinary Least Squares formula for Linear Regression

We are now ready to implement the formula which can be used to find the coefficients of linear regression:

$$\hat\beta = (X^TX)^{-1}X^Ty$$

Remember that $X$ has $N$ rows corresponding to the $N$ datapoints, and $M$ columns corresponding to the $M$ predictors. The formula defines the vector $\hat\beta$ with the $M$ regression coefficients.
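
When implementing this formula, it can help to have a reference result to compare against. The sketch below is one possible check, not part of the original exercises. It builds noise-free synthetic data from known coefficients (`X_toy`, `beta_true` and `y_toy` are made-up names and values) and recovers them with `numpy.linalg.lstsq`, numpy's own least-squares solver. When $X$ has full column rank, the OLS formula gives the same coefficients.

```python
import numpy

rng = numpy.random.RandomState(0)
N = 50

# Column of ones plus two random predictors
X_toy = numpy.hstack([numpy.ones((N, 1)), rng.uniform(-1, 1, (N, 2))])
beta_true = numpy.array([[1.0], [2.0], [-3.0]])
y_toy = X_toy.dot(beta_true)          # noise-free targets

# Least-squares solution computed by numpy's built-in solver
beta_ref, residuals, rank, sv = numpy.linalg.lstsq(X_toy, y_toy, rcond=None)

print(beta_ref.shape)                 # (3, 1)
print(numpy.allclose(beta_ref, beta_true))
```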

We will apply this formula to the winequality dataset.

Previously we loaded this dataset into a structured array.


In [13]:
# Load winequality as a structured array
data = numpy.genfromtxt("winequality-red.csv", names=True, delimiter=';')
# A structured array is one-dimensional: one record per row of the file
print(data.shape)

(1599,)


In [14]:
# Convert the structured array into a plain 2-D array of numeric values.
# This view relies on every field having the same dtype.
Xy = data.view((data.dtype[0], len(data.dtype)))
print(Xy.shape)

(1599, 12)
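
An alternative to the `view` trick, sketched here on a tiny made-up structured array rather than on the real file, is the helper `structured_to_unstructured` from `numpy.lib.recfunctions` (available in NumPy 1.16 and later). The field names and values below are illustrative assumptions.

```python
import numpy
from numpy.lib import recfunctions

# Hypothetical structured array with three float fields and two records
small = numpy.array(
    [(7.4, 0.70, 5.0), (7.8, 0.88, 5.0)],
    dtype=[("fixed_acidity", "f8"), ("volatile_acidity", "f8"), ("quality", "f8")],
)

plain = recfunctions.structured_to_unstructured(small)

print(small.shape)    # (2,)
print(plain.shape)    # (2, 3)
```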


In [15]:
# Extract X and y from Xy
X = Xy[:,:-1]
y = Xy[:,-1:]
print(X.shape)
print(y.shape)

(1599, 11)
(1599, 1)


In [16]:
# Prepend a column of ones so that the intercept becomes the first coefficient
X_new = numpy.hstack([ numpy.ones((X.shape[0], 1)), X ])
print(X_new)

[[  1.      7.4     0.7   ...,   3.51    0.56    9.4  ]
 [  1.      7.8     0.88  ...,   3.2     0.68    9.8  ]
 [  1.      7.8     0.76  ...,   3.26    0.65    9.8  ]
 ..., 
 [  1.      6.3     0.51  ...,   3.42    0.75   11.   ]
 [  1.      5.9     0.645 ...,   3.57    0.71   10.2  ]
 [  1.      6.      0.31  ...,   3.39    0.66   11.   ]]


### Exercise 4.6

Implement a function `fit` which takes a matrix of predictors and a vector of targets, and returns the vector of regression coefficients computed according to the OLS formula.
Apply this function to the winequality data.

$$(X^T X)^{-1} X^T y$$

In [17]:
def fit(X, y):
    # ----------------------------------
   

### Exercise 4.7

Implement a function `predict` which takes a vector of coefficients and a matrix of predictors, and returns the predicted targets according to the regression formula (see beginning of notebook). Apply this function to the coefficients from the previous exercise, and the winequality data.

In [20]:
def predict(beta, X):
    #--------------------------------


### Exercise 4.8

Define the following two functions to quantify how well the regression is able to predict the targets:
- `mse` - mean squared error, defined as the mean of the squared difference between each prediction and true target: $$MSE(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^N (y_i-\hat{y}_i)^2$$
- `mae` - mean absolute error, defined as the mean of the absolute difference between each prediction and true target: $$MAE(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^N |y_i-\hat{y}_i|$$

Check how well your regression functions predict the targets in winequality according to these error measures.

### Exercise 4.9

- Load the iris data and extract the first three columns into a predictor matrix, and the fourth column into a target vector. Apply the functions `fit` and `predict` to this data, and check the MSE and MAE of your predictions.