In [None]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.text as text

## Linear regression

In this notebook we will consider a simple linear regression model: 

$$ y_i = x_{ij} w_j + b$$

We will use the "summation convention": when an index is repeated, summation over that index is implied:

$$ 
x_{ij} w_j \equiv   \sum_j x_{ij} w_j 
$$
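
As a small illustration of the convention, the sketch below evaluates $x_{ij} w_j$ three ways: with an explicit sum over $j$, with `numpy.einsum`, and with the `@` operator. The array values are arbitrary and chosen only for this example.

```python
import numpy as np

# Arbitrary illustrative values (not part of the exercise data).
x_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])   # index i runs over rows, j over columns
w_demo = np.array([0.5, -1.0, 2.0])    # index j

# Explicit sum over the repeated index j.
explicit = np.array([sum(x_demo[i, j] * w_demo[j] for j in range(3))
                     for i in range(2)])

# The same contraction written with einsum and with the @ operator.
via_einsum = np.einsum("ij,j->i", x_demo, w_demo)
via_matmul = x_demo @ w_demo

print(explicit, via_einsum, via_matmul)
```

All three expressions give the same vector, with one entry per sample $i$.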

#### Problem 1

Implement a function `linear(x,w,b)` that, given a feature matrix $\mathbf{x}$, weights $\mathbf{w}$ and bias $b$, returns $\mathbf{y}$. **Hint:** Use the matrix multiplication operator `@`.  
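
One possible shape for such a function is sketched below. It assumes `x` has shape `(N, D)`, `w` has shape `(D,)` and `b` is a scalar; the `_demo` arrays are made-up values used only to exercise the function.

```python
import numpy as np

def linear(x, w, b):
    """Return y_i = x_ij w_j + b for a feature matrix x of shape (N, D)."""
    return x @ w + b

# Made-up inputs, used only to exercise the function.
x_demo = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 1.0]])
w_demo = np.array([0.2, 0.5, -0.2])
y_demo = linear(x_demo, w_demo, -1.0)
print(y_demo)
```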

### Data

#### Problem 2

Generate a random feature matrix $\mathbf{x}$ with 10000 samples and three features,  
such that the first feature is drawn from $N(0,1)$, the second from $U(0,1)$ and the third from $N(1,2)$.

$N(\mu,\sigma)$ denotes the normal distribution with mean $\mu$ and standard deviation $\sigma$. To generate random numbers you can use the `numpy.random.normal` and `numpy.random.uniform` functions. To collect all features together you can use the `numpy.stack` function. 
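
A minimal sketch of how these functions fit together is shown below. It assumes the uniform feature is drawn from the unit interval $[0,1)$, and it fixes a seed only so that the output is reproducible; neither choice is prescribed by the exercise.

```python
import numpy as np

np.random.seed(0)      # assumption: fixed seed, only for reproducibility
n_samples = 10000

f1 = np.random.normal(0.0, 1.0, size=n_samples)    # N(0, 1)
f2 = np.random.uniform(0.0, 1.0, size=n_samples)   # assumption: U(0, 1)
f3 = np.random.normal(1.0, 2.0, size=n_samples)    # N(1, 2), 2 is the standard deviation

# Stack the three 1-D arrays as columns, giving shape (n_samples, 3).
x = np.stack([f1, f2, f3], axis=1)

print(x.shape)
print(x.mean(axis=0), x.std(axis=0))
```

Printing the per-column mean and standard deviation is a quick sanity check that each feature follows the intended distribution.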

Then using $\mathbf{x}$, weights $w_{true}$  and  bias $b_{true}$:  

In [None]:
w_true = np.array([0.2, 0.5,-0.2])
b_true = -1

generate the output $\mathbf{y}$ assuming normally distributed $N(0,0.1)$ noise $\mathbf{\epsilon}$. 

$$ y_i =  
x_{ij} w_j+b +\epsilon_i 
$$
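
The formula above translates almost literally into NumPy. The sketch below regenerates a feature matrix so that it runs on its own; the seed and the unit-interval uniform feature are assumptions of this example.

```python
import numpy as np

np.random.seed(0)      # assumption: fixed seed, only for reproducibility
n_samples = 10000
x = np.stack([np.random.normal(0.0, 1.0, n_samples),
              np.random.uniform(0.0, 1.0, n_samples),   # assumption: U(0, 1)
              np.random.normal(1.0, 2.0, n_samples)], axis=1)

w_true = np.array([0.2, 0.5, -0.2])
b_true = -1

# One independent noise value per sample, with standard deviation 0.1.
eps = np.random.normal(0.0, 0.1, size=n_samples)
y = x @ w_true + b_true + eps

print(y.shape, np.std(y - x @ w_true - b_true))
```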

### Loss

#### Problem 3

Given the mean squared error loss

$$ MSE(w,b|y,x) = \frac{1}{2}\frac{1}{N}\sum_{i=0}^{N-1} (y_i -  x_{ij} w_j -b  )^2$$

write a Python function `mse(y,x,w,b)` implementing it:
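
A possible implementation is sketched below, keeping the factor $\frac{1}{2}$ from the definition. The `_demo` arrays are made-up values chosen so that the result is easy to check by hand.

```python
import numpy as np

def mse(y, x, w, b):
    """Half the mean of the squared residuals y_i - x_ij w_j - b."""
    residual = y - x @ w - b
    return 0.5 * np.mean(residual ** 2)

# Made-up inputs: with w = 0 and b = 0 the residuals are just y = (1, 3).
x_demo = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_demo = np.array([1.0, 3.0])
loss_demo = mse(y_demo, x_demo, np.zeros(3), 0.0)
print(loss_demo)  # 0.5 * (1 + 9) / 2 = 2.5
```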

### Gradient

Find the gradient of the loss function with respect to weights $w$ and bias $b$. 
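
One way to carry out this calculation is to differentiate the loss term by term with the chain rule. Using $\partial (x_{ij} w_j)/\partial w_k = x_{ik}$, the factor $\frac{1}{2}$ cancels against the exponent and one obtains

$$
\frac{\partial}{\partial w_k} MSE(w,b|y,x) = -\frac{1}{N}\sum_{i=0}^{N-1} (y_i - x_{ij} w_j - b)\, x_{ik}
$$

$$
\frac{\partial}{\partial b} MSE(w,b|y,x) = -\frac{1}{N}\sum_{i=0}^{N-1} (y_i - x_{ij} w_j - b)
$$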

#### Bonus problem 

Numerically solve the equations 

$$\frac{\partial}{\partial w_k} MSE(w,b|y,x) = 0,\quad \frac{\partial}{\partial b} MSE(w,b|y,x) = 0$$

for $w_k$ and $b$. You can use `numpy.linalg.solve` to solve the resulting system of linear equations. 

In [None]:
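# Sample means of the features and of the targets
x_bar = x.mean(0)
y_bar = y.mean()
# Second moments: X_kj = mean(x_ik * x_ij), Y_k = mean(y_i * x_ik)
X = np.transpose(x) @ x / len(x)
Y = y @ x / len(y)
# Setting the b-derivative to zero gives b = y_bar - x_bar @ w.
# Substituting this into the w-equations leaves A w = Y - y_bar * x_bar,
# where A is the covariance matrix of the features.
A = X - np.multiply.outer(x_bar, x_bar)
w = np.linalg.solve(A, Y - y_bar * x_bar)
b = y_bar - x_bar @ w
print(w, b)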

#### Problem 4

Implement functions `grad_w(y,x,w,b)` and `grad_b(y,x,w,b)` that compute these gradients.  
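
The sketch below shows one way these two functions could look, together with a central finite-difference check against the loss. The `mse` helper, the random `_demo` data and the step size `h` are assumptions of this example, not part of the exercise.

```python
import numpy as np

def mse(y, x, w, b):
    return 0.5 * np.mean((y - x @ w - b) ** 2)

def grad_w(y, x, w, b):
    """Gradient of the loss with respect to w, shape (D,)."""
    residual = y - x @ w - b
    return -(residual @ x) / len(y)

def grad_b(y, x, w, b):
    """Gradient of the loss with respect to the scalar bias b."""
    return -np.mean(y - x @ w - b)

# Finite-difference check on small random data (illustrative only).
np.random.seed(0)
x_demo = np.random.normal(size=(50, 3))
y_demo = np.random.normal(size=50)
w_demo = np.array([0.1, -0.3, 0.7])
b_demo = 0.2
h = 1e-6

numeric_w = np.array([
    (mse(y_demo, x_demo, w_demo + h * e, b_demo)
     - mse(y_demo, x_demo, w_demo - h * e, b_demo)) / (2 * h)
    for e in np.eye(3)
])
numeric_b = (mse(y_demo, x_demo, w_demo, b_demo + h)
             - mse(y_demo, x_demo, w_demo, b_demo - h)) / (2 * h)

print(np.allclose(numeric_w, grad_w(y_demo, x_demo, w_demo, b_demo), atol=1e-6))
print(np.isclose(numeric_b, grad_b(y_demo, x_demo, w_demo, b_demo), atol=1e-6))
```

Comparing against finite differences is a cheap way to catch sign or normalisation mistakes before using the gradients in an optimisation loop.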

### Gradient descent 

#### Problem 5

Implement gradient descent for linear regression, starting from the following parameters:

In [None]:
w = np.asarray([0.0,0.0,0.0], dtype='float64')
b = 1.0 

How many epochs did you need to get the MSE below 0.0075?

#### Problem 6

Implement stochastic gradient descent (SGD).

In [None]:
w = np.asarray([0.0,0.0,0.0], dtype='float64')
b = 1.0 

Starting from the same parameters as above and 

In [None]:
batch_size = 100

how many epochs did you need to get the MSE below 0.0075?  
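
The sketch below shows one common way to organise mini-batch SGD: reshuffle the sample indices at the start of every epoch, then update the parameters once per batch. The learning rate, the epoch cap, the seed and the data generation are assumptions of this example.

```python
import numpy as np

# Regenerate data so the sketch runs on its own (seed and U(0, 1) are assumptions).
np.random.seed(0)
n = 10000
x = np.stack([np.random.normal(0.0, 1.0, n),
              np.random.uniform(0.0, 1.0, n),
              np.random.normal(1.0, 2.0, n)], axis=1)
y = x @ np.array([0.2, 0.5, -0.2]) - 1 + np.random.normal(0.0, 0.1, n)

def mse(y, x, w, b):
    return 0.5 * np.mean((y - x @ w - b) ** 2)

w = np.asarray([0.0, 0.0, 0.0], dtype='float64')
b = 1.0
batch_size = 100
learning_rate = 0.1    # assumption: not prescribed by the exercise
max_epochs = 200       # safety cap so the loop always terminates

epoch = 0
loss = mse(y, x, w, b)
while loss >= 0.0075 and epoch < max_epochs:
    order = np.random.permutation(n)          # new sample order every epoch
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        residual = y[idx] - x[idx] @ w - b    # residuals on this batch only
        w = w + learning_rate * (residual @ x[idx]) / len(idx)
        b = b + learning_rate * residual.mean()
    loss = mse(y, x, w, b)                    # monitor the loss on the full data set
    epoch += 1

print(epoch, loss, w, b)
```

With this batch size each epoch performs 100 parameter updates, which is why SGD typically needs far fewer epochs than full-batch gradient descent to reach the same loss.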

### PyTorch 

#### Problem 7

Implement SGD using PyTorch. Start by just rewriting Problem 3 to use torch tensors instead of numpy arrays. 

To convert from numpy arrays to torch tensors you can use the ``torch.from_numpy()`` function: 

In [None]:
import torch as t 

In [None]:
t_y = t.from_numpy(y)
t_x = t.from_numpy(x)
t_w = t.DoubleTensor([0,0,0])
t_b = t.DoubleTensor([1.0])

#### Problem 8 

Implement SGD using PyTorch automatic differentiation.

To this end, the variable with respect to which the gradient will be calculated, ``t_w`` in this case, must have the attribute
``requires_grad`` set to ``True`` (``t_w.requires_grad = True``).

PyTorch will automatically track any expression containing ``t_w`` and store its computational graph. The method ``backward()`` can be run on the final expression to backpropagate the gradient, e.g. ``loss.backward()``. The gradient is then accessible as ``t_w.grad``.
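
The mechanics described above can be tried on a small example. The sketch below computes the loss on made-up random data, calls ``backward()``, compares the result with the analytic gradient, and then performs a single manual update step. The `_demo` data and the learning rate are assumptions of this example.

```python
import numpy as np
import torch as t

# Small made-up data set (illustrative only).
np.random.seed(0)
x_demo = np.random.normal(size=(20, 3))
y_demo = np.random.normal(size=20)

t_x = t.from_numpy(x_demo)
t_y = t.from_numpy(y_demo)
t_w = t.zeros(3, dtype=t.float64, requires_grad=True)
t_b = t.ones(1, dtype=t.float64, requires_grad=True)

# Forward pass: autograd records the graph because t_w and t_b require gradients.
loss = 0.5 * t.mean((t_y - t_x @ t_w - t_b) ** 2)
loss.backward()

# Keep copies of the autograd gradients for comparison.
auto_w = t_w.grad.clone().numpy()
auto_b = t_b.grad.clone().numpy()

# Analytic gradients at w = 0, b = 1.
residual = y_demo - 1.0
analytic_w = -(residual @ x_demo) / len(y_demo)
analytic_b = -residual.mean()
print(np.allclose(auto_w, analytic_w), np.allclose(auto_b, analytic_b))

# One manual update step; no_grad() keeps the update itself out of the graph.
learning_rate = 0.1    # assumption
with t.no_grad():
    t_w -= learning_rate * t_w.grad
    t_b -= learning_rate * t_b.grad
# Gradients accumulate across backward() calls, so reset them before the next step.
t_w.grad.zero_()
t_b.grad.zero_()
```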

In [None]:
t_y = t.from_numpy(y)
t_x = t.from_numpy(x)
t_w = t.DoubleTensor([0,0,0])
t_b = t.DoubleTensor([1.0])
t_w.requires_grad_(True)
t_b.requires_grad_(True)

#### Problem 9 

Implement SGD using PyTorch optimisers. 

In [None]:
t_y = t.from_numpy(y)
t_x = t.from_numpy(x)
t_w = t.DoubleTensor([0,0,0])
t_b = t.DoubleTensor([1.0])
t_w.requires_grad_(True)
t_b.requires_grad_(True)
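
Starting from tensors set up as in the cell above, a training loop built on ``torch.optim.SGD`` could be sketched as follows. The data generation, seeds, learning rate, batch size and epoch cap are assumptions of this example, and `t_mse` is a hypothetical helper name introduced here.

```python
import numpy as np
import torch as t

# Regenerate data so the sketch runs on its own (seeds and U(0, 1) are assumptions).
np.random.seed(0)
t.manual_seed(0)
n = 10000
x = np.stack([np.random.normal(0.0, 1.0, n),
              np.random.uniform(0.0, 1.0, n),
              np.random.normal(1.0, 2.0, n)], axis=1)
y = x @ np.array([0.2, 0.5, -0.2]) - 1 + np.random.normal(0.0, 0.1, n)

t_x = t.from_numpy(x)
t_y = t.from_numpy(y)
t_w = t.zeros(3, dtype=t.float64, requires_grad=True)
t_b = t.ones(1, dtype=t.float64, requires_grad=True)

def t_mse(y, x, w, b):
    # Hypothetical helper: the same loss as before, written with torch operations.
    return 0.5 * t.mean((y - x @ w - b) ** 2)

optimizer = t.optim.SGD([t_w, t_b], lr=0.1)   # assumption: lr = 0.1
batch_size = 100
max_epochs = 200                              # safety cap

epoch = 0
with t.no_grad():
    full_loss = t_mse(t_y, t_x, t_w, t_b).item()
while full_loss >= 0.0075 and epoch < max_epochs:
    order = t.randperm(n)                     # new sample order every epoch
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        optimizer.zero_grad()                 # clear gradients from the previous batch
        batch_loss = t_mse(t_y[idx], t_x[idx], t_w, t_b)
        batch_loss.backward()
        optimizer.step()                      # in-place update of t_w and t_b
    with t.no_grad():
        full_loss = t_mse(t_y, t_x, t_w, t_b).item()
    epoch += 1

print(epoch, full_loss, t_w.detach().numpy(), t_b.item())
```

The optimiser takes over the parameter update and the gradient reset, so the loop body reduces to the usual ``zero_grad()``, ``backward()``, ``step()`` sequence.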