# Linear Regression in numpy

This notebook gives an intro to machine learning from scratch using numpy and also covers feature scaling and the gradient descent algorithm.

## What is linear regression?

Linear = our predictions are a **linear combination** of our inputs

Regression = we learn a function that maps input features to continuous-valued labels

$h$ is our hypothesis - our prediction of the mapping from input to output.

$$ h = W X = w_1 x_1 + w_2 x_2 + \dots + w_{n-1} x_{n-1} + w_n x_n $$

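This linear combination is a **weighted sum of the input features**. If we change a single feature $x_i$ by an amount $\Delta$ while holding the others fixed, the hypothesis changes by exactly $w_i \Delta$, whatever the values of the other features. The change in $h$ is proportional to the change in the feature.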
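A quick numerical check of both claims, using made-up weights and feature values for a single datapoint with $n = 3$: the explicit weighted sum equals a single dot product, and bumping the first feature by $\Delta$ changes $h$ by exactly $w_1 \Delta$.

```python
import numpy as np

# Made-up weights and features for one datapoint (n = 3)
w = np.array([2.0, -1.0, 0.5])
x = np.array([3.0, 4.0, 10.0])

h_sum = w[0] * x[0] + w[1] * x[1] + w[2] * x[2]  # explicit weighted sum
h_vec = w @ x                                    # the same sum as one dot product
print(h_sum, h_vec)  # 2*3 - 1*4 + 0.5*10 = 7.0 for both

# Increase the first feature by delta, holding the others fixed
delta = 1.5
x_new = x.copy()
x_new[0] += delta
print(w @ x_new - h_vec)  # change in h = w[0] * delta = 3.0
```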

Imagine that we are trying to predict house price. Consider:
- The weight associated with the number of rooms will be large and positive, because more rooms add substantially to the price of a house.
- The weight associated with the age of the house may be negative, if the training data shows that older houses tend to be worth less.
- The weight associated with the age of the person who last lived there should be close to zero, because house price is independent of this feature. It contributes nothing to the price.
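To make the signs concrete, here is a toy prediction with hand-picked, hypothetical weights (not learned from any data) for three features: number of rooms, house age in years, and the previous occupant's age in years. Following the equation for $h$ above, there is no bias term.

```python
import numpy as np

# Hypothetical, hand-picked weights (not learned): rooms, house age, previous occupant's age
w = np.array([20_000.0, -500.0, 0.0])

house_a = np.array([3.0, 10.0, 30.0])
house_b = np.array([3.0, 10.0, 80.0])  # identical to house_a except for the previous occupant's age
house_c = np.array([4.0, 10.0, 30.0])  # one more room than house_a

def predict(x):
    return w @ x  # h = weighted sum of the features

print(predict(house_a))                     # 60000 - 5000 + 0 = 55000.0
print(predict(house_b))                     # same as house_a: that feature's weight is zero
print(predict(house_c) - predict(house_a))  # one extra room adds w[0] = 20000.0
```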

## Gradient Descent
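Gradient descent works by repeatedly moving the weights that control a model a small step in the direction that most quickly decreases the cost. That direction is the negative of the gradient of the cost $J$ with respect to the weights, so each step updates every weight as

$$ w_i \leftarrow w_i - \alpha \frac{\partial J}{\partial w_i} $$

where $\alpha$ is the **learning rate**, which controls the size of each step.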
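A minimal sketch of the update rule on a one-weight model $h = wx$, fitted by mean squared error to toy data generated from $y = 3x$ (the data and learning rates are illustrative assumptions). For this data $J(w) = 7.5\,(w-3)^2$, so each step multiplies the distance to the optimum by $1 - 15\alpha$: a small learning rate converges, while any $\alpha > 2/15$ overshoots further on every step and diverges.

```python
import numpy as np

# Toy data generated from y = 3x, so the best weight is w = 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

def cost(w):
    return np.mean((w * x - y) ** 2)  # mean squared error J(w)

def grad(w):
    return np.mean(2 * (w * x - y) * x)  # dJ/dw

def gradient_descent(w, lr, steps):
    for _ in range(steps):
        w = w - lr * grad(w)  # step against the gradient
    return w

w_small = gradient_descent(0.0, lr=0.01, steps=200)  # 0.01 < 2/15: converges
w_large = gradient_descent(0.0, lr=0.2, steps=20)    # 0.2 > 2/15: diverges
print(w_small, cost(w_small))  # w very close to 3, cost near zero
print(w_large, cost(w_large))  # w has oscillated far away from 3
```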


First, we import the libraries we need

```python
import numpy as np
import matplotlib.pyplot as plt
```

### Making Data
Let's write a function that creates fake data from a known function, so that we know exactly what mapping the model should learn

```python
m = 100  # the number of datapoints that we want

def makedata(numdatapoints):
    """Make some fake data noisily distributed around a polynomial."""
    x = np.linspace(-10, 10, numdatapoints)  # numdatapoints (= m) evenly spaced numbers between -10 and 10
    x = x.reshape(-1, 1)  # make it into a column vector (each datapoint is a row)

    coeffs = [2, -30, 0.5, 5]  # polynomial coefficients, highest power first: 2x^3 - 30x^2 + 0.5x + 5

    # evaluate the polynomial at x and add uniform noise in [0, 2) to create the labels
    y = np.polyval(coeffs, x) + 2 * np.random.rand(numdatapoints, 1)
    y = y.reshape(-1, 1)  # make sure y is a column vector (each datapoint is a row)

    return x, y  # column vectors of single inputs and outputs
```
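The noise term `2 * np.random.rand(...)` is uniform on $[0, 2)$, so it raises every label by 1 on average rather than scattering the labels symmetrically around the polynomial; a model with a bias term simply absorbs that offset. If you prefer zero-mean noise and reproducible data, one option is NumPy's `Generator` API. The function name `makedata_gaussian` and its `noise_std` and `seed` parameters are hypothetical, introduced only for illustration.

```python
import numpy as np

def makedata_gaussian(numdatapoints, noise_std=1.0, seed=0):
    """Hypothetical variant of makedata: zero-mean Gaussian noise and a fixed seed."""
    rng = np.random.default_rng(seed)  # seeded generator, so repeated calls give the same data
    x = np.linspace(-10, 10, numdatapoints).reshape(-1, 1)  # column vector, one datapoint per row
    coeffs = [2, -30, 0.5, 5]  # same polynomial as makedata: 2x^3 - 30x^2 + 0.5x + 5
    y = np.polyval(coeffs, x) + rng.normal(0.0, noise_std, size=(numdatapoints, 1))
    return x, y

x, y = makedata_gaussian(100)
print(x.shape, y.shape)  # (100, 1) (100, 1)
```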

### Features

We may choose to give the model a particular set of features based upon how we expect the mapping to look. For example, house price probably depends on floor area, which grows with the square of room length, so if our original variable is room length we might include its square as a feature.
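Here is a hypothetical illustration of that point: square rooms whose price is assumed to be exactly proportional to floor area. It compares the best fit achievable using only room length (plus a bias) against one that also includes length squared. To isolate the effect of the features, it uses NumPy's closed-form least-squares solver `np.linalg.lstsq` rather than gradient descent.

```python
import numpy as np

# Hypothetical data: square rooms whose price is proportional to floor area (length squared)
length = np.linspace(2.0, 10.0, 20).reshape(-1, 1)  # room length, one datapoint per row
price = 1000.0 * length ** 2                        # price depends on area, not on length directly

ones = np.ones_like(length)  # bias column
feature_sets = {
    "length only": np.hstack([length, ones]),
    "length and length^2": np.hstack([length, length ** 2, ones]),
}

results = {}
for name, X in feature_sets.items():
    w, *_ = np.linalg.lstsq(X, price, rcond=None)  # best weights for this feature set
    results[name] = np.mean((X @ w - price) ** 2)  # mean squared error of that best fit
    print(name, "MSE:", results[name])
```

With the squared feature the fit is essentially exact; with length alone, no choice of weights can bend a straight line onto the curve.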

Let's manually **make some polynomial features** from our data

```python
powers = [2, 3]  # the powers of the input that we want to include as features for our model
n = len(powers)  # n = number of features of each training datapoint

def makefeatures(inputs, powers):
    """Build an (m x n) design matrix whose columns are the inputs raised to each power."""
    features = np.ones((inputs.shape[0], len(powers)))  # initialise a design matrix with the right shape (m x n)
    for i in range(len(powers)):  # for each power in the list powers
        features[:, i] = (inputs ** powers[i])[:, 0]  # set column i of the design matrix = inputs raised to that power
    print(features)

    return features
```
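The loop above can also be written with NumPy broadcasting: raising an `(m, 1)` column of inputs to an `(n,)` array of powers produces the `(m, n)` design matrix in one expression. The name `makefeatures_vectorised` is hypothetical; this is a sketch of an alternative, not a change to the notebook's function.

```python
import numpy as np

def makefeatures_vectorised(inputs, powers):
    """Hypothetical broadcasting alternative to makefeatures: (m, 1) ** (n,) -> (m, n)."""
    return inputs ** np.asarray(powers)

inputs = np.linspace(-10, 10, 5).reshape(-1, 1)  # column vector of 5 inputs
features = makefeatures_vectorised(inputs, [2, 3])
print(features.shape)  # (5, 2)
print(features[0])     # first row, x = -10: 100 and -1000
```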

### Feature Scaling - normalisation/standardisation

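If the features are on very different scales, so are the gradients of the cost with respect to their weights, and some weights start much further from good values than others. All the weights share one learning rate: when one weight reaches a good value it needs to stop jumping around and settle, while the others still need to keep moving and being optimised. A learning rate small enough to stop the weights of large-scale features overshooting makes the weights of small-scale features crawl. This happens even when every feature comes from the same input variable: over $x \in [-10, 10]$, $x^2$ lies in $[0, 100]$ while $x^3$ lies in $[-1000, 1000]$.

The learning rate needs to be small enough to ensure that the parameters converge, but large enough that they converge at a reasonable rate. Rescaling the features to similar ranges makes it much easier to find a single learning rate that satisfies both conditions.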
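A minimal sketch of the two common rescalings, applied to $x^2$ and $x^3$ features built from inputs in $[-10, 10]$: standardisation (subtract each column's mean, divide by its standard deviation) and min-max normalisation (map each column onto $[0, 1]$). Whichever you use, the statistics should be computed on the training data and reused, unchanged, on any new inputs at prediction time.

```python
import numpy as np

x = np.linspace(-10, 10, 100).reshape(-1, 1)
features = np.hstack([x ** 2, x ** 3])  # polynomial features with very different ranges

# Standardisation: zero mean and unit standard deviation in each column
mean = features.mean(axis=0)
std = features.std(axis=0)
standardised = (features - mean) / std

# Min-max normalisation: rescale each column onto [0, 1]
fmin = features.min(axis=0)
fmax = features.max(axis=0)
normalised = (features - fmin) / (fmax - fmin)

print(features.min(axis=0), features.max(axis=0))           # very different ranges before scaling
print(standardised.mean(axis=0), standardised.std(axis=0))  # approximately [0, 0] and [1, 1]
print(normalised.min(axis=0), normalised.max(axis=0))       # [0, 0] and [1, 1]
```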
 

Let's build our model class

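```python
class LinearModel():

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        # random initial weights, with one extra column for the bias
        self.weights = np.random.rand(num_outputs, num_inputs + 1)

    def forward(self, x):
        """x has shape (m, num_inputs), one datapoint per row; returns shape (m, num_outputs)."""
        x = np.hstack([x, np.ones((x.shape[0], 1))])  # append a column of ones so the last weight acts as the bias
        h = np.matmul(x, self.weights.T)
        return h

    def __call__(self, x):
        return self.forward(x)  # lets us write mymodel(x) as well as mymodel.forward(x)
```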
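The `+ 1` in the weight shape stores the bias as one extra weight. The sketch below, with arbitrary random values and illustrative variable names, checks that appending a column of ones to the inputs and multiplying by these augmented weights gives the same result as applying separate weights and a bias.

```python
import numpy as np

rng = np.random.default_rng(0)
num_inputs, num_outputs, m = 3, 2, 5

W = rng.random((num_outputs, num_inputs))  # weights
b = rng.random(num_outputs)                # biases
x = rng.random((m, num_inputs))            # m datapoints, one per row

# Separate weights and bias
h_separate = x @ W.T + b

# Augmented form: bias stored as an extra weight column, ones appended to each datapoint
W_aug = np.hstack([W, b.reshape(-1, 1)])  # shape (num_outputs, num_inputs + 1)
x_aug = np.hstack([x, np.ones((m, 1))])   # shape (m, num_inputs + 1)
h_augmented = x_aug @ W_aug.T

print(np.allclose(h_separate, h_augmented))  # True
```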

Let's write a function to train our model

```python
def train(model, features, labels, epochs=100, lr=0.1):
    """Fit model.weights by batch gradient descent on the mean squared error.

    lr = 0.1 assumes the features have been scaled; unscaled features may need a much smaller value.
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # features plus the bias column, as in forward()
    costs = []  # initialise an empty list to store our past costs in
    for e in range(epochs):  # for as many epochs as we have defined
        predictions = model.forward(features)  # pass the data forward through our model
        error = predictions - labels

        cost = np.mean(error ** 2)  # mean squared error of our predictions compared to the labels
        costs.append(cost)  # append it to the list of costs
        print('Epoch', e, 'Cost', cost)

        # unpack the parameters so that we can use values for visualisation
        weights = model.weights[:, :-1]
        bias = model.weights[:, -1]
        print('b', bias)
        print('w', weights)

        grad = 2 * np.matmul(error.T, X) / error.size  # gradient of the cost w.r.t. the weights
        model.weights -= lr * grad  # move the weights a small step against the gradient

    return costs
```

Now let's call all of our functions, train the model and then show the history of costs
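A minimal, self-contained sketch of that pipeline follows. So that it runs on its own, it re-implements compact versions of the steps above with NumPy: it seeds the random generator (an assumption; the notebook uses `np.random` directly), uses powers `[2, 3]`, standardises the features, and runs batch gradient descent on the mean squared error with an assumed learning rate of 0.1 for 200 epochs.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded for reproducibility

# 1. Fake data around 2x^3 - 30x^2 + 0.5x + 5, with uniform noise in [0, 2)
m = 100
x = np.linspace(-10, 10, m).reshape(-1, 1)
y = np.polyval([2, -30, 0.5, 5], x) + 2 * rng.random((m, 1))

# 2. Polynomial features
powers = [2, 3]
features = x ** np.asarray(powers)  # shape (m, n)

# 3. Standardise each feature column
features = (features - features.mean(axis=0)) / features.std(axis=0)

# 4. Linear model with the bias stored as an extra weight column
X = np.hstack([features, np.ones((m, 1))])  # shape (m, n + 1)
weights = rng.random((1, X.shape[1]))       # shape (num_outputs, n + 1)

# 5. Batch gradient descent on the mean squared error
epochs, lr = 200, 0.1
costs = []
for e in range(epochs):
    error = X @ weights.T - y  # shape (m, 1)
    costs.append(np.mean(error ** 2))
    weights -= lr * 2 * (error.T @ X) / error.size

# 6. History of costs
plt.plot(costs)
plt.xlabel("Epoch")
plt.ylabel("Cost (MSE)")
plt.yscale("log")
plt.show()
```

The cost does not fall to zero: with only $x^2$, $x^3$ and a bias, the model cannot represent the $0.5x$ term exactly, and the noise adds an irreducible error on top.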