![Numpy](images/numpy_logo.png)

# Regression in plain NumPy
Let's start by building a linear regression from scratch in NumPy, so we can see what's actually happening behind the scenes.

In [None]:
import numpy as np

# NOTE: load_boston was removed in scikit-learn 1.2, so this notebook needs scikit-learn < 1.2
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [None]:
# Set seed
seed = 42
np.random.seed(seed)

## Load data

We are using the built-in sklearn dataset Boston House Prices.

Our goal is to predict the median home price in a given town (in thousands of dollars) from a number of features, such as the crime rate, the property tax rate and the proportion of land used by non-retail businesses.

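It's generally a good idea to scale our features, especially for gradient descent: when features live on very different scales, a single learning rate suits none of them well. We use scikit-learn's `MinMaxScaler` to scale each feature to the range 0 to 1, fitting it on the training set only and then applying the same transformation to the test set.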
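The scaling itself is a simple per-column formula, $x' = (x - \min) / (\max - \min)$, where the minimum and maximum come from the training split. A minimal NumPy sketch of that formula, using hypothetical helpers `minmax_fit` and `minmax_transform` (not sklearn functions) and made-up toy data, also shows why scaled test values can fall slightly outside [0, 1]:

```python
import numpy as np

def minmax_fit(x):
    """Return per-column minimum and range, learned from the training data only."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min  # assumes no constant columns (range of 0)
    return col_min, col_range

def minmax_transform(x, col_min, col_range):
    """Scale every column with the statistics learned from the training data."""
    return (x - col_min) / col_range

train_demo = np.array([[1.0, 10.0],
                       [3.0, 20.0],
                       [5.0, 30.0]])
test_demo = np.array([[6.0, 15.0]])

col_min, col_range = minmax_fit(train_demo)
train_scaled = minmax_transform(train_demo, col_min, col_range)
test_scaled = minmax_transform(test_demo, col_min, col_range)

print(train_scaled)  # each column now runs from 0 to 1
print(test_scaled)   # [[1.25 0.25]]: 6 is above the training maximum of 5
```

Unlike this sketch, sklearn's `MinMaxScaler` guards against division by zero for constant columns.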

In [None]:
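# Load our dataset and split it into training and test sets
boston = load_boston()
train_x, test_x, train_y, test_y = train_test_split(boston.data, boston.target, random_state=seed)
scaler = MinMaxScaler()

# Fit the scaler on the training data only, then apply the same scaling to the test data
train_x = scaler.fit_transform(train_x)
test_x = scaler.transform(test_x)
# Turn the targets into column vectors of shape (n, 1) to match the predictions
train_y = train_y.reshape(-1, 1)
test_y = test_y.reshape(-1, 1)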

## Set up parameters

We have some hyperparameters to set, as well as some numbers we need to know upfront.

`layer_size` --> The number of input features, so we can create one weight per feature.

`lr` --> The learning rate.
At each gradient descent step we multiply the gradient by this factor. Too large a step can overshoot the minimum or even diverge, while too small a step makes training crawl.

`epochs` --> How many times we repeat the update. Every epoch uses the whole training set, so here each epoch is exactly one gradient descent step.
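To see why the step size matters, here is a toy sketch that is not the notebook's model: gradient descent on $f(w) = w^2$, whose minimum is at $w = 0$ and whose gradient is $2w$. The function `gradient_descent_1d` and the three learning rates are hypothetical choices for illustration.

```python
def gradient_descent_1d(step_size, steps=20, w0=1.0):
    """Minimise f(w) = w**2, whose gradient is 2 * w, and return the final w."""
    w = w0
    for _ in range(steps):
        w -= step_size * 2 * w  # same update rule as the notebook: w -= gradient * lr
    return w

for step_size in (0.001, 0.1, 1.1):
    print(f"lr={step_size}: final w = {gradient_descent_1d(step_size):.4f}")
```

With `lr=0.001` the weight barely moves (about 0.96 after 20 steps), `lr=0.1` gets close to the minimum (about 0.012), and `lr=1.1` overshoots by more each step and diverges (about 38).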

In [None]:
layer_size = train_x.shape[1]
lr = 0.1
epochs = 800

## Initialize weights and bias

We need one weight per feature. These are the values we are learning, so we start them as random numbers drawn from a standard normal distribution. The bias starts at zero.
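Shapes are easy to get wrong here. A small sketch with made-up sizes (5 samples, 3 features; the `_demo` names are hypothetical, chosen so they don't overwrite the notebook's variables) shows how `X @ w + b` yields one prediction per row, and why the targets were reshaped into column vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5, 3

x_demo = rng.random((n_samples, n_features))   # like train_x: one row per sample
w_demo = rng.standard_normal((n_features, 1))  # one weight per feature, as a column
b_demo = np.zeros(1)                           # a single bias, broadcast to every row

pred_demo = x_demo @ w_demo + b_demo
print(pred_demo.shape)  # (5, 1): one prediction per row

# Why the targets are reshaped with reshape(-1, 1):
y_flat = rng.random(n_samples)                    # shape (5,)
print((pred_demo - y_flat).shape)                 # (5, 5): silent broadcasting bug
print((pred_demo - y_flat.reshape(-1, 1)).shape)  # (5, 1): one residual per row
```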

In [None]:
w = np.random.randn(layer_size, 1)
b = np.zeros(1)

## Define Loss Function

Just like before, we use mean squared error (MSE) to measure how good or bad our model's predictions are.
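Written out, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$. Because errors are squared, a miss of 2 costs four times as much as a miss of 1. A tiny worked example with made-up numbers (the `_demo` names are hypothetical):

```python
import numpy as np

y_hat_demo = np.array([[1.0], [2.0], [3.0], [4.0]])   # predictions
y_true_demo = np.array([[2.0], [2.0], [3.0], [6.0]])  # targets

# Residuals: -1, 0, 0, -2  ->  squares: 1, 0, 0, 4  ->  mean: 5 / 4
mse_demo = ((y_hat_demo - y_true_demo) ** 2).mean()
print(mse_demo)  # 1.25
```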

In [None]:
# Define loss function
def mean_squared_error(y_hat, y):
    return ((y_hat - y) ** 2).mean()

## Define derivatives
To find the size and direction of each step, we need the gradient of the loss with respect to each parameter. I've done the math so you don't have to! *(This can be a pain in the behind!)*
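For reference, here is that math, assuming the model $\hat{y} = Xw + b$ used in the training loop and the MSE loss defined above, with $\delta = \hat{y} - y$:

$$
L(w, b) = \frac{1}{n}\sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2, \qquad \hat{y}_i = x_i^\top w + b
$$

$$
\frac{\partial L}{\partial w} = \frac{2}{n}\sum_{i=1}^{n} \delta_i\, x_i = \frac{2}{n} X^\top \delta,
\qquad
\frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n} \delta_i
$$

The `w_prime` and `b_prime` functions compute these gradients without the factor of 2, i.e. the gradient of $L/2$. The direction is unchanged, so the only effect is that each step is half as large as exact MSE gradient descent with the same `lr`.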

In [None]:
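# Define the derivative (gradient) functions for w and b, where delta = pred - y.
# Both leave out the constant factor 2 from the MSE derivative; that only
# rescales the step, which the learning rate absorbs.
def w_prime(delta, x):
    return np.sum((delta * x), axis=0) / len(x)

def b_prime(delta, x):
    return np.sum(delta, axis=0) / len(x)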
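A finite-difference check is a cheap way to confirm hand-derived gradients. This sketch uses small random data (all `_check` names are hypothetical, chosen so they don't overwrite the notebook's `w` and `b`) and compares `w_prime` and `b_prime` with central differences of the MSE. The analytic values are doubled first, because both functions drop the factor of 2.

```python
import numpy as np

# Copies of the notebook's functions, so this cell runs on its own
def mean_squared_error(y_hat, y):
    return ((y_hat - y) ** 2).mean()

def w_prime(delta, x):
    return np.sum((delta * x), axis=0) / len(x)

def b_prime(delta, x):
    return np.sum(delta, axis=0) / len(x)

rng = np.random.default_rng(0)
x_check = rng.random((20, 3))
y_check = rng.random((20, 1))
w_check = rng.standard_normal((3, 1))
b_check = np.zeros(1)

def loss_check(w, b):
    return mean_squared_error(x_check @ w + b, y_check)

eps = 1e-6
numeric_w = np.zeros(3)
for j in range(3):
    step = np.zeros((3, 1))
    step[j] = eps
    # Central difference approximates dL/dw_j
    numeric_w[j] = (loss_check(w_check + step, b_check)
                    - loss_check(w_check - step, b_check)) / (2 * eps)
numeric_b = (loss_check(w_check, b_check + eps)
             - loss_check(w_check, b_check - eps)) / (2 * eps)

delta_check = x_check @ w_check + b_check - y_check
# Double the analytic gradients to restore the factor 2 before comparing
w_ok = np.allclose(2 * w_prime(delta_check, x_check), numeric_w, atol=1e-6)
b_ok = np.allclose(2 * b_prime(delta_check, x_check), numeric_b, atol=1e-6)
print(w_ok, b_ok)  # True True
```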

In [None]:
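# Training loop (full-batch gradient descent)
for epoch in range(epochs):
    # Forward pass
    pred = train_x @ w + b
    loss = mean_squared_error(pred, train_y)

    # Backward pass: compute the gradients and step against them
    delta = pred - train_y
    w -= w_prime(delta, train_x).reshape(-1, 1) * lr
    b -= b_prime(delta, train_x) * lr

    # Validate the model on the test set every 10 epochs
    if epoch % 10 == 0:
        val_pred = test_x @ w + b
        val_loss = mean_squared_error(val_pred, test_y)
        print(f"Epoch: {epoch} Train Loss: {loss} Test Loss: {val_loss}")

# Final test loss after training, for comparison with sklearn
final_test_loss = mean_squared_error(test_x @ w + b, test_y)
print(f"Final Test Loss: {final_test_loss}")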
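A useful sanity check for this update rule is to run it on synthetic data whose true parameters are known. In this sketch the data, `true_w`, `true_b` and the hyperparameters (`lr_syn = 0.5`, `epochs_syn = 2000`) are arbitrary choices for illustration, and the `_syn` suffix keeps the notebook's own variables untouched. Because the targets are noise-free, gradient descent should recover the parameters closely.

```python
import numpy as np

rng = np.random.default_rng(42)
n_syn, d_syn = 200, 3
true_w = np.array([[2.0], [-1.0], [0.5]])
true_b = 3.0

x_syn = rng.random((n_syn, d_syn))  # features already in [0, 1), like scaled data
y_syn = x_syn @ true_w + true_b     # noise-free targets, shape (200, 1)

w_syn = rng.standard_normal((d_syn, 1))
b_syn = np.zeros(1)
lr_syn, epochs_syn = 0.5, 2000

for _ in range(epochs_syn):
    delta_syn = x_syn @ w_syn + b_syn - y_syn
    # Same updates as w_prime / b_prime, written with a matrix product
    w_syn -= lr_syn * (x_syn.T @ delta_syn) / n_syn
    b_syn -= lr_syn * delta_syn.sum(axis=0) / n_syn

print(w_syn.ravel(), b_syn)  # should be close to [2, -1, 0.5] and [3]
```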


## Compare to sklearn

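Just to make sure we haven't done something horribly wrong, let's compare our homemade gradient descent linear regression with sklearn's `LinearRegression`. The latter solves the least-squares problem directly rather than by gradient descent, so its test MSE is the value our model should approach as training converges.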
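Solving least squares directly is also possible in plain NumPy. This sketch appends a column of ones so the bias is fitted alongside the weights; `fit_ols` and the toy data are hypothetical, and `np.linalg.lstsq` is used here in place of the SciPy solver that sklearn relies on.

```python
import numpy as np

def fit_ols(x, y):
    """Ordinary least squares with an intercept, solved directly (no gradient descent)."""
    x_aug = np.hstack([x, np.ones((len(x), 1))])      # extra column of ones for the bias
    coef, *_ = np.linalg.lstsq(x_aug, y, rcond=None)  # minimises ||x_aug @ coef - y||^2
    return coef[:-1], coef[-1]                        # (weights, bias)

rng = np.random.default_rng(0)
x_ols = rng.random((100, 3))
y_ols = x_ols @ np.array([[1.5], [-2.0], [0.25]]) + 4.0
w_ols, b_ols = fit_ols(x_ols, y_ols)
print(w_ols.ravel(), b_ols)  # recovers [1.5, -2.0, 0.25] and [4.0] up to rounding
```

On the notebook's data, `fit_ols(train_x, train_y)` should give the same values as `regression.coef_` and `regression.intercept_` (up to numerical precision and a transpose), and the trained `w` and `b` drift toward them the longer gradient descent runs, as long as the learning rate is small enough for the updates to converge.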

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn import metrics

In [None]:
regression = LinearRegression()
regression.fit(train_x, train_y)

In [None]:
metrics.mean_squared_error(test_y, regression.predict(test_x))