![Pytorch](images/pytorch_logo.png)

# Deep Learning in Pure PyTorch
Let's take our regression model and add a hidden layer to it, so we have **Deep Learning™**!

In [None]:
import torch
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
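# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs scikit-learn < 1.2.
from sklearn.datasets import load_boston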

In [None]:
# Set seed
seed = 42
torch.manual_seed(seed);

## Load data

We are using the Boston House Prices dataset that is built into sklearn.

Our goal is to predict the median price of a home in a given town from a number of features, such as crime rate, property tax rate, amount of industry, etc.

It's generally a good idea to scale our data, so we use sklearn's `MinMaxScaler` to scale each feature to between 0 and 1. The scaler is fitted on the training data only and then applied to the test data.

In [None]:
# Load our dataset
boston = load_boston()
train_x, test_x, train_y, test_y = train_test_split(boston.data, boston.target, random_state=seed)
scaler = MinMaxScaler()

train_x = torch.tensor(scaler.fit_transform(train_x), dtype=torch.float)
test_x = torch.tensor(scaler.transform(test_x), dtype=torch.float)
train_y = torch.tensor(train_y, dtype=torch.float).view(-1, 1)
test_y = torch.tensor(test_y, dtype=torch.float).view(-1, 1)
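
To make the scaling step concrete, here is a minimal sketch of what min-max scaling does to each column, written by hand in PyTorch. The toy values are made up for illustration, and this is only an approximation of the idea, not `MinMaxScaler`'s actual implementation.

```python
import torch

# Toy feature matrix: 3 samples, 2 features (values are made up for illustration)
x = torch.tensor([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 40.0]])

# Per-column minimum and maximum, as they would be learned from the training data
col_min = x.min(dim=0).values
col_max = x.max(dim=0).values

# Min-max scaling: map each column onto the range [0, 1]
x_scaled = (x - col_min) / (col_max - col_min)
print(x_scaled)
```

Because the minimum and maximum come from the training data, scaled test values can land slightly outside 0 and 1, which is fine.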

## Setup parameters

We have some hyperparameters to set, as well as some numbers we need to know upfront.

`layer_size` --> We need to know how many input variables there are, so we can create a matching number of weights.

`lr` --> Aka learning rate.
When we take a step in our gradient descent, we multiply the gradient by this factor, so we don't take too small or too large a step.

`epochs` --> How many times should we keep stepping?

**new param** `hidden_size` --> How many nodes are in the hidden layer?

In [None]:
# Set some parameters
layer_size = train_x.shape[1]
lr = 0.01
epochs = 1000
hidden_size = 16
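
To get a feel for the learning rate, here is a small sketch that runs plain gradient descent on the toy function f(w) = w², whose minimum is at 0. The helper `descend` and the three rates are hypothetical choices for illustration, not values tuned for the Boston data.

```python
def descend(lr, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # step against the gradient, scaled by lr
    return w

small = descend(lr=0.001)  # step too small: barely moves away from the start
good = descend(lr=0.1)     # reasonable step: ends up close to the minimum at 0
large = descend(lr=1.5)    # step too large: overshoots more each time and diverges
print(small, good, large)
```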

## Initialize weights and bias

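Each layer needs one weight for every input-output pair: the first layer connects every feature to every hidden node, and the second connects every hidden node to the single output. We are learning what these weights should be, so we start them as random numbers.
Since we have two layers now, we need two sets of weights and biases.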

In [None]:
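# Initializing weights
# w_1: (features, hidden) maps the inputs to the hidden layer
w_1 = torch.randn(layer_size, hidden_size, requires_grad=True, dtype=torch.float)
# w_2: (hidden, 1) maps the hidden layer to a single prediction
w_2 = torch.randn(hidden_size, 1, requires_grad=True, dtype=torch.float)
# Each bias is a single value that broadcasts across its whole layer
b_1 = torch.zeros(1, requires_grad=True, dtype=torch.float)
b_2 = torch.zeros(1, requires_grad=True, dtype=torch.float)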
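
If the shapes feel abstract, this sketch pushes a tiny random batch through two layers and prints the shape at each stage. The sizes are assumptions for illustration (13 matches the number of Boston features, 5 samples is arbitrary), and the non-linearity is left out so only the shapes are in focus.

```python
import torch

n_samples, n_features, n_hidden = 5, 13, 16  # toy sizes for illustration

x = torch.randn(n_samples, n_features)
w_1 = torch.randn(n_features, n_hidden)
w_2 = torch.randn(n_hidden, 1)
b_1 = torch.zeros(1)  # shape (1,) broadcasts across every hidden node
b_2 = torch.zeros(1)

hidden = x @ w_1 + b_1    # (5, 13) @ (13, 16) -> (5, 16)
out = hidden @ w_2 + b_2  # (5, 16) @ (16, 1)  -> (5, 1)
print(hidden.shape, out.shape)
```

The inner dimensions of each matrix product have to match, which is why `w_2` starts with `hidden_size` rows.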

## Define Loss Function

Just like before, we want to use mean squared error to say how bad or good our predictions are.

In addition, we define the non-linear function to apply to the output of the hidden layer - in this case the **Re**ctified **L**inear **U**nit (ReLU). It returns 0 if the input is negative, and the input itself otherwise.

In [None]:
# Define loss function
def mean_squared_error(y_hat, y):
    return ((y_hat - y) ** 2).mean()

# Define non-linear function
def relu(x):
    return torch.max(torch.tensor(0, dtype=torch.float), x)
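
A quick sanity check of both functions on hand-picked numbers can help. Note that this sketch writes ReLU with `torch.clamp`, which is an alternative way of expressing the same thing and not the version used in the cell above; the input values are made up.

```python
import torch

def relu(x):
    # Clamp negative entries to 0 and leave everything else unchanged
    return torch.clamp(x, min=0)

def mean_squared_error(y_hat, y):
    return ((y_hat - y) ** 2).mean()

activated = relu(torch.tensor([-2.0, 0.0, 3.0]))
print(activated)  # tensor([0., 0., 3.])

y_hat = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 4.0, 0.0])
# Squared errors are 0, 4 and 9, so the mean is 13 / 3
mse = mean_squared_error(y_hat, y)
print(mse)
```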

In [None]:
# Training loop
for epoch in range(epochs):
    layer_1 = relu(train_x @ w_1 + b_1) # Result of first layer...
    pred = layer_1 @ w_2 + b_2 # ...is passed through the second layer
    
    loss = mean_squared_error(pred, train_y)
    
    # Backpropagation
    loss.backward() # The magic bit!
    with torch.no_grad():
        w_1 -= w_1.grad * lr
        w_2 -= w_2.grad * lr
        b_1 -= b_1.grad * lr
        b_2 -= b_2.grad * lr
        w_1.grad.zero_()
        w_2.grad.zero_()
        b_1.grad.zero_()
        b_2.grad.zero_()
        
        # Every 10 epochs, evaluate the updated weights on the held-out test set
        if epoch % 10 == 0:
            val_pred = relu(test_x @ w_1 + b_1) @ w_2 + b_2
            val_loss = mean_squared_error(val_pred, test_y)
            print(f"Epoch: {epoch} Train Loss: {loss.item()} Test Loss: {val_loss.item()}")
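
To demystify the "magic bit" a little, here is a minimal sketch of what `loss.backward()` does on a single weight, and why the loop zeroes the gradients after every step. The toy loss (2w - 4)² is an assumption chosen so the derivative is easy to check by hand.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# Loss is (2w - 4)^2, so d(loss)/dw = 4 * (2w - 4), which is 8 at w = 3
loss = (w * 2 - 4) ** 2
loss.backward()
first = w.grad.item()

# A second backward pass without zeroing adds to the stored gradient
loss = (w * 2 - 4) ** 2
loss.backward()
accumulated = w.grad.item()

# Resetting the gradient, as the training loop does after each update
w.grad.zero_()
print(first, accumulated, w.grad.item())
```

Without the `zero_()` calls, every step would use the sum of all earlier gradients instead of the current one.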