![Pytorch](images/pytorch_logo.png)

# Deep Learning in PyTorch with all the bells and whistles
Let's reimplement our **Deep Learning&trade;** network, but this time we use all the toys!

In [8]:
import torch
from torch import nn
import torch.nn.functional as F
from torch import optim
# Note: load_boston was removed in scikit-learn 1.2, so this notebook needs an older version
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [9]:
# Set seed
seed = 42
torch.manual_seed(seed);

## Load data

We are using the built-in sklearn dataset Boston House Prices.

Our goal is to predict the median price of a home in a given town from a number of features, such as Crime Rate, Property Tax Rate, amount of Industry etc.

It's generally a good idea to scale our data, so we use sklearn's `MinMaxScaler` to scale our values to between 0 and 1. The scaler is fitted on the training data only and then applied to the test data.

In [10]:
# Load our dataset
boston = load_boston()
train_x, test_x, train_y, test_y = train_test_split(boston.data, boston.target, random_state=seed)
scaler = MinMaxScaler()

train_x = torch.tensor(scaler.fit_transform(train_x), dtype=torch.float)
test_x = torch.tensor(scaler.transform(test_x), dtype=torch.float)
train_y = torch.tensor(train_y, dtype=torch.float).view(-1, 1)
test_y = torch.tensor(test_y, dtype=torch.float).view(-1, 1)
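
To make the scaling step concrete, here is a minimal sketch of what min-max scaling computes, written in plain Python rather than with sklearn. The helper names `fit_min_max` and `apply_min_max` and the toy numbers are made up for illustration. The thing to notice is that the minimum and maximum come from the training column only, which mirrors calling `fit_transform` on `train_x` and `transform` on `test_x`, so test values can land slightly outside the 0 to 1 range.

```python
def fit_min_max(column):
    """Return the (min, max) of a training column."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Scale values with x' = (x - lo) / (hi - lo)."""
    return [(x - lo) / (hi - lo) for x in column]


# Toy feature column (made-up numbers, not taken from the Boston dataset)
train_col = [2.0, 4.0, 6.0, 10.0]
test_col = [1.0, 6.0, 12.0]

# Learn the range from the training data only, then reuse it for the test data
lo, hi = fit_min_max(train_col)
train_scaled = apply_min_max(train_col, lo, hi)
test_scaled = apply_min_max(test_col, lo, hi)

print(train_scaled)  # [0.0, 0.25, 0.5, 1.0]
print(test_scaled)   # [-0.125, 0.5, 1.25]
```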

## Setup parameters

We have some hyperparameters to set, as well as some numbers we need to know upfront.

`layer_size` --> We need to know how many input variables there are, so we can create an equivalent number of weights

`lr` --> Aka learning rate.
When we take a step in our gradient descent, we multiply the gradient by this factor, so we take neither too small nor too large a step.

`epochs` --> How many times should we keep stepping?

**new param** `hidden_size` --> How many nodes in each hidden layer?

In [11]:
# Set some parameters
layer_size = train_x.shape[1]
lr = 0.01
epochs = 1000
hidden_size = 16
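
If the role of `lr` feels abstract, a toy problem helps. The sketch below is not part of the network; it runs gradient descent on the made-up function f(w) = w² (gradient 2w, minimum at w = 0) with three learning rates. The function name `descend` and the specific rates are just illustrative choices.

```python
def descend(lr, steps=20, w=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad  # the step is the gradient scaled by the learning rate
    return w


too_small = descend(lr=0.001)  # barely moves away from the starting point
reasonable = descend(lr=0.1)   # moves steadily towards the minimum at w = 0
too_large = descend(lr=1.1)    # overshoots further on every step

print(too_small, reasonable, too_large)
```

With the tiny rate `w` is still close to 1 after 20 steps, with the moderate rate it is close to 0, and with the large rate it grows in magnitude instead of shrinking.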

## Defining the Model

We get to play with some new toys here - `nn.Module` makes it very easy to define a model, and the `nn` module contains plenty of premade functionality, such as `nn.Linear` layers and the `nn.ReLU` activation.

Now we see how useful our new toys are - look how easy it is for me to add 2 additional layers!

In [5]:
class Model(nn.Module):
    def __init__(self, layer_size, hidden_size):
        super().__init__()
        self.l1 = nn.Linear(layer_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, 1)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = self.relu(self.l1(x))
        x = self.relu(self.l2(x))
        return self.l3(x)
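
A quick sanity check of what this class builds can be handy. The sketch below repeats the model definition so it runs on its own, and assumes 13 input features (which should match the Boston data) and a made-up random batch of 4 rows. It prints the output shape and the learnable parameters that `nn.Module` registered for us.

```python
import torch
from torch import nn


class Model(nn.Module):
    def __init__(self, layer_size, hidden_size):
        super().__init__()
        self.l1 = nn.Linear(layer_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.l1(x))
        x = self.relu(self.l2(x))
        return self.l3(x)


# Assumed sizes for illustration: 13 features, 16 hidden nodes
model = Model(layer_size=13, hidden_size=16)

batch = torch.rand(4, 13)  # 4 fake samples with 13 features each
out = model(batch)
print(out.shape)  # torch.Size([4, 1]), one prediction per sample

# Every nn.Linear contributes a weight matrix and a bias vector
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 513 = (13*16 + 16) + (16*16 + 16) + (16*1 + 1)
```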

## Define loss function and optimizer
PyTorch predefines a number of loss functions for me, so I can just use those directly, such as `nn.MSELoss()`.
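
As a small illustration, with its default settings `nn.MSELoss()` is the mean of the squared differences between prediction and target. The sketch below checks that against a hand-written version on made-up numbers.

```python
import torch
from torch import nn

# Made-up predictions and targets, shaped (n_samples, 1) like in this notebook
pred = torch.tensor([[2.0], [4.0], [6.0]])
target = torch.tensor([[1.0], [4.0], [8.0]])

loss_func = nn.MSELoss()
builtin = loss_func(pred, target)

# The same quantity by hand: square the errors, then average them
by_hand = ((pred - target) ** 2).mean()

print(builtin.item(), by_hand.item())  # both are (1 + 0 + 4) / 3
```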

Another new feature is the SGD optimizer. Until now we have been updating the weights "by hand" in a fairly naive fashion. There are many ways of updating our weights, and PyTorch provides implementations for most of them.

SGD *(Stochastic Gradient Descent)* is the closest to what we've been doing so far, so we use that to handle our weight updates.

In [6]:
model = Model(layer_size, hidden_size)
opt = optim.SGD(model.parameters(), lr=lr)
loss_func = nn.MSELoss()
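
To connect the optimizer back to the "by hand" updates, here is a minimal sketch showing that one step of `optim.SGD` with only `lr` set (no momentum or weight decay) amounts to `param - lr * grad`. The tiny `nn.Linear(3, 1)` layer and the random data are stand-ins chosen for illustration, not the model above.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
lr = 0.01

layer = nn.Linear(3, 1)  # a tiny stand-in model
x = torch.rand(5, 3)     # made-up inputs
y = torch.rand(5, 1)     # made-up targets

loss = nn.MSELoss()(layer(x), y)
loss.backward()          # fills in .grad for every parameter

# What a manual update would produce: p - lr * grad (computed before stepping)
expected = [p.detach() - lr * p.grad for p in layer.parameters()]

opt = optim.SGD(layer.parameters(), lr=lr)
opt.step()               # updates the parameters in place

for p, e in zip(layer.parameters(), expected):
    print(torch.allclose(p, e))  # True
```

Other optimizers in `torch.optim` are constructed and stepped the same way, but apply different update rules.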

## Training Loop

In [7]:
# Training loop
for epoch in range(epochs):
    # Forward pass
    model.train() # Put the model into train mode
    pred = model(train_x) # Our model acts like a function!
    loss = loss_func(pred, train_y)
    
    # Backpropagation    
    loss.backward() # The magic bit
    opt.step() # A new magic bit
    opt.zero_grad() # Gotta reset the gradients to zero, so they don't accumulate
    
    if epoch % 10 == 0:
        model.eval()  # Put the model into evaluation mode
        with torch.no_grad():  # No need to track gradients for validation
            val_pred = model(test_x)
            val_loss = loss_func(val_pred, test_y) # Calculate validation loss
        print(f"Epoch: {epoch} Train Loss: {loss.item()} Validation Loss: {val_loss.item()}")
print(f"Epoch: {epoch} Train Loss: {loss.item()} Validation Loss: {val_loss.item()}")

Epoch: 0 Train Loss: 606.623779296875 Validation Loss: 495.802978515625
Epoch: 10 Train Loss: 324.54266357421875 Validation Loss: 171.4607391357422
Epoch: 20 Train Loss: 67.16031646728516 Validation Loss: 52.76789093017578
Epoch: 30 Train Loss: 58.69449234008789 Validation Loss: 43.758201599121094
Epoch: 40 Train Loss: 37.83702850341797 Validation Loss: 30.774845123291016
Epoch: 50 Train Loss: 48.848297119140625 Validation Loss: 47.153846740722656
Epoch: 60 Train Loss: 37.94041442871094 Validation Loss: 28.001794815063477
Epoch: 70 Train Loss: 62.90554428100586 Validation Loss: 39.652427673339844
Epoch: 80 Train Loss: 47.16319274902344 Validation Loss: 31.31838607788086
Epoch: 90 Train Loss: 47.825740814208984 Validation Loss: 29.859413146972656
Epoch: 100 Train Loss: 43.41510772705078 Validation Loss: 28.221847534179688
Epoch: 110 Train Loss: 41.27399444580078 Validation Loss: 27.049591064453125
Epoch: 120 Train Loss: 38.60835266113281 Validation Loss: 25.96039581298828
Epoch: 130 Tra