In [1]:
# Get all the variables, classes and functions we defined in the previous lessons
from vars.week3 import *
import torch
import torch.nn as nn
from torchsummary import summary

# 4. Defining simple models
## 4.1 nn.Sequential
The `Sequential` class is very simple: it accepts a sequence of neural network `Module`s as arguments and chains them so that the output of each one is automatically passed as the input to the next. This saves us from writing some boilerplate code, but it has some drawbacks, as we shall see shortly. The following is the simplest possible neural network: an input layer, one hidden layer and an output layer. It is good practice to print a summary of your network using `torchsummary.summary`, which lets you inspect your network's parameters and the input/output shapes of each layer. It also acts as a sanity check for your model: it runs a dummy input through the network, so it raises an error if the output shape of one layer is incompatible with the input shape of the next.
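As a rough illustration of what `Sequential` does for us, here is a minimal sketch that runs the same layers by hand, feeding each module's output into the next, and checks that the result matches calling the `Sequential` container directly. The random batch of four 28x28 single-channel images is an assumption made purely for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The same layers as in get_simple_linear_net, held in a Sequential container
seq = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

# A dummy batch of 4 single-channel 28x28 "images"
x = torch.randn(4, 1, 28, 28)

# What Sequential does internally: pass each module's output to the next module
out = x
for layer in seq:
    out = layer(out)

print(out.shape)  # torch.Size([4, 10])
```

Calling `seq(x)` performs exactly this loop, which is why `Sequential` only suits models whose data flows straight through one layer after another.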

In [2]:
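def get_simple_linear_net():
    return nn.Sequential(
        nn.Flatten(),             # (N, 1, 28, 28) -> (N, 784)
        nn.Linear(28 * 28, 128),  # input -> hidden layer
        nn.ReLU(),
        nn.Linear(128, 10),       # hidden -> output layer, one score per class
    )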


summary(get_simple_linear_net(), input_size=(1, 28, 28), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                  [-1, 128]         100,480
              ReLU-3                  [-1, 128]               0
            Linear-4                   [-1, 10]           1,290
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.01
Params size (MB): 0.39
Estimated Total Size (MB): 0.40
----------------------------------------------------------------
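The parameter counts in the summary can be checked by hand: a `Linear(in_features, out_features)` layer stores an `out_features × in_features` weight matrix plus one bias per output, so the two layers hold 784 × 128 + 128 = 100,480 and 128 × 10 + 10 = 1,290 parameters, while `Flatten` and `ReLU` have none. The sketch below reproduces both totals; `linear_params` is a hypothetical helper, and the size estimate assumes 4-byte `float32` parameters.

```python
import torch.nn as nn

def linear_params(in_features, out_features):
    # Hypothetical helper: weight matrix (out x in) plus one bias per output unit
    return in_features * out_features + out_features

expected = linear_params(28 * 28, 128) + linear_params(128, 10)

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
actual = sum(p.numel() for p in net.parameters())

# Assuming float32 parameters (4 bytes each), converted to MB (1024 ** 2 bytes)
size_mb = actual * 4 / 1024 ** 2

print(expected, actual, round(size_mb, 2))  # 101770 101770 0.39
```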


### 4.1.1 Simple training loop
Now that we've defined a network, we can start training it! Let's define the simplest possible training function, which only requires the model, the number of training epochs, the training dataloader and an optimiser.

In [3]:
def train_model(model, epochs, train_dl, optimiser):
    loss_fn = nn.CrossEntropyLoss()  # create the loss function once, not on every batch
    msg = ""
    model.train()  # training mode (matters for dropout/batchnorm; no effect on this simple net)
    for epoch in range(epochs):
        total_steps = len(train_dl)
        correct = 0
        total = 0
        for batch_num, (image_batch, label_batch) in enumerate(train_dl):
            # forward pass: compute class scores (logits) for the batch
            batch_sz = len(image_batch)
            output = model(image_batch)

            losses = loss_fn(output, label_batch)
            optimiser.zero_grad()  # clear gradients from the previous step
            losses.backward()      # compute gradients of the loss w.r.t. the weights
            optimiser.step()       # update model weights based on gradients

            # running accuracy over the batches seen so far in this epoch
            preds = torch.argmax(output, dim=1)
            correct += int(torch.eq(preds, label_batch).sum())
            total += batch_sz
            minibatch_accuracy = 100 * correct / total

            #### fancy printing stuff
            if (batch_num + 1) % 5 == 0:
                print(" " * len(msg), end='\r')
                msg = f'Train epoch[{epoch+1}/{epochs}], MiniBatch[{batch_num + 1}/{total_steps}], Loss: {losses.item():.5f}, Acc: {minibatch_accuracy:.5f}'
                print(msg, end='\r', flush=True)
            #### fancy printing stuff
    print()  # move past the last progress line once training finishes
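The heart of `train_model` is the sequence `zero_grad`, forward pass and loss, `backward`, `step`. The sketch below isolates a single such step on a random fake batch (an assumption, standing in for a real dataloader batch) and checks that, for plain `SGD` with its default of no momentum, each parameter moves by exactly `-lr * grad`.

```python
import torch
import torch.nn as nn
from torch.optim import SGD

torch.manual_seed(0)

lr = 0.005
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
optimiser = SGD(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss()

# Fake batch: 32 random "images" and random integer labels in [0, 10)
images = torch.randn(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

# Snapshot the parameters before the update
before = [p.detach().clone() for p in model.parameters()]

optimiser.zero_grad()                  # clear any gradients from a previous step
loss = loss_fn(model(images), labels)  # forward pass and loss
loss.backward()                        # fill p.grad for every parameter
optimiser.step()                       # plain SGD: p <- p - lr * p.grad

# Each parameter should have moved by -lr * grad
for old, p in zip(before, model.parameters()):
    assert torch.allclose(p.detach(), old - lr * p.grad)

print(f"loss on the fake batch: {loss.item():.4f}")
```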

In [4]:
from torch.optim import SGD

# Defining network hyperparameters
epochs = 18
batch_sz = 32
learning_rate = 0.005

# Get train, validation and test dataloaders from the function we
# defined last week
train_dl, val_dl, test_dl = load_data(DATA_PATH,batch_sz=batch_sz)

# Create an instance of our network
network = get_simple_linear_net()                    
optim = SGD(network.parameters(), lr=learning_rate)  # Stochastic gradient descent optimiser
# Call the training function on the network and use the hyperparameters
# defined above
train_model(network, epochs, train_dl, optim)

Train epoch[2/18], MiniBatch[390/394], Loss: 0.46646, Acc: 82.84455

KeyboardInterrupt: 

### 4.1.2 Debrief: Simple model with simple training loop
At the end of the training loop, our model performs pretty well: training accuracy should be around 80-90% most of the time. This is much better than random chance (10% for our ten output classes), so our model seems to have learned something about the dataset and can make good predictions. But it could be better! Before we look into improving this, there is something else that needs fixing...
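To put the comparison with random chance on a concrete footing, here is a small sketch that scores uniformly random guesses against uniformly random labels using the same `100 * correct / total` formula as `train_model`. The random labels and sample count are assumptions for illustration; with ten classes, the result lands close to 10%.

```python
import torch

torch.manual_seed(0)
num_classes = 10   # matches the 10 outputs of get_simple_linear_net
n = 100_000        # number of fake samples; larger n gives an estimate closer to 10%

labels = torch.randint(0, num_classes, (n,))
random_preds = torch.randint(0, num_classes, (n,))

# Same accuracy formula as in train_model: 100 * correct / total
correct = int(torch.eq(random_preds, labels).sum())
accuracy = 100 * correct / n
print(f"random-guess accuracy: {accuracy:.2f}%")
```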

### 4.1.3 Training device
Something you may have noticed so far is that the training loop runs quite slowly. The 18 epochs we asked for are not a very long time in the machine learning world, and yet training takes a while to complete. This is because we've been asking the CPU to do all the tensor calculations needed to update the weights. GPUs are much better suited to this work, because they can carry out the large matrix multiplications at the heart of neural network training in parallel across many cores. You should use a GPU to train machine learning models whenever one is available. PyTorch makes it very easy to detect whether a GPU is available and to move the tensors and models used by code written for the CPU onto the GPU:
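In practice, the pattern is to pick a device once and then move both the model and every batch of data onto it, since a forward pass fails if the model's parameters and its inputs live on different devices. The sketch below assumes a hypothetical `to_device` helper for `(images, labels)` batches and a tiny stand-in `nn.Linear` model; it simply runs on the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(batch, device):
    # Hypothetical helper: move every tensor in a batch tuple onto `device`
    return tuple(t.to(device) for t in batch)

# nn.Module.to moves the parameters and returns the module itself
model = nn.Linear(28 * 28, 10).to(DEVICE)

# A fake batch of 8 flattened images with integer labels, created on the CPU
images, labels = to_device((torch.randn(8, 28 * 28), torch.randint(0, 10, (8,))), DEVICE)

# Parameters and inputs now share a device, so the forward pass works
out = model(images)
print(next(model.parameters()).device, images.device, out.device)
```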

In [None]:
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def train_model_gpu(model, epochs, train_dl, optimiser):
    loss_fn = nn.CrossEntropyLoss()  # create the loss function once, not on every batch
    msg = ""
    for epoch in range(epochs):
        total_steps = len(train_dl)
        correct = 0
        total = 0

        model.train()
        for batch_num, (image_batch, label_batch) in enumerate(train_dl):
            batch_sz = len(image_batch)

            # Transferring image and label tensors to GPU #
            ###############################################
            # (DEVICE falls back to the CPU if no GPU is available)
            image_batch = image_batch.to(DEVICE)
            label_batch = label_batch.to(DEVICE)

            output = model(image_batch)
            losses = loss_fn(output, label_batch)

            optimiser.zero_grad()
            losses.backward()
            optimiser.step()

            preds = torch.argmax(output, dim=1)
            correct += int(torch.eq(preds, label_batch).sum())
            total += batch_sz
            minibatch_accuracy = 100 * correct / total

            #### Fancy printing stuff, you can ignore this! ######
            if (batch_num + 1) % 5 == 0:
                print(" " * len(msg), end='\r')
                msg = f'Train epoch[{epoch+1}/{epochs}], MiniBatch[{batch_num + 1}/{total_steps}], Loss: {losses.item():.5f}, Acc: {minibatch_accuracy:.5f}'
                print(msg, end='\r', flush=True)
            #### Fancy printing stuff, you can ignore this! ######
    print()  # keep the final progress line visible once training finishes

In [None]:
# Finally, we need to transfer our model to the device as well, and can begin training

# Instantiate simple network
network = get_simple_linear_net()
# Transfer network to GPU just like we did the tensors earlier.
# Do this before creating the optimiser, as the PyTorch docs recommend
network = network.to(DEVICE)
# Instantiate SGD optimiser using network params
optim = SGD(network.parameters(), lr=learning_rate)
# Call the new training function
train_model_gpu(network, epochs, train_dl, optim)

# You should see a noticeable speedup in training if a GPU is available!

Train epoch[18/18], MiniBatch[390/394], Loss: 0.20089, Acc: 94.49519
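How big the speedup is depends on your hardware and on the size of the computation; for a network this small, data loading may dominate and the gain can be modest. A rough way to check on your own machine is to time a large matrix multiplication on each available device. The sketch below is only an illustration (the matrix size and repeat count are arbitrary choices). The `torch.cuda.synchronize()` calls matter because CUDA kernels run asynchronously, so without them the timer could stop before the GPU had finished.

```python
import time
import torch

def time_matmul(device, size=512, repeats=10):
    # Rough wall-clock time per matrix multiply on `device`, in seconds
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    torch.matmul(a, b)  # warm-up so one-off start-up costs are not timed
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the timed GPU work to finish
    return (time.perf_counter() - start) / repeats

cpu_time = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_time * 1000:.2f} ms per matmul")
if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_time * 1000:.2f} ms per matmul")
```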