This notebook will give a quick intro to training PyTorch models with torchbearer. 

We’ll need to load in some data and define a model, then we can train it with standard PyTorch and with torchbearer to see how it compares.

Before we do that, lets import all the packages we'll need:

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import transforms

from torchbearer.cv_utils import DatasetValidationSplitter

Now lets build our dataloader. We'll use the CIFAR10 dataset from torchvision and create training and testing dataloaders. 

In [2]:
BATCH_SIZE = 128

normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5],
                                 std=[1, 1, 1])

dataset = torchvision.datasets.CIFAR10(root='./data/cifar', train=True, download=True,
                                        transform=transforms.Compose([transforms.ToTensor(), normalize]))

traingen =  torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=False) 

testset = torchvision.datasets.CIFAR10(root='./data/cifar', train=False, download=True,
                                       transform=transforms.Compose([transforms.ToTensor(), normalize]))
testgen = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified


Now lets build our model. Here is a 3 layer CNN with batchnorm, ReLU activations and a final linear layer to classify to CIFAR10s 10 classes. We also create the optimizer and define the criterion. 

In [3]:
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(3, 16, stride=2, kernel_size=3),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, stride=2, kernel_size=3),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, stride=2, kernel_size=3),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )

        self.classifier = nn.Linear(576, 10)

    def forward(self, x):
        x = self.convs(x)
        x = x.view(-1, 576)
        return self.classifier(x)


model = SimpleModel().cuda()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

First lets have a look at a standard PyTorch training loop for this type of task. 

In [11]:
nEpoch = 2
for epoch in range(nEpoch):  # loop over the dataset multiple times
    trainloader = iter(traingen)
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = inputs.cuda(), labels.cuda()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss = 0.99 * running_loss + 0.01 * loss.item()
        print("Epoch: {}, Batch: {}, Running Loss: {}".format(epoch, i, running_loss), end='\r')

print('\n **** Finished Training **** \n')

correct = 0
total = 0
for data in testgen:
    images, labels = data
    labels = labels.cuda()

    outputs = model(images.cuda())
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()
    
print('Accuracy of the model on the 10000 test images: %d %%' % (
    correct / total))



Epoch: 1, Batch: 390, Running Loss: 0.8963909865273205
 **** Finished Training **** 

Accuracy of the model on the 10000 test images: 0 %


We reset the model and optimizer 

In [9]:
model = SimpleModel().cuda()
optimizer = optim.Adam(model.parameters(), lr=0.001)

We can see that this takes quite a few lines. Instead, in torchbearer we can achieve the same result with just 4 lines. 

In [10]:
from torchbearer import Trial

torchbearer_trial = Trial(model, optimizer, criterion, metrics=['acc', 'loss']).to('cuda')
torchbearer_trial.with_generators(train_generator=traingen, val_generator=testgen)
torchbearer_trial.run(2)

0/2(t): 100%|██████████| 391/391 [00:06<00:00, 56.77it/s, running_acc=0.546, running_loss=1.27, acc=0.474, acc_std=0.499, loss=1.47, loss_std=0.25]
0/2(v): 100%|██████████| 79/79 [00:01<00:00, 64.74it/s, val_acc=0.547, val_acc_std=0.498, val_loss=1.26, val_loss_std=0.0828]
1/2(t): 100%|██████████| 391/391 [00:06<00:00, 56.93it/s, running_acc=0.62, running_loss=1.07, acc=0.594, acc_std=0.491, loss=1.14, loss_std=0.1]
1/2(v): 100%|██████████| 79/79 [00:01<00:00, 65.18it/s, val_acc=0.599, val_acc_std=0.49, val_loss=1.13, val_loss_std=0.0933]


[((391, 79),
  {'running_acc': 0.5460000002384185,
   'running_loss': 1.2666822719573974,
   'acc': 0.47436,
   'acc_std': 0.4993421576434339,
   'loss': 1.4740624586334619,
   'loss_std': 0.24970062229582546,
   'val_acc': 0.5474,
   'val_acc_std': 0.4977481692583108,
   'val_loss': 1.2617988556246214,
   'val_loss_std': 0.08279087051484435}),
 ((391, 79),
  {'running_acc': 0.6204375004768372,
   'running_loss': 1.0682953095436096,
   'acc': 0.59406,
   'acc_std': 0.49107302552675397,
   'loss': 1.1441477407579836,
   'loss_std': 0.10008796427875284,
   'val_acc': 0.5987,
   'val_acc_std': 0.49016151419710624,
   'val_loss': 1.1318027565750894,
   'val_loss_std': 0.09329527538470476})]