This is a first draft of code for the INT2 project. Most variables have been chosen quite arbitrarily - I have signified where this is the case, so that we know what to experiment with in order to improve the code.

*Note you need to have pytorch installed already*

First, we load and transform the CIFAR10 dataset:

In [1]:
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

our_transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])


# first we load the data into a training set and a testing set
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=our_transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=our_transform)

# now we create loader objects that allow us to iterate through the data
trainloader = DataLoader(trainset, batch_size=8, shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=8, shuffle=False, num_workers=2)

Files already downloaded and verified
Files already downloaded and verified


The definition of our_transform is not my own, here's an explanation of what it's supposed to do:

'toTesnor()' converts the image data into a tensor format (necessary to work with easily)

'Normalize()' normalises the tensor data (i.e. setting x = x_mean / standard_deviation). The triples (0.5, 0.5, 0.5) are apparently supposed to represent the means and standard deviations respectively for each of the three RGB channels (red/green/blue). Why they should all be set to 0.5 isn't very clear to me.

As for the loaders, the choice 'batch_size=4' and 'num_workers=2' is completely arbitrary - although I originally ran it with batch_size=8 and got much worse results, so maybe 4 is a good choice? Batch size seems to be to do with the amount of data that is held in memory at any one time, though I don't really understand it, and 'num_workers' is about the number of threads that can run concurrently.

We will now set up the neural network:

In [2]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        # defines the layers
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.linear1 = nn.Linear(64*8*8, 512)
        self.linear2 = nn.Linear(512, 10)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64*8*8) # reshaping the output from the second convolutional layer
        x = F.relu(self.linear1(x))
        x = self.linear2(x)
        return x

net = Net()

This network has two convolutional layers (which extract low-level features), two pooling layers (to extract more information from the convolutional layer), and two linear layers (which do the actual linear classifying). More layers, especially more convolutional layers, might be a good idea.

The numbers in each layer are again arbitrarily chosen - for the most part. In 'conv1', the first 3 represents the 3 input channels (that is, the 3 RGB channels), and the last 3 denotes kernel size, i.e. the convolution is done with a 3x3 filter. Apparently it is good to keep the kernel size at either 3 or 5 (why, I have no idea).
It is important that the 16 - which represents number of output channels - in conv1 matches the 16 in the first place of conv2, as the output from conv1 is also the input to conv2 (technically there is a pooling step in between but this doesn't change the number of channels). Similarly if a third layer was added the input dimension would have to be 64 (unless we changed the output dimension of conv2)
The optional variable padding='1' ensures the convolutional layers do not change the output dimension.

If you get an error 'expected input size to match target size' when running the training part below (as I did), add a line 'print(x.shape)' just before the line starting 'x = x.view('. This will output an array - multiply all but the first number together and set that as the input dimension for the first linear layer. If you don't do this it forces the batch size to change: https://discuss.pytorch.org/t/valueerror-expected-input-batch-size-324-to-match-target-batch-size-4/24498/4

In linear2, the output dimension of 10 is important - this is the number of categories we are trying to classify into.

Now we define the loss function and optimizer:

In [3]:
import torch.optim as optim

loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum = 0.9)

CrossEntropyLoss is apparently a relatively (computationally) simple loss function which suits our purposes okay.
There are a few optional parameters for 'optim.SGD' which might be worth playing around with, check the documentation here: https://pytorch.org/docs/master/generated/torch.optim.SGD.html

Now we train the network:

In [4]:
# run for 4 Epochs, i.e. loop over the dataset 4 times
for epoch in range(1, 5):
    train_loss, valid_loss = [], []
    
    # training 
    net.train()
    for data, target in trainloader:
        optimizer.zero_grad()
        output = net(data)
        loss = loss_function(output, target)
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item()) 
    
    print('Run ', epoch, ' complete') # track progress
print('All done')

Run  1  complete
Run  2  complete
Run  3  complete
Run  4  complete
All done


The above took a little over 15 minutes to run on my computer (though for some reason it was only using about 60% of the CPU, maybe this is something to do with the batch_size=4 or the num_workers=2 from earlier?)

Running for more epochs would probably help, but I suspect altering the definition of the NN would help more. *Edit: I've ran it twice - once w

Now we can test the model, first by just seeing how it performs on a single batch:

In [5]:
test_iterator = iter(testloader)
data, labels = test_iterator.next()
output = net(data)

_, predicted = torch.max(output, 1)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
print('Actual: ', ' '.join('%5s' % classes[labels[j]]
                              for j in range(4)))

Predicted:    cat  ship  ship plane
Actual:    cat  ship  ship plane


And now over the whole dataset:

In [6]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Accuracy of the network on the 10000 test images: 58 %


And finally on each individual category:

In [7]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Accuracy of plane : 58 %
Accuracy of   car : 73 %
Accuracy of  bird : 38 %
Accuracy of   cat : 47 %
Accuracy of  deer : 52 %
Accuracy of   dog : 53 %
Accuracy of  frog : 56 %
Accuracy of horse : 68 %
Accuracy of  ship : 82 %
Accuracy of truck : 59 %


Do be aware that none of the testing code above is originally mine, but rather is taken from the wiki I linked to on github.

**Final observations**
- I haven't used CUDA to run this on GPU instead of CPU.
- Here's a useful guide on figuring out sizes for NN layers: https://towardsdatascience.com/pytorch-layer-dimensions-what-sizes-should-they-be-and-why-4265a41e01fd