These are the only imports you should need for the whole notebook

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Loading our dataset
Loading in the MNIST dataset. Don't worry too much about what's going on here; we're just setting up the framework for pulling images and their labels from the training and testing sets.

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])  # one mean and one std for the single grayscale channel

batch_size = 4

trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
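To make the transform a little less mysterious, here is a minimal plain-Python sketch of its arithmetic, assuming a grayscale pixel stored as an integer from 0 to 255. `ToTensor` rescales the pixel to the range [0, 1], and `Normalize` with a mean of 0.5 and a standard deviation of 0.5 then computes `(x - 0.5) / 0.5`, which lands in [-1, 1]. The helper names below are made up for illustration and are not part of torchvision.

```python
# Hypothetical helpers that mimic the arithmetic of the transform on one pixel.

def to_unit_range(pixel):
    """Mimic ToTensor for a single 0-255 pixel value: rescale to [0, 1]."""
    return pixel / 255.0

def normalize(x, mean=0.5, std=0.5):
    """Mimic Normalize((0.5,), (0.5,)): shift and scale to [-1, 1]."""
    return (x - mean) / std

def unnormalize(z):
    """Undo the normalization, the same way imshow does with img / 2 + 0.5."""
    return z / 2 + 0.5

for pixel in (0, 128, 255):
    x = to_unit_range(pixel)
    z = normalize(x)
    # columns: raw pixel, after ToTensor, after Normalize, after unnormalizing
    print(pixel, round(x, 3), round(z, 3), round(unnormalize(z), 3))
```

Black pixels end up at -1 and white pixels at 1, and unnormalizing recovers the [0, 1] value that matplotlib expects.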


Let's take a look at what we're dealing with.

In [None]:
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))

# Building a Neural Network

Let's define the structure of a neural network with a single hidden layer. Choosing a network's structure is what deep learning experts spend a lot of time tinkering with, to figure out what works best for a problem. This is pretty much the simplest network possible.

In [None]:
# Define the class for single layer NN
class one_layer_net(torch.nn.Module):
    # Constructor
    def __init__(self, input_size, hidden_neurons, output_size):
        super(one_layer_net, self).__init__()
        # the two fully connected layers: input -> hidden and hidden -> output
        self.linear_one = torch.nn.Linear(input_size, hidden_neurons)
        self.linear_two = torch.nn.Linear(hidden_neurons, output_size)
        # intermediate values from the most recent forward pass, kept for inspection
        self.layer_in = None
        self.act = None
        self.layer_out = None
    # prediction function
    def forward(self, x):
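        # flatten each 1x28x28 image into a vector of 784 values
        flattened_input = torch.flatten(x, 1)
        self.layer_in = self.linear_one(flattened_input)
        self.act = torch.sigmoid(self.layer_in)
        self.layer_out = self.linear_two(self.act)
        # reuse layer_out instead of applying linear_two a second time
        y_pred = torch.sigmoid(self.layer_out)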
        return y_pred
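To see what the forward pass computes, here is a minimal plain-Python sketch of the same steps (flatten, linear layer, sigmoid, linear layer, sigmoid) on a made-up 2x2 "image" with three hidden neurons and two outputs. The function names, weights, and sizes are all hypothetical and chosen only to keep the arithmetic easy to follow. It is an illustration of the idea, not a replacement for the PyTorch class.

```python
import math

def sigmoid(v):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def linear(inputs, weights, biases):
    """Weighted sum plus bias; weights holds one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def tiny_forward(image, w1, b1, w2, b2):
    flat = [p for row in image for p in row]             # flatten 2x2 -> 4 values
    hidden = [sigmoid(v) for v in linear(flat, w1, b1)]  # hidden layer activations
    return [sigmoid(v) for v in linear(hidden, w2, b2)]  # one score per class

# Made-up inputs and weights, for illustration only.
image = [[0.0, 1.0], [1.0, 0.0]]
w1 = [[0.5, -0.5, 0.5, -0.5], [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
b1 = [0.0, -2.0, 0.0]
w2 = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]
b2 = [0.0, 0.0]

scores = tiny_forward(image, w1, b1, w2, b2)
print(scores)
```

The real network does exactly this with 784 inputs, 42 hidden neurons, and 10 outputs, except that PyTorch processes a whole batch at once and keeps track of gradients.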

Let's make one for MNIST.  
Our input is 784 in size because each 28x28 image is flattened into that many pixel values  
The output size is 10 because that's how many classes we have to predict, one for each digit  
The hidden layer size is 42 as a fun starting number

In [None]:
model = one_layer_net(28*28, 42, 10)
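As a rough sense of scale, the sketch below counts the trainable numbers in a network of this shape. It assumes each linear layer has one weight per input-output pair plus one bias per output, which is how a fully connected layer with biases is laid out. The helper name is made up for illustration.

```python
def linear_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

input_size, hidden_neurons, output_size = 28 * 28, 42, 10

hidden_params = linear_param_count(input_size, hidden_neurons)
output_params = linear_param_count(hidden_neurons, output_size)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 32970 430 33400
```

If you want to check this against the real model, `sum(p.numel() for p in model.parameters())` should give the same total.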

We need a way of measuring how wrong the network's predictions are, known as a loss function, and a way of updating the weights in our network based on that loss, known as an optimizer, so we choose methods provided by PyTorch for both. Don't worry too much about the specifics of this part; we're using cross-entropy loss and Stochastic Gradient Descent (SGD), which are pretty standard.

In [None]:
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
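For the curious, here is a minimal plain-Python sketch of what cross-entropy loss measures for a single image. It turns the ten class scores into probabilities with a softmax and returns the negative log of the probability given to the correct class. `CrossEntropyLoss` applies the softmax internally and, by default, averages this value over the batch. The function name and the example scores are made up for illustration.

```python
import math

def cross_entropy(scores, target):
    """Negative log of the softmax probability assigned to the target class."""
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]

# All ten classes scored equally: the network is just guessing.
uniform = cross_entropy([0.0] * 10, target=3)

# Class 3 scored much higher than the rest.
peaked_scores = [0.0, 0.0, 0.0, 5.0] + [0.0] * 6
confident_right = cross_entropy(peaked_scores, target=3)
confident_wrong = cross_entropy(peaked_scores, target=7)

print(round(uniform, 3))  # ln(10), about 2.303
print(confident_right < uniform < confident_wrong)  # True
```

A loss near 2.3 therefore means the network is doing no better than guessing among ten digits, and the number should fall as training progresses.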

It's finally time to train! We'll start with two passes over the data and update the model weights after every batch. We previously defined a batch as being 4 images in size, so each pass over the 60,000 training images makes 15,000 updates.

In [None]:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
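The `optimizer.step()` call in the loop above is where the weights actually change. As a minimal sketch of the idea, assuming the standard SGD-with-momentum rule (keep a running "velocity" of past gradients and step against it), here is the update applied to a single made-up weight whose loss is `(w - 3)^2`. The function name and the toy loss are hypothetical, and the real optimizer applies the same kind of update to every weight in the network.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a single number."""
    velocity = momentum * velocity + grad  # blend the new gradient into the running velocity
    param = param - lr * velocity          # step downhill along the velocity
    return param, velocity

# Toy problem: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for step in range(2000):
    grad = 2.0 * (w - 3.0)
    w, v = sgd_momentum_step(w, grad, v)

print(w)  # should end up very close to 3.0
```

The momentum term lets consecutive updates that point the same way build up speed, which is why it is a common addition to plain SGD.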

Now that the model is trained we can save its weights to a file. We'll use this later, but this is also what can be shared so others can use the model you built.

In [None]:
PATH = './mnist_model_weights.pth'
torch.save(model.state_dict(), PATH)

# Testing
Let's test to see how our model does.  
When evaluating models it's important to have a separate test set of inputs that the model has not seen before. Otherwise a model can memorize specifics of its training set that do not apply beyond it. This is known as overfitting, and avoiding it is another big piece of machine learning.

In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))

In [None]:
net = one_layer_net(28*28, 42, 10)
net.load_state_dict(torch.load(PATH))

Now we'll make some example predictions.

In [None]:
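# run the test images shown above through the reloaded network
with torch.no_grad():
    outputs = net(images)
_, predicted = torch.max(outputs, 1)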

print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'
                              for j in range(4)))
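The `torch.max(outputs, 1)` call picks, for each image in the batch, the class with the highest score. It returns both the winning scores and their positions, and we only keep the positions. Here is a minimal plain-Python sketch of the same idea, using a hypothetical helper and made-up scores for two images and four classes.

```python
def max_per_row(rows):
    """For each row, return the largest value and the index where it occurs."""
    values, indices = [], []
    for row in rows:
        best = max(range(len(row)), key=lambda k: row[k])
        values.append(row[best])
        indices.append(best)
    return values, indices

# Made-up scores: one row per image, one column per class.
example_outputs = [
    [0.1, 0.2, 0.9, 0.3],
    [0.8, 0.1, 0.05, 0.05],
]

_, example_predicted = max_per_row(example_outputs)
print(example_predicted)  # [2, 0]
```

The index of the largest score is the predicted class, which is why it can be used directly to look up a name in `classes`.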

Finally, run the model against the full test dataset and see how it did overall and per class!

In [None]:
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy
print(f'Accuracy of the network on the 10000 test images: {100 * sum(correct_pred.values()) // 10000} %')
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')