### Import Statements

In [49]:
from torchvision.datasets import MNIST
from torchvision import transforms
import torch

### Load the dataset

In [50]:
# path to save the MNSIT dataset
PATH = '../data/'

# a variable signifying whether to download the data or use a predownloaded one
DOWNLOAD = False

# download the training dataset, if the data is present don't forget to remove the download parameter
# also transform the images into tensors, originaly they are PIL objects
train_set = MNIST(root=PATH, download=DOWNLOAD, transform=transforms.ToTensor())

# download the testing dataset, if the data is present don't forget to remove the download parameter
# also transform the images into tensors, originaly they are PIL objects
test_set = MNIST(root=PATH, download=DOWNLOAD, train=False, transform=transforms.ToTensor()) 

- Here the images should be converted from PIL(Pilow) objects to tensor, which is a multidimensional array of numbers often in a rectangular shape. For simplicity of understanding you can consider it as an array of matrics.

Note: anything can be a tensor, even 1 singluar number.

In [51]:
print(f"{len(train_set)} training data points")
print(f"{len(test_set)} testing data points")

60000 training data points
10000 testing data points


In [52]:
train_set[0][0]

tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,

In [53]:
train_set[0][1]

5

- The dataset is a collection of tuples with the first index being the pillow object of the number and the second index being the label of that number. This is true for both the training and testing sets

### Create a simple model and train it

1) Define the model by creating a class that will be a subclass of torch.nn.Module class

In [54]:
class Net(torch.nn.Module):
    """
    A one linear layer Neural Net for training on MNIST dataset.
    It expects the images to be of 28 X 28 dimension, which should be flattend making them one linear input with a length of 784.
    The outputs it gives are one of 10 classes, 0 - 9
    """
    def __init__(self) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(784, 10)

    def forward(self, x):
        return self.fc(x)

2) Instantiate the model

In [55]:
model = Net()

In [56]:
model

Net(
  (fc): Linear(in_features=784, out_features=10, bias=True)
)

3) Define loss/optimizer for the neural network

The SGD(Stochastic Gradient Descent) is used for this purpose.

In [57]:
# initialze the optimizer
sgd_optimizer = torch.optim.SGD(
    lr=0.1, # the learning rate
    params=model.parameters()
)

In [58]:
sgd_optimizer

SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    lr: 0.1
    maximize: False
    momentum: 0
    nesterov: False
    weight_decay: 0
)

4) Train the model 

- Define data loaders with batch sizes

In [59]:
import torch.utils
import torch.utils.data
import torch.utils.data.dataloader


batch_size = 128

# use pytorch for loading the training data in batches
train_data_loader = torch.utils.data.DataLoader(
    dataset=train_set,
    batch_size=batch_size,
    shuffle=True
)

# use pytorch for loading the testing data in batches
test_data_loader = torch.utils.data.DataLoader(
    dataset=train_set,
    batch_size=batch_size,
)

- Create a function that is used for infering/testing

In [60]:
def test_model(model: torch.nn.Module, test_data_loader: torch.utils.data.DataLoader):
    """
    A function that obtains the accuracy of a model over an MNIST dataset.

    Args:
        model(torch.nn.Module): the model to be tested
        test_data_loader(torch.utils.data.DataLoader): a data loader for the dataset you want to test the model over

    Returns:
        accuracy(tensor): the accuracy of the model
    """
    for batch in test_data_loader:
        images, labels = batch

        # a list for holding accuracies for each batch
        accuracies = []

        # reshape the images
        images = images.view(-1, 784)
        with torch.no_grad():
            # obtain the predictions
            predictions = model(images)
            label_predictions = predictions.argmax(dim=1)
            
            # calculate the accuracy
            correct_predictions = label_predictions == labels
            accuracy = sum(correct_predictions) / len(label_predictions)

            # add the accuracy to the list
            accuracies.append(accuracy)


        # calculate the average accuracy
        avg_accuracy = sum(accuracies) / len(accuracies)

        return avg_accuracy

In [61]:
test_model(model=model, test_data_loader=test_data_loader)

tensor(0.0703)

- The accuracy of the untrained model is about 6.25%. If we increate the batch size it will be closer to 10% because at this point the model is just having a random guess, witch a chance of getting the rigth answer 1 out of 10 times, and increasing the sample size will get you closer to 10%. You can check this out by adjusting the batch size on previous cell.

Define a function to train the model

In [62]:
def train_batch(model: torch.nn.Module, image: torch.Tensor, labels: torch.Tensor, optimizer: torch.optim.Optimizer):
    """"""

    # define tests
    assert type(model) == Net
    assert type(image) == torch.Tensor
    assert type(labels) == torch.Tensor
    assert image.shape[0] == labels.shape[0]

    # reset gradients
    optimizer.zero_grad()

    # predict using the model
    predictions = model(image)
    label_predictions = predictions.argmax(dim=1)

    # calculate loss
    loss_calculatro = torch.nn.CrossEntropyLoss()
    loss = loss_calculatro(predictions, labels)
    
    # calculate the accuracy
    correct_predictions = label_predictions == labels
    accuracy = sum(correct_predictions) / len(label_predictions)
    
    # back propagate the lost
    loss.backward()

    # upadte the weights using the optimizer
    optimizer.step()

    return loss, accuracy


def train_epoch(model: torch.nn.Module, data_loader: torch.utils.data.DataLoader, optimizer: torch.optim.Optimizer):
    """"""

    loss_progression = []
    accuracy_progression = []

    for batch in data_loader:
        images, labels = batch
        images = images.reshape(-1, 784)
        
        loss, accuracy = train_batch(model=model, image=images, labels=labels, optimizer=optimizer)
        
        loss_progression.append(loss)
        accuracy_progression.append(accuracy)

    avg_loss = sum(loss_progression) / len(loss_progression)
    avg_accuracy = sum(accuracy_progression) / len(accuracy_progression)

    return avg_loss, avg_accuracy, loss_progression, accuracy_progression

Start training the model

In [63]:
import copy

model_copy = copy.deepcopy(model)
avg_loss, avg_accuracy, loss_history, accuracy_history = train_epoch(model=model_copy, data_loader=train_data_loader, optimizer=sgd_optimizer)

In [64]:
print(f"The average loss is: {avg_loss}")
print(f"The average accuracy is: {avg_accuracy}")

The average loss is: 2.350289821624756
The average accuracy is: 0.051439233124256134


- After training the model has achieved an accuracy of 89.8% and a loss of 0.566.
- This is over one epoch, one iteration over the dataset. So next let us train it over more than 1 epoch

In [65]:
epochs = 10
for epoch in range(epochs):
    # lists for holding training history
    test_accuracy_history = []
    train_accuracy_history = []
    train_loss_history = []

    # train the model 
    train_loss, train_accuracy, loss_history, accuracy_history = train_epoch(model=model, data_loader=train_data_loader, optimizer=sgd_optimizer)
    
    # get test accuracy
    accuracy = test_model(model=model, test_data_loader=test_data_loader)

    # add the training history
    test_accuracy_history.append(accuracy)
    train_accuracy_history.append(train_accuracy)
    train_loss_history = [*train_loss_history, *loss_history]

    # print out epoch progression
    print(f"Epoch {epoch+1}/{epochs} => train_loss: {train_loss} | train_accuracy: {train_accuracy} | test_accuracy: {accuracy}")

Epoch 1/10 => train_loss: 0.5760444402694702 | train_accuracy: 0.8581923246383667 | test_accuracy: 0.9375
Epoch 2/10 => train_loss: 0.3744281232357025 | train_accuracy: 0.8974713683128357 | test_accuracy: 0.9453125
Epoch 3/10 => train_loss: 0.3426187336444855 | train_accuracy: 0.904795229434967 | test_accuracy: 0.9453125
Epoch 4/10 => train_loss: 0.3260417878627777 | train_accuracy: 0.9092872738838196 | test_accuracy: 0.9375
Epoch 5/10 => train_loss: 0.31539374589920044 | train_accuracy: 0.9121135473251343 | test_accuracy: 0.9375
Epoch 6/10 => train_loss: 0.3077022433280945 | train_accuracy: 0.9146566390991211 | test_accuracy: 0.9453125
Epoch 7/10 => train_loss: 0.3020157516002655 | train_accuracy: 0.9162335395812988 | test_accuracy: 0.9453125
Epoch 8/10 => train_loss: 0.2971782088279724 | train_accuracy: 0.9176272749900818 | test_accuracy: 0.9375
Epoch 9/10 => train_loss: 0.29349759221076965 | train_accuracy: 0.9180215001106262 | test_accuracy: 0.9453125
Epoch 10/10 => train_loss: 0.2