Importing NumPy and TorchVision to download the MNIST dataset and convert it to NumPy arrays

In [1]:
import numpy as np
from torchvision import datasets

Initialisation of the DatasetGenerator class, where the number of workers, the batch size and the validation size used to load and build the datasets are defined.
Setting the number of workers to 0 means data loading is handled by the main process rather than by separate worker processes.
The batch size and validation size control how the dataset is processed: a batch size of 20 sets the number of samples in each minibatch, and 0.2 is the fraction (20%) of the training data held out for validation.
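A minimal NumPy sketch of how `valid_size` and `batch_size` could be applied to the flattened training arrays. The helper `split_and_batch`, the shuffling with a fixed seed, and keeping a smaller final batch are assumptions for illustration, not part of the notebook.

```python
import numpy as np


def split_and_batch(data, labels, valid_size=0.2, batch_size=20, seed=0):
    """Hypothetical helper: hold out `valid_size` of the samples for validation
    and return the remaining training samples as minibatches of `batch_size`."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))           # shuffle before splitting
    split = int(np.floor(valid_size * len(data)))  # e.g. 0.2 * 60000 = 12000
    valid_idx, train_idx = indices[:split], indices[split:]

    valid = (data[valid_idx], labels[valid_idx])
    # The last batch is smaller if the count is not a multiple of batch_size
    batches = [
        (data[train_idx[i:i + batch_size]], labels[train_idx[i:i + batch_size]])
        for i in range(0, len(train_idx), batch_size)
    ]
    return batches, valid


# Usage with random stand-in data shaped like flattened MNIST (100 samples)
data = np.random.rand(100, 784)
labels = np.eye(10)[np.random.randint(0, 10, size=100)]
batches, (valid_x, valid_y) = split_and_batch(data, labels)
print(len(batches), batches[0][0].shape, valid_x.shape)  # 4 (20, 784) (20, 784)
```

With 100 samples, 20 are held out for validation and the remaining 80 form four minibatches of 20.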

In [2]:
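class DatasetGenerator:
    def __init__(self):
        self.num_workers = 0    # 0 = load data in the main process
        self.batch_size = 20    # samples per minibatch
        self.valid_size = 0.2   # fraction of training data held out for validation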

Using TorchVision's `datasets.MNIST` to download the MNIST dataset and set its download location (the `data` directory). The `train` variable selects whether the training set or the test set is downloaded.

In [3]:
    def dataset(self, train=True):
        return datasets.MNIST(root='data', train=train, download=True)

This function converts the MNIST Torch dataset into NumPy arrays, which are used later for all the NumPy-based calculations.

    1. The images (X) and labels (Y) of the dataset are extracted as NumPy arrays
    2. Singleton dimensions are removed from both with np.squeeze, so they are easier to operate on
    3. The images of handwritten digits are normalised by dividing by 255.0, the maximum 8-bit greyscale value, so every pixel lies in [0, 1]
    4. Each 28 × 28 image is flattened into a single row, giving a 2D array of shape (number of images, 784)
    5. The labels are one-hot encoded: each label becomes an array of 10 binary (0/1) values with a 1 at the digit's index
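The five steps can be traced on a small stand-in array instead of the real download. The random images below are an assumption used only to show shapes and value ranges; TorchVision stores MNIST images the same way, as 28 × 28 unsigned 8-bit values.

```python
import numpy as np

# Stand-in for dataset.data.numpy() and dataset.targets.numpy():
# three fake 28x28 greyscale images and their digit labels
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)
labels = np.array([0, 3, 9])

data = np.squeeze(data)                      # step 2: no singleton axes here, stays (3, 28, 28)
data = data / 255.0                          # step 3: uint8 0..255 -> float64 0.0..1.0
data_flat = data.reshape(data.shape[0], -1)  # step 4: (3, 28, 28) -> (3, 784)
one_hot = np.eye(10)[labels]                 # step 5: row `label` of the 10x10 identity

print(data_flat.shape)  # (3, 784)
print(one_hot[1])       # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Indexing the identity matrix with the label array picks one row per label, which is why `np.eye(10)[labels]` produces the one-hot encoding in a single vectorised step.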

In [4]:
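    def to_numpy(self, dataset):
        # 1. Images (N, 28, 28) as uint8 and labels (N,) as integers
        data = dataset.data.numpy()
        labels = dataset.targets.numpy()
        # 2. Drop any singleton dimensions
        data = np.squeeze(np.array(data))
        labels = np.squeeze(np.array(labels))
        # 3. Scale pixel values from 0..255 to 0.0..1.0
        data = data / 255.0
        # 4. Flatten each image into one row: (N, 784)
        data_flat = data.reshape(data.shape[0], -1)
        # 5. One-hot encode: row `label` of the 10x10 identity matrix
        labels = np.eye(10)[labels]
        return data_flat, labels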

Returns the MNIST training set converted to NumPy arrays (flattened, normalised images and one-hot labels)

In [5]:
    def get_train_data(self):
        return self.to_numpy(self.dataset())

Returns the MNIST test set converted to NumPy arrays (flattened, normalised images and one-hot labels)

In [6]:
    def get_test_data(self):
        return self.to_numpy(self.dataset(train=False))

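Returns the input and output sizes of the dataset: the number of pixels in a flattened image (28 × 28 = 784) and the number of classes (10). These are used later for hyperparameterisation and layer definition.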

In [7]:
    def get_layers(self):
        dataset = self.dataset()  # load once instead of twice
        side = dataset.data[0].shape[0]  # images are square: 28 x 28
        labels = len(np.unique(dataset.targets))  # number of digit classes
        return (side * side), labels
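As an illustration of how the two returned values could size a network, the sketch below defines one hidden layer. The hidden width of 64, the `init_layers` helper, the small random initialisation and the ReLU activation are hypothetical choices, not taken from the notebook; only the 784 inputs and 10 outputs correspond to what `get_layers()` returns for MNIST.

```python
import numpy as np


def init_layers(input_size, output_size, hidden_size=64, seed=0):
    """Hypothetical helper: weights and biases for input -> hidden -> output."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.01, size=(input_size, hidden_size)),
        "b1": np.zeros(hidden_size),
        "W2": rng.normal(0, 0.01, size=(hidden_size, output_size)),
        "b2": np.zeros(output_size),
    }


input_size, output_size = 28 * 28, 10  # the values get_layers() returns for MNIST
params = init_layers(input_size, output_size)

# Forward pass on one minibatch of 20 flattened images (random stand-in data)
batch = np.random.rand(20, input_size)
hidden = np.maximum(0, batch @ params["W1"] + params["b1"])  # ReLU (assumed)
scores = hidden @ params["W2"] + params["b2"]
print(scores.shape)  # (20, 10)
```

The input size fixes the number of rows of the first weight matrix and the number of classes fixes the number of columns of the last one, so one score per class comes out for each image in the batch.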