# Neural Networks for Fashion MNIST in PyTorch
We will extend our previous MLP from scratch example by re-implementing the same content in PyTorch. This may seem like a tour-de-force, but will show just exactly how much of the complicated underlying implementation is abstracted away from the user in modern Deep Learning frameworks. We will then proceed to implement a simple convolutional neural network (CNN). 

Luckily, PyTorch is already installed by default in Colab. We will install one auxiliary package called torchnet: https://github.com/pytorch/tnt though, which we will use for confusion matrices. 

Before starting the notebook you should make sure that your runtime uses GPU acceleration. You can find the corresponding option under *runtime* and then *change runtime type*.

In [4]:
!pip install torchnet 



As always we will import numpy as we will still use it for our dataloader. We now also import PyTorch (simply called torch when importing) and particularly its neural network specific part *nn*. To be on the safe side you can also print Colab's pre-installed version of PyTorch and check if it corresponds to the most recent version (or alternatively update it)

In [3]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
print(torch.__version__)

1.11.0+cu113


### Dataset class extended to use directly in PyTorch
We can basically take our given dataset loader from our previous MLP from scratch FashionMNIST example and use it almost as is.

There is one modification that we absolutely have to make which is converting the numpy arrays to torch tensors.
Below, we will have to use the function *torch.from_numpy()* for this purpose. 

Two additional features we can add is the use of PyTorch dataset and dataloader structures that are very convenient to use and highly efficient. 
These are called *torch.utils.data.TensorDataset* and *torch.utils.data.DataLoader* and allow for the use of a multi-threaded mini-batch dataset loader. In contrast to storing the entire dataset in our memory, this data loader allows us to only load and return the current mini-batch and e.g. store the rest of the dataset in terms of paths only. Although we can just load the entire dataset at once in our simple example (in fact we still do when loading it into Numpy the first time), this is particularly useful for large datasets thart do not fit our memory.

In [5]:
import os
import struct
import gzip
import errno
import torch.utils.data
import torchvision.datasets as datasets


class FashionMNIST:
    """
    Fashion MNIST dataset featuring gray-scale 28x28 images of
    fashion items belonging to ten different classes.
    Dataloader adapted from MNIST.
    We do not define __getitem__ and __len__ in this class
    as we are using torch.utils.data.TensorDataSet which
    already implements these methods.

    Parameters:
        args (dict): Dictionary of (command line) arguments.
            Needs to contain batch_size (int) and workers(int).
        is_gpu (bool): True if CUDA is enabled.
            Sets value of pin_memory in DataLoader.

    Attributes:
        trainset (torch.utils.data.TensorDataset): Training set wrapper.
        valset (torch.utils.data.TensorDataset): Validation set wrapper.
        train_loader (torch.utils.data.DataLoader): Training set loader with shuffling.
        val_loader (torch.utils.data.DataLoader): Validation set loader.
    """

    def __init__(self, is_gpu, batch_size, workers):
        self.path = os.path.expanduser('datasets/FashionMNIST')
        self.__download()

        self.trainset, self.valset = self.get_dataset()

        self.train_loader, self.val_loader = self.get_dataset_loader(batch_size, workers, is_gpu)

        self.val_loader.dataset.class_to_idx = {'T-shirt/top': 0,
                                                'Trouser': 1,
                                                'Pullover': 2,
                                                'Dress': 3,
                                                'Coat': 4,
                                                'Sandal': 5,
                                                'Shirt': 6,
                                                'Sneaker': 7,
                                                'Bag': 8,
                                                'Ankle boot': 9}

    def __check_exists(self):
        """
        Checks if dataset has already been downloaded

        Returns:
             bool: True if downloaded dataset has been found
        """

        return os.path.exists(os.path.join(self.path, 'train-images-idx3-ubyte.gz')) and \
               os.path.exists(os.path.join(self.path, 'train-labels-idx1-ubyte.gz')) and \
               os.path.exists(os.path.join(self.path, 't10k-images-idx3-ubyte.gz')) and \
               os.path.exists(os.path.join(self.path, 't10k-labels-idx1-ubyte.gz'))

    def __download(self):
        """
        Downloads the Fashion-MNIST dataset from the web if dataset
        hasn't already been downloaded.
        """

        from six.moves import urllib

        if self.__check_exists():
            return

        print("Downloading FashionMNIST dataset")
        urls = [
            'https://cdn.rawgit.com/zalandoresearch/fashion-mnist/ed8e4f3b/data/fashion/train-images-idx3-ubyte.gz',
            'https://cdn.rawgit.com/zalandoresearch/fashion-mnist/ed8e4f3b/data/fashion/train-labels-idx1-ubyte.gz',
            'https://cdn.rawgit.com/zalandoresearch/fashion-mnist/ed8e4f3b/data/fashion/t10k-images-idx3-ubyte.gz',
            'https://cdn.rawgit.com/zalandoresearch/fashion-mnist/ed8e4f3b/data/fashion/t10k-labels-idx1-ubyte.gz',
        ]

        # download files
        try:
            os.makedirs(self.path)
        except OSError as e:
            if e.errno == errno.EEXIST:
                pass
            else:
                raise

        for url in urls:
            print('Downloading ' + url)
            data = urllib.request.urlopen(url)
            filename = url.rpartition('/')[2]
            file_path = os.path.join(self.path, filename)
            with open(file_path, 'wb') as f:
                f.write(data.read())

        print('Done!')

    def __get_fashion_mnist(self, path, kind='train'):
        """
        Load Fashion-MNIST data

        Parameters:
            path (str): Base directory path containing .gz files for
                the Fashion-MNIST dataset
            kind (str): Accepted types are 'train' and 't10k' for
                training and validation set stored in .gz files

        Returns:
            numpy.array: images, labels
        """

        labels_path = os.path.join(path,
                                   '%s-labels-idx1-ubyte.gz'
                                   % kind)
        images_path = os.path.join(path,
                                   '%s-images-idx3-ubyte.gz'
                                   % kind)

        with gzip.open(labels_path, 'rb') as lbpath:
            struct.unpack('>II', lbpath.read(8))
            labels = np.frombuffer(lbpath.read(), dtype=np.uint8)

        with gzip.open(images_path, 'rb') as imgpath:
            struct.unpack(">IIII", imgpath.read(16))
            images = np.frombuffer(imgpath.read(), dtype=np.uint8).reshape(len(labels), 784)

        return images, labels

    def get_dataset(self):
        """
        Loads and wraps training and validation datasets

        Returns:
             torch.utils.data.TensorDataset: trainset, valset
        """

        x_train, y_train = self.__get_fashion_mnist(self.path, kind='train')
        x_val, y_val = self.__get_fashion_mnist(self.path, kind='t10k')

        # This is new with respect to our previous data loader
        # convert to torch tensors in range [0, 1]
        # after conversion (before normalization) you need to cast the training data to float
        # and the labels to long (integers)
        x_train = torch.from_numpy(x_train.astype(np.float)) / np.maximum(x_train.astype(np.float))
        y_train = torch.from_numpy(y_train.astype(np.int))
        x_val = torch.from_numpy(x_val.astype(np.float)) / np.maximum(x_val.astype(np.float))
        y_val = torch.from_numpy(y_val.astype(np.int))

        # resize flattened array of images for input to a CNN
        # we use the in-place variant of the resize function here
        x_train.resize_(x_train.size(0), 1, 28, 28)
        x_val.resize_(x_val.size(0), 1, 28, 28)

        # TensorDataset wrapper
        trainset = torch.utils.data.TensorDataset(x_train, y_train)
        valset = torch.utils.data.TensorDataset(x_val, y_val)

        return trainset, valset

    def get_dataset_loader(self, batch_size, workers, is_gpu):
        """
        Defines the dataset loader for wrapped dataset

        Parameters:
            batch_size (int): Defines the batch size in data loader
            workers (int): Number of parallel threads to be used by data loader
            is_gpu (bool): True if CUDA is enabled so pin_memory is set to True

        Returns:
             torch.utils.data.TensorDataset: trainset, valset
        """

        # multi-threaded data loaders
        train_loader = torch.utils.data.DataLoader(self.trainset, batch_size=batch_size, shuffle=True,
                                                   num_workers=workers, pin_memory=is_gpu, sampler=None)
        test_loader = torch.utils.data.DataLoader(self.valset, batch_size=batch_size, shuffle=True,
                                                  num_workers=workers, pin_memory=is_gpu, sampler=None)

        return train_loader, test_loader


Let's load the data and set the device to use. 

In [7]:
# set a boolean flag that indicates whether a cuda capable GPU is available 
# we will need this for transferring our tensors to the device and 
# for persistent memory in the data loader
is_gpu = torch.cuda.is_available()
print("GPU is available:", is_gpu)
print("If you are receiving False, try setting your runtime to GPU")

# set the device to cuda if a GPU is available
device = torch.device("cuda" if is_gpu else "cpu")

# in contrast to our MLP from scratch notebook, we need to set the batch size already now
# this is because our data loader now requires it.
batch_size = 128
# we also set the amount of workers, i.e. parallel threads to use in our data loader
workers = 2

# We can now instantiate our dataset class 
dataset = FashionMNIST(is_gpu, batch_size, workers)

GPU is available: True
If you are receiving False, try setting your runtime to GPU
Downloading FashionMNIST dataset
Downloading https://cdn.rawgit.com/zalandoresearch/fashion-mnist/ed8e4f3b/data/fashion/train-images-idx3-ubyte.gz


HTTPError: ignored

### The MLP model in PyTorch
We now show how to implement a 2 hidden layer MLP in PyTorch. 

Suitable hidden-layer sizes for this task could be 100 and 100, like in our last notebook. 
Because we are using an optimized GPU implementation, you are welcome and should try larger sizes to see the impact of neural network size (capacity) on our task!

In [None]:
class MLP(nn.Module):
    def __init__(self, img_size, num_classes):
        super(MLP, self).__init__()
        
        self.img_size = img_size
        
        # we can optionally set the "bias=False" in our layers (like in previous notebook)
        self.fc1 = 
        self.fc2 = 
        self.fc3 = 

    def forward(self, x):
        # The view flattens the data to a vector (the representation needed by the MLP)
        x = x.view(-1, self.img_size)
        
        x = # apply first layer and activatiom
        x = # apply second layer and activation
        x = # apply last linear layer (no activation)
        return x

### Defining optimization criterion and optimizer
A good baseline is a Cross Entropy loss (that combines a logarithmic Softmax + negative log-likelihood) and a stochastic gradient descent (SGD) algorithm with a baseline learning rate of 0.01. 
The Softmax function: https://en.wikipedia.org/wiki/Softmax_function is similar to the Sigmoid function, but is a normalized exponential and thus normalizes the probability of the output. In contrast to the Sigmoid unit that just gives out values in the range of 0-1 for each output unit, the Softmax function outputs values that are normalized to 1 across the entire range of all outputs. 


If we want to we can use additional momenta or regularization terms (such as L2 - Tikhonov regularization commonly reffered to as weight-decay in ML). The respective optimizer parameters are called *momentum* and *weight_decay*.

In [None]:
# Define optimizer and loss function (criterion)
img_size = 28 * 28
num_classes = 10

# create an instance of the MLP and transfer the model to the device.
# Note that we do not necessarily need any custom weight initialization as PyTorch
# already uses the initialization schemes that we have previously learned about internally. 
model = MLP(img_size, num_classes).to(device)
# we can also print the model architecture
print(model)

# set the loss function
criterion = nn.CrossEntropyLoss().to(device)

# we can use advanced stochastic gradient descent algorithms 
# with regularization (weight-decay) or momentum
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9,
                            weight_decay=5e-4)

### Monitoring and calculating accuracy
We add a convenience class to keep track and average concepts such as processing or data loading speeds, losses and accuracies. For this we need to define a function to define accuracy, which could be based on the absolute accuracy, or top-1 accuracy. Often times in Machine Learning other metrics are employed. For example, in the ImageNet ILSVRC challenge with a classification problem containing 1000 classes, it is common to report the top-5 accuracy. Here a prediction is counted as accurate if the correct class lies within the top-5 most likely output classes. 

In [None]:
class AverageMeter(object):
    """
    Computes and stores the average and current value
    """
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def accuracy(output, target, topk=(1,)):
    """
    Evaluates a model's top k accuracy

    Parameters:
        output (torch.autograd.Variable): model output
        target (torch.autograd.Variable): ground-truths/labels
        topk (list): list of integers specifying top-k precisions
            to be computed

    Returns:
        float: percentage of correct predictions
    """

    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

### Training function (sometimes referred to as "hook")
The training function needs to loop through the entire dataset in steps of mini-batches (for SGD). For each mini-batch the output of the model and losses are calculated and a *backward* pass is done to calculate gradients and an *optimizer step* is done in order to do the respective update to the model's weights. This is similar to our former notebook where we first calculate the errors/deltas for every layer and then apply the weight updates at the end.

When the entire dataset has been processed once, one epoch of the training has been conducted. It is common to shuffle the dataset after each epoch. In contrast to our previous notebook from scratch, in this implementation this is handled by the "sampler" of the dataset loader. 

In [None]:
def train(train_loader, model, criterion, optimizer, device):
    """
    Trains/updates the model for one epoch on the training dataset.

    Parameters:
        train_loader (torch.utils.data.DataLoader): The trainset dataloader
        model (torch.nn.module): Model to be trained
        criterion (torch.nn.criterion): Loss function
        optimizer (torch.optim.optimizer): optimizer instance like SGD or Adam
        device (string): cuda or cpu
    """

    # create instances of the average meter to track losses and accuracies
    losses = AverageMeter()
    top1 = AverageMeter()

    # switch to train mode
    model.train()

    # iterate through the dataset loader
    for i, (inp, target) in enumerate(train_loader):
        # transfer inputs and targets to the GPU (if it is available)
        inp = 
        target = 

        # compute output, i.e. the model forward
        output = 
        
        # calculate the loss
        loss = 

        # measure accuracy and record loss and accuracy
        prec1, prec5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), inp.size(0))
        top1.update(prec1.item(), inp.size(0))

        # compute gradient and do the SGD step
        # we reset the optimizer with zero_grad to "flush" former gradients
        optimizer.zero_grad()
        
        loss # backpropagate the loss here 
        optimizer # do the optimizer step here

        # print the loss every 100 mini-batches
        if i % 100 == 0:
            print('Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})'.format(
                   loss=losses, top1=top1))

### Validation function
Validation is similar to the training loop, but on a separate dataset with the exception that no update to the weights is performed. This way we can monitor the generalization ability of our model and check whether it is overfitting (memorizing) the training dataset.  

In [None]:
from torchnet import meter

def validate(val_loader, model, criterion, device):
    """
    Evaluates/validates the model

    Parameters:
        val_loader (torch.utils.data.DataLoader): The validation or testset dataloader
        model (torch.nn.module): Model to be evaluated/validated
        criterion (torch.nn.criterion): Loss function
        device (string): cuda or cpu
    """

    # create instances of the average meter to track losses and accuracies
    losses = AverageMeter()
    top1 = AverageMeter()

    confusion = meter.ConfusionMeter(len(val_loader.dataset.class_to_idx))

    # switch to evaluate mode 
    # (this would be important for e.g. dropout where stochasticity shouldn't be applied during testing)
    model.eval()

    # avoid computation of gradients and necessary storing of intermediate layer activations
    with torch.no_grad():
        # iterate through the dataset loader
        for i, (inp, target) in enumerate(val_loader):
            # transfer to device
            inp = 
            target = 

            # compute output
            output =

            # compute loss
            loss =

            # measure accuracy and record loss and accuracy
            prec1, _ = accuracy(output, target, topk=(1, 5))
            losses.update(loss.item(), inp.size(0))
            top1.update(prec1.item(), inp.size(0))

            # add to confusion matrix
            confusion.add(output.data, target)

    print(' * Validation accuracy: Prec@1 {top1.avg:.3f} '.format(top1=top1))

### Running the training of the model
Let's optimize this model for 20 epochs and check at every epoch how we are doing on our validation set. 

Depending on your model definition and optimizer you might experience over-fitting!

In [None]:
total_epochs = 20
for epoch in range(total_epochs):
    print("EPOCH:", epoch + 1)
    print("TRAIN")
    # call the train function
    train(...)
    print("VALIDATION")
    # call the validation function
    validate(...)

### Moving from MLP to CNN
Now that we have seen how our two-hidden layer MLP performs, let's see how we can move on to a convolutional neural network (CNN). The advantage of a CNN is that the we no longer have an all-to-all connectivity structure between layers, but rather take a look at local (2-D or even 3-D) neighborhoods. This spatial (or even temporal) filter is then convolved over the whole input (here an image) by "sharing the weights" to every position. The outcome is typically referred to as a feature map and in order to check for multiple features we apply a set of such filters in parallel.  We will see how these effects improve our accuracy in contrast to our MLP. 

Let us see how to build a CNN with 2 layers with a fully-connected classifier on top and included pooling layers after every convolution. These layers generally subsample the input and introduce translation invariance (to an extent). The network should again have rectified linear units for activation functions and end on a fully-connected linear layer to the amount of classes.

    1. Define two convolution layers "nn.Conv2d" with 5 x 5 filters. Good starting values for amount of filters/features can be 64 in the first and 128 in the second layer.
    2. Convolutions should be followed by ReLU activations. You can apply the activations in the definition of the forward pass with the functional package and "F.relu"
    3. Each conv + act block should be followed by a 2 x 2 max pooling "nn.MaxPool2d" with stride 2.
    4. You will need to calculate the remaining feature * spatial dimensionality to flatten the convolutional output to feed it to the last fully-connected layer. 

In [None]:
class CNN(nn.Module):
    def __init__(self, num_classes):
        super(CNN, self).__init__()
        
        self.conv1 =  # input features, output features, kernel size
        self.mp1 =  # kernel size, stride
        
        self.conv2 =  # input features, output features, kernel size
        self.mp2 =  # kernel size, stride
        
        self.fc =  # 4x4 is the remaining spatial resolution here

    def forward(self, x):
        # Conv + ReLU + max pooling for two layers
        x = 
        x = 
        # The view flattens the output to a vector (the representation needed by the classifier)
        x = x.view(-1, your_determined_size_here)
        # apply fully-connected linear layer
        x = 
        return x

### Constructing and running the CNN
Let's create an instance of our CNN model and optimize it. 

In [None]:
# create CNN model instance
model = 
print(model)

# again, define loss function and optimizer
criterion = nn.CrossEntropyLoss().to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9,
                            weight_decay=5e-4)

# optimize
total_epochs = 20
for epoch in range(total_epochs):
    print("EPOCH:", epoch + 1)
    print("TRAIN")
    train(...)
    print("VALIDATION")
    validate(...)

We can see that by changing to a CNN for images we have gained a couple percent accuracy already. If you want to play around with this example you will be able to gain even more by modifying the network to include regularization methods such as dropout, augmenting or preprocessing your data, constructing larger and deeper models and finding better hyperparameters such as learning rates or mini-batch sizes.  

### How well did the model do?
In Machine Learning research it is crucial to compare and contrast a model to other researchers implementations. Many of the current Machine Learning datasets are posed as benchmarks where results are rigorously tracked in order to examine the efficiency and efficacy of a model or algorithm proposition.

For the fashion MNIST dataset you can check how well both of your models (from scratch and in PyTorch) perform here:
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#

Do keep in mind that in order to analyze the usefulness of a method one should always compare and contrast on a variety of different datasets with varying task and complexity.

---
# Kuzushiji recognition - KMNIST: transferring what we have learned to a different task

Here is where neural networks in combination with libraries such as PyTorch really begin to shine. Because the neural network approach is generic, our approach is transferable to a different classification task with minor code modifications. If the complexity of the target task is roughly the same, then we don't need to change the architecture and basically only need to exchange the dataloader. 

We will learn how to use PyTorch's in-built dataloaders for convenience and apply our learned knowledge to a second classification task: recognition of [classical Japanese handwritten Hiragana](https://github.com/rois-codh/kmnist). 

![Kuzushiji](https://raw.githubusercontent.com/rois-codh/kmnist/master/images/kmnist_examples.png)

## KMNIST dataloader - PyTorch dataloaders and transformations for data augmentation
Similar to above example where we wrap our custom dataset into PyTorch's data loaders, we will use PyTorch's in-built dataloaders. However, as the KMNIST dataset has already been added we can directly use convenience functions. We will also see how we can trivially implement data augmentation directly into our data loader.





In [None]:
import torchvision.transforms as transforms


class KMNIST:
    """
    KMNIST dataset featuring gray-scale 28x28 images of
    Japanese Kuzushiji characters belonging to ten different classes.
    Dataset implemented with torchvision.datasets.KMNIST.

    Parameters:
        args (dict): Dictionary of (command line) arguments.
            Needs to contain batch_size (int) and workers(int).
        is_gpu (bool): True if CUDA is enabled.
            Sets value of pin_memory in DataLoader.

    Attributes:
        train_transforms (torchvision.transforms): Composition of transforms
            including conversion to Tensor, repeating gray-scale image to
            three channel for consistent use with different architectures
            and normalization.
        val_transforms (torchvision.transforms): Composition of transforms
            including conversion to Tensor, repeating gray-scale image to
            three channel for consistent use with different architectures
            and normalization.
        trainset (torch.utils.data.TensorDataset): Training set wrapper.
        valset (torch.utils.data.TensorDataset): Validation set wrapper.
        train_loader (torch.utils.data.DataLoader): Training set loader with shuffling.
        val_loader (torch.utils.data.DataLoader): Validation set loader.
    """

    def __init__(self, is_gpu, batch_size=128, workers=4, patch_size=28):
        self.num_classes = 10
        self.patch_size = 28

        self.train_transforms, self.val_transforms = self.__get_transforms()

        self.trainset, self.valset = self.get_dataset()
        self.train_loader, self.val_loader = self.get_dataset_loader(batch_size, workers, is_gpu)

    def __get_transforms(self):
        # We can define data transformations by composing a list of operations to execute
        # this list of transformations can be given to the data loader and will be
        # applied at every step of data loading. It is a really convenient way to 
        # implement random operations such as flips, translations, resizing etc. 

        # In below example we simply apply a resizing operation (to resize the 
        # images to whatever resolution is required for our architecture)
        # and then convert the image to a tensor representation.
        train_transforms = transforms.Compose([
            transforms.Resize(size=(self.patch_size, self.patch_size)),
            transforms.ToTensor(),
        ])

        val_transforms = transforms.Compose([
            transforms.Resize(size=(self.patch_size, self.patch_size)),
            transforms.ToTensor(),
        ])

        return train_transforms, val_transforms

    def get_dataset(self):
        """
        Uses torchvision.datasets.KMNIST to load dataset.
        Downloads dataset if doesn't exist already.

        Returns:
             torch.utils.data.TensorDataset: trainset, valset
        """
        trainset = datasets.KMNIST('datasets/KMNIST/train/', train=True, transform=self.train_transforms,
                                   target_transform=None, download=True)
        valset = datasets.KMNIST('datasets/KMNIST/test/', train=False, transform=self.val_transforms,
                                 target_transform=None, download=True)

        return trainset, valset

    def get_dataset_loader(self, batch_size, workers, is_gpu):
        """
        Defines the dataset loader for wrapped dataset

        Parameters:
            batch_size (int): Defines the batch size in data loader
            workers (int): Number of parallel threads to be used by data loader
            is_gpu (bool): True if CUDA is enabled so pin_memory is set to True

        Returns:
             torch.utils.data.DataLoader: train_loader, val_loader
        """

        train_loader = torch.utils.data.DataLoader(
            self.trainset,
            batch_size=batch_size, shuffle=True,
            num_workers=workers, pin_memory=is_gpu, sampler=None)

        val_loader = torch.utils.data.DataLoader(
            self.valset,
            batch_size=batch_size, shuffle=False,
            num_workers=workers, pin_memory=is_gpu)

        return train_loader, val_loader

We have kept the global structure of the class the same. If we take a closer look we can however observe that both the `get_dataset` as well as the `get_dataset_loader` methods have essentially been replaced with single line calls to torchvision. 

We also no longer need to explicitly pre-load the entire dataset as before and convert it tensors. What we can do instead is we can define so called `transforms` that allows us to specify a sequence of operations that are executed on every loaded batch. This way we can trivially implement deterministic transforms such as conversion of images to tensors or stochastic data augmentation (think of random flips or translations to virtually augment the amount of different samples seen during training). 

This data loader will now only load a single mini-batch at a time and save us a lot of memory, which is essential if our dataset is too large to be loaded directly. For efficiency PyTorch has implemented this data loading in a multi-threaded version. 

## Train KMNIST
We can directly use this now to train our models for recognition of ancient Japanese hiragana. 


## Given that you already have all the ingredients from our FashionMNIST example, train the CNN model on KMNIST yourselves

In [None]:
# load KMNIST and train our CNN on it