# Classification with Delira and Chainer - A very short introduction
*Author: Justus Schock* 

*Date: 31.07.2019*

This example shows how to set up a basic classification model and experiment using Chainer.

Let's first set up the essential hyperparameters. We will use `delira`'s `Parameters` class for this:

In [1]:
logger = None
import chainer
from delira.training import Parameters
params = Parameters(fixed_params={
    "model": {
        "in_channels": 1, 
        "num_classes": 10 # must match the keyword argument of the model's __init__
    },
    "training": {
        "batch_size": 64, # batchsize to use
        "num_epochs": 10, # number of epochs to train
        "optimizer_cls": chainer.optimizers.Adam, # optimization algorithm to use
        "optimizer_params": {'lr': 1e-3}, # initialization parameters for this algorithm
        "losses": {"L1": chainer.functions.mean_absolute_error}, # the loss function
        "lr_sched_cls": None,  # the learning rate scheduling algorithm to use
        "lr_sched_params": {}, # the corresponding initialization parameters
        "metrics": {} # and some evaluation metrics
    }
}) 

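Since we did not specify any metrics, only the loss registered under `losses` is calculated for each batch. For this simple classification task, monitoring the loss alone should be sufficient. Note that the parameters above register `chainer.functions.mean_absolute_error` under the key `"L1"`; for a classification task, a cross-entropy loss such as `chainer.functions.softmax_cross_entropy` (which expects integer class labels) is the more common choice. We will train our network with a batch size of 64, using `Adam` as the optimizer.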
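
As a rough illustration of what a cross-entropy loss computes for a single sample, the following sketch uses only the standard library. The function name `softmax_cross_entropy` and the example logits are chosen for illustration; this is not the implementation used by Chainer or `delira`.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    # subtract the maximum for numerical stability
    max_logit = max(logits)
    shifted = [z - max_logit for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[label]


# ten classes, no class preferred: the loss equals log(10)
uniform = softmax_cross_entropy([0.0] * 10, label=3)
# the correct class has a clearly larger logit: the loss is close to zero
confident = softmax_cross_entropy([0.0, 5.0, 0.0], label=1)
```

An untrained network that treats all ten classes as equally likely therefore starts at a loss of about `log(10)`, and the value drops as the logit of the correct class grows.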

## Logging and Visualization
To visualize our results, we need to log them during training. For logging we will use TensorBoard. By default, the logging directory is the same as our experiment directory.


## Data Preparation
### Loading
Next we will create some fake data. For this we use the `ClassificationFakeData` dataset, which is already implemented in `deliravision`. To avoid getting the exact same data from both datasets, we pass an offset for the random number generator (`rng_offset`) to the validation dataset.

In [None]:
from deliravision.data.fakedata import ClassificationFakeData
dataset_train = ClassificationFakeData(num_samples=10000, 
                                       img_size=(1, 32, 32), # single channel, matching in_channels=1
                                       num_classes=10)
dataset_val = ClassificationFakeData(num_samples=1000, 
                                     img_size=(1, 32, 32), # single channel, matching in_channels=1
                                     num_classes=10,
                                     rng_offset=10001
                                     )
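
The idea behind the offset can be sketched with the standard library. This assumes that each fake sample is generated from a random number generator seeded with the sample index plus the offset; the helper `fake_sample` is hypothetical and the actual `deliravision` implementation may work differently.

```python
import random


def fake_sample(index, num_classes=10, rng_offset=0):
    """Hypothetical fake sample: one pixel value and one label per index."""
    rng = random.Random(index + rng_offset)  # reproducible per index
    return rng.random(), rng.randrange(num_classes)


train_samples = [fake_sample(i) for i in range(5)]
train_again = [fake_sample(i) for i in range(5)]
val_samples = [fake_sample(i, rng_offset=10001) for i in range(5)]
```

Without the offset, both datasets would be seeded identically and the validation set would simply repeat the first training samples.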

### Augmentation
For data augmentation, we will apply a few transformations:

In [None]:
from batchgenerators.transforms import RandomCropTransform, \
                                        ContrastAugmentationTransform, Compose
from batchgenerators.transforms.spatial_transforms import ResizeTransform
from batchgenerators.transforms.sample_normalization_transforms import MeanStdNormalizationTransform

transforms = Compose([
    RandomCropTransform(24), # Perform random crops of size 24 x 24 pixels
    ResizeTransform(32), # Resample these crops back to 32 x 32 pixels
    ContrastAugmentationTransform(), # randomly adjust contrast
    MeanStdNormalizationTransform(mean=[0.5], std=[0.5])]) 
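
The last step of this pipeline shifts and scales the intensities. A minimal sketch of that arithmetic, assuming pixel values in the range [0, 1] (the fake data may use a different range) and using a hypothetical helper `mean_std_normalize`:

```python
def mean_std_normalize(values, mean=0.5, std=0.5):
    """Shift and scale intensities: (x - mean) / std."""
    return [(v - mean) / std for v in values]


pixels = [0.0, 0.25, 0.5, 1.0]
normalized = mean_std_normalize(pixels)
# with mean=0.5 and std=0.5 the range [0, 1] is mapped onto [-1, 1]
```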



With these transformations we can now wrap our datasets into data managers:

In [None]:
from delira.data_loading import BaseDataManager, SequentialSampler, RandomSampler

manager_train = BaseDataManager(dataset_train, params.nested_get("batch_size"),
                                transforms=transforms,
                                sampler_cls=RandomSampler,
                                n_process_augmentation=4)

manager_val = BaseDataManager(dataset_val, params.nested_get("batch_size"),
                              transforms=transforms,
                              sampler_cls=SequentialSampler,
                              n_process_augmentation=4)
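
To make the difference between the two samplers concrete, here is a small standard-library sketch of how sample indices could be grouped into batches. The helper names are hypothetical, and a shuffled permutation is only one way to realise random sampling; `delira`'s `RandomSampler` may draw indices differently and may treat the last, incomplete batch differently.

```python
import random


def sequential_indices(num_samples):
    """Visit the samples in their stored order."""
    return list(range(num_samples))


def random_indices(num_samples, seed=0):
    """Visit every sample once, in shuffled order."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return indices


def make_batches(indices, batch_size):
    """Split the index list into consecutive chunks of batch_size."""
    return [indices[i:i + batch_size]
            for i in range(0, len(indices), batch_size)]


val_batches = make_batches(sequential_indices(1000), batch_size=64)
train_batches = make_batches(random_indices(10000), batch_size=64)
```

Under these assumptions, the 1000 validation samples yield 16 batches (the last one holding 40 samples) and the 10000 training samples yield 157 batches.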


## Model

After we have done that, we can specify our model. We will use a smaller version of a [VGG-Network](https://arxiv.org/pdf/1409.1556.pdf) in this case. We let the convolutions and pooling layers reduce the feature dimensionality and use fewer units in the linear layer to save memory (we only have to deal with 10 classes, not the 1000 ImageNet classes).

In [3]:
from delira.models import AbstractChainerNetwork
import chainer
import numpy as np
from functools import partial


class SmallVGGChainer(AbstractChainerNetwork):
    def __init__(self, in_channels, num_classes):
        super().__init__()

        # register the layers inside init_scope so that their parameters
        # are tracked by the chain (and therefore by the optimizer)
        with self.init_scope():
            self.model = chainer.Sequential(
                chainer.links.Convolution2D(in_channels, 64, 3, pad=1), # 32 x 32
                chainer.functions.relu,
                partial(chainer.functions.max_pooling_2d, ksize=2), # 16 x 16
                chainer.links.Convolution2D(64, 128, 3, pad=1), # 16 x 16
                chainer.functions.relu,
                partial(chainer.functions.max_pooling_2d, ksize=2), # 8 x 8
                chainer.links.Convolution2D(128, 256, 3), # 6 x 6
                chainer.functions.relu,
                partial(chainer.functions.max_pooling_2d, ksize=2), # 3 x 3
                chainer.links.Convolution2D(256, 512, 3), # 1 x 1
                # Linear flattens all non-batch axes itself, so no explicit
                # flatten step is needed here
                chainer.links.Linear(1*1*512, num_classes)
            )

    def forward(self, x):
        return {"pred": self.model(x)}

    @staticmethod
    def prepare_batch(data_dict, input_device, output_device):
        # convert every entry to a float32 chainer variable
        new_batch = {k: chainer.as_variable(v.astype(np.float32))
                     for k, v in data_dict.items()}

        for k, v in new_batch.items():
            if k == "data":
                device = input_device
            else:
                device = output_device

            # makes modification inplace!
            v.to_device(device)

        return new_batch

    @staticmethod
    def closure(model, data_dict: dict, optimizers: dict, losses: dict,
                fold=0, **kwargs):

        loss_vals = {}
        total_loss = 0

        inputs = data_dict["data"]

        with chainer.using_config("train", True):
            preds = model(inputs)
            for key, crit_fn in losses.items():
                _loss_val = crit_fn(preds["pred"], data_dict["label"])
                loss_vals[key] = _loss_val.item()
                total_loss += _loss_val

        # reset gradients, backpropagate the summed loss, update parameters
        model.cleargrads()
        total_loss.backward()
        optimizers['default'].update()

        # unchain() detaches in place and returns None, so call it first
        # and return the prediction variables themselves
        for v in preds.values():
            v.unchain()

        return loss_vals, preds
    
    

So let's revisit what we have just done.

In `delira` all networks must be derived from `delira.models.AbstractNetwork`. For each backend there is a class derived from this class, handling some backend-specific function calls and registrations. For the `Chainer` Backend this class is `AbstractChainerNetwork` and all Chainer Networks should be derived from it.

First we defined the network itself (this is the part that simply chains the layers into a sequential model). Next, we defined the logic to apply when we want to predict from the model (this is the `forward` method).
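
To see why the final linear layer receives `1*1*512` features, the spatial size can be traced through the layers with a few lines of plain Python. This sketch assumes 32 x 32 input images (the output of `ResizeTransform(32)`), a padding of 1 for the first two convolutions, and non-overlapping 2 x 2 pooling on even input sizes; `conv_out` and `pool_out` are helper names introduced here.

```python
def conv_out(size, ksize, stride=1, pad=0):
    """Output size of a convolution along one spatial axis."""
    return (size + 2 * pad - ksize) // stride + 1


def pool_out(size, ksize=2):
    """Output size of non-overlapping pooling (assumes an even input size)."""
    return size // ksize


layers = [
    ("conv", {"ksize": 3, "pad": 1}),
    ("pool", {}),
    ("conv", {"ksize": 3, "pad": 1}),
    ("pool", {}),
    ("conv", {"ksize": 3}),
    ("pool", {}),
    ("conv", {"ksize": 3}),
]

size = 32
sizes = []
for kind, kwargs in layers:
    size = conv_out(size, **kwargs) if kind == "conv" else pool_out(size, **kwargs)
    sizes.append(size)

flat_features = 512 * size * size  # channels of the last convolution times area
```

The trace gives 32, 16, 16, 8, 6, 3 and finally 1, so the last convolution produces a 512 x 1 x 1 feature map per sample.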

So far this was plain `Chainer`. The `prepare_batch` function is not plain Chainer anymore, but allows us to ensure the data is in the correct shape, has the correct data-type and lies on the correct device. The function above is the standard `prepare_batch` function, which is also implemented in the `AbstractChainerNetwork` and just re-implemented here for the sake of completeness.

The same goes for the `closure` function. This function defines the update rule for our parameters (and how to calculate the losses). These functions are good to go for many simple networks but can be overwritten for customization when training more complex networks.
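
The pattern a closure follows (predict, sum up all losses, compute gradients, update the parameters) can be sketched without any framework. The toy below is not `delira`'s closure: it uses a hypothetical one-parameter model `pred = weight * x`, hand-written gradients and plain gradient descent instead of `Adam`.

```python
def toy_closure(weight, data_dict, losses, lr=0.1):
    """One update step for the toy model pred = weight * x."""
    x, y = data_dict["data"], data_dict["label"]
    pred = weight * x

    loss_vals, total_grad = {}, 0.0
    for key, (loss_fn, grad_fn) in losses.items():
        loss_vals[key] = loss_fn(pred, y)
        # chain rule: dloss/dweight = dloss/dpred * dpred/dweight
        total_grad += grad_fn(pred, y) * x

    new_weight = weight - lr * total_grad  # plain gradient-descent update
    return new_weight, loss_vals, {"pred": pred}


# a loss is given as a pair (loss function, derivative w.r.t. the prediction)
squared_error = (lambda p, y: (p - y) ** 2, lambda p, y: 2 * (p - y))

weight = 0.0
for _ in range(50):
    weight, loss_vals, preds = toy_closure(
        weight, {"data": 2.0, "label": 4.0}, {"L2": squared_error})
```

After a few steps `weight` approaches 2.0, the value for which the prediction matches the label. In a real closure the gradients come from `backward()` and the update from the optimizer.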


## Training
Now that we have defined our network, we can finally specify our experiment and run it.

In [None]:
import warnings
warnings.simplefilter("ignore", UserWarning) # ignore UserWarnings raised by dependency code
warnings.simplefilter("ignore", FutureWarning) # ignore FutureWarnings raised by dependency code


from delira.training import ChainerExperiment

if logger is not None:
    logger.info("Init Experiment")
experiment = ChainerExperiment(params, SmallVGGChainer,
                               name="ClassificationExample",
                               save_path="./tmp/delira_Experiments",
                               key_mapping={"x": "data"},
                               gpu_ids=[0])
experiment.save()

model = experiment.run(manager_train, manager_val)

Congratulations, you have now trained your first classification model using `delira`. We will now predict a few samples from the validation set to show that the network's predictions are valid. For now this is done manually, but there is also a `Predictor` class to automate tasks like this:

In [None]:
import numpy as np
import chainer
from tqdm.auto import tqdm # utility for progress bars

device = "@numpy"
model.to_device(device) # push model to device (here: CPU / NumPy)
preds, labels = [], []

# disable training behaviour and graph construction during inference
with chainer.using_config("train", False), chainer.no_backprop_mode():
    for i in tqdm(range(len(dataset_val))):
        sample = dataset_val[i] # fetch the sample once
        img = sample["data"] # get image from current sample
        img_batch = img[None].astype(np.float32) # add batch dimension and cast to float32
        pred_var = model(img_batch)["pred"] # feed it through the network
        pred = int(pred_var.array.argmax(1)[0]) # get index with maximum class confidence
        label = int(np.asarray(sample["label"]).item()) # get label as Python int
        if i % 100 == 0:
            print("Prediction: %d \t label: %d" % (pred, label)) # print every 100th result
        preds.append(pred)
        labels.append(label)

# calculate accuracy
accuracy = (np.asarray(preds) == np.asarray(labels)).sum() / len(preds)
print("Accuracy: %.3f" % accuracy)
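
The overall accuracy can hide classes that are never predicted correctly. As a possible follow-up, the sketch below breaks the accuracy down per class using only the standard library. The function names and the short example lists are made up for illustration and are not results of the model above.

```python
from collections import Counter


def overall_accuracy(preds, labels):
    """Fraction of samples whose prediction equals the label."""
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct / len(labels)


def per_class_accuracy(preds, labels):
    """Fraction of correctly predicted samples for every label value."""
    totals = Counter(labels)
    hits = Counter(l for p, l in zip(preds, labels) if p == l)
    return {cls: hits[cls] / totals[cls] for cls in sorted(totals)}


example_preds = [0, 1, 1, 2, 2, 2]
example_labels = [0, 1, 2, 2, 2, 1]
overall = overall_accuracy(example_preds, example_labels)
by_class = per_class_accuracy(example_preds, example_labels)
```

The same two functions can be called with the `preds` and `labels` lists collected in the loop above.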