# Classification with Delira and TensorFlow Graph Execution - A very short introduction
*Author: Justus Schock* 

*Date: 31.07.2019*

This example shows how to set up a basic classification model and experiment using TensorFlow's Graph Execution mode.

Let's first set up the essential hyperparameters. We will use `delira`'s `Parameters` class for this:

In [1]:
logger = None
import tensorflow as tf
tf.disable_eager_execution()
from delira.training import Parameters
params = Parameters(fixed_params={
    "model": {
        "in_channels": 3, # the fake images created below have 3 channels
        "num_classes": 10 # must match the argument name of the model's __init__
    },
    "training": {
        "batch_size": 64, # batchsize to use
        "num_epochs": 10, # number of epochs to train
        "optimizer_cls": tf.train.AdamOptimizer, # optimization algorithm to use
        "optimizer_params": {'learning_rate': 1e-3}, # initialization parameters for this algorithm
        "losses": {"L1": tf.losses.absolute_difference}, # the loss function
        "lr_sched_cls": None,  # the learning rate scheduling algorithm to use
        "lr_sched_params": {}, # the corresponding initialization parameters
        "metrics": {} # and some evaluation metrics
    }
}) 


Since we did not specify any metrics, only the `L1` loss will be calculated for each batch. Since this is just a toy example, this is sufficient. We will train our network with a batch size of 64 and use `Adam` as the optimizer.
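
To make the loss more concrete, here is a minimal NumPy sketch of what an L1 loss computes for one batch. It assumes that `tf.losses.absolute_difference` with default weights and reduction averages the absolute differences over all elements. The helper `l1_loss` is hypothetical and only serves as an illustration.

```python
import numpy as np


def l1_loss(labels, predictions):
    """Mean absolute difference between predictions and labels (illustrative helper)."""
    labels = np.asarray(labels, dtype=np.float32)
    predictions = np.asarray(predictions, dtype=np.float32)
    return float(np.mean(np.abs(predictions - labels)))


# a toy batch with three samples and one target value each
labels = np.array([[1.0], [0.0], [2.0]])
predictions = np.array([[0.5], [0.0], [3.0]])

# absolute errors are 0.5, 0.0 and 1.0, so the mean is 0.5
batch_loss = l1_loss(labels, predictions)
print(batch_loss)
```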

## Logging and Visualization
To visualize our results, we need to monitor them during training. For logging we will use `TensorBoard`. By default, the logging directory is the same as our experiment directory.


## Data Preparation
### Loading
Next we will create some fake data. For this we use the `ClassificationFakeData` dataset, which is already implemented in `deliravision`. To avoid getting the exact same data from both datasets, we pass an offset for the random number generator (`rng_offset`) to the validation dataset.

In [2]:
from deliravision.data.fakedata import ClassificationFakeData
dataset_train = ClassificationFakeData(num_samples=10000, 
                                       img_size=(3, 32, 32), 
                                       num_classes=10)
dataset_val = ClassificationFakeData(num_samples=1000, 
                                     img_size=(3, 32, 32), 
                                     num_classes=10,
                                     rng_offset=10001
                                     )


### Augmentation
For data augmentation we will apply a few transformations:

In [None]:
from batchgenerators.transforms import RandomCropTransform, \
                                        ContrastAugmentationTransform, Compose
from batchgenerators.transforms.spatial_transforms import ResizeTransform
from batchgenerators.transforms.sample_normalization_transforms import MeanStdNormalizationTransform

transforms = Compose([
    RandomCropTransform(24), # Perform Random Crops of Size 24 x 24 pixels
    ResizeTransform(32), # Resample these crops back to 32 x 32 pixels
    ContrastAugmentationTransform(), # randomly adjust contrast
    MeanStdNormalizationTransform(mean=[0.5], std=[0.5])]) 
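
To illustrate what two of these transformations do to a batch, here is a simplified NumPy sketch of random cropping and mean/std normalization. It is not the `batchgenerators` implementation: the helpers `random_crop` and `normalize` are hypothetical, the same crop window is used for the whole batch, and the resizing and contrast steps are omitted.

```python
import numpy as np


def random_crop(batch, crop_size, rng):
    """Cut one random crop_size x crop_size window out of a (B, C, H, W) batch."""
    _, _, height, width = batch.shape
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    return batch[:, :, top:top + crop_size, left:left + crop_size]


def normalize(batch, mean, std):
    """Shift and scale the intensities: (x - mean) / std."""
    return (batch - mean) / std


rng = np.random.default_rng(0)
batch = rng.random((4, 3, 32, 32)).astype(np.float32)  # values in [0, 1]

cropped = random_crop(batch, 24, rng)      # shape (4, 3, 24, 24)
normalized = normalize(cropped, 0.5, 0.5)  # values now lie in [-1, 1]
print(cropped.shape, normalized.min(), normalized.max())
```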



With these transformations we can now wrap our datasets into datamanagers:

In [None]:
from delira.data_loading import BaseDataManager, SequentialSampler, RandomSampler

manager_train = BaseDataManager(dataset_train, params.nested_get("batch_size"),
                                transforms=transforms,
                                sampler_cls=RandomSampler,
                                n_process_augmentation=4)

manager_val = BaseDataManager(dataset_val, params.nested_get("batch_size"),
                              transforms=transforms,
                              sampler_cls=SequentialSampler,
                              n_process_augmentation=4)
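
The two sampler classes differ in the order in which sample indices are drawn. The following sketch shows the idea with plain Python lists. It assumes shuffling without replacement for the random case, which may differ from what `delira`'s `RandomSampler` actually does, and the helper names are hypothetical.

```python
import random


def sequential_batches(num_samples, batch_size):
    """Split the indices 0..num_samples-1 into consecutive batches."""
    indices = list(range(num_samples))
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


def random_batches(num_samples, batch_size, seed=0):
    """Shuffle the indices first, then split them into batches."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


val_batches = sequential_batches(1000, 64)   # validation: fixed order
train_batches = random_batches(10000, 64)    # training: shuffled order

# 1000 samples give 15 full batches plus one batch with the remaining 40 samples
print(len(val_batches), len(val_batches[-1]))
# 10000 samples give 156 full batches plus one batch with the remaining 16 samples
print(len(train_batches), len(train_batches[-1]))
```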


## Model

After we have done that, we can specify our model. We will use a smaller version of a [VGG-Network](https://arxiv.org/pdf/1409.1556.pdf) in this case. We use convolutions followed by pooling to reduce the feature dimensionality, and we reduce the number of units in the linear layers to save memory (we only have to deal with 10 classes, not the 1000 ImageNet classes).

In [3]:
from delira.models import AbstractTfGraphNetwork
import tensorflow as tf
import numpy as np

class SmallVGGTfGraph(AbstractTfGraphNetwork):
    def __init__(self, in_channels, num_classes, data_format="channels_last"):
        if data_format == "channels_last":
            input_shape = (32, 32, 3)
        else:
            input_shape = (3, 32, 32)
        super().__init__()
        
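        # Keras infers the number of input channels from `input_shape`,
        # so `in_channels` is not passed to the first convolution
        self.model = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(64, 3, padding="same", data_format=data_format, input_shape=input_shape), # 32 x 32
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(2, data_format=data_format), # 16 x 16
            tf.keras.layers.Conv2D(128, 3, padding="same", data_format=data_format),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(2, data_format=data_format), # 8 x 8
            tf.keras.layers.Conv2D(256, 3, padding="same", data_format=data_format),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(2, data_format=data_format), # 4 x 4
            tf.keras.layers.Conv2D(512, 3, padding="same", data_format=data_format),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(2, data_format=data_format), # 2 x 2
            tf.keras.layers.Conv2D(512, 3, padding="same", data_format=data_format),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(2, data_format=data_format), # 1 x 1
            tf.keras.layers.Flatten(data_format=data_format),
            tf.keras.layers.Dense(num_classes),
        ])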
        
        # create computation graph
        data = tf.placeholder(shape=[None, *input_shape], dtype=tf.float32)
        labels = tf.placeholder_with_default(
                tf.zeros([tf.shape(data)[0], 1]), shape=[None, 1])

        preds_train = self.model(data)
        preds_eval = self.model(data)

        self.inputs["data"] = data
        self.inputs["label"] = labels
        self.outputs_train["pred"] = preds_train
        self.outputs_eval["pred"] = preds_eval
        
    @staticmethod
    def prepare_batch(data_dict, input_device, output_device):
        with tf.device(input_device):
            return_dict = {"data": tf.convert_to_tensor(
                data_dict["data"].astype(np.float32))}
        
        with tf.device(output_device):
            for key, vals in data_dict.items():
                if key == "data": 
                    continue
                return_dict[key] = tf.convert_to_tensor(
                    vals.astype(np.float32))

        return return_dict
    
    @staticmethod
    def closure(model, data_dict: dict, optimizers: dict, losses: dict,
                fold=0, **kwargs):

        inputs = data_dict['data']
        outputs = model.run(data=inputs, label=data_dict['label'])
        preds = outputs['pred']
        loss_vals = outputs['losses']
        
        return loss_vals, preds
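
The spatial sizes noted in the layer comments of the model can be checked with a few lines of arithmetic. This sketch assumes that every convolution preserves the spatial size (`"same"` padding) and that every pooling layer halves it (2 x 2 pooling with stride 2). The helper `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(input_size, num_stages):
    """Spatial size after each conv + pool stage, starting with the input size."""
    sizes = [input_size]
    for _ in range(num_stages):
        # the convolution keeps the size, the pooling layer halves it
        sizes.append(sizes[-1] // 2)
    return sizes


sizes = feature_map_sizes(32, 5)
print(sizes)  # [32, 16, 8, 4, 2, 1]

# the last stage has 512 channels at 1 x 1 resolution,
# so the flattened feature vector fed to the dense layer has 512 entries
flattened = 512 * sizes[-1] ** 2
print(flattened)  # 512
```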
    
    

So let's revisit what we have just done.

In `delira` all networks must be derived from `delira.models.AbstractNetwork`. For each backend there is a class derived from this class, handling some backend-specific function calls and registrations. For the `TensorFlow Graph` backend this class is `AbstractTfGraphNetwork`, and all TensorFlow Graph Execution networks should be derived from it.

First we defined the network itself (this is the part simply concatenating the layers into a sequential model). Next, we defined the logic to apply when we want to predict from the model (this is the part that creates the placeholders and builds the computation graph for training and evaluation).

So far this was plain `TensorFlow`. The `prepare_batch` function is not plain TF anymore, but allows us to ensure the data is in the correct shape, has the correct data-type and lies on the correct device. The function above is the standard `prepare_batch` function, which is also implemented in the `AbstractTfGraphNetwork` and just re-implemented here for the sake of completeness.

The same goes for the `closure` function. This function defines the update rule for our parameters (and how to calculate the losses). These functions are good to go for many simple networks but can be overridden for customization when training more complex networks.
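
The interplay of `prepare_batch` and `closure` can be sketched without TensorFlow. The example below is a toy under stated assumptions, not `delira`'s training loop: `ToyModel` is a hypothetical stand-in whose `run` method returns a dict with the keys `"pred"` and `"losses"`, mirroring how `closure` above reads the outputs, and both functions are simplified to plain NumPy.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a graph network: `run` returns a dict of outputs."""

    def __init__(self):
        self.weight = np.float32(0.0)

    def run(self, data, label):
        pred = self.weight * data
        loss = float(np.mean(np.abs(pred - label)))
        return {"pred": pred, "losses": {"L1": loss}}


def prepare_batch(batch):
    """Cast every entry of the batch to float32 (simplified, device handling omitted)."""
    return {key: np.asarray(vals, dtype=np.float32) for key, vals in batch.items()}


def closure(model, data_dict):
    """Run the model on one prepared batch and return losses and predictions."""
    outputs = model.run(data=data_dict["data"], label=data_dict["label"])
    return outputs["losses"], outputs["pred"]


raw_batch = {"data": [[1, 2], [3, 4]], "label": [[1, 2], [3, 4]]}
loss_vals, preds = closure(ToyModel(), prepare_batch(raw_batch))

# the toy model predicts zeros, so the L1 loss is the mean of the labels
print(loss_vals, preds.shape, preds.dtype)
```

A trainer would call these two functions once per batch: first `prepare_batch` to bring the raw batch into the expected type, then `closure` to obtain the loss values and predictions.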


## Training
Now that we have defined our network, we can finally specify our experiment and run it.

In [None]:
import warnings
warnings.simplefilter("ignore", UserWarning) # ignore UserWarnings raised by dependency code
warnings.simplefilter("ignore", FutureWarning) # ignore FutureWarnings raised by dependency code


from delira.training import TfGraphExperiment

if logger is not None:
    logger.info("Init Experiment")
experiment = TfGraphExperiment(params, SmallVGGTfGraph,
                               name="ClassificationExample",
                               save_path="./tmp/delira_Experiments",
                               key_mapping={"x": "data"},
                               gpu_ids=[0])
experiment.save()

model = experiment.run(manager_train, manager_val)

Congratulations, you have now trained your first classification model using `delira`. We will now predict a few samples from the validation set to show that the network's predictions are valid. For now this is done manually, but there is also a `Predictor` class to automate this:

In [None]:
import numpy as np
from tqdm.auto import tqdm # utility for progress bars
import tensorflow as tf

device = "/cpu:0"
preds, labels = [], []

with tf.device(device):
    for i in tqdm(range(len(dataset_val))):
        img = dataset_val[i]["data"] # get image from current sample
        img_batch = img[None, ...].astype(np.float32) # add batch dimension and cast to float32
        outputs = model.run(data=img_batch) # feed it through the network (same call as in `closure`, assumed to return a dict)
        pred = outputs["pred"].argmax(1).item() # get index with maximum class confidence
        label = np.asarray(dataset_val[i]["label"]).item() # get label from current sample
        if i % 1000 == 0:
            print("Prediction: %d \t label: %d" % (pred, label)) # print result
        preds.append(pred)
        labels.append(label)

# calculate accuracy
accuracy = (np.asarray(preds) == np.asarray(labels)).sum() / len(preds)
print("Accuracy: %.3f" % accuracy)