# Overview

The neural network development and training pipeline consists of a series of high-level templated classes that provide the key functionality needed to define and train a deep learning algorithm. To create a new algorithm using this pipeline, one simply needs to overload the appropriate classes with project-specific values. Varying levels of customization are possible: overload as many class methods and/or variables as needed, and leave the rest unchanged with their default settings.

This tutorial and its accompanying series of unit tests give an overview of the neural network development and training pipeline, including:

1. Overloading `cnn.utils.Client()` to define data loading
2. Overloading `cnn.Model()` to define network architecture
3. Instantiating a new `cnn.Network()` object and network training
4. Loading network statistics
5. Running inference

The `cnn` package, which implements all functionality described here, can be imported with a single statement:

In [None]:
import cnn

# Overloading cnn.utils.Client()

The `cnn.utils.Client()` class provides an object for easy interaction with imported data. While it may be used in isolation, defining several key method overloads will prepare this object with the necessary modifications to be used for CNN training. At minimum, the two key methods that need to be overloaded are:

* `self.init_client()`
* `self.get()`

An example overloaded class definition is provided below. Continue reading for more information about considerations for overloading the `client` object.

In [None]:
import numpy as np

class Client(cnn.utils.Client):
    
    def init_client(self):
        """
        Method to set default experiment specific variables
        
        """
        self.infos = {
            'shape': [1, 28, 28],
            'tiles': [0, 0, 0]
        }
        
        self.inputs = {
            'dtypes': {},
            'shapes': {},
            'classes': {}
        }
        
        self.inputs['dtypes']['dat'] = 'float32'
        self.inputs['shapes']['dat'] = [1, 28, 28, 1]
        
        self.inputs['dtypes']['sce-digit'] = 'int32'
        self.inputs['shapes']['sce-digit'] = [1, 1, 1, 1]
        self.inputs['classes']['sce-digit'] = 10
        
    def get(self, mode='mixed'):
        """
        Method to load a single train/valid study
        
        """
        # Load data
        next_doc = self.next_doc(mode=mode, random=True)
        vol = self.load(doc=next_doc['doc'], infos=next_doc['infos'])
        
        # Preprocessing
        vol['dat'] = (vol['dat'] - np.mean(vol['dat'])) / np.std(vol['dat'])
        
        # Manually add labels from MongoDB document
        studyid = next_doc['doc']['study']['studyid']
        vol['lbl'] = int(studyid.split('-')[0]) + 1
        vol['lbl'] = np.array(vol['lbl']).reshape(1, 1, 1, 1)
        
        # Create nested vol dictionary
        vol['lbl'] = {'sce-digit': vol['lbl']}
        
        return self.return_get(next_doc, vol)

## Overloading the self.init_client() method

The `init_client()` method is called upon initialization of a new `Client()` object to set default experiment-specific values for CNN training. The two main dictionaries to define in this method are `self.infos` and `self.inputs`.

### Setting self.infos dictionary

The `infos` dictionary defines how underlying data will be loaded from memory. Please see [02_Database_Access_and_Manipulation](02_Database_Access_and_Manipulation.ipynb#Infos-dictionary) for more information.

### Setting self.inputs dictionary

The `self.inputs` dictionary is used to define key information about inputs into the model, including both input image data and labels. For input images, the corresponding `dtypes` and `shapes` (input shape) must be defined. By convention, the dictionary key used to define input data is set to `'dat'`.
```
self.inputs['dtypes']['dat'] = 'float32'
self.inputs['shapes']['dat'] = [1, 28, 28, 1]
```
For input labels, the corresponding `dtypes`, `shapes` and `classes` (total number of classes for classification tasks; set to 0 for regression tasks) must be defined. Note that all labels are assumed to be 4D masks with single value labels represented by a matrix of shape (1, 1, 1, 1). 
```
self.inputs['dtypes']['sce-digit'] = 'int32'
self.inputs['shapes']['sce-digit'] = [1, 1, 1, 1]
self.inputs['classes']['sce-digit'] = 10
```
The dictionary key used to define a label must be chosen carefully: the library uses this specification to automatically identify logit scores and apply the appropriate loss function without any other user input. To accomplish this, the library assumes that keys follow a naming convention of two parts separated by a hyphen (`xxx-xxxx`). The three letters before the hyphen indicate the type of loss function to apply to this label. The available loss functions include:

* `sce`: sigmoid cross-entropy
* `dsc`: soft Dice score
* `l1d`: L1 distance
* `l2d`: L2 distance
* `sl1`: smooth L1 loss (Huber)

The second half of the key, after the hyphen, can be any descriptive label as long as the keys are used consistently. Note that the keys chosen here must match the keys used in the `self.get()` method below, and potentially in the `cnn.Model()` class below if customizations are used.
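
The naming convention can be illustrated with a small standalone sketch. The helper `parse_label_key` and the `LOSS_PREFIXES` table are hypothetical names used only for illustration; they are not part of the `cnn` library, and the library's own parsing may differ.

```python
# Loss prefixes and descriptions as listed in this tutorial
LOSS_PREFIXES = {
    'sce': 'sigmoid cross-entropy',
    'dsc': 'soft Dice score',
    'l1d': 'L1 distance',
    'l2d': 'L2 distance',
    'sl1': 'smooth L1 loss (Huber)',
}

def parse_label_key(key):
    """Hypothetical helper: split a label key into (loss prefix, descriptive name)."""
    prefix, sep, name = key.partition('-')
    if not sep or not name:
        raise ValueError("label key must look like 'xxx-name', got %r" % key)
    if prefix not in LOSS_PREFIXES:
        raise ValueError("unknown loss prefix %r in key %r" % (prefix, key))
    return prefix, name

prefix, name = parse_label_key('sce-digit')
print(prefix, name, '->', LOSS_PREFIXES[prefix])
```

Validating keys this way before training catches a misspelled prefix early, rather than at graph construction time.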

## Overloading the self.get() method

The `self.get()` method is called during each training iteration to get data prepared for feeding into a CNN. As documented in [02_Database_Access_and_Manipulation](02_Database_Access_and_Manipulation.ipynb#Loading-data), the easiest way to accomplish this is to use the `self.next_doc()` method to pick a random MongoDB document matching the prespecified criteria in `app_context`, and then feed the document into `self.load()`. Given that `self.load()` will only load 2D or 3D mask volume files into the default `vol` dictionary, any conventional 1D classification label, if present, should be extracted manually from the corresponding MongoDB document (`next_doc['doc']`).

Within the `self.get()` method, any number of preprocessing steps may also be included. Note that at this point the tensors remain NumPy (not TensorFlow) arrays, which makes many preprocessing pipelines easy to implement.
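
As one example of a NumPy preprocessing step, the sketch below performs the same zero-mean, unit-variance scaling used in the example `get()` method, with an added guard for constant images. The helper name `normalize` and the `eps` threshold are assumptions for illustration only.

```python
import numpy as np

def normalize(dat, eps=1e-6):
    """Hypothetical helper: zero-mean, unit-variance scaling of one volume."""
    dat = dat.astype('float32')
    std = dat.std()
    if std < eps:
        # Constant image: subtract the mean only, avoiding division by zero
        return dat - dat.mean()
    return (dat - dat.mean()) / std

rng = np.random.default_rng(0)
dat = rng.integers(0, 256, size=(1, 28, 28, 1))
out = normalize(dat)
print(out.dtype, out.shape)
```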

There are two data structures that must be returned at the end of this call and passed into the `self.return_get()` method. The first is the `next_doc` dictionary containing the MongoDB document as well as some related metadata. This is generated automatically as part of the call to `self.next_doc()`. The second is a nested dictionary, `vol`:

```
vol = {
    'dat': dat,    # single NumPy array
    'msk': msk,    # dictionary of masks, one per label key
    'lbl': lbl     # dictionary of labels, one per label key
}
```
Note that while the `dat` entry is assumed to contain one input volume, the `lbl` (and `msk`) entries can contain more than one label (and mask). For this reason, `dat` is a single NumPy array, whereas `msk` and `lbl` are dictionaries whose entries are indexed by the same label keys defined above in `init_client()`.

Here, `vol['msk']` references a dictionary containing one or more special masks, each equal in shape to the corresponding label. At all locations where the mask is 0, the loss is masked and does not contribute to backpropagation. By default, the mask is set to 1 (True) for all pixels (or voxels).
```
msk = {
    'lbl-key00': ...,
    'lbl-key01': ...,    # etc
}
```
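The effect of a mask on the loss can be pictured with a small NumPy sketch. It shows one common way to apply a mask (multiply the per-element loss by the mask and average over the unmasked locations); the library's exact reduction is not documented here, and `masked_mean` is a hypothetical helper.

```python
import numpy as np

def masked_mean(loss, msk):
    """Average a per-element loss over the locations where msk is nonzero."""
    msk = msk.astype('float32')
    total = msk.sum()
    if total == 0:
        # Nothing is unmasked, so nothing contributes to the loss
        return 0.0
    return float((loss * msk).sum() / total)

loss = np.array([1.0, 2.0, 3.0, 4.0]).reshape(1, 2, 2, 1)
msk = np.array([1, 1, 0, 0]).reshape(1, 2, 2, 1)

print(masked_mean(loss, msk))                # 1.5 (only the first two locations count)
print(masked_mean(loss, np.ones_like(msk)))  # 2.5 (default mask of ones)
```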
Here, `vol['lbl']` references a dictionary containing the label. Note that by convention, any label with a value of 0 is ignored (reserved for missing data); thus the first class in your label output should be labeled 1, the second class 2, etc.
```
lbl = {
    'lbl-key00': ...,
    'lbl-key01': ...,    # etc
}
```
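Putting these pieces together, the sketch below assembles a complete `vol` dictionary for a single MNIST-style example. The helper `make_vol` is hypothetical; the shapes, the `'sce-digit'` key and the shift of class values by 1 follow the conventions described above.

```python
import numpy as np

LABEL_KEY = 'sce-digit'  # must match the key used in self.inputs

def make_vol(image, digit):
    """Hypothetical helper: pack one 28 x 28 image and its digit (0-9) into a vol dictionary."""
    dat = np.asarray(image, dtype='float32').reshape(1, 28, 28, 1)
    # Shift classes to 1..10 because a label value of 0 is reserved for missing data
    lbl = np.array(digit + 1, dtype='int32').reshape(1, 1, 1, 1)
    # Mask of ones: every label location contributes to the loss
    msk = np.ones_like(lbl)
    return {'dat': dat, 'msk': {LABEL_KEY: msk}, 'lbl': {LABEL_KEY: lbl}}

vol = make_vol(np.zeros((28, 28)), digit=7)
print(vol['dat'].shape, vol['lbl'][LABEL_KEY].shape, vol['lbl'][LABEL_KEY].ravel())
```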

# Overloading cnn.Model()

The `cnn.Model()` class provides an object for easy creation of neural network model architectures. Defining several key method overloads will prepare this object with the necessary modifications to be used for CNN training. At minimum, the two key methods that need to be overloaded are:

* `self.init_hyperparams_custom()`
* custom network definition function 

An example overloaded class definition is provided below. Continue reading for more information about considerations for overloading the `model` object.

In [None]:
class Model(cnn.ClassicModel):
    
    def init_hyperparams_custom(self):
        
        self.params['save_dir'] = 'exps/exp01' 
        self.params['batch_size'] = 128
        self.params['iterations'] = 200
        
        self.params['lnet_fn'] = self.create_lnet
        
        self.params['train_ratio'] = {'lnet': 1}
        self.params['learning_rate'] = {'lnet': 1e-3}
        
        self.params['stats_top_model_source'] = {
            'name': 'lnet',
            'node': 'errors',
            'target': 'sce-digit',
            'key': 'topk'
        }
    
    def create_lnet(self, inputs):
        
        nn_struct = {}
        
        nn_struct['channels_out'] = [
            16, 16,
            32, 32]
        
        nn_struct['filter_size'] = [
            [1, 7, 7], [1, 2, 2],
            [1, 7, 7], [1, 2, 2]]
        
        nn_struct['stride'] = [
            1, [1, 2, 2],
            1, [1, 2, 2]]
        
        self.builder.create(
            name='L',
            nn_struct=nn_struct,
            input_layer=inputs[0])
        
        self.builder.graph.output['layer'] = self.flatten(self.builder.graph.output['layer'])
        self.create_logits(name='L')

## Overloading the self.init_hyperparams_custom() method

To define model hyperparameters, overload the `init_hyperparams_custom()` method. Some of the common hyperparameters (with default values) are shown here:

```
self.params = {
    'save_dir': None,               # directory for saving model and metadata
    'iterations': 1e6,              # number of training iterations
    'batch_size': 1,                # batch size
    'learning_rate': None,          # learning rate; no default value (must be set)
    'train_ratio': None,            # ratio at which to train different subnetworks; no default value 
    'optimizer': None,              # optimizer type; by default Adam will be used
    'adam_b1': 0.5,                 # b1 for Adam optimizer
    'adam_b2': 0.999,               # b2 for Adam optimizer
    'l2_beta': 0,                   # lambda constant for L2 regularization
}
```

One key concept to note here is the design choice of *subnetworks*. As needed, defining multiple individual subnetworks using specific naming conventions (e.g. `lnet`, `enet`, `gnet`, etc) allows the `cnn` library to orchestrate larger, more complex architectures automatically, and to coordinate the training of each component at specific ratios (specified in the `train_ratio` dictionary entry) and with individual learning rates.

However, the vast majority of standard single-pass feed-forward architectures (classification, U-net, etc) are implemented as a single *subnetwork*. For standard classification algorithms (VGG, ResNet, Inception, etc) use `lnet`, and for fully convolutional expanding-contracting architectures (U-net, etc) use `enet`. The training ratio for these simple single-subnetwork architectures is just `{'lnet': 1}` or `{'enet': 1}`, indicating that no special ratio is needed. The corresponding `{'lnet_fn': _}` or `{'enet_fn': _}` entry indicates the model architecture function, defined below in the same class, to be used; this allows a number of different architecture permutations to be defined in a single template.
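
To give a feel for what a training ratio expresses, the sketch below turns a `train_ratio` dictionary into a repeating order of subnetwork updates. This is only an illustration of the idea under an assumed round-robin interpretation; `training_schedule` is a hypothetical helper, and the library's actual scheduling may differ.

```python
from itertools import cycle, islice

def training_schedule(train_ratio):
    """Hypothetical helper: yield subnetwork names in proportion to train_ratio."""
    order = [name for name, count in train_ratio.items() for _ in range(count)]
    return cycle(order)

# Single subnetwork: every iteration trains lnet
print(list(islice(training_schedule({'lnet': 1}), 3)))

# Two subnetworks at an assumed 1:2 ratio
print(list(islice(training_schedule({'gnet': 1, 'enet': 2}), 6)))
```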

## Defining the model architecture

Models are created using the built-in `self.builder` object. To use `self.builder.create()`, three parameters must be defined: `name`, `nn_struct` and `input_layer`. The most important is the `nn_struct` dictionary, which defines the structure of the network. This structure is composed predominantly of a series of lists, with each entry in a list corresponding to a single layer in the neural network. For example, the first three layers of a CNN may be defined as follows:
```
nn_struct = {
    'channels_out': [16, 32, 64],
    'filter_size': [[1, 3, 3], [1, 3, 3], [1, 3, 3]],
    'stride': [1, 1, 1]
}
```

In this particular specification, we defined a total of 3 layers, each consisting of 1x3x3 (essentially 2D) convolutional filters with a stride of 1, producing 16, 32 and 64 output feature maps respectively. Note that the input channel sizes are calculated automatically. By default, each of these convolutions is also followed by a batch normalization operation and a ReLU nonlinearity unless otherwise specified. Some of the most common layer specifications are shown here in the order of implementation within a single layer block:
```
nn_struct = {
    'add_preblock': [...],        # name of layer to add before conv (residual connection); default is None
    'filter_size': [...],         # filter sizes (specify 3D filters of size [Z, H, W]); no default
    'resize': [...],              # perform nearest neighbor resize (specify feature map of size [Z, H, W]); default is None
    'batch_norm': [...],          # True to include; default is True
    'add_postblock': [...],       # name of layer to add after conv (residual connection); default is None
    'relu': [...],                # use 1 for ReLU, values <1 for leaky ReLU; default is 1
    'dropout': [...],             # [0, 1] for rate_to_keep; default is None
}
```
Note that for each of these options, a value of `None` will ignore this specific layer component.
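
To make the `nn_struct` lists concrete, the sketch below traces the `create_lnet()` example layer by layer, showing how each layer's input channels follow from the previous layer's output channels and how the strides reduce the feature map size. It assumes 'same' padding (so filter size does not change the spatial shape), which this tutorial does not confirm; `trace_layers` is a hypothetical helper, not part of the library.

```python
import math

def trace_layers(input_shape, channels_in, nn_struct):
    """Hypothetical helper: list (channels_in, channels_out, [Z, H, W]) per layer.

    Assumes 'same' padding, so each axis shrinks only by its stride.
    """
    shape, rows = list(input_shape), []
    for c_out, stride in zip(nn_struct['channels_out'], nn_struct['stride']):
        # A scalar stride applies to all three axes
        strides = [stride] * 3 if isinstance(stride, int) else stride
        shape = [math.ceil(s / st) for s, st in zip(shape, strides)]
        rows.append((channels_in, c_out, shape))
        channels_in = c_out  # the next layer's input channels follow automatically
    return rows

nn_struct = {
    'channels_out': [16, 16, 32, 32],
    'filter_size': [[1, 7, 7], [1, 2, 2], [1, 7, 7], [1, 2, 2]],
    'stride': [1, [1, 2, 2], 1, [1, 2, 2]],
}

for row in trace_layers([1, 28, 28], 1, nn_struct):
    print(row)
```

Under this assumption the 28 x 28 input is reduced to 7 x 7 with 32 channels before flattening.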

# Training a Network

To train a network, we use the `cnn.Network` class. While a number of custom modifications may be applied, the default `Network` class will often suffice for common CNN implementations. After initializing a new `Network` object, simply attach your custom class definitions to it.

In [None]:
# Define app_context
app_context = cnn.db.init_app_context({'name': 'mnist_test'})
app_context['tags-series'] = ['mnist-all']

# Initialize network
net = cnn.Network()
net.Client = Client
net.Model = Model

Next, we need to initialize (build) the network. In this same step, the library will also inspect your entire database for the requested input data and labels and prepare stratified sampling strategies as needed. The `initialize()` call takes two required arguments: `app_context` (as defined above) and `fold`, the fold to hold out for validation (usually start with 0 and then cycle through the other folds iteratively).

In [None]:
net.initialize(
    app_context=app_context,
    fold=0)
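
The idea of cycling the validation fold can be pictured with a plain-Python sketch. Here studies are assigned to folds round-robin purely for illustration; how the library or database actually assigns folds is not described in this tutorial, and `split_folds` and the study identifiers are hypothetical.

```python
def split_folds(study_ids, n_folds, valid_fold):
    """Hypothetical helper: assign studies to folds round-robin, then split train/valid."""
    train, valid = [], []
    for i, sid in enumerate(sorted(study_ids)):
        (valid if i % n_folds == valid_fold else train).append(sid)
    return train, valid

study_ids = ['s%02d' % i for i in range(10)]  # made-up identifiers

# Each fold takes one turn as the validation set
for fold in range(5):
    train, valid = split_folds(study_ids, n_folds=5, valid_fold=fold)
    print(fold, valid)
```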

Now that your graph has been built and compiled by the library, you are ready to train! Doing so requires a simple parameterless call to `train()`.

In [None]:
net.train()

In the default training paradigm implemented by the `cnn` library, training and validation sets are evaluated simultaneously for real-time monitoring of training dynamics at any given time point. All individual components of the loss function are reported, as are all defined error metrics (e.g. top-K for classification, Dice score for segmentation, etc).
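
For reference, a top-K error metric can be sketched in NumPy as follows. This is a generic definition (fraction of samples whose true class is not among the K highest scores) using 0-indexed class labels; it is not taken from the library, and `topk_error` is a hypothetical name.

```python
import numpy as np

def topk_error(scores, labels, k=1):
    """Fraction of samples whose true class is not among the k highest scores."""
    # Indices of the k largest scores per row (order within the top k is irrelevant)
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]
    hit = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])

print(topk_error(scores, labels, k=1))  # two of three samples are wrong at top-1
print(topk_error(scores, labels, k=2))  # one of three is still wrong at top-2
```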

## Multithreaded Training

A significant bottleneck in CNN training is loading data into memory (and subsequently onto the GPU). Given the single-threaded nature of Python, loading data for the next iteration typically does not begin until the current training iteration has completed. To use a custom asynchronous load function that significantly increases training speed, pass three additional parameters to the `initialize()` call: `threads`, `batch` and `capacity`:

* `threads`: number of separate independent threads to use (consider the total number of CPU threads available on your machine)
* `batch`: total number of exams to be loaded by each thread at a time in a single batch
* `capacity`: total number of studies to be pre-loaded in the queue
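
How these three parameters interact can be pictured with standard-library threads and a bounded queue. This is a sketch of the general producer/consumer pattern, not the library's loader; `start_loaders`, `next_batch` and `load_fn` are hypothetical names, and the numeric values are chosen only for illustration.

```python
import queue
import threading

def start_loaders(load_fn, threads, batch, capacity):
    """Start daemon threads that keep a bounded queue filled with loaded studies."""
    q = queue.Queue(maxsize=capacity)

    def worker():
        while True:
            # Each thread loads `batch` studies, then blocks while the queue is full
            for item in [load_fn() for _ in range(batch)]:
                q.put(item)

    for _ in range(threads):
        threading.Thread(target=worker, daemon=True).start()
    return q

def next_batch(q, batch_size):
    """Collect one training batch; blocks until enough studies are available."""
    return [q.get() for _ in range(batch_size)]

q = start_loaders(load_fn=lambda: 'study', threads=8, batch=4, capacity=32)
batch = next_batch(q, batch_size=16)
print(len(batch))
```

Because the queue is bounded by `capacity`, loaders run ahead of training only up to a fixed amount of pre-loaded data.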

Note that `threads` x `batch` should be a multiple of the training batch size (at least 2 to 3 times greater); otherwise one iteration of the asynchronous process will not load enough data for a single pass through the network. The following default parameters are reasonable for our starting batch size of 16 images.

In [None]:
# Define app_context
app_context = cnn.db.init_app_context({'name': 'mnist_test'})
app_context['tags-series'] = ['mnist-all']

# Initialize network
net = cnn.Network()
net.Client = Client
net.Model = Model
net.initialize(
    app_context=app_context,
    fold=0,
    threads=8,
    capacity=32)

# Train network
net.train()
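
The sizing rule above (`threads` x `batch` should be a multiple of the training batch size and at least two to three times larger) can be checked with a small helper before starting a long run. `check_loader_config` and the example values are hypothetical.

```python
def check_loader_config(threads, batch, batch_size, min_factor=2):
    """Hypothetical helper: True if threads * batch is a multiple of batch_size
    and at least min_factor times larger."""
    loaded = threads * batch
    return loaded % batch_size == 0 and loaded >= min_factor * batch_size

print(check_loader_config(threads=8, batch=4, batch_size=16))  # True: 32 is 2 x 16
print(check_loader_config(threads=8, batch=1, batch_size=16))  # False: 8 is less than 16
```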

## Training statistics

All training statistics are stored in a pickled dictionary file. This file can be easily loaded and viewed using the `cnn.Viewer` class. To do so, simply pass the training directory when instantiating the class. To view training dynamics over time, pass the subnetwork name (`lnet`, `enet`, etc), the node (either `errors` or `losses`) and the target (label key). For losses, no other information is required because the loss type is defined by the label itself (e.g. `sce-` for sigmoid cross-entropy). For errors, an error key is also needed, given that several error metrics may exist for a given label.

While the current Python kernel is engaged in algorithm training, it may be worthwhile to load a second kernel (new Jupyter notebook, new Python shell, etc) to concurrently graph training loss / error curves over time. 

In [None]:
viewer = cnn.Viewer('./exps/exp01')
viewer.graph(name='lnet', node='errors', target='sce-digit', key='topk')
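
The name / node / target / key lookup can be pictured with a plain pickled dictionary. The nested layout, the file name and the numbers below are assumptions made for illustration (placeholder values, not real training results); the actual file written by the library may be organized differently.

```python
import os
import pickle
import tempfile

# Assumed layout: stats[name][node][target] (plus [key] for errors), one value per recorded step
stats = {'lnet': {'errors': {'sce-digit': {'topk': [0.90, 0.45, 0.20]}},
                  'losses': {'sce-digit': [2.30, 1.10, 0.60]}}}

path = os.path.join(tempfile.mkdtemp(), 'stats.pickle')  # hypothetical file name
with open(path, 'wb') as f:
    pickle.dump(stats, f)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

def curve(stats, name, node, target, key=None):
    """Look up one curve; losses need no key because the label prefix names the loss."""
    series = stats[name][node][target]
    return series if key is None else series[key]

print(curve(loaded, 'lnet', 'errors', 'sce-digit', key='topk'))
print(curve(loaded, 'lnet', 'losses', 'sce-digit'))
```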