# Training Engine
This notebook shows how to use the lightnet engine and related classes to set up your training more easily.

When training models with PyTorch, there is quite a bit of boilerplate code that needs to be written each time.
The [lightnet.engine](../api/engine.rst) submodule provides functionality that makes it easier to set up your training pipelines and lets you run the same training routine on different networks (with different losses, hyperparameters, etc.).

In this tutorial, we will discuss the two most important classes of this module:

- [HyperParameters](#HyperParameters): This class groups together all hyperparameters of a certain model, making it easier to perform the same training on different models.
- [Engine](#Engine): This class provides basic functionality to reduce the boilerplate code needed to train a model. It has some opinionated features that make it easy to split up training in batches and even mini-batches.

Besides these two classes, this submodule contains a [LinePlotter](../api/engine.rst#lightnet.engine.LinePlotter) to easily plot data with visdom and a [SchedulerCompositor](../api/engine.rst#lightnet.engine.SchedulerCompositor) that lets you use multiple schedulers throughout your training pipeline.

In [1]:
# Basic imports
import lightnet as ln
import torch
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import brambox as bb

# Settings
ln.logger.setConsoleLevel('ERROR')             # Only show error log messages
bb.logger.setConsoleLevel('ERROR')             # Only show error log messages

## HyperParameters
The [HyperParameters](../api/engine.rst#lightnet.engine.HyperParameters) class is used to store all kinds of data and can (de)serialize that data to a file for later use.

To start using it, you can simply initialize an object, and give it any keyword argument you want to save.
Once it is created, you can add more parameters by creating new attributes on the fly (syntax: `object.parameter = value`).  
Any attribute whose name starts with an underscore is marked, so that it is not saved when the data in this class is serialized.
The attribute itself is stored under the name without the leading underscore (so `_foo` is accessed as `foo`).

When saving this class, it will loop through all of its attributes (except those that are marked) and check whether that value has a `state_dict()` function.
If it exists, it will store the result of that function instead of the value itself.  
The same happens when loading data, except with the `load_state_dict()` function.
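
To make the two rules above concrete (underscore marking and the `state_dict()` check), here is a minimal stand-alone sketch in plain Python. It is not lightnet's implementation: `MiniParams`, `Counter`, `to_state()` and `from_state()` are hypothetical names used only for illustration, and the real class additionally writes the result to a file.

```python
class Counter:
    """Hypothetical stateful object exposing state_dict()/load_state_dict()."""
    def __init__(self, value=0):
        self.value = value

    def state_dict(self):
        return {'value': self.value}

    def load_state_dict(self, state):
        self.value = state['value']


class MiniParams:
    """Toy stand-in for HyperParameters (illustration only, not lightnet code)."""
    def __init__(self, **kwargs):
        object.__setattr__(self, '_unsaved', set())   # names that must not be serialized
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, name, value):
        if name.startswith('_'):
            name = name[1:]             # store without the underscore
            self._unsaved.add(name)     # but remember not to save it
        object.__setattr__(self, name, value)

    def to_state(self):
        state = {}
        for name, value in vars(self).items():
            if name == '_unsaved' or name in self._unsaved:
                continue
            # Prefer the object's own state_dict() when it has one
            state[name] = value.state_dict() if hasattr(value, 'state_dict') else value
        return state

    def from_state(self, state):
        for name, value in state.items():
            current = getattr(self, name, None)
            if hasattr(current, 'load_state_dict'):
                current.load_state_dict(value)
            else:
                setattr(self, name, value)


params = MiniParams(counter=Counter(5), _foo=10)
params.bar = ['abc']
saved = params.to_state()               # 'foo' is left out, 'counter' is stored as its state dict

fresh = MiniParams(counter=Counter())
fresh.from_state(saved)
print(fresh.counter.value, fresh.bar)
```

Note that loading needs an existing `counter` object to call `load_state_dict()` on, which is why the reloading example later in this section also creates the network before loading.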

<div class="alert alert-info">

**Note:**

You will see that when this object is created, it contains two extra attributes (_batch_, _epoch_).
These values get used by the Engine (see next section), to keep track of where the training is, and will always be initialized to **0** when creating a HyperParameters object.

</div>

<div class="alert alert-info">

**Note:**

We store the data with a _.state.pt_ extension.
Saving internally uses the _torch.save()_ function, which justifies the final _.pt_ extension; the _.state_ part is just a convention we use to differentiate between _HyperParameters.save()_ and _network.save()_.

</div>

In [2]:
# Create parameters object
params = ln.engine.HyperParameters(
    network = ln.models.Yolo(),
    _foo = 10,                   # Will not be stored when saving
)

params.bar = ['abc', 'def']
params._baz = 3.1415            # Will not be stored when saving

# Access and modify parameters
params.bar.append('ghi')
params.baz *= 2

# Show values
print(params)  # Note the stars after the names of values that will not be stored

# Save values
params.save('parameters.state.pt')

HyperParameters(
  bar = ['abc', 'def', 'ghi']
  batch = 0
  baz* = 6.283
  epoch = 0
  foo* = 10
  network = Yolo
)


We can now use the serialized hyperparameters to reload all previous values.
This can be useful to resume an interrupted training.

In [3]:
# Create parameters object
params = ln.engine.HyperParameters(
    network = ln.models.Yolo()
)

# Show a single weight value from network
print(params.network.layers[0][0].layers[1].weight.data[0])

# load previous state
params.load('parameters.state.pt')

# Show same weight value as before
print(params.network.layers[0][0].layers[1].weight.data[0])

# Show values
print(params)  # Note extra bar value

tensor(0.5859)
tensor(0.8571)
HyperParameters(
  bar = ['abc', 'def', 'ghi']
  batch = 0
  epoch = 0
  network = Yolo
)


The HyperParameters class also has a class method [from_file](../api/engine.rst#lightnet.engine.HyperParameters.from_file), which loads a HyperParameters object from an external file. This makes it easy to use the same training pipeline with different parameters, networks, etc. To see how this works, check out the next tutorial, where we show a real example of training on the Pascal VOC dataset.
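
As a rough illustration of the general idea of keeping settings in a separate file, the sketch below imports a Python file by its path and extracts one variable from it, using only the standard library. The helper `load_variable`, the variable name `params` and the file `config.py` are assumptions made for this example; refer to the API documentation linked above for the actual behaviour of `from_file`.

```python
import importlib.util
import os
import tempfile


def load_variable(path, variable='params'):
    """Import the Python file at `path` and return one of its module-level variables."""
    spec = importlib.util.spec_from_file_location('user_config', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, variable)


# Write a throwaway config file so that the sketch is runnable on its own
with tempfile.TemporaryDirectory() as folder:
    config_path = os.path.join(folder, 'config.py')
    with open(config_path, 'w') as f:
        f.write("params = {'batch_size': 64, 'max_batches': 128}\n")

    config = load_variable(config_path)

print(config['batch_size'])
```

With such a setup, switching to another network or set of hyperparameters only requires pointing the training script at a different file.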

## Engine
The lightnet [Engine](../api/engine.rst#lightnet.engine.Engine) class provides an opinionated framework to reduce the boilerplate code when building up a training pipeline.  
We will again refer to the next tutorial about training on Pascal VOC to see an actual example implementation, but we will quickly go over the basic intended usage and all features this engine offers.

The Engine is an abstract base class ([ABC](https://docs.python.org/3/library/abc.html#abc.ABC)), which means that you are intended to create your own Engine class which inherits from this one and implement a few methods.  
The two methods you are required to implement are [process_batch](../api/engine.rst#lightnet.engine.Engine.process_batch), which receives data from your dataloader and should perform the forward and backward passes through your network (and loss), and [train_batch](../api/engine.rst#lightnet.engine.Engine.train_batch), which should perform the weight update based on the gradients of your parameters.

When initializing your Engine, you need to give it a HyperParameters object, a dataloader _(optional)_ and any other keyword arguments.
During execution of your engine, you can access any attribute of your HyperParameters as if it were an engine attribute.
All keyword arguments passed to the initialization will also be accessible as attributes.
The dataloader will be used to get data for training your network and is what is passed on to the process_batch() function.
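
One common way to get this kind of attribute forwarding in Python is through `__getattr__`. The sketch below shows the pattern with two hypothetical classes (`Params` and `ToyEngine`); it is an illustration of the behaviour described above, not the code lightnet uses.

```python
class Params:
    """Hypothetical stand-in for a HyperParameters object."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class ToyEngine:
    """Toy engine that forwards unknown attribute lookups to its params object."""
    def __init__(self, params, dataloader=None, **kwargs):
        self.params = params
        self.dataloader = dataloader
        # Extra keyword arguments become plain attributes of the engine
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __getattr__(self, name):
        # Only called when the normal attribute lookup fails
        if name == 'params':
            raise AttributeError(name)   # avoid infinite recursion if params is not set yet
        return getattr(self.params, name)


engine = ToyEngine(Params(max_batches=128), None, device='cpu')
print(engine.max_batches, engine.device)
```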


<div class="alert alert-info">

**Note:**

The training is split into two functions to make it possible to work with mini-batches.  
Mini-batches let you emulate much bigger batches than you can fit in your computer's (GPU) memory.

If you set an *engine.mini_batch_size* attribute on the engine, it will call *process_batch()* multiple times before each *train_batch()* call. The gradients of the parameters in your network accumulate across those calls, which is how mini-batches work.

If you do not have such an *engine.mini_batch_size* attribute, it will be set to **1** which boils down to having no mini-batches, because each *process_batch()* call will be followed by a *train_batch()* call.

</div>
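
To see why accumulating gradients emulates a bigger batch, consider the small numeric sketch below. It uses a single weight and a hand-written gradient instead of a real network, and the name `accumulation_steps` is hypothetical. In this sketch each mini-batch gradient is divided by the number of steps so that the result matches the full-batch mean; whether that scaling is needed in practice depends on how your loss is reduced.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Reference: gradient computed over the full batch of 4 samples
full_grad = grad_mse(w, xs, ys)

# Emulation: 2 mini-batches of 2 samples, accumulated before a single update
accumulation_steps = 2      # hypothetical: number of process_batch() calls per train_batch()
accumulated = 0.0
for start in range(0, len(xs), 2):
    # Plays the role of process_batch(): compute and accumulate a gradient
    mini_grad = grad_mse(w, xs[start:start + 2], ys[start:start + 2])
    accumulated += mini_grad / accumulation_steps

# Plays the role of train_batch(): one weight update from the accumulated gradient
w_new = w - 0.01 * accumulated

print(full_grad, accumulated)
```

Both values are identical, so the single update is the same as if all four samples had been processed at once.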

In [4]:
# Implement engine
class CustomEngine(ln.engine.Engine):
    def start(self):
        """ Do whatever needs to be done before starting """
        self.params.to(self.device)  # Casting parameters to a certain device
        self.optim.zero_grad()       # Make sure to start with no gradients
        self.loss_acc = []           # Loss accumulator
        
    def process_batch(self, data):
        """ Forward and backward pass """
        data, target = data  # Unpack
        
        output = self.network(data)
        loss = self.loss(output, target)
        loss.backward()
                
        self.loss_acc.append(loss.item())
        
    def train_batch(self):
        """ Weight update and logging """
        self.optim.step()
        self.optim.zero_grad()
        
        batch_loss = sum(self.loss_acc) / len(self.loss_acc)
        self.loss_acc = []
        self.log(f'Loss: {batch_loss}')
        
    def quit(self):
        if self.batch >= self.max_batches:  # Should probably save weights here
            print('Reached end of training')
            return True
        return False
        
# Create HyperParameters
params = ln.engine.HyperParameters(
    network=ln.models.Yolo(),
    mini_batch_size=8,
    batch_size=64,
    max_batches=128
)
params.loss = ln.network.loss.RegionLoss(params.network.num_classes, params.network.anchors)
params.optim = torch.optim.SGD(params.network.parameters(), lr=0.001)

# Create engine
engine = CustomEngine(
    params, None,              # Dataloader (None) is not valid
    device=torch.device('cpu')
)

# Run engine
# engine() -> we will not run this here, as we did not provide a valid dataloader

The Engine class also lets you define hooks.
These are functions that run at various points during the training (*batch_start*, *batch_end*, *epoch_start*, *epoch_end*).
You can either specify them by decorating a function with the correct decorator, or by calling the decorator at runtime for your engine.

In [5]:
class CustomEngine(ln.engine.Engine):
    def start(self):
        # Decide on a hook at runtime (backup_rate gets passed to init function)
        if self.backup_rate > 0:
            self.batch_end(self.backup_rate)(self.backup)  # Call self.backup after every N batches
            
    def backup(self):
        self.params.save(f'backup-{self.batch}.state.pt')
        
    @ln.engine.Engine.epoch_end()
    def every_epoch(self):
        print('END OF EPOCH')
        
    @ln.engine.Engine.batch_start(10)
    def every_ten_batches(self):
        # Runs every ten batches (batch 10, 20, 30, ...)
        print(f'STARTING BATCH {self.batch+1}')
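
The two ways of registering a hook (decorator syntax and calling the decorator at runtime) can be mimicked with a small stand-alone registry. The sketch below is not lightnet's implementation: `HookRunner` is a hypothetical class, and the rule that a hook fires when `batch % interval == 0` is an assumption made for this example.

```python
class HookRunner:
    """Toy hook registry: functions registered with an interval run every N batches."""
    def __init__(self):
        self.batch = 0
        self._batch_end_hooks = []

    def batch_end(self, interval=1):
        """Return a decorator that registers a function to run every `interval` batches."""
        def decorator(fn):
            self._batch_end_hooks.append((interval, fn))
            return fn
        return decorator

    def run(self, num_batches):
        for _ in range(num_batches):
            self.batch += 1                      # pretend a batch was trained here
            for interval, fn in self._batch_end_hooks:
                if self.batch % interval == 0:   # assumed firing rule
                    fn(self.batch)


runner = HookRunner()
saved = []

# Decorator syntax
@runner.batch_end(2)
def report(batch):
    print(f'finished batch {batch}')

# Same decorator, called at runtime on an existing function
runner.batch_end(3)(saved.append)

runner.run(6)
print(saved)
```

Calling the decorator at runtime is what makes it possible to decide on hooks based on values that are only known when the engine starts, such as `backup_rate` in the cell above.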