# <font style="color:blue">Modularizing the Training Pipeline</font>


In this notebook, we take one more look at training the LeNet architecture on the MNIST dataset. This time, however, we will not discuss the details of the network or the training parameters. Instead, we will work on the training code itself and develop some tools that will save time in the future.

The primary purpose of this notebook is to familiarize you with the idea behind the Trainer helper class, which we will use for the rest of the course. The training pipeline is similar to what we did for LeNet last week, but we will reorganize it in a slightly different way.

#  <font style="color:blue">Good Practices for Research and Development: A Brief Overview</font>

So far, we have learned to build and train simple neural networks. As you've seen, the training process involves many steps, and each step has its own parameters. For example, we can define different batch sizes for training and validation, and we certainly use different datasets for these phases. At a lower level, we can use different initialization techniques for different parts of the model, and so on.

As you probably know, research is considered good only if its results are reproducible. Indeed, if you claim that something is true (for example, that a specific model achieves particular results on a given dataset), the claim should hold universally, and thus another researcher should be able to achieve the same results.

The issue here is that with so many parameters involved, it's hard not to miss something important, something that will spoil the results, make them too good (e.g., doing the validation using the training data), or just not reproducible. And sooner or later, not only will the others have issues trying to get the same results, but you will be stuck trying to get the same performance you reported a year ago.

On the one hand, the research community encountered this kind of issue some time ago and developed practices that prevent bad research from being published, such as peer review and blind testing of hypotheses. But now that a lot of research involves simulations or complex processing of results, these techniques are not enough.

On the other hand, the software developers' community also had similar issues. Software systems tend to have multiple places that are subject to the same modification. Typically we say that having the same code in several areas of the software system is a bad practice. The reason is the same as we discussed before with the experiments: having so many parts/parameters to handle in case of modification is error-prone - it's just too simple to miss the change here or there and put the whole system into an inconsistent state.

So the developers' community created a set of simple "rules" that, when applied, typically lead to a better system design. These rules seem general enough to be used for research projects: after all, they are software projects, too.

There are many such principles, but let us go through a few that seem most applicable to our research projects:

> From Wikipedia:
> - "Do not repeat yourself (DRY principle) is a principle of software development aimed at reducing repetition of software patterns, replacing it with abstractions"
>     - The DRY principle is stated as "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system"
> - Rule of three ("Three strikes, and you refactor") is a code refactoring rule of thumb to decide when similar pieces of code should be refactored to avoid duplication.
>     - It states that two instances of same code don't require refactoring, but when similar code is used three times, it should be extracted into a new procedure. The rule was popularised by Martin Fowler in Refactoring and attributed to Don Roberts.
> - Single responsibility principle
>     - A class should only have a single responsibility, that is, only changes to one part of the software's specification should be able to affect the specification of the class.

If you want to learn more about these practices and principles, we recommend reading the excellent book "The Pragmatic Programmer" - it is short, but it may boost the productivity of your research a lot.
Let's verify our training code against these principles.
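To make the DRY principle a bit more concrete for experiment code, here is a minimal sketch in which every hyperparameter lives in exactly one place. The `ExperimentConfig` class, its field names, and the default values are hypothetical and chosen only for illustration; they are not part of the course code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    # Hypothetical hyperparameters; each value is defined exactly once.
    batch_size: int = 32
    learning_rate: float = 0.01
    epochs: int = 5
    seed: int = 42


def describe(config: ExperimentConfig) -> str:
    # Every consumer reads the same object instead of re-typing literals.
    return (
        f"batch_size={config.batch_size}, lr={config.learning_rate}, "
        f"epochs={config.epochs}, seed={config.seed}"
    )


baseline = ExperimentConfig()
# A variant of the experiment changes one field; everything else is inherited.
larger_batch = replace(baseline, batch_size=128)

print(describe(baseline))
print(describe(larger_batch))
```

Because the configuration is frozen, a run cannot silently modify it halfway through, which makes it easier to report exactly which settings produced a given result.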

#  <font style="color:blue">Training Code: Software Development Principles Check</font>

Let's discuss the training code from the software development point of view. Right now, our code has a more or less flat structure - yes, we have several functions, but in general, our pipeline can be represented with the image below:

---

<img src='https://www.learnopencv.com/wp-content/uploads/2020/02/c3-w6-model_training_pipeline.svg'>

---

It is a "white box" - we put data into it; it does something we've specified and provides the trained model as the output. To change the behavior, we have to deal with the code inside the "Model Training Pipeline" box. What we'd like to have, according to the principles listed above, is a more flexible system, one that we can control via parameters rather than by changing its code. So let's try to achieve this goal and take a look at what we have inside the model training pipeline. Can we see any logical blocks there? Maybe some parts of the code are independent of each other (have separate areas of responsibility)?

---
<img src='https://www.learnopencv.com/wp-content/uploads/2020/02/c3-w6-model_training_pipeline_subcomponents.svg'>

---

Typically, we have the following blocks in the code:

- the model itself (LeNet in our case)
- the dataset
- train loop
- validation loop
- visualization of the results

##  <font style="color:green">Extracting the Model</font>

It seems logical that we can train the same model using different datasets, and we can train different models using the same dataset.
Indeed, we see different models trained on the same data every time we read a scientific paper or discuss a new DL architecture - for classification models, this usually means training on the ImageNet dataset. And training the same model on different datasets is one of the reasons DL is so popular - we can take a model architecture from a paper and apply it to our dataset, right?

So, our scheme now looks a bit different:

---

<img src='https://www.learnopencv.com/wp-content/uploads/2020/02/c3-w6-extract_model.svg'>

---

We've extracted the model as a "plug-n-play" block of the training pipeline. And if we take a look at the code, we already have everything for it - we've specified a class for our model, and our `train` function takes the model as a parameter.

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">class</span> <span style="color: #BB0066; font-weight: bold">LeNet5</span>(nn<span style="color: #333333">.</span>Module):
    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">__init__</span>(<span style="color: #007020">self</span>):
        <span style="color: #007020">super</span>()<span style="color: #333333">.</span>__init__()
<span/>
        <span style="color: #007020">self</span><span style="color: #333333">.</span>_body <span style="color: #333333">=</span> nn<span style="color: #333333">.</span>Sequential(
            nn<span style="color: #333333">.</span>Conv2d(in_channels<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">1</span>, out_channels<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">6</span>, kernel_size<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">5</span>),
            nn<span style="color: #333333">.</span>ReLU(inplace<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>),
            nn<span style="color: #333333">.</span>MaxPool2d(kernel_size<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">2</span>),
            nn<span style="color: #333333">.</span>Conv2d(in_channels<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">6</span>, out_channels<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">16</span>, kernel_size<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">5</span>),
            nn<span style="color: #333333">.</span>ReLU(inplace<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>),
            nn<span style="color: #333333">.</span>MaxPool2d(kernel_size<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">2</span>),
        )
        <span style="color: #007020">self</span><span style="color: #333333">.</span>_head <span style="color: #333333">=</span> nn<span style="color: #333333">.</span>Sequential(
            nn<span style="color: #333333">.</span>Linear(in_features<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">16</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">5</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">5</span>, out_features<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">120</span>),
            nn<span style="color: #333333">.</span>ReLU(inplace<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>),
            nn<span style="color: #333333">.</span>Linear(in_features<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">120</span>, out_features<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">84</span>),
            nn<span style="color: #333333">.</span>ReLU(inplace<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>),
            nn<span style="color: #333333">.</span>Linear(in_features<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">84</span>, out_features<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">10</span>)
        )
<span/>
    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">forward</span>(<span style="color: #007020">self</span>, x):
        x <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>_body(x)
        x <span style="color: #333333">=</span> x<span style="color: #333333">.</span>view(x<span style="color: #333333">.</span>size()[<span style="color: #0000DD; font-weight: bold">0</span>], <span style="color: #333333">-</span><span style="color: #0000DD; font-weight: bold">1</span>)
        x <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>_head(x)
        <span style="color: #008800; font-weight: bold">return</span> x
</pre></div>
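As a quick sanity check on the `16 * 5 * 5` input size of the first `Linear` layer, the arithmetic below traces the spatial size of the feature map through the body of the network. This is only a sketch: the helper names `conv_out` and `pool_out` are illustrative, and it assumes a 32x32 input, unpadded stride-1 convolutions, and non-overlapping pooling (the default arguments of the layers above).

```python
def conv_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    # Standard output-size formula for a convolution along one spatial axis.
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size: int, kernel_size: int) -> int:
    # Non-overlapping pooling (stride equal to kernel size), rounded down.
    return size // kernel_size


size = 32                  # assumed input height/width
size = conv_out(size, 5)   # first Conv2d, kernel_size=5
size = pool_out(size, 2)   # first MaxPool2d, kernel_size=2
size = conv_out(size, 5)   # second Conv2d, kernel_size=5
size = pool_out(size, 2)   # second MaxPool2d, kernel_size=2

out_channels = 16          # channels produced by the second convolution
flattened_features = out_channels * size * size
print(size, flattened_features)
```

If the input resolution changes, this number changes as well, so the head of the model is tied to the input size even though the body is not.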

###  <font style="color:magenta">Develop the Interface for the Train Function</font>

Let's define a template for the training function. We know that it should take our Model:

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">train</span>(
    model: nn<span style="color: #333333">.</span>Module, <span style="color: #888888"># our model</span>
    <span style="color: #333333">*</span>args <span style="color: #888888"># the rest of the arguments</span>
) <span style="color: #333333">-&gt;</span> <span style="color: #008800; font-weight: bold">None</span>:
    <span style="color: #008800; font-weight: bold">pass</span>
</pre></div>

##  <font style="color:green">Extracting the Optimizer</font>

We can also see that the `Optimizer` is agnostic to the dataset and the model it deals with - it should only know the parameters of the models (as a list of tensors), their gradients, and the metric to optimize. What we see in the interface of the optimizer is that it takes the models' parameters and has a set of its own parameters, nothing more. So its configuration can be easily moved out of the training pipeline, and we can pass it as a parameter - and we do, actually!

---

<img src='https://www.learnopencv.com/wp-content/uploads/2020/02/c3-w6-extract_model_and_optimizer.svg'>

---

###  <font style="color:magenta">Develop the Interface for the Train Function</font>

So the `train` function should look like this:

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">train</span>(
    model: nn<span style="color: #333333">.</span>Module, <span style="color: #888888"># our model</span>
    optimizer: torch<span style="color: #333333">.</span>optim<span style="color: #333333">.</span>Optimizer, <span style="color: #888888"># our optimizer</span>
    <span style="color: #333333">*</span>args <span style="color: #888888"># the rest of the arguments</span>
) <span style="color: #333333">-&gt;</span> <span style="color: #008800; font-weight: bold">None</span>:
    <span style="color: #008800; font-weight: bold">pass</span>
</pre></div>
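The sketch below illustrates why the optimizer is easy to extract: it is built from the model's parameters plus its own hyperparameters, and the training step only calls `zero_grad()` and `step()` on whatever optimizer it receives. The helper names `build_optimizer` and `train_step`, the tiny linear model, the random data, and the hyperparameter values are assumptions made for illustration, not the course implementation.

```python
import torch
from torch import nn

torch.manual_seed(0)


def build_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    # The optimizer is configured outside the training code:
    # it only sees the parameters and its own settings.
    return torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               data: torch.Tensor, target: torch.Tensor) -> float:
    # The step works with any torch.optim.Optimizer instance.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data), target)
    loss.backward()
    optimizer.step()
    return loss.item()


model = nn.Linear(4, 2)  # stand-in for a real model
optimizer = build_optimizer(model)

weights_before = model.weight.detach().clone()
loss_value = train_step(model, optimizer, torch.randn(8, 4), torch.randn(8, 2))
weights_changed = not torch.equal(weights_before, model.weight)
print(loss_value, weights_changed)
```

Swapping SGD for another optimizer then means changing only `build_optimizer`; `train_step` stays untouched.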


## <font style="color:green">Extracting Visualization</font>

Also, visualization probably does not depend on the model or on how we train it - it only needs to know which number to show and which label to associate with that number (loss value, quality value, and so on). So as long as we visualize numerical values, we do not depend on the rest of the code.

---
<img src='https://www.learnopencv.com/wp-content/uploads/2020/02/c3-w6-extract_visualization.svg'>

---

###  <font style="color:magenta">Develop the Interface for the Train Function</font>

So the interface for the `train` should be updated again.

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">train</span>(
    model: nn<span style="color: #333333">.</span>Module, <span style="color: #888888"># our model</span>
    optimizer: torch<span style="color: #333333">.</span>optim<span style="color: #333333">.</span>Optimizer, <span style="color: #888888"># our optimizer</span>
    visualizer, <span style="color: #888888"># our visualization tool</span>
    <span style="color: #333333">*</span>args <span style="color: #888888"># the rest of the arguments</span>
) <span style="color: #333333">-&gt;</span> <span style="color: #008800; font-weight: bold">None</span>:
    <span style="color: #008800; font-weight: bold">pass</span>
</pre></div>


In the previous practice we've introduced the following code for visualization:

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #888888"># Plot loss</span>
plt<span style="color: #333333">.</span>rcParams[<span style="background-color: #fff0f0">&quot;figure.figsize&quot;</span>] <span style="color: #333333">=</span> (<span style="color: #0000DD; font-weight: bold">10</span>, <span style="color: #0000DD; font-weight: bold">6</span>)
x <span style="color: #333333">=</span> <span style="color: #007020">range</span>(<span style="color: #007020">len</span>(epoch_train_loss))
<span/>
plt<span style="color: #333333">.</span>figure()
plt<span style="color: #333333">.</span>plot(x, epoch_train_loss, color<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;r&#39;</span>, label<span style="color: #333333">=</span><span style="background-color: #fff0f0">&quot;train loss&quot;</span>)
plt<span style="color: #333333">.</span>plot(x, epoch_test_loss, color<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;b&#39;</span>, label<span style="color: #333333">=</span><span style="background-color: #fff0f0">&quot;validation loss&quot;</span>)
plt<span style="color: #333333">.</span>xlabel(<span style="background-color: #fff0f0">&#39;epoch no.&#39;</span>)
plt<span style="color: #333333">.</span>ylabel(<span style="background-color: #fff0f0">&#39;loss&#39;</span>)
plt<span style="color: #333333">.</span>legend(loc<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;upper right&#39;</span>)
plt<span style="color: #333333">.</span>title(<span style="background-color: #fff0f0">&#39;Training and Validation Loss&#39;</span>)
plt<span style="color: #333333">.</span>show()
</pre></div>

It works after the training is finished. But potentially, we can do the visualization in an online manner - after each epoch or even after each batch. More than that, we can use different visualization tools - matplotlib, tensorboard, or even printing the corresponding values to the standard output - we can name it "visualization," too.

Let's try to design an interface for the class that can do the visualization in online mode.

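<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">from</span> abc <span style="color: #008800; font-weight: bold">import</span> ABC, abstractmethod
<span style="color: #008800; font-weight: bold">from</span> collections <span style="color: #008800; font-weight: bold">import</span> defaultdict
<span/>
<span/>
<span style="color: #008800; font-weight: bold">class</span> <span style="color: #BB0066; font-weight: bold">Visualizer</span>(ABC):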
    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">__init__</span>(<span style="color: #007020">self</span>):
        <span style="color: #007020">self</span><span style="color: #333333">.</span>_epochs <span style="color: #333333">=</span> []
        <span style="color: #007020">self</span><span style="color: #333333">.</span>_metrics <span style="color: #333333">=</span> defaultdict(<span style="color: #007020">list</span>)
<span/>
    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">plot</span>(<span style="color: #007020">self</span>):
        <span style="color: #008800; font-weight: bold">for</span> key, value <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>_metrics<span style="color: #333333">.</span>items():
            <span style="color: #007020">self</span><span style="color: #333333">.</span>_plot_metric(key, value)
<span/>
    <span style="color: #555555; font-weight: bold">@abstractmethod</span>
    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">_plot_metric</span>(<span style="color: #007020">self</span>, metric_name, metric_values):
        <span style="color: #888888"># do the visualization</span>
        <span style="color: #008800; font-weight: bold">pass</span>
<span/>
    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">update_metrics</span>(<span style="color: #007020">self</span>, name, value, epoch):
        <span style="color: #007020">self</span><span style="color: #333333">.</span>_metrics[name]<span style="color: #333333">.</span>append(value)
        <span style="color: #007020">self</span><span style="color: #333333">.</span>_epochs<span style="color: #333333">.</span>append(epoch)
</pre></div>


**What do we do here?**

We assume that the training/validation loop takes the Visualizer object and sends metric updates to it. Which metric is in use is not this class's responsibility - all it needs to know is that each metric has a name, a value, and an associated epoch index so that we can draw it as a chart. We could omit the epoch index, but then things would go wrong whenever we do not start from epoch 0 (e.g., when continuing an interrupted training run).
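As a minimal sketch of how such an interface could be filled in, the example below uses the simplest "visualization" mentioned earlier: printing values to the standard output. Note that this is a variation rather than the class above: the names `EpochAwareVisualizer` and `StdoutVisualizer` are hypothetical, and each value is stored together with its epoch, so that several metrics can be tracked without sharing one list of epochs.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class EpochAwareVisualizer(ABC):
    """Hypothetical base class: collects (epoch, value) pairs per metric."""

    def __init__(self):
        self._metrics = defaultdict(list)  # metric name -> [(epoch, value), ...]

    def update_metrics(self, name, value, epoch):
        self._metrics[name].append((epoch, value))

    def plot(self):
        for name, points in self._metrics.items():
            self._plot_metric(name, points)

    @abstractmethod
    def _plot_metric(self, metric_name, points):
        """Render one metric; subclasses choose the backend."""


class StdoutVisualizer(EpochAwareVisualizer):
    def _plot_metric(self, metric_name, points):
        # "Visualization" here is just printing one line per recorded epoch.
        for epoch, value in points:
            print(f"{metric_name} | epoch {epoch}: {value:.4f}")


visualizer = StdoutVisualizer()
# Epochs start at 3 to mimic a resumed training run.
for epoch, loss in [(3, 0.9), (4, 0.7)]:
    visualizer.update_metrics("train loss", loss, epoch)
visualizer.plot()
```

A matplotlib or TensorBoard backend would only need to override `_plot_metric`; the training loop keeps calling `update_metrics` in the same way.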

## <font style="color:green">Extracting the Dataset Wrapper</font>

The next candidate for extraction is the Dataset. As we've decided, we can train the same models using different datasets, so the Dataset class should also be a parameter of the training pipeline. Let's do it.

What's great for us is that PyTorch provides the `Dataset` and `DataLoader` abstractions, which are very powerful. Using them, we can not only extract the Dataset as a parameter of the training pipeline but also pre-configure it with the number of workers to load the data, the data transformations to use, and so on. This is what we've done before. We just stress the idea behind these classes - we separate the things that tend to change fast (model, Dataset, visualization) from the things that are typically the same across DL workloads (training and validation loops).

---
<img src='https://www.learnopencv.com/wp-content/uploads/2020/02/c3-w6-extract_dataset.svg'>

---
<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">get_data</span>(batch_size, data_root<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;data&#39;</span>, num_workers<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">1</span>):
<span/>
    train_test_transforms <span style="color: #333333">=</span> transforms<span style="color: #333333">.</span>Compose([
        <span style="color: #888888"># Resize to 32X32</span>
        transforms<span style="color: #333333">.</span>Resize((<span style="color: #0000DD; font-weight: bold">32</span>, <span style="color: #0000DD; font-weight: bold">32</span>)),
        <span style="color: #888888"># this re-scales image tensor values between 0-1. image_tensor /= 255</span>
        transforms<span style="color: #333333">.</span>ToTensor(),
        <span style="color: #888888"># subtract mean (0.1307) and divide by standard deviation (0.3081).</span>
        <span style="color: #888888"># These mean and standard deviation values are calculated on training data (verify yourself)</span>
        transforms<span style="color: #333333">.</span>Normalize((<span style="color: #6600EE; font-weight: bold">0.1307</span>, ), (<span style="color: #6600EE; font-weight: bold">0.3081</span>, ))
    ])
<span/>
    <span style="color: #888888"># train dataloader</span>
    train_loader <span style="color: #333333">=</span> torch<span style="color: #333333">.</span>utils<span style="color: #333333">.</span>data<span style="color: #333333">.</span>DataLoader(
        datasets<span style="color: #333333">.</span>MNIST(root<span style="color: #333333">=</span>data_root, train<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>, download<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>, transform<span style="color: #333333">=</span>train_test_transforms),
        batch_size<span style="color: #333333">=</span>batch_size,
        shuffle<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>,
        num_workers<span style="color: #333333">=</span>num_workers
    )
<span/>
    <span style="color: #888888"># test dataloader</span>
    test_loader <span style="color: #333333">=</span> torch<span style="color: #333333">.</span>utils<span style="color: #333333">.</span>data<span style="color: #333333">.</span>DataLoader(
        datasets<span style="color: #333333">.</span>MNIST(root<span style="color: #333333">=</span>data_root, train<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">False</span>, download<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">True</span>, transform<span style="color: #333333">=</span>train_test_transforms),
        batch_size<span style="color: #333333">=</span>batch_size,
        shuffle<span style="color: #333333">=</span><span style="color: #008800; font-weight: bold">False</span>,
        num_workers<span style="color: #333333">=</span>num_workers
    )
    <span style="color: #008800; font-weight: bold">return</span> train_loader, test_loader
</pre></div>
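Because the loops only see `DataLoader` objects, any function with the same return contract can be plugged in instead of `get_data`. The sketch below shows this with randomly generated tensors in place of MNIST; `get_fake_data`, the sample counts, and the 3:1 split are assumptions made for illustration, and the random images carry no real signal.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_fake_data(batch_size, num_samples=64, num_workers=0):
    # Random 1x32x32 "images" and labels from 10 classes stand in for MNIST.
    generator = torch.Generator().manual_seed(0)
    images = torch.randn(num_samples, 1, 32, 32, generator=generator)
    labels = torch.randint(0, 10, (num_samples,), generator=generator)

    split = num_samples * 3 // 4  # first 3/4 for training, the rest for testing
    train_loader = DataLoader(
        TensorDataset(images[:split], labels[:split]),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
    )
    test_loader = DataLoader(
        TensorDataset(images[split:], labels[split:]),
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
    )
    return train_loader, test_loader


train_loader, test_loader = get_fake_data(batch_size=16)
data, target = next(iter(train_loader))
print(data.shape, target.shape, len(train_loader.dataset), len(test_loader.dataset))
```

A data source like this is handy for smoke-testing the training loop quickly, without downloading anything.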

###  <font style="color:magenta">Develop the Interface for the Train Function</font>

Now that we've moved almost all the modules of the system out of the training pipeline and converted them into pluggable modules, we can look at the core of the training pipeline - the training and validation loops. Let's look at their current implementations. We've already mentioned the `train` function interface, so let's start with this function. In the previous practice, we had the following code:

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">train</span>(
    device: Union[torch<span style="color: #333333">.</span>device, <span style="color: #007020">str</span>],
    log_interval: <span style="color: #007020">int</span>,
    model: nn<span style="color: #333333">.</span>Module,
    optimizer: torch<span style="color: #333333">.</span>optim<span style="color: #333333">.</span>Optimizer,
    train_loader: torch<span style="color: #333333">.</span>utils<span style="color: #333333">.</span>data<span style="color: #333333">.</span>DataLoader,
    epoch_idx: <span style="color: #007020">int</span>,
    visualizer
) <span style="color: #333333">-&gt;</span> <span style="color: #008800; font-weight: bold">None</span>:
<span/>
    <span style="color: #888888"># change model in training mode</span>
    model<span style="color: #333333">.</span>train()
<span/>
    <span style="color: #888888"># to get batch loss</span>
    batch_loss <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([])
<span/>
    <span style="color: #888888"># to get batch accuracy</span>
    batch_acc <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([])
<span/>
    <span style="color: #008800; font-weight: bold">for</span> batch_idx, (data, target) <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">enumerate</span>(train_loader):
<span/>
        <span style="color: #888888"># clone target</span>
        indx_target <span style="color: #333333">=</span> target<span style="color: #333333">.</span>clone()
        <span style="color: #888888"># send data to device (it is mandatory if GPU has to be used)</span>
        data <span style="color: #333333">=</span> data<span style="color: #333333">.</span>to(device)
        <span style="color: #888888"># send target to device</span>
        target <span style="color: #333333">=</span> target<span style="color: #333333">.</span>to(device)
<span/>
        <span style="color: #888888"># reset parameters gradient to zero</span>
        optimizer<span style="color: #333333">.</span>zero_grad()
<span/>
        <span style="color: #888888"># forward pass to the model</span>
        output <span style="color: #333333">=</span> model(data)
<span/>
        <span style="color: #888888"># cross entropy loss</span>
        loss <span style="color: #333333">=</span> F<span style="color: #333333">.</span>cross_entropy(output, target)
<span/>
        <span style="color: #888888"># find gradients w.r.t training parameters</span>
        loss<span style="color: #333333">.</span>backward()
        <span style="color: #888888"># Update parameters using gradients</span>
        optimizer<span style="color: #333333">.</span>step()
<span/>
        batch_loss <span style="color: #333333">=</span> np<span style="color: #333333">.</span>append(batch_loss, [loss<span style="color: #333333">.</span>item()])
<span/>
        <span style="color: #888888"># get probability score using softmax</span>
        prob <span style="color: #333333">=</span> F<span style="color: #333333">.</span>softmax(output, dim<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">1</span>)
<span/>
        <span style="color: #888888"># get the index of the max probability</span>
        pred <span style="color: #333333">=</span> prob<span style="color: #333333">.</span>data<span style="color: #333333">.</span>max(dim<span style="color: #333333">=</span><span style="color: #0000DD; font-weight: bold">1</span>)[<span style="color: #0000DD; font-weight: bold">1</span>]
<span/>
        <span style="color: #888888"># correct prediction</span>
        correct <span style="color: #333333">=</span> pred<span style="color: #333333">.</span>cpu()<span style="color: #333333">.</span>eq(indx_target)<span style="color: #333333">.</span>sum()
<span/>
        <span style="color: #888888"># accuracy</span>
        acc <span style="color: #333333">=</span> <span style="color: #007020">float</span>(correct) <span style="color: #333333">/</span> <span style="color: #007020">float</span>(<span style="color: #007020">len</span>(data))
<span/>
        batch_acc <span style="color: #333333">=</span> np<span style="color: #333333">.</span>append(batch_acc, [acc])
<span/>
        <span style="color: #008800; font-weight: bold">if</span> batch_idx <span style="color: #333333">%</span> log_interval <span style="color: #333333">==</span> <span style="color: #0000DD; font-weight: bold">0</span> <span style="color: #000000; font-weight: bold">and</span> batch_idx <span style="color: #333333">&gt;</span> <span style="color: #0000DD; font-weight: bold">0</span>:
            <span style="color: #007020">print</span>(
                <span style="background-color: #fff0f0">&#39;Train Epoch: {} [{}/{}] Loss: {:.6f} Acc: {:.4f}&#39;</span><span style="color: #333333">.</span>format(
                    epoch_idx, batch_idx <span style="color: #333333">*</span> <span style="color: #007020">len</span>(data), <span style="color: #007020">len</span>(train_loader<span style="color: #333333">.</span>dataset), loss<span style="color: #333333">.</span>item(), acc
                )
            )
<span/>
    epoch_loss <span style="color: #333333">=</span> batch_loss<span style="color: #333333">.</span>mean()
    epoch_acc <span style="color: #333333">=</span> batch_acc<span style="color: #333333">.</span>mean()
    <span style="color: #008800; font-weight: bold">return</span> epoch_loss, epoch_acc
</pre></div>

## <font style="color:green">Extracting the Loss Function and Quality Metric</font>

What can we say about this train function? It seems to be agnostic to the model and the data we pass in. But we are restricted by the task - we can only train classification models using cross-entropy loss and accuracy as the quality metric. Why should we restrict ourselves to these settings? Let's follow the same procedure as with the model and the other modules and make the loss and the quality metric configurable.


<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">train</span>(
    device: Union[torch<span style="color: #333333">.</span>device, <span style="color: #007020">str</span>],  <span style="color: #888888"># Union comes from typing</span>
    log_interval: <span style="color: #007020">int</span>,
    model: nn<span style="color: #333333">.</span>Module,
    optimizer: torch<span style="color: #333333">.</span>optim<span style="color: #333333">.</span>Optimizer,
    train_loader: torch<span style="color: #333333">.</span>utils<span style="color: #333333">.</span>data<span style="color: #333333">.</span>DataLoader,
    loss_function: Callable, <span style="color: #888888"># loss function</span>
    quality_estimator, <span style="color: #888888"># accuracy or other metric calculator</span>
    visualizer,
    epoch_idx: <span style="color: #007020">int</span>
) <span style="color: #333333">-&gt;</span> <span style="color: #008800; font-weight: bold">None</span>:
<span/>
    <span style="color: #888888"># change model in training mode</span>
    model<span style="color: #333333">.</span>train()
<span/>
    <span style="color: #888888"># to get batch loss we can use the visualizer now</span>
<span style="color: #888888">#     batch_loss = np.array([])</span>
<span/>
    <span style="color: #008800; font-weight: bold">for</span> batch_idx, (data, target) <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">enumerate</span>(train_loader):
<span/>
        <span style="color: #888888"># clone target</span>
        indx_target <span style="color: #333333">=</span> target<span style="color: #333333">.</span>clone()
        <span style="color: #888888"># send data to device (it is mandatory if GPU has to be used)</span>
        data <span style="color: #333333">=</span> data<span style="color: #333333">.</span>to(device)
        <span style="color: #888888"># send target to device</span>
        target <span style="color: #333333">=</span> target<span style="color: #333333">.</span>to(device)
<span/>
        <span style="color: #888888"># reset parameters gradient to zero</span>
        optimizer<span style="color: #333333">.</span>zero_grad()
<span/>
        <span style="color: #888888"># forward pass to the model</span>
        output <span style="color: #333333">=</span> model(data)
<span/>
        <span style="color: #888888"># compute the loss with the configurable loss function</span>
        loss <span style="color: #333333">=</span> loss_function(output, target)
<span/>
        <span style="color: #888888"># find gradients w.r.t training parameters</span>
        loss<span style="color: #333333">.</span>backward()
        <span style="color: #888888"># Update parameters using gradients</span>
        optimizer<span style="color: #333333">.</span>step()

<span style="color: #888888">#         batch_loss = np.append(batch_loss, [loss.item()])</span>
        <span style="color: #888888"># we should do it using the visualizer</span>
        visualizer<span style="color: #333333">.</span>update_metrics(<span style="background-color: #fff0f0">&#39;batch_loss&#39;</span>, loss<span style="color: #333333">.</span>item(), batch_idx)
<span/>
        <span style="color: #888888"># get the index of the max probability</span>
        <span style="color: #888888"># we can handle it inside the quality estimator</span>
<span style="color: #888888">#         pred = output.data.max(dim=1)[1]  </span>
<span/>
        quality_estimator<span style="color: #333333">.</span>update(output, indx_target)
<span/>
        <span style="color: #888888"># visualizer should handle it for us</span>
<span style="color: #888888">#         if batch_idx % log_interval == 0 and batch_idx &gt; 0:              </span>
<span style="color: #888888">#             print(</span>
<span style="color: #888888">#                 &#39;Train Epoch: {} [{}/{}] Loss: {:.6f} Acc: {:.4f}&#39;.format(</span>
<span style="color: #888888">#                     epoch_idx, batch_idx * len(data), len(train_loader.dataset), loss.item(), acc</span>
<span style="color: #888888">#                 )</span>
<span style="color: #888888">#             )</span>

<span style="color: #888888">#     epoch_loss = batch_loss.mean()</span>
    <span style="color: #888888"># epoch_acc = batch_acc.mean()</span>
    <span style="color: #888888"># we can do it outside of the loop now, as quality estimator and visualizer may store the values for us</span>
    <span style="color: #008800; font-weight: bold">return</span>
</pre></div>


If we remove all the commented-out code, our `train` function becomes much shorter and much more readable. That is great, because the less code we put into one function, the lower the chance that an error in it goes unnoticed.

---

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">train</span>(
    device: Union[torch<span style="color: #333333">.</span>device, <span style="color: #007020">str</span>],  <span style="color: #888888"># Union comes from typing</span>
    log_interval: <span style="color: #007020">int</span>,
    model: nn<span style="color: #333333">.</span>Module,
    optimizer: torch<span style="color: #333333">.</span>optim<span style="color: #333333">.</span>Optimizer,
    train_loader: torch<span style="color: #333333">.</span>utils<span style="color: #333333">.</span>data<span style="color: #333333">.</span>DataLoader,
    loss_function: Callable, <span style="color: #888888"># loss function</span>
    quality_estimator, <span style="color: #888888"># accuracy or other metric calculator</span>
    visualizer
) <span style="color: #333333">-&gt;</span> <span style="color: #008800; font-weight: bold">None</span>:
<span/>
    model<span style="color: #333333">.</span>train()
<span/>
    <span style="color: #008800; font-weight: bold">for</span> batch_idx, (data, target) <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">enumerate</span>(train_loader):
<span/>
        <span style="color: #888888"># clone target</span>
        indx_target <span style="color: #333333">=</span> target<span style="color: #333333">.</span>clone()
        <span style="color: #888888"># send data to device (it is mandatory if GPU has to be used)</span>
        data <span style="color: #333333">=</span> data<span style="color: #333333">.</span>to(device)
        <span style="color: #888888"># send target to device</span>
        target <span style="color: #333333">=</span> target<span style="color: #333333">.</span>to(device)
<span/>
        <span style="color: #888888"># reset parameters gradient to zero</span>
        optimizer<span style="color: #333333">.</span>zero_grad()
<span/>
        <span style="color: #888888"># forward pass to the model</span>
        output <span style="color: #333333">=</span> model(data)
<span/>
        <span style="color: #888888"># compute the loss with the configurable loss function</span>
        loss <span style="color: #333333">=</span> loss_function(output, target)
<span/>
        <span style="color: #888888"># find gradients w.r.t training parameters</span>
        loss<span style="color: #333333">.</span>backward()
        <span style="color: #888888"># Update parameters using gradients</span>
        optimizer<span style="color: #333333">.</span>step()
<span/>
        visualizer<span style="color: #333333">.</span>update_metrics(<span style="background-color: #fff0f0">&#39;batch_loss&#39;</span>, loss<span style="color: #333333">.</span>item(), batch_idx)
<span/>
        quality_estimator<span style="color: #333333">.</span>update(output, indx_target)
    <span style="color: #008800; font-weight: bold">return</span>
</pre></div>


Wow, the train function fits into one screen now! More than that, it is now agnostic to the model we train, to the way we train this model, to the quality metric we use, and so on.
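
The `train` function above only assumes that `quality_estimator` has an `update(output, target)` method and that `visualizer` has an `update_metrics(name, value, step)` method. To make this contract concrete, here is a minimal sketch of two objects that satisfy it. The class names `AccuracyEstimator` and `ConsoleVisualizer`, and the helper methods `value` and `mean`, are our assumptions for illustration and are not the course's actual implementation. The estimator accepts plain nested lists as well as anything with a `.tolist()` method (such as a PyTorch tensor), so the sketch runs without a GPU or a model.

```python
from collections import defaultdict


class AccuracyEstimator:
    """Hypothetical quality estimator that accumulates classification accuracy."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, output, target) -> None:
        # Accept tensors (anything with .tolist()) as well as plain nested lists.
        scores = output.tolist() if hasattr(output, "tolist") else output
        labels = target.tolist() if hasattr(target, "tolist") else target
        for row, label in zip(scores, labels):
            # The index of the highest score is the predicted class.
            predicted = max(range(len(row)), key=row.__getitem__)
            self.correct += int(predicted == label)
            self.total += 1

    def value(self) -> float:
        # Accuracy over everything seen so far (0.0 before the first update).
        return self.correct / self.total if self.total else 0.0


class ConsoleVisualizer:
    """Hypothetical visualizer that stores metrics and prints them periodically."""

    def __init__(self, log_interval: int = 100) -> None:
        self.log_interval = log_interval
        self.history = defaultdict(list)

    def update_metrics(self, name: str, value: float, step: int) -> None:
        self.history[name].append((step, value))
        if step > 0 and step % self.log_interval == 0:
            print(f"{name} [step {step}]: {value:.6f}")

    def mean(self, name: str) -> float:
        # Average of all stored values of one metric, e.g. the epoch loss.
        values = [value for _, value in self.history[name]]
        return sum(values) / len(values) if values else 0.0


estimator = AccuracyEstimator()
estimator.update([[0.1, 0.9], [0.8, 0.2]], [1, 1])  # one hit, one miss

visualizer = ConsoleVisualizer(log_interval=1)
visualizer.update_metrics("batch_loss", 0.5, 0)
visualizer.update_metrics("batch_loss", 0.3, 1)
```

Because both objects keep their own state, the epoch-level values that `train` used to return (mean loss, accuracy) can be read from them after the loop, for example with `visualizer.mean("batch_loss")` and `estimator.value()`.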

We can apply the same procedure to the validation function, but we leave that to you as an exercise.

# <font style="color:blue">Configurations</font>

It seems that the train function now has too many parameters. Maybe we can group them somehow?
Actually, yes. If we take a close look at these parameters, we notice two distinct groups. One group contains the callable objects: the model, the loss function, the quality estimator, and so on. The other contains plain settings such as the training device and the log interval. We can collect the latter group in a `TrainingConfiguration` class. We put all the settings related to the training process into it to get a more transparent interface with fewer parameters. Given an object of such a class, we can easily write it to a file. We can place that file next to the training results and keep these parameters logged for later use.

We can exploit the same idea of the `<Something>Configuration` classes for the other parts of our training pipeline, like the dataloader.

---

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #555555; font-weight: bold">@dataclass</span>
<span style="color: #008800; font-weight: bold">class</span> <span style="color: #BB0066; font-weight: bold">SystemConfiguration</span>:
    <span style="color: #DD4422">&#39;&#39;&#39;</span>
<span style="color: #DD4422">    Describes the common system setting needed for reproducible training</span>
<span style="color: #DD4422">    &#39;&#39;&#39;</span>
    seed: <span style="color: #007020">int</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">42</span>  <span style="color: #888888"># seed number to set the state of all random number generators</span>
    cudnn_benchmark_enabled: <span style="color: #007020">bool</span> <span style="color: #333333">=</span> <span style="color: #008800; font-weight: bold">True</span>  <span style="color: #888888"># enable CuDNN benchmark for the sake of performance</span>
    cudnn_deterministic: <span style="color: #007020">bool</span> <span style="color: #333333">=</span> <span style="color: #008800; font-weight: bold">True</span>  <span style="color: #888888"># make cudnn deterministic (reproducible training)</span>

<span style="color: #555555; font-weight: bold">@dataclass</span>
<span style="color: #008800; font-weight: bold">class</span> <span style="color: #BB0066; font-weight: bold">TrainingConfiguration</span>:
    <span style="color: #DD4422">&#39;&#39;&#39;</span>
<span style="color: #DD4422">    Describes configuration of the training process</span>
<span style="color: #DD4422">    &#39;&#39;&#39;</span>
    batch_size: <span style="color: #007020">int</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">32</span>  <span style="color: #888888"># amount of data to pass through the network at each forward-backward iteration</span>
    epochs_count: <span style="color: #007020">int</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">20</span>  <span style="color: #888888"># number of times the whole dataset will be passed through the network</span>
    learning_rate: <span style="color: #007020">float</span> <span style="color: #333333">=</span> <span style="color: #6600EE; font-weight: bold">0.01</span>  <span style="color: #888888"># determines the speed of network&#39;s weights update</span>
    log_interval: <span style="color: #007020">int</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">100</span>  <span style="color: #888888"># how many batches to wait between logging training status</span>
    test_interval: <span style="color: #007020">int</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>  <span style="color: #888888"># how many epochs to wait before another test. Set to 1 to get val loss at each epoch</span>
    data_root: <span style="color: #007020">str</span> <span style="color: #333333">=</span> <span style="background-color: #fff0f0">&quot;data&quot;</span>  <span style="color: #888888"># folder to save MNIST data (default: data/mnist-data)</span>
    num_workers: <span style="color: #007020">int</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">10</span>  <span style="color: #888888"># number of concurrent processes used to prepare data</span>
    device: <span style="color: #007020">str</span> <span style="color: #333333">=</span> <span style="background-color: #fff0f0">&#39;cuda&#39;</span>  <span style="color: #888888"># device to use for training.</span>
</pre></div>
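
As a sketch of the two ideas above (a configuration class for another part of the pipeline, and writing a configuration to a file next to the results), consider the following. The `DataloaderConfiguration` class, the helper names `save_configuration` and `load_configuration`, and the choice of JSON as the file format are all assumptions made for illustration. Only `dataclasses.asdict` and the `json` module from the standard library are used.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class DataloaderConfiguration:
    """Hypothetical configuration for the data loading part of the pipeline."""
    batch_size: int = 32   # samples per batch
    num_workers: int = 10  # number of concurrent processes used to prepare data
    data_root: str = "data"  # folder where the dataset is stored


def save_configuration(config, path: Path) -> None:
    """Write any dataclass-based configuration to a JSON file."""
    path.write_text(json.dumps(asdict(config), indent=2))


def load_configuration(config_class, path: Path):
    """Rebuild a configuration object from a JSON file written by save_configuration."""
    return config_class(**json.loads(path.read_text()))


# A temporary directory stands in for the folder with the training results.
with tempfile.TemporaryDirectory() as results_dir:
    config_path = Path(results_dir) / "dataloader_configuration.json"
    original = DataloaderConfiguration(batch_size=16)
    save_configuration(original, config_path)
    restored = load_configuration(DataloaderConfiguration, config_path)
```

The same two helpers would work unchanged for `SystemConfiguration` and `TrainingConfiguration`, since all their fields are plain integers, floats, booleans, and strings.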


#  <font style="color:blue">Main Function</font>

**Now let's take a look at our `main` function, where we iterate over the epochs of the model training and gather all the pieces.**

---

<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">main</span>(system_configuration<span style="color: #333333">=</span>SystemConfiguration(), training_configuration<span style="color: #333333">=</span>TrainingConfiguration()):
<span/>
    <span style="color: #888888"># system configuration</span>
    setup_system(system_configuration)
<span/>
    <span style="color: #888888"># batch size</span>
    batch_size_to_set <span style="color: #333333">=</span> training_configuration<span style="color: #333333">.</span>batch_size
    <span style="color: #888888"># num_workers</span>
    num_workers_to_set <span style="color: #333333">=</span> training_configuration<span style="color: #333333">.</span>num_workers
    <span style="color: #888888"># epochs</span>
    epoch_num_to_set <span style="color: #333333">=</span> training_configuration<span style="color: #333333">.</span>epochs_count
<span/>
    <span style="color: #888888"># if GPU is available use training config, </span>
    <span style="color: #888888"># else lower batch_size, num_workers and epochs count</span>
    <span style="color: #008800; font-weight: bold">if</span> torch<span style="color: #333333">.</span>cuda<span style="color: #333333">.</span>is_available():
        device <span style="color: #333333">=</span> <span style="background-color: #fff0f0">&quot;cuda&quot;</span>
    <span style="color: #008800; font-weight: bold">else</span>:
        device <span style="color: #333333">=</span> <span style="background-color: #fff0f0">&quot;cpu&quot;</span>
        batch_size_to_set <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">16</span>
        num_workers_to_set <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">2</span>
        epoch_num_to_set <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">5</span>
<span/>
    <span style="color: #888888"># data loader</span>
    train_loader, test_loader <span style="color: #333333">=</span> get_data(
        batch_size<span style="color: #333333">=</span>batch_size_to_set,
        data_root<span style="color: #333333">=</span>training_configuration<span style="color: #333333">.</span>data_root,
        num_workers<span style="color: #333333">=</span>num_workers_to_set
    )
<span/>
    <span style="color: #888888"># Update training configuration; replace() (from dataclasses) keeps all other fields, e.g. learning_rate</span>
    training_configuration <span style="color: #333333">=</span> replace(
        training_configuration,
        device<span style="color: #333333">=</span>device,
        epochs_count<span style="color: #333333">=</span>epoch_num_to_set,
        batch_size<span style="color: #333333">=</span>batch_size_to_set,
        num_workers<span style="color: #333333">=</span>num_workers_to_set
    )
<span/>
    <span style="color: #888888"># initiate model</span>
    model <span style="color: #333333">=</span> LeNet5()
<span/>
    <span style="color: #888888"># send model to device (GPU/CPU)</span>
    model<span style="color: #333333">.</span>to(training_configuration<span style="color: #333333">.</span>device)
<span/>
    <span style="color: #888888"># optimizer</span>
    optimizer <span style="color: #333333">=</span> optim<span style="color: #333333">.</span>SGD(
        model<span style="color: #333333">.</span>parameters(),
        lr<span style="color: #333333">=</span>training_configuration<span style="color: #333333">.</span>learning_rate
    )
<span/>
    best_loss <span style="color: #333333">=</span> torch<span style="color: #333333">.</span>tensor(np<span style="color: #333333">.</span>inf)
<span/>
    <span style="color: #888888"># epoch train/test loss</span>
    epoch_train_loss <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([])
    epoch_test_loss <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([])
<span/>
    <span style="color: #888888"># epoch train/test accuracy</span>
    epoch_train_acc <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([])
    epoch_test_acc <span style="color: #333333">=</span> np<span style="color: #333333">.</span>array([])
<span/>
    <span style="color: #888888"># training time measurement</span>
    t_begin <span style="color: #333333">=</span> time<span style="color: #333333">.</span>time()
    <span style="color: #008800; font-weight: bold">for</span> epoch <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(training_configuration<span style="color: #333333">.</span>epochs_count):
<span/>
        train_loss, train_acc <span style="color: #333333">=</span> train(training_configuration, model, optimizer, train_loader, epoch)
<span/>
        epoch_train_loss <span style="color: #333333">=</span> np<span style="color: #333333">.</span>append(epoch_train_loss, [train_loss])
<span/>
        epoch_train_acc <span style="color: #333333">=</span> np<span style="color: #333333">.</span>append(epoch_train_acc, [train_acc])
<span/>
        elapsed_time <span style="color: #333333">=</span> time<span style="color: #333333">.</span>time() <span style="color: #333333">-</span> t_begin
        speed_epoch <span style="color: #333333">=</span> elapsed_time <span style="color: #333333">/</span> (epoch <span style="color: #333333">+</span> <span style="color: #0000DD; font-weight: bold">1</span>)
        speed_batch <span style="color: #333333">=</span> speed_epoch <span style="color: #333333">/</span> <span style="color: #007020">len</span>(train_loader)
        eta <span style="color: #333333">=</span> speed_epoch <span style="color: #333333">*</span> training_configuration<span style="color: #333333">.</span>epochs_count <span style="color: #333333">-</span> elapsed_time
<span/>
        <span style="color: #007020">print</span>(
            <span style="background-color: #fff0f0">&quot;Elapsed {:.2f}s, {:.2f} s/epoch, {:.2f} s/batch, ets {:.2f}s&quot;</span><span style="color: #333333">.</span>format(
                elapsed_time, speed_epoch, speed_batch, eta
            )
        )
<span/>
        <span style="color: #008800; font-weight: bold">if</span> epoch <span style="color: #333333">%</span> training_configuration<span style="color: #333333">.</span>test_interval <span style="color: #333333">==</span> <span style="color: #0000DD; font-weight: bold">0</span>:
            current_loss, current_accuracy <span style="color: #333333">=</span> validate(training_configuration, model, test_loader)
<span/>
            epoch_test_loss <span style="color: #333333">=</span> np<span style="color: #333333">.</span>append(epoch_test_loss, [current_loss])
<span/>
            epoch_test_acc <span style="color: #333333">=</span> np<span style="color: #333333">.</span>append(epoch_test_acc, [current_accuracy])
<span/>
            <span style="color: #008800; font-weight: bold">if</span> current_loss <span style="color: #333333">&lt;</span> best_loss:
                best_loss <span style="color: #333333">=</span> current_loss
<span/>
    <span style="color: #007020">print</span>(<span style="background-color: #fff0f0">&quot;Total time: {:.2f}, Best Loss: {:.3f}&quot;</span><span style="color: #333333">.</span>format(time<span style="color: #333333">.</span>time() <span style="color: #333333">-</span> t_begin, best_loss))
<span/>
    <span style="color: #008800; font-weight: bold">return</span> model, epoch_train_loss, epoch_train_acc, epoch_test_loss, epoch_test_acc
</pre></div>
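
The `main` function starts by calling `setup_system`, which is not shown in this section. Below is a minimal sketch of what such a function might look like, assuming it only seeds the random number generators and applies the two cuDNN flags from `SystemConfiguration`. The body is our assumption, not the course's implementation. The `torch` import is guarded so that the sketch also runs where PyTorch is not installed; in that case only Python's own generator is seeded.

```python
import random
from dataclasses import dataclass

try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None


@dataclass
class SystemConfiguration:
    seed: int = 42
    cudnn_benchmark_enabled: bool = True
    cudnn_deterministic: bool = True


def setup_system(system_config: SystemConfiguration) -> None:
    """Seed the random number generators and apply the cuDNN flags (assumed behavior)."""
    random.seed(system_config.seed)
    if torch is None:
        return
    torch.manual_seed(system_config.seed)
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = system_config.cudnn_benchmark_enabled
        torch.backends.cudnn.deterministic = system_config.cudnn_deterministic


# Calling setup_system twice with the same seed reproduces the same random draw.
setup_system(SystemConfiguration(seed=7))
first_draw = random.random()
setup_system(SystemConfiguration(seed=7))
second_draw = random.random()
```

If NumPy random functions are used anywhere in the pipeline, its generator would need to be seeded in the same place. Note also that cuDNN benchmark mode may pick different convolution algorithms from run to run, so for strictly reproducible experiments it may be preferable to set `cudnn_benchmark_enabled` to `False`, at some cost in speed.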


###  <font style="color:magenta">Develop the Interface for the Main Function</font>

**Let me collapse the logical blocks of this function and provide the following structure:**

---

<div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">main</span>(system_configuration<span style="color: #333333">=</span>SystemConfiguration(), training_configuration<span style="color: #333333">=</span>TrainingConfiguration()):
    <span style="color: #888888"># prepare the model</span>
    <span style="color: #888888"># prepare the data</span>
    <span style="color: #888888"># prepare the loss</span>
    <span style="color: #888888"># prepare the optimizer</span>
    <span style="color: #888888"># prepare the visualization</span>
<span/>
    <span style="color: #888888"># initialize some internal stuff</span>
<span/>
    <span style="color: #008800; font-weight: bold">for</span> epoch <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(training_configuration<span style="color: #333333">.</span>epochs_count):
<span/>
        <span style="color: #888888"># train one epoch with train()</span>
<span/>
        <span style="color: #888888"># do visualization</span>
<span/>
        <span style="color: #008800; font-weight: bold">if</span> epoch <span style="color: #333333">%</span> training_configuration<span style="color: #333333">.</span>test_interval <span style="color: #333333">==</span> <span style="color: #0000DD; font-weight: bold">0</span>:
            <span style="color: #888888"># validate</span>
<span/>
            <span style="color: #888888"># do some internal stuff on the best model selection</span>
            <span style="color: #008800; font-weight: bold">pass</span>
</pre></div>

The structure

```
    # initialize some internal stuff

    for epoch in range(training_configuration.epochs_count):

        # train one epoch with train()

        # do visualization

        if epoch % training_configuration.test_interval == 0:
            # validate

            # do some internal stuff on the best model selection
```

is agnostic to everything else we deal with, so we can generalize it the same way we did for the `train` function. We rename the existing `train` to `train_epoch`, and the new generalized loop over the epochs takes the name `train`.

But now we have a generalized `train` and a generalized `train_epoch`, plus some internal bookkeeping around these functions that is needed for printing the training time, saving models, and so on. So it is worth extracting all of this into a class and naming it `Trainer`.
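
To illustrate the idea, here is a minimal sketch of such a class. It is not the `Trainer` used later in the course: the constructor arguments, the `fit` method, and the `best_epoch` attribute are assumptions made for this example. The per-epoch work is passed in as two callables, so the class itself only owns the epoch loop, the timing, and the best-loss tracking. Simple stub functions stand in for the real `train_epoch` and `validate`.

```python
import time
from typing import Callable, Dict, List, Tuple


class Trainer:
    """Hypothetical minimal Trainer: owns the epoch loop, timing and best-loss tracking."""

    def __init__(
        self,
        train_epoch: Callable[[int], float],
        validate: Callable[[int], float],
        epochs_count: int,
        test_interval: int = 1,
    ) -> None:
        self.train_epoch = train_epoch
        self.validate = validate
        self.epochs_count = epochs_count
        self.test_interval = test_interval
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.history: Dict[str, List[Tuple[int, float]]] = {
            "train_loss": [],
            "test_loss": [],
        }

    def fit(self) -> Dict[str, List[Tuple[int, float]]]:
        t_begin = time.time()
        for epoch in range(self.epochs_count):
            # train one epoch
            train_loss = self.train_epoch(epoch)
            self.history["train_loss"].append((epoch, train_loss))

            # report timing, as in the main function above
            elapsed = time.time() - t_begin
            speed_epoch = elapsed / (epoch + 1)
            eta = speed_epoch * self.epochs_count - elapsed
            print(f"Epoch {epoch}: elapsed {elapsed:.2f}s, eta {eta:.2f}s")

            if epoch % self.test_interval == 0:
                # validate and remember the best result so far
                test_loss = self.validate(epoch)
                self.history["test_loss"].append((epoch, test_loss))
                if test_loss < self.best_loss:
                    self.best_loss = test_loss
                    self.best_epoch = epoch  # a real trainer would save the model here
        return self.history


# Stub callables stand in for the real train_epoch and validate functions.
fake_test_losses = [0.9, 0.5, 0.7, 0.4]
trainer = Trainer(
    train_epoch=lambda epoch: 1.0 / (epoch + 1),
    validate=lambda epoch: fake_test_losses[epoch],
    epochs_count=4,
    test_interval=2,
)
history = trainer.fit()
```

With `test_interval=2` only epochs 0 and 2 are validated, so the best loss recorded by this run is the one from epoch 2.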

#  <font style="color:blue">Experiment Pipeline: The Training Process</font>

**Let's take a look at the modules diagram in the training pipeline again:**

---
<img src='https://www.learnopencv.com/wp-content/uploads/2020/02/c3-w6-training_pipeline_science_vs_engineering.svg'>

---

Obviously, there are many different ways of decomposing the system into independent blocks, and this scheme shows only one of them (one that we found to work well in practice and to be well aligned with the PyTorch philosophy).

Blocks in green typically do not change when your business or research task changes. They are mostly the engineering-related parts of your deep learning pipeline. Typically you invest some time into their development or use open-source blocks for them. They help you obtain reliable and reproducible results and avoid failures such as your best model being overwritten by another one.

Blocks in blue require your attention as a researcher. They are task-specific, and you are expected to spend most of your time changing these blocks and trying different hypotheses.

Separating the deep learning pipeline in this way helps a lot to speed up research and get reliable results.

Since the "engineering" part of the pipeline is assumed to change much less often than the "research" part, it makes sense to extract the code of these classes and functions into a separate library. The idea is that you can share this library within your research group and benefit from the improvements your colleagues make. It also makes your research code cleaner and more concise: you no longer have to read through the system setup and file-saving code when you take a top-level look at your project.

So, in this course, we've developed a small version of such an "engineering" part of the training pipeline. You will find it in the "trainer" folder in the following lectures. The "research" part of each practice session will go into the notebook. We will work with the trainer as we would in a non-study project: configure it to train our models properly using the correct data. We will not pay attention to the trainer code unless it is necessary.

# <font style="color:blue">References</font>

You may wonder whether this is a common way of doing deep learning or whether we are overengineering here. We can assure you that this is a common way to do deep learning research in industry: most companies and research groups invest in building such DL training frameworks for their projects, and some of these frameworks have even been released as open source. To name a few of them:
- https://github.com/NVlabs/SPADE
- https://github.com/pytorch/ignite
- https://github.com/PyTorchLightning/pytorch-lightning
- https://github.com/catalyst-team/catalyst
- https://github.com/open-mmlab/mmdetection
- https://github.com/fastai/fastai