# Microbatching

## What's the point of microbatching?

As we know, a bigger batch size gives better gradient estimates, which helps model training. Unfortunately, device memory is a hard constraint: even modern accelerators can't fit more than hundreds of gigabytes, which is sometimes just not enough. This is where `microbatch` comes to the rescue: it splits the batch evenly into multiple pieces (called microbatches), evaluates gradients on each of them separately, and then applies the average gradient directly to the weights of the model.
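To make the averaging step concrete, here is a minimal pure-Python sketch. It is not how `batchflow` implements `microbatch`. It is a toy scalar model with a mean squared error loss, and the helper names `mean_gradient` and `microbatch_gradient` are hypothetical. Under the assumption that the loss is a mean over examples and all microbatches have equal size, the averaged microbatch gradient matches the full-batch gradient.

```python
import math
import random


def mean_gradient(w, xs, ys):
    """Gradient of 0.5 * mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def microbatch_gradient(w, xs, ys, microbatch_size):
    """Average per-microbatch gradients over equally sized slices of the batch.

    Assumes microbatch_size divides len(xs); otherwise the slices are uneven
    and a plain average would no longer equal the full-batch gradient.
    """
    grads = [
        mean_gradient(w, xs[i:i + microbatch_size], ys[i:i + microbatch_size])
        for i in range(0, len(xs), microbatch_size)
    ]
    return sum(grads) / len(grads)


# Toy data: one batch of 64 examples, split into 4 microbatches of 16.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(64)]
ys = [random.gauss(0, 1) for _ in range(64)]
w = 0.5

full = mean_gradient(w, xs, ys)
micro = microbatch_gradient(w, xs, ys, 16)

# The two gradients agree up to floating-point rounding.
print(math.isclose(full, micro, rel_tol=1e-9, abs_tol=1e-12))
```

Only one microbatch has to be held in memory at a time, which is why the approach helps when the whole batch does not fit on the device.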

In [1]:
import os
import sys
import warnings

sys.path.append('../../..')
from batchflow import Pipeline, B, C, V, D
from batchflow.opensets import MNIST
from batchflow.models.tf import ResNet18

Specify which GPU(s) to use. See the [CUDA documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) for details.

In [2]:
%env CUDA_DEVICE_ORDER=PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES=4

env: CUDA_DEVICE_ORDER=PCI_BUS_ID
env: CUDA_VISIBLE_DEVICES=4


## Create a dataset, define a pipeline config, define a default model config

In [3]:
dataset = MNIST(bar=True)

config = dict(model=ResNet18)

model_config = {'inputs/images/shape': B.image_shape,
                'inputs/labels/classes': D.num_classes,
                'initial_block/inputs': 'images'}

100%|██████████| 8/8 [00:02<00:00,  1.81it/s]


In [4]:
BATCH_SIZE = 64

## Add microbatch

We can add `microbatch` to model configuration:

In [5]:
model_config.update({'microbatch': 16})

Now, if we run the pipeline, the model will receive microbatches of size 16 rather than whole batches of size `BATCH_SIZE`.

> **MICROBATCH SIZE MUST BE A DIVISOR OF THE BATCH SIZE!**
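The divisibility requirement can be checked before training starts. The helper below is a hypothetical illustration, not part of `batchflow`: it returns the number of microbatches per batch and raises an error when the split would be uneven.

```python
def count_microbatches(batch_size, microbatch_size):
    """Return how many microbatches one batch splits into, or raise if the split is uneven."""
    if microbatch_size <= 0:
        raise ValueError("microbatch size must be positive")
    if batch_size % microbatch_size != 0:
        raise ValueError(
            f"microbatch size {microbatch_size} does not divide batch size {batch_size}"
        )
    return batch_size // microbatch_size


print(count_microbatches(64, 16))  # 4
```

With `BATCH_SIZE = 64` and `microbatch` set to 16, every batch is processed as four microbatches. A value such as 24 would be rejected by this check.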

## Train model with microbatch

In [6]:
train_template_microbatch = (Pipeline(config=config)
                  .to_array()
                  .train_model('conv_nn', fetches='loss', 
                               images=B.images, labels=B.labels,
                               save_to=V('loss_history', mode='a')))

(train_template_microbatch.before
 .init_variable('loss_history', [])
 .init_model('dynamic', C('model'), 'conv_nn', config=model_config))

<batchflow.once_pipeline.OncePipeline at 0x7fefd02bd390>

In [7]:
train_pipeline_microbatch = train_template_microbatch << dataset.train
train_pipeline_microbatch.run(BATCH_SIZE, shuffle=True, n_epochs=1, bar=True, drop_last=True)

100%|██████████| 937/937 [01:53<00:00,  9.18it/s]


<batchflow.pipeline.Pipeline at 0x7ff077be4ef0>

If `microbatch` were not set in the model configuration, we could still split each batch into microbatches by running the pipeline with the parameter `microbatch=microbatch_size`:
```python
train_pipeline_microbatch.run(BATCH_SIZE, shuffle=True, n_epochs=1, microbatch=16, bar=True, drop_last=True)
```

Compared to the pipeline from [03_ready_to_use_model_tf](../03_ready_to_use_model_tf.ipynb), we can see that `microbatch` offers a simple trade-off between a bigger batch size (and thus model performance) and model training time. **If a whole batch fits within the memory constraints, there is no reason to use `microbatch`, because processing a batch in pieces is inherently slower.**

Now we can train models using `microbatch`. Next, you might want to see the tutorial about [multiple devices](./02_device.ipynb).