# Multi-GPU

## What's the point of multi-GPU?

You may need more than one GPU if model training takes up a significant fraction of the pipeline execution time.

If you have several GPUs, you can use them together to train a model. This speeds up the training process.

The `device` parameter lets you train a model on multiple GPUs (a copy of the model is created on each selected GPU).
Initializing a large model on a large number of GPUs may take some time (minutes or even tens of minutes)!

In [1]:
import os
import sys
import warnings

import tensorflow as tf

sys.path.append('../../..')
from batchflow import Pipeline, B, C, V, D
from batchflow.opensets import MNIST
from batchflow.models.tf import VGG7

Specify which GPU(s) to use. See the [CUDA documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) for details.

In [2]:
%env CUDA_DEVICE_ORDER=PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES=5,6

env: CUDA_DEVICE_ORDER=PCI_BUS_ID
env: CUDA_VISIBLE_DEVICES=5,6
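
Outside a notebook, the same variables can be set from plain Python. The snippet below is a minimal sketch: `logical_to_physical` is a hypothetical helper (not part of batchflow or TensorFlow) that only illustrates how CUDA renumbers the visible devices from zero, so physical GPUs 5 and 6 are addressed as `GPU:0` and `GPU:1` later in this notebook. The variables should be set before the framework first initializes the GPUs.

```python
import os

# Equivalent of the two %env magics above, for use in a regular script.
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES'] = '5,6'


def logical_to_physical(visible_devices):
    """Hypothetical helper: map logical device names to physical GPU ids.

    CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES from zero,
    in the order they are listed.
    """
    ids = [item.strip() for item in visible_devices.split(',') if item.strip()]
    return {'GPU:{}'.format(i): physical for i, physical in enumerate(ids)}


mapping = logical_to_physical(os.environ['CUDA_VISIBLE_DEVICES'])
print(mapping)  # {'GPU:0': '5', 'GPU:1': '6'}
```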


## Create a dataset, define a pipeline config, define a default model config

In [3]:
dataset = MNIST(bar=True)

config = dict(model=VGG7)

model_config = {'inputs': {'images/shape': B.image_shape,
                           'labels': {'classes': D.num_classes,
                                      'transform': 'ohe'}},
                'initial_block': {'inputs': 'images'}}

100%|██████████| 8/8 [00:02<00:00,  1.78it/s]


# Train model on one GPU

In [4]:
train_template = (Pipeline(config=config)
                  .to_array()
                  .train_model('conv_nn', fetches='loss', 
                               images=B.images, labels=B.labels,
                               save_to=V('loss_history', mode='a'),
                               use_lock=True))

(train_template.before
 .init_variable('loss_history', default=[])
 .init_model('dynamic', C('model'),'conv_nn',
             config=model_config))

<batchflow.once_pipeline.OncePipeline at 0x7f4fee13a278>

In [5]:
BATCH_SIZE = 1000

In [6]:
train_pipeline = train_template << dataset.train
train_pipeline.run(BATCH_SIZE, shuffle=True, n_iters=1000, bar=True, drop_last=True)

100%|██████████| 1000/1000 [04:01<00:00,  4.40it/s]


<batchflow.pipeline.Pipeline at 0x7f4ffe14d2e8>

Model training lasted 4 minutes on one GPU.

# Add a second GPU

We can use the `device` parameter to train the model on 2 GPUs:

In [7]:
model_config.update({'device': ['GPU:0', 'GPU:1']})

The `device` parameter can be either a string or a sequence of strings.

Example:
```python
'device': 'GPU:0'                     # use only GPU:0
'device': ['GPU:0', 'GPU:1', 'GPU:2'] # use GPU:0, GPU:1 and GPU:2
'device': 'GPU:*'                     # use all available GPUs
```

> **THE NUMBER OF DEVICES MUST BE A DIVISOR OF THE BATCH SIZE! (WITH MICROBATCHING, A DIVISOR OF THE MICROBATCH SIZE.)**
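
The rule above can be checked before a long run is started. The following is a minimal sketch, not batchflow code: `items_per_device` is a hypothetical helper that assumes each batch (or each microbatch, when one is set) is split evenly across the devices.

```python
def items_per_device(batch_size, n_devices, microbatch=None):
    """Hypothetical check: how many items each device gets per step.

    Raises ValueError if the number of devices does not divide the
    batch size (or the microbatch size, when microbatching is used).
    """
    if n_devices < 1:
        raise ValueError('At least one device is required')
    # With microbatching, the microbatch is what gets split across devices.
    chunk = batch_size if microbatch is None else microbatch
    if chunk % n_devices != 0:
        raise ValueError('{} items cannot be split evenly across {} devices'
                         .format(chunk, n_devices))
    return chunk // n_devices


print(items_per_device(1000, 2))                  # 500
print(items_per_device(8000, 2, microbatch=100))  # 50
```

With a batch size of 1000, three devices would fail this check, because 1000 is not divisible by 3.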

# Train model on several GPUs

In [8]:
train_template_multi = (Pipeline(config=config)
                        .to_array()
                        .train_model('conv_nn', fetches='loss', 
                                     images=B.images, labels=B.labels,
                                     save_to=V('loss_history', mode='a'), 
                                     use_lock=True))

(train_template_multi.before
 .init_variable('loss_history', default=[])
 .init_model('dynamic', C('model'),'conv_nn',
             config=model_config))

<batchflow.once_pipeline.OncePipeline at 0x7f4f2c4bcef0>

In [9]:
train_pipeline_multi = train_template_multi << dataset.train
train_pipeline_multi.run(BATCH_SIZE, shuffle=True, n_iters=1000, bar=True, drop_last=True)

100%|██████████| 1000/1000 [02:44<00:00,  6.62it/s]


<batchflow.pipeline.Pipeline at 0x7f4f2c4bcfd0>

Model training lasted 2:44 on two GPUs, compared with 4:01 on one GPU. That is roughly 1.5 times faster, a significant increase in training speed.
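
As a quick check of the arithmetic, the speedup can be computed from the two progress-bar timings shown in the outputs (`04:01` and `02:44`). This is only a sketch; `to_seconds` is a hypothetical helper, and the "efficiency" figure is simply the speedup divided by the number of GPUs.

```python
def to_seconds(mm_ss):
    """Convert a 'MM:SS' string, as printed by the progress bar, to seconds."""
    minutes, seconds = mm_ss.split(':')
    return int(minutes) * 60 + int(seconds)


one_gpu = to_seconds('04:01')    # 241 seconds
two_gpus = to_seconds('02:44')   # 164 seconds

speedup = one_gpu / two_gpus     # how many times faster two GPUs are
efficiency = speedup / 2         # fraction of the ideal 2x speedup

print(round(speedup, 2), round(efficiency, 2))  # 1.47 0.73
```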

Using `device` you can decrease training time!

# Multi-GPU and microbatching

### Schematic illustration of how batches are distributed across GPUs

<img src="./img/Batch_microbatch_GPU.png" width="700">

We can use `microbatch` and `device` at the same time. This is useful when batches are very large.

Add microbatching and define a new batch size.

In [19]:
model_config.update({'microbatch': 100})

In [21]:
BATCH_SIZE_2 = 8000
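
To make the numbers concrete, the sketch below works out how a batch of 8000 items could be cut into microbatches of 100 and how each microbatch could then be shared between two GPUs. It is an illustration under stated assumptions, not the library's implementation: `split_plan` is a hypothetical helper, and it assumes that the microbatch size divides the batch size and that each microbatch is split into equal contiguous chunks, one per device.

```python
def split_plan(batch_size, microbatch, devices):
    """Hypothetical sketch: index ranges each device gets for every microbatch.

    Returns a list with one dict per microbatch; each dict maps a device
    name to the (start, stop) range of batch items assigned to it.
    """
    if batch_size % microbatch != 0:
        raise ValueError('microbatch must divide the batch size (assumption)')
    if microbatch % len(devices) != 0:
        raise ValueError('number of devices must divide the microbatch size')

    per_device = microbatch // len(devices)
    plan = []
    for m in range(batch_size // microbatch):
        start = m * microbatch
        plan.append({device: (start + i * per_device,
                              start + (i + 1) * per_device)
                     for i, device in enumerate(devices)})
    return plan


plan = split_plan(8000, 100, ['GPU:0', 'GPU:1'])
print(len(plan))   # 80
print(plan[0])     # {'GPU:0': (0, 50), 'GPU:1': (50, 100)}
```

Under these assumptions, every batch is processed as 80 microbatches, and each GPU handles 50 items per microbatch.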

# Train model with several GPUs and microbatching

In [22]:
template_multi_micro = (Pipeline(config=config)
                        .to_array()
                        .train_model('conv_nn', fetches='loss', 
                                     images=B.images, labels=B.labels,
                                     save_to=V('loss_history', mode='a'), 
                                     use_lock=True))

(template_multi_micro.before
 .init_variable('loss_history', default=[])
 .init_model('dynamic', C('model'),'conv_nn',
             config=model_config))

<batchflow.once_pipeline.OncePipeline at 0x7f4d7c692ba8>

In [23]:
pipeline_multi_micro = template_multi_micro << dataset.train
pipeline_multi_micro.run(BATCH_SIZE_2, shuffle=True, n_iters=100, bar=True, drop_last=True)

100%|██████████| 100/100 [02:39<00:00,  1.53s/it]


<batchflow.pipeline.Pipeline at 0x7f4d7c68f358>

Model training finished without errors, which means that `device` and `microbatch` can be used together.
