# Multi-GPU

## What's the point of multi-GPU?

Using more than one GPU is worthwhile when model training takes up a significant fraction of the pipeline's execution time.
If you have several GPUs, you can use all of them to train the model, which speeds up training.

The `device` parameter lets you train a model on multiple GPUs (a copy of the model is created on each selected GPU).
Initializing a large model on a large number of GPUs may take some time (minutes or even tens of minutes)!

In [1]:
import os
import sys
import warnings

sys.path.append('../../..')
from batchflow import Pipeline, B, C, V, D
from batchflow.opensets import MNIST
from batchflow.models.tf import ResNet18

Specify which GPU(s) to use. See the [CUDA documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) for details.

In [2]:
%env CUDA_DEVICE_ORDER=PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES=5,6

env: CUDA_DEVICE_ORDER=PCI_BUS_ID
env: CUDA_VISIBLE_DEVICES=5,6
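
The `%env` magics only work inside a notebook. A minimal sketch of the equivalent for a plain Python script, assuming the same device indices as in the cell above, sets the variables through `os.environ`. This has to happen before the framework initializes CUDA, because the variables are read at that point.

```python
import os

# Order devices by PCI bus ID so that indices match the ones reported by nvidia-smi.
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
# Expose only physical GPUs 5 and 6 to this process (indices taken from the cell above).
os.environ['CUDA_VISIBLE_DEVICES'] = '5,6'
```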


## Create a dataset, define a pipeline config, define a default model config

In [3]:
dataset = MNIST(bar=True)

config = dict(model=ResNet18)

model_config = {'inputs/images/shape': B.image_shape,
                'inputs/labels/classes': D.num_classes,
                'initial_block/inputs': 'images'}

100%|██████████| 8/8 [00:01<00:00,  1.98it/s]


In [4]:
BATCH_SIZE = 64

# Add GPUs

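We can use `device` to set up two GPUs for training the model. Since `CUDA_VISIBLE_DEVICES` exposes only two GPUs to this process, they are addressed as `GPU:0` and `GPU:1`: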

In [5]:
model_config.update({'device': ['GPU:0', 'GPU:1']})

The `device` parameter can be a string, a list of strings, or a regular expression.

Example:
```python
'device': 'GPU:0'                     # Use only GPU:0
'device': ['GPU:0', 'GPU:1', 'GPU:2'] # Use GPU:0, GPU:1 and GPU:2
'device': 'GPU:*'                     # Use all available GPUs
```
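
To make the three forms concrete, here is a rough sketch of how such a specification could be resolved against the devices visible to the process. The helper `resolve_devices` and the list `available` are hypothetical and written only for illustration; the matching rule (`re.search` on each device name) is an assumption, not necessarily what the library does internally.

```python
import re

def resolve_devices(spec, available):
    """Return the names from `available` matched by `spec`.

    `spec` is a string or a list of strings. Each string is treated as a
    regular expression and searched for inside every available device name.
    Hypothetical helper, for illustration only.
    """
    patterns = [spec] if isinstance(spec, str) else list(spec)
    selected = []
    for pattern in patterns:
        for name in available:
            # Keep the first occurrence only, preserving the order of discovery.
            if re.search(pattern, name) and name not in selected:
                selected.append(name)
    return selected

# Hypothetical list of devices visible to the process.
available = ['CPU:0', 'GPU:0', 'GPU:1', 'GPU:2']

print(resolve_devices('GPU:0', available))
print(resolve_devices(['GPU:0', 'GPU:1'], available))
print(resolve_devices('GPU:*', available))
```

Read as a regular expression, `'GPU:*'` means `GPU` followed by any number of colons, so under `re.search` it selects every device whose name contains `GPU`.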

> **BATCH SIZE MUST BE DIVISIBLE BY THE NUMBER OF DEVICES WITHOUT A REMAINDER! (WITH MICROBATCHING, THIS APPLIES TO THE MICROBATCH SIZE INSTEAD OF THE BATCH SIZE.)**
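
This rule is easy to check before starting a long run. The sketch below uses a hypothetical helper, `items_per_device`, which is not part of the library: it returns how many items each device would receive and raises an error when the split is uneven.

```python
def items_per_device(batch_size, n_devices, microbatch=None):
    """Return the number of items each device receives per step.

    Hypothetical helper: with microbatching the unit that gets split across
    devices is the microbatch, otherwise it is the whole batch.
    """
    chunk = batch_size if microbatch is None else microbatch
    if chunk % n_devices != 0:
        raise ValueError(
            f'{chunk} items cannot be split evenly across {n_devices} devices'
        )
    return chunk // n_devices

# Values used in this tutorial: batches of 64 items on two GPUs.
print(items_per_device(64, n_devices=2))
# The same batch with microbatches of 32 items.
print(items_per_device(64, n_devices=2, microbatch=32))
```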

# Train the model on several GPUs

In [6]:
template_multi = (Pipeline(config=config)
                        .to_array()
                        .train_model('conv_nn', fetches='loss', 
                                     images=B.images, labels=B.labels,
                                     save_to=V('loss_history', mode='a'), use_lock=True))

(template_multi.before
 .init_variable('loss_history', [])
 .init_model('dynamic', C('model'),'conv_nn',
             config=model_config))

<batchflow.once_pipeline.OncePipeline at 0x7f2640e2db38>

In [7]:
pipeline_multi = template_multi << dataset.train
pipeline_multi.run(BATCH_SIZE, shuffle=True, n_epochs=1, bar=True, drop_last=True)

100%|██████████| 937/937 [01:03<00:00, 24.25it/s]


<batchflow.pipeline.Pipeline at 0x7f265482b128>

Model training took 1:03 on two GPUs. Adding more GPUs should reduce the training time even further.

# Multi-GPU and microbatching

### Schematic illustration of how batches are distributed to each GPU

<img src="./img/Batch_microbatch_GPU.png" width="700">

We can use `microbatch` and `device` at the same time, which is useful when batches are huge.

Add microbatching.

In [8]:
model_config.update({'microbatch': 32})
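
With these settings, each batch of 64 items is processed as two microbatches of 32, and each microbatch is shared between the two GPUs. The sketch below lists the resulting slices. It assumes that a batch is first cut into consecutive microbatches and that each microbatch is then divided evenly between devices; the helper `split_batch` is hypothetical and only illustrates the bookkeeping.

```python
def split_batch(batch_size, microbatch, n_devices):
    """Yield (microbatch_index, device_index, start, stop) for every slice.

    Hypothetical helper: assumes consecutive microbatches, each divided
    evenly between devices.
    """
    per_device = microbatch // n_devices
    for m, mb_start in enumerate(range(0, batch_size, microbatch)):
        for d in range(n_devices):
            start = mb_start + d * per_device
            yield m, d, start, start + per_device

for m, d, start, stop in split_batch(batch_size=64, microbatch=32, n_devices=2):
    print(f'microbatch {m}, GPU:{d}: items [{start}:{stop}]')
```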

# Train model with several GPUs and microbatching

In [9]:
template_multi_micro = (Pipeline(config=config)
                        .to_array()
                        .train_model('conv_nn', fetches='loss', 
                                     images=B.images, labels=B.labels,
                                     save_to=V('loss_history', mode='a'), 
                                     use_lock=True))

(template_multi_micro.before
 .init_variable('loss_history', [])
 .init_model('dynamic', C('model'),'conv_nn',
             config=model_config))

<batchflow.once_pipeline.OncePipeline at 0x7f23ac624e10>

In [10]:
pipeline_multi_micro = template_multi_micro << dataset.train
pipeline_multi_micro.run(BATCH_SIZE, shuffle=True, n_epochs=1, bar=True, drop_last=True)

100%|██████████| 937/937 [01:28<00:00, 14.56it/s]


<batchflow.pipeline.Pipeline at 0x7f23ac624828>

Model training finished without errors, which means that `device` and `microbatch` can be used together.
Compare the training time of the current model, the previous model, the model from [03_ready_to_use_model_tf](../03_ready_to_use_model_tf.ipynb), and the model from [01_microbatch](./01_microbatch.ipynb).
When we added one more GPU to the model from [03_ready_to_use_model_tf](../03_ready_to_use_model_tf.ipynb) (which gave the previous model), we got a significant speedup: training time was halved!
When we added a GPU to the model from [01_microbatch](./01_microbatch.ipynb) (which gave the current model), we also got a speedup.
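
For the two runs in this notebook, the comparison can be expressed as throughput. This is a minimal sketch that uses only the numbers from the progress bars above (937 iterations of 64 items, finished in 1:03 and 1:28); the helper name `throughput` is illustrative.

```python
def throughput(n_iters, batch_size, elapsed_seconds):
    """Return the number of items processed per second."""
    return n_iters * batch_size / elapsed_seconds

# Elapsed times read from the progress bars: 1:03 = 63 s, 1:28 = 88 s.
multi = throughput(937, 64, 63)
multi_micro = throughput(937, 64, 88)

print(f'two GPUs: {multi:.0f} items/s')
print(f'two GPUs with microbatching: {multi_micro:.0f} items/s')
```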

Using several GPUs can save tens of hours of model training, or even more!

The next tutorial is about [different training procedures](./03_train_steps.ipynb).