# Scaling out (multi-node + multi-GPU)

*Lightning integrates with Horovod out of the box.*

### Horovod

Horovod is an [open source framework](https://horovod.readthedocs.io/en/stable/summary_include.html) backed by Uber for distributed deep learning, compatible with TensorFlow, PyTorch and MXNet.

Parallelism is achieved through [SPMD programming](https://en.wikipedia.org/wiki/SPMD) with MPI. The project's development was motivated by the following observation: 'Internally at Uber we found the MPI model to be much more straightforward and require far less code changes than previous solutions such as Distributed TensorFlow with parameter servers. Once a training script has been written for scale with Horovod, it can run on a single-GPU, multiple-GPUs, or even multiple hosts without any further code changes'.

Horovod builds on the core concepts of any MPI program ([more info](https://horovod.readthedocs.io/en/stable/concepts.html)):
* Size: number of processes
* Rank: unique process identifier
* Local rank: unique process identifier within the server
* AllReduce: operation that aggregates data across multiple processes and distributes the result back to all of them
* AllGather: operation that gathers data from all processes on every process
* Broadcast: operation that broadcasts data from one process to all other processes
* AllToAll: operation that exchanges data between all processes, each process sending a distinct chunk to every other process
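
The semantics of these concepts can be illustrated without MPI or Horovod at all. The sketch below is a pure-Python simulation in which each list entry stands for the data held by one process; the function names and the 2 servers x 2 GPUs layout are illustrative assumptions, not Horovod's API.

```python
from typing import Any, List, Sequence


def all_reduce(per_rank: Sequence[float]) -> List[float]:
    """Aggregate (here: sum) the values of all ranks and give the result to every rank."""
    total = sum(per_rank)
    return [total for _ in per_rank]


def all_gather(per_rank: Sequence[Any]) -> List[List[Any]]:
    """Every rank receives the full list of values contributed by all ranks."""
    return [list(per_rank) for _ in per_rank]


def broadcast(per_rank: Sequence[Any], root_rank: int = 0) -> List[Any]:
    """Every rank receives the value held by root_rank."""
    return [per_rank[root_rank] for _ in per_rank]


def all_to_all(per_rank: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Rank src sends its chunk number dst to rank dst (a transpose of the chunks)."""
    size = len(per_rank)
    return [[per_rank[src][dst] for src in range(size)] for dst in range(size)]


# Hypothetical layout: Size = 4 processes, spread over 2 servers with 2 GPUs each.
size = 4
gpus_per_server = 2
ranks = list(range(size))
# Assumes ranks are assigned contiguously, server after server.
local_ranks = [rank % gpus_per_server for rank in ranks]

print("rank       :", ranks)
print("local rank :", local_ranks)
print("AllReduce  :", all_reduce([1.0, 2.0, 3.0, 4.0]))
print("AllGather  :", all_gather([10, 11, 12, 13]))
print("Broadcast  :", broadcast(["a", "b", "c", "d"], root_rank=0))
print("AllToAll   :", all_to_all([[0, 1], [2, 3]]))
```

In data-parallel training, AllReduce is the operation that matters most: each process computes gradients on its own shard of the batch, and the aggregated gradients are shared back so that all model replicas stay in sync.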

Adapting existing code from pure TensorFlow + Keras or pure PyTorch to Horovod [takes just a few lines of code](https://horovod.readthedocs.io/en/stable/pytorch.html).
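
To give an idea of what those few lines look like for PyTorch, here is a minimal sketch following the steps listed in the linked Horovod documentation. The function `train_with_horovod`, its parameters, the SGD optimizer and the learning rate scaling are illustrative assumptions; the Horovod-specific lines are marked with `# Horovod:` comments.

```python
def train_with_horovod(model, dataset, loss_fn, epochs=1, base_lr=1e-3, batch_size=32):
    """Hypothetical training loop; meant to be launched through horovodrun."""
    # Imports are kept inside the function so that this file can be loaded
    # on a machine where Horovod is not installed.
    import torch
    import horovod.torch as hvd
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    # Horovod: initialize the library and pin one GPU per process.
    hvd.init()
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())
        model.cuda()

    # Horovod: give each process its own shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    # Horovod: scale the learning rate by the number of processes (assumed
    # convention) and wrap the optimizer so gradients are averaged with AllReduce.
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr * hvd.size())
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )

    # Horovod: start every process from the state held by rank 0.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards at every epoch
        for inputs, targets in loader:
            if use_cuda:
                inputs, targets = inputs.cuda(), targets.cuda()
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```

Everything outside the `# Horovod:` blocks is an ordinary PyTorch training loop, which is why the same script can run on one GPU or on several hosts.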

Fortunately, Lightning takes care of this burden for us. Simply set the `accelerator` option of `pl.Trainer` to `horovod`, then run the command below for single-node execution:

In [None]:
%%bash
horovodrun -np 4 python trainout.py > ${HOME}/.kosmoss/logs/trainout_gpu_horovod.stdout
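
For reference, the `accelerator` change described above could look like the sketch below. It assumes a Lightning 1.x release that accepts `accelerator="horovod"`, as stated in the text; the helper name `build_trainer` and the `max_epochs` value are hypothetical and not taken from `trainout.py`.

```python
def build_trainer(max_epochs=1):
    """Hypothetical helper returning a Trainer configured for Horovod."""
    # Imported here so the file can be loaded without Lightning installed.
    import pytorch_lightning as pl

    # gpus=1 means one GPU per process; the total number of processes
    # comes from the -np option passed to horovodrun.
    return pl.Trainer(accelerator="horovod", gpus=1, max_epochs=max_epochs)
```

The number of workers is therefore not hard-coded in the script: it is decided at launch time by `horovodrun`.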

Contrary to the native DistributedDataParallel approach detailed in the previous section, there is no need to adjust the learning rate `lr` this time: the Horovod integration takes care of that under the hood.
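
As a reminder of what such an adjustment usually amounts to, the sketch below applies the linear scaling rule, a common convention in data-parallel training: with `world_size` processes the effective batch size is multiplied by `world_size`, so the base learning rate is multiplied by the same factor. That this is the exact rule applied in the previous section or internally is an assumption; `scaled_learning_rate` is a hypothetical helper.

```python
def scaled_learning_rate(base_lr: float, world_size: int) -> float:
    """Linear scaling rule (assumed): lr grows with the number of processes."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return base_lr * world_size


# Base learning rate tuned for a single process (illustrative value).
base_lr = 1e-3
for world_size in (1, 4, 8):
    print(world_size, scaled_learning_rate(base_lr, world_size))
```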

To go multi-node, use the same mpirun-style syntax, adding the `-H` option to list each host with its number of process slots:

In [None]:
%%bash
horovodrun -np 8 -H hostname1:4,hostname2:4 python trainout.py > ${HOME}/.kosmoss/logs/trainout_gpu_horovod.stdout
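
In the command above, the value given to `-np` matches the sum of the slots listed after `-H` (4 + 4 = 8). The sketch below derives the process count from the host list to keep both in sync; `parse_hosts` and `horovodrun_command` are hypothetical helpers, not part of Horovod.

```python
from typing import Dict, List


def parse_hosts(hosts: str) -> Dict[str, int]:
    """Turn 'hostname1:4,hostname2:4' into {'hostname1': 4, 'hostname2': 4}."""
    slots = {}
    for entry in hosts.split(","):
        name, separator, count = entry.partition(":")
        if not separator or not count:
            raise ValueError(f"expected 'host:slots', got {entry!r}")
        slots[name] = int(count)
    return slots


def horovodrun_command(hosts: str, script: str) -> List[str]:
    """Build the argument list, with -np equal to the total number of slots."""
    total_processes = sum(parse_hosts(hosts).values())
    return ["horovodrun", "-np", str(total_processes), "-H", hosts, "python", script]


command = horovodrun_command("hostname1:4,hostname2:4", "trainout.py")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` to launch the job from Python instead of a `%%bash` cell.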