This repository has been archived by the owner on Feb 1, 2024. It is now read-only.


Lightning extension: Horovod


Horovod allows the same training script for single-GPU, multi-GPU, and multi-node training.

Like Distributed Data-Parallel (DDP), each Horovod process operates on a single GPU with a fixed subset of the data. Gradients are averaged across all GPUs in parallel during the backward pass, then applied synchronously before the next step begins.
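
This is the standard Horovod data-parallel pattern, which the strategy is expected to wire up for you. Below is a minimal, hedged sketch of that pattern using the raw horovod.torch API; the model and optimizer are placeholders and not part of this extension:

import horovod.torch as hvd
import torch

hvd.init()                               # one worker process per GPU
torch.cuda.set_device(hvd.local_rank())  # pin this process to its own GPU

model = torch.nn.Linear(32, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# average gradients across all workers during the backward pass
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# start every worker from identical weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)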

The number of worker processes is configured by a driver application (horovodrun or mpirun). Inside the training script, Horovod detects the number of workers from the environment and automatically scales the learning rate to compensate for the increased total batch size.
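
Linear scaling of this kind typically amounts to multiplying the base learning rate by the number of workers. A minimal sketch of the equivalent manual adjustment, assuming a placeholder base_lr:

import horovod.torch as hvd

hvd.init()
base_lr = 0.01
scaled_lr = base_lr * hvd.size()  # the effective batch size grows linearly with the worker count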

Horovod can be configured in the training script to run with any number of GPUs / processes as follows:

from lightning import Trainer
from lightning_horovod import HorovodStrategy

# train Horovod on GPU (each process uses one GPU; the total number of GPUs/machines is set on the command line)
trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1)

# train Horovod on CPU (the number of processes/machines is set on the command line)
trainer = Trainer(strategy=HorovodStrategy())
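
For reference, a complete train.py could look like the following. This is a minimal, hedged sketch: LitModel and the random dataset are placeholders, and only the Trainer arguments mirror the examples above.

import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning import LightningModule, Trainer


class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(640, 32), torch.randint(0, 2, (640,)))
    model = LitModel()
    trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1, max_epochs=1)
    trainer.fit(model, DataLoader(dataset, batch_size=32))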

When starting the training job, the driver application will then be used to specify the total number of worker processes:

# run training with 4 GPUs on a single machine
horovodrun -np 4 python train.py

# run training with 8 GPUs on two machines (4 GPUs each)
horovodrun -np 8 -H hostname1:4,hostname2:4 python train.py

See the official Horovod documentation for installation and performance tuning details.
