Deep Learning Training Acceleration with Bagua and Lightning AI
Bagua is a deep learning training acceleration framework that supports multiple advanced distributed training algorithms, including:
- Gradient AllReduce for centralized synchronous communication, where gradients are averaged among all workers.
- Decentralized SGD for decentralized synchronous communication, where each worker exchanges data with one or a few specific workers.
- ByteGrad and QAdam for low-precision communication, where data is compressed to low precision before it is communicated.
- Asynchronous Model Average for asynchronous communication, where workers are not required to stay synchronized iteration by iteration in lock-step.
By default, Bagua uses the Gradient AllReduce algorithm, the same algorithm implemented by PyTorch's DistributedDataParallel (DDP), but Bagua can usually deliver higher training throughput thanks to its communication backend written in Rust.
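Each of these algorithms can be selected by name through the BaguaStrategy class that ships with the Lightning integration (installation shown below). The snippet is a minimal sketch: the algorithm names (such as "gradient_allreduce", "bytegrad", "decentralized", "qadam", and "async") follow the Bagua documentation and should be verified against the version you have installed.

```python
from lightning import Trainer
from lightning_bagua import BaguaStrategy

# Sketch: pick one of Bagua's algorithms explicitly instead of the default.
# "decentralized" selects Decentralized SGD; the name is taken from the
# Bagua documentation and may differ in your installed version.
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="decentralized"),
    accelerator="gpu",
    devices=4,
)
```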
To use Bagua with Lightning, install the integration package:

```bash
pip install -U lightning-bagua
```
Simply set the strategy argument in the Trainer:
```python
from lightning import Trainer

# Train on 4 GPUs using Bagua (defaults to Gradient AllReduce)
trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)
```
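Some algorithms need additional setup beyond choosing a name. As a sketch of what QAdam might look like, the snippet below returns Bagua's QAdamOptimizer from configure_optimizers; the import path, constructor signature, and hyperparameters follow the Bagua documentation and should be treated as assumptions to verify for your version.

```python
from lightning import LightningModule, Trainer
from lightning_bagua import BaguaStrategy

# Assumption: import path and signature per the Bagua documentation.
from bagua.torch_api.algorithms.q_adam import QAdamOptimizer


class LitModel(LightningModule):
    # Hypothetical module for illustration; define your own
    # forward/training_step as usual.
    def configure_optimizers(self):
        # QAdam runs full-precision warm-up steps, then compresses
        # optimizer communication to low precision.
        return QAdamOptimizer(self.parameters(), lr=0.05, warmup_steps=100)


trainer = Trainer(
    strategy=BaguaStrategy(algorithm="qadam"),
    accelerator="gpu",
    devices=4,
)
```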
See the Bagua Tutorials for more details on installation and advanced features.