> This repository was archived by the owner on Sep 1, 2023, and is now read-only.


# Lightning ⚡ Bagua

Deep Learning Training Acceleration with Bagua and Lightning AI


Bagua is a deep learning training acceleration framework that supports multiple advanced distributed training algorithms, including:

  • Gradient AllReduce for centralized synchronous communication, where gradients are averaged among all workers.
  • Decentralized SGD for decentralized synchronous communication, where each worker exchanges data with one or a few specific workers.
  • ByteGrad and QAdam for low precision communication, where data is compressed into low precision before communication.
  • Asynchronous Model Average for asynchronous communication, where workers are not required to be synchronized in the same iteration in a lock-step style.
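To make the first bullet concrete: centralized Gradient AllReduce amounts to an elementwise average of each parameter's gradient across all workers. A minimal pure-Python sketch of that averaging step (illustrative only, no Bagua or distributed runtime required):

```python
def gradient_allreduce(worker_grads):
    """Average gradients elementwise across workers.

    worker_grads: list of per-worker gradient vectors (lists of floats),
    all of the same length. Returns the averaged gradient that every
    worker would hold after a centralized synchronous AllReduce.
    """
    n = len(worker_grads)
    return [sum(values) / n for values in zip(*worker_grads)]

# Three workers, each holding a 2-element gradient:
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(gradient_allreduce(grads))  # → [3.0, 4.0]
```

In a real deployment this averaging is performed by the communication backend over the network, not in Python; the sketch only shows the arithmetic the algorithm agrees on.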

By default, Bagua uses the Gradient AllReduce algorithm, which is also the algorithm implemented by PyTorch's DistributedDataParallel (DDP), but Bagua can usually achieve higher training throughput thanks to its communication backend written in Rust.

## Installation

```bash
pip install -U lightning-bagua
```

## Usage

Simply set the `strategy` argument of the `Trainer`:

```python
from lightning import Trainer

# train on 4 GPUs (using Bagua mode)
trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)
```
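Passing the string `"bagua"` selects the default Gradient AllReduce algorithm. To pick one of the other algorithms listed above, a strategy object can be constructed explicitly; the following sketch assumes `BaguaStrategy` is importable from the `lightning_bagua` package and accepts an `algorithm` argument (as described in the Bagua tutorials), so treat the exact import path and argument name as assumptions to verify against the docs:

```python
from lightning import Trainer
from lightning_bagua import BaguaStrategy  # assumed import path

# Select Decentralized SGD instead of the default Gradient AllReduce.
# Other values would name the remaining algorithms (e.g. low-precision
# or asynchronous variants); see the Bagua tutorials for the exact keys.
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="decentralized"),
    accelerator="gpu",
    devices=4,
)
```

This configuration fragment requires a GPU machine with Bagua installed to actually run.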

See the Bagua Tutorials for more details on installation and advanced features.