
QSync

Official repository for the IPDPS 2024 paper "QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".

Description

QSync explores removing unnecessary quantized operations to improve training accuracy. It achieves this through the following components:

  • A quantization perturbation indicator and a replayer for analyzing the global dataflow graph's memory and latency under mixed precision (Predictor)
  • An allocator for selecting the optimal quantized operations for training (Allocator / Syncer)
  • Support for low-precision backends (CUTLASS, cuDNN) (LP-PyTorch)

In particular, QSync addresses a specific practical scenario: hybrid-cluster training, which mixes inference GPUs with lower capabilities (memory, compute) and training GPUs with higher capabilities.

The provided scripts support both convolution-based and transformer-based models.

NOTE: This project is somewhat dated; the performance of its kernel implementations may not match the latest PyTorch.

Set Environment

Clone the repository (with submodules):
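
```bash
git clone --recursive https://github.com/bytedance/QSync.git
```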

Docker

  • Run build.sh under the dockerfile directory.
  • Run run.sh, specifying the necessary path mounts inside it.
  • Run pip install -e . in the root folder of QSync; kernel compilation will start (a combined sketch follows this list).
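
A minimal sketch of the Docker workflow above; the exact volume mounts and script contents depend on your setup:

```bash
# Build the image and start a container using the scripts under dockerfile/.
cd dockerfile
bash build.sh
bash run.sh            # edit this script first to set the path mounts you need

# Then, inside the container, from the QSync root folder:
pip install -e .       # kernel compilation starts here
```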

Manual Installation

  • Some libraries may be hard to install without a proxy. Change <abspath_to_root> in m_install.sh to the absolute path of the QSync root folder, then run the two steps below (a combined sketch follows the list):
  1. bash m_install.sh
  2. make
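
For example, a minimal sketch of these steps (the sed call is just one way to substitute the placeholder; /path/to/QSync is hypothetical):

```bash
# Replace the placeholder with the absolute path of your checkout
# (/path/to/QSync is hypothetical; use your own path).
sed -i 's|<abspath_to_root>|/path/to/QSync|g' m_install.sh

bash m_install.sh
make
```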

Usage

QSync is implemented under the qsync folder and is composed of syncer, predictor, and LpTorch.

  • To use LpTorch and convert your model to a mixed-bit-width model, use model = QModule(model) (see the sketch after this list).
  • See the corresponding pages for details on using the predictor and the syncer.
  • See the samples under benchmark_convs / benchmark_transformers.
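
A minimal sketch of the conversion step; the import path qsync.LpTorch and the ResNet-18 model are assumptions, and only model = QModule(model) comes from the note above:

```python
import torch
import torchvision

# Import path is an assumption based on the repo layout (qsync/LpTorch).
from qsync.LpTorch import QModule

# Any PyTorch model should work; ResNet-18 is only an illustrative choice.
model = torchvision.models.resnet18()

# Wrap the model so its operators can run at mixed bit-widths.
model = QModule(model)

# Training then proceeds as usual on the wrapped model.
inputs = torch.randn(8, 3, 224, 224)
outputs = model(inputs)
```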

Note that cross-node cost modeling is not as accurate as single-node modeling; extra effort is required to align the communication start times.
