This is the repository for the multi-level training framework. We provide the Coalescing, De-coalescing, and Interpolation operators described in the paper, together with an example of accelerating GPT-2 pre-training on Wikipedia-En. The framework is built on top of the Hugging Face transformers library.
Step 1
To use map_tools for model mapping, please install the following packages.
pip install torch==2.0.1+cu118 transformers==4.31.0
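You can optionally confirm the installation with a quick check in Python (the expected versions below simply echo the pinned packages above):

```python
# Optional sanity check for the pinned dependency versions.
import torch
import transformers

print(torch.__version__)          # expected: 2.0.1+cu118
print(transformers.__version__)   # expected: 4.31.0
print(torch.cuda.is_available())  # True if the CUDA build can see a GPU
```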
Step 2
If you want to run the pre-training acceleration example, please install the required packages as follows.
cd example
pip install -r requirements.txt
We implement the three operators that orchestrate the multi-level training framework in map_tools. With map_tools, it is convenient to resize and merge transformer models. The usage of map_tools can be found in the map_tools documentation.
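As a rough illustration of what the operators do, width coalescing can be thought of as projecting a weight matrix onto a smaller hidden dimension with a mapping matrix, de-coalescing as projecting it back, and interpolation as merging the de-coalesced weights with the original ones. The sketch below is purely conceptual (made-up shapes and a simple averaging mapping), not the actual map_tools API, which also handles layer coalescing and the remaining transformer parameters:

```python
import torch

def width_mapping(d_large: int, d_small: int) -> torch.Tensor:
    """Toy mapping matrix that averages groups of neighboring hidden
    dimensions (assumes d_large is a multiple of d_small)."""
    assert d_large % d_small == 0
    group = d_large // d_small
    M = torch.zeros(d_large, d_small)
    for j in range(d_small):
        M[j * group:(j + 1) * group, j] = 1.0 / group
    return M

# A toy weight matrix from the original (large) model.
W = torch.randn(768, 768)
M = width_mapping(768, 384)

W_coalesced = M.T @ W @ M               # coalescing: 768x768 -> 384x384
W_decoalesced = M @ W_coalesced @ M.T   # de-coalescing: back to 768x768

# Interpolation: merge the de-coalesced weights into the original ones.
ratio = 0.5
W_merged = ratio * W + (1.0 - ratio) * W_decoalesced
```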
To better illustrate the usage of map_tools and demonstrate the effectiveness of the multi-level training framework, we provide an example of accelerating GPT-2 pre-training on Wikipedia-En.
If you want to run the example, note that about 150 GB of disk space is required to preprocess the Wikipedia dataset.
In the original paper, the interpolation ratio is set based on preliminary experimental results. After acceptance, we found that this process can be refined further in an adaptive manner.
We have implemented an adaptive mechanism to determine the interpolation ratio dynamically. First, we normalize the parameters of the de-coalesced model so that their scale aligns with that of the original model before coalescing. This normalization balances the parameters of the two models. Then we simply merge all parameters with a ratio of 0.5. We term this process adaptive interpolation, since it effectively merges parameters with different ratios determined by the normalization. Our experiments show that adaptive interpolation can save around a further 10% of training FLOPs.
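A minimal sketch of the idea is shown below. It is a hypothetical helper, not the map_tools implementation, and it assumes one plausible normalization (matching per-tensor Frobenius norms); the exact normalization used by map_tools may differ:

```python
import torch

def adaptive_interpolate(original: dict, decoalesced: dict, ratio: float = 0.5) -> dict:
    """Sketch of adaptive interpolation: rescale each de-coalesced tensor so
    its norm matches the corresponding original tensor, then merge the two
    state dicts with a fixed ratio."""
    merged = {}
    for name, w_orig in original.items():
        w_dec = decoalesced[name]
        scale = w_orig.norm() / (w_dec.norm() + 1e-12)   # normalization factor
        merged[name] = ratio * w_orig + (1.0 - ratio) * (scale * w_dec)
    return merged

# Usage with two state dicts of identical shapes:
# merged_sd = adaptive_interpolate(model_orig.state_dict(), model_dec.state_dict())
```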
To enable adaptive interpolation in the multi-level training, simply pass the --norm-src flag to map_tools during the model interpolation phase.
Supported model architectures:
- BERT
- GPT2
- LLaMA
- DeiT
Please cite our paper if you find this repo helpful:
@inproceedings{zou2024a,
  title={A Multi-Level Framework for Accelerating Training Transformer Models},
  author={Longwei Zou and Han Zhang and Yangdong Deng},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=BI1N3lTWtn}
}