This repository contains the official code for Deep Incubation: Training Large Models by Divide-and-Conquering.
Title: Deep Incubation: Training Large Models by Divide-and-Conquering
Authors: Zanlin Ni*, Yulin Wang*, Jiangwei Yu, Haojun Jiang, Yue Cao, Gao Huang (Corresponding Author)
Institute: Tsinghua University and Beijing Academy of Artificial Intelligence (BAAI)
Publication: arXiv preprint (arXiv:2212.04129)
Contact: nzl22 at mails dot tsinghua dot edu dot cn
- Dec 22, 2022: release all pre-trained models on ImageNet-1K, including models tuned at higher resolutions.
- Dec 15, 2022: release pre-trained models for ViT-B, ViT-L and ViT-H on ImageNet-1K.
- Dec 13, 2022: release pre-trained meta models for ViT-B, ViT-L and ViT-H on ImageNet-1K.
- Dec 10, 2022: release code for training ViT-B, ViT-L and ViT-H on ImageNet-1K.
Our final models and the pre-trained meta models are all available at 🤗 Hugging Face. Please follow the instructions in EVAL.md and TRAINING.md for their usage.
In this paper, we present a divide-and-conquer strategy for training large models. Our algorithm, Deep Incubation, divides a large model into smaller modules, optimizes them independently, and then assembles them. Though conceptually simple, our method significantly outperforms end-to-end (E2E) training in terms of both training efficiency and final accuracy. For example, on ViT-H, Deep Incubation outperforms E2E training by 2.7% in accuracy, or achieves comparable performance with 4x less training time.
- The ImageNet dataset should be prepared as follows:
data
├── train
│ ├── folder 1 (class 1)
│ ├── folder 2 (class 2)
│ ├── ...
├── val
│ ├── folder 1 (class 1)
│ ├── folder 2 (class 2)
│ ├── ...
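This layout follows the standard one-folder-per-class convention (as expected by, e.g., torchvision's `ImageFolder`). As a quick sanity check before training, the following sketch (stdlib only; the `data` root path and function name are our own, not part of this repo) verifies that both splits exist and contain the same set of class folders:

```python
from pathlib import Path

def check_imagenet_layout(root):
    """Verify that root/train and root/val exist and contain
    the same set of class subfolders (one folder per class)."""
    root = Path(root)
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Collect the class folder names under this split.
        classes[split] = {d.name for d in split_dir.iterdir() if d.is_dir()}
    if classes["train"] != classes["val"]:
        raise ValueError("train and val class folders do not match")
    return sorted(classes["train"])
```

For example, `check_imagenet_layout("data")` returns the sorted list of class folder names (1000 entries for ImageNet-1K) or raises an error describing what is missing.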
See EVAL.md for the pre-trained models and the evaluation instructions.
See TRAINING.md for the training instructions.
If you find our work helpful, please star🌟 this repo and cite📑 our paper. Thanks for your support!
@article{Ni2022Incub,
title={Deep Incubation: Training Large Models by Divide-and-Conquering},
author={Ni, Zanlin and Wang, Yulin and Yu, Jiangwei and Jiang, Haojun and Cao, Yue and Huang, Gao},
journal={arXiv preprint arXiv:2212.04129},
year={2022}
}
Our implementation is mainly based on deit. We thank the authors for their clean codebase.
If you have any questions or concerns, please send mail to nzl22@mails.tsinghua.edu.cn.