
A PyTorch toolkit for extremely fast ImageNet training with NVIDIA DALI.



This is a PyTorch toolkit for accelerating ImageNet training in distributed mode with NVIDIA DALI. Despite its extremely high training speed, the toolkit achieves accuracy similar to or higher than that reported in the original papers.

Main Requirements (recommended)

  • python >= 3.7.0
  • pytorch >= 1.2.0
  • CUDA >= 10.0
  • nvidia-dali-cuda100 >= 0.23.0
  • mmcv >= 1.0.5
  • mxnet >= 1.6.0 (only used for preparing the dataset)
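Before training, it can be useful to confirm the environment meets these minimums. A minimal sketch, assuming nothing about this toolkit's own code: `meets_minimum` is a hypothetical helper that compares dotted version strings against the requirements listed above.

```python
def meets_minimum(installed: str, required: str) -> bool:
    """Return True if a dotted version string satisfies a minimum, e.g. '1.6.0' >= '1.2.0'."""
    def to_tuple(v: str):
        return tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

# Minimum versions from the requirements list above.
MINIMUMS = {"python": "3.7.0", "pytorch": "1.2.0", "nvidia-dali-cuda100": "0.23.0", "mmcv": "1.0.5"}

print(meets_minimum("1.6.0", MINIMUMS["pytorch"]))  # True: PyTorch 1.6.0 satisfies >= 1.2.0
```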


  1. As shown in the official evaluation, the MXNet data format offers notable speed advantages, so we use the MXNet .rec dataset format for training. Please prepare the ImageNet dataset as .rec files following "Create a Dataset Using RecordIO".

  2. Train your model with the following script. We recommend running the training code on 8 GPUs with a total batch size of 128x8. Taking MobileNetV2 training as an example:

sh ./scripts/ 8 ./imagenet-rec
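The recommended setup above uses a per-GPU batch size of 128 across 8 GPUs, i.e. a total batch size of 1024. A minimal sketch of the usual linear learning-rate scaling rule for such a setup; `scaled_lr` and the base learning rate of 0.1 at batch 256 are assumptions for illustration, not values from this repository.

```python
def scaled_lr(base_lr: float, per_gpu_batch: int, num_gpus: int, base_batch: int = 256) -> float:
    """Linear LR scaling rule: the learning rate grows with total batch size relative to a base batch."""
    total_batch = per_gpu_batch * num_gpus
    return base_lr * total_batch / base_batch

total_batch = 128 * 8                 # total batch size from the recommended setup
print(total_batch)                    # 1024
print(scaled_lr(0.1, 128, 8))         # 0.4, with an assumed base lr of 0.1 at batch 256
```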


We evaluate our toolkit on several popular networks as follows, achieving accuracy similar to or higher than that reported in the original papers. All experiments are performed on 8 TITAN-XP GPUs.

Our pre-trained models and corresponding training logs can be downloaded at DALI_MODEL_ZOO.

| Model | Reported Top-1 (%) | Top-1 (%) | Epochs | Time w/ DALI |
| --- | --- | --- | --- | --- |
| ResNet18 | 69.76* | 72.15 | 240 | 16h |
| MobileNetV2 | 72.0 | 72.94 | 240 | 1d4.5h |
| MobileNetV3 | 75.2 | 75.07 | 400 | 2d1.5h |
| GhostNet 1.0x | 73.9 | 73.97 | 400 | 1d21.5h |

* denotes the result reported in the torchvision model zoo.
"d": day; "h": hour.

Comparison between training w/ and w/o DALI:

| Model | Method | Top-1 (%) | Time |
| --- | --- | --- | --- |
| ResNet18 | w/ DALI | 72.15 | 16h |
| ResNet18 | w/o DALI | 72.01 | 47.5h |

The w/o DALI setting is based on the LMDB data format from the codebase of DenseNAS.
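From the comparison above, the wall-clock speedup that DALI provides for ResNet18 can be computed directly:

```python
# Training-time speedup of ResNet18 with DALI, using the numbers from the table above.
with_dali_hours = 16.0
without_dali_hours = 47.5
speedup = without_dali_hours / with_dali_hours
print(f"{speedup:.2f}x")  # roughly 2.97x faster wall-clock training
```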


This project is in part supported by Horizon Robotics.

The code of some models comes from ghostnet and mobilenetv3. Thanks to the contributors of these repositories.


This repository is partially based on DenseNAS. If you find this toolkit helpful in your research, please cite

@inproceedings{fang2020densely,
  title={Densely connected search space for more flexible neural architecture search},
  author={Fang, Jiemin and Sun, Yuzhu and Zhang, Qian and Li, Yuan and Liu, Wenyu and Wang, Xinggang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

