Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

This repository contains PyTorch training code for the AAAI 2023 paper.

Comparison of different models across accuracy and training-time trade-offs.

Usage

Requirements

torch>=1.8.0
torchvision>=0.9.0
timm==0.4.5

Data preparation: download and extract the ImageNet images from http://image-net.org/. The directory structure should be:

│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
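
Before launching a long training run, it can help to confirm that the extracted dataset matches the layout above. The following is a minimal sketch using only the Python standard library; the helper name `check_imagenet_layout` is hypothetical and not part of this repository, and it only checks that `train/` and `val/` exist and counts their class folders.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Count class folders under train/ and val/ (hypothetical helper).

    Raises FileNotFoundError if either split directory is missing.
    """
    root = Path(root)
    counts = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is expected to be a sub-folder such as n01440764.
        counts[split] = sum(1 for d in split_dir.iterdir() if d.is_dir())
    return counts


# Example call (the path is whatever you pass to --data-path):
# print(check_imagenet_layout("/datasets/imagenet"))
```

For a full ImageNet-1k extraction, both counts would be expected to be 1000; a smaller number usually points to an incomplete extraction.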

Training

To train the model on ImageNet from scratch, run:

DeiT

CUDA_VISIBLE_DEVICES="0,1,2,3" python3 -m torch.distributed.launch --nproc_per_node=4 --use_env main.py \
                               --model deit_tiny_patch16_224_attn_dst \
                               --batch-size 512 \
                               --data-path /datasets/imagenet \
                               --keep_ratio 0.9 \
                               --attn_ratio 0.1 \
                               --output_dir output_dir \
                               --remove-n 64058

You can train models with different ratios by adjusting the token keep ratio (--keep_ratio) and the attention ratio (--attn_ratio). For the example-level ratio, change the number of examples to remove and restore with --remove-n, and adjust the random removal performed before training through train_example_idx and removed_example_idx in main.py.
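
To give a feel for the example-level setting, the sketch below randomly partitions dataset indices into a kept set and a removed set. The variable names `train_example_idx` and `removed_example_idx` mirror those in main.py, but the helper `split_examples`, its seeding, and the assumption that the dataset is the ImageNet-1k training set (1,281,167 images) are illustrative only; the actual logic in main.py may differ.

```python
import random

# Size of the ImageNet-1k training set (assumed dataset for this sketch).
IMAGENET_TRAIN_SIZE = 1_281_167


def split_examples(num_examples, remove_n, seed=0):
    """Randomly split indices into kept and removed sets (hypothetical helper)."""
    if not 0 <= remove_n <= num_examples:
        raise ValueError("remove_n must be between 0 and num_examples")
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)
    # The first remove_n shuffled indices are dropped; the rest are kept.
    removed_example_idx = sorted(indices[:remove_n])
    train_example_idx = sorted(indices[remove_n:])
    return train_example_idx, removed_example_idx


train_example_idx, removed_example_idx = split_examples(IMAGENET_TRAIN_SIZE, 64058)
removed_fraction = len(removed_example_idx) / IMAGENET_TRAIN_SIZE
print(f"kept={len(train_example_idx)} removed={len(removed_example_idx)} "
      f"({removed_fraction:.1%} of the training set)")
```

Under this assumption, the value 64058 used in the command above corresponds to removing about 5% of the training examples, so a different example-level ratio can be obtained by scaling that number accordingly.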

For DeiT-S and DeiT-B, use the same command and replace the --model value with the corresponding model name, following the DeiT naming.

Swin Transformer

cd Swin

CUDA_VISIBLE_DEVICES="0,1,2,3" python3 -m torch.distributed.launch --nproc_per_node=4 --use_env main.py \
                               --cfg configs/swin/swin_tiny_patch4_window7_224_token.yaml \
                               --batch-size 128 \
                               --data-path /datasets/imagenet \
                               --output output_dir

License

MIT License

Acknowledgements

Our code is based on pytorch-image-models, DeiT, and Swin-Transformer.

Citation

If you find our work useful in your research, please consider citing:

@article{kong2022peeling,
  title={Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training},
  author={Kong, Zhenglun and Ma, Haoyu and Yuan, Geng and Sun, Mengshu and Xie, Yanyue and Dong, Peiyan and Meng, Xin and Shen, Xuan and Tang, Hao and Qin, Minghai and others},
  journal={arXiv preprint arXiv:2211.10801},
  year={2022}
}
