
Efficient Vision Transformer

This repository contains the implementation for the paper Training a Vision Transformer from scratch in less than 24 hours with 1 GPU, published at the HiTY workshop at NeurIPS 2022.

The implementation provides PyTorch training and evaluation code based on DeiT. We also use and adapt some code from LocalViT, timm, and torchvision.

All experiments build on the DeiT-small model and aim to make training more efficient in terms of time (24 hours) and hardware (1 GPU). The changes include removing warm-up, an improved LocalViT model, and our own multi-size training. The code also supports LayerScale.

Our best results are reported in the paper.

Before using the code, make sure you have Ross Wightman's pytorch-image-models package installed (timm==0.3.2).

Usage

First, clone the repository locally:
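
git clone https://github.com/BorealisAI/efficient-vit-training.git
cd efficient-vit-training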

Then, install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder, with the training and validation data expected in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg

Training

In all experiments with 1 GPU, we use --batch-size 64 and --lr 1e-3. (If you want to experiment with 4 GPUs, use --batch-size 128 and --lr 2e-4.) We stop training after 1 day.

To train the network with the best configuration on 1 GPU, run varsize_1gpu_best.sh with your own paths.
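
For example, after editing the dataset and output paths inside the script to match your setup (assuming a bash-compatible shell):

bash varsize_1gpu_best.sh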

Results

To plot the accuracy-over-time results, use plot_output.py with your own paths.

Cite

If you use the idea or code, please cite the paper:

@misc{irandoust2022training,
      title={{Training a Vision Transformer from scratch in less than 24 hours with 1 GPU}}, 
      author={Saghar Irandoust and Thibaut Durand and Yunduz Rakhmangulova and Wenjie Zi and Hossein Hajimirsadeghi},
      year={2022},
      eprint={2211.05187},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
