# Three things everyone should know about Vision Transformers

This repository contains PyTorch evaluation code, training code, and pretrained models for the following projects:

- DeiT (Data-Efficient Image Transformers), ICML 2021
- CaiT (Going deeper with Image Transformers), ICCV 2021 (Oral)
- ResMLP (ResMLP: Feedforward networks for image classification with data-efficient training)
- PatchConvnet (Augmenting Convolutional networks with attention-based aggregation)
- 3Things (Three things everyone should know about Vision Transformers)
- DeiT III (DeiT III: Revenge of the ViT)

For details, see [*Three things everyone should know about Vision Transformers*](https://arxiv.org/abs/2203.09795) by Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek and Hervé Jégou.

If you use this code for a paper, please cite:

```bibtex
@article{Touvron2022ThreeTE,
  title={Three things everyone should know about Vision Transformers},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Jakob Verbeek and Herve Jegou},
  journal={arXiv preprint arXiv:2203.09795},
  year={2022},
}
```

## Attention-only fine-tuning

We propose to fine-tune only the attention layers (flag `--attn-only`) to adapt models to higher resolutions or to perform transfer learning.
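
For illustration, here is a minimal sketch of what attention-only fine-tuning amounts to, assuming a `timm` ViT whose attention parameters live in submodules named `attn`; the model name, the choice to keep the classifier head trainable, and the learning rate are placeholder assumptions for the sketch, not the repository's exact `--attn-only` logic:

```python
import torch
import timm

# Placeholder model choice; any ViT-style model with `attn` submodules works.
model = timm.create_model("deit_small_patch16_224", pretrained=True)

# Train only the attention parameters (the effect of the --attn-only flag);
# here the classifier head is also left trainable for transfer learning.
for name, param in model.named_parameters():
    param.requires_grad = ("attn" in name) or name.startswith("head")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,  # placeholder learning rate, not the paper's setting
)
```

Only the attention weights (roughly a third of a ViT's parameters) are updated, which makes this an economical way to adapt a pretrained model to a new resolution or task.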

## MLP patch projection

We propose to replace the linear patch projection with an MLP patch projection (see class `hMLP_stem`). A key advantage is that this pre-processing stem is compatible with, and improves, mask-based self-supervised training such as BeiT.
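
As a rough illustration, here is a minimal sketch of a hierarchical MLP stem in the spirit of `hMLP_stem`: each 16×16 patch passes through three small per-patch linear stages (written as non-overlapping strided convolutions), so patches never mix. The stage widths, normalization, and activation below are assumptions for the sketch, not the repository's definition:

```python
import torch
import torch.nn as nn

class HMLPStem(nn.Module):
    """Illustrative hierarchical MLP stem (in the spirit of hMLP_stem).

    Each 16x16 patch is embedded through three per-patch linear stages,
    expressed as non-overlapping strided convs (4x4/4, then 2x2/2 twice),
    so masking patches before or after the stem is equivalent.
    """

    def __init__(self, in_chans=3, embed_dim=384):  # widths are assumptions
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 4, kernel_size=4, stride=4),
            nn.BatchNorm2d(embed_dim // 4),
            nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim // 4, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim // 4),
            nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim),
        )

    def forward(self, x):
        x = self.proj(x)                     # (B, D, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, N, D) token sequence

# Example: a 224x224 image yields 196 tokens of dimension 384.
tokens = HMLPStem(embed_dim=384)(torch.randn(2, 3, 224, 224))
```

Because no stage crosses patch boundaries, masking input patches or masking output tokens gives the same result, which is why this stem composes well with BeiT-style masked pretraining.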

## Parallel blocks

We propose to use blocks in parallel in order to obtain more flexible architectures (see class `Layer_scale_init_Block_paralx2`).
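
For intuition, here is a minimal sketch of two blocks applied in parallel rather than sequentially (LayerScale and the other details of `Layer_scale_init_Block_paralx2` are omitted; the modules and hyper-parameters below are illustrative):

```python
import torch
import torch.nn as nn

def _mlp(dim, hidden):
    return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

class ParallelBlockx2(nn.Module):
    """Illustrative pair of transformer blocks applied in parallel:
        x = x + attn_1(norm(x)) + attn_2(norm(x))
        x = x + mlp_1(norm(x)) + mlp_2(norm(x))
    instead of chaining two full blocks sequentially.
    """

    def __init__(self, dim, num_heads=6, mlp_ratio=4.0):
        super().__init__()
        self.norm1a, self.norm1b = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2a, self.norm2b = nn.LayerNorm(dim), nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp1, self.mlp2 = _mlp(dim, hidden), _mlp(dim, hidden)

    def forward(self, x):
        ya, yb = self.norm1a(x), self.norm1b(x)
        x = (x + self.attn1(ya, ya, ya, need_weights=False)[0]
               + self.attn2(yb, yb, yb, need_weights=False)[0])
        x = x + self.mlp1(self.norm2a(x)) + self.mlp2(self.norm2b(x))
        return x

# Example: same input/output shape as two standard sequential blocks.
out = ParallelBlockx2(dim=384)(torch.randn(2, 197, 384))
```

Summing the branches halves the sequential depth for the same parameter count, which is the flexibility this design trades on.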

## License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

## Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.