# Three things everyone should know about Vision Transformers

This repository contains PyTorch evaluation code, training code, and pretrained models for the following projects:

- DeiT (Data-Efficient Image Transformers), ICML 2021
- CaiT (Going deeper with Image Transformers), ICCV 2021 (Oral)
- ResMLP (ResMLP: Feedforward networks for image classification with data-efficient training)
- PatchConvnet (Augmenting Convolutional networks with attention-based aggregation)
- 3Things (Three things everyone should know about Vision Transformers)
- DeiT III (DeiT III: Revenge of the ViT)

For details, see [*Three things everyone should know about Vision Transformers*](https://arxiv.org/abs/2203.09795) by Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek and Hervé Jégou.

If you use this code for a paper, please cite:

```bibtex
@article{Touvron2022ThreeTE,
  title={Three things everyone should know about Vision Transformers},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Jakob Verbeek and Herve Jegou},
  journal={arXiv preprint arXiv:2203.09795},
  year={2022},
}
```

## Attention-only fine-tuning

We propose to fine-tune only the attention layers (flag `--attn-only`) to adapt models to higher resolutions or to perform transfer learning.
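
For illustration, here is a minimal sketch of what attention-only fine-tuning amounts to, assuming a `timm` ViT whose attention parameters live in submodules named `attn`; the model name, the choice to keep the classifier head trainable, and the learning rate are placeholder assumptions for the sketch, not the repository's exact `--attn-only` logic:

```python
import torch
import timm

# Placeholder model choice; any ViT-style model with `attn` submodules works.
model = timm.create_model("deit_small_patch16_224", pretrained=True)

# Train only the attention parameters (the effect of the --attn-only flag);
# here the classifier head is also left trainable for transfer learning.
for name, param in model.named_parameters():
    param.requires_grad = ("attn" in name) or name.startswith("head")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,  # placeholder learning rate, not the paper's setting
)
```

Only the attention weights (roughly a third of a ViT's parameters) are updated, which makes this an economical way to adapt a pretrained model to a new resolution or task.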

## MLP patch projection

We propose to replace the linear patch projection with an MLP patch projection (see class `hMLP_stem`). A key advantage is that this pre-processing stem is compatible with, and improves, mask-based self-supervised training such as BeiT.
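
As a rough illustration, here is a minimal sketch of a hierarchical MLP stem in the spirit of `hMLP_stem`: each 16×16 patch passes through three small per-patch linear stages (written as non-overlapping strided convolutions), so patches never mix. The stage widths, normalization, and activation below are assumptions for the sketch, not the repository's definition:

```python
import torch
import torch.nn as nn

class HMLPStem(nn.Module):
    """Illustrative hierarchical MLP stem (in the spirit of hMLP_stem).

    Each 16x16 patch is embedded through three per-patch linear stages,
    expressed as non-overlapping strided convs (4x4/4, then 2x2/2 twice),
    so masking patches before or after the stem is equivalent.
    """

    def __init__(self, in_chans=3, embed_dim=384):  # widths are assumptions
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 4, kernel_size=4, stride=4),
            nn.BatchNorm2d(embed_dim // 4),
            nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim // 4, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim // 4),
            nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim),
        )

    def forward(self, x):
        x = self.proj(x)                     # (B, D, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, N, D) token sequence

# Example: a 224x224 image yields 196 tokens of dimension 384.
tokens = HMLPStem(embed_dim=384)(torch.randn(2, 3, 224, 224))
```

Because no stage crosses patch boundaries, masking input patches or masking output tokens gives the same result, which is why this stem composes well with BeiT-style masked pretraining.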

## Parallel blocks

We propose to use blocks in parallel in order to obtain more flexible architectures (see class `Layer_scale_init_Block_paralx2`).
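
For intuition, here is a minimal sketch of two blocks applied in parallel rather than sequentially (LayerScale and the other details of `Layer_scale_init_Block_paralx2` are omitted; the modules and hyper-parameters below are illustrative):

```python
import torch
import torch.nn as nn

def _mlp(dim, hidden):
    return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

class ParallelBlockx2(nn.Module):
    """Illustrative pair of transformer blocks applied in parallel:
        x = x + attn_1(norm(x)) + attn_2(norm(x))
        x = x + mlp_1(norm(x)) + mlp_2(norm(x))
    instead of chaining two full blocks sequentially.
    """

    def __init__(self, dim, num_heads=6, mlp_ratio=4.0):
        super().__init__()
        self.norm1a, self.norm1b = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2a, self.norm2b = nn.LayerNorm(dim), nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp1, self.mlp2 = _mlp(dim, hidden), _mlp(dim, hidden)

    def forward(self, x):
        ya, yb = self.norm1a(x), self.norm1b(x)
        x = (x + self.attn1(ya, ya, ya, need_weights=False)[0]
               + self.attn2(yb, yb, yb, need_weights=False)[0])
        x = x + self.mlp1(self.norm2a(x)) + self.mlp2(self.norm2b(x))
        return x

# Example: same input/output shape as two standard sequential blocks.
out = ParallelBlockx2(dim=384)(torch.randn(2, 197, 384))
```

Summing the branches halves the sequential depth for the same parameter count, which is the flexibility this design trades on.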

## License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

## Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.