
Implementation of the original vision transformer, for self-educational purposes. Trains from scratch on any image classification dataset, such as ImageNet-1k.


caiocj1/vit-learning


Vision Transformer Implementation


Implementation of the vision transformer (ViT) from [1] in PyTorch, for self-educational purposes. The model is trained from scratch, and training uses multiple GPUs through nn.DataParallel.
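The architecture from [1] can be sketched in a few dozen lines of PyTorch. The block below is a minimal illustration under stated assumptions, not this repository's code: the class name ViT and all hyperparameters are hypothetical, and it swaps in PyTorch's built-in nn.TransformerEncoder (configured as pre-norm with GELU) instead of a hand-written encoder. It also shows one way to wrap the model in nn.DataParallel.

```python
import torch
import torch.nn as nn


class ViT(nn.Module):
    """Minimal ViT classifier sketch (hypothetical, not the repository's code)."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, num_classes=1000,
                 dim=768, depth=12, heads=12, mlp_dim=3072, dropout=0.1):
        super().__init__()
        if image_size % patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        num_patches = (image_size // patch_size) ** 2  # 224 / 16 = 14 -> 196 patches
        # A conv with kernel = stride = patch_size is equivalent to flattening each
        # non-overlapping patch and applying one shared linear projection.
        self.patch_embed = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)
        # Learnable [class] token and learnable 1D position embeddings (one per token).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        nn.init.trunc_normal_(self.cls_token, std=0.02)
        nn.init.trunc_normal_(self.pos_embed, std=0.02)
        self.dropout = nn.Dropout(dropout)
        # Pre-norm encoder blocks (LayerNorm before attention and MLP) with GELU MLPs.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, dim_feedforward=mlp_dim,
                                           dropout=dropout, activation="gelu",
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth, enable_nested_tensor=False)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # (B, C, H, W) -> (B, dim, H/P, W/P) -> (B, N, dim)
        x = self.patch_embed(x).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed  # (B, N + 1, dim)
        x = self.encoder(self.dropout(x))
        # Classify from the final (layer-normed) representation of the [class] token.
        return self.head(self.norm(x[:, 0]))


# Small ViT-Tiny-like configuration (assumed values for illustration).
model = ViT(num_classes=10, dim=192, depth=12, heads=3, mlp_dim=768)
if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU and splits each input batch along dim 0.
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```

With the defaults (224×224 input, 16×16 patches) the encoder sees 14 × 14 = 196 patch tokens plus the [class] token, i.e. 197 tokens. The head here is a single linear layer; [1] uses an MLP head with one hidden layer during pre-training and a single linear layer for fine-tuning. PyTorch's documentation recommends DistributedDataParallel over nn.DataParallel for multi-GPU training; the wrapper appears here only because this README names it.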

Usage

To launch training, run python main.py -v <version_name> -i <path_to_dataset>.
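As a rough sketch of how the two flags might be parsed, here is a hypothetical argparse setup. Only the short flags -v and -i come from the command above; the long names --version and --input, the help strings, and the function parse_args are assumptions, not taken from main.py.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser: only the short flags -v and -i are documented;
    # the long option names are assumptions, not taken from main.py.
    parser = argparse.ArgumentParser(description="Train a vision transformer from scratch.")
    parser.add_argument("-v", "--version", required=True,
                        help="name of this training run (version_name)")
    parser.add_argument("-i", "--input", required=True,
                        help="dataset root containing train/ and val/ folders")
    return parser.parse_args(argv)


# Equivalent to: python main.py -v vit_base_run1 -i /path/to/imagenet
args = parse_args(["-v", "vit_base_run1", "-i", "/path/to/imagenet"])
print(args.version, args.input)
```

Passing an explicit argument list, as in the last lines, is convenient for testing; calling parse_args() with no arguments reads sys.argv instead.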

Make sure the given path contains train and val folders, each holding the images in one subfolder per class.
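This layout appears to follow the usual one-folder-per-class convention (the same convention torchvision.datasets.ImageFolder reads), e.g. root/train/<class_name>/<image> and root/val/<class_name>/<image>. A small standard-library check such as the hypothetical check_dataset_layout below can catch a wrong path before a long run starts; requiring identical class folders in train and val is an extra assumption of this sketch.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Return sorted class names if root/train and root/val share the same class folders.

    Assumed layout (hypothetical example names):
        root/train/<class_name>/<image files>
        root/val/<class_name>/<image files>
    """
    root = Path(root)
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        # One subfolder per class; sorting gives a stable class -> index order.
        classes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not classes[split]:
            raise ValueError(f"no class folders found in {split_dir}")
    if classes["train"] != classes["val"]:
        raise ValueError("train and val contain different class folders")
    return classes["train"]
```

Sorting the folder names mirrors how ImageFolder assigns class indices, so the returned list's position for each name matches the label ImageFolder would give it.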

To track training, run tensorboard --logdir tb_logs.
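The README does not say how the logs under tb_logs are written. One common option, shown here purely as an assumption, is torch.utils.tensorboard.SummaryWriter (which needs the tensorboard package installed) with one subdirectory per run, so runs launched with different -v values appear as separate runs. make_writer, log_epoch, and the scalar tag names are hypothetical.

```python
import os

from torch.utils.tensorboard import SummaryWriter


def make_writer(version_name, log_root="tb_logs"):
    # Hypothetical: one subdirectory per run, e.g. tb_logs/<version_name>/.
    return SummaryWriter(log_dir=os.path.join(log_root, version_name))


def log_epoch(writer, epoch, train_loss, val_accuracy):
    # Record per-epoch scalars; the tag names are illustrative assumptions.
    writer.add_scalar("loss/train", train_loss, global_step=epoch)
    writer.add_scalar("accuracy/val", val_accuracy, global_step=epoch)
    writer.flush()  # make the values visible to a running TensorBoard promptly
```

Pointing tensorboard --logdir tb_logs at the parent directory then lists every run subdirectory side by side.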

Useful repositories:

References

[1] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020). https://arxiv.org/abs/2010.11929.
