Maevit


(1) Unofficial implementation of the Vision Transformer and its variants: ViT-Lite, Compact Vision Transformer (CVT) and Compact Convolutional Transformer (CCT).

For more information, please refer to my blog post: Event Horizon: Vision Transformers.

(2) Unofficial implementation of Swin Transformer.

For more information, please refer to my blog post: Event Horizon: Swin Transformer复现 (i.e. Swin Transformer reimplementation).

For the Swin Transformer, the guide for running the code is omitted, since it closely resembles that of CCT.


Requirements

Run the following to install the required dependencies:

python3 -m pip install -r requirements.txt
torch        >= 1.7.0
torchvision  >= 0.8.1
numpy        >= 1.18.5
matplotlib   >= 3.3.1
tensorboard  >= 2.4.0
timm         >= 0.4.5
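If you want to double-check the installed versions against the table above, a small helper like the following works. It is not part of this repo, just a hypothetical snippet:

from importlib.metadata import version, PackageNotFoundError  # Python >= 3.8

# Hypothetical dependency check (not part of this repo): compares installed
# package versions against the minimums listed in the table above.
MINIMUM = {
    "torch": "1.7.0", "torchvision": "0.8.1", "numpy": "1.18.5",
    "matplotlib": "3.3.1", "tensorboard": "2.4.0", "timm": "0.4.5",
}

def as_tuple(v):
    # "1.13.1+cu117" -> (1, 13, 1); good enough for these version strings
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

for name, minimum in MINIMUM.items():
    try:
        installed = version(name)
        ok = as_tuple(installed) >= as_tuple(minimum)
        print(f"{name:<12} {installed:<12} need >= {minimum}  {'OK' if ok else 'TOO OLD'}")
    except PackageNotFoundError:
        print(f"{name:<12} MISSING      need >= {minimum}")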

In addition, cuDNN is recommended, since cuDNN acceleration is enabled in the main executable train.py (see the sketch below).

CUDA is also required. I guess you won't want to fry eggs with your CPU for several days.
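For reference, enabling cuDNN acceleration in PyTorch usually boils down to the following switches. This is only a sketch of what train.py presumably sets, not a copy of it:

import torch

# Typical cuDNN switches; the exact lines in train.py may differ.
torch.backends.cudnn.enabled = True     # use cuDNN kernels when available
torch.backends.cudnn.benchmark = True   # autotune conv algorithms for fixed-size inputs
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")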


Repo Structure

.
├── logs/ --- tensorboard log storage (compulsory)
├── model/ --- folder from and to which the models are loaded & stored (compulsory)
├── check_points/ --- checkpoint folder (compulsory)
├── train.sh --- Script for training
├── train.py --- Python main module
├── plot.sh --- quick tensorboard initialization script
├── test/ --- quick testing scripts
│    └── ...
├── swin/
│    ├── swinLayer.py --- Swin Transformer class definition
│    └── wniMSA.py --- window-based and shifted-window-based multi-head attention block definition
└── py/
     ├── CCT.py --- Compact Convolutional Transformer class
     ├── LECosineAnnealing.py --- LECAWS learning rate schedule, see my blog for more info
     ├── LabelSmoothing.py --- Label smoothing cross entropy loss
     ├── StochasticDepth.py --- Adopted from timm
     ├── TEncoder.py --- Transformer encoder
     ├── ViTLite.py --- ViT-Lite implementation
     ├── configs.py --- Mixup configurations
     ├── train_utils.py --- training utility functions
     └── SeqPool.py --- Sequence pooling layer (see the sketch after this tree)
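As a taste of the components above, py/SeqPool.py implements sequence pooling from the CCT/ViT-Lite paper. The sketch below is a generic version of that idea (an attention-weighted average of the encoder's output tokens replaces the class token), not the repo's exact code:

import torch.nn as nn
import torch.nn.functional as F

class SeqPool(nn.Module):
    # Sequence pooling: learn a per-token score, softmax it over the sequence,
    # and return the weighted sum of tokens instead of using a class token.
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                    # x: (batch, tokens, dim)
        w = F.softmax(self.score(x), dim=1)  # (batch, tokens, 1)
        return (w * x).sum(dim=1)            # (batch, dim), fed to the classifier head

# usage: pooled = SeqPool(dim=256)(encoder_tokens)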

Run the Code

Since argparse is used, any argument that is not given explicitly falls back to its default value. train.sh provides some of the most commonly used arguments. To run the code, therefore, run:

sudo chmod +x ./train.sh
./train.sh

Make sure the check_points/, model/ and logs/ folders exist.

In py/train_utils.py, the loading path of the dataset is specified; ../dataset/ is the default CIFAR-10 dataset path. If you have not downloaded the CIFAR-10 dataset, the function getCIFAR10Dataset will download it when root is set to ../dataset and that folder is empty. Once the dataset is ready, everything should be good to go.
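For reference, getCIFAR10Dataset presumably wraps the standard torchvision call; a minimal sketch of that behaviour (torchvision only downloads when the data is not already under root) would be:

import torchvision
import torchvision.transforms as T

# Minimal sketch of the usual torchvision call; the real getCIFAR10Dataset in
# py/train_utils.py may add augmentations, normalization and DataLoaders.
transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(
    root="../dataset", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(
    root="../dataset", train=False, download=True, transform=transform)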


Results

The tested model is CCT with 7 transformer layers and a single 3×3 conv kernel in the tokenizer (CCT-7/3x1). The official counterpart reports 94.78% accuracy after 200 epochs.

My implementation reaches roughly 100% training accuracy and 94.5% test accuracy after 300 epochs, without the mixup or cutmix used in SHI-Labs/Compact-Transformers.

The following image shows the final version with mixup and cutmix (the configuration, kept in py/configs.py, is quite different from the official implementation; see the sketch at the end of this section). I didn't train this from scratch:

No results are available for the Swin Transformer on ImageNet. The Swin Transformer is being trained on imagenette2-320 and has not finished training yet.
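For completeness, the mixup/cutmix settings live in py/configs.py. The values below are placeholders for illustration, not the repo's actual configuration; they only show how such a setup is commonly wired through timm:

from timm.data import Mixup

# Placeholder values for illustration; the real numbers in py/configs.py
# (and in the official SHI-Labs implementation) differ.
mixup_fn = Mixup(
    mixup_alpha=0.8,      # Beta(alpha, alpha) parameter for mixup
    cutmix_alpha=1.0,     # Beta(alpha, alpha) parameter for cutmix
    prob=1.0,             # probability of applying either augmentation to a batch
    switch_prob=0.5,      # probability of picking cutmix over mixup
    label_smoothing=0.1,
    num_classes=10,       # CIFAR-10
)
# in the training loop: images, targets = mixup_fn(images, targets)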
