This is a PyTorch implementation of DILEMMA for self-supervised ViTs. It is a reimplementation of the original work, with minimal changes to the official PyTorch implementation of MoCo v3.
Install PyTorch and download the ImageNet dataset following the official PyTorch ImageNet training code. Like MoCo v1/2, this repo contains minimal modifications to the official PyTorch ImageNet code, and we assume the user can run that code successfully. For ViT models, install timm.
```shell
python main_moco.py \
  -a vit_small -b 1024 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]
```
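The base learning rate passed via `--lr` is not used directly; as in MoCo v3, it is adjusted by the linear lr scaling rule before training. A minimal sketch of that rule (the function name here is illustrative, not the repo's actual code):

```python
def scale_lr(base_lr: float, total_batch_size: int) -> float:
    """Linear lr scaling rule: lr = base_lr * total_batch_size / 256."""
    return base_lr * total_batch_size / 256

# With the command above (--lr=1.5e-4, -b 1024):
print(scale_lr(1.5e-4, 1024))  # 0.0006
```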
- To enable sparsity, set `--token_drop_rate` to a non-zero value; for example, 0.75 drops 75% of the tokens and keeps the remaining 25%. A general rule of thumb is that bigger models can tolerate larger sparsities.
- To enable the DILEMMA loss, set `--dilemma_probability` to a non-zero value; in our experience, 0.2 works well across settings.
- The batch size specified by `-b` is the total batch size across all GPUs.
- The learning rate specified by `--lr` is the base lr, and is adjusted by the linear lr scaling rule in this line.
- Using a smaller batch size gives a more stable result (see paper), but lower speed. Using a large batch size is critical for good speed on TPUs (as we did in the paper).
- In this repo, only multi-GPU DistributedDataParallel training is supported; single-GPU or DataParallel training is not. This code has been improved to better suit the multi-node setting, and by default uses automatic mixed precision for pre-training.
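To make the two flags above concrete, here is a minimal, framework-free sketch of what they control, under our reading of the method: `--token_drop_rate` randomly keeps a subset of patch tokens, and `--dilemma_probability` assigns some kept tokens a wrong positional index, producing the binary labels the DILEMMA head predicts. All names are hypothetical; the repo's actual implementation differs.

```python
import random

def drop_tokens(num_tokens, token_drop_rate, rng):
    """Randomly keep a subset of patch-token indices.
    token_drop_rate=0.75 drops 75% of the tokens, keeping 25%."""
    num_keep = max(1, round(num_tokens * (1 - token_drop_rate)))
    return sorted(rng.sample(range(num_tokens), num_keep))

def misplace_positions(keep, num_tokens, dilemma_probability, rng):
    """With probability dilemma_probability, give a kept token a wrong
    positional index; labels mark which tokens were misplaced (1) vs not (0)."""
    positions, labels = [], []
    for idx in keep:
        if rng.random() < dilemma_probability:
            wrong = rng.choice([i for i in range(num_tokens) if i != idx])
            positions.append(wrong)
            labels.append(1)
        else:
            positions.append(idx)
            labels.append(0)
    return positions, labels

rng = random.Random(0)
keep = drop_tokens(196, 0.75, rng)  # 196 patches for a 224x224 ViT-S/16
print(len(keep))                    # 49 tokens kept (25%)
positions, labels = misplace_positions(keep, 196, 0.2, rng)
```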
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
```
@article{Sameni2022DILEMMASS,
  title={DILEMMA: Self-Supervised Shape and Texture Learning with Transformers},
  author={Sepehr Sameni and Simon Jenni and Paolo Favaro},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.04788}
}
```