This is the official PyTorch implementation of the paper Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture
- PyTorch: 1.13.1
- CUDA: 11.6
- timm: 0.6.13
- kornia: 0.6.11
- mmsegmentation: v0.30.0
- mmdetection: v2.28.1
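
The versions listed above can be checked against the local environment before training. The following is a minimal sketch, not part of the repository; the pip distribution names (in particular `mmdet` and `mmsegmentation`) and the helper names are assumptions, and the CUDA version is not checked.

```python
from importlib import metadata

# Versions listed above. The pip distribution names are assumptions.
EXPECTED = {
    "torch": "1.13.1",
    "timm": "0.6.13",
    "kornia": "0.6.11",
    "mmsegmentation": "0.30.0",
    "mmdet": "2.28.1",
}


def matches(installed, expected):
    """Return True if the versions are equal after dropping a local build tag such as '+cu116'."""
    return installed.split("+")[0] == expected


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version_or_None, is_match)."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed
        report[name] = (have, have is not None and matches(have, want))
    return report


if __name__ == "__main__":
    for name, (have, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:15s} installed={have} expected={EXPECTED[name]} {status}")
```
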
To pre-train ViT-Small (the recommended default) with single-node distributed training, run the following on 1 node with 8 GPUs, replacing /data_path with the path to the COCO or ADE20K dataset. The default number of pre-training epochs is 100.
python -m torch.distributed.launch --nnodes 1 --nproc_per_node 8 main_pretrain.py --data /data_path --batch_size 64 --model gtsa_small
The following table provides the pre-trained checkpoints used in the paper.
Model | Pretraining Data | Pretrain Epochs | Checkpoint |
---|---|---|---|
GTSA(ours) | COCO train2017 | 100 | Download |
GTSA(ours) | ADE20K(2016) train | 100 | Download |
DINO | COCO train2017 | 100 | Download |
DINO | ADE20K(2016) train | 100 | Download |
1. Classification
We evaluated our models on the iNaturalist 2019 classification benchmark.
To fine-tune ViT-Small on the iNat19 dataset, go to the ./downstream/classification directory and run the following on 1 node with 8 GPUs:
python -m torch.distributed.launch --nproc_per_node=8 --nnodes 1 main_finetune.py --accum_iter 1 --batch_size 128 --model vit_small --finetune /your_checkpoint --epochs 300 --blr 5e-4 --layer_decay 0.65 --weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 --dist_eval
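
The flag names in this command (`--blr`, `--accum_iter`, `--layer_decay`) resemble those of the MAE fine-tuning script. Assuming main_finetune.py follows the same conventions (an assumption, not confirmed here), and that `--batch_size` is per GPU, the actual learning rate and the per-layer learning-rate scales could be derived as in this sketch. The function names are hypothetical.

```python
def effective_batch_size(batch_size, accum_iter, world_size):
    """Total samples per optimizer step, assuming batch_size is per GPU."""
    return batch_size * accum_iter * world_size


def absolute_lr(blr, eff_batch_size, base=256):
    """Linear scaling rule (assumed): lr = blr * effective_batch_size / 256."""
    return blr * eff_batch_size / base


def layer_lr_scales(layer_decay, depth):
    """Layer-wise decay (assumed): index 0 is the embedding, the last index is the head."""
    num_layers = depth + 1
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]


# Values taken from the command above: 8 GPUs, batch size 128, no accumulation.
eff = effective_batch_size(batch_size=128, accum_iter=1, world_size=8)
lr = absolute_lr(blr=5e-4, eff_batch_size=eff)
scales = layer_lr_scales(layer_decay=0.65, depth=12)  # ViT-Small has 12 blocks

print(eff)         # 1024
print(lr)          # approximately 0.002
print(scales[-1])  # 1.0 (the head trains at the full learning rate)
```

Under these assumptions, changing the number of GPUs changes the actual learning rate unless `--blr`, `--batch_size`, or `--accum_iter` is adjusted to keep the effective batch size at 1024.
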
The following table provides the fine-tuning logs.
Model | Pretraining Data | Pretrain Epochs | Fine-tuning Data | Log |
---|---|---|---|---|
GTSA(ours) | COCO train2017 | 100 | iNaturalist 2019 | Download |
DINO | COCO train2017 | 100 | iNaturalist 2019 | Download |
The expected results are:
Method | Top-1 Acc | Top-5 Acc |
---|---|---|
DINO | 54.8 | 82.9 |
GTSA (Ours) | 59.7 | 85.7 |
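
For reference, Top-1 and Top-5 accuracy count a sample as correct when its true label is among the 1 or 5 highest-scoring classes. The following is a minimal pure-Python sketch of that definition with made-up toy scores; it is not the evaluation code used in this repository.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: 3 samples, 3 classes (illustrative values only).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

print(topk_accuracy(scores, labels, k=1))  # 1 of 3 correct
print(topk_accuracy(scores, labels, k=2))  # 2 of 3 correct
```
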
2. Detection & Instance Segmentation
We evaluated our models on the COCO 2017 detection and instance segmentation benchmark with a Mask R-CNN model.
To fine-tune Mask R-CNN on the COCO dataset, first download mmdetection, then use our configs and model files (in /downstream/mmdet). Run the following command from the mmdetection directory.
tools/dist_train.sh /your_path/GTSA/downstream/mmdet/my_configs/CoCo_GTSA_mask_rcnn_vit_small_12_p16_1x_coco.py 8 --work-dir ./save
The following table provides the fine-tuning logs.
Model | Pretraining Data | Pretrain Epochs | Fine-tuning Data | Log |
---|---|---|---|---|
GTSA(ours) | COCO train2017 | 100 | COCO2017 | Download |
DINO | COCO train2017 | 100 | COCO2017 | Download |
The expected results are as follows, where APb is the box AP for detection and APm is the mask AP for instance segmentation:
Method | APb | APb50 | APb75 | APm | APm50 | APm75 |
---|---|---|---|---|---|---|
DINO | 32.4 | 54.2 | 33.8 | 30.8 | 51.1 | 32.2 |
GTSA(ours) | 35.8 | 57.8 | 38.5 | 33.5 | 54.7 | 35.3 |
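
In the COCO convention, AP50 and AP75 count a prediction as a match when its IoU with a ground-truth object is at least 0.50 or 0.75, and plain AP averages over thresholds from 0.50 to 0.95. The sketch below only illustrates the box IoU test behind those thresholds; it is not a full AP computation, and the boxes are made up for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes have zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """True if the predicted box overlaps the ground truth enough at this threshold."""
    return box_iou(pred, gt) >= threshold


gt = (0.0, 0.0, 10.0, 10.0)
pred = (2.0, 0.0, 12.0, 10.0)  # shifted 2 units to the right

print(round(box_iou(pred, gt), 4))  # 0.6667
print(is_match(pred, gt, 0.50))     # True: counts toward AP50
print(is_match(pred, gt, 0.75))     # False: does not count toward AP75
```
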
3. Semantic Segmentation
We evaluated the performance of our models on the ADE20K Semantic Segmentation benchmark.
To fine-tune Semantic FPN on the ADE20K dataset, first download mmsegmentation. Then convert the checkpoint to the mmsegmentation ViT format with the following command:
python tools/model_converters/vit2mmseg.py /your_checkpoint ./new_checkpoint_name
Finally, use our configs (in /downstream/mmseg). Run the following command from the mmsegmentation directory.
tools/dist_train.sh /your_path/GTSA/downstream/mmseg/my_configs/ADE20K_GTSA_pretrained_semfpn_vit-s16_512_512_40k_ade20k.py 8 --work-dir ./save --seed 0 --deterministic
The following table provides the fine-tuning logs.
Model | Pretraining Data | Pretrain Epochs | Fine-tuning Data | Log |
---|---|---|---|---|
GTSA(ours) | COCO train2017 | 100 | ADE20K | Download |
GTSA(ours) | ADE20K(2016) train | 100 | ADE20K | Download |
DINO | COCO train2017 | 100 | ADE20K | Download |
DINO | ADE20K(2016) train | 100 | ADE20K | Download |
The expected results are:
Method | aAcc | mIoU | mAcc |
---|---|---|---|
DINO | 74.7 | 27.3 | 35.9 |
GTSA (Ours) | 76.4 | 30.6 | 40.0 |
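
The three columns above follow the usual semantic segmentation definitions: aAcc is overall pixel accuracy, mAcc is per-class accuracy averaged over classes, and mIoU is per-class intersection-over-union averaged over classes. The following sketch computes them from a pixel confusion matrix; the function name and the toy matrix are assumptions for illustration, and it is not the mmsegmentation evaluation code.

```python
def segmentation_metrics(conf):
    """Return (aAcc, mIoU, mAcc) in percent.

    conf[i][j] is the number of pixels with ground-truth class i predicted as class j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    diag = [conf[i][i] for i in range(n)]                           # correctly labeled pixels
    gt = [sum(conf[i]) for i in range(n)]                           # pixels per true class
    pred = [sum(conf[i][j] for i in range(n)) for j in range(n)]    # pixels per predicted class

    aacc = sum(diag) / total
    # Skip classes that never occur so they do not cause a division by zero.
    accs = [diag[i] / gt[i] for i in range(n) if gt[i] > 0]
    ious = [
        diag[i] / (gt[i] + pred[i] - diag[i])
        for i in range(n)
        if gt[i] + pred[i] - diag[i] > 0
    ]
    macc = sum(accs) / len(accs)
    miou = sum(ious) / len(ious)
    return 100 * aacc, 100 * miou, 100 * macc


# Toy 2-class confusion matrix (illustrative values only).
aacc, miou, macc = segmentation_metrics([[50, 10], [5, 35]])
print(f"aAcc={aacc:.2f} mIoU={miou:.2f} mAcc={macc:.2f}")
```

Because IoU also penalizes false positives, mIoU is typically lower than mAcc, which is consistent with the gap between the two columns in the table.
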