SERE: Exploring Feature Self-relation for Self-supervised Transformer (TPAMI 2023)

The official codebase for SERE: Exploring Feature Self-relation for Self-supervised Transformer.

Introduction

Learning representations with self-supervision for convolutional neural networks (CNNs) has been validated to be effective for vision tasks. As an alternative to CNNs, vision transformers (ViTs) have strong representation ability with spatial self-attention and channel-level feedforward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViTs. Still, most works follow self-supervised strategies designed for CNNs, e.g., instance-level discrimination of samples, and ignore the properties of ViTs. We observe that relational modeling on the spatial and channel dimensions distinguishes ViTs from other networks. To enforce this property, we explore the feature SElf-RElation (SERE) for training self-supervised ViTs. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize the feature self-relations, i.e., spatial and channel self-relations, for self-supervised learning. Self-relation based learning further enhances the relation modeling ability of ViTs, resulting in stronger representations that consistently improve performance on multiple downstream tasks.
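To make the idea of spatial and channel self-relations concrete, the following is a minimal sketch in plain Python. It assumes that a self-relation is a temperature-scaled softmax over cosine similarities, taken between patch tokens for the spatial case and between channels for the channel case. The function names, the temperature value, and the use of nested lists instead of PyTorch tensors are illustrative assumptions, not the code in main_sere.py.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(row, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def self_relation(vectors, temperature=0.1):
    """Row-wise softmax over pairwise cosine similarities of `vectors`."""
    normed = [l2_normalize(v) for v in vectors]
    return [
        softmax([sum(a * b for a, b in zip(u, v)) for v in normed], temperature)
        for u in normed
    ]


def spatial_self_relation(tokens, temperature=0.1):
    """tokens: N patch tokens with C channels each -> N x N relation."""
    return self_relation(tokens, temperature)


def channel_self_relation(tokens, temperature=0.1):
    """Transpose to C channel vectors of length N -> C x C relation."""
    channels = [list(col) for col in zip(*tokens)]
    return self_relation(channels, temperature)


# Toy feature map: 3 patch tokens, 2 channels (values are made up).
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
spatial = spatial_self_relation(tokens)   # 3 x 3 relation between tokens
channel = channel_self_relation(tokens)   # 2 x 2 relation between channels
```

In a self-supervised setup, relations computed from different augmented views of the same image would then be compared, for example with a cross-entropy loss, alongside the usual embedding-level objective.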

Installation

Please install PyTorch and download the ImageNet dataset. This codebase has been developed with Python 3.8, PyTorch 1.10.1, CUDA 11.3, and torchvision 0.11.2.

Training and Pre-trained Models

Architecture	Method	Parameters	Pre-training Epochs	Fine-tuning Epochs	Top-1 Accuracy	Download
ViT-S/16	iBOT+SERE	21M	100	100	81.5%	backbone
ViT-B/16	iBOT+SERE	85M	100	100	83.7%	backbone

iBOT+SERE with ViT-S/16:

python -m torch.distributed.launch --nproc_per_node=8 \
--master_port=$PORT \
main_sere.py \
--arch vit_small \
--output_dir $OUTPUT_DIR \
--data_path $IMAGENET \
--teacher_temp 0.07 \
--warmup_teacher_temp_epochs 30 \
--norm_last_layer false \
--epochs 100 \
--shared_head true \
--out_dim 8192 \
--local_crops_number 10 \
--global_crops_scale 0.40 1 \
--local_crops_scale 0.05 0.40 \
--pred_ratio 0 0.3 \
--pred_ratio_var 0 0.2 \
--batch_size_per_gpu 128 \
--num_workers 6 \
--saveckp_freq 10 \
--alpha 0.2 \
--beta 0.5 \
--clip_grad 0.3
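
The command above fixes the per-GPU batch size, so the global batch size depends on how many processes are launched. The sketch below is a hypothetical helper (not part of this repository) that shows the arithmetic and rebuilds an abbreviated form of the command for a different GPU count. Keeping the global batch size at 8 x 128 when the GPU count changes is an assumption about what is needed to match the reported setting; the source does not state it.

```python
import shlex

# Values taken from the ViT-S/16 command; everything else here is illustrative.
REFERENCE_GPUS = 8
REFERENCE_BATCH_PER_GPU = 128


def global_batch_size(num_gpus, batch_size_per_gpu):
    """Number of images processed per optimizer step across all GPUs."""
    return num_gpus * batch_size_per_gpu


def batch_per_gpu_for(num_gpus):
    """Per-GPU batch size that keeps the reference global batch size."""
    target = global_batch_size(REFERENCE_GPUS, REFERENCE_BATCH_PER_GPU)
    if target % num_gpus != 0:
        raise ValueError(f"{target} is not divisible by {num_gpus} GPUs")
    return target // num_gpus


def launch_command(num_gpus, extra_args):
    """Build an abbreviated launch command as one shell-quoted string."""
    args = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "main_sere.py",
        "--arch", "vit_small",
        "--batch_size_per_gpu", str(batch_per_gpu_for(num_gpus)),
    ] + list(extra_args)
    return " ".join(shlex.quote(a) for a in args)


print(global_batch_size(REFERENCE_GPUS, REFERENCE_BATCH_PER_GPU))
print(launch_command(4, ["--epochs", "100", "--alpha", "0.2", "--beta", "0.5"]))
```

A larger per-GPU batch may not fit in memory, so the remaining flags from the full command still need to be passed through `extra_args` and checked against the available hardware.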

iBOT+SERE with ViT-B/16:

python -m torch.distributed.launch --nproc_per_node=8 \
--master_port=$PORT \
main_sere.py \
--arch vit_base \
--output_dir $OUTPUT_DIR \
--data_path $IMAGENET \
--teacher_temp 0.07 \
--teacher_patch_temp 0.07 \
--warmup_teacher_temp 0.04 \
--warmup_teacher_patch_temp 0.04 \
--warmup_teacher_temp_epochs 50 \
--norm_last_layer true \
--warmup_epochs 10 \
--epochs 100 \
--lr 0.00075 \
--min_lr 2e-6 \
--weight_decay 0.04 \
--weight_decay_end 0.4 \
--shared_head true \
--shared_head_teacher true \
--out_dim 8192 \
--patch_out_dim 8192 \
--local_crops_number 10 \
--global_crops_scale 0.32 1 \
--local_crops_scale 0.05 0.32 \
--pred_ratio 0 0.3 \
--pred_ratio_var 0 0.2 \
--pred_shape block \
--batch_size_per_gpu 128 \
--num_workers 6 \
--saveckp_freq 10 \
--freeze_last_layer 3 \
--clip_grad 0.3 \
--alpha 0.2 \
--beta 0.5 \
--use_fp16 true
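
The flags --warmup_teacher_temp, --teacher_temp, and --warmup_teacher_temp_epochs describe a warm-up of the teacher softmax temperature. The sketch below assumes SERE inherits the schedule used in DINO, on which this repository is built: a linear ramp from the warm-up value to the final value, followed by a constant. The function name is hypothetical, and main_sere.py should be checked for the actual behavior.

```python
def teacher_temp_schedule(warmup_temp, final_temp, warmup_epochs, total_epochs):
    """Per-epoch teacher temperature: linear ramp, then constant."""
    if warmup_epochs > total_epochs:
        raise ValueError("warm-up cannot be longer than training")
    if warmup_epochs <= 1:
        # Zero or one warm-up epoch: no ramp to interpolate.
        ramp = [warmup_temp] * warmup_epochs
    else:
        # Endpoints are included, so the last warm-up epoch reaches final_temp.
        step = (final_temp - warmup_temp) / (warmup_epochs - 1)
        ramp = [warmup_temp + i * step for i in range(warmup_epochs)]
    return ramp + [final_temp] * (total_epochs - warmup_epochs)


# Values from the ViT-B/16 command: 0.04 -> 0.07 over 50 of 100 epochs.
schedule = teacher_temp_schedule(0.04, 0.07, 50, 100)
print(round(schedule[0], 4), round(schedule[49], 4), round(schedule[99], 4))
```

A lower temperature early in training sharpens the teacher output, and the ramp relaxes it gradually; the ViT-S/16 command uses a shorter warm-up of 30 epochs.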

Evaluation

We fully fine-tune the pre-trained models on ImageNet-1K by using the codebase of MAE.

For downstream tasks, e.g., semantic segmentation, please refer to iBOT.

Additionally, we use ImageNetSegModel to perform semi-supervised semantic segmentation on the ImageNet-S dataset.

Citing SERE

If you find this repository useful, please consider giving a star and a citation:

@article{li2023sere,
  title={SERE: Exploring Feature Self-relation for Self-supervised Transformer},
  author={Zhong-Yu Li and Shanghua Gao and Ming-Ming Cheng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}

License

The code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for noncommercial use only. Any commercial use requires formal permission first.

Acknowledgement

This repository is built using the DINO repository, the iBOT repository, and the MAE repository.
