This is the official PyTorch implementation of BatchFormer for Long-Tailed Recognition, Domain Generalization, Compositional Zero-Shot Learning, and Contrastive Learning.
Sample Relationship Exploration for Robust Representation Learning
Please also refer to BatchFormerV2, in which we introduce a BatchFormerV2 module for vision Transformers.
| Method | All (R10) | Many (R10) | Med (R10) | Few (R10) | All (R50) | Many (R50) | Med (R50) | Few (R50) |
|---|---|---|---|---|---|---|---|---|
| RIDE (3 experts) [1] | 44.7 | 57.0 | 40.3 | 25.5 | 53.6 | 64.9 | 50.4 | 33.2 |
| +BatchFormer | 45.7 | 56.3 | 42.1 | 28.3 | 54.1 | 64.3 | 51.4 | 35.1 |
| PaCo [2] | - | - | - | - | 57.0 | 64.8 | 55.9 | 39.1 |
| +BatchFormer | - | - | - | - | 57.4 | 62.7 | 56.7 | 42.1 |
Here, we demonstrate the results on one-stage RIDE (ResNeXt-50).
| Method | All | Many | Medium | Few |
|---|---|---|---|---|
| RIDE (3 experts)* | 55.9 | 67.3 | 52.8 | 34.6 |
| +BatchFormer | 56.5 | 66.6 | 54.2 | 36.0 |
| Method | All | Many | Medium | Few |
|---|---|---|---|---|
| RIDE (3 experts) | 72.5 | 68.1 | 72.7 | 73.2 |
| +BatchFormer | 74.1 | 65.5 | 74.5 | 75.8 |
| Method | Epochs | Top-1 | Pretrained |
|---|---|---|---|
| MoCo-v2 [3] | 200 | 67.5 | |
| +BatchFormer | 200 | 68.4 | download |
| MoCo-v3 [4] | 100 | 68.9 | |
| +BatchFormer | 100 | 69.6 | download |
Here, we provide the pretrained MoCo-v3 model corresponding to this strategy.
| Method | PACS | VLCS | OfficeHome | Terra |
|---|---|---|---|---|
| SWAD [5] | 82.9 | 76.3 | 62.1 | 42.1 |
| +BatchFormer | 83.7 | 76.9 | 64.3 | 44.8 |
| Method | MIT-States (AUC) | MIT-States (HM) | UT-Zap50K (AUC) | UT-Zap50K (HM) | C-GQA (AUC) | C-GQA (HM) |
|---|---|---|---|---|---|---|
| CGE* [6] | 6.3 | 20.0 | 31.5 | 46.5 | 3.7 | 14.9 |
| +BatchFormer | 6.7 | 20.6 | 34.6 | 49.0 | 3.8 | 15.5 |
Experiments on CUB.
| Method | Unseen | Seen | Harmonic mean |
|---|---|---|---|
| CUB [7]* | 67.5 | 65.1 | 66.3 |
| +BatchFormer | 68.2 | 65.8 | 67.0 |
- [1] Long-tailed recognition by routing diverse distribution-aware experts. In ICLR, 2021.
- [2] Parametric contrastive learning. In ICCV, 2021.
- [3] Improved baselines with momentum contrastive learning. arXiv preprint, 2020.
- [4] An empirical study of training self-supervised vision transformers. In ICCV, 2021.
- [5] Domain generalization by seeking flat minima. In NeurIPS, 2021.
- [6] Learning graph embeddings for compositional zero-shot learning. In CVPR, 2021.
- [7] Contrastive learning based hybrid networks for long-tailed image classification. In CVPR, 2021.
The proposed BatchFormer can be implemented in a few lines of code as follows:
```python
import torch

def BatchFormer(x, y, encoder, is_training):
    # x: input features with shape [N, C]; y: labels with shape [N]
    # encoder: a Transformer encoder layer, e.g. nn.TransformerEncoderLayer(C, 4, C, 0.5)
    if not is_training:
        # BatchFormer is applied only during training; inference is unchanged.
        return x, y
    pre_x = x
    # Treat the batch as a length-N sequence so that self-attention models
    # relationships between samples within the mini-batch.
    x = encoder(x.unsqueeze(1)).squeeze(1)
    # Keep both the original and transformed features (with duplicated labels)
    # so that a shared classifier sees both streams.
    x = torch.cat([pre_x, x], dim=0)
    y = torch.cat([y, y], dim=0)
    return x, y
```
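As a rough usage illustration (not taken from the repository), the sketch below shows how the function above might sit between a feature extractor and a shared classifier during a training step. The `backbone`, `classifier`, feature dimension, and dummy mini-batch are hypothetical placeholders.

```python
import torch
import torch.nn as nn

# Illustrative placeholders only; the real models are defined elsewhere in the repo.
C, num_classes = 512, 1000
backbone = nn.Linear(3 * 224 * 224, C)                    # stand-in feature extractor
classifier = nn.Linear(C, num_classes)                    # classifier shared by both feature streams
batch_encoder = nn.TransformerEncoderLayer(C, 4, C, 0.5)  # the BatchFormer layer
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3 * 224 * 224)                    # dummy mini-batch
labels = torch.randint(0, num_classes, (8,))

features = backbone(images)                                                 # [N, C]
features, labels = BatchFormer(features, labels, batch_encoder, is_training=True)
loss = criterion(classifier(features), labels)                              # shared classifier on 2N samples
loss.backward()
```

Because the function returns its inputs unchanged when `is_training` is False, the BatchFormer branch adds no cost at inference time.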
If you find this repository helpful, please consider citing:
```bibtex
@inproceedings{hou2022batch,
  title={BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning},
  author={Hou, Zhi and Yu, Baosheng and Tao, Dacheng},
  booktitle={CVPR},
  year={2022}
}
```
Feel free to contact "zhou9878 at uni dot sydney dot edu dot au" if you have any questions.