NeurIPS 2023: This repository is the official implementation of BaCon.
Data imbalance and open-ended distributions are two intrinsic characteristics of the real visual world. Though encouraging progress has been made in tackling each challenge separately, few works are dedicated to combining them toward real-world scenarios. While several previous works focus on classifying close-set samples and detecting open-set samples during testing, it is still essential to classify unknown subjects into fine-grained categories, just as humans do. In this paper, we formally define a more realistic task, distribution-agnostic generalized category discovery (DA-GCD): generating fine-grained predictions for both close- and open-set classes in a long-tailed open-world setting. To tackle this challenging problem, we propose a Self-Balanced Co-Advice contrastive framework (BaCon), which consists of a contrastive-learning branch and a pseudo-labeling branch that work collaboratively to provide interactive supervision for the DA-GCD task. In particular, the contrastive-learning branch provides reliable distribution estimation to regularize the predictions of the pseudo-labeling branch, which in turn guides contrastive learning through self-balanced knowledge transfer and a novel contrastive loss. We compare BaCon with state-of-the-art methods from two closely related fields: imbalanced semi-supervised learning and generalized category discovery. BaCon achieves superior performance over all baselines, and we provide comprehensive analysis across various datasets.
Overview of the self-balanced co-advice contrastive framework (BaCon).
Requirements:
loguru
numpy
pandas
scikit_learn
scipy
torch==1.10.0
torchvision==0.11.1
tqdm
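The listed dependencies can be installed with pip; the exact command below is our suggestion rather than part of the repository, and the torch/torchvision wheels may need a CUDA-specific index URL depending on your setup:

```shell
# Install the unpinned dependencies, then the pinned torch/torchvision pair.
pip install loguru numpy pandas scikit_learn scipy tqdm
pip install torch==1.10.0 torchvision==0.11.1
```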
We provide the specific train splits of CIFAR-10 and CIFAR-100 with different imbalance ratios; please refer to data_uq_idxs
for details. We also provide the source code in data/imagenet.py
for splitting ImageNet into the proposed DA-GCD setting.
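For intuition, a long-tailed split with imbalance ratio r typically keeps all samples of the most frequent class and exponentially fewer of each subsequent class, so the rarest class has 1/r as many samples as the most frequent one. The sketch below illustrates this idea only; the repository ships the exact index files (data_uq_idxs), and the function names `longtail_counts` and `subsample_indices` are ours, not part of the codebase:

```python
import numpy as np

def longtail_counts(n_max, num_classes, imbalance_ratio):
    # Exponentially decaying per-class counts: class 0 keeps n_max samples,
    # the last (rarest) class keeps n_max / imbalance_ratio.
    return [int(n_max * (1.0 / imbalance_ratio) ** (c / (num_classes - 1)))
            for c in range(num_classes)]

def subsample_indices(labels, counts, seed=0):
    # For each class c, draw counts[c] distinct indices from the full label array.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for c, n in enumerate(counts):
        class_idxs = np.flatnonzero(labels == c)
        chosen.append(rng.choice(class_idxs, size=n, replace=False))
    return np.concatenate(chosen)
```

For example, CIFAR-10 (500 labeled samples per class) with ratio 100 would keep 500 samples of class 0 down to 5 samples of class 9.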
bash run_cifar10.sh
bash run_cifar100.sh
bash run_imagenet100.sh
The codebase is largely built on GCD and SimGCD. Thanks for their great work!
@article{bai2023towards,
title={Towards Distribution-Agnostic Generalized Category Discovery},
author={Bai, Jianhong and Liu, Zuozhu and Wang, Hualiang and Chen, Ruizhe and Mu, Lianrui and Li, Xiaomeng and Zhou, Joey Tianyi and Feng, Yang and Wu, Jian and Hu, Haoji},
journal={arXiv preprint arXiv:2310.01376},
year={2023}
}
Our work on self-supervised long-tail learning: On the Effectiveness of Out-of-Distribution Data on Self-Supervised Long-Tail Learning.