
Learning Imbalanced Data with Vision Transformers

Zhengzhuo Xu, Ruikang Liu, Shuo Yang, Zenghao Chai and Chun Yuan

This repository is the official PyTorch implementation of the CVPR 2023 paper Learning Imbalanced Data with Vision Transformers (LiVT).

Environments

python == 3.7
pytorch >= 1.7.0
torchvision >= 0.8.1
timm == 0.3.2
tensorboardX >= 2.1
  1. We recommend installing PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2.
  2. If your PyTorch is 1.8.1+, a fix is needed for timm 0.3.2 to work; see the sketch after this list.
  3. See requirements.txt for the full list of dependencies. You don't have to match it exactly; it is provided for reference.
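
If the incompatibility is the usual one between timm 0.3.2 and newer PyTorch (an ImportError for container_abcs from torch._six), the snippet below is a quick way to check your setup; the patch mentioned in the comments is an assumption about which fix is meant, not the repository's official instruction.

# Hedged sketch: compatibility check for timm 0.3.2 on PyTorch 1.8.1+.
# If the timm import or model creation fails with
#   ImportError: cannot import name 'container_abcs' from 'torch._six'
# a common workaround is to edit timm/models/layers/helpers.py and replace
#   from torch._six import container_abcs
# with
#   import collections.abc as container_abcs
import torch
import timm

print("torch", torch.__version__, "timm", timm.__version__)
model = timm.create_model("vit_base_patch16_224", pretrained=False)
print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")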

Data preparation

We adopt torchvision.datasets.ImageFolder to build our dataloaders. Hence, we organize all datasets (ImageNet-LT, iNat18, Places-LT, CIFAR) as follows:

/path/to/ImageNet-LT/
    train/
        class1/
            img1.jpeg
        class2/
            img2.jpeg
    val/
        class1/
            img3.jpeg
        class2/
            img4.jpeg

You can follow prepare.py to arrange your dataset into this layout.
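
As a minimal sketch of how this layout is consumed by torchvision.datasets.ImageFolder (the path and transform below are placeholders, not the repository's actual training pipeline):

import torch
from torchvision import datasets, transforms

# Placeholder path; point this at your own ImageNet-LT-style directory.
data_root = "/path/to/ImageNet-LT"

# A simple transform for illustration only; the repository defines its own augmentations.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# ImageFolder maps each sub-directory under train/ to one class label.
train_set = datasets.ImageFolder(f"{data_root}/train", transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True, num_workers=8, pin_memory=True)
print(len(train_set.classes), "classes,", len(train_set), "training images")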

Detailed information about these datasets is shown in the following table:

(Table: dataset statistics for ImageNet-LT, iNat18, Places-LT, and CIFAR)

Usage

  1. Please set DATA_PATH and WORK_PATH in util/trainer.py (lines 6-7).

  2. Make sure 4 or 8 GPUs with more than 12 GB of memory each are available.

  3. Keep the settings consistent with the following.

(Tables: recommended training settings)

You can see all arguments in the Trainer class in util/trainer.py.
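
For step 1 above, the edit amounts to pointing two module-level variables at your own directories; the values below are placeholders, and the exact surrounding code in util/trainer.py may differ:

# util/trainer.py, lines 6-7 (placeholder values; replace with your own paths)
DATA_PATH = '/path/to/datasets'  # parent directory that holds ImageNet-LT/, iNat18/, etc.
WORK_PATH = '/path/to/LiVT'      # working directory for outputs (exact use is defined in the repo)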

Specifically, the commands for the Masked Generative Pretraining (MGP), Balanced Fine-Tuning (BFT), and evaluation stages are:

# MGP stage
python script/pretrain.py
# BFT stage
python script/finetune.py
# evaluate stage
python script/evaluate.py

Results and Models

We provide Balanced Fine-Tuned (BFT) models and Masked Generative Pretrained (MGP) checkpoints. Many, Med., Few, and Acc report accuracy (%) on the many-, medium-, and few-shot class splits and overall.

Dataset     | Resolution | Many | Med. | Few  | Acc  | args     | log      | ckpt     | MGP ckpt
ImageNet-LT | 224×224    | 73.6 | 56.4 | 41.0 | 60.9 | download | download | download | Res_224
ImageNet-LT | 384×384    | 76.4 | 59.7 | 42.7 | 63.8 | download | download | download |
iNat18      | 224×224    | 78.9 | 76.5 | 74.8 | 76.1 | download | download | download | Res_128
iNat18      | 384×384    | 83.2 | 81.5 | 79.7 | 81.0 | download | download | download |
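
As a rough illustration of inspecting a downloaded checkpoint (the file name below is hypothetical, and the top-level key layout is an assumption based on MAE-style code bases):

import torch

# Hypothetical file name; use whichever checkpoint you downloaded from the table above.
ckpt = torch.load("LiVT_ImageNetLT_224.pth", map_location="cpu")

# MAE-style repositories often nest the weights under a "model" key; fall back to the
# raw object if that assumption does not hold for this checkpoint.
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
print(len(state_dict), "entries in the state dict")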

Citation

If you find our idea or code inspiring, please cite our paper:

@inproceedings{LiVT,
  title={Learning Imbalanced Data with Vision Transformers},
  author={Xu, Zhengzhuo and Liu, Ruikang and Yang, Shuo and Chai, Zenghao and Yuan, Chun},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

This code is partially based on Prior-LT; if you use our code, please also cite:

@inproceedings{PriorLT,
  title={Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective},
  author={Xu, Zhengzhuo and Chai, Zenghao and Yuan, Chun},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

Acknowledgements

This project is largely based on DeiT and MAE.

The CIFAR code is based on LDAM and Prior-LT.

The loss implementations are based on CB, LDAM, LADE, PriorLT and MiSLAS.
