“Complementary Parts Contrastive Learning for Fine-grained Weakly Supervised Object Co-localization” has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT, 2023).
The framework of CPCL. Please refer to the paper for details.
- Download the following datasets: CUB-200-2011, Stanford Cars, FGVC-Aircraft, and Stanford Dogs.
Train the model as follows:
cd classifier
python train_c.py # train the classification model
python test_c.py # save the classification results to top1top5.npy
cd ../gengration
python train_g.py # train the pseudo-label generation network
python test_g.py # save the pseudo masks
cd ../localization
python train_l.py # train the class-agnostic co-localization network
python test_l.py # evaluate the localization accuracy
- If you want to train your own model, please download the pretrained model into the resource folder.
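The training commands above can also be chained from a single script so that a failure in one stage stops the later ones. The following is a minimal sketch, not part of the released code: the directory and script names are copied from the commands above, while `STAGES`, `build_commands`, and `run_pipeline` are hypothetical helper names introduced here for illustration.

```python
import subprocess
import sys
from pathlib import Path

# (directory, script) pairs in the order listed in the training commands.
# The directory and script names are copied verbatim from this README.
STAGES = [
    ("classifier", "train_c.py"),
    ("classifier", "test_c.py"),
    ("gengration", "train_g.py"),
    ("gengration", "test_g.py"),
    ("localization", "train_l.py"),
    ("localization", "test_l.py"),
]


def build_commands(repo_root, python=sys.executable):
    """Return (working_directory, argv) pairs, one per stage."""
    root = Path(repo_root)
    return [(root / folder, [python, script]) for folder, script in STAGES]


def run_pipeline(repo_root="."):
    """Run every stage in order and stop at the first failing script."""
    for cwd, argv in build_commands(repo_root):
        print(f"[{cwd.name}] {' '.join(argv)}")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never run on incomplete outputs.
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_pipeline(".")` from the repository root would run all six scripts in sequence; this assumes each script takes no required command-line arguments, as the commands above suggest.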
Test the trained model as follows:
cd localization
python test_l.py # evaluate the model performance
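For readers unfamiliar with how localization accuracy is usually scored in weakly supervised localization, here is a minimal sketch of the common convention: a prediction counts as correct when the box around the predicted mask overlaps the ground-truth box with IoU of at least 0.5. It is an assumption that `test_l.py` follows this convention; the function names below (`mask_to_box`, `box_iou`, `localization_accuracy`) are hypothetical and are not taken from the repository.

```python
def mask_to_box(mask):
    """Return the tight box (x_min, y_min, x_max, y_max) around non-zero pixels.

    `mask` is a list of rows of 0/1 values. Returns None for an empty mask.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def box_iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(inter_w, 0) * max(inter_h, 0)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def localization_accuracy(pred_masks, gt_boxes, iou_threshold=0.5):
    """Fraction of images whose predicted box reaches the IoU threshold.

    The 0.5 default is the usual convention, assumed here rather than
    read from the repository.
    """
    if not gt_boxes:
        return 0.0
    hits = 0
    for mask, gt in zip(pred_masks, gt_boxes):
        box = mask_to_box(mask)
        if box is not None and box_iou(box, gt) >= iou_threshold:
            hits += 1
    return hits / len(gt_boxes)
```

The sketch uses plain Python lists to stay dependency-free; an actual evaluation script would more likely operate on array or tensor masks.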
- If you want to evaluate the performance of CPCL, please download our trained model:
  - CUB-200-2011 (extraction code: 79ap)
  - Stanford Cars (extraction code: oqge)
  - FGVC-Aircraft (extraction code: 42do)
  - Stanford Dogs (extraction code: ihc2)

  Put it into the localization/out folder.
Comparison with state-of-the-art methods on the CUB-200-2011 dataset. The methods in the upper part are based on the unified localization framework and the methods in the lower part are based on the separated localization framework. 'Cls Backbone' is the backbone network used for classification. 'Loc Backbone' is the backbone network used for the pseudo-label generation network and class-agnostic co-localization network. Parameter numbers and FLOPs are shown in the third and fourth columns. The best results are highlighted in bold.
Comparison with state-of-the-art methods on the Stanford Cars, FGVC-Aircraft, and Stanford Dogs datasets. The best results are highlighted in bold.
Visualization on the CUB-200-2011 dataset. (a) Input images. (b) The initial CAMs of the fused feature map output by the SMFF module. (c) The attention maps generated after Gaussian enhancement. (d) The pseudo-labels generated by the pseudo-label generation network. (e) The predicted masks of the class-agnostic co-localization network. (f) The predicted masks of the SPOL network.
Many thanks to the authors of “Shallow Feature Matters for Weakly Supervised Object Localization”.
If you find the code helpful in your research or work, please cite the following paper.
@ARTICLE{CPCL2023,
author={Ma, Lei and Zhao, Fan and Hong, Hanyu and Wang, Lei and Zhu, Ying},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Complementary Parts Contrastive Learning for Fine-grained Weakly Supervised Object Co-localization},
year={2023}
}