by Jiajin Tang*, Zhengxuan Wei*, Ge Zheng, Sibei Yang†
*Equal contribution; †Corresponding Author
Humans can perform previously unexperienced interactions with novel objects simply by observing others engage with them. Weakly-supervised affordance grounding mimics this process by learning to locate object regions that enable actions on egocentric images, using exocentric interaction images with image-level annotations. However, extracting affordance knowledge solely from exocentric images and transferring it one-way to egocentric images limits the applicability of previous works in complex interaction scenarios. Instead, this study introduces LoopTrans, a novel closed-loop framework that not only transfers knowledge from exocentric to egocentric images but also transfers it back to enhance exocentric knowledge extraction. Within LoopTrans, several innovative mechanisms are introduced, including unified cross-modal localization and denoising knowledge distillation, to bridge domain gaps between object-centered egocentric and interaction-centered exocentric images while enhancing knowledge transfer. Experiments show that LoopTrans achieves consistent improvements across all metrics on image and video benchmarks, even handling challenging scenarios where object interaction regions are fully occluded by the human body.
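The closed-loop idea can be summarized with a minimal, self-contained sketch. This is not the actual LoopTrans implementation: the toy network, the pooling-based image-level loss, and the bidirectional distillation below are hypothetical placeholders that only illustrate how exocentric-to-egocentric transfer and the egocentric-to-exocentric feedback could be coupled in one training step; see the paper and train.py for the real method.

# Conceptual sketch only -- NOT the LoopTrans implementation.
# All modules and losses here are simplified placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAffordanceNet(nn.Module):
    """Toy backbone mapping an image to a per-pixel affordance logit map."""
    def __init__(self, num_affordances: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, num_affordances, 1)

    def forward(self, x):
        return self.head(self.features(x))  # (B, A, H/4, W/4)

def image_level_loss(logit_map, labels):
    """Weak supervision: pool the map to image-level scores (image labels only)."""
    scores = logit_map.flatten(2).mean(-1)  # (B, A)
    return F.binary_cross_entropy_with_logits(scores, labels)

def distill_loss(student_map, teacher_map):
    """Match normalized affordance maps between the two views (knowledge transfer)."""
    s = F.log_softmax(student_map.flatten(2), dim=-1)
    t = F.softmax(teacher_map.flatten(2), dim=-1).detach()
    return F.kl_div(s, t, reduction="batchmean")

def closed_loop_step(ego_net, exo_net, ego_img, exo_img, labels):
    """One closed-loop step: exocentric knowledge guides the egocentric branch
    (forward transfer), and the egocentric prediction is fed back to refine the
    exocentric branch (backward transfer)."""
    ego_map, exo_map = ego_net(ego_img), exo_net(exo_img)
    return (image_level_loss(ego_map, labels)
            + image_level_loss(exo_map, labels)
            + distill_loss(ego_map, exo_map)    # exo -> ego
            + distill_loss(exo_map, ego_map))   # ego -> exo (the "loop back")

if __name__ == "__main__":
    ego_net, exo_net = ToyAffordanceNet(36), ToyAffordanceNet(36)  # e.g. 36 affordance classes as in AGD20K
    ego, exo = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
    labels = torch.zeros(2, 36); labels[:, 0] = 1.0
    print(closed_loop_step(ego_net, exo_net, ego, exo, labels).item())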
git clone https://github.com/SooLab/LoopTrans.git
cd LoopTrans
Please create a conda environment and install the required packages:
conda create -n looptrans python=3.7 -y
conda activate looptrans
pip install -r requirements.txt
Download the AGD20K dataset from [Google Drive | Baidu Pan (code: g23n)].
We use ACSeg as our clustering module. We have pre-trained two modules via unsupervised learning on the AGD20K "Seen" and "Unseen" training sets, respectively. You can download these models from Google Drive and place the files in the ./ckpts directory.
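Optionally, you can sanity-check that the downloaded weights are in place before training. The snippet below only assumes the default ./ckpts location and does not name specific checkpoint files:

import os

ckpt_dir = "./ckpts"
# List whatever was placed in ./ckpts (the ACSeg clustering modules).
files = sorted(os.listdir(ckpt_dir)) if os.path.isdir(ckpt_dir) else []
print(f"{len(files)} file(s) found in {ckpt_dir}: {files}")
assert files, "No checkpoints found -- download the ACSeg clustering modules first."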
Run the following command to start training:
python train.py --data_root <PATH_TO_DATA>
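For example, if the dataset has been extracted to ./data/AGD20K (an illustrative path; adjust it to wherever you placed the data):

python train.py --data_root ./data/AGD20K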
Run the following command to start testing:
python test.py --data_root <PATH_TO_DATA> --model_file <PATH_TO_MODEL>
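For example, with both paths below being placeholders for your own dataset location and trained checkpoint:

python test.py --data_root ./data/AGD20K --model_file ./save_models/best_aff_model.pth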
@inproceedings{tang2025closed,
title={Closed-Loop Transfer for Weakly-supervised Affordance Grounding},
author={Tang, Jiajin and Wei, Zhengxuan and Zheng, Ge and Yang, Sibei},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={9530--9539},
year={2025}
}
This repo is based on LOCATE and Cross-View-AG. Thanks for their great work!
