Jin-Chuan Shi1*, Binhong Ye1*, Tao Liu1, Xiaoyang Liu1, Yangjinhui Xu1, Junzhe He1, Zeju Li1, Hao Chen1, Chunhua Shen1,2
1State Key Lab of CAD & CG, Zhejiang University 2Zhejiang University of Technology
*Equal contribution
ACM SIGGRAPH 2026, Conference Track
- [2026.03] AGILE is conditionally accepted to ACM SIGGRAPH 2026 (Conference Track)!
- [2026.03] Project page and paper released.
- [Coming Soon] Code will be released incrementally, with the full codebase available by June 2026. Stay tuned!
Reconstructing dynamic hand-object interactions from monocular videos is critical for collecting dexterous manipulation data and creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent failures on in-the-wild footage.
We introduce AGILE, a robust framework that shifts the paradigm from reconstruction to agentic generation for interaction learning. Our method employs an agentic pipeline where a Vision-Language Model (VLM) guides a generative model to synthesize complete, watertight object meshes with high-fidelity textures, independent of video occlusions. Bypassing fragile SfM entirely, we propose a robust anchor-and-track strategy that initializes the object pose at a single interaction onset frame and propagates it temporally. A contact-aware optimization integrates semantic, geometric, and interaction stability constraints to enforce physical plausibility.
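To make the last step concrete, here is a minimal, self-contained sketch of what a contact-aware per-frame pose refinement can look like: a geometric alignment term plus a hand-object contact (interaction stability) term, optimized over a 6-DoF object pose. This is an illustrative toy example, not AGILE's actual implementation; the semantic (mask) term is omitted, and all tensors and weights are placeholders.

```python
import torch

def rot6d_to_matrix(r6):
    """Convert a 6D rotation representation (Zhou et al.) to a 3x3 rotation matrix."""
    a1, a2 = r6[:3], r6[3:]
    b1 = torch.nn.functional.normalize(a1, dim=0)
    b2 = torch.nn.functional.normalize(a2 - (b1 @ a2) * b1, dim=0)
    b3 = torch.cross(b1, b2, dim=0)
    return torch.stack([b1, b2, b3], dim=1)

def contact_aware_refine(obj_verts_canon, hand_verts, target_verts,
                         n_iters=200, lr=1e-2, w_contact=0.1):
    """Toy pose refinement: geometric alignment plus a contact-stability term.
    All inputs are (N, 3) point sets standing in for real meshes."""
    r6 = torch.tensor([1., 0., 0., 0., 1., 0.], requires_grad=True)  # identity rotation init
    t = torch.zeros(3, requires_grad=True)                           # translation init
    opt = torch.optim.Adam([r6, t], lr=lr)

    for _ in range(n_iters):
        opt.zero_grad()
        R = rot6d_to_matrix(r6)
        obj_verts = obj_verts_canon @ R.T + t  # posed object vertices

        # Geometric term: align the posed object with (pseudo) observed geometry.
        loss_geo = (obj_verts - target_verts).pow(2).sum(-1).mean()
        # Interaction-stability term: keep each hand vertex close to the object surface.
        loss_contact = torch.cdist(hand_verts, obj_verts).min(dim=1).values.mean()

        (loss_geo + w_contact * loss_contact).backward()
        opt.step()
    return rot6d_to_matrix(r6).detach(), t.detach()

# Toy usage with random point clouds in place of real hand/object meshes.
obj = torch.randn(500, 3)
hand = torch.randn(778, 3) * 0.1 + 0.5
target = obj + torch.tensor([0.2, 0.0, 0.1])  # "observed" object, translated
R, t = contact_aware_refine(obj, hand, target)
```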
Extensive experiments on HO3D, DexYCB, and in-the-wild videos show that AGILE outperforms baselines in global geometric accuracy while demonstrating exceptional robustness on challenging sequences where prior methods frequently collapse.
If you find this work useful, please consider citing:
@article{shi2026agile,
title={AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation},
author={Shi, Jin-Chuan and Ye, Binhong and Liu, Tao and Liu, Xiaoyang and Xu, Yangjinhui and He, Junzhe and Li, Zeju and Chen, Hao and Shen, Chunhua},
journal={arXiv preprint arXiv:2602.04672},
year={2026}
}

We thank the authors of HOLD, MagicHOI, WiLoR, FoundationPose, MoGe, MegaSAM, and Viser for their excellent work.