This repo is the official implementation of the NeurIPS 2025 paper
"EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT"
Baoqi Pei, Yifei Huang, Jilan Xu, Yuping He, Guo Chen,
Fei Wu, Yu Qiao, Jiangmiao Pang
⭐️: We are also working on an updated version for spatial understanding and embodied QA; stay tuned!
Egocentric video reasoning focuses on the unseen, egocentric agent who shapes the scene, demanding inference of hidden intentions and fine-grained interactions—areas where current MLLMs struggle. We present EgoThinker, a framework that equips MLLMs with strong egocentric reasoning via spatio-temporal chain-of-thought supervision and a two-stage curriculum. We build EgoRe-5M, a large-scale QA dataset derived from 13M egocentric clips, featuring multi-minute segments with detailed rationales and dense hand–object grounding. Trained with SFT on EgoRe-5M and refined with RFT for better spatio-temporal localization, EgoThinker outperforms prior methods on multiple egocentric benchmarks and yields substantial gains in fine-grained localization tasks.
- 2025-10-29: We released EgoThinker-v1 ckpt and training data.
- 2025-10-28: We released our paper and code.
This repo contains three parts:
- EgoThinker-SFT: SFT training code for EgoThinker.
- EgoThinker-RFT: RFT training code for EgoThinker.
- lmms-eval: Evaluation code for egocentric and embodied QA benchmarks.
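For orientation, below is a minimal inference sketch. It assumes the released EgoThinker-v1 checkpoint follows the Qwen2-VL interface in Hugging Face transformers (EgoThinker builds on Qwen-VL; see the acknowledgments); the model path, video path, and question are placeholders, not the authors' exact usage. Refer to the EgoThinker-SFT and lmms-eval subfolders for the authoritative loading and evaluation code.

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

# Hypothetical checkpoint path — substitute the released EgoThinker-v1 weights.
model_id = "path/to/EgoThinker-v1"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One egocentric clip plus an illustrative reasoning question.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "path/to/egocentric_clip.mp4"},
        {"type": "text", "text": "What is the camera wearer doing, and why?"},
    ],
}]

# Build the chat prompt and pack the video frames into model inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and strip the prompt tokens before decoding.
output_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```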
We welcome feedback and issues. Thank you for trying EgoThinker!
@misc{pei2025egothinkerunveilingegocentricreasoning,
      title={EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT}, 
      author={Baoqi Pei and Yifei Huang and Jilan Xu and Yuping He and Guo Chen and Fei Wu and Yu Qiao and Jiangmiao Pang},
      year={2025},
      eprint={2510.23569},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.23569}, 
}

Our code is built upon the following projects:
- Qwen-VL — https://github.com/QwenLM/Qwen3-VL
- VideoChat-R1 — https://github.com/OpenGVLab/VideoChat-R1
- lmms-eval — https://github.com/EvolvingLMMs-Lab/lmms-eval
