This repo is the official implementation of the NeurIPS 2025 paper
"EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT"
Baoqi Pei, Yifei Huang, Jilan Xu, Yuping He, Guo Chen,
Fei Wu, Yu Qiao, Jiangmiao Pang
⭐️: We are also working on an updated version for spatial understanding and embodied QA; stay tuned!
Egocentric video reasoning focuses on the unseen, egocentric agent who shapes the scene, demanding inference of hidden intentions and fine-grained interactions—areas where current MLLMs struggle. We present EgoThinker, a framework that equips MLLMs with strong egocentric reasoning via spatio-temporal chain-of-thought supervision and a two-stage curriculum. We build EgoRe-5M, a large-scale QA dataset derived from 13M egocentric clips, featuring multi-minute segments with detailed rationales and dense hand–object grounding. Trained with SFT on EgoRe-5M and refined with RFT for better spatio-temporal localization, EgoThinker outperforms prior methods on multiple egocentric benchmarks and yields substantial gains in fine-grained localization tasks.
- 2025-10-29: We released EgoThinker-v1 ckpt and training data.
- 2025-10-28: We released our paper and code.
This repo contains three parts:
- EgoThinker-SFT: SFT training code for EgoThinker.
- EgoThinker-RFT: RFT training code for EgoThinker.
- lmms-eval: Evaluation code for egocentric and embodied QA benchmarks.
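For orientation, below is a minimal inference sketch. It assumes the released EgoThinker-v1 checkpoint follows the Qwen2-VL interface in Hugging Face transformers (EgoThinker builds on Qwen-VL; see the acknowledgments); the model path, video path, and question are placeholders, not the authors' exact usage. Refer to the EgoThinker-SFT and lmms-eval subfolders for the authoritative loading and evaluation code.

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

# Hypothetical checkpoint path — substitute the released EgoThinker-v1 weights.
model_id = "path/to/EgoThinker-v1"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One egocentric clip plus an illustrative reasoning question.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "path/to/egocentric_clip.mp4"},
        {"type": "text", "text": "What is the camera wearer doing, and why?"},
    ],
}]

# Build the chat prompt and pack the video frames into model inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and strip the prompt tokens before decoding.
output_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```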
We welcome feedback and issues. Thank you for trying EgoThinker!
@misc{pei2025egothinkerunveilingegocentricreasoning,
      title={EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT}, 
      author={Baoqi Pei and Yifei Huang and Jilan Xu and Yuping He and Guo Chen and Fei Wu and Yu Qiao and Jiangmiao Pang},
      year={2025},
      eprint={2510.23569},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.23569}, 
}

Our code is built upon the following projects:
- Qwen-VL — https://github.com/QwenLM/Qwen3-VL
- VideoChat-R1 — https://github.com/OpenGVLab/VideoChat-R1
- lmms-eval — https://github.com/EvolvingLMMs-Lab/lmms-eval
