📢 Note: This project is still under active development, and the benchmark will be continuously maintained.
TeleEgo is a comprehensive omni benchmark designed for multi-person, multi-scene, multi-task, and multimodal long-term memory reasoning in egocentric video streams. It reflects realistic personal-assistant scenarios in which continuous egocentric video is collected over hours or even days, requiring models to maintain long-term memory and to perform understanding and cross-memory reasoning over it. Omni here means that TeleEgo covers the full spectrum of roles, scenes, tasks, modalities, and memory horizons, offering all-round evaluation for egocentric AI assistants.
TeleEgo provides:
- 🧠 Omni-scale, diverse egocentric data from 5 roles across 4 daily scenarios.
- 🎤 Multimodal annotations: video, narration, and speech transcripts.
- ❓ Fine-grained QA benchmark: 3 cognitive dimensions, 12 subcategories.
- Participants: 5 (balanced gender)
- Scenarios:
- Work & Study
- Lifestyle & Routines
- Social Activities
- Outings & Culture
- Recording: 3 days/participant (~14.4 hours each)
- Modalities:
- Egocentric video streams
- Speech & conversations
- Narration and event descriptions
TeleEgo-QA evaluates models along three main dimensions:
- **Memory**
  - Short-term / Long-term / Ultra-long Memory
  - Entity Tracking
  - Temporal Comparison & Interval
- **Understanding**
  - Causal Understanding
  - Intent Inference
  - Multi-step Reasoning
  - Cross-modal Understanding
- **Cross-Memory Reasoning**
  - Cross-temporal Causality
  - Cross-entity Relation
  - Temporal Chain Understanding
Each QA instance includes:
- Question type: Single-choice, Multi-choice, Binary, or Open-ended (see the hypothetical instance sketched below)
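For illustration, one QA instance might be organized as follows. This is a minimal sketch: the field names and values here are hypothetical, and the released dataset files define the authoritative schema.

```python
# Hypothetical layout of a single TeleEgo-QA instance.
# Field names are illustrative only; consult the released
# dataset files for the actual schema.
qa_instance = {
    "question_type": "single-choice",   # single-choice / multi-choice / binary / open-ended
    "dimension": "Memory",              # Memory / Understanding / Cross-Memory Reasoning
    "subcategory": "Entity Tracking",   # one of the 12 subcategories
    "question": "Where did I leave my badge this morning?",
    "options": ["On the desk", "In the backpack", "At the cafe", "In the car"],
    "answer": "In the backpack",
}
```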
```
TeleEgo/
│
├── teleego_data/                    # Dataset samples / metadata (link provided separately)
├── weights/                         # Pre-trained weights (MiniCPM-o, Qwen2.5-Omni, ...)
├── TeleEgo_gemini25_pro_eval.py     # Evaluation script (Gemini 2.5 Pro)
├── TeleEgo_gpt4o_eval.py            # Evaluation script (GPT-4o)
├── TeleEgo_minicpm_eval.py          # Evaluation script (MiniCPM-o)
├── TeleEgo_qwen25_eval.py           # Evaluation script (Qwen2.5)
├── TeleEgo_qweno25_eval.py          # Evaluation script (Qwen2.5-Omni)
├── TeleEgo_videochat_eval.py        # Evaluation script (VideoChat)
└── README.md                        # This file
```
Due to privacy and licensing constraints, please request access here: 📝 Dataset Access Form.
```bash
python TeleEgo_gpt4o_eval.py
```

Submit your results to our 🏆 Online Leaderboard.
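As a rough sketch of how per-dimension accuracy could be aggregated from a script's outputs before submission: the file name `predictions.json` and its record layout below are assumptions for illustration, not the eval scripts' actual interface.

```python
import json
from collections import defaultdict

# Hypothetical aggregation of predictions into per-dimension accuracy.
# "predictions.json" and its fields are assumed for illustration; the
# released eval scripts define the real input/output format.
with open("predictions.json") as f:
    predictions = json.load(f)  # list of {"dimension", "answer", "prediction"}

correct = defaultdict(int)
total = defaultdict(int)
for item in predictions:
    total[item["dimension"]] += 1
    if item["prediction"] == item["answer"]:
        correct[item["dimension"]] += 1

for dim in sorted(total):
    print(f"{dim}: {correct[dim] / total[dim]:.2%} ({correct[dim]}/{total[dim]})")
```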
If you find TeleEgo useful in your research, please cite:
```bibtex
@misc{yan2025teleegobenchmarkingegocentricai,
  title={TeleEgo: Benchmarking Egocentric AI Assistants in the Wild},
  author={Jiaqi Yan and Ruilong Ren and Jingren Liu and Shuning Xu and Ling Wang and Yiheng Wang and Yun Wang and Long Zhang and Xiangyu Chen and Changzhi Sun and Jixiang Luo and Dell Zhang and Hao Sun and Chi Zhang and Xuelong Li},
  year={2025},
  eprint={2510.23981},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.23981},
}
```

This project is licensed under the MIT License. Dataset usage is restricted under a research-only license.
If you have any questions, please feel free to reach out: chxy95@gmail.com.
✨ TeleEgo is an Omni benchmark: a step toward building personalized AI assistants with true long-term memory, reasoning, and decision-making in real-world wearable scenarios. ✨

