TeleAI-UAGI/TeleEgo

TeleEgo:
Benchmarking Egocentric AI Assistants in the Wild

📢 Note: This project is under active development, and the benchmark will be continuously maintained.

📌 Introduction

TeleEgo is a comprehensive omni benchmark for multi-person, multi-scene, multi-task, and multimodal long-term memory reasoning over egocentric video streams. It reflects realistic personal-assistant scenarios in which egocentric video is collected continuously over hours or even days, so models must retain what they have seen, understand it, and reason across memories. "Omni" here means that TeleEgo covers the full spectrum of roles, scenes, tasks, modalities, and memory horizons, offering all-round evaluation of egocentric AI assistants.

TeleEgo provides:

  • 🧠 Omni-scale, diverse egocentric data from 5 roles across 4 daily scenarios.
  • 🎤 Multi-modal annotations: video, narration, and speech transcripts.
  • Fine-grained QA benchmark: 3 cognitive dimensions, 12 subcategories.

📊 Dataset Overview

  • Participants: 5 (balanced gender)
  • Scenarios:
    • Work & Study
    • Lifestyle & Routines
    • Social Activities
    • Outings & Culture
  • Recording: 3 days per participant (~14.4 hours of footage per participant)
  • Modalities:
    • Egocentric video streams
    • Speech & conversations
    • Narration and event descriptions

🧪 Benchmark Tasks

TeleEgo-QA evaluates models along three main dimensions:

  1. Memory

    • Short-term / Long-term / Ultra-long Memory
    • Entity Tracking
    • Temporal Comparison & Interval
  2. Understanding

    • Causal Understanding
    • Intent Inference
    • Multi-step Reasoning
    • Cross-modal Understanding
  3. Cross-Memory Reasoning

    • Cross-temporal Causality
    • Cross-entity Relation
    • Temporal Chain Understanding
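
The taxonomy above can be written down as a small lookup table, which is handy when grouping results by dimension. The sketch below is only an illustration: the dictionary name, the helper functions, and the exact label strings are assumptions taken from this README's wording, not an official schema. It also assumes that "Short-term / Long-term / Ultra-long Memory" counts as three subcategories and "Temporal Comparison & Interval" as one, which is the reading that yields the 12 subcategories stated earlier.

```python
# Hypothetical encoding of the TeleEgo-QA taxonomy. Labels mirror the README
# lists; they are not guaranteed to match the dataset's actual field values.
TAXONOMY = {
    "Memory": [
        "Short-term Memory",
        "Long-term Memory",
        "Ultra-long Memory",
        "Entity Tracking",
        "Temporal Comparison & Interval",
    ],
    "Understanding": [
        "Causal Understanding",
        "Intent Inference",
        "Multi-step Reasoning",
        "Cross-modal Understanding",
    ],
    "Cross-Memory Reasoning": [
        "Cross-temporal Causality",
        "Cross-entity Relation",
        "Temporal Chain Understanding",
    ],
}


def dimension_of(subcategory: str) -> str:
    """Return the dimension that a subcategory label belongs to."""
    for dimension, subcategories in TAXONOMY.items():
        if subcategory in subcategories:
            return dimension
    raise KeyError(f"unknown subcategory: {subcategory!r}")


def count_subcategories() -> int:
    """Count subcategories across all dimensions."""
    return sum(len(subcategories) for subcategories in TAXONOMY.values())
```

Under these assumptions, `count_subcategories()` agrees with the "3 cognitive dimensions, 12 subcategories" figure, and `dimension_of("Entity Tracking")` resolves to `"Memory"`.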

Each QA instance includes:

  • Question type: Single-choice, Multi-choice, Binary, Open-ended
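
As a rough sketch of how the four question types might be handled in code, the example below defines an enum for them and a simple exact-match check for the closed-form types. Everything here is hypothetical: the `QAInstance` fields, the comma-separated answer format, and the `exact_match` rule are assumptions for illustration and are not taken from the TeleEgo data files or evaluation scripts.

```python
from dataclasses import dataclass
from enum import Enum


class QuestionType(Enum):
    # The four question types listed in this README.
    SINGLE_CHOICE = "single-choice"
    MULTI_CHOICE = "multi-choice"
    BINARY = "binary"
    OPEN_ENDED = "open-ended"


@dataclass(frozen=True)
class QAInstance:
    # Hypothetical fields; the real annotation schema may differ.
    question: str
    question_type: QuestionType
    answer: str


def parse_question_type(label: str) -> QuestionType:
    """Map a label such as ' Multi-choice ' to its enum member."""
    return QuestionType(label.strip().lower())


def exact_match(instance: QAInstance, prediction: str) -> bool:
    """Order- and case-insensitive match for closed-form question types."""
    if instance.question_type is QuestionType.OPEN_ENDED:
        raise ValueError("open-ended answers need a separate judging step")

    def normalize(text: str) -> frozenset:
        # Assumed format: options separated by commas, e.g. "A,C".
        return frozenset(p.strip().upper() for p in text.split(",") if p.strip())

    return normalize(prediction) == normalize(instance.answer)
```

Open-ended questions are deliberately excluded from `exact_match`, since free-form answers generally cannot be scored by string comparison.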

🗂️ Repository Structure

TeleEgo/
│
├── teleego_data/                # Dataset samples / metadata (link provided separately)
├── weights/                     # Pre-trained weights (MiniCPM-o, Qwen2.5-Omni, ...)
├── TeleEgo_gemini25_pro_eval.py # Evaluation scripts
├── TeleEgo_gpt4o_eval.py        # Evaluation scripts
├── TeleEgo_minicpm_eval.py      # Evaluation scripts
├── TeleEgo_qwen25_eval.py       # Evaluation scripts
├── TeleEgo_qweno25_eval.py      # Evaluation scripts
├── TeleEgo_videochat_eval.py    # Evaluation scripts
└── README.md                    # This file

🚀 Usage

📥 Dataset Access

Due to privacy and licensing constraints, please request access here: 📝 Dataset Access Form.

🧪 Running Evaluations

python TeleEgo_gpt4o_eval.py
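
The other evaluation scripts in the repository tree can presumably be launched the same way. The sketch below builds the command for a given model; the `EVAL_SCRIPTS` keys and the `build_command` helper are hypothetical names introduced here, and only the script filenames come from the repository structure above. It assumes each script runs without extra command-line arguments, which this README does not confirm.

```python
import sys
from pathlib import Path

# Script filenames as listed under "Repository Structure"; the short keys
# are made up for this example.
EVAL_SCRIPTS = {
    "gemini25_pro": "TeleEgo_gemini25_pro_eval.py",
    "gpt4o": "TeleEgo_gpt4o_eval.py",
    "minicpm": "TeleEgo_minicpm_eval.py",
    "qwen25": "TeleEgo_qwen25_eval.py",
    "qweno25": "TeleEgo_qweno25_eval.py",
    "videochat": "TeleEgo_videochat_eval.py",
}


def build_command(model_key: str, repo_root: Path = Path(".")) -> list[str]:
    """Return the argv list that runs the evaluation script for a model."""
    try:
        script = EVAL_SCRIPTS[model_key]
    except KeyError:
        choices = ", ".join(sorted(EVAL_SCRIPTS))
        raise ValueError(
            f"unknown model key {model_key!r}; choose from: {choices}"
        ) from None
    # Reuse the current interpreter so the script sees the same environment.
    return [sys.executable, str(repo_root / script)]
```

From the repository root, `subprocess.run(build_command("gpt4o"), check=True)` would then be equivalent to the command above. Individual scripts may still need API keys, model weights, or dataset paths configured first.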

Submit your results to our 🏆 Online Leaderboard.


📜 Citation

If you find TeleEgo useful in your research, please cite:

@misc{yan2025teleegobenchmarkingegocentricai,
      title={TeleEgo: Benchmarking Egocentric AI Assistants in the Wild}, 
      author={Jiaqi Yan and Ruilong Ren and Jingren Liu and Shuning Xu and Ling Wang and Yiheng Wang and Yun Wang and Long Zhang and Xiangyu Chen and Changzhi Sun and Jixiang Luo and Dell Zhang and Hao Sun and Chi Zhang and Xuelong Li},
      year={2025},
      eprint={2510.23981},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.23981}, 
}

🪪 License

This project is licensed under the MIT License. Dataset usage is restricted under a research-only license.


📬 Contact

If you have any questions, please feel free to reach out: chxy95@gmail.com.


✨ TeleEgo is an Omni benchmark, a step toward building personalized AI assistants with true long-term memory, reasoning and decision-making in real-world wearable scenarios. ✨
