Skip to content

InternRobotics/EBench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

English | 简体中文

EBench — Elemental Mobile Manipulation Benchmark

EBench: Elemental Mobile Manipulation Benchmark

From a single success rate to a multi-axis capability profile. Released by Shanghai AI Laboratory.

Project Page arXiv Leaderboard License

Simulation Server Client CLI

HF Assets HF Dataset ModelScope


What is EBench?

EBench is an indoor VLA manipulation benchmark built on NVIDIA Isaac Sim. Instead of compressing a model's behaviour into a single overall success rate, it produces a multi-axis capability profile that exposes what a model is good at — and where it overfits.

This repository is the project entry point. It hosts reference baselines and convenience scripts; the simulation runtime, the gmp CLI, and the datasets each live in their own repositories (linked in the badges above).

What makes EBench different

  • Three manipulation regimes in one benchmark — covers long-horizon, dexterous & precise, and mobile manipulation, regimes most benchmarks address in isolation.
  • 5-axis atomic diagnostic — every task is labelled by Scene · Atomic Skill · Horizon · Precision · Mobility, so a black-box score becomes a readable strength/weakness map.
  • 4-axis generalization tests — controlled perturbations along Object · Background · Instruction · Mixed, attributing OOD drops to a specific axis.
  • Strict train / test isolationval_train and val_unseen are open for tuning; the held-out test (Test-Mini) drives the leaderboard, so numbers reflect real generalization rather than fitting to the eval distribution.

Two evaluation tracks are exposed: Specialist (Tabletop or Mobile Manip) and Generalist (both at once).

For the full methodology, task taxonomy, and per-axis rationale, see the project documentation.

Project layout

EBench is split across a small constellation of repositories. This repo is the front door:

Component Where it lives What it provides
EBench (this repo) InternRobotics/EBench Reference baselines, scripts, project entry point
GenManip InternRobotics/GenManip Isaac Sim evaluation server, task configs
genmanip-client InternRobotics/genmanip-client gmp CLI + EvalClient Python API
EBench-Assets 🤗 EBench-Assets Scenes, objects, and task assets
EBench-Dataset 🤗 EBench-Dataset Training trajectories (LeRobot format)
Docs site internrobotics.github.io/EBench-doc Setup, evaluation workflow, CLI reference
Online Challenge internrobotics.shlab.org.cn/eval Remote evaluation, leaderboard, diagnostic reports
EBench/
├── baselines/       # Reference policies (one sub-folder per baseline)
├── scripts/         # Evaluation and analysis scripts
├── assets/          # Static assets used by this README
├── LICENSE
└── README.md

Quickstart

EBench runs as a client–server system. The server runs Isaac Sim; the client (gmp CLI) is a tiny package that drops into your model's Python environment.

# 1. Bring up the server  →  see Environment Setup
#    https://internrobotics.github.io/EBench-doc/getting-started/environment/

# 2. Install the client in your model env
git clone https://github.com/InternRobotics/genmanip-client.git
cd genmanip-client && pip install -e .

# 3. Run an evaluation
gmp submit ebench/generalist/test --run_id my_first_run
gmp eval  -a r5a -g lift2 --worker_ids 0
gmp status

A full validation pass takes roughly 30 minutes on 8× RTX 4090. Detailed setup, asset download, and the complete gmp reference are in the docs site.

Tasks

26 task types across Long-Horizon, Pick-and-Place, and Dexterous & Precise, expanded with the four generalization axes and three splits into 794 evaluation task instances. Browse the video gallery at Task Showcase.

Baselines

Reference policies live under baselines/<name>/, each with its own README and a gmp eval-compatible entry point. EBench has been validated on π0, π0.5, X-VLA, and InternVLA-A1 — see the leaderboard for current standings and per-axis diagnostic reports.

Each baseline ships its upstream code as a third_party/ git submodule and layers an EBench-specific entry point on top. Initialize the submodules first:

git submodule update --init --recursive

openpi (π0 / π0.5)

Submodule lives at baselines/openpi/third_party/openpi; EBench-specific configs and the eval client are layered under baselines/openpi/{src,scripts}/. See baselines/openpi/README.md for the full walkthrough.

# After configuring paths/tokens in scripts/launch_pi_onlineeval.sh:
bash scripts/launch_pi_onlineeval.sh

X-VLA

# 1. Install deps
pip install -r baselines/X-VLA/requirements.txt

# 2. Run eval (each WORKER_ID is a separate inference worker)
MODEL_PATH=/path/to/xvla \
BASE_URL=https://internrobotics.shlab.org.cn/eval \
RUN_ID=my_first_run \
TOKEN=<your-token> \
WORKER_IDS=0 \
  bash scripts/run_xvla_eval.sh

InternVLA-A1

Submodule lives at baselines/InternVLA-A1/third_party/InternVLA-A1. See baselines/InternVLA-A1/README.md for the full walkthrough.

# 1. Install upstream deps from third_party/InternVLA-A1/README.md
# 2. Pull the checkpoint
huggingface-cli download hxma/EBench-Generalist-InternVLA-A1 \
  --repo-type model \
  --local-dir baselines/InternVLA-A1/checkpoints/EBench-Generalist-InternVLA-A1

# 3. Run eval
cd baselines/InternVLA-A1 && bash eval_pjsim.sh

To plug your own model in, follow the contract documented at Integrate Your Own Model.

Online challenge

The 24/7 evaluation platform at internrobotics.shlab.org.cn/eval runs every submission on the held-out Test-Mini split and produces an automatic diagnostic report (capability radar, validation→test transfer curve, generalization radar, task heatmap). Submission flow: see the Challenge page.

Citation

A preprint is forthcoming. In the meantime:

@misc{ebench2026,
  title  = {EBench: Elemental Mobile Manipulation Benchmark},
  author = {Shanghai AI Laboratory},
  year   = {2026},
  note   = {Preprint coming soon},
  url    = {https://internrobotics.github.io/EBench-doc/}
}

License

MIT — see LICENSE. Built on NVIDIA Isaac Sim, cuRobo, and the LeRobot data format. Issues and pull requests are welcome.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors