feat(envs): add RoboCasa365 simulated-eval environment alongside LIBERO by shuheng-liu · Pull Request #386 · TensorAuto/OpenTau

shuheng-liu · 2026-06-03T06:40:58Z

What this does

Adds RoboCasa365 (kitchen manipulation, robosuite/MuJoCo) as a second simulated-eval environment alongside LIBERO, and makes it a first-class uv extra. RoboCasa plugs into the same environment-agnostic eval pipeline as LIBERO — parallel vectorized envs, per-task success rates, and grid_summary eval videos to wandb — with no changes to eval.py / train.py.

Highlights:

src/opentau/envs/robocasa.py — RoboCasaEnv(gym.Env) wrapper + create_robocasa_envs(), ported from upstream LeRobot's RoboCasa and reshaped to OpenTau's LIBERO multi-task pattern: the 3 PandaOmron cameras map to camera0/1/2, info["success"] is bridged to is_success, tasks shard round-robin across accelerator ranks, and the robocasa/robosuite import is deferred so the module (and the CPU suite) loads without the sim installed. 12-D action / 16-D state / 3 cameras.
Config + factory — RoboCasaEnv draccus config (env.type=robocasa, task-group shortcuts + split) and make_envs dispatch. Both LIBERO and RoboCasa return dict[group][task_id] -> VectorEnv, so the eval pipeline gives per-task success + grid videos for free.
Packaging — uv sync --extra robocasa co-installs with libero on a shared robosuite stack. Two non-obvious things make it resolve: robocasa needs MujocoEnv(load_model_on_init=...), added on robosuite master after the 1.5.2 PyPI release, so [tool.uv.sources] repins robosuite to a master commit that still self-reports 1.5.2 (LIBERO re-validated on it); and robocasa is pulled from the shuheng-liu/robocasa packaging fork that drops upstream's lerobot==0.3.3 / tianshou / opencv-python / hidapi deps and loosens its numpy/numba/scipy/mujoco pins + import-time version asserts (uv can't --no-deps a single package in a lock). Kitchen assets remain a separate runtime download.
fix(eval): eval_policy_all now raises a clear error if eval.recording_root (the LIBERO-only dataset recorder) is set with a non-LIBERO env, instead of silently mislabeling rollouts. Fixes #381 (RoboCasa: eval.recording_root uses the LIBERO-hardcoded dataset recorder).
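
The round-robin task sharding mentioned in the first highlight can be sketched as follows. This is a minimal illustration, not the code in robocasa.py: the helper name `shard_round_robin` and its signature are hypothetical, and it assumes each accelerator rank knows its own index and the total number of ranks.

```python
def shard_round_robin(task_ids: list[int], rank: int, world_size: int) -> list[int]:
    """Return the task ids that `rank` evaluates when tasks are dealt out in turn.

    Hypothetical helper for illustration; not taken from robocasa.py.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world_size {world_size}")
    # Rank r takes tasks r, r + world_size, r + 2 * world_size, ...
    return task_ids[rank::world_size]


tasks = list(range(5))
shards = [shard_round_robin(tasks, rank, world_size=2) for rank in range(2)]
print(shards)  # [[0, 2, 4], [1, 3]]
```

Every task lands on exactly one rank, and shard sizes differ by at most one, which keeps per-rank eval time roughly balanced when tasks take similar time.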

RoboCasa is eval-validated, but the train→eval loop and a few fidelity/parity features aren't there yet — tracked as follow-ups: #379 (training data + 12-DoF projection for meaningful eval), #380 (fps→control_freq wiring), #382 (per-task horizons), #383 (eval seed/split protocol parity), #384 (subgoal conditioning), #385 (GPU/regression CI).

How it was tested

CPU suite (CI-equivalent, -m "not gpu"): tests/envs/ + tests/scripts/test_eval.py → 142 passed. New: tests/envs/test_robocasa.py (config registration, factory dispatch shape, per-rank task sharding, version shim, pure helpers — all mock-based, no sim) and tests/scripts/test_eval.py::test_eval_policy_all_rejects_recording_root_for_non_libero_env.
Packaging: regenerated uv.lock (resolved 274 packages, no conflicts); verified a fresh uv sync --extra robocasa installs and imports robosuite + robocasa + opentau.envs.robocasa on Linux.
Real sim (a Linux GPU box, headless EGL): end-to-end against the actual robosuite/MuJoCo sim — RoboCasaEnv construct/reset/step/render (CloseFridge: language, camera0/1/2 256x256x3, 16-D agent_pos, success bridge), parallel vectorized envs via make_envs, and grid_summary.mp4 + the Eval Videos/CloseFridge_0 wandb key produced via eval's own create_grid_summary_video / collect_grid_summary_videos. Also confirmed LIBERO still constructs+steps on the shared robosuite-master commit.

How to checkout & try? (for the reviewer)

CPU, no sim needed:

uv run --extra dev pytest -sx tests/envs/test_robocasa.py
uv run --extra dev pytest -sx tests/scripts/test_eval.py::test_eval_policy_all_rejects_recording_root_for_non_libero_env

Full sim eval (Linux + GPU; the kitchen assets are a separate ~5-10GB download):

uv sync --extra robocasa
python -m robocasa.scripts.download_kitchen_assets
MUJOCO_GL=egl opentau-eval --accelerate-config configs/examples/accelerate_ddp_config.yaml --config_path=configs/examples/pi05_robocasa_eval_config.json

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Adds a RoboCasa365 kitchen-sim eval env that coexists with LIBERO on the shared robosuite 1.5.2 stack. Ports upstream LeRobot's RoboCasa wrapper into OpenTau's LIBERO-style multi-task pattern so it reuses the environment-agnostic eval pipeline (parallel vec envs, per-task success rates, grid_summary wandb videos) with no changes to eval.py/train.py.

- envs/robocasa.py: RoboCasaEnv(gym.Env) wrapper + create_robocasa_envs(); maps the 3 PandaOmron cameras to camera0/1/2, bridges info["success"] -> is_success, shards tasks round-robin across ranks, and defers the robocasa/robosuite import so the module loads (and CPU tests run) without the sim installed.
- envs/configs.py: RoboCasaEnv config (12-D action, 16-D state, task groups/split).
- envs/factory.py: make_env_config / make_envs dispatch.
- tests/envs/test_robocasa.py: CPU mock tests (registration, dispatch, sharding).
- configs/examples/pi05_robocasa_eval_config.json: smoke eval config.

The robocasa pip extra is intentionally deferred: upstream robocasa pins lerobot==0.3.3 plus over-strict numba/numpy/mujoco/tianshou that would conflict in the shared lock, so a packaging fork is needed first. Manual install is documented in CLAUDE.md.

robocasa 1.0.1 hard-asserts mujoco==3.3.1 and numpy==2.2.5 at package import. OpenTau shares LIBERO's stack (robosuite 1.5.2 + mujoco>=3.3.5 + numpy 2.2.6), and robosuite 1.5.2 itself only needs mujoco>=3.3.0 / numpy>=1.13.3, so those equality pins can't be satisfied here. _import_robocasa_with_version_shim() spoofs the two version strings only for the one-time package import and restores them immediately after, so the integration runs on a stock `pip install robocasa`. Called at both deferred-import sites (_resolve_tasks, RoboCasaEnv._ensure_env). A robocasa packaging fork that drops the asserts makes this shim unnecessary.
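
The spoof-and-restore idea behind the shim can be sketched as a context manager. This is a sketch under assumptions, not the body of `_import_robocasa_with_version_shim()`: the name `spoofed_versions` is hypothetical, and it assumes the import-time asserts read each package's `__version__` attribute. A stand-in module is used so the example runs without mujoco, numpy, or robocasa installed.

```python
import contextlib
import types
from collections.abc import Iterator

_MISSING = object()  # marks modules that had no __version__ to begin with


@contextlib.contextmanager
def spoofed_versions(overrides: dict[types.ModuleType, str]) -> Iterator[None]:
    """Temporarily set `__version__` on each module, restoring it on exit.

    Hypothetical helper for illustration; the real shim may differ.
    """
    saved = {mod: getattr(mod, "__version__", _MISSING) for mod in overrides}
    try:
        for mod, version in overrides.items():
            mod.__version__ = version
        yield
    finally:
        # Restore even if the wrapped import raised.
        for mod, original in saved.items():
            if original is _MISSING:
                if hasattr(mod, "__version__"):
                    del mod.__version__
            else:
                mod.__version__ = original


# Stand-in for a real dependency such as mujoco.
fake_mujoco = types.ModuleType("fake_mujoco")
fake_mujoco.__version__ = "3.3.5"

with spoofed_versions({fake_mujoco: "3.3.1"}):
    # In the real integration, the one-time package import would happen here.
    seen_during_import = fake_mujoco.__version__

seen_after = fake_mujoco.__version__
```

Restoring in a `finally` block keeps the spoof from leaking into the rest of the process if the wrapped import fails.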

…master `uv sync --extra robocasa` now resolves + installs RoboCasa365 alongside LIBERO in one venv. Two things make it resolve: robocasa needs robosuite master (MujocoEnv(load_model_on_init=...), added after the 1.5.2 PyPI release), so [tool.uv.sources] repins robosuite to commit 85abee2 (still self-reports "1.5.2", matching the extras' pins; LIBERO re-validated constructing+stepping on it); and robocasa is pulled from the shuheng-liu/robocasa packaging fork that drops upstream's lerobot==0.3.3 / tianshou / opencv-python / hidapi deps and loosens its numpy/numba/scipy/mujoco pins + import-time version asserts (uv can't --no-deps a single package in a lock). Kitchen assets stay a separate runtime download. Validated on a Linux GPU box: a fresh `uv sync --extra robocasa --frozen` installs and imports robosuite (1.5.2/master) + robocasa (1.0.1/fork) + opentau.envs.robocasa.

eval.recording_root drives the LIBERO-only dataset recorder (libero_dataset_recorder, hardcoded LIBERO_TASKS), so eval_policy_all now raises a clear NotImplementedError when recording_root is set with any non-LIBERO env, instead of silently mislabeling the recorded rollouts. Fails fast before any rollout runs. Closes the RoboCasa footgun (#381).
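
A fail-fast guard of this kind can be sketched as below. This is illustrative only: the function name `check_recording_root`, its parameters, the `"libero"` type string, and the error text are assumptions, and the actual check inside eval_policy_all may be structured differently.

```python
from typing import Optional


def check_recording_root(env_type: str, recording_root: Optional[str]) -> None:
    """Raise before any rollout if the LIBERO-only recorder is requested elsewhere.

    Hypothetical helper for illustration; not the code in eval.py.
    """
    if recording_root is not None and env_type != "libero":
        raise NotImplementedError(
            "eval.recording_root is only supported for LIBERO envs, "
            f"got env.type={env_type!r}"
        )


check_recording_root("libero", "/tmp/rollouts")  # recorder allowed
check_recording_root("robocasa", None)  # no recorder requested

message = None
try:
    check_recording_root("robocasa", "/tmp/rollouts")
except NotImplementedError as err:
    message = str(err)
```

Running the check before environments are stepped means a misconfigured run costs seconds rather than a full evaluation with mislabeled recordings.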

claude

Automated review by Claude. Solid, well-tested addition — see two nit-level inline notes; nothing blocking.

claude · 2026-06-03T06:45:37Z

+
+        observation = self._format_raw_obs(raw_obs)
+        if terminated:
+            info["final_info"] = {


nit — info["final_info"] is set here but never read anywhere in the repo (only info["is_success"] is consumed in eval.py:195, and LIBERO's step() deliberately doesn't emit a final_info key). It looks like a vestige of the upstream LeRobot port. Under SyncVectorEnv/AsyncVectorEnv this nested dict still gets batched into an object array on every terminal step — harmless, but dead. Consider dropping it to stay aligned with LiberoEnv.step.

claude · 2026-06-03T06:45:37Z

+        self.observation_width = observation_width
+        self.observation_height = observation_height
+        self.visualization_width = visualization_width
+        self.visualization_height = visualization_height


nit — visualization_width/visualization_height are threaded all the way through (RoboCasaEnv.__init__ → _make_env_fns → create_robocasa_envs's gym_kwargs.pop) but never reach RoboCasaGymEnv in _ensure_env, and render() takes no size, so they're inert. This mirrors LiberoEnv (which also stores them unused), so it's a faithful port — flagging only so it's a conscious choice rather than an oversight.

claude · 2026-06-03T06:45:49Z

[claude-review] summary for commit 983f795

Re-reviewed after 983f795. The only change since the prior review is the final_info removal, which addresses my earlier nit cleanly (terminal auto-reset preserved; task/done/is_success are what eval consumes).

No blocking issues found. Action/state dims (12/16), the convert_action slice layout, the camera{i} remap feeding preprocess_observation's num_cams zero-fill, per-rank round-robin sharding, the is_success bridge, and the terminated = done or is_success + auto-reset flow are all consistent with the LiberoEnv pattern and covered by CPU mock tests. The eval_policy_all recording_root guard fails fast before any rollout and is tested.

Notes (not findings):

visualization_width/visualization_height remain plumbed-but-inert — author confirmed this is intentional parity with LiberoEnv/upstream; a real higher-res render path is a cross-env follow-up.
The version shim is redundant with the packaging fork that drops the import-time asserts, but remains a harmless safety net for stock pip install robocasa.
episode_length/_max_episode_steps is stored-but-unenforced; tracked as #382 (RoboCasa: per-task episode horizons, TASK_SUITE_MAX_STEPS equivalent).
Real-sim paths (reset/step/render, group expansion) can't run in the CPU suite — validated manually on a GPU box per the PR description.

Addresses a review nit: info["final_info"] was a vestige of the upstream LeRobot port — never read anywhere (eval consumes info["is_success"]/"task"/"done", which are already set above) and divergent from LiberoEnv.step. Keep the auto-reset.
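
The step flow described in this thread (bridge info["success"] to is_success, set terminated = done or is_success, auto-reset on termination, no final_info key) can be sketched with a toy simulator. Everything here is an assumption for illustration: `ToySim`, `step_with_success_bridge`, the 4-tuple returned by the toy `step()`, and the choice to return the pre-reset observation are not taken from robocasa.py.

```python
from typing import Any


class ToySim:
    """Stand-in simulator that reports success once `succeed_at` steps have run."""

    def __init__(self, succeed_at: int) -> None:
        self.succeed_at = succeed_at
        self.t = 0
        self.resets = 0

    def reset(self) -> dict[str, int]:
        self.t = 0
        self.resets += 1
        return {"t": self.t}

    def step(self, action: Any) -> tuple[dict[str, int], float, bool, dict[str, Any]]:
        self.t += 1
        return {"t": self.t}, 0.0, False, {"success": self.t >= self.succeed_at}


def step_with_success_bridge(sim: ToySim, action: Any, task: str):
    """One env step: bridge success, derive `terminated`, auto-reset if terminal."""
    raw_obs, reward, done, info = sim.step(action)
    is_success = bool(info.get("success", False))
    terminated = done or is_success
    # These three keys are the ones the review says eval consumes.
    info.update({"task": task, "done": done, "is_success": is_success})
    observation = raw_obs  # captured before the reset (assumed ordering)
    if terminated:
        sim.reset()  # terminal auto-reset; no info["final_info"] is emitted
    return observation, reward, terminated, False, info


sim = ToySim(succeed_at=2)
sim.reset()
first = step_with_success_bridge(sim, None, "CloseFridge")
second = step_with_success_bridge(sim, None, "CloseFridge")
```

In this toy run the first step is non-terminal and the second terminates on success, after which the simulator has been reset and is ready for the next episode.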

shuheng-liu · 2026-06-03T06:48:30Z

Thanks for the review! Addressed the two nits:

final_info (robocasa.py:394) — dropped in 983f795. It was a vestige of the upstream LeRobot port: never read anywhere (eval consumes info["is_success"]/"task"/"done", which are already set just above) and divergent from LiberoEnv.step. Kept the terminal auto-reset.
visualization_width/visualization_height (robocasa.py:235) — keeping as-is, for parity with LiberoEnv (same inert fields) and the upstream LeRobot RoboCasaEnv config. They're genuinely inert today because render() returns the obs-resolution camera; wiring a separate higher-res render path is worth doing for both envs but is out of scope here (good follow-up candidate).

shuheng-liu added 4 commits June 2, 2026 22:14

shuheng-liu added the feature label (New feature or request) Jun 3, 2026

shuheng-liu self-assigned this Jun 3, 2026

claude Bot reviewed Jun 3, 2026

shuheng-liu marked this pull request as ready for review June 3, 2026 07:08

shuheng-liu merged commit 0c55de8 into main Jun 3, 2026
16 of 18 checks passed

shuheng-liu deleted the claude/lucid-albattani-b33067 branch June 3, 2026 15:15

shuheng-liu mentioned this pull request Jun 3, 2026

feat(robocasa): integrate RoboCasa kitchen sim alongside LIBERO #289

Closed

3 tasks

shuheng-liu merged 5 commits into main from claude/lucid-albattani-b33067
