feat(envs): add RoboCasa365 simulated-eval environment alongside LIBERO #386

Merged
shuheng-liu merged 5 commits into main from claude/lucid-albattani-b33067
Jun 3, 2026

@shuheng-liu
Member

What this does

Adds RoboCasa365 (kitchen manipulation, robosuite/MuJoCo) as a second simulated-eval environment alongside LIBERO, and makes it a first-class uv extra. RoboCasa plugs into the same environment-agnostic eval pipeline as LIBERO — parallel vectorized envs, per-task success rates, and grid_summary eval videos to wandb — with no changes to eval.py / train.py.

Highlights:

  • src/opentau/envs/robocasa.py: RoboCasaEnv(gym.Env) wrapper + create_robocasa_envs(), ported from upstream LeRobot's RoboCasa wrapper and reshaped to OpenTau's LIBERO multi-task pattern: the 3 PandaOmron cameras map to camera0/1/2, info["success"] is bridged to is_success, tasks shard round-robin across accelerator ranks, and the robocasa/robosuite import is deferred so the module (and the CPU suite) loads without the sim installed. 12-D action / 16-D state / 3 cameras.
  • Config + factory: RoboCasaEnv draccus config (env.type=robocasa, task-group shortcuts + split) and make_envs dispatch. Both LIBERO and RoboCasa return dict[group][task_id] -> VectorEnv, so the eval pipeline gives per-task success + grid videos for free.
  • Packaging — uv sync --extra robocasa co-installs with libero on a shared robosuite stack. Two non-obvious things make it resolve: robocasa needs MujocoEnv(load_model_on_init=...), added on robosuite master after the 1.5.2 PyPI release, so [tool.uv.sources] repins robosuite to a master commit that still self-reports 1.5.2 (LIBERO re-validated on it); and robocasa is pulled from the shuheng-liu/robocasa packaging fork that drops upstream's lerobot==0.3.3 / tianshou / opencv-python / hidapi deps and loosens its numpy/numba/scipy/mujoco pins + import-time version asserts (uv can't --no-deps a single package in a lock). Kitchen assets remain a separate runtime download.
  • fix(eval): eval_policy_all now raises a clear error if eval.recording_root (the LIBERO-only dataset recorder) is set with a non-LIBERO env, instead of silently mislabeling rollouts. Fixes #381 (RoboCasa: eval.recording_root uses the LIBERO-hardcoded dataset recorder).
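
The round-robin sharding of tasks across accelerator ranks mentioned above can be illustrated with a minimal sketch. This is not the code in robocasa.py: the helper name `shard_round_robin` and all task names other than CloseFridge are placeholders chosen for illustration.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def shard_round_robin(items: Sequence[T], rank: int, world_size: int) -> list[T]:
    """Return the items owned by `rank` under round-robin assignment.

    Hypothetical helper for illustration only.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world_size {world_size}")
    # Rank r takes items r, r + world_size, r + 2 * world_size, ...
    return list(items[rank::world_size])


# Placeholder task list; only "CloseFridge" appears in the PR description.
tasks = ["CloseFridge", "TaskB", "TaskC", "TaskD", "TaskE"]
shards = [shard_round_robin(tasks, rank, world_size=2) for rank in range(2)]
```

With two ranks, rank 0 receives the tasks at even positions and rank 1 those at odd positions, so every task is evaluated by exactly one rank and the per-rank load differs by at most one task.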

RoboCasa is eval-validated, but the train→eval loop and a few fidelity/parity features aren't there yet. They are tracked as follow-ups: #379 (training data + 12-DoF projection for meaningful eval), #380 (fps/control_freq wiring), #382 (per-task horizons), #383 (eval seed/split protocol parity), #384 (subgoal conditioning), #385 (GPU/regression CI).

How it was tested

  • CPU suite (CI-equivalent, -m "not gpu"): tests/envs/ + tests/scripts/test_eval.py → 142 passed. New: tests/envs/test_robocasa.py (config registration, factory dispatch shape, per-rank task sharding, version shim, pure helpers — all mock-based, no sim) and tests/scripts/test_eval.py::test_eval_policy_all_rejects_recording_root_for_non_libero_env.
  • Packaging: regenerated uv.lock (resolved 274 packages, no conflicts); verified a fresh uv sync --extra robocasa installs and imports robosuite + robocasa + opentau.envs.robocasa on Linux.
  • Real sim (a Linux GPU box, headless EGL): end-to-end against the actual robosuite/MuJoCo sim — RoboCasaEnv construct/reset/step/render (CloseFridge: language, camera0/1/2 256x256x3, 16-D agent_pos, success bridge), parallel vectorized envs via make_envs, and grid_summary.mp4 + the Eval Videos/CloseFridge_0 wandb key produced via eval's own create_grid_summary_video / collect_grid_summary_videos. Also confirmed LIBERO still constructs+steps on the shared robosuite-master commit.

How to checkout & try? (for the reviewer)

CPU, no sim needed:

uv run --extra dev pytest -sx tests/envs/test_robocasa.py
uv run --extra dev pytest -sx tests/scripts/test_eval.py::test_eval_policy_all_rejects_recording_root_for_non_libero_env

Full sim eval (Linux + GPU; the kitchen assets are a separate ~5-10GB download):

uv sync --extra robocasa
python -m robocasa.scripts.download_kitchen_assets
MUJOCO_GL=egl opentau-eval --accelerate-config configs/examples/accelerate_ddp_config.yaml --config_path=configs/examples/pi05_robocasa_eval_config.json

Checklist

  • I have added Google-style docstrings to important functions and ensured function parameters are typed.
  • My PR includes policy-related changes.
    • If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Adds a RoboCasa365 kitchen-sim eval env that coexists with LIBERO on the shared
robosuite 1.5.2 stack. Ports upstream LeRobot's RoboCasa wrapper into OpenTau's
LIBERO-style multi-task pattern so it reuses the environment-agnostic eval
pipeline (parallel vec envs, per-task success rates, grid_summary wandb videos)
with no changes to eval.py/train.py.

- envs/robocasa.py: RoboCasaEnv(gym.Env) wrapper + create_robocasa_envs(); maps
  the 3 PandaOmron cameras to camera0/1/2, bridges info["success"] -> is_success,
  shards tasks round-robin across ranks, and defers the robocasa/robosuite import
  so the module loads (and CPU tests run) without the sim installed.
- envs/configs.py: RoboCasaEnv config (12-D action, 16-D state, task groups/split).
- envs/factory.py: make_env_config / make_envs dispatch.
- tests/envs/test_robocasa.py: CPU mock tests (registration, dispatch, sharding).
- configs/examples/pi05_robocasa_eval_config.json: smoke eval config.
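
As an illustration of the deferred-import idea described above, here is a minimal sketch of a wrapper that only imports its simulator on first use. The class `LazySimEnv` and its `sim_module` parameter are hypothetical; only the method name `_ensure_env` is taken from the commit message, and its body here is an assumption rather than the actual implementation.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazySimEnv:
    """Hypothetical wrapper that imports its simulator only on first use."""

    def __init__(self, sim_module: str = "robocasa") -> None:
        # Constructing the wrapper never touches the simulator package,
        # so this module stays importable on machines without it.
        self._sim_module = sim_module
        self._sim: Optional[ModuleType] = None

    def _ensure_env(self) -> ModuleType:
        """Import the simulator lazily and cache the module object."""
        if self._sim is None:
            try:
                self._sim = importlib.import_module(self._sim_module)
            except ImportError as exc:
                raise ImportError(
                    f"{self._sim_module!r} is not installed; "
                    "install the simulator extra to use this environment"
                ) from exc
        return self._sim
```

Because the import happens inside `_ensure_env`, mock-based CPU tests can construct the wrapper and exercise config or sharding logic without the sim present.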

The robocasa pip extra is intentionally deferred: upstream robocasa pins
lerobot==0.3.3 plus over-strict numba/numpy/mujoco/tianshou that would conflict
in the shared lock, so a packaging fork is needed first. Manual install is
documented in CLAUDE.md.

robocasa 1.0.1 hard-asserts mujoco==3.3.1 and numpy==2.2.5 at package import.
OpenTau shares LIBERO's stack (robosuite 1.5.2 + mujoco>=3.3.5 + numpy 2.2.6),
and robosuite 1.5.2 itself only needs mujoco>=3.3.0 / numpy>=1.13.3, so those
equality pins can't be satisfied here. _import_robocasa_with_version_shim()
spoofs the two version strings only for the one-time package import and restores
them immediately after, so the integration runs on a stock `pip install robocasa`.
Called at both deferred-import sites (_resolve_tasks, RoboCasaEnv._ensure_env).
A robocasa packaging fork that drops the asserts makes this shim unnecessary.
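
The spoof-then-restore pattern described for the version shim can be sketched as a context manager. This is an assumption about the general shape, not the body of _import_robocasa_with_version_shim(): the name `spoofed_version` is hypothetical, and a stand-in module replaces mujoco so the sketch runs without the real dependency.

```python
import contextlib
import types
from typing import Iterator


@contextlib.contextmanager
def spoofed_version(module: types.ModuleType, fake: str) -> Iterator[None]:
    """Temporarily replace module.__version__, restoring it on exit.

    Hypothetical helper for illustration only.
    """
    real = module.__version__
    module.__version__ = fake
    try:
        yield
    finally:
        # Restore even if the guarded import raises.
        module.__version__ = real


# Stand-in for a real dependency so the sketch runs without mujoco installed.
fake_mujoco = types.ModuleType("fake_mujoco")
fake_mujoco.__version__ = "3.3.5"

with spoofed_version(fake_mujoco, "3.3.1"):
    # A strict `== "3.3.1"` assert executed here would pass.
    seen_during_import = fake_mujoco.__version__

seen_after = fake_mujoco.__version__
```

Restoring in a `finally` block keeps the spoof scoped to the one-time import, so the rest of the process sees the real version string.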
…master

`uv sync --extra robocasa` now resolves + installs RoboCasa365 alongside LIBERO in
one venv. Two things make it resolve: robocasa needs robosuite master
(MujocoEnv(load_model_on_init=...), added after the 1.5.2 PyPI release), so
[tool.uv.sources] repins robosuite to commit 85abee2 (still self-reports "1.5.2",
matching the extras' pins; LIBERO re-validated constructing+stepping on it); and
robocasa is pulled from the shuheng-liu/robocasa packaging fork that drops
upstream's lerobot==0.3.3 / tianshou / opencv-python / hidapi deps and loosens its
numpy/numba/scipy/mujoco pins + import-time version asserts (uv can't --no-deps a
single package in a lock). Kitchen assets stay a separate runtime download.

Validated on a Linux GPU box: a fresh `uv sync --extra robocasa --frozen` installs
and imports robosuite (1.5.2/master) + robocasa (1.0.1/fork) + opentau.envs.robocasa.

eval.recording_root drives the LIBERO-only dataset recorder (libero_dataset_recorder,
hardcoded LIBERO_TASKS), so eval_policy_all now raises a clear NotImplementedError when
recording_root is set with any non-LIBERO env, instead of silently mislabeling the
recorded rollouts. Fails fast before any rollout runs. Closes the RoboCasa footgun (#381).
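
A fail-fast guard of the kind described above might look like the following minimal sketch. The function name `check_recording_root`, its signature, and the assumption that LIBERO's env type string is "libero" are all illustrative; the real check lives inside eval_policy_all.

```python
from typing import Optional


def check_recording_root(env_type: str, recording_root: Optional[str]) -> None:
    """Reject recording for environments the recorder does not support.

    Hypothetical helper; assumes the LIBERO env type string is "libero".
    """
    if recording_root is not None and env_type != "libero":
        raise NotImplementedError(
            f"eval.recording_root is only supported for LIBERO envs, "
            f"got env.type={env_type!r}"
        )


# Intended to run once, before any rollout starts.
check_recording_root("libero", "/tmp/recordings")  # allowed
check_recording_root("robocasa", None)  # allowed: recording disabled
```

Raising before the first rollout avoids spending simulator time on an evaluation whose recorded dataset would carry the wrong task labels.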
@shuheng-liu shuheng-liu added the feature New feature or request label Jun 3, 2026
@shuheng-liu shuheng-liu self-assigned this Jun 3, 2026
Contributor

@claude claude Bot left a comment

Automated review by Claude. Solid, well-tested addition — see two nit-level inline notes; nothing blocking.

Comment thread src/opentau/envs/robocasa.py Outdated

observation = self._format_raw_obs(raw_obs)
if terminated:
info["final_info"] = {
Contributor

nit: info["final_info"] is set here but never read anywhere in the repo (only info["is_success"] is consumed in eval.py:195, and LIBERO's step() deliberately doesn't emit a final_info key). It looks like a vestige of the upstream LeRobot port. Under SyncVectorEnv/AsyncVectorEnv this nested dict still gets batched into an object array on every terminal step, which is harmless but dead. Consider dropping it to stay aligned with LiberoEnv.step.

self.observation_width = observation_width
self.observation_height = observation_height
self.visualization_width = visualization_width
self.visualization_height = visualization_height
Contributor

nit: visualization_width/visualization_height are threaded all the way through (RoboCasaEnv.__init__, _make_env_fns, and create_robocasa_envs's gym_kwargs.pop) but never reach RoboCasaGymEnv in _ensure_env, and render() takes no size, so they're inert. This mirrors LiberoEnv (which also stores them unused), so it's a faithful port; flagging only so it's a conscious choice rather than an oversight.

@claude
Contributor

claude Bot commented Jun 3, 2026

[claude-review] summary for commit 983f795

Re-reviewed after 983f795. The only change since the prior review is the final_info removal, which addresses my earlier nit cleanly (terminal auto-reset preserved; task/done/is_success are what eval consumes).

  • No blocking issues found. Action/state dims (12/16), the convert_action slice layout, the camera{i} remap feeding preprocess_observation's num_cams zero-fill, per-rank round-robin sharding, the is_success bridge, and the terminated = done or is_success + auto-reset flow are all consistent with the LiberoEnv pattern and covered by CPU mock tests. The eval_policy_all recording_root guard fails fast before any rollout and is tested.
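
The success bridge and the terminated = done or is_success rule mentioned above can be shown in a few lines. This is a minimal sketch under assumptions: `bridge_step` is a hypothetical helper, and the real step() also formats observations and auto-resets, which is omitted here.

```python
from typing import Any


def bridge_step(done: bool, info: dict[str, Any]) -> tuple[bool, dict[str, Any]]:
    """Map the simulator's info["success"] onto is_success and termination.

    Hypothetical helper for illustration only.
    """
    is_success = bool(info.get("success", False))
    out = dict(info)  # copy so the caller's dict is not mutated
    out["is_success"] = is_success
    # An episode ends when the sim says so or when the task succeeds.
    terminated = bool(done) or is_success
    return terminated, out


terminated, info = bridge_step(done=False, info={"success": True})
```

Treating success as terminal means a successful episode stops immediately instead of running to the horizon, and the eval loop only needs to read info["is_success"].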

Notes (not findings):

  • visualization_width/visualization_height remain plumbed-but-inert — author confirmed this is intentional parity with LiberoEnv/upstream; a real higher-res render path is a cross-env follow-up.
  • The version shim is redundant with the packaging fork that drops the import-time asserts, but remains a harmless safety net for stock pip install robocasa.
  • episode_length/_max_episode_steps is stored-but-unenforced — tracked as RoboCasa: per-task episode horizons (TASK_SUITE_MAX_STEPS equivalent) #382.
  • Real-sim paths (reset/step/render, group expansion) can't run in the CPU suite — validated manually on a GPU box per the PR description.

Addresses a review nit: info["final_info"] was a vestige of the upstream LeRobot
port — never read anywhere (eval consumes info["is_success"]/"task"/"done", which
are already set above) and divergent from LiberoEnv.step. Keep the auto-reset.
@shuheng-liu
Member Author

Thanks for the review! Addressed the two nits:

  • final_info (robocasa.py:394) — dropped in 983f795. It was a vestige of the upstream LeRobot port: never read anywhere (eval consumes info["is_success"]/"task"/"done", which are already set just above) and divergent from LiberoEnv.step. Kept the terminal auto-reset.
  • visualization_width/visualization_height (robocasa.py:235) — keeping as-is, for parity with LiberoEnv (same inert fields) and the upstream LeRobot RoboCasaEnv config. They're genuinely inert today because render() returns the obs-resolution camera; wiring a separate higher-res render path is worth doing for both envs but is out of scope here (good follow-up candidate).

@shuheng-liu shuheng-liu marked this pull request as ready for review June 3, 2026 07:08
@shuheng-liu shuheng-liu merged commit 0c55de8 into main Jun 3, 2026
16 of 18 checks passed
@shuheng-liu shuheng-liu deleted the claude/lucid-albattani-b33067 branch June 3, 2026 15:15
Labels

feature New feature or request

Development

Successfully merging this pull request may close these issues.

RoboCasa: eval.recording_root uses the LIBERO-hardcoded dataset recorder

1 participant