feat(envs): add RoboCasa365 simulated-eval environment alongside LIBERO #386
Adds a RoboCasa365 kitchen-sim eval env that coexists with LIBERO on the shared robosuite 1.5.2 stack. Ports upstream LeRobot's RoboCasa wrapper into OpenTau's LIBERO-style multi-task pattern so it reuses the environment-agnostic eval pipeline (parallel vec envs, per-task success rates, grid_summary wandb videos) with no changes to eval.py/train.py.

- envs/robocasa.py: RoboCasaEnv(gym.Env) wrapper + create_robocasa_envs(); maps the 3 PandaOmron cameras to camera0/1/2, bridges info["success"] -> is_success, shards tasks round-robin across ranks, and defers the robocasa/robosuite import so the module loads (and CPU tests run) without the sim installed.
- envs/configs.py: RoboCasaEnv config (12-D action, 16-D state, task groups/split).
- envs/factory.py: make_env_config / make_envs dispatch.
- tests/envs/test_robocasa.py: CPU mock tests (registration, dispatch, sharding).
- configs/examples/pi05_robocasa_eval_config.json: smoke eval config.

The robocasa pip extra is intentionally deferred: upstream robocasa pins lerobot==0.3.3 plus over-strict numba/numpy/mujoco/tianshou that would conflict in the shared lock, so a packaging fork is needed first. Manual install is documented in CLAUDE.md.
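The round-robin sharding mentioned above can be illustrated with a minimal sketch. The function name `shard_round_robin` and the placeholder task names are hypothetical and are not taken from `envs/robocasa.py`; the only assumption carried over from the commit message is that tasks are dealt out to ranks in rotation.

```python
from typing import List, Sequence


def shard_round_robin(tasks: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the tasks owned by `rank` when tasks are dealt out in rotation.

    Hypothetical helper for illustration; the real sharding code may differ.
    """
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is out of range for world_size {world_size}")
    # Rank r takes items r, r + world_size, r + 2 * world_size, ...
    return list(tasks)[rank::world_size]


# Placeholder task names, not real RoboCasa task identifiers.
all_tasks = ["task_a", "task_b", "task_c", "task_d", "task_e"]
shards = [shard_round_robin(all_tasks, rank, 2) for rank in range(2)]
```

With this scheme every task lands on exactly one rank, and shard sizes differ by at most one, which keeps per-rank eval time roughly balanced when tasks cost about the same.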
robocasa 1.0.1 hard-asserts mujoco==3.3.1 and numpy==2.2.5 at package import. OpenTau shares LIBERO's stack (robosuite 1.5.2 + mujoco>=3.3.5 + numpy 2.2.6), and robosuite 1.5.2 itself only needs mujoco>=3.3.0 / numpy>=1.13.3, so those equality pins can't be satisfied here. _import_robocasa_with_version_shim() spoofs the two version strings only for the one-time package import and restores them immediately after, so the integration runs on a stock `pip install robocasa`. Called at both deferred-import sites (_resolve_tasks, RoboCasaEnv._ensure_env). A robocasa packaging fork that drops the asserts makes this shim unnecessary.
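A spoof-then-restore shim of this kind can be sketched as a context manager. This is not the body of `_import_robocasa_with_version_shim()`; the name `spoofed_versions` and the stand-in module `demo_sim_dep` are hypothetical, and the sketch only assumes that the pinned package reads `__version__` from already-imported modules.

```python
import contextlib
import sys
import types
from typing import Dict, Iterator


@contextlib.contextmanager
def spoofed_versions(overrides: Dict[str, str]) -> Iterator[None]:
    """Temporarily replace `module.__version__` for each named, already-imported module."""
    saved: Dict[str, str] = {}
    try:
        for name, fake in overrides.items():
            module = sys.modules[name]
            saved[name] = module.__version__
            module.__version__ = fake
        yield
    finally:
        # Restore the real versions even if the wrapped import raises.
        for name, real in saved.items():
            sys.modules[name].__version__ = real


# Stand-in for a dependency such as mujoco or numpy (hypothetical module).
demo = types.ModuleType("demo_sim_dep")
demo.__version__ = "3.3.5"
sys.modules["demo_sim_dep"] = demo

with spoofed_versions({"demo_sim_dep": "3.3.1"}):
    # A package with an equality assert would be imported here.
    seen_during_import = demo.__version__
seen_after_import = demo.__version__
```

Restoring in a `finally` block keeps the spoof scoped to the import itself, so any later code that inspects the real version strings is unaffected.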
…master `uv sync --extra robocasa` now resolves + installs RoboCasa365 alongside LIBERO in one venv. Two things make it resolve: robocasa needs robosuite master (MujocoEnv(load_model_on_init=...), added after the 1.5.2 PyPI release), so [tool.uv.sources] repins robosuite to commit 85abee2 (still self-reports "1.5.2", matching the extras' pins; LIBERO re-validated constructing+stepping on it); and robocasa is pulled from the shuheng-liu/robocasa packaging fork that drops upstream's lerobot==0.3.3 / tianshou / opencv-python / hidapi deps and loosens its numpy/numba/scipy/mujoco pins + import-time version asserts (uv can't --no-deps a single package in a lock). Kitchen assets stay a separate runtime download. Validated on a Linux GPU box: a fresh `uv sync --extra robocasa --frozen` installs and imports robosuite (1.5.2/master) + robocasa (1.0.1/fork) + opentau.envs.robocasa.
eval.recording_root drives the LIBERO-only dataset recorder (libero_dataset_recorder, hardcoded LIBERO_TASKS), so eval_policy_all now raises a clear NotImplementedError when recording_root is set with any non-LIBERO env, instead of silently mislabeling the recorded rollouts. Fails fast before any rollout runs. Closes the RoboCasa footgun (#381).
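The fail-fast check described above might look roughly like the following. The helper name `ensure_recording_supported` and the `"libero"` type string are assumptions for illustration; the actual guard lives inside `eval_policy_all` and may be written differently.

```python
from typing import Optional


def ensure_recording_supported(env_type: str, recording_root: Optional[str]) -> None:
    """Raise before any rollout if recording is requested for a non-LIBERO env.

    Hypothetical helper; `env_type` values are assumed to match `env.type`.
    """
    if recording_root is None:
        return  # Recording is off, so every env type is fine.
    if env_type != "libero":
        raise NotImplementedError(
            f"eval.recording_root is only supported for LIBERO envs, got env.type={env_type!r}"
        )


ensure_recording_supported("libero", "/tmp/recordings")  # allowed
ensure_recording_supported("robocasa", None)             # allowed: recording off
```

Checking once up front means a misconfigured run stops before spending time on rollouts whose recordings would carry the wrong task labels.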
observation = self._format_raw_obs(raw_obs)
if terminated:
    info["final_info"] = {
nit — info["final_info"] is set here but never read anywhere in the repo (only info["is_success"] is consumed in eval.py:195, and LIBERO's step() deliberately doesn't emit a final_info key). It looks like a vestige of the upstream LeRobot port. Under SyncVectorEnv/AsyncVectorEnv this nested dict still gets batched into an object array on every terminal step — harmless, but dead. Consider dropping it to stay aligned with LiberoEnv.step.
self.observation_width = observation_width
self.observation_height = observation_height
self.visualization_width = visualization_width
self.visualization_height = visualization_height
nit — visualization_width/visualization_height are threaded all the way through (RoboCasaEnv.__init__ → _make_env_fns → create_robocasa_envs's gym_kwargs.pop) but never reach RoboCasaGymEnv in _ensure_env, and render() takes no size, so they're inert. This mirrors LiberoEnv (which also stores them unused), so it's a faithful port — flagging only so it's a conscious choice rather than an oversight.
[claude-review] summary for commit 983f795 Re-reviewed after 983f795. The only change since the prior review is the
Notes (not findings):
Addresses a review nit: info["final_info"] was a vestige of the upstream LeRobot port — never read anywhere (eval consumes info["is_success"]/"task"/"done", which are already set above) and divergent from LiberoEnv.step. Keep the auto-reset.
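As a rough illustration of the info dict left after this change, the sketch below builds only the keys that eval is said to consume. The function name `build_step_info` and the exact meaning of `"done"` (terminated or truncated) are assumptions, not the wrapper's actual code.

```python
from typing import Any, Dict


def build_step_info(
    raw_info: Dict[str, Any], task: str, terminated: bool, truncated: bool
) -> Dict[str, Any]:
    """Return a flat info dict with the keys eval reads, and no nested final_info.

    Hypothetical helper; the semantics of "done" are assumed here.
    """
    info = dict(raw_info)  # Copy so the simulator's own dict is left untouched.
    # Bridge the simulator's "success" flag to the key the eval loop reads.
    info["is_success"] = bool(raw_info.get("success", False))
    info["task"] = task
    info["done"] = bool(terminated or truncated)
    return info


example_info = build_step_info({"success": True}, "CloseFridge", True, False)
```

Keeping the dict flat avoids the nested per-step object that a vector env would otherwise batch on every terminal step.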
Thanks for the review! Addressed the two nits:
What this does
Adds RoboCasa365 (kitchen manipulation, robosuite/MuJoCo) as a second simulated-eval environment alongside LIBERO, and makes it a first-class `uv` extra. RoboCasa plugs into the same environment-agnostic eval pipeline as LIBERO (parallel vectorized envs, per-task success rates, and `grid_summary` eval videos to wandb) with no changes to `eval.py`/`train.py`.

Highlights:
- `src/opentau/envs/robocasa.py`: `RoboCasaEnv(gym.Env)` wrapper + `create_robocasa_envs()`, ported from upstream LeRobot's RoboCasa and reshaped to OpenTau's LIBERO multi-task pattern: the 3 PandaOmron cameras map to `camera0/1/2`, `info["success"]` is bridged to `is_success`, tasks shard round-robin across accelerator ranks, and the `robocasa`/`robosuite` import is deferred so the module (and the CPU suite) loads without the sim installed. 12-D action / 16-D state / 3 cameras.
- `RoboCasaEnv` draccus config (`env.type=robocasa`, task-group shortcuts + `split`) and `make_envs` dispatch. Both LIBERO and RoboCasa return `dict[group][task_id] -> VectorEnv`, so the eval pipeline gives per-task success + grid videos for free.
- `uv sync --extra robocasa` co-installs with `libero` on a shared robosuite stack. Two non-obvious things make it resolve: robocasa needs `MujocoEnv(load_model_on_init=...)`, added on robosuite master after the 1.5.2 PyPI release, so `[tool.uv.sources]` repins `robosuite` to a master commit that still self-reports `1.5.2` (LIBERO re-validated on it); and robocasa is pulled from the `shuheng-liu/robocasa` packaging fork that drops upstream's `lerobot==0.3.3` / `tianshou` / `opencv-python` / `hidapi` deps and loosens its `numpy`/`numba`/`scipy`/`mujoco` pins + import-time version asserts (uv can't `--no-deps` a single package in a lock). Kitchen assets remain a separate runtime download.
- `fix(eval)`: `eval_policy_all` now raises a clear error if `eval.recording_root` (the LIBERO-only dataset recorder) is set with a non-LIBERO env, instead of silently mislabeling rollouts. Fixes "RoboCasa: eval.recording_root uses the LIBERO-hardcoded dataset recorder" #381.

RoboCasa is eval-validated, but the train→eval loop and a few fidelity/parity features aren't there yet. These are tracked as follow-ups: #379 (training data + 12-DoF projection for meaningful eval), #380 (`fps`→`control_freq` wiring), #382 (per-task horizons), #383 (eval seed/split protocol parity), #384 (subgoal conditioning), #385 (GPU/regression CI).

How it was tested
- `-m "not gpu"`): `tests/envs/` + `tests/scripts/test_eval.py` → 142 passed. New: `tests/envs/test_robocasa.py` (config registration, factory dispatch shape, per-rank task sharding, version shim, pure helpers; all mock-based, no sim) and `tests/scripts/test_eval.py::test_eval_policy_all_rejects_recording_root_for_non_libero_env`.
- `uv.lock` (resolved 274 packages, no conflicts); verified a fresh `uv sync --extra robocasa` installs and imports `robosuite` + `robocasa` + `opentau.envs.robocasa` on Linux.
- `RoboCasaEnv` construct/reset/step/render (CloseFridge: language, `camera0/1/2` 256x256x3, 16-D `agent_pos`, success bridge), parallel vectorized envs via `make_envs`, and `grid_summary.mp4` + the `Eval Videos/CloseFridge_0` wandb key produced via eval's own `create_grid_summary_video`/`collect_grid_summary_videos`. Also confirmed LIBERO still constructs+steps on the shared robosuite-master commit.

How to checkout & try? (for the reviewer)
CPU, no sim needed:
Full sim eval (Linux + GPU; the kitchen assets are a separate ~5-10GB download):
Checklist