RoboCasa: eval.recording_root uses the LIBERO-hardcoded dataset recorder #381

@shuheng-liu

Problem

Whenever eval.recording_root is set, eval_policy_all records rollouts to a dataset through the LIBERO recorder, regardless of the env type:

if cfg.eval.recording_root is not None:
    ...
    consolidate_task_result(aggregate_task_results(task_results), ...)

aggregate_task_results and consolidate_task_result come from src/opentau/utils/libero_dataset_recorder.py, which hardcodes a LIBERO_TASKS list (40 LIBERO task names) and LIBERO-shaped metadata. Setting eval.recording_root with a RoboCasa env therefore produces a mislabeled or malformed dataset: RoboCasa tasks aren't in LIBERO_TASKS, and the obs/action layout differs.

The shipped RoboCasa example config deliberately omits recording_root, so the default path is safe. It is still a latent footgun, and it blocks RECAP-style rollout capture for RoboCasa.

Suggested approach

  • Make the recorder env-aware: dispatch on cfg.env type, or guard the recorder to LIBERO and raise a clear error for other envs.
  • Longer term, add a RoboCasa rollout recorder analogous to libero_dataset_recorder if RoboCasa self-training is wanted.
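
A minimal sketch of the first bullet, combining dispatch with a guard. Everything here is hypothetical and for illustration only: the config dataclasses, the cfg.env.type discriminator, the RECORDERS registry, and maybe_record are assumptions, not the existing OpenTau API. The LIBERO recorder body is a placeholder where the real aggregate_task_results / consolidate_task_result calls would go.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical stand-ins for the real config objects; the actual
# field names in the repository may differ.
@dataclass
class EnvConfig:
    type: str  # assumed discriminator, e.g. "libero" or "robocasa"


@dataclass
class EvalConfig:
    recording_root: Optional[str] = None


@dataclass
class Config:
    env: EnvConfig
    eval: EvalConfig


Recorder = Callable[[List[dict], str], None]

# Hypothetical registry mapping an env type to its rollout recorder.
RECORDERS: Dict[str, Recorder] = {}


def register_recorder(env_type: str) -> Callable[[Recorder], Recorder]:
    """Register a recorder function for one env type."""
    def decorator(fn: Recorder) -> Recorder:
        RECORDERS[env_type] = fn
        return fn
    return decorator


@register_recorder("libero")
def record_libero(task_results: List[dict], recording_root: str) -> None:
    # Placeholder: the real branch would call aggregate_task_results and
    # consolidate_task_result from libero_dataset_recorder here.
    print(f"recording {len(task_results)} LIBERO task results to {recording_root}")


def maybe_record(cfg: Config, task_results: List[dict]) -> bool:
    """Record rollouts if requested; fail loudly for unsupported envs."""
    if cfg.eval.recording_root is None:
        return False  # recording not requested, nothing to do
    recorder = RECORDERS.get(cfg.env.type)
    if recorder is None:
        supported = ", ".join(sorted(RECORDERS)) or "none"
        raise NotImplementedError(
            f"eval.recording_root is set, but no rollout recorder is registered "
            f"for env type '{cfg.env.type}' (supported: {supported})."
        )
    recorder(task_results, cfg.eval.recording_root)
    return True
```

With this shape, a later RoboCasa recorder would only need its own register_recorder("robocasa") function, and until then a RoboCasa config with recording_root set fails with an explicit error instead of writing a LIBERO-shaped dataset.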

References

  • src/opentau/scripts/eval.py:893 (the recording_root branch in eval_policy_all)
  • src/opentau/utils/libero_dataset_recorder.py (LIBERO_TASKS, aggregate_task_results, consolidate_task_result)
