Skip to content

feat(datasets): override + require non-empty robot_type/control_mode#220

Merged
shuheng-liu merged 2 commits into
mainfrom
claude/setup-precommit-hooks-okNZD
May 1, 2026
Merged

feat(datasets): override + require non-empty robot_type/control_mode#220
shuheng-liu merged 2 commits into
mainfrom
claude/setup-precommit-hooks-okNZD

Conversation

@shuheng-liu
Copy link
Copy Markdown
Member

What this does

Refs #204.

Adds two related knobs for robot_type and control_mode metadata:

  • Dataset-level overridesDatasetConfig.robot_type and
    DatasetConfig.control_mode (both str | None, default None = no
    override). When set, they are written through to dataset.meta.info after
    the dataset is constructed in make_dataset. An explicit "" is treated as
    "clear the loaded value" (distinct from None).
  • Mixture-level non-empty requirementsDatasetMixtureConfig.require_non_empty_robot_type
    and require_non_empty_control_mode (both bool, default False).
    Defaults preserve today's behavior: empty values are tolerated. When opted
    into, make_dataset_mixture raises with a per-dataset breakdown of which
    field is missing.

When a control_mode override is provided, the "missing control_mode" warning
in LeRobotDataset.__init__ is pre-suppressed for the affected repo_id —
the user already knew it was missing and is providing a value on purpose.

How it was tested

  • Added TestRobotTypeControlModeOverridesAndRequirements in
    tests/datasets/test_dataset_mixture.py (15 tests):
    • DatasetConfig defaults / accepts overrides / accepts explicit "".
    • DatasetMixtureConfig require flags default False.
    • _apply_metadata_overrides: writes through, no-op on None, "" clears,
      skips dataset.control_mode attr update for VQA datasets that don't have
      one.
    • _validate_metadata_requirements: no-op when flags off; passes when
      populated; raises for empty / None / missing values; reports label +
      repo_id; aggregates multiple failures.
  • pre-commit run --all-files clean.
  • pytest -m "not gpu and not slow" tests/datasets/test_dataset_mixture.py tests/datasets/test_datasets.py tests/datasets/test_optional_keys.py all green.

How to checkout & try? (for the reviewer)

pytest -sx tests/datasets/test_dataset_mixture.py::TestRobotTypeControlModeOverridesAndRequirements

Example config snippet that opts in to the strict checks:

{
  "dataset_mixture": {
    "datasets": [
      {"repo_id": "lerobot/droid_100", "robot_type": "panda", "control_mode": "joint"}
    ],
    "require_non_empty_robot_type": true,
    "require_non_empty_control_mode": true
  }
}

Checklist

  • I have added Google-style docstrings to important functions and ensured function parameters are typed.
  • My PR includes policy-related changes.
    • If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Note: Before submitting this PR, please read the contributor guideline.


Generated by Claude Code

- DatasetConfig now exposes optional `robot_type` / `control_mode`
  overrides applied to `meta.info` in `make_dataset`. None means "do
  not override"; an explicit "" clears the loaded value.
- DatasetMixtureConfig adds `require_non_empty_{robot_type,control_mode}`
  flags (default False) that raise in `make_dataset_mixture` when any
  dataset still has an empty value after overrides.
- The control_mode override path also pre-suppresses the
  "missing control_mode" warning for the affected repo_id, since the
  user knew it was missing and is providing a value on purpose.

Refs #204
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline notes — see summary comment for rollup.

Comment thread src/opentau/datasets/factory.py Outdated
Comment thread src/opentau/datasets/factory.py Outdated
Comment thread src/opentau/datasets/factory.py Outdated
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 1, 2026

[claude-review] summary for commit 27d500a

  • suggestionsrc/opentau/datasets/factory.py:300 — silent length-mismatch early return bypasses the require-non-empty checks the caller opted into; branch is dead in practice — drop it or convert to assert.
  • nitsrc/opentau/datasets/factory.py:206 — pre-suppression must run before the LeRobotDataset(...) constructor on the next lines; add a one-line comment about the ordering invariant.
  • nitsrc/opentau/datasets/factory.py:81 — cross-module import of the private _CONTROL_MODE_WARNED set; consider exposing a public helper in lerobot_dataset.py instead.

@shuheng-liu
Copy link
Copy Markdown
Member Author

@claude fix per suggestion and nits if they make sense.

- addresses @claude[bot] (length-mismatch dead branch): converted the
  silent early-return in `_validate_metadata_requirements` to an
  `assert` so a future refactor that breaks the 1:1 dataset_cfgs:datasets
  invariant can't quietly bypass `require_non_empty_*` checks.
- addresses @claude[bot] (ordering invariant): added a one-line
  comment in `make_dataset` noting the warning-suppression must run
  before `LeRobotDataset(...)` to prevent a future hoist from breaking it.
- addresses @claude[bot] (private cross-module symbol): replaced the
  `_CONTROL_MODE_WARNED` import with a new public
  `suppress_control_mode_warning(repo_id)` helper exported from
  `lerobot_dataset.py`; the set itself stays private.

tests: passed — pytest -m "not gpu and not slow" tests/datasets/test_dataset_mixture.py tests/datasets/test_datasets.py tests/datasets/test_optional_keys.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shuheng-liu shuheng-liu marked this pull request as ready for review May 1, 2026 02:13
@shuheng-liu shuheng-liu merged commit 542eaf5 into main May 1, 2026
8 checks passed
@shuheng-liu shuheng-liu deleted the claude/setup-precommit-hooks-okNZD branch May 1, 2026 17:35
@shuheng-liu shuheng-liu self-assigned this May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants