fix(datasets): allow action_freq=None in WeightedDatasetMixture signature #325
Review
LGTM. Surgical annotation-only fix, claims verified against Verified
Nits (non-blocking)
Coverage
PR template: all required sections present, docstrings/types box checked, policy boxes correctly unchecked (no policy-related changes here).
Ready to merge once CI is green.
Generated by Claude Code
Thanks for the review. Acted on the nits:
Coverage note: already covered. This PR's new test is intentionally scoped to the constructor-level round-trip, which is the gap the wrong signature actually opened.
Re-review
Second pass: code diff unchanged from the prior review, only the PR body was updated. All four prior items are resolved:
CI status: all 7 check runs are
One small docstring nit I missed last pass (non-blocking): the new test docstring at
One follow-up suggestion (out of scope, do separately if at all): the PR notes "The project doesn't run a type checker in pre-commit, so this would otherwise land silently." If signature/annotation drift becomes a recurring class of bugs, adding
Ready to mark ready-for-review and merge.
Generated by Claude Code
What this does
Tightens the type annotation on WeightedDatasetMixture.__init__ so that action_freq is correctly declared as float | None instead of float.

Flagged while reviewing #324: after #323 made DatasetMixtureConfig.action_freq float | None (mixed-frequency training, each dataset at its native fps), several call sites can legitimately forward None to the mixture constructor:
datasets/factory.py::make_dataset_mixture (L376, L380) already passes cfg.dataset_mixture.action_freq through.
scripts/fit_fast_tokenizer.py::_build_mixture_parallel (L714) does the same for the --use-mixture-dataloader path covered by fix(scripts): support action_freq=None in fit_fast_tokenizer manual sampler #324.

The attribute is stored as informational state and never used arithmetically inside WeightedDatasetMixture (downstream consumers already guard the None case: lerobot_dataset.py:1038 checks if self._action_freq is not None, scripts/fit_fast_tokenizer.py formats None explicitly), so there is no runtime bug; only the signature was lying. The project doesn't run a type checker in pre-commit, so this would otherwise land silently.
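The None guard mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in lerobot_dataset.py: the helper name `reported_fps` and its `native_fps` parameter are hypothetical, and the fallback to the native fps is an assumption based on the emit_fps behaviour described in this PR.

```python
from typing import Optional


def reported_fps(action_freq: Optional[float], native_fps: float) -> float:
    """Hypothetical helper: choose the fps a downstream consumer reports.

    Assumption: when action_freq is None (mixed-frequency training),
    the dataset's own native fps is used instead.
    """
    if action_freq is not None:
        # Uniform-frequency case: a shared frequency was configured.
        return action_freq
    # Mixed-frequency case: fall back to the dataset's native rate.
    return native_fps


uniform_fps = reported_fps(30.0, native_fps=15.0)
mixed_fps = reported_fps(None, native_fps=15.0)
```

Because the `is not None` check narrows the type before any use, a consumer written this way is safe at runtime whether or not the constructor annotation admits None.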
Changes:
WeightedDatasetMixture.__init__: action_freq: float → action_freq: Optional[float], with a docstring entry explaining the None semantics.
(action_freq=30.0) and mixed-frequency (action_freq=None) cases.
Optional[float].
No call site needed None-narrowing fixes; all downstream consumers already handle None.
Optional/List style matches the rest of dataset_mixture.py (the file already imports both).
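The annotation change listed above might look roughly like the sketch below. It is a simplified stand-in, not the real class: `WeightedDatasetMixtureSketch`, `datasets`, and `dataset_weights` are hypothetical names, and only the `action_freq` annotation and its None semantics come from this PR.

```python
from typing import List, Optional


class WeightedDatasetMixtureSketch:
    """Simplified stand-in for illustration; not the real WeightedDatasetMixture."""

    def __init__(
        self,
        datasets: List[object],        # hypothetical parameter
        dataset_weights: List[float],  # hypothetical parameter
        action_freq: Optional[float],  # previously annotated as plain float
    ) -> None:
        """
        Args:
            action_freq: Shared action frequency, or None for mixed-frequency
                training where each dataset keeps its native fps.
        """
        self.datasets = datasets
        self.dataset_weights = dataset_weights
        # Stored as informational state only; no arithmetic is done on it here.
        self.action_freq: Optional[float] = action_freq


uniform = WeightedDatasetMixtureSketch([], [], action_freq=30.0)
mixed = WeightedDatasetMixtureSketch([], [], action_freq=None)
```

With the old `float` annotation both calls would still run, but a static type checker would reject the second one, which is the mismatch this PR removes.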
How it was tested
TestWeightedDatasetMixture::test_init_action_freq_none in tests/datasets/test_dataset_mixture.py: constructs the mixture with action_freq=None and asserts the attribute round-trips as None.
The lerobot_dataset.py:1038 branch (emit_fps=True + action_freq=None → emit native meta.fps) is already covered by tests/datasets/test_optional_keys.py::TestEmitFps::test_fps_reports_native_when_action_freq_none (added in feat(datasets): mixed-frequency training + fps metadata for pi07 #323), so this PR's new test is intentionally scoped to the signature-level round-trip.
uv run pre-commit run --files src/opentau/datasets/dataset_mixture.py tests/datasets/test_dataset_mixture.py: all hooks pass.
uv run pytest -m "not gpu" -n auto tests/datasets: 439 passed, 8 skipped.
No policy / training-loop changes, so GPU pytests and nightly regression tests are not required.
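A round-trip test of the kind described above could be sketched as follows. This is an assumption-laden illustration rather than the test in tests/datasets/test_dataset_mixture.py: `MixtureStub` is a hypothetical one-field stand-in for the real class, and `test_init_action_freq_uniform` is an illustrative companion case that does not appear in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MixtureStub:
    """Hypothetical stand-in for WeightedDatasetMixture, reduced to one field."""

    action_freq: Optional[float]


def test_init_action_freq_none() -> None:
    # Mixed-frequency case: None must survive construction unchanged.
    mixture = MixtureStub(action_freq=None)
    assert mixture.action_freq is None


def test_init_action_freq_uniform() -> None:
    # Uniform case (illustrative only): a float is stored as given.
    mixture = MixtureStub(action_freq=30.0)
    assert mixture.action_freq == 30.0


if __name__ == "__main__":
    test_init_action_freq_none()
    test_init_action_freq_uniform()
```

The test checks only that the constructor accepts and stores None, matching the stated scope; behaviour of downstream consumers is left to the existing emit_fps coverage.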
How to checkout & try? (for the reviewer)
Checklist