chore(datasets): collapse skip-timestamp-check warning to once-per-process #278
LeRobotDataset is constructed once per rank, so the existing
``logging.warning`` at the skip-timestamp-check branch fires
``num_processes`` x ``num_datasets`` times. For a 392-dataset
pretraining mixture on 8 GPUs that's ~3K identical log lines
between accelerate init and the first training step, drowning out
everything else.
Reuse the existing ``get_proc_accelerator`` pattern (already used
elsewhere in this file at L352 and L1287) to gate the warning to
``acc.is_main_process``, with a fall-through to logging when no
Accelerator is set so single-process dev/test runs are unchanged.
Behavior in practice:
- 8-rank distributed run: 1x per dataset (was 8x)
- single-process / CPU dev: unchanged
- the underlying choice the user opted into (skip_timestamp_check
on a heterogeneous mixture) and its safety implication are both
unchanged
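A minimal sketch of the rank-0 gate this commit describes, assuming `get_proc_accelerator()` returns the process's Accelerator or `None` when none is set. To keep it self-contained, `FakeAccelerator` and `maybe_warn_skip_timestamp` are hypothetical stand-ins (not the PR's code), the accelerator is passed in instead of looked up, and the log wording is illustrative.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("skip_ts_rank0_sketch")
logger.addHandler(logging.NullHandler())  # keep the simulation below quiet


@dataclass
class FakeAccelerator:
    """Hypothetical stand-in for an accelerate Accelerator; only is_main_process is modeled."""

    is_main_process: bool


def maybe_warn_skip_timestamp(repo_id: str, acc: Optional[FakeAccelerator]) -> bool:
    """Rank-0 gate. In the PR, `acc` would come from get_proc_accelerator().

    Returns True if this call emitted the warning.
    """
    # No Accelerator set (single-process dev/test): fall through to plain logging.
    # With an Accelerator, only the main process logs; other ranks stay silent.
    if acc is None or acc.is_main_process:
        logger.warning("Skipping timestamp sync check for %s", repo_id)  # illustrative text
        return True
    return False


# Simulate 8 ranks, each constructing the same 392-dataset mixture.
num_processes, num_datasets = 8, 392
emitted = sum(
    maybe_warn_skip_timestamp(f"dataset_{d}", FakeAccelerator(is_main_process=(rank == 0)))
    for rank in range(num_processes)
    for d in range(num_datasets)
)
print(emitted)  # 392: one line per dataset instead of 8 * 392 = 3136
```

The simulation also shows the limit of this gate on its own: it removes the per-rank factor but still logs once per dataset.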
[claude-review] summary for commit 043c968: No blocking issues found. The 043c968 commit addresses the prior nit by wrapping the new test body in try/finally. Confirmed correctness end-to-end:
…ocess

skip_timestamp_check is a mixture-wide decision and the warning text is identical for every dataset, so the previous "rank-0 only" gate still left ~392 lines per run for a wide pretraining mixture. Add a module-level _SKIP_TIMESTAMP_WARNED flag (mirroring the existing _CONTROL_MODE_WARNED dedup pattern) so the warning fires once per process, on the main rank only: once per run total instead of num_processes × num_datasets. The warning text now references the *first* dataset hit as an example rather than the only one, since the flag is mixture-wide; users who want a per-dataset listing can grep the resolved config.
@claude fix the missing test(s).
- addresses @shuheng-liu (missing test): added test_skip_timestamp_warning_emitted_once_per_process to tests/datasets/test_datasets.py, mirroring the _CONTROL_MODE_WARNED warn-once test. It constructs two datasets with skip_timestamp_check=True and asserts exactly one matching log record via caplog, with an upfront ld_mod._SKIP_TIMESTAMP_WARNED = False reset to make the test independent of prior state.

tests: passed (pytest -xvs tests/datasets/test_datasets.py::test_skip_timestamp_warning_emitted_once_per_process tests/datasets/test_datasets.py::test_control_mode_warning_emitted_once_per_repo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude fix the nit
- addresses @shuheng-liu (nit): wrapped the test_skip_timestamp_warning_emitted_once_per_process body in try/finally so the module-level _SKIP_TIMESTAMP_WARNED flag is restored to its original value after the test runs, even on failure. This prevents cross-test bleed-through within the same pytest-xdist worker if a future test wants to assert that the warning fires again.

tests: passed with pytest -m "not gpu" -n auto tests/datasets/test_datasets.py (2 pre-existing failures in test_do_not_use_imagenet_stats are unrelated to this change; they are HF cache misses for the lerobot/droid_100 and lerobot/aloha_mobile_cabinet parquet files)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
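A sketch of the test shape these two fixes describe (upfront reset, count matching records, restore in `finally`), written without pytest so it runs standalone. `ld_mod` here is a `types.SimpleNamespace` standing in for the dataset module, `construct_dataset` is a hypothetical stand-in for building a `LeRobotDataset` with `skip_timestamp_check=True`, and `_ListHandler` plays the role of pytest's `caplog`; none of this is the PR's actual test code.

```python
import logging
import types

# Hypothetical stand-in for the dataset module's state; the real test imports the module as ld_mod.
ld_mod = types.SimpleNamespace(_SKIP_TIMESTAMP_WARNED=False)
logger = logging.getLogger("ld_mod_sketch")


def construct_dataset(repo_id: str) -> None:
    """Hypothetical stand-in for building a dataset with skip_timestamp_check=True (single process)."""
    if not ld_mod._SKIP_TIMESTAMP_WARNED:
        logger.warning("Skipping timestamp sync check for %s", repo_id)
        ld_mod._SKIP_TIMESTAMP_WARNED = True


class _ListHandler(logging.Handler):
    """Collects log records, playing the role of pytest's caplog fixture."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def check_skip_timestamp_warning_once() -> None:
    original = ld_mod._SKIP_TIMESTAMP_WARNED
    handler = _ListHandler()
    logger.addHandler(handler)
    try:
        ld_mod._SKIP_TIMESTAMP_WARNED = False  # upfront reset: independent of prior state
        construct_dataset("placeholder/dataset_a")
        construct_dataset("placeholder/dataset_b")
        matching = [
            r for r in handler.records if "Skipping timestamp sync check" in r.getMessage()
        ]
        assert len(matching) == 1, f"expected 1 warning, got {len(matching)}"
    finally:
        # Restore the flag even if the assertion fails, so later tests in the
        # same worker process see whatever state they started with.
        ld_mod._SKIP_TIMESTAMP_WARNED = original
        logger.removeHandler(handler)


check_skip_timestamp_warning_once()
print("ok")
```

The `finally` block is what keeps the flag from leaking: without it, a failing assertion would leave `_SKIP_TIMESTAMP_WARNED` set to `True` for every later test in that worker.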
[claude-fix] Addressed in 043c968: wrapped the test body in try/finally. tests: passed
What this does
The `Skipping timestamp sync check for ...` warning at `src/opentau/datasets/lerobot_dataset.py:1348` previously fired `num_processes × num_datasets` times: for a heterogeneous pretraining mixture (~392 datasets) on 8 GPUs, that's ~3,136 identical lines between accelerate init and the first training step, drowning out the startup log.
`skip_timestamp_check` is fundamentally a mixture-wide decision (with an optional per-dataset override), and the warning text is identical for every dataset that has it set. So the right collapse is once per process; combined with the rank-0 gate, that's one line per run, not one per dataset.
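The line counts above follow from simple arithmetic, sketched here assuming every dataset in the mixture has `skip_timestamp_check` set and each rank constructs every dataset once. The `gate` labels are hypothetical names for the three behaviors discussed in this PR.

```python
def warning_lines_per_run(num_processes: int, num_datasets: int, gate: str) -> int:
    """Identical skip-timestamp warning lines per run, assuming every dataset sets the flag."""
    if gate == "none":  # original: every rank warns for every dataset
        return num_processes * num_datasets
    if gate == "rank0":  # first commit: main rank only, still once per dataset
        return num_datasets
    if gate == "rank0_once":  # final: main rank only, once per process
        return 1 if num_datasets > 0 else 0
    raise ValueError(f"unknown gate: {gate!r}")


for gate in ("none", "rank0", "rank0_once"):
    print(gate, warning_lines_per_run(8, 392, gate))
# none 3136
# rank0 392
# rank0_once 1
```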
Implementation mirrors the existing `_CONTROL_MODE_WARNED` dedup pattern in the same file:
- A new module-level `_SKIP_TIMESTAMP_WARNED: bool` flag.
- In `LeRobotDataset.__init__`, the warning is gated by both rank (`acc.is_main_process`) and the flag: fire once, then flip.
- A fall-through to `logging.warning` when no Accelerator is set, so single-process dev / pytest runs are unchanged.
- The warning text references the first dataset hit as an example rather than the only one, since the flag is mixture-wide.
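A minimal sketch of how the rank gate and the warn-once flag combine, assuming `get_proc_accelerator()` returns the process's Accelerator or `None`. `FakeAccelerator`, `maybe_warn_skip_timestamp`, and `simulate_process` are hypothetical names, and the log wording is illustrative. Real ranks are separate processes with their own module state, which `simulate_process` imitates by resetting the flag.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("skip_ts_once_sketch")
logger.addHandler(logging.NullHandler())  # keep the simulation below quiet

# Module-level, mixture-wide dedup flag (same idea as _SKIP_TIMESTAMP_WARNED in the PR).
_SKIP_TIMESTAMP_WARNED: bool = False


@dataclass
class FakeAccelerator:
    """Hypothetical stand-in for an accelerate Accelerator; only is_main_process is modeled."""

    is_main_process: bool


def maybe_warn_skip_timestamp(repo_id: str, acc: Optional[FakeAccelerator]) -> bool:
    """Warn at most once per process, and only on the main rank when an Accelerator is set.

    In the PR, `acc` would come from get_proc_accelerator(). Returns True if this call logged.
    """
    global _SKIP_TIMESTAMP_WARNED
    if _SKIP_TIMESTAMP_WARNED:
        return False  # already warned in this process
    if acc is not None and not acc.is_main_process:
        return False  # non-main ranks stay silent
    # Illustrative wording: the repo id is only the first dataset hit, not the only one.
    logger.warning("Skipping timestamp sync check for %s (and possibly other datasets)", repo_id)
    _SKIP_TIMESTAMP_WARNED = True  # fire once, then flip
    return True


def simulate_process(is_main: bool, num_datasets: int, accelerator_set: bool = True) -> int:
    """Simulate one fresh process constructing `num_datasets` datasets; return lines logged."""
    global _SKIP_TIMESTAMP_WARNED
    _SKIP_TIMESTAMP_WARNED = False  # each real process starts with fresh module state
    acc = FakeAccelerator(is_main_process=is_main) if accelerator_set else None
    return sum(maybe_warn_skip_timestamp(f"dataset_{i}", acc) for i in range(num_datasets))


total = sum(simulate_process(rank == 0, 392) for rank in range(8))
print(total)  # 1
```

Checking the flag before the rank means a process that has already warned skips the Accelerator lookup entirely; the order does not change the result, since each process keeps its own flag.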
How it was tested
- `pre-commit run --files src/opentau/datasets/lerobot_dataset.py`: clean.
- `pytest -m "not gpu" -n auto tests/datasets/`: 375 passed, 7 skipped, no regressions.
Behavior comparison:
The opt-in semantics and safety implication of `skip_timestamp_check` itself are unchanged; only the per-rank × per-dataset duplication is suppressed.
How to check out and try (for the reviewer)
```
git fetch origin claude/skip-timestamp-warning-rank0
git checkout claude/skip-timestamp-warning-rank0
pytest -m "not gpu" -n auto tests/datasets/
```
Checklist