chore(datasets): collapse skip-timestamp-check warning to once-per-process by shuheng-liu · Pull Request #278 · TensorAuto/OpenTau

shuheng-liu · 2026-05-06T18:41:04Z

What this does

The Skipping timestamp sync check for ... warning at
src/opentau/datasets/lerobot_dataset.py:1348 previously fired
num_processes × num_datasets times — for a heterogeneous
pretraining mixture (~392 datasets) on 8 GPUs that's ~3,136 identical
lines between accelerate init and the first training step, drowning
the startup log.

skip_timestamp_check is fundamentally a mixture-wide decision
(with optional per-dataset override), and the warning text is identical
for every dataset that has it set. So the right collapse is once per
process — combined with the rank-0 gate that's one line per run,
not per dataset.

Implementation mirrors the existing _CONTROL_MODE_WARNED dedup
pattern in the same file:

New module-level _SKIP_TIMESTAMP_WARNED: bool flag.
Inside LeRobotDataset.__init__, the warning is gated by both
rank (acc.is_main_process) and the flag — fire once, then flip.
Falls through to logging.warning when no Accelerator is set so
single-process dev / pytest runs are unchanged.
Message now mentions the first dataset hit as an example rather
than the only one, since the flag is mixture-wide.

How it was tested

pre-commit run --files src/opentau/datasets/lerobot_dataset.py — clean.
pytest -m "not gpu" -n auto tests/datasets/ — 375 passed, 7
skipped, no regressions.

Behavior comparison:

Setup	Lines emitted (before)	Lines emitted (after)
8 ranks × 392 datasets	~3,136	1
1 rank (CPU dev) × 5 datasets	5	1
1 rank × 1 dataset	1	1

The opt-in semantic and safety implication of skip_timestamp_check
itself are unchanged — only the per-rank × per-dataset duplication
is suppressed.

How to checkout & try? (for the reviewer)

git fetch origin claude/skip-timestamp-warning-rank0
git checkout claude/skip-timestamp-warning-rank0
pytest -m "not gpu" -n auto tests/datasets/

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

LeRobotDataset is constructed once per rank, so the existing ``logging.warning`` at the skip-timestamp-check branch fires ``num_processes`` x ``num_datasets`` times. For a 392-dataset pretraining mixture on 8 GPUs that's ~3K identical log lines between accelerate init and the first training step, drowning out everything else. Reuse the existing ``get_proc_accelerator`` pattern (already used elsewhere in this file at L352 and L1287) to gate the warning to ``acc.is_main_process``, with a fall-through to logging when no Accelerator is set so single-process dev/test runs are unchanged. Behavior in practice: - 8-rank distributed run: 1x per dataset (was 8x) - single-process / CPU dev: unchanged - the underlying choice the user opted into (skip_timestamp_check on a heterogeneous mixture) and its safety implication are both unchanged

claude · 2026-05-06T18:42:25Z

[claude-review] summary for commit 043c968

No blocking issues found. The 043c968 commit addresses the prior nit by wrapping the new test body in try/finally so _SKIP_TIMESTAMP_WARNED is captured up-front and restored in the finally clause — robust to test failure and prevents cross-test bleed-through within a pytest-xdist worker if a future test wants the warning to fire again.

Confirmed correctness end-to-end:

Module-level flag + rank-0 gate (acc is None or acc.is_main_process) at src/opentau/datasets/lerobot_dataset.py:1366 cleanly mirrors the established _CONTROL_MODE_WARNED dedup at line 1326. The acc is None fallthrough preserves single-process dev / pytest behavior.
global _SKIP_TIMESTAMP_WARNED is correctly declared before the assignment on line 1367.
No distributed-collective concern: each rank has its own module-level flag, only rank 0 ever logs / flips, and the flag isn't communicated across ranks (it doesn't need to be — non-rank-0 ranks short-circuit on is_main_process regardless of flag state).
The test at tests/datasets/test_datasets.py:1030 exercises the per-process dedup with two distinct repo_ids and the try/finally save/restore is the right shape — matches the convention requested in the prior round.

…ocess skip_timestamp_check is a mixture-wide decision and the warning text is identical for every dataset, so the previous "rank-0 only" gate still left ~392 lines per run for a wide pretraining mixture. Add a module-level _SKIP_TIMESTAMP_WARNED flag (mirroring the existing _CONTROL_MODE_WARNED dedup pattern) so the warning fires once per process, on the main rank only — once per run total instead of num_processes × num_datasets. The warning text now references the *first* dataset hit as an example rather than the only one, since the flag is mixture-wide; users who want a per-dataset listing can grep the resolved config.

claude

[claude-review] inline finding for commit 3a36dce

See the summary comment on this PR for the rollup.

shuheng-liu · 2026-05-06T22:21:41Z

@claude fix the missing test(s).

@shuheng-liu

- addresses @shuheng-liu (missing test): added test_skip_timestamp_warning_emitted_once_per_process to tests/datasets/test_datasets.py mirroring the _CONTROL_MODE_WARNED warn-once test — constructs two datasets with skip_timestamp_check=True and asserts exactly one matching log record via caplog, with an upfront ld_mod._SKIP_TIMESTAMP_WARNED = False reset to make the test independent of prior state. tests: passed — pytest -xvs tests/datasets/test_datasets.py::test_skip_timestamp_warning_emitted_once_per_process tests/datasets/test_datasets.py::test_control_mode_warning_emitted_once_per_repo Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

shuheng-liu · 2026-05-07T03:29:23Z

@claude fix the nit

@shuheng-liu

- addresses @shuheng-liu (nit): wrapped test_skip_timestamp_warning_emitted_once_per_process body in try/finally so the module-level _SKIP_TIMESTAMP_WARNED flag is restored to its original value after the test runs, even on failure. Prevents cross-test bleed-through within the same pytest-xdist worker if a future test wants to assert the warning fires again. tests: passed — pytest -m "not gpu" -n auto tests/datasets/test_datasets.py (2 pre-existing failures in test_do_not_use_imagenet_stats unrelated to this change — HF cache misses for lerobot/droid_100 and lerobot/aloha_mobile_cabinet parquet files) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-05-07T03:34:56Z

[claude-fix] Addressed in 043c968: wrapped the test body in try/finally so _SKIP_TIMESTAMP_WARNED is saved and restored to its original value, even on failure. This prevents cross-test bleed-through within the same pytest-xdist worker if a future test wants to assert the warning fires again.

tests: passed — pytest -m "not gpu" -n auto tests/datasets/test_datasets.py (42 passed; 2 pre-existing test_do_not_use_imagenet_stats failures are HF cache misses unrelated to this change).

shuheng-liu added the optimization Optimizes the performance of something label May 6, 2026

shuheng-liu self-assigned this May 6, 2026

shuheng-liu changed the title ~~chore(datasets): gate skip-timestamp-check warning to rank 0~~ chore(datasets): collapse skip-timestamp-check warning to once-per-process May 6, 2026

claude Bot reviewed May 6, 2026

View reviewed changes

WilliamYue37 self-requested a review May 6, 2026 23:09

shuheng-liu marked this pull request as ready for review May 7, 2026 03:28

WilliamYue37 approved these changes May 7, 2026

View reviewed changes

shuheng-liu merged commit 4451ebe into main May 7, 2026
7 checks passed

shuheng-liu deleted the claude/skip-timestamp-warning-rank0 branch May 7, 2026 21:05

claude Bot mentioned this pull request May 7, 2026

chore(claude): learn from #278 #284

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(datasets): collapse skip-timestamp-check warning to once-per-process#278

chore(datasets): collapse skip-timestamp-check warning to once-per-process#278
shuheng-liu merged 4 commits into
mainfrom
claude/skip-timestamp-warning-rank0

shuheng-liu commented May 6, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 6, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

shuheng-liu commented May 6, 2026

Uh oh!

shuheng-liu commented May 7, 2026

Uh oh!

claude Bot commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shuheng-liu commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this does

How it was tested

How to checkout & try? (for the reviewer)

Checklist

Uh oh!

claude Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

shuheng-liu commented May 6, 2026

Uh oh!

shuheng-liu commented May 7, 2026

Uh oh!

claude Bot commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shuheng-liu commented May 6, 2026 •

edited

Loading

claude Bot commented May 6, 2026 •

edited

Loading