feat(datasets): mixed-frequency training + fps metadata for pi07 by shuheng-liu · Pull Request #323 · TensorAuto/OpenTau

shuheng-liu · 2026-05-23T02:17:51Z

What this does

Adds opt-in mixed-frequency training for the dataset mixture and surfaces the dataset's native frame rate as an fps metadata field that pi07 / pi07_paligemma tokenize into the prefix prompt. The four pieces:

DatasetMixtureConfig.action_freq: float | None = None (was float = 30.0). When None (new default), each dataset is sampled at its native fps with no resampling — resolve_delta_timestamps substitutes ds_meta.fps so a chunk of N actions becomes N consecutive native frames regardless of the dataset's rate. This is also a bug fix for sub-30-Hz single-dataset runs: the old 30 Hz default produced silently duplicated/skipped action targets for 10 / 15 / 20 Hz datasets via nearest-neighbor resampling. Setting a positive float preserves the legacy "resample everything to this rate" behaviour.
DatasetMixtureConfig.emit_fps: bool = False (opt-in). When True, __getitem__ returns the effective per-sample frame rate under the fps key (torch.long scalar) with fps_is_pad=False — equal to action_freq when set, else meta.fps. VQA samples (no temporal axis) emit fps=0, fps_is_pad=True so heterogeneous VLA + VQA batches stay schema-aligned. fps is not rolled by metadata_drop_*_prob — it's an intrinsic property of the chunk, not a noisy label.
pi07 / pi07_paligemma prepare_metadata × 4 files. All four implementations now tokenize fps with the "FPS: N, " header (uppercase to match the sibling Speed:/Quality:/Mistake:/Robot:/Control: labels), slotted between Robot: and Control: (final order: Speed → Quality → Mistake → Robot → FPS → Control). Missing or padded fps drops the segment entirely (no header, no comma).
EnvMetadataConfig.emit_fps: bool = False (opt-in, parallels the training-side toggle). When True, add_eval_metadata broadcasts cfg.env.fps as the fps metadata key, matching the training-side contract.

Backward-compat notes for reviewers

action_freq default changed from 30.0 to None. Sub-30-Hz single-dataset runs (10 / 15 / 20 Hz) are silently improved (no more duplicate action targets via nearest-neighbor). Single-dataset runs at exactly 30 Hz are byte-identical. Above-30-Hz runs (50 / 60 Hz) see their prediction horizon shrink because the chunk now covers fewer real seconds at the higher native rate — pin action_freq=30.0 explicitly to preserve the old horizon.
emit_fps defaults to False on both DatasetMixtureConfig and EnvMetadataConfig. Pre-PR checkpoints resume cleanly because no FPS: segment is added to the policy prefix. New training runs that want fps conditioning set emit_fps=True on both sides (training + eval) explicitly.

How it was tested

Added unit tests covering the new contract:
- tests/configs/test_default.py — action_freq=None accepted, default is None, emit_fps defaults False.
- tests/datasets/test_delta_timestamps.py — resolve_delta_timestamps substitutes native fps when action_freq=None; two datasets with different fps produce different per-action timestamps.
- tests/datasets/test_optional_keys.py — _emit_optional_keys is a no-op under default emit_fps=False; when explicitly enabled, emits the effective fps (action_freq if set, else meta.fps) as torch.long with fps_is_pad=False; fps stays non-pad even with metadata_drop_*_prob=1.0; VQA samples emit fps=0, fps_is_pad=True so heterogeneous default-collate produces aligned (B,) tensors.
- tests/policies/test_pi07_cpu.py, tests/policies/test_pi07_paligemma_{high_level_planner,low_level}.py — prepare_metadata content + ordering: "FPS: N, " appears in the prefix between Robot: and Control: when present, segment omitted entirely when padded or absent, no stray ", ," artefacts. Mixed-fps_is_pad batches (LeRobot + VQA) honour per-sample padding row-by-row.
- tests/envs/test_configs.py, tests/envs/test_utils.py — EnvMetadataConfig.emit_fps defaults False; add_eval_metadata broadcasts cfg.env.fps as torch.long only when emit_fps=True is opted in.
pre-commit run --all-files — clean.
pytest -m "not gpu" -n auto — full suite passes (modulo pre-existing LIBERO-assets / timing-flaky failures unrelated to this branch).
pytest -m "gpu" -n 0 (RTX 5090) — 19 passed, 10 skipped, 0 failed (~2:46). All skips are pre-existing unconditional @pytest.mark.skip markers (memory or environment), unaffected by this branch.

Nightly regression tests still need to land in the scheduled job.

How to checkout & try? (for the reviewer)

git fetch origin claude/awesome-liskov-941c29
git checkout claude/awesome-liskov-941c29
uv sync --extra dev --extra libero

Run the focused test slice:

pytest -sx -m "not gpu" \
  tests/configs/test_default.py \
  tests/datasets/test_delta_timestamps.py \
  tests/datasets/test_optional_keys.py \
  tests/envs/test_configs.py tests/envs/test_utils.py \
  tests/policies/test_pi07_cpu.py \
  tests/policies/test_pi07_paligemma_high_level_planner.py \
  tests/policies/test_pi07_paligemma_low_level.py

Eyeball the new prefix string in a Python REPL (no model weights needed — stubbed tokenizer):

from types import SimpleNamespace
import torch
from opentau.policies.pi07.low_level.modeling_pi07_low_level import PI07LowLevelPolicy

captured = []
def stub(metadata, **kw):
    captured.extend(metadata)
    return {"input_ids": torch.zeros((len(metadata), 4), dtype=torch.long),
            "attention_mask": torch.zeros((len(metadata), 4), dtype=torch.long)}
tok = SimpleNamespace(); tok.__call__ = stub
fake = SimpleNamespace(language_tokenizer=tok, config=SimpleNamespace(metadata_max_length=4))
batch = {
    "state": torch.zeros(1, 1),
    "fps": torch.tensor([20], dtype=torch.long),
    "fps_is_pad": torch.tensor([False]),
    "robot_type": ["franka"],
    "control_mode": ["joint"],
}
PI07LowLevelPolicy.prepare_metadata(fake, batch)
print(captured[0])
# -> 'Metadata: Robot: franka,  FPS: 20,  Control: joint, '

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Note: Before submitting this PR, please read the contributor guideline.

- DatasetMixtureConfig.action_freq is now Optional[float] and defaults to None, disabling resampling so each dataset is sampled at its native fps. resolve_delta_timestamps substitutes ds_meta.fps when action_freq is None so a chunk of N actions is N consecutive native frames regardless of the dataset's rate. - New DatasetMixtureConfig.emit_fps (default True) makes __getitem__ emit the dataset's native meta.fps as a torch.long sample key (always non-pad, no participation in metadata_drop_*_prob). - All four pi07 / pi07_paligemma prepare_metadata implementations now tokenize fps with the lowercase "fps: N, " header between Robot: and Control:, omitted entirely when fps_is_pad is True. - New EnvMetadataConfig.emit_fps (default True) makes add_eval_metadata broadcast cfg.env.fps as the inference-time fps metadata key, matching the training contract end-to-end.

shuheng-liu

Reviewed the diff end-to-end against 28db260 (true PR base). Implementation looks correct and tests are thorough — the four prepare_metadata updates are consistent (Speed → Quality → Mistake → Robot → fps → Control), _emit_optional_keys correctly bypasses the dropout rolls for fps, and add_eval_metadata mirrors the training contract. A few design-level concerns worth flagging before un-drafting:

1. Two defaults silently flip behaviour for existing configs / resumed checkpoints (main concern)

The PR makes two backward-incompatible defaults at once:

DatasetMixtureConfig.action_freq: 30.0 → None
DatasetMixtureConfig.emit_fps / EnvMetadataConfig.emit_fps: new, both default True

The combined effect on any existing config that doesn't explicitly set these:

A heterogeneous mixture that was implicitly being resampled to 30 Hz now uses each dataset's native rate, so a single batch can mix samples spanning different real-time horizons (chunk_size consecutive native frames). For a 10 Hz + 30 Hz mixture, this is a 3× shift in the supervised time window for half the batch — not "no-op when you don't opt in".
The PaliGemma / Gemma 3 prefix gains a new "fps: N, " segment the model has never seen. The PR body acknowledges this but treats it as a footnote: resuming any pre-PR checkpoint requires remembering to set both emit_fps toggles to False.

Suggest making at least one of these opt-in:

Cheapest fix: default emit_fps=False on both DatasetMixtureConfig and EnvMetadataConfig. New runs that want fps tokenization flip the flag explicitly; old configs are byte-identical.
Or: keep action_freq=30.0 default and make None an explicit "I know I'm doing mixed-frequency" opt-in.

Doing both — flipping the resample default and injecting a new prefix segment — without an opt-in step is a footgun for anyone with in-flight runs.

2. `fps` is emitted on every sample for policies that never read it

emit_fps lives on the mixture, so a pi0 / pi05 / pi05_mem / pi06 run gets fps + fps_is_pad in every sample dict even though only pi07 / pi07_paligemma tokenize them. The cost is negligible (one long + one bool), and the policy-side batch.get(..., default_pad) makes it safe — just noting the asymmetry. Tying emission to the policy type would be over-engineered; the current "emit always, ignore if unused" is fine if the default is False per concern #1.

3. Minor

int(self.meta.fps) in _emit_optional_keys: meta.fps is typed int (lerobot_dataset.py:487), so the cast is a no-op. No real bug, but if a future dataset ever stores a non-integer rate in info.json (29.97, 59.94) the tokenized prefix and resolve_delta_timestamps would diverge by a few percent. Probably fine to leave; the type contract is integer.
Lowercase "fps:" vs capitalized siblings (Speed:, Robot:) — deliberate per the PR body, and "fps" is conventionally lowercase, so reads fine. Just confirming the choice is intentional.
f"fps: {str(fps.item())}, " — redundant str() inside f-string, but matches the existing speed/quality/mistake style. Don't change for consistency.

Concern #1 is the only one I'd block on. The rest are nits / FYI.

Generated by Claude Code

`_emit_optional_keys` previously emitted `meta.fps` regardless of the mixture's `action_freq`. When the user pinned `action_freq=30.0` on a mixture containing a 50 Hz dataset, `resolve_delta_timestamps` resampled the chunk to 30 Hz but the prefix still said `fps: 50, ` — telling the policy a different rate than the chunk actually carried. Capture `_action_freq` on `BaseDataset` and emit `action_freq if set else meta.fps` so the tokenized fps always matches the chunk's effective rate.

shuheng-liu · 2026-05-23T02:38:16Z

Thanks for the careful review. Responses inline.

#1 — defaults

Both defaults (action_freq=None, emit_fps=True on both sides) were intentional design choices fixed during planning, with the trade-off documented in the PR body's "Backward-compat note". The reasoning: mixed-frequency is the primary new use case, and making the intended configuration the default avoids requiring two opt-in flags for the new feature to work end-to-end.

That said, the resume-from-checkpoint footgun is real. I see two viable knobs and want your call before flipping:

A. emit_fps=False default on both sides — keeps action_freq=None (new training picks the rate up automatically once the user flips emit_fps=True), but old checkpoints resume cleanly because the prefix doesn't gain a new segment by default. The cost is that silently using mixed-freq without fps conditioning is now reachable (no opt-in barrier for the resample change alone).
B. action_freq=30.0 default — old configs are byte-identical (both the resample behaviour and, indirectly, the prefix content: every dataset gets resampled to 30 Hz, so emit_fps=True produces a constant fps: 30, segment that pre-PR checkpoints still haven't seen). Doesn't actually solve the resume problem unless we also flip emit_fps.

The combo "A and B both flipped" makes the whole feature opt-in (safest, but two extra config lines per user). Happy to push whichever you pick; my mild preference is A if I have to choose unilaterally — it preserves the planning-conversation intent that mixed-freq is the new default while still letting old checkpoints resume.

#2 — fps emitted for non-pi07 policies

Confirmed. Cost is one torch.long + one torch.bool per sample, gated by emit_fps. Policy-type-aware emission would over-engineer for ~16 bytes. No change.

#3 — nits

int(self.meta.fps): leaving as defensive coding. If a future dataset ever stores 29.97 in info.json, int() floors it to 29 — predictably wrong rather than mysteriously wrong via tensor-cast.
Lowercase fps:: intentional per the original task description.
Redundant str() in f-string: matches the existing speed/quality/mistake lines. Leaving for consistency.

Separate bug surfaced while re-reading — fixed in `7236b5a`

_emit_optional_keys was emitting meta.fps regardless of action_freq. When a user pins action_freq=30.0 on a mixture containing a 50 Hz dataset, resolve_delta_timestamps resamples the chunk to 30 Hz — but the prefix would still say fps: 50, , telling the policy a different rate than the chunk actually carries. Fixed by capturing _action_freq on BaseDataset and emitting action_freq if set else meta.fps. Two new tests pin the behaviour (test_fps_reports_action_freq_when_resampling, test_fps_reports_native_when_action_freq_none).

shuheng-liu

Second-pass review found one concrete bug, plus some details to confirm. Posting separately from the first review for clarity.

Bug: `emit_fps=True` crashes any mixture containing a VQA dataset

VQADataset.__getitem__ calls self._to_standard_data_format(item) (vqa/base.py:142), which calls _emit_optional_keys. The new fps block runs unconditionally:

if self.emit_fps:
    standard_item["fps"] = torch.tensor(int(self.meta.fps), dtype=torch.long)

But:

VQADatasetMetadata(DatasetMetadata) (lerobot_dataset.py:305-308) is pass.
DatasetMetadata (lerobot_dataset.py:253-302) defines features / image_keys / video_keys / camera_keys / names / shapes — no fps property.
VQADataset.create_meta builds an info dict with no "fps" key (vqa/base.py:82-91).

So self.meta.fps is a hard AttributeError for VQA samples. Combined with emit_fps=True being the new default and applied through the mixture's dm.emit_fps, any heterogeneous mixture that includes a VQA dataset crashes at the first sample fetch with the default config.

This isn't covered by the new tests: _DummyBaseDataset in tests/datasets/test_optional_keys.py:84-92 stubs meta = SimpleNamespace(fps=meta_fps, info=...), so it never exercises the real VQADatasetMetadata path.

Heterogeneous co-training (VLA + VQA) is one of the headline OpenTau features called out in CLAUDE.md ("heterogeneous-dataset co-training") — this would break the default flow for any team using that recipe.

Fix options:

Cheapest: gate the fps emission on the metadata actually having an fps attribute:
```
if self.emit_fps and hasattr(self.meta, "fps"):
    ...
```
Downside: silent — a VQA-in-mixture run produces samples without fps, and after collate that breaks default_collate when mixed with LeRobot samples in the same batch.
Better: VQA explicitly emits a pad row (fps=torch.tensor(0, dtype=torch.long), fps_is_pad=torch.tensor(True)) so the batch always has consistent keys and the policy's *_is_pad path drops the segment. Either by overriding _emit_optional_keys in VQA, or by defining VQADatasetMetadata.fps = None and checking-None in the base emitter.
Best — but ties into concern #1 from the first review: default emit_fps=False, then the VQA path is a no-op and existing configs don't change. The bug becomes opt-in.

Confirmed (not bugs, just noting)

_to_standard_data_format → _emit_optional_keys → bf16 cast order is fine: fps is torch.long, fps_is_pad is torch.bool, neither is is_floating_point, so the cast loop skips them.
fps_is_pad = torch.tensor(False) is a 0-d scalar matching the other _is_pad flags emitted in the same method (speed_is_pad, quality_is_pad, mistake_is_pad). default_collate will stack them to (B,) consistently — no shape mismatch with the eval-side (B,) emission in add_eval_metadata.
resolve_delta_timestamps with action_freq=ds_meta.fps does produce per-dataset native-frame timestamps as advertised — confirmed against tests/datasets/test_delta_timestamps.py::test_resolve_delta_timestamps_native_fps_differs_per_dataset. The semantics change (frame-stride vs real-time-stride) for subsampled action_delta_indices was already implicit in concern #1.
Test coverage for the four prepare_metadata updates is good (Robot < fps < Control ordering, padded-omit, missing-key, partial-segments) — the only gap is the VQA-mixture path above.

Generated by Claude Code

`_emit_optional_keys` accessed `self.meta.fps` unconditionally when `emit_fps=True`, raising AttributeError on VQA datasets — whose `VQADatasetMetadata` (inheriting `pass` from `DatasetMetadata`) has no `fps` property. Any heterogeneous VLA + VQA mixture trained under the default config crashed at the first VQA sample fetch. Surface `fps -> None` on the `DatasetMetadata` base (overridden to int on `LeRobotDatasetMetadata`), then have `_emit_optional_keys` pad out the row (`fps=0, fps_is_pad=True`) when meta.fps is None. The policy's `prepare_metadata` already drops the `FPS:` segment for padded samples, so heterogeneous batches stay schema-aligned and the prefix omits the fps token for VQA rows.

shuheng-liu · 2026-05-23T03:00:13Z

Good catch — real bug, fixed in 090b436.

What broke

_emit_optional_keys accessed self.meta.fps unconditionally when emit_fps=True. VQADatasetMetadata inherits pass from DatasetMetadata, neither of which defined fps, so every VQA sample fetch raised AttributeError. Heterogeneous VLA + VQA mixtures (one of OpenTau's headline use cases) crashed at the first sample under the default config.

The fix — your option 2

Surfaced fps: int | None as a property on DatasetMetadata (returns None by default; LeRobotDatasetMetadata.fps already overrode it to return the int from info["fps"], no change there).
_emit_optional_keys branches on native_fps is None and emits fps=0, fps_is_pad=True for VQA samples. The policy's prepare_metadata already drops the FPS: segment when fps_is_pad is True, so VQA rows get no fps token in the prefix while LeRobot rows in the same batch get their real rate. The pad row keeps the batch schema-aligned, which is the bit default_collate cares about.
Test coverage added in TestFpsEmissionVQA (4 tests): base + VQA metadata return None, VQA sample emits pad, VQA pad ignores action_freq (the rate would be meaningless for a sample with no actions), and heterogeneous VLA + VQA default-collate produces fps=[30, 0] / fps_is_pad=[False, True]. These exercise the real VQADatasetMetadata class, not the SimpleNamespace stub — closing the gap you flagged.

On the other findings

Your three "confirmed" items match what I'd verified locally — the bf16-cast loop skips torch.long/torch.bool so fps survives intact, the 0-d fps_is_pad collates to (B,) consistently with the eval-side emission, and resolve_delta_timestamps does land on per-dataset native frames.

The default question from your first review (concern #1) is still open — I haven't flipped anything yet, waiting on your call between A (emit_fps=False default), B (action_freq=30.0 default), or both.

shuheng-liu · 2026-05-23T03:10:49Z

Decision on the defaults question (review concern #1): keep both as-is.

DatasetMixtureConfig.action_freq stays None (mixed-frequency is the default).
DatasetMixtureConfig.emit_fps and EnvMetadataConfig.emit_fps both stay True.

Rationale: the new defaults reflect the intended new behaviour — mixed-freq training with per-sample fps conditioning works end-to-end with zero config ceremony. The resume-from-checkpoint footgun for pre-PR runs is documented in the PR body's "Backward-compat note for reviewers"; users in that case set both emit_fps=False explicitly.

No code change needed for this. Closing the loop so reviewers know the trade-off was deliberate.

shuheng-liu

Third pass — three smaller observations on top of the two reviews above. None are blockers if concern #1 (default emit_fps=False) is adopted; they apply mostly to the emit_fps=True future state.

1. `has_metadata` is now always True under `emit_fps=True` — silently changes ablation semantics

The pi07 low-level uses _global_or_branch_decisions over has_metadata_local = bool(metadata_masks.any()) (pi07/low_level/modeling_pi07_low_level.py:1387-1390), and the HL planner gates on metadata_masks.any() (pi07/high_level_planner/modeling_pi07_high_level.py:1005). Pre-PR, a run with metadata_drop_all_prob=1.0 (or just no metadata in the dataset) produced batches where every sample's metadata was fully padded, the has_metadata branch evaluated to False, and the metadata block + trailing ";\n " were dropped entirely.

With emit_fps=True, fps tokens are always non-padded, so metadata_masks.any() is always True. The metadata block (now containing only "fps: N, ") is always included.

Consequences:

metadata_drop_all_prob=1.0 no longer means "no metadata at all" — it means "fps-only metadata". Anyone running this as a no-metadata ablation needs to additionally set emit_fps=False on the mixture (and EnvMetadataConfig.emit_fps=False at eval) to recover the previous semantics.
The CLAUDE.md rule #5 OR-reduction in _global_or_branch_decisions still fires correctly — it just always agrees True across ranks now. Not a correctness regression, just dead branch for the True-always case.

Worth a sentence in the PR body's "Backward-compat note" section.

2. `metadata_max_length=52` is tighter now — worth a sanity-check

All four pi07 / pi07_paligemma configs default to metadata_max_length: int = 52. Pre-PR worst-case prefix: "Metadata: Speed: 100, Quality: 5, Mistake: True, Robot: <name>, Control: <mode>, " — roughly 22-26 BPE tokens with Gemma 3 / PaliGemma tokenizers depending on robot name. Post-PR adds " fps: NN, " for another 3-4 tokens. Still fits in 52, but the headroom for long robot_type strings (e.g. "so100_follower_arm") is narrower.

Suggest a one-line note in the config docstring or, better, a runtime warning when the tokenized prefix hits truncation=True with truncation=True, max_length=52 (the tokenizer silently drops tail tokens today — fps would survive since it slots before Control:, but Control: would be the first thing truncated).

3. Test gap: mixed-`fps_is_pad` batches

The new tests cover:

All samples non-pad (test_fps_present_emits_lowercase_segment)
All samples pad (test_fps_padded_omits_segment)
Key absent (test_fps_key_absent_omits_segment)

Not covered: a single batch where sample 0 has fps_is_pad=False and sample 1 has fps_is_pad=True. Today's production paths don't produce this mix (dataloader emits non-pad unconditionally, add_eval_metadata emits non-pad for all envs at once), so the gap is theoretical. But the policy code's per-sample zip(...) iteration is the only thing keeping it correct — a regression that broadcasts a batch-wide fps_is_pad would slip through current tests. One mixed-batch assertion would close that hole cheaply.

That's the end of my passes. Summary across all three reviews:

#	Severity	Finding
1	Blocker on default	`emit_fps=True` + `action_freq=None` defaults silently change behaviour for resumed checkpoints and existing configs
2	Blocker bug	VQA mixtures crash with `AttributeError` on `self.meta.fps`
3	Note in PR body	`has_metadata` becomes always-True; ablation semantics shift
4	Sanity check	`metadata_max_length=52` headroom narrower with fps
5	Test gap	No mixed-`fps_is_pad` batch coverage

Findings 1 and 2 want addressing before this lands. 3-5 are nice-to-haves.

Generated by Claude Code

… shift - Add `test_fps_mixed_pad_across_batch` in all three fps test classes — closes the per-sample fps_is_pad coverage gap, now reached in production by heterogeneous LeRobot + VQA mixtures where the VQA pad row (fps=0, fps_is_pad=True) sits next to a LeRobot row in the same batch. - concepts.rst: document the metadata_drop_all_prob=1.0 semantic shift under emit_fps=True (no longer "no metadata at all" — it's "fps-only metadata") and note metadata_max_length=52 headroom.

shuheng-liu · 2026-05-23T03:18:20Z

Addressed in 1467b57:

#1 — `metadata_drop_all_prob=1.0` semantics under `emit_fps=True`

Real semantic shift. Added a .. note:: block to concepts.rst (in the dropout table section) explaining that under the new default, metadata_drop_all_prob=1.0 means "fps-only metadata segment" instead of "no metadata segment at all". The note tells ablation users to also flip emit_fps=False on both sides to recover pre-PR behaviour. The has_metadata branch and the OR-reduction are still correct — they just always agree True for LeRobot samples now.

#2 — `metadata_max_length=52` headroom

Added a one-line note in concepts.rst flagging that the new FPS: N, segment costs ~3-4 tokens, that the underlying tokenizer call uses truncation=True silently, and that users should bump metadata_max_length if they start seeing Control: get clipped. Skipped a runtime warning for now — too aggressive for a feature that "still fits in 52" as you noted.

#3 — Mixed-`fps_is_pad` batch coverage

Added test_fps_mixed_pad_across_batch to all three fps test classes (test_pi07_cpu.py ×2 via the [low, high] parametrize, test_pi07_paligemma_high_level_planner.py, test_pi07_paligemma_low_level.py). Each test builds a 2-sample batch with sample 0 at fps_is_pad=False and sample 1 at fps_is_pad=True and asserts:

Sample 0 keeps its FPS: 30, segment.
Sample 1's FPS: substring is absent.

This is the exact production path your second-pass review fix triggered — a heterogeneous LeRobot + VQA batch where the VQA pad row sits next to a LeRobot real-rate row.

22 fps tests pass; pre-commit clean.

shuheng-liu

Fourth pass — one substantive new finding and a verification of the resume-checkpoint scenario.

Material finding: switching to native fps silently changes prediction-horizon semantics for non-30-Hz datasets

resolve_delta_timestamps with the new action_freq=None substitutes ds_meta.fps, then video sampling does nearest-neighbor at frame_indices_soft.round() (lerobot_dataset.py:1906). Comparing the three regimes against the old action_freq=30.0 default for chunk_size=50:

Native fps	Old default (`action_freq=30`)	New default (`action_freq=None`)	Effect
20 Hz	`delta_ts = [i/30]` → `round(i20/30)` = `[0,1,1,2,3,3,4,5,5,…]` (duplicates*)	`delta_ts = [i/20]` → `[0,1,2,3,…]` consecutive	Bug fix — old default produced silently duplicated action targets
30 Hz	`[0,1,2,…]`	`[0,1,2,…]`	identical
60 Hz	`delta_ts = [i/30]` → `round(i*60/30)` = `[0,2,4,6,…]` (subsampled) → 1.67 s horizon	`delta_ts = [i/60]` → `[0,1,2,3,…]` consecutive → 0.83 s horizon	Prediction horizon shrinks 2×

Two consequences the PR body doesn't acknowledge:

The PR is also a bug fix for sub-30-Hz datasets. Anyone training on 10/15/20 Hz data with the old default was getting silently duplicated/skipped action targets. The "mixed-frequency" framing in the description undersells this — even single-dataset 20-Hz runs benefit. Worth a sentence in the PR body.
For >30-Hz datasets, the prediction horizon collapses. A 60-Hz dataset run that used to predict 1.67 s of future motion now predicts 0.83 s. That's a different task — the model's action distribution, evaluation rollout length, and downstream planner timing all shift. This is a real semantic break for anyone using high-rate datasets (industrial arms, force sensors, anything ≥50 Hz), distinct from concern #1.

Together with concerns #1 and #2 (defaults + VQA crash), this strengthens the case for an opt-in default. A user who explicitly wrote action_freq=30.0 in their config gets that value preserved on resume — but a user who never set it (relied on the implicit 30 Hz default) now gets a horizon shift they didn't ask for.

Verified: resumed pre-PR checkpoints silently flip `emit_fps` on

Checked cfg.save_pretrained → _save_pretrained (configs/train.py:310-319) — it uses draccus.dump(self, f), which serialises only fields present on the dataclass at save time. So a checkpoint saved before this PR has no emit_fps key in its JSON. When that JSON is reloaded after the PR, draccus instantiates DatasetMixtureConfig with the new default emit_fps=True, and EnvMetadataConfig.emit_fps=True similarly. The resumed run starts injecting "fps: N, " into the prefix on step 1 even though the previous training loop never tokenized fps.

This is the exact failure mode CLAUDE.md flags as "ML bugs hide in stochasticity" — a loss curve that looks plausible because the model adapts, masking that the policy is being silently retrained on a slightly different input distribution. The fix is mechanical: default emit_fps=False, document opt-in.

No new findings on

Key collisions: standard_item["fps"] doesn't conflict with anything else in _to_standard_data_format (built fresh per sample at lerobot_dataset.py:832).
bf16 cast: confirmed once more that the cast loop at lerobot_dataset.py:861-864 only touches is_floating_point, so fps (long) and fps_is_pad (bool) pass through unchanged.

Updated summary across all four passes

#	Severity	Finding
1	Blocker (default)	`emit_fps=True` + `action_freq=None` flip behaviour silently for any existing config or resumed checkpoint
2	Blocker (bug)	VQA mixtures crash with `AttributeError` on `self.meta.fps`
3	Note in PR body	The PR is also a bug fix for sub-30-Hz datasets — currently un-credited
4	Note in PR body	Prediction-horizon shrinks for >30-Hz datasets — distinct from #1, not in the description
5	Nice-to-have	`has_metadata` always True → ablation semantics shift
6	Nice-to-have	`metadata_max_length=52` headroom narrower
7	Test gap	No mixed-`fps_is_pad` batch coverage

That's the natural stopping point. Findings #1 and #2 still need addressing; #3 and #4 would tighten the PR description; #5-7 are polish.

Generated by Claude Code

`DatasetMixtureConfig.emit_fps` and `EnvMetadataConfig.emit_fps` now default to `False` so pre-PR checkpoints resume cleanly — the policy's metadata prefix doesn't gain an unfamiliar `FPS:` segment until the user explicitly opts in. `action_freq=None` (mixed-frequency training) stays the default; opt-in to `emit_fps=True` to surface per-sample fps conditioning to the policy. BaseDataset's dm-None fallback matched to False for the "no mixture config" path (VQA-only / unit tests). Docs + tests updated to reflect the new default — existing fps tests now set `emit_fps=True` explicitly when they exercise the emission path.

shuheng-liu · 2026-05-23T03:36:29Z

Flipped both emit_fps defaults to False in 60bc00c — your fourth-pass case (sub-30Hz silently-improves + 60Hz horizon shrinks + resume implicitly enables fps tokenization) was compelling enough that the previous "keep True" call was wrong. action_freq=None stays the new default (mixed-frequency is the intended new behaviour); emit_fps is the opt-in barrier.

Recap of what's now opt-in vs. silently changed

Knob	Pre-PR default	Post-PR default	Resumed-checkpoint behaviour
`action_freq`	implicit `30.0`	`None`	Sub-30Hz: silently improved (no more duplicate-target bug). 30Hz: byte-identical. >30Hz: horizon shrinks — users pin `action_freq=30.0` to preserve.
`DatasetMixtureConfig.emit_fps`	n/a	`False` (new)	No `FPS:` segment added to prefix. Resume is clean.
`EnvMetadataConfig.emit_fps`	n/a	`False` (new)	Same — eval prefix unchanged.

PR body's "Backward-compat notes" section now spells out the sub-30Hz fix and the >30Hz horizon-shrink as distinct items (your findings #3 and #4).

Tests updated

_DummyBaseDataset fixture default flipped to emit_fps=False to match production; tests that exercise the emission path now set emit_fps=True explicitly.
test_emit_fps_defaults_to_false / test_emit_fps_can_be_enabled replace the pre-flip counterparts.
add_eval_metadata tests: the "all-None default" now asserts no keys are injected (was "only fps/fps_is_pad"); the fps-emission tests opt in to emit_fps=True explicitly.

764 CPU tests pass; pre-commit clean. Picking up resumed pre-PR checkpoints is now a true no-op for the prefix.

That should close the third-pass items (#3 ablation note still in concepts.rst — still applies under explicit emit_fps=True runs) and the fourth-pass blockers.

shuheng-liu

Re-reviewed after the new commits. Both blockers from earlier passes are properly addressed, plus one extra correctness fix I missed.

Finding	Commit	Resolution
#1 Default-flip silent backward incompat	`60bc00c`	`emit_fps=False` on both `DatasetMixtureConfig` and `EnvMetadataConfig`; `BaseDataset` `dm-None` fallback matched to `False`
#2 VQA crash	`090b436`	`DatasetMetadata.fps -> None` base property, `LeRobotDatasetMetadata` overrides to `int`, `_emit_optional_keys` pads (`fps=0, fps_is_pad=True`) when meta.fps is None — heterogeneous batches stay schema-aligned
#5 `has_metadata` always-True semantic shift	`1467b57`	Documented in `concepts.rst`
#6 `metadata_max_length=52` headroom	`1467b57`	Documented
#7 Mixed-pad batch coverage	`1467b57`	`test_fps_mixed_pad_across_batch` added in all three fps test classes
Cosmetic: lowercase `fps:`	`fe04ffd`	Capitalized to `FPS:` for sibling consistency

Extra fix 7236b5a (caught by author, not by me): _emit_optional_keys now emits the effective per-sample fps (action_freq if set else meta.fps) rather than always the native rate. Previously, pinning action_freq=30.0 on a 50-Hz mixture would tokenize FPS: 50 while the chunk had been resampled to 30 Hz — a real prefix-vs-chunk-timing misalignment. Nice find.

action_freq=None default stays. With emit_fps=False now opt-in, the prefix doesn't change for resumed checkpoints, and the prediction-horizon shift on >30-Hz datasets (finding #4 from pass 4) becomes a deliberate config decision rather than a silent default flip — acceptable.

CI: pre-commit and check-checklist green; CPU tests still running. Lifting my hold — looks good to merge once CPU tests pass.

Generated by Claude Code

shuheng-liu added the feature New feature or request label May 23, 2026

shuheng-liu self-assigned this May 23, 2026

shuheng-liu commented May 23, 2026

View reviewed changes

style(pi07): capitalize fps header to FPS for sibling consistency

fe04ffd

shuheng-liu commented May 23, 2026

View reviewed changes

shuheng-liu added 2 commits May 22, 2026 19:54

docs(concepts): document fps in standard data format

07ce696

shuheng-liu commented May 23, 2026

View reviewed changes

shuheng-liu added the bug Something isn't working label May 23, 2026

shuheng-liu commented May 23, 2026

View reviewed changes

shuheng-liu marked this pull request as ready for review May 23, 2026 03:45

shuheng-liu merged commit b1e1cde into main May 23, 2026
8 checks passed

shuheng-liu deleted the claude/awesome-liskov-941c29 branch May 23, 2026 03:46

This was referenced May 23, 2026

fix(scripts): support action_freq=None in fit_fast_tokenizer manual sampler #324

Merged

fix(datasets): allow action_freq=None in WeightedDatasetMixture signature #325

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(datasets): mixed-frequency training + fps metadata for pi07#323

feat(datasets): mixed-frequency training + fps metadata for pi07#323
shuheng-liu merged 7 commits into
mainfrom
claude/awesome-liskov-941c29

shuheng-liu commented May 23, 2026 •

edited

Loading

Uh oh!

shuheng-liu left a comment

Uh oh!

shuheng-liu commented May 23, 2026

Uh oh!

shuheng-liu left a comment

Uh oh!

shuheng-liu commented May 23, 2026

Uh oh!

shuheng-liu commented May 23, 2026

Uh oh!

shuheng-liu left a comment

Uh oh!

shuheng-liu commented May 23, 2026

Uh oh!

shuheng-liu left a comment

Uh oh!

shuheng-liu commented May 23, 2026

Uh oh!

shuheng-liu left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shuheng-liu commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this does

Backward-compat notes for reviewers

How it was tested

How to checkout & try? (for the reviewer)

Checklist

Note: Before submitting this PR, please read the contributor guideline.

Uh oh!

shuheng-liu left a comment

Choose a reason for hiding this comment

1. Two defaults silently flip behaviour for existing configs / resumed checkpoints (main concern)

2. fps is emitted on every sample for policies that never read it

3. Minor

Uh oh!

shuheng-liu commented May 23, 2026

#1 — defaults

#2 — fps emitted for non-pi07 policies

#3 — nits

Separate bug surfaced while re-reading — fixed in 7236b5a

Uh oh!

shuheng-liu left a comment

Choose a reason for hiding this comment

Bug: emit_fps=True crashes any mixture containing a VQA dataset

Confirmed (not bugs, just noting)

Uh oh!

shuheng-liu commented May 23, 2026

What broke

The fix — your option 2

On the other findings

Uh oh!

shuheng-liu commented May 23, 2026

Uh oh!

shuheng-liu left a comment

Choose a reason for hiding this comment

1. has_metadata is now always True under emit_fps=True — silently changes ablation semantics

2. metadata_max_length=52 is tighter now — worth a sanity-check

3. Test gap: mixed-fps_is_pad batches

Uh oh!

shuheng-liu commented May 23, 2026

#1 — metadata_drop_all_prob=1.0 semantics under emit_fps=True

#2 — metadata_max_length=52 headroom

#3 — Mixed-fps_is_pad batch coverage

Uh oh!

shuheng-liu left a comment

Choose a reason for hiding this comment

Material finding: switching to native fps silently changes prediction-horizon semantics for non-30-Hz datasets

Verified: resumed pre-PR checkpoints silently flip emit_fps on

No new findings on

Updated summary across all four passes

Uh oh!

shuheng-liu commented May 23, 2026

Recap of what's now opt-in vs. silently changed

Tests updated

Uh oh!

shuheng-liu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shuheng-liu commented May 23, 2026 •

edited

Loading

2. `fps` is emitted on every sample for policies that never read it

Separate bug surfaced while re-reading — fixed in `7236b5a`

Bug: `emit_fps=True` crashes any mixture containing a VQA dataset

1. `has_metadata` is now always True under `emit_fps=True` — silently changes ablation semantics

2. `metadata_max_length=52` is tighter now — worth a sanity-check

3. Test gap: mixed-`fps_is_pad` batches

#1 — `metadata_drop_all_prob=1.0` semantics under `emit_fps=True`

#2 — `metadata_max_length=52` headroom

#3 — Mixed-`fps_is_pad` batch coverage

Verified: resumed pre-PR checkpoints silently flip `emit_fps` on