feat(datasets): mixed-frequency training + fps metadata for pi07#323
Conversation
- DatasetMixtureConfig.action_freq is now Optional[float] and defaults to None, disabling resampling so each dataset is sampled at its native fps. resolve_delta_timestamps substitutes ds_meta.fps when action_freq is None so a chunk of N actions is N consecutive native frames regardless of the dataset's rate. - New DatasetMixtureConfig.emit_fps (default True) makes __getitem__ emit the dataset's native meta.fps as a torch.long sample key (always non-pad, no participation in metadata_drop_*_prob). - All four pi07 / pi07_paligemma prepare_metadata implementations now tokenize fps with the lowercase "fps: N, " header between Robot: and Control:, omitted entirely when fps_is_pad is True. - New EnvMetadataConfig.emit_fps (default True) makes add_eval_metadata broadcast cfg.env.fps as the inference-time fps metadata key, matching the training contract end-to-end.
shuheng-liu
left a comment
There was a problem hiding this comment.
Reviewed the diff end-to-end against 28db260 (true PR base). Implementation looks correct and tests are thorough — the four prepare_metadata updates are consistent (Speed → Quality → Mistake → Robot → fps → Control), _emit_optional_keys correctly bypasses the dropout rolls for fps, and add_eval_metadata mirrors the training contract. A few design-level concerns worth flagging before un-drafting:
1. Two defaults silently flip behaviour for existing configs / resumed checkpoints (main concern)
The PR makes two backward-incompatible defaults at once:
DatasetMixtureConfig.action_freq:30.0→NoneDatasetMixtureConfig.emit_fps/EnvMetadataConfig.emit_fps: new, both defaultTrue
The combined effect on any existing config that doesn't explicitly set these:
- A heterogeneous mixture that was implicitly being resampled to 30 Hz now uses each dataset's native rate, so a single batch can mix samples spanning different real-time horizons (
chunk_sizeconsecutive native frames). For a 10 Hz + 30 Hz mixture, this is a 3× shift in the supervised time window for half the batch — not "no-op when you don't opt in". - The PaliGemma / Gemma 3 prefix gains a new
"fps: N, "segment the model has never seen. The PR body acknowledges this but treats it as a footnote: resuming any pre-PR checkpoint requires remembering to set bothemit_fpstoggles toFalse.
Suggest making at least one of these opt-in:
- Cheapest fix: default
emit_fps=Falseon bothDatasetMixtureConfigandEnvMetadataConfig. New runs that want fps tokenization flip the flag explicitly; old configs are byte-identical. - Or: keep
action_freq=30.0default and makeNonean explicit "I know I'm doing mixed-frequency" opt-in.
Doing both — flipping the resample default and injecting a new prefix segment — without an opt-in step is a footgun for anyone with in-flight runs.
2. fps is emitted on every sample for policies that never read it
emit_fps lives on the mixture, so a pi0 / pi05 / pi05_mem / pi06 run gets fps + fps_is_pad in every sample dict even though only pi07 / pi07_paligemma tokenize them. The cost is negligible (one long + one bool), and the policy-side batch.get(..., default_pad) makes it safe — just noting the asymmetry. Tying emission to the policy type would be over-engineered; the current "emit always, ignore if unused" is fine if the default is False per concern #1.
3. Minor
int(self.meta.fps)in_emit_optional_keys:meta.fpsis typedint(lerobot_dataset.py:487), so the cast is a no-op. No real bug, but if a future dataset ever stores a non-integer rate ininfo.json(29.97, 59.94) the tokenized prefix andresolve_delta_timestampswould diverge by a few percent. Probably fine to leave; the type contract is integer.- Lowercase
"fps:"vs capitalized siblings (Speed:,Robot:) — deliberate per the PR body, and "fps" is conventionally lowercase, so reads fine. Just confirming the choice is intentional. f"fps: {str(fps.item())}, "— redundantstr()inside f-string, but matches the existing speed/quality/mistake style. Don't change for consistency.
Concern #1 is the only one I'd block on. The rest are nits / FYI.
Generated by Claude Code
`_emit_optional_keys` previously emitted `meta.fps` regardless of the mixture's `action_freq`. When the user pinned `action_freq=30.0` on a mixture containing a 50 Hz dataset, `resolve_delta_timestamps` resampled the chunk to 30 Hz but the prefix still said `fps: 50, ` — telling the policy a different rate than the chunk actually carried. Capture `_action_freq` on `BaseDataset` and emit `action_freq if set else meta.fps` so the tokenized fps always matches the chunk's effective rate.
|
Thanks for the careful review. Responses inline. #1 — defaultsBoth defaults ( That said, the resume-from-checkpoint footgun is real. I see two viable knobs and want your call before flipping:
The combo "A and B both flipped" makes the whole feature opt-in (safest, but two extra config lines per user). Happy to push whichever you pick; my mild preference is A if I have to choose unilaterally — it preserves the planning-conversation intent that mixed-freq is the new default while still letting old checkpoints resume. #2 — fps emitted for non-pi07 policiesConfirmed. Cost is one #3 — nits
Separate bug surfaced while re-reading — fixed in 7236b5a
|
shuheng-liu
left a comment
There was a problem hiding this comment.
Second-pass review found one concrete bug, plus some details to confirm. Posting separately from the first review for clarity.
Bug: emit_fps=True crashes any mixture containing a VQA dataset
VQADataset.__getitem__ calls self._to_standard_data_format(item) (vqa/base.py:142), which calls _emit_optional_keys. The new fps block runs unconditionally:
if self.emit_fps:
standard_item["fps"] = torch.tensor(int(self.meta.fps), dtype=torch.long)But:
VQADatasetMetadata(DatasetMetadata)(lerobot_dataset.py:305-308) ispass.DatasetMetadata(lerobot_dataset.py:253-302) definesfeatures/image_keys/video_keys/camera_keys/names/shapes— nofpsproperty.VQADataset.create_metabuilds aninfodict with no"fps"key (vqa/base.py:82-91).
So self.meta.fps is a hard AttributeError for VQA samples. Combined with emit_fps=True being the new default and applied through the mixture's dm.emit_fps, any heterogeneous mixture that includes a VQA dataset crashes at the first sample fetch with the default config.
This isn't covered by the new tests: _DummyBaseDataset in tests/datasets/test_optional_keys.py:84-92 stubs meta = SimpleNamespace(fps=meta_fps, info=...), so it never exercises the real VQADatasetMetadata path.
Heterogeneous co-training (VLA + VQA) is one of the headline OpenTau features called out in CLAUDE.md ("heterogeneous-dataset co-training") — this would break the default flow for any team using that recipe.
Fix options:
-
Cheapest: gate the fps emission on the metadata actually having an
fpsattribute:if self.emit_fps and hasattr(self.meta, "fps"): ...
Downside: silent — a VQA-in-mixture run produces samples without
fps, and after collate that breaks default_collate when mixed with LeRobot samples in the same batch. -
Better: VQA explicitly emits a pad row (
fps=torch.tensor(0, dtype=torch.long),fps_is_pad=torch.tensor(True)) so the batch always has consistent keys and the policy's*_is_padpath drops the segment. Either by overriding_emit_optional_keysin VQA, or by definingVQADatasetMetadata.fps = Noneand checking-None in the base emitter. -
Best — but ties into concern #1 from the first review: default
emit_fps=False, then the VQA path is a no-op and existing configs don't change. The bug becomes opt-in.
Confirmed (not bugs, just noting)
_to_standard_data_format→_emit_optional_keys→ bf16 cast order is fine: fps istorch.long, fps_is_pad istorch.bool, neither isis_floating_point, so the cast loop skips them.fps_is_pad = torch.tensor(False)is a 0-d scalar matching the other_is_padflags emitted in the same method (speed_is_pad,quality_is_pad,mistake_is_pad).default_collatewill stack them to(B,)consistently — no shape mismatch with the eval-side(B,)emission inadd_eval_metadata.resolve_delta_timestampswithaction_freq=ds_meta.fpsdoes produce per-dataset native-frame timestamps as advertised — confirmed againsttests/datasets/test_delta_timestamps.py::test_resolve_delta_timestamps_native_fps_differs_per_dataset. The semantics change (frame-stride vs real-time-stride) for subsampledaction_delta_indiceswas already implicit in concern #1.- Test coverage for the four
prepare_metadataupdates is good (Robot < fps < Controlordering, padded-omit, missing-key, partial-segments) — the only gap is the VQA-mixture path above.
Generated by Claude Code
`_emit_optional_keys` accessed `self.meta.fps` unconditionally when `emit_fps=True`, raising AttributeError on VQA datasets — whose `VQADatasetMetadata` (inheriting `pass` from `DatasetMetadata`) has no `fps` property. Any heterogeneous VLA + VQA mixture trained under the default config crashed at the first VQA sample fetch. Surface `fps -> None` on the `DatasetMetadata` base (overridden to int on `LeRobotDatasetMetadata`), then have `_emit_optional_keys` pad out the row (`fps=0, fps_is_pad=True`) when meta.fps is None. The policy's `prepare_metadata` already drops the `FPS:` segment for padded samples, so heterogeneous batches stay schema-aligned and the prefix omits the fps token for VQA rows.
|
Good catch — real bug, fixed in 090b436. What broke
The fix — your option 2
On the other findingsYour three "confirmed" items match what I'd verified locally — the bf16-cast loop skips The default question from your first review (concern #1) is still open — I haven't flipped anything yet, waiting on your call between A ( |
|
Decision on the defaults question (review concern #1): keep both as-is.
Rationale: the new defaults reflect the intended new behaviour — mixed-freq training with per-sample fps conditioning works end-to-end with zero config ceremony. The resume-from-checkpoint footgun for pre-PR runs is documented in the PR body's "Backward-compat note for reviewers"; users in that case set both No code change needed for this. Closing the loop so reviewers know the trade-off was deliberate. |
shuheng-liu
left a comment
There was a problem hiding this comment.
Third pass — three smaller observations on top of the two reviews above. None are blockers if concern #1 (default emit_fps=False) is adopted; they apply mostly to the emit_fps=True future state.
1. has_metadata is now always True under emit_fps=True — silently changes ablation semantics
The pi07 low-level uses _global_or_branch_decisions over has_metadata_local = bool(metadata_masks.any()) (pi07/low_level/modeling_pi07_low_level.py:1387-1390), and the HL planner gates on metadata_masks.any() (pi07/high_level_planner/modeling_pi07_high_level.py:1005). Pre-PR, a run with metadata_drop_all_prob=1.0 (or just no metadata in the dataset) produced batches where every sample's metadata was fully padded, the has_metadata branch evaluated to False, and the metadata block + trailing ";\n " were dropped entirely.
With emit_fps=True, fps tokens are always non-padded, so metadata_masks.any() is always True. The metadata block (now containing only "fps: N, ") is always included.
Consequences:
metadata_drop_all_prob=1.0no longer means "no metadata at all" — it means "fps-only metadata". Anyone running this as a no-metadata ablation needs to additionally setemit_fps=Falseon the mixture (andEnvMetadataConfig.emit_fps=Falseat eval) to recover the previous semantics.- The CLAUDE.md rule #5 OR-reduction in
_global_or_branch_decisionsstill fires correctly — it just always agreesTrueacross ranks now. Not a correctness regression, just dead branch for the True-always case.
Worth a sentence in the PR body's "Backward-compat note" section.
2. metadata_max_length=52 is tighter now — worth a sanity-check
All four pi07 / pi07_paligemma configs default to metadata_max_length: int = 52. Pre-PR worst-case prefix: "Metadata: Speed: 100, Quality: 5, Mistake: True, Robot: <name>, Control: <mode>, " — roughly 22-26 BPE tokens with Gemma 3 / PaliGemma tokenizers depending on robot name. Post-PR adds " fps: NN, " for another 3-4 tokens. Still fits in 52, but the headroom for long robot_type strings (e.g. "so100_follower_arm") is narrower.
Suggest a one-line note in the config docstring or, better, a runtime warning when the tokenized prefix hits truncation=True with truncation=True, max_length=52 (the tokenizer silently drops tail tokens today — fps would survive since it slots before Control:, but Control: would be the first thing truncated).
3. Test gap: mixed-fps_is_pad batches
The new tests cover:
- All samples non-pad (
test_fps_present_emits_lowercase_segment) - All samples pad (
test_fps_padded_omits_segment) - Key absent (
test_fps_key_absent_omits_segment)
Not covered: a single batch where sample 0 has fps_is_pad=False and sample 1 has fps_is_pad=True. Today's production paths don't produce this mix (dataloader emits non-pad unconditionally, add_eval_metadata emits non-pad for all envs at once), so the gap is theoretical. But the policy code's per-sample zip(...) iteration is the only thing keeping it correct — a regression that broadcasts a batch-wide fps_is_pad would slip through current tests. One mixed-batch assertion would close that hole cheaply.
That's the end of my passes. Summary across all three reviews:
| # | Severity | Finding |
|---|---|---|
| 1 | Blocker on default | emit_fps=True + action_freq=None defaults silently change behaviour for resumed checkpoints and existing configs |
| 2 | Blocker bug | VQA mixtures crash with AttributeError on self.meta.fps |
| 3 | Note in PR body | has_metadata becomes always-True; ablation semantics shift |
| 4 | Sanity check | metadata_max_length=52 headroom narrower with fps |
| 5 | Test gap | No mixed-fps_is_pad batch coverage |
Findings 1 and 2 want addressing before this lands. 3-5 are nice-to-haves.
Generated by Claude Code
… shift - Add `test_fps_mixed_pad_across_batch` in all three fps test classes — closes the per-sample fps_is_pad coverage gap, now reached in production by heterogeneous LeRobot + VQA mixtures where the VQA pad row (fps=0, fps_is_pad=True) sits next to a LeRobot row in the same batch. - concepts.rst: document the metadata_drop_all_prob=1.0 semantic shift under emit_fps=True (no longer "no metadata at all" — it's "fps-only metadata") and note metadata_max_length=52 headroom.
|
Addressed in 1467b57: #1 —
|
shuheng-liu
left a comment
There was a problem hiding this comment.
Fourth pass — one substantive new finding and a verification of the resume-checkpoint scenario.
Material finding: switching to native fps silently changes prediction-horizon semantics for non-30-Hz datasets
resolve_delta_timestamps with the new action_freq=None substitutes ds_meta.fps, then video sampling does nearest-neighbor at frame_indices_soft.round() (lerobot_dataset.py:1906). Comparing the three regimes against the old action_freq=30.0 default for chunk_size=50:
| Native fps | Old default (action_freq=30) |
New default (action_freq=None) |
Effect |
|---|---|---|---|
| 20 Hz | delta_ts = [i/30] → round(i*20/30) = [0,1,1,2,3,3,4,5,5,…] (duplicates) |
delta_ts = [i/20] → [0,1,2,3,…] consecutive |
Bug fix — old default produced silently duplicated action targets |
| 30 Hz | [0,1,2,…] |
[0,1,2,…] |
identical |
| 60 Hz | delta_ts = [i/30] → round(i*60/30) = [0,2,4,6,…] (subsampled) → 1.67 s horizon |
delta_ts = [i/60] → [0,1,2,3,…] consecutive → 0.83 s horizon |
Prediction horizon shrinks 2× |
Two consequences the PR body doesn't acknowledge:
- The PR is also a bug fix for sub-30-Hz datasets. Anyone training on 10/15/20 Hz data with the old default was getting silently duplicated/skipped action targets. The "mixed-frequency" framing in the description undersells this — even single-dataset 20-Hz runs benefit. Worth a sentence in the PR body.
- For >30-Hz datasets, the prediction horizon collapses. A 60-Hz dataset run that used to predict 1.67 s of future motion now predicts 0.83 s. That's a different task — the model's action distribution, evaluation rollout length, and downstream planner timing all shift. This is a real semantic break for anyone using high-rate datasets (industrial arms, force sensors, anything ≥50 Hz), distinct from concern #1.
Together with concerns #1 and #2 (defaults + VQA crash), this strengthens the case for an opt-in default. A user who explicitly wrote action_freq=30.0 in their config gets that value preserved on resume — but a user who never set it (relied on the implicit 30 Hz default) now gets a horizon shift they didn't ask for.
Verified: resumed pre-PR checkpoints silently flip emit_fps on
Checked cfg.save_pretrained → _save_pretrained (configs/train.py:310-319) — it uses draccus.dump(self, f), which serialises only fields present on the dataclass at save time. So a checkpoint saved before this PR has no emit_fps key in its JSON. When that JSON is reloaded after the PR, draccus instantiates DatasetMixtureConfig with the new default emit_fps=True, and EnvMetadataConfig.emit_fps=True similarly. The resumed run starts injecting "fps: N, " into the prefix on step 1 even though the previous training loop never tokenized fps.
This is the exact failure mode CLAUDE.md flags as "ML bugs hide in stochasticity" — a loss curve that looks plausible because the model adapts, masking that the policy is being silently retrained on a slightly different input distribution. The fix is mechanical: default emit_fps=False, document opt-in.
No new findings on
- Key collisions:
standard_item["fps"]doesn't conflict with anything else in_to_standard_data_format(built fresh per sample atlerobot_dataset.py:832). - bf16 cast: confirmed once more that the cast loop at
lerobot_dataset.py:861-864only touchesis_floating_point, so fps (long) and fps_is_pad (bool) pass through unchanged.
Updated summary across all four passes
| # | Severity | Finding |
|---|---|---|
| 1 | Blocker (default) | emit_fps=True + action_freq=None flip behaviour silently for any existing config or resumed checkpoint |
| 2 | Blocker (bug) | VQA mixtures crash with AttributeError on self.meta.fps |
| 3 | Note in PR body | The PR is also a bug fix for sub-30-Hz datasets — currently un-credited |
| 4 | Note in PR body | Prediction-horizon shrinks for >30-Hz datasets — distinct from #1, not in the description |
| 5 | Nice-to-have | has_metadata always True → ablation semantics shift |
| 6 | Nice-to-have | metadata_max_length=52 headroom narrower |
| 7 | Test gap | No mixed-fps_is_pad batch coverage |
That's the natural stopping point. Findings #1 and #2 still need addressing; #3 and #4 would tighten the PR description; #5-7 are polish.
Generated by Claude Code
`DatasetMixtureConfig.emit_fps` and `EnvMetadataConfig.emit_fps` now default to `False` so pre-PR checkpoints resume cleanly — the policy's metadata prefix doesn't gain an unfamiliar `FPS:` segment until the user explicitly opts in. `action_freq=None` (mixed-frequency training) stays the default; opt-in to `emit_fps=True` to surface per-sample fps conditioning to the policy. BaseDataset's dm-None fallback matched to False for the "no mixture config" path (VQA-only / unit tests). Docs + tests updated to reflect the new default — existing fps tests now set `emit_fps=True` explicitly when they exercise the emission path.
|
Flipped both Recap of what's now opt-in vs. silently changed
PR body's "Backward-compat notes" section now spells out the sub-30Hz fix and the >30Hz horizon-shrink as distinct items (your findings #3 and #4). Tests updated
764 CPU tests pass; pre-commit clean. Picking up resumed pre-PR checkpoints is now a true no-op for the prefix. That should close the third-pass items (#3 ablation note still in concepts.rst — still applies under explicit |
shuheng-liu
left a comment
There was a problem hiding this comment.
Re-reviewed after the new commits. Both blockers from earlier passes are properly addressed, plus one extra correctness fix I missed.
| Finding | Commit | Resolution |
|---|---|---|
| #1 Default-flip silent backward incompat | 60bc00c |
emit_fps=False on both DatasetMixtureConfig and EnvMetadataConfig; BaseDataset dm-None fallback matched to False |
| #2 VQA crash | 090b436 |
DatasetMetadata.fps -> None base property, LeRobotDatasetMetadata overrides to int, _emit_optional_keys pads (fps=0, fps_is_pad=True) when meta.fps is None — heterogeneous batches stay schema-aligned |
#5 has_metadata always-True semantic shift |
1467b57 |
Documented in concepts.rst |
#6 metadata_max_length=52 headroom |
1467b57 |
Documented |
| #7 Mixed-pad batch coverage | 1467b57 |
test_fps_mixed_pad_across_batch added in all three fps test classes |
Cosmetic: lowercase fps: |
fe04ffd |
Capitalized to FPS: for sibling consistency |
Extra fix 7236b5a (caught by author, not by me): _emit_optional_keys now emits the effective per-sample fps (action_freq if set else meta.fps) rather than always the native rate. Previously, pinning action_freq=30.0 on a 50-Hz mixture would tokenize FPS: 50 while the chunk had been resampled to 30 Hz — a real prefix-vs-chunk-timing misalignment. Nice find.
action_freq=None default stays. With emit_fps=False now opt-in, the prefix doesn't change for resumed checkpoints, and the prediction-horizon shift on >30-Hz datasets (finding #4 from pass 4) becomes a deliberate config decision rather than a silent default flip — acceptable.
CI: pre-commit and check-checklist green; CPU tests still running. Lifting my hold — looks good to merge once CPU tests pass.
Generated by Claude Code
What this does
Adds opt-in mixed-frequency training for the dataset mixture and surfaces the dataset's native frame rate as an
fpsmetadata field that pi07 / pi07_paligemma tokenize into the prefix prompt. The four pieces:DatasetMixtureConfig.action_freq: float | None = None(wasfloat = 30.0). WhenNone(new default), each dataset is sampled at its native fps with no resampling —resolve_delta_timestampssubstitutesds_meta.fpsso a chunk ofNactions becomesNconsecutive native frames regardless of the dataset's rate. This is also a bug fix for sub-30-Hz single-dataset runs: the old 30 Hz default produced silently duplicated/skipped action targets for 10 / 15 / 20 Hz datasets via nearest-neighbor resampling. Setting a positive float preserves the legacy "resample everything to this rate" behaviour.DatasetMixtureConfig.emit_fps: bool = False(opt-in). WhenTrue,__getitem__returns the effective per-sample frame rate under thefpskey (torch.longscalar) withfps_is_pad=False— equal toaction_freqwhen set, elsemeta.fps. VQA samples (no temporal axis) emitfps=0, fps_is_pad=Trueso heterogeneous VLA + VQA batches stay schema-aligned.fpsis not rolled bymetadata_drop_*_prob— it's an intrinsic property of the chunk, not a noisy label.pi07 / pi07_paligemma
prepare_metadata× 4 files. All four implementations now tokenize fps with the"FPS: N, "header (uppercase to match the siblingSpeed:/Quality:/Mistake:/Robot:/Control:labels), slotted betweenRobot:andControl:(final order:Speed → Quality → Mistake → Robot → FPS → Control). Missing or padded fps drops the segment entirely (no header, no comma).EnvMetadataConfig.emit_fps: bool = False(opt-in, parallels the training-side toggle). WhenTrue,add_eval_metadatabroadcastscfg.env.fpsas thefpsmetadata key, matching the training-side contract.Backward-compat notes for reviewers
action_freqdefault changed from30.0toNone. Sub-30-Hz single-dataset runs (10 / 15 / 20 Hz) are silently improved (no more duplicate action targets via nearest-neighbor). Single-dataset runs at exactly 30 Hz are byte-identical. Above-30-Hz runs (50 / 60 Hz) see their prediction horizon shrink because the chunk now covers fewer real seconds at the higher native rate — pinaction_freq=30.0explicitly to preserve the old horizon.emit_fpsdefaults toFalseon bothDatasetMixtureConfigandEnvMetadataConfig. Pre-PR checkpoints resume cleanly because noFPS:segment is added to the policy prefix. New training runs that want fps conditioning setemit_fps=Trueon both sides (training + eval) explicitly.How it was tested
tests/configs/test_default.py—action_freq=Noneaccepted, default isNone,emit_fpsdefaultsFalse.tests/datasets/test_delta_timestamps.py—resolve_delta_timestampssubstitutes native fps whenaction_freq=None; two datasets with different fps produce different per-action timestamps.tests/datasets/test_optional_keys.py—_emit_optional_keysis a no-op under defaultemit_fps=False; when explicitly enabled, emits the effective fps (action_freqif set, elsemeta.fps) astorch.longwithfps_is_pad=False; fps stays non-pad even withmetadata_drop_*_prob=1.0; VQA samples emitfps=0, fps_is_pad=Trueso heterogeneous default-collate produces aligned(B,)tensors.tests/policies/test_pi07_cpu.py,tests/policies/test_pi07_paligemma_{high_level_planner,low_level}.py—prepare_metadatacontent + ordering:"FPS: N, "appears in the prefix betweenRobot:andControl:when present, segment omitted entirely when padded or absent, no stray", ,"artefacts. Mixed-fps_is_padbatches (LeRobot + VQA) honour per-sample padding row-by-row.tests/envs/test_configs.py,tests/envs/test_utils.py—EnvMetadataConfig.emit_fpsdefaultsFalse;add_eval_metadatabroadcastscfg.env.fpsastorch.longonly whenemit_fps=Trueis opted in.pre-commit run --all-files— clean.pytest -m "not gpu" -n auto— full suite passes (modulo pre-existing LIBERO-assets / timing-flaky failures unrelated to this branch).pytest -m "gpu" -n 0(RTX 5090) — 19 passed, 10 skipped, 0 failed (~2:46). All skips are pre-existing unconditional@pytest.mark.skipmarkers (memory or environment), unaffected by this branch.Nightly regression tests still need to land in the scheduled job.
How to checkout & try? (for the reviewer)
Run the focused test slice:
pytest -sx -m "not gpu" \ tests/configs/test_default.py \ tests/datasets/test_delta_timestamps.py \ tests/datasets/test_optional_keys.py \ tests/envs/test_configs.py tests/envs/test_utils.py \ tests/policies/test_pi07_cpu.py \ tests/policies/test_pi07_paligemma_high_level_planner.py \ tests/policies/test_pi07_paligemma_low_level.pyEyeball the new prefix string in a Python REPL (no model weights needed — stubbed tokenizer):
Checklist
Note: Before submitting this PR, please read the contributor guideline.