fix(pi07): gate optional prefix tokens in low- and high-level planners by shuheng-liu · Pull Request #229 · TensorAuto/OpenTau

shuheng-liu · 2026-05-02T05:49:54Z

What this does

Backports the #226-equivalent prefix-gating fixes from pi07_paligemma (commit b7d3a82 on claude/fix-issue-226-gt8z3, merging into fix/pi07_paligemma_fixes then into main) to pi07's low- and high-level planners.

Low-level planner (pi07/low_level_planner/modeling_pi07_low_level.py::embed_prefix) had three remaining divergences when all optional middle blocks (response / subgoal / metadata) are dropped:

State-end was unconditionally ", " (2 tokens) instead of ":\n".
";\n " prefix-end was unconditionally appended even with no optional content to terminate, dangling 2 spurious tokens after the state-end.
"Action: " indicator att_masks were [1] + [0]*(N-1) (single bidirectional block); pi05 uses [1]*N (one causal block per token), which shifts the cumsum at the indicator → first-discrete boundary and changes the discrete-action CE loss.

The fix computes has_any_optional once at the top of embed_prefix and gates the state-end string + the prefix-end block on it. Indicator att_masks switched to [1]*N to match pi05. (The earlier 2949e73 commit only ported the response/subgoal/metadata content gating, not the surrounding separators or indicator att_masks.)

High-level planner (pi07/high_level_planner/modeling_pi07_high_level.py::embed_prefix) had one of the three issues — ";\n " was emitted unconditionally, but it is the metadata → "Updated Memory:" separator. With no metadata there's nothing to terminate. Wrapped under if metadata_tokens is not None:. The "Updated Memory: " AR anchor itself stays unconditional because inference relies on it (memory_tokens is None at inference by design).

A byte-identity test against pi06 (analogous to the pi07_paligemma vs pi05 test in #226) is deliberately deferred to a follow-up issue (#230): pi07_low_level uses continuous state with separate State:/Action: indicators while pi06 inlines discretized state in the language prompt, and pi07_high_level uses a different prompt template ("Task: {task}, Past Memory: {past_mem}, State: {state}, " vs pi06's "Task: {task}<eos>State: {state}<eos>Actions:"), so byte-equivalence isn't feasible without aligning pi06's prompts (or building pi06_mem).

Drive-by fixes pulled from `main` to unblock CI

feat(ci): pin Claude workflows to Opus 4.7 1M, allow test-skip on doc fixes (#208) — workflow file must match main to satisfy GitHub workflow validation; was failing the Claude PR Review check.
fix(ci): gate extract-claude-lessons on head branch, not PR author (#212) — same reason.
ci: skip pi07_paligemma low-level test (broken at import) to unblock CPU CI (#209) — feat/pi07 was missing the --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py flag in cpu_test.yml; the test file fails at import (VJEPA2VideoEncoder removed in pi05_mem) and that crashed CPU CI.
ci: ignore broken pi07 test in GPU CI; dedupe TODO in cpu_test (#217) — same flag missing in gpu_test.yml + dedup.
fix(test): override actions stats shape in pi06 GPU smoke (PR #214) — test_complete_pi06_pipeline_integration_smoke was failing on real GPU because the fixture stats shape (50, 32) didn't match chunk_size=10. Pre-existing latent bug; surfaced when running the GPU suite end-to-end on mlbox.

Test fix surfaced by the GPU run

_verify_position_ids and _verify_action_expert_attention_mask in test_pi07_low_level_planner.py only excluded DISCRETE_ACTION_MAX_LENGTH from the cross-attention prefix slice. The model (post-2949e73) already excludes the "Action: " indicator length (3 tokens via Gemma 3 tokenizer) too, matching pi05's discrete_action_indicator_max_length logic — so the action expert sees the same prefix length at train and inference. The 3-token gap fails the suffix_position_ids assertion as soon as the GPU integration test runs. Latent break since 2949e73; never caught because the test is gated by @pytest.mark.gpu.

How it was tested

pre-commit run --all-files — all hooks pass.
CPU subset (pytest -m "not gpu" -n auto tests/policies/): 125 passed, 2 skipped, 1 pre-existing collection error in test_pi07_paligemma_low_level_planner.py (now excluded from CI by the --ignore flag pulled from ci: skip pi07_paligemma low-level test (broken at import) to unblock CPU CI #209/ci: ignore broken pi07 test in GPU CI; dedupe TODO in cpu_test #217).
Full GPU suite on mlbox (RTX 5090, 32 GB): pytest -m gpu -n 0 -v --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py tests/ → 20 passed, 2 skipped, 786 deselected, 0 failures, no OOM, 6m10s. Includes tests/policies/test_pi07_low_level_planner.py::test_complete_pi07_low_level_pipeline, test_pi07_high_level_planner.py::test_complete_pi07_high_level_planner_pipeline, the regression class, and pi05 / pi05_mem / pi06 / pi0 / value smoke tests.
Specifically tests/policies/test_pi07_cpu.py::TestEmbedPrefixConditionalGuards exercises the gating: test_all_optional_blocks_absent_skips_emission was updated to match the new layout (11 tokens, ":\n" state-end, no trailing prefix-end); test_mixed_subgoal_availability_zeros_pad_only_samples and test_response_mask_any_true_emits_block still pass because they populate at least one optional, so has_any_optional == True and the layout matches the old expectations.
Nightly regression tests (regression_test.yml) are deferred to the existing schedule.

How to checkout & try? (for the reviewer)

git fetch origin
git checkout claude/fix-pi07-prefix-gating
uv sync --extra dev
pytest -m "not gpu" -n auto tests/policies/test_pi07_cpu.py tests/policies/test_pi07_low_level_planner.py tests/policies/test_pi07_high_level_planner.py

GPU subset:

pytest -m gpu -n 0 -v --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py tests/

To re-verify the diff against the canonical pi07_paligemma fix:

git diff origin/feat/pi07..claude/fix-pi07-prefix-gating -- src/opentau/policies/pi07/low_level_planner/modeling_pi07_low_level.py
git show b7d3a82 -- src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

GPU pytests run on mlbox (RTX 5090) — see "How it was tested" above. Nightly regression tests are deferred to the existing CI schedule. A follow-up issue (#230) is filed for pi06 prompt alignment + future byte-identity tests against pi07 planners.

Note: Before submitting this PR, please read the contributor guideline.

Backports the #226-equivalent prefix-gating fixes from pi07_paligemma (commit b7d3a82) to pi07's low- and high-level planners, plus updates the affected CPU regression test to match the new layout. Low-level (modeling_pi07_low_level.py::embed_prefix) had three remaining divergences in the all-optionals-dropped case: - state-end was unconditionally ", " (2 tokens) instead of pi05's ":\n". - ";\n " prefix-end was unconditionally appended even with no optional content to terminate, dangling 2 spurious tokens after the state-end. - "Action: " indicator att_masks were [1] + [0]*(N-1), making the indicator a single bidirectional block; pi05 uses [1]*N (one causal block per token), which shifts the cumsum at the indicator -> first- discrete boundary and changes the discrete-action CE loss. Now compute has_any_optional once at the top of embed_prefix and gate the state-end string + the prefix-end block on it. Indicator att_masks switched to [1]*N to match pi05. High-level (modeling_pi07_high_level.py::embed_prefix) had one of the three: ";\n " was emitted unconditionally as if a free-floating block, but it is the metadata -> "Updated Memory:" separator. With no metadata there is nothing to terminate. Wrapped under `if metadata_tokens is not None:`. The "Updated Memory: " AR anchor itself stays unconditional — inference relies on it (memory_tokens is None at inference by design). The existing CPU regression test `test_all_optional_blocks_absent_skips _emission` had the OLD (buggy) layout baked into its assertions; updated to match the new gated layout (11 tokens total, ":\n" state-end, no trailing prefix-end). A byte-identity test against pi06 (analogous to pi07_paligemma vs pi05 in #226) is deliberately deferred to a follow-up issue: pi07_low_level uses continuous state with separate State:/Action: indicators while pi06 inlines discretized state in the language prompt, and pi07_high_level uses a different prompt template, so byte-equivalence isn't feasible without aligning pi06's prompts (or building pi06_mem).

… fixes (#208) (cherry picked from commit b9b3841)

) (cherry picked from commit 35629df)

claude

Backport of #226's pi07_paligemma fix to pi07's planners. Logic mirrors the canonical fix and the corresponding pi05 reference. A few suggestions below — none blocking.

claude · 2026-05-02T06:03:24Z

[claude-review] summary for commit bb94735

No blocking issues found.

Re-reviewed after two test-only follow-up commits since the prior clean review at 1b0e3a9 — no changes to fix logic in modeling_pi07_low_level.py / modeling_pi07_high_level.py.

bb94735 updates _verify_position_ids and _verify_action_expert_attention_mask in tests/policies/test_pi07_low_level_planner.py to subtract action_lead_len + DISCRETE_ACTION_MAX_LENGTH from the cross-attention prefix slice. Mirrors the model's training-path logic at modeling_pi07_low_level.py:1576 and :1616 (matches pi05's discrete_action_indicator_max_length convention). action_lead_len is pulled from the existing _indicator_lens helper, so the test tracks future tokenizer changes automatically. Inference branch (inference_mode=True) correctly leaves the entire prefix in cross-attention scope, matching the model at :1815-1816. Tokenizer is plumbed through both call sites in forward_loss/generate_one_step paths.
080c190 cherry-picks a pre-existing pi06 GPU-smoke fix (test_complete_pi06_pipeline_integration_smoke): shared lerobot_dataset_metadata provides actions stats shaped (50, 32) for the default PI06Config(chunk_size=50), but this test uses chunk_size=10. The fix deep-copies the fixture stats and reshapes actions.{max,mean,min,std} to (chunk_size, 32) using a constant scalar from the original stats — same numeric values, correct shape. Test is @pytest.mark.gpu-gated, doesn't run in CPU CI.

All earlier findings remain resolved (per-token causal [1]*N indicator att_mask, has_subgoal style aligned to bool(m.any()), HL/LL gating-asymmetry note acknowledged as pre-existing and internally consistent with the existing metadata-block emission gate at :977). GPU + nightly regression tests are deferred to scheduled CI as documented in the PR body. Prior inline findings are all on commit 844c39e / dda4446 and have been addressed by the author's review-fix commits; GitHub's API does not allow dismissing "COMMENTED" reviews so they remain visible but stale.

shuheng-liu · 2026-05-02T06:05:29Z

@claude fix per review, except for the .github/workflows/claude-implement-fixes.yml comment

@claude

- addresses @claude (LL embed_prefix discrete_actions test): added test_discrete_actions_indicator_uses_per_token_causal_blocks pinning the [1]*N att_masks pattern on the "Action: " indicator + discrete action span, so a regression to [1]+[0]*(N-1) fails immediately. - addresses @claude (has_subgoal style nit): wrapped m.any() in bool(...) inside the any(...) generator so has_subgoal materializes Python bools matching has_response/has_metadata. tests: passed - pytest -m "not gpu" -n auto tests/policies/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…CPU CI (#209) (cherry picked from commit a9dbc4a)

(cherry picked from commit 16c46e8)

`test_complete_pi06_pipeline_integration_smoke` was added in #178 with chunk_size=10 but the shared `lerobot_dataset_metadata` fixture provides actions stats shaped (50, 32) — matching the default PI06Config. The Normalize buffer was therefore (50, 32) while the test batch's actions were (B, 10, 32), and MIN_MAX normalization in normalize.py:232 raised ``RuntimeError: The size of tensor a (10) must match the size of tensor b (50) at non-singleton dimension 1``. Pre-existing bug — never caught in CI because the test is gated by @pytest.mark.gpu and skipped in CPU runs. Surfaced now while validating this PR's SDPA + grad-ckpt port on a real GPU. Fix by deep-copying the fixture stats and reshaping the actions max/mean/min/std arrays from (50, 32) to (chunk_size, 32) before constructing the policy. Same numeric values, just the right shape. (cherry picked from commit 6425cb4)

`_verify_position_ids` and `_verify_action_expert_attention_mask` in test_pi07_low_level_planner.py only excluded DISCRETE_ACTION_MAX_LENGTH from the cross-attention prefix slice. The model's forward path (modeling_pi07_low_level.py:1575-1620) excludes both the "Action: " indicator len (3 tokens via Gemma 3 tokenizer) AND discrete_action_max_length, matching pi05's discrete_action_indicator_max_length logic — so the action expert sees the same prefix length at train and inference. The test was a latent break since 2949e73 ("Applying pi07_paligemma fixes to pi07") added the indicator subtraction to the model but never updated the test. Surfaces as soon as the GPU test test_complete_pi07_low_level_pipeline runs and asserts suffix_position_ids — actual was prefix_offset+(token), expected was prefix_offset+indicator_len+(token), 3-token gap on Gemma 3. Pull the indicator length from the existing _indicator_lens helper so the fix tracks future tokenizer changes automatically. Plumb the tokenizer through the two _verify_action_expert_attention_mask call sites.

Brings 16 main commits onto feat/pi07 in preparation for opening the feat/pi07 → main PR. One conflict (tests/policies/test_pi06.py: np.tile vs np.full for chunk-size action stats) — kept HEAD's np.tile fix from claude review feedback on #197, which preserves per-feature variance instead of collapsing every step to stats.flatten()[0]. Also dropped stale --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py from .github/workflows/cpu_test.yml and gpu_test.yml: main deleted the file in #234, but the --ignore line on feat/pi07 (added in #229) was on a different part of the file so git auto-merged without flagging it. Removing both now keeps the workflows clean.

shuheng-liu added bug Something isn't working model new model or model request labels May 2, 2026

shuheng-liu self-assigned this May 2, 2026

shuheng-liu mentioned this pull request May 2, 2026

pi06: align prompt template + add future byte-identity tests against pi07 planners #230

Open

shuheng-liu added 2 commits May 1, 2026 22:56

feat(ci): pin Claude workflows to Opus 4.7 1M, allow test-skip on doc…

ef68a93

… fixes (#208) (cherry picked from commit b9b3841)

fix(ci): gate extract-claude-lessons on head branch, not PR author (#212

844c39e

) (cherry picked from commit 35629df)

claude Bot reviewed May 2, 2026

View reviewed changes

github-actions Bot and others added 5 commits May 2, 2026 06:13

ci: skip pi07_paligemma low-level test (broken at import) to unblock …

23c0389

…CPU CI (#209) (cherry picked from commit a9dbc4a)

ci: ignore broken pi07 test in GPU CI; dedupe TODO in cpu_test (#217)

1b0e3a9

(cherry picked from commit 16c46e8)

shuheng-liu marked this pull request as ready for review May 2, 2026 06:37

shuheng-liu merged commit 62a5677 into feat/pi07 May 2, 2026
6 of 7 checks passed

shuheng-liu deleted the claude/fix-pi07-prefix-gating branch May 2, 2026 06:39

claude Bot mentioned this pull request May 2, 2026

chore(claude): learn from #229 #231

Closed

3 tasks

This was referenced May 2, 2026

ci: cpu_test.yml races itself on PRs (push + pull_request triggers, no concurrency group) #234

Closed

test(pi07): drop VJEPA2 test + fix cpu_test.yml self-race (#234) #232

Merged

shuheng-liu mentioned this pull request May 4, 2026

feat(pi07): π0.7 policy with Gemma 3 backbone and SpaceTime SigLIP video encoder #244

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(pi07): gate optional prefix tokens in low- and high-level planners#229

fix(pi07): gate optional prefix tokens in low- and high-level planners#229
shuheng-liu merged 8 commits into
feat/pi07from
claude/fix-pi07-prefix-gating

shuheng-liu commented May 2, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented May 2, 2026 •

edited

Loading

Uh oh!

shuheng-liu commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shuheng-liu commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this does

Drive-by fixes pulled from main to unblock CI

Test fix surfaced by the GPU run

How it was tested

How to checkout & try? (for the reviewer)

Checklist

Note: Before submitting this PR, please read the contributor guideline.

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shuheng-liu commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shuheng-liu commented May 2, 2026 •

edited

Loading

Drive-by fixes pulled from `main` to unblock CI

claude Bot commented May 2, 2026 •

edited

Loading