
fix(pi07): align text attention with paper §VI.B + Fig. 19#236

Merged
shuheng-liu merged 3 commits into feat/pi07 from claude/stupefied-roentgen-0c1d8f
May 2, 2026

@shuheng-liu

What this does

Fixes the same family of attention-mask bugs in both pi07 planners that PR #235 addressed for pi0.6, plus a block-ordering divergence from Fig. 19 of the pi0.7 paper.

The pi0.7 paper §VI.B says:

"We employ a block-causal masking scheme, such that the observation tokens and the subgoal image tokens use bidirectional attention within themselves, and goal-image tokens can additionally attend the observations. The following text tokens use causal attention (see attention mask visualization in the appendix)."

Fig. 19 (Appendix B) further specifies:

"When image goals are present, we include them as an additional block-causal bidirectional block, after the text prompt."
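To make these two rules concrete, here is a minimal pure-Python sketch of the block-causal convention. It assumes, as in the pi0-family code, that each token carries a 0/1 flag where 1 opens a new attention block and 0 joins the current one, and that query i may attend key j iff the running sum of flags at j is at most the running sum at i. `make_att_2d` is a hypothetical stand-in for the planners' tensor-based `make_att_2d_masks` (padding is ignored), and all span lengths are toy values.

```python
from itertools import accumulate


def make_att_2d(ar_mask: list[int]) -> list[list[bool]]:
    """Return mask[i][j] == True when query token i may attend key token j.

    Assumed convention: flag 1 opens a new attention block, flag 0 joins the
    current block; i sees j iff cumsum(flags)[j] <= cumsum(flags)[i].
    """
    cum = list(accumulate(ar_mask))
    n = len(ar_mask)
    return [[cum[j] <= cum[i] for j in range(n)] for i in range(n)]


# Toy prefix (hypothetical lengths): 3 observation-image tokens,
# 2 text tokens, then 2 goal-image tokens placed after the text (Fig. 19).
obs = [0] * 3          # one bidirectional block
text = [1] * 2         # causal: each token opens its own block
goal = [1] + [0] * 1   # a fresh bidirectional block after the text
mask = make_att_2d(obs + text + goal)

# Rows are queries, columns are keys; "x" marks an allowed attention edge.
for row in mask:
    print("".join("x" if visible else "." for visible in row))
```

In the printed grid the observation rows stay blind to the text, each text row sees only itself and earlier tokens, and the goal-image rows see the observations, the text, and each other.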

The current implementation violates both rules in several places.

Bugs fixed

  1. Language tokens were bidirectional in both embed_prefix methods, but the paper requires causal attention. This is the same bug as PR #235 (fix(pi06): align with π0.6 model card paper). The 256-token prompt was emitted as [0] * num_lang_embs, which continues the bidirectional image block, so every prompt token saw every other prompt token. Fixed to [1] * num_lang_embs.
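A minimal before/after sketch of this fix, assuming the cumsum convention used in the pi0-family masks (flag 1 opens a block, flag 0 joins it; i sees j iff cumsum[j] <= cumsum[i]). `make_att_2d` is a hypothetical pure-Python helper, and the 4-token prompt is a toy stand-in for the real 256-token one.

```python
from itertools import accumulate


def make_att_2d(ar_mask):
    # Hypothetical helper: i may attend j iff cumsum[j] <= cumsum[i]
    # (flag 1 opens a new block, flag 0 joins the current one).
    cum = list(accumulate(ar_mask))
    return [[cj <= ci for cj in cum] for ci in cum]


images = [0] * 2       # bidirectional image block (toy length)
num_lang_embs = 4      # toy stand-in for the 256-token prompt
first_lang = len(images)
last_lang = first_lang + num_lang_embs - 1

before = make_att_2d(images + [0] * num_lang_embs)  # old layout
after = make_att_2d(images + [1] * num_lang_embs)   # fixed layout

print(before[first_lang][last_lang])  # True: first prompt token sees the last
print(after[first_lang][last_lang])   # False: causal, it no longer does
```

Under this convention the old layout also let image tokens attend the prompt, since the prompt shared their block; in the fixed layout they cannot.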

  2. Other text spans had the same class of bug. Across the two planners, 13 additional text spans (metadata, the ";\n " separator, the "Updated Memory: " / "Subtask: " / "State: " / "Subgoal: " / "Action: " indicators, response/Subtask content, commas) were emitted as either fully bidirectional [0] * N or prefix-LM [1] + [0] * (N-1). Most behaviorally significant: the response (Subtask) span in the low-level planner was prefix-LM, silently leaking future-token information into the response loss. All 13 spans now use causal [1] * N.
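The response-loss leak can be checked mechanically: if a response token can attend a later response token, a next-token loss can in principle read the very token it is meant to predict. The sketch below assumes the same cumsum mask convention; `make_att_2d` and `leaks_future` are hypothetical helpers, and the lengths are toy values.

```python
from itertools import accumulate


def make_att_2d(ar_mask):
    # Hypothetical helper: i may attend j iff cumsum[j] <= cumsum[i].
    cum = list(accumulate(ar_mask))
    return [[cj <= ci for cj in cum] for ci in cum]


def leaks_future(mask, start, end):
    """True if some token in [start, end) can attend a later token in that span."""
    return any(mask[i][j] for i in range(start, end) for j in range(i + 1, end))


context = [0] * 3   # toy tokens preceding the response
n = 4               # toy response length
start, end = len(context), len(context) + n

prefix_lm = make_att_2d(context + [1] + [0] * (n - 1))  # old: [1] + [0] * (N-1)
causal = make_att_2d(context + [1] * n)                 # fixed: [1] * N

print(leaks_future(prefix_lm, start, end))  # True
print(leaks_future(causal, start, end))     # False
```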

  3. Inference-time "Subtask: " injection in the high-level planner now mirrors training. The autoregressive injection used i == 0, which opened a new block only on the first token; it now uses True for every token, so inference matches the training pattern.
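A toy comparison of the two injection rules, assuming the inference path builds its mask from the accumulated per-token flags with the same cumsum convention as training. `inject` and `make_att_2d` are hypothetical helpers, not the planner's code, and the lengths are placeholders.

```python
from itertools import accumulate


def make_att_2d(ar_mask):
    # Hypothetical helper: i may attend j iff cumsum[j] <= cumsum[i].
    cum = list(accumulate(ar_mask))
    return [[cj <= ci for cj in cum] for ci in cum]


def inject(ar_mask, num_tokens, opens_block):
    """Append one flag per injected token, as an autoregressive loop would."""
    out = list(ar_mask)
    for i in range(num_tokens):
        out.append(1 if opens_block(i) else 0)
    return out


prompt = [0] * 2 + [1] * 2   # toy prefix: image block + causal text
subtask_len = 3              # toy length of the "Subtask: " indicator

train = prompt + [1] * subtask_len                   # causal, as in training
old = inject(prompt, subtask_len, lambda i: i == 0)  # previous rule
new = inject(prompt, subtask_len, lambda i: True)    # fixed rule

print(make_att_2d(old) == make_att_2d(train))  # False
print(make_att_2d(new) == make_att_2d(train))  # True
```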

  4. Subgoal-images block moved to the tail of the prefix per Fig. 19. It previously sat between the response and the metadata (i.e. inside the text prompt); it now sits right before the optional discrete-action block, after all text. State tokens get their own bidirectional block ([1] + [0]*(T-1)) so they don't bleed into the now-causal "State: " indicator.
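The "bleed" follows from the cumsum rule: state tokens flagged [0]*T join the block opened by the last "State: " token, so that indicator token can attend all of them. The sketch below uses a heavily simplified, hypothetical low-level prefix (one "text" span standing in for all text spans, the indicator assumed to precede the state tokens, and the discrete-action block omitted) to show both the bleed and the subgoal images at the tail.

```python
from itertools import accumulate


def make_att_2d(ar_mask):
    # Hypothetical helper: i may attend j iff cumsum[j] <= cumsum[i].
    cum = list(accumulate(ar_mask))
    return [[cj <= ci for cj in cum] for ci in cum]


def build_prefix(state_flags):
    """Toy low-level prefix in the new order; returns (flags, span ranges)."""
    segments = [
        ("images", [0] * 2),
        ("text", [1] * 2),                  # stand-in for all causal text spans
        ("state_indicator", [1] * 2),       # "State: " (causal)
        ("state", state_flags),
        ("subgoal_images", [1] + [0] * 1),  # tail of the prefix (Fig. 19)
    ]
    flags, spans = [], {}
    for name, seg in segments:
        spans[name] = range(len(flags), len(flags) + len(seg))
        flags += seg
    return flags, spans


T = 3  # toy number of state tokens
for label, state_flags in [("old", [0] * T), ("new", [1] + [0] * (T - 1))]:
    flags, spans = build_prefix(state_flags)
    mask = make_att_2d(flags)
    last_indicator = spans["state_indicator"][-1]
    sees_state = any(mask[last_indicator][j] for j in spans["state"])
    print(label, sees_state)  # old True, new False
```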

Out of scope (intentionally not fixed here)

How it was tested

  • pre-commit run --files <changed> — clean.
  • New tests/policies/test_pi07_paligemma_attention_layout.py (3 CPU tests, all pass):
    • TestPI07HighLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_language_block
    • TestPI07LowLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_language_block
    • TestPI07LowLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_response_block
  • The new test file imports make_att_2d_masks from the high-level module (byte-identical implementation in both planners) so it runs on CPU CI without depending on the pre-existing low-level import bug.
  • Updated _train_prefix_total / _infer_prefix_total / _verify_pad_masks in tests/policies/test_pi07_paligemma_low_level_planner.py for the reordered prefix layout (gpu-marked integration test).
  • Full CPU suite: pre-existing failures only (verified by re-running with these changes stashed; counts identical).
  • GPU pytests + regression tests: deferred to nightly gpu_test.yml / regression_test.yml.
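For readers who want the shape of a layout-locking check without the real planners, here is a hedged sketch in pytest style. The helpers and the test name are hypothetical (they are not the tests added in this PR), and the mask again assumes the cumsum convention: a causal span is one where each token sees exactly itself and earlier tokens of the span.

```python
from itertools import accumulate


def make_att_2d(ar_mask):
    # Hypothetical helper: i may attend j iff cumsum[j] <= cumsum[i].
    cum = list(accumulate(ar_mask))
    return [[cj <= ci for cj in cum] for ci in cum]


def assert_span_causal(mask, start, end):
    """Within [start, end): every token sees itself and earlier tokens only."""
    for i in range(start, end):
        for j in range(start, end):
            assert mask[i][j] == (j <= i), f"token {i} -> {j} violates causality"


def test_toy_language_block_is_causal():
    images, num_lang = [0] * 3, 5  # toy lengths
    mask = make_att_2d(images + [1] * num_lang)
    assert_span_causal(mask, len(images), len(images) + num_lang)
    # Image tokens must not attend the (later) language tokens.
    assert not any(mask[i][j] for i in range(3) for j in range(3, 8))


test_toy_language_block_is_causal()
```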

How to checkout & try? (for the reviewer)

git fetch origin claude/stupefied-roentgen-0c1d8f && git checkout claude/stupefied-roentgen-0c1d8f
pre-commit run --files \
  src/opentau/policies/pi07_paligemma/high_level_planner/modeling_pi07_high_level.py \
  src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py \
  tests/policies/test_pi07_paligemma_low_level_planner.py \
  tests/policies/test_pi07_paligemma_attention_layout.py
pytest -sx tests/policies/test_pi07_paligemma_attention_layout.py

GPU smoke (when on a CUDA box):

pytest -m gpu -n 0 tests/policies/test_pi07_paligemma_high_level_planner.py
pytest -m gpu -n 0 tests/policies/test_pi07_paligemma_low_level_planner.py

Checklist

  • I have added Google-style docstrings to important functions and ensured function parameters are typed.
  • My PR includes policy-related changes.
    • If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

The pi0.7 paper §VI.B says "the following text tokens use causal attention"
(same rule fixed in PR #235 for pi0.6). Both pi07 planners violated this for
language tokens (treated as bidirectional, lumped into the image block) and
for several other text spans (metadata, response, indicators, separators).
Fix every text span to open one causal block per token.

Also reorder the low-level prefix per Fig. 19's caption "image goals come
after the text prompt": the subgoal-images block now sits at the tail, just
before the optional discrete-action block, instead of in the middle of the
text prompt. State tokens get their own bidirectional block so they don't
bleed into the now-causal "State: " indicator.

Add CPU-runnable locking tests in a new attention-layout test file
(imports make_att_2d_masks from the working high-level module to side-step
the unrelated pre-existing VJEPA2VideoEncoder import bug in the low-level
module, tracked in #232 / #234).
@shuheng-liu shuheng-liu added the bug Something isn't working label May 2, 2026
@shuheng-liu shuheng-liu self-assigned this May 2, 2026
@shuheng-liu shuheng-liu marked this pull request as ready for review May 2, 2026 20:17
@shuheng-liu shuheng-liu merged commit e24ba88 into feat/pi07 May 2, 2026
5 of 6 checks passed
@shuheng-liu shuheng-liu deleted the claude/stupefied-roentgen-0c1d8f branch May 2, 2026 20:18
@claude (bot) mentioned this pull request May 4, 2026