fix(pi07): align text attention with paper §VI.B + Fig. 19 by shuheng-liu · Pull Request #236 · TensorAuto/OpenTau

shuheng-liu · 2026-05-02T19:47:35Z

What this does

Fixes the same family of attention-mask bugs in both pi07 planners that PR #235 addressed for pi0.6, plus an order divergence from Fig. 19 of the pi0.7 paper.

The pi0.7 paper §VI.B says:

"We employ a block-causal masking scheme, such that the observation tokens and the subgoal image tokens use bidirectional attention within themselves, and goal-image tokens can additionally attend the observations. The following text tokens use causal attention (see attention mask visualization in the appendix)."

Fig. 19 (Appendix B) further specifies:

"When image goals are present, we include them as an additional block-causal bidirectional block, after the text prompt."

The current implementation violates both rules in several places.

Bugs fixed

Language tokens were bidirectional in both planners' embed_prefix, but the paper requires causal attention. This is the same bug as PR #235 (fix(pi06): align with π0.6 model card paper). The 256-token prompt was emitted as [0] * num_lang_embs, which continues the bidirectional image block, so every prompt token saw every other prompt token. Fixed to [1] * num_lang_embs.
Other text spans had the same class of bug. Across the two planners, 13 additional text spans (metadata, ";\n " separator, "Updated Memory: " / "Subtask: " / "State: " / "Subgoal: " / "Action: " indicators, response/Subtask content, commas) were emitted as either fully-bidirectional [0] * N or prefix-LM [1] + [0] * (N-1). Most behaviorally significant: the response (Subtask) span in the low-level planner was prefix-LM, silently leaking future-token information into the response loss. All 13 spans now use causal [1] * N.
Inference-time "Subtask: " injection in the high-level planner now mirrors training. The autoregressive injection previously used i == 0, which opened a new block only on the first token; it is now True for every token, so inference matches the training pattern.
Subgoal-images block moved to the tail of the prefix per Fig. 19. Previously sat between the response and metadata (i.e. inside the text prompt); now sits right before the optional discrete-action block, after all text. State tokens get their own bidirectional block ([1] + [0]*(T-1)) so they don't bleed into the now-causal "State: " indicator.
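For reviewers less familiar with the 1D mask encoding, the three patterns named above can be compared side by side. This is a minimal sketch in plain Python, assuming the cumulative-sum convention (token i may attend token j when cumsum[j] <= cumsum[i]). The helper att_2d_from_1d is a hypothetical stand-in written for illustration, not the repository's make_att_2d_masks, and it ignores padding.

```python
from itertools import accumulate


def att_2d_from_1d(att_mask):
    """Expand a 1D block mask into a 2D attention matrix.

    Assumed convention: a 1 opens a new block, a 0 continues the current
    block, and token i may attend token j iff cumsum[j] <= cumsum[i].
    Hypothetical helper for illustration only; padding is ignored.
    """
    cumsum = list(accumulate(att_mask))
    return [[cj <= ci for cj in cumsum] for ci in cumsum]


image_block = [0, 0]  # two image tokens in the leading bidirectional block
n_text = 3            # a short text span, for readability

# Old language-token pattern: [0] * N continues the image block, so text
# and image tokens all attend each other.
bidirectional = att_2d_from_1d(image_block + [0] * n_text)

# Old pattern for some other spans: [1] + [0] * (N - 1) opens one block,
# so the span is bidirectional within itself (prefix-LM).
prefix_lm = att_2d_from_1d(image_block + [1] + [0] * (n_text - 1))

# Fixed pattern: [1] * N opens a block per token, so each text token sees
# the images and earlier text only.
causal = att_2d_from_1d(image_block + [1] * n_text)

for name, matrix in [("bidirectional", bidirectional),
                     ("prefix-LM", prefix_lm),
                     ("causal", causal)]:
    print(name)
    for row in matrix:
        print("  " + " ".join("x" if allowed else "." for allowed in row))
```

In the prefix-LM case the first text token can attend the last one. That is the future-token leak described for the response span.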

Out of scope (intentionally not fixed here)

Classifier-free guidance on metadata at inference (paper §VII) — inference-time enhancement, not a model-correctness bug.
Component-dropout policies (paper §V) — belong in the dataset / training loop.
High-level state-as-discretized-text — paper §VI.B mandates linear-projection state for the low-level model only; the high-level extension here is an OpenTau choice, not a clear bug.
Pre-existing VJEPA2VideoEncoder import bug in modeling_pi07_low_level.py, tracked in #232 (test(pi07): drop VJEPA2 test + fix cpu_test.yml self-race (#234)) and #234 (ci: cpu_test.yml races itself on PRs (push + pull_request triggers, no concurrency group)).

How it was tested

pre-commit run --files <changed> — clean.
New tests/policies/test_pi07_paligemma_attention_layout.py (3 CPU tests, all pass):
- TestPI07HighLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_language_block
- TestPI07LowLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_language_block
- TestPI07LowLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_response_block
The new test file imports make_att_2d_masks from the high-level module (byte-identical implementation in both planners) so it runs on CPU CI without depending on the pre-existing low-level import bug.
Updated _train_prefix_total / _infer_prefix_total / _verify_pad_masks in tests/policies/test_pi07_paligemma_low_level_planner.py for the reordered prefix layout (gpu-marked integration test).
Full CPU suite: pre-existing failures only (verified by re-running with these changes stashed; counts identical).
GPU pytests + regression tests: deferred to nightly gpu_test.yml / regression_test.yml.
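The kind of property a layout-locking test can assert is sketched below in plain Python. This is illustrative only: the real tests live in tests/policies/test_pi07_paligemma_attention_layout.py and use make_att_2d_masks, whereas att_2d_from_1d and span_is_causal here are hypothetical helpers that assume the cumulative-sum mask convention and ignore padding.

```python
from itertools import accumulate


def att_2d_from_1d(att_mask):
    """Hypothetical helper: token i attends token j iff cumsum[j] <= cumsum[i]."""
    cumsum = list(accumulate(att_mask))
    return [[cj <= ci for cj in cumsum] for ci in cumsum]


def span_is_causal(att_2d, start, end):
    """True iff, inside [start, end), token i attends token j exactly when j <= i."""
    return all(
        att_2d[i][j] == (j <= i)
        for i in range(start, end)
        for j in range(start, end)
    )


num_img, num_lang = 3, 4  # small sizes chosen for the sketch

fixed = att_2d_from_1d([0] * num_img + [1] * num_lang)  # layout after this PR
buggy = att_2d_from_1d([0] * num_img + [0] * num_lang)  # layout before this PR

assert span_is_causal(fixed, num_img, num_img + num_lang)
assert not span_is_causal(buggy, num_img, num_img + num_lang)
```

A check of this shape fails if a text span silently regresses to a bidirectional or prefix-LM pattern.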

How to checkout & try? (for the reviewer)

git fetch origin claude/stupefied-roentgen-0c1d8f && git checkout claude/stupefied-roentgen-0c1d8f
pre-commit run --files \
  src/opentau/policies/pi07_paligemma/high_level_planner/modeling_pi07_high_level.py \
  src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py \
  tests/policies/test_pi07_paligemma_low_level_planner.py \
  tests/policies/test_pi07_paligemma_attention_layout.py
pytest -sx tests/policies/test_pi07_paligemma_attention_layout.py

GPU smoke (when on a CUDA box):

pytest -m gpu -n 0 tests/policies/test_pi07_paligemma_high_level_planner.py
pytest -m gpu -n 0 tests/policies/test_pi07_paligemma_low_level_planner.py

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

The pi0.7 paper §VI.B says "the following text tokens use causal attention" (same rule fixed in PR #235 for pi0.6). Both pi07 planners violated this for language tokens (treated as bidirectional, lumped into the image block) and for several other text spans (metadata, response, indicators, separators). Fix every text span to open one causal block per token.

Also reorder the low-level prefix per Fig. 19's caption "image goals come after the text prompt": the subgoal-images block now sits at the tail, just before the optional discrete-action block, instead of in the middle of the text prompt. State tokens get their own bidirectional block so they don't bleed into the now-causal "State: " indicator.

Add CPU-runnable locking tests in a new attention-layout test file (imports make_att_2d_masks from the working high-level module to side-step the unrelated pre-existing VJEPA2VideoEncoder import bug in the low-level module, tracked in #232 / #234).
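Why the state tokens need to open their own block once the "State: " indicator is causal can be seen in a small sketch. It assumes the cumulative-sum mask convention and assumes the state tokens directly follow the indicator. The helper att_2d_from_1d and the span lengths are hypothetical and chosen for readability.

```python
from itertools import accumulate


def att_2d_from_1d(att_mask):
    """Hypothetical helper: token i attends token j iff cumsum[j] <= cumsum[i]."""
    cumsum = list(accumulate(att_mask))
    return [[cj <= ci for cj in cumsum] for ci in cumsum]


indicator_len, state_len = 2, 3      # illustrative sizes, not the real ones
last_indicator = indicator_len - 1   # index of the final indicator token

# Causal indicator followed by state tokens that merely continue the
# current block: they join the block opened by the last indicator token.
shared = att_2d_from_1d([1] * indicator_len + [0] * state_len)

# Causal indicator followed by state tokens that open their own block,
# i.e. [1] + [0] * (T - 1).
own = att_2d_from_1d([1] * indicator_len + [1] + [0] * (state_len - 1))

# With a shared block the last indicator token can see the state tokens
# that come after it; with a separate block it cannot.
print("shared:", shared[last_indicator])
print("own:   ", own[last_indicator])
```

In both layouts the state tokens remain bidirectional among themselves. Only the indicator's view of later tokens changes.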

shuheng-liu added the bug Something isn't working label May 2, 2026

shuheng-liu self-assigned this May 2, 2026

shuheng-liu added 2 commits May 2, 2026 12:54

fix(pi07): replace non-ASCII ellipsis (…) with ASCII '...' in docstrings

4e7959a

fix(pi07): replace em-dashes with ASCII punctuation in docstrings

1568f66

shuheng-liu marked this pull request as ready for review May 2, 2026 20:17

shuheng-liu merged commit e24ba88 into feat/pi07 May 2, 2026
5 of 6 checks passed

shuheng-liu deleted the claude/stupefied-roentgen-0c1d8f branch May 2, 2026 20:18

shuheng-liu mentioned this pull request May 4, 2026

fix(pi07): align text attention with paper §VI.B + Fig. 19 (apply to pi07) #242

Merged

3 tasks

claude Bot mentioned this pull request May 4, 2026

chore(claude): learn from #242 #243

Merged

3 tasks

shuheng-liu merged 3 commits into feat/pi07 from claude/stupefied-roentgen-0c1d8f
