fix(pi07): align text attention with paper §VI.B + Fig. 19#236
Merged
Conversation
The pi0.7 paper §VI.B says "the following text tokens use causal attention" (same rule fixed in PR #235 for pi0.6). Both pi07 planners violated this for language tokens (treated as bidirectional, lumped into the image block) and for several other text spans (metadata, response, indicators, separators). Fix every text span to open one causal block per token. Also reorder the low-level prefix per Fig. 19's caption "image goals come after the text prompt": the subgoal-images block now sits at the tail, just before the optional discrete-action block, instead of in the middle of the text prompt. State tokens get their own bidirectional block so they don't bleed into the now-causal "State: " indicator. Add CPU-runnable locking tests in a new attention-layout test file (imports make_att_2d_masks from the working high-level module to side-step the unrelated pre-existing VJEPA2VideoEncoder import bug in the low-level module, tracked in #232 / #234).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this does
Fixes the same family of attention-mask bugs in both pi07 planners that PR #235 addressed for pi0.6, plus an order divergence from Fig. 19 of the pi0.7 paper.
The pi0.7 paper §VI.B says:
Fig. 19 (Appendix B) further specifies:
The current implementation violates both rules in several places.
Bugs fixed
Language tokens were bidirectional in both
embed_prefixs — paper requires causal. Same bug as PR fix(pi06): align with π0.6 model card paper #235. The 256-token prompt was emitted as[0] * num_lang_embs(continues the bidirectional image block), so every prompt token saw every other prompt token. Fixed to[1] * num_lang_embs.Other text spans had the same class of bug. Across the two planners, 13 additional text spans (
metadata,";\n "separator,"Updated Memory: "/"Subtask: "/"State: "/"Subgoal: "/"Action: "indicators, response/Subtask content, commas) were emitted as either fully-bidirectional[0] * Nor prefix-LM[1] + [0] * (N-1). Most behaviorally significant: the response (Subtask) span in the low-level planner was prefix-LM, silently leaking future-token information into the response loss. All 13 spans now use causal[1] * N.Inference-time
"Subtask: "injection in the high-level planner mirrors training. The autoregressive injection usedi == 0to open a new block only on the first token; switched toTruefor every token so inference matches the training pattern.Subgoal-images block moved to the tail of the prefix per Fig. 19. Previously sat between the response and metadata (i.e. inside the text prompt); now sits right before the optional discrete-action block, after all text. State tokens get their own bidirectional block (
[1] + [0]*(T-1)) so they don't bleed into the now-causal"State: "indicator.Out of scope (intentionally not fixed here)
VJEPA2VideoEncoderimport bug inmodeling_pi07_low_level.py(tracked in test(pi07): drop VJEPA2 test + fix cpu_test.yml self-race (#234) #232 / ci: cpu_test.yml races itself on PRs (push + pull_request triggers, no concurrency group) #234).How it was tested
pre-commit run --files <changed>— clean.tests/policies/test_pi07_paligemma_attention_layout.py(3 CPU tests, all pass):TestPI07HighLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_language_blockTestPI07LowLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_language_blockTestPI07LowLevelPlannerAttentionLayout::test_embed_prefix_layout_has_causal_response_blockmake_att_2d_masksfrom the high-level module (byte-identical implementation in both planners) so it runs on CPU CI without depending on the pre-existing low-level import bug._train_prefix_total/_infer_prefix_total/_verify_pad_masksintests/policies/test_pi07_paligemma_low_level_planner.pyfor the reordered prefix layout (gpu-marked integration test).gpu_test.yml/regression_test.yml.How to checkout & try? (for the reviewer)
git fetch origin claude/stupefied-roentgen-0c1d8f && git checkout claude/stupefied-roentgen-0c1d8f pre-commit run --files \ src/opentau/policies/pi07_paligemma/high_level_planner/modeling_pi07_high_level.py \ src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py \ tests/policies/test_pi07_paligemma_low_level_planner.py \ tests/policies/test_pi07_paligemma_attention_layout.py pytest -sx tests/policies/test_pi07_paligemma_attention_layout.pyGPU smoke (when on a CUDA box):
Checklist