
refactor(pi07_paligemma): layout-agnostic ContextItem prefix/suffix #262

Merged
WilliamYue37 merged 4 commits into main from refactor/pi07-paligemma-context-items
May 5, 2026

@WilliamYue37 WilliamYue37 commented May 5, 2026

What this does

Refactors PI07LowLevelPlannerFlowMatching so that the prefix and suffix layouts (which video / language / state / response / subgoal / metadata / discrete-action blocks appear, in what order, with what masking and attention) live entirely inside PI07LowLevelPlannerPolicy. The model now just consumes a list[ContextItem], dispatches per item_type, concatenates, and derives the 1-D attention mask from each item's attention setting (continue / bidirectional / causal).

Adding a new conditioning block or reordering existing ones now only requires editing the policy's _build_prefix_items — no FlowMatching changes.

Specifically:

  • New ContextItem dataclass (data, item_type, pad_mask, attention, exclude_from_cross_attention, obs_history_is_pad).
  • embed_prefix(items) and embed_suffix(items, timestep) are layout-agnostic dispatchers; they return num_cross_att_tokens directly so the action expert no longer slices via discrete_action_indicator_max_length / discrete_action_max_length config fields.
  • FlowMatching.forward / sample_actions now take a prebuilt prefix items list; the policy builds it via _build_prefix_items(batch, include_discrete_actions=...).
  • The single-frame-to-num_frames zero-pad and the "skip SigLIP backbone when no sample needs this video" optimization are now generic per-type behaviors (no more bespoke subgoal handling).
  • Behavior is preserved: same prefix order, same per-block masks, same cross-attention exclusion of the trailing "Action: " indicator + discrete-action block.
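The ContextItem fields and the three attention settings listed above can be pictured with a small stand-alone sketch. The field names come from this description; the field types, the per-token 0/1 flag convention (1 starts a new attention block, 0 joins the previous one), and the helper name attention_flags are assumptions for illustration, not the code in modeling_pi07_low_level.py. Plain lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any, Literal

Attention = Literal["continue", "bidirectional", "causal"]


@dataclass
class ContextItem:
    """Illustrative stand-in: field names follow the PR text, types are assumed."""

    data: Any
    item_type: str
    pad_mask: list[bool]  # one entry per token of a single sample (assumption)
    attention: Attention = "bidirectional"
    exclude_from_cross_attention: bool = False
    obs_history_is_pad: Any = None


def attention_flags(items: list[ContextItem]) -> list[int]:
    """Derive a 1-D list of per-token flags from each item's attention setting.

    Assumed convention: 1 opens a new attention block, 0 stays in the current one.
    """
    flags: list[int] = []
    for item in items:
        n = len(item.pad_mask)
        if n == 0:
            continue
        if item.attention == "continue":
            flags += [0] * n  # join whatever block is already open
        elif item.attention == "bidirectional":
            flags += [1] + [0] * (n - 1)  # new block, tokens see each other
        elif item.attention == "causal":
            flags += [1] * n  # every token opens its own block
        else:
            raise ValueError(f"unknown attention mode: {item.attention!r}")
    return flags


prefix = [
    ContextItem(None, "video", [True] * 3, "continue"),
    ContextItem(None, "state", [True] * 2, "bidirectional"),
    ContextItem(None, "discrete_action", [True] * 3, "causal"),
]
flags = attention_flags(prefix)
```

Under these assumptions, reordering the list or adding an item changes the derived flags without touching the dispatcher, which is the property the refactor is after.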

How it was tested

  • All 37 CPU tests in tests/policies/test_pi07_paligemma_low_level_planner.py pass (response/metadata/subgoal/state masking + state-projection invocation tests).
  • Full tests/policies/ CPU suite passes (228 passed, 2 skipped).
  • Pre-commit hooks (ruff, format, pyupgrade, bandit, secrets, etc.) all green.

How to checkout & try? (for the reviewer)

git checkout refactor/pi07-paligemma-context-items
uv sync --extra dev --extra libero
pytest tests/policies/test_pi07_paligemma_low_level_planner.py -m "not gpu" -n auto

Checklist

  • I have added Google-style docstrings to important functions and ensured function parameters are typed.
  • My PR includes policy-related changes.
    • If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.


@claude claude Bot left a comment

Layout-agnostic prefix/suffix is a clear win — the policy is the single owner of layout, and the per-type dispatcher in _embed_item makes future additions (e.g. a new conditioning block) a one-line change.

A few small things flagged inline. None blocking; mainly docstring/maintenance suggestions plus a minor double-call of prepare_state.

Comment thread src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py Outdated
Comment thread src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py Outdated
Comment thread tests/policies/test_pi07_paligemma_low_level_planner.py
claude Bot commented May 5, 2026

[claude-review] summary for commit 96cb533

Round 3 — all four addressable findings from round 2 landed in 96cb533:

  • tests/policies/test_pi07_paligemma_low_level_planner.py::TestPI07EmbedPrefixInvariants exercises both the cross_att_locked raise path and the legal trailing-run count.
  • embed_suffix att-mask dtype now torch.bool, matching embed_prefix.
  • Redundant state_mask = state_mask.clone() removed; both branches above already produce fresh tensors.
  • _legacy_embed_prefix now uses a _StrictFakePolicy whose __getattr__ names the missing attribute, so a future read of an unstubbed attr surfaces at the helper rather than as a bare AttributeError deep in _build_prefix_items.
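As a rough illustration of the invariant those tests pin, the sketch below walks a list of (token count, excluded flag) pairs and rejects a non-excluded item that follows an excluded one. The function name, the tuple representation, and the choice to return both counts are assumptions; the thread does not say whether num_cross_att_tokens counts the visible tokens or the excluded trailing run.

```python
def split_cross_attention(items: list[tuple[int, bool]]) -> tuple[int, int]:
    """Return (visible_tokens, excluded_tokens) for a prefix layout.

    Each entry is (num_tokens, exclude_from_cross_attention). Excluded items
    must form one trailing run; anything else raises ValueError.
    """
    visible = 0
    excluded = 0
    locked = False  # becomes True once the first excluded item is seen
    for index, (num_tokens, is_excluded) in enumerate(items):
        if is_excluded:
            locked = True
            excluded += num_tokens
        elif locked:
            raise ValueError(
                f"item {index} is not excluded but follows an excluded item"
            )
        else:
            visible += num_tokens
    return visible, excluded


# Legal layout: two visible blocks, then a trailing excluded run.
counts = split_cross_attention([(5, False), (3, False), (2, True), (4, True)])
```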

The unrelated test-skip suggestion was explicitly dismissed by the author and isn't re-flagged.

  • suggestion (PR description): CLAUDE.md hard rule #3 calls for a determinism check (two seeded smoke runs, bit-identical loss series) on any policies/*/modeling_*.py change. The round-2 embed_suffix att-mask dtype switch (bf16 → bool) is the most likely place to perturb numerics, so it is worth confirming before merge.
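A determinism check of that shape can be sketched without the real training loop. Everything here is a toy stand-in: smoke_run is a hypothetical seeded loop, not the repository's smoke test, and the comparison is done on the raw float bits so that values which compare equal but differ bitwise (such as 0.0 and -0.0) would still be caught.

```python
import random
import struct


def smoke_run(seed: int, steps: int = 5) -> list[float]:
    """Toy seeded 'training' loop that returns its loss series."""
    rng = random.Random(seed)
    weight = 0.0
    losses = []
    for _ in range(steps):
        target = rng.uniform(-1.0, 1.0)
        error = weight - target
        losses.append(error * error)
        weight -= 0.1 * 2.0 * error  # one gradient step on the squared error
    return losses


def bit_identical(a: list[float], b: list[float]) -> bool:
    """Compare two loss series by their IEEE-754 byte patterns."""
    if len(a) != len(b):
        return False
    return all(struct.pack("<d", x) == struct.pack("<d", y) for x, y in zip(a, b))


same_seed_matches = bit_identical(smoke_run(0), smoke_run(0))
different_seed_matches = bit_identical(smoke_run(0), smoke_run(1))
```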

@WilliamYue37 WilliamYue37 self-assigned this May 5, 2026
@WilliamYue37

@claude fix

claude Bot pushed a commit that referenced this pull request May 5, 2026
- addresses @claude (sample_actions double-call): replace second prepare_state
  call with a direct read of batch["state"].shape (T=1 when ndim==2, else
  shape[-2]) so the warning derives t_dim without redoing the unsqueeze/pad.
- addresses @claude (T=1 video override): _embed_item now raises ValueError
  if a T=1 video item carries obs_history_is_pad, instead of silently
  overwriting it with the synthesized temporal mask. Docstring updated.
- addresses @claude (SigLIP-skip DDP invariant): _embed_item docstring now
  spells out that the data-dependent skip requires a video pad_mask
  derivable from config (not per-sample) so DDP ranks agree and
  find_unused_parameters=False does not deadlock.
- addresses @claude (test parallel layout): _legacy_embed_prefix no longer
  re-implements the prefix layout. It now builds a fake
  PI07LowLevelPlannerPolicy via object.__new__, stubs the prepare_*
  methods to return the test tensors, and dispatches through
  PI07LowLevelPlannerPolicy._build_prefix_items. Single source of truth
  for layout; future _build_prefix_items changes flow into the tests.
  Dropped now-unused ContextItem import.
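
For readers skimming the first two items, a minimal sketch of the two rules follows. Shapes are plain tuples rather than tensors, and both function names are hypothetical; only the rules themselves (T=1 when ndim==2, else shape[-2]; a T=1 video item must not carry obs_history_is_pad) come from the message above.

```python
def history_length(state_shape: tuple[int, ...]) -> int:
    """Read the history length from a state shape without re-running padding.

    A 2-D shape has no time axis, so T is 1; otherwise T is the
    second-to-last dimension.
    """
    return 1 if len(state_shape) == 2 else state_shape[-2]


def check_single_frame_video(num_frames: int, obs_history_is_pad) -> None:
    """Refuse a caller-supplied temporal mask on a single-frame video item.

    The single-frame case gets a synthesized temporal mask, so a supplied
    mask would otherwise be overwritten silently.
    """
    if num_frames == 1 and obs_history_is_pad is not None:
        raise ValueError(
            "T=1 video item must not carry obs_history_is_pad; "
            "the temporal mask is synthesized for single-frame inputs"
        )


t_flat = history_length((4, 32))  # no time axis
t_hist = history_length((4, 3, 32))  # three history steps
```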

tests: passed — pytest -m "not gpu" -n auto tests/policies/test_pi07_paligemma_low_level_planner.py (24 PaliGemma-gated tests skipped locally for lack of HF_TOKEN; CI has it)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WilliamYue37 and others added 3 commits May 5, 2026 14:47
The PI07LowLevelPlannerFlowMatching prefix and suffix layouts were
hard-coded inside embed_prefix / embed_suffix, so adding a conditioning
block or reordering existing ones required editing both the model and
the policy in lockstep. This commit moves all layout decisions into
PI07LowLevelPlannerPolicy._build_prefix_items: the policy constructs a
list of ContextItem dataclasses (data + item_type + pad_mask +
attention mode + exclude_from_cross_attention flag), and the model
just dispatches per type, concatenates, and derives the 1-D attention
mask. Future layout changes only touch the policy.

Behavior is unchanged: same prefix order, same per-block masking
semantics, same cross-attention exclusion of the trailing
"Action: " + discrete-action block (now derived from the
exclude_from_cross_attention flag rather than config-based slicing).
@WilliamYue37 WilliamYue37 force-pushed the refactor/pi07-paligemma-context-items branch from bd08132 to e633dc8 Compare May 5, 2026 21:49

@claude claude Bot left a comment

Round 2: the feedback from the prior review appears to have landed (single source of truth via _build_prefix_items, the prepare_state double-call is gone, the DDP invariant is documented). Two new things on this pass: the new cross_att_locked invariant has no test, and the unrelated test-skip changes are worth a closer look because they pull coverage away from nightly regression on g6.12xlarge, not just from a 24GB RTX 3090.

Comment thread src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py Outdated
Comment thread src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py Outdated
Comment thread tests/policies/test_pi06.py
Comment thread tests/policies/test_pi07_paligemma_low_level_planner.py Outdated
@WilliamYue37

@claude fix

- addresses @claude (cross_att_locked test): added
  TestPI07EmbedPrefixInvariants in
  tests/policies/test_pi07_paligemma_low_level_planner.py — pins the
  raise path (excluded then non-excluded) and the legal trailing-run
  count via two CPU-only tests reusing the existing mock model.
- addresses @claude (suffix att-mask dtype): aligned embed_suffix to
  build att_masks_flat as torch.bool to match embed_prefix:1744; both
  paths now use the same dtype side by side.
- addresses @claude (redundant state_mask.clone): removed; both
  branches above already produce fresh tensors. Comment updated.
- addresses @claude (fragile _legacy_embed_prefix): the helper now
  builds the fake policy from a local PI07LowLevelPlannerPolicy
  subclass whose __getattr__ raises a clear "missing attribute X"
  error. Any future _build_prefix_items access to an unstubbed attr
  will now point at the helper instead of bubbling a generic
  AttributeError up through every test that uses it. Docstring
  documents the contract.
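
The strict-fake pattern in the last item can be shown with a toy class. Policy, build, and the stubbed names below are hypothetical stand-ins, not the real PI07LowLevelPlannerPolicy; the sketch relies only on standard Python behavior: object.__new__ skips __init__, and __getattr__ runs only when normal attribute lookup fails.

```python
class Policy:
    """Toy stand-in for a policy whose constructor is too heavy for unit tests."""

    def __init__(self):
        raise RuntimeError("real constructor would load model weights")

    def build(self):
        # Reads one stubbed method and one plain attribute.
        return [self.prepare_state(), self.config_flag]


class StrictFakePolicy(Policy):
    """Fake that names the attribute a test helper forgot to stub."""

    def __getattr__(self, name):
        # Only reached when the attribute is missing on the instance and class.
        raise AttributeError(
            f"StrictFakePolicy is missing attribute {name!r}; "
            "stub it in the test helper"
        )


# Bypass __init__ entirely, then attach only what the code under test needs.
fake = object.__new__(StrictFakePolicy)
fake.prepare_state = lambda: "state"

try:
    fake.build()
    message = ""
except AttributeError as exc:
    message = str(exc)

fake.config_flag = True
result = fake.build()
```

The error text points at the helper and the missing name, which is the contract the item above describes.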

tests: passed — pytest tests/policies/test_pi07_paligemma_low_level_planner.py::TestPI07EmbedPrefixInvariants tests/policies/test_pi07_paligemma_low_level_planner.py::TestPI07LowLevelPlannerStateEmbedding (11/11 — covers the 2 new invariant tests + the 9 state-embedding tests that exercise the strict fake policy). The PaliGemma-gated test classes need HF_TOKEN and are not runnable locally; CI's cpu_test.yml will run them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@WilliamYue37 WilliamYue37 merged commit 428918b into main May 5, 2026
8 checks passed
@WilliamYue37 WilliamYue37 deleted the refactor/pi07-paligemma-context-items branch May 5, 2026 22:25