chore(claude): learn from #229 by claude[bot] · Pull Request #231 · TensorAuto/OpenTau

claude · 2026-05-02T06:42:00Z

What this does

Adds hard rule 4 to CLAUDE.md: pin training-path layout fixes with a CPU unit test that asserts the exact pattern.

PR #229 review (claude-bot, discussion_r3176226835) flagged that the [1]+[0]*(N-1) → [1]*N att_masks change in pi07/low_level_planner/modeling_pi07_low_level.py — the most material correctness fix in the PR, since it shifts the cumsum at the indicator → first-discrete boundary and changes the discrete-action CE loss — shipped without a CPU test pinning the pattern. Without one, the divergence would only have been caught by deferred GPU/nightly regression tests. The author added test_discrete_actions_indicator_uses_per_token_causal_blocks in tests/policies/test_pi07_cpu.py (commit dda4446) after review.
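To make the boundary shift concrete, here is a minimal sketch in plain Python. It assumes the common convention in which the cumulative sum of att_masks gives each token a block id, and a query token may attend to a key token only when the key's block id is not larger than its own. The helper names `block_ids` and `attention_2d` are illustrative and are not taken from the repository.

```python
from itertools import accumulate


def block_ids(att_masks):
    """Cumulative sum of att_masks; tokens sharing an id form one attention block."""
    return list(accumulate(att_masks))


def attention_2d(att_masks):
    """allowed[i][j] is True when query token i may attend to key token j.

    Assumed convention: token i sees token j iff block_id[j] <= block_id[i].
    """
    ids = block_ids(att_masks)
    n = len(ids)
    return [[ids[j] <= ids[i] for j in range(n)] for i in range(n)]


N = 4
old_pattern = [1] + [0] * (N - 1)  # one shared block: tokens see each other in both directions
new_pattern = [1] * N              # one block per token: each token sees itself and earlier tokens

old_ids = block_ids(old_pattern)
new_ids = block_ids(new_pattern)

# Under the old pattern the first token can attend to the last one;
# under the new pattern it cannot.
old_first_sees_last = attention_2d(old_pattern)[0][N - 1]
new_first_sees_last = attention_2d(new_pattern)[0][N - 1]
```

Under this assumption the old pattern lets every discrete token see the tokens that follow it, which is why the change alters a next-token cross-entropy loss rather than being cosmetic.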

The lesson generalizes beyond this PR: rule 3 (determinism) only proves two seeded runs agree, not that the layout is correct. A pinned CPU assertion catches a regression on the same PR; GPU + nightly do not.
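The gap between the two kinds of check can be sketched with a toy example. Everything below is hypothetical: `build_att_masks` stands in for the layout code and `seeded_run` for a seeded forward pass; neither is the repository's API.

```python
import random


def build_att_masks(n, per_token_causal):
    """Hypothetical stand-in for the layout code under test."""
    return [1] * n if per_token_causal else [1] + [0] * (n - 1)


def seeded_run(seed, per_token_causal):
    """Stand-in for a seeded forward pass: returns the layout and a seeded value."""
    rng = random.Random(seed)
    masks = build_att_masks(5, per_token_causal)
    return masks, rng.random()


def is_deterministic(per_token_causal):
    """Determinism check: two runs with the same seed must agree."""
    return seeded_run(0, per_token_causal) == seeded_run(0, per_token_causal)


def matches_pinned_layout(per_token_causal):
    """Pinned check: the layout must equal the exact expected pattern."""
    masks, _ = seeded_run(0, per_token_causal)
    return masks == [1] * 5


# Both layouts are deterministic, so the determinism check cannot tell them apart.
both_deterministic = is_deterministic(True) and is_deterministic(False)
# Only the pinned assertion distinguishes the fixed layout from the old one.
pinned_accepts_new = matches_pinned_layout(True)
pinned_accepts_old = matches_pinned_layout(False)
```

The determinism check passes for either layout, while the pinned assertion fails as soon as the pattern reverts, and it runs on CPU in the same PR.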

No stale CLAUDE.md content surfaced by #229's diff — cpu_test.yml, gpu_test.yml, regression_test.yml, and the file paths in rule 3 all still exist.

How it was tested

Documentation-only change. No code paths affected.

How to checkout & try? (for the reviewer)

git fetch origin
git checkout chore/claude-learn-from-229
git diff main..HEAD -- CLAUDE.md

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

🤖 Generated with Claude Code

… encoder

@claude

- addresses @claude[bot] (low-level is_pad defaults, low_level_planner modeling_pi07_low_level.py:984-986): switched the *_is_pad fallbacks to torch.ones to match the high-level planner; missing speed/quality/mistake no longer fabricate "Speed: 0.0" entries in the prompt.
- addresses @claude[bot] (prepare_metadata docstrings): updated both planners' docstrings to reference Gemma 3 (not PaliGemma) and to enumerate robot_type / control_mode (string-valued, empty-string-as-pad).
- addresses @claude[bot] (high-level all-empty guard, high_level_planner/modeling_pi07_high_level.py:715): mirrored the low-level "if segments else ''" guard so an all-padded sample emits "" instead of the literal "Metadata: ". Both planners now agree.
- addresses @claude[bot] (CPU coverage for prepare_metadata): added TestPrepareMetadataSegments to tests/policies/test_pi07_cpu.py covering (a) robot+control populated, (b) both absent, (c) one populated and one empty, (d) per-sample all-empty emits "" (regression for both planners), and (e) low-level missing speed/quality/mistake never produces a fabricated "Speed: 0.0" segment.

tests: passed (pytest -m "not gpu" tests/policies/test_pi07_cpu.py)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
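The two metadata fixes described in this commit can be illustrated with a small sketch. It is an assumption-laden stand-in, not the planners' prepare_metadata: the function name `metadata_prompt`, the ", " separator, and the "Name: value" segment format are hypothetical. Treating a missing is_pad entry as padded mirrors the torch.ones fallback, and the `if segments else ""` guard mirrors the all-empty case.

```python
def metadata_prompt(values, is_pad):
    """Build a metadata prompt string from per-field values and pad flags.

    A field whose pad flag is missing is treated as padded (default True),
    so an absent field can never contribute a fabricated segment.
    """
    segments = [
        f"{name.capitalize()}: {value}"
        for name, value in values.items()
        if not is_pad.get(name, True)
    ]
    # Guard: an all-padded sample yields "" rather than the bare "Metadata: " prefix.
    return "Metadata: " + ", ".join(segments) if segments else ""


# Speed has a placeholder value but no pad flag, so it is treated as padded.
all_padded = metadata_prompt({"speed": 0.0}, {})
# Speed is explicitly marked as present.
one_present = metadata_prompt({"speed": 0.5}, {"speed": False})
```

With a default of "not padded" instead, the first call would emit a "Speed: 0.0" segment for a value that was never supplied.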

@claude

- addresses @claude[bot] (subgoal opt-in docstring): updated _load_subgoal_frames docstring to match the always-on behaviour; replaced the stale test_missing_subgoals_key_in_info_returns_empty test (which encoded the old opt-in gate) with two pinned cases: no-cameras → {}, and "no info.subgoals key still loads subgoals". Added camera_keys / image_keys / episode_data_index attrs to the SimpleNamespace meta in the two existing video tests so they match the new attribute reads.
- addresses @claude[bot] (image-dtype fallback row index): new test_image_dtype_fallback_uses_absolute_row_index in tests/datasets/test_optional_keys.py stubs hf_dataset.__getitem__ and pins that the parquet-row lookup uses ep_start + subgoal_frame, never the within-episode index.
- addresses @claude[bot] ("fully 0 padded" comment): rewrote the comment at modeling_pi07_low_level.py:946 to call out -1 ([-1, 1] SigLIP range) and the False-mask role of the placeholder.
- addresses @claude[bot] (_action_indicator_len recompute): cached the Action-indicator length at PI07LowLevelPlannerFlowMatching.__init__; both forward sites now read self._action_indicator_len.
- addresses @claude[bot] (embed_prefix CPU coverage): added TestEmbedPrefixConditionalGuards in tests/policies/test_pi07_cpu.py with a fake Gemma3WithExpert + tokenizer + state_proj + embed_video so the three guards (response_masks.any(), subgoal availability, metadata_masks.any()) are exercised without GPU. Cases: all-False optional masks → no spurious causal boundary; mixed-availability subgoal batch → header/footer pad mask zeroes the pad-only sample; response.any() → exactly one boundary.
- addresses @claude[bot] (.base_layer. stale comment in video_encoder.py): rewrote the wrap comment at video_encoder.py:422 to explain that the wrapper adopts submodules by reference, so wrapped-layer state-dict keys are byte-for-byte identical to a vanilla SiglipEncoderLayer (no .base_layer. prefix).

tests: passed (pytest tests/policies tests/datasets -m "not gpu" -n auto; 452 passed, 12 skipped). The 1 collection error in tests/policies/test_pi07_paligemma_low_level_planner.py is pre-existing and unrelated: pi07_paligemma still imports VJEPA2VideoEncoder from pi05_mem, which was removed in #171.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
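The row-index pin mentioned in this commit can be sketched as follows. This is a simplified illustration under stated assumptions: `RecordingDataset` and `load_subgoal_row` are hypothetical names, and the real test stubs hf_dataset.__getitem__ rather than using a standalone class.

```python
class RecordingDataset:
    """Stub dataset that records every row index requested from it."""

    def __init__(self, num_rows):
        self.num_rows = num_rows
        self.requested = []

    def __getitem__(self, index):
        if not 0 <= index < self.num_rows:
            raise IndexError(index)
        self.requested.append(index)
        return {"row": index}


def load_subgoal_row(dataset, episode_starts, episode_index, subgoal_frame):
    """Look up a subgoal frame by its absolute row, not its within-episode index."""
    ep_start = episode_starts[episode_index]
    return dataset[ep_start + subgoal_frame]


dataset = RecordingDataset(num_rows=30)
episode_starts = [0, 10, 20]  # first row of each episode in the flat table

# Frame 3 of episode 2 lives at absolute row 20 + 3.
row = load_subgoal_row(dataset, episode_starts, episode_index=2, subgoal_frame=3)
```

Asserting on the recorded index catches the bug where the within-episode index (3 here) is used directly and silently returns a frame from the first episode.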

#229)

Add hard rule 4: pin training-path layout fixes with a CPU unit test. PR #229 review surfaced that the `[1]+[0]*(N-1)` -> `[1]*N` att_masks fix — the most material correctness change in the PR, since it shifts the cumsum at the indicator -> first-discrete boundary and changes the discrete-action CE loss — initially shipped without a CPU test pinning the pattern. It would have only been caught by deferred GPU and nightly regression tests. Author added a CPU assertion in dda4446 after review. Rule 3 (determinism) doesn't cover this case: two seeded runs of the new code agree, but determinism cannot tell you the layout is correct. A pinned CPU test does. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

shuheng-liu · 2026-05-04T06:15:16Z

Not worth an entry in CLAUDE.md

akshay18iitg and others added 9 commits April 28, 2026 18:30

Adding original pi07 with gemma3 backbone and space-time siglip video encoder

c0cc4e9

fix(pi07): align with #178/#171 invariants & restore CPU tests (#198)

e135cb6

Merge main

90db9e2

Adding control_type and robot_type to metadata in policy code

3c5ee90

Applying pi07_paligemma fixes to pi07

2949e73

fix(pi07): gate optional prefix tokens in low- and high-level planners (#229)

62a5677

shuheng-liu closed this May 4, 2026

shuheng-liu deleted the chore/claude-learn-from-229 branch May 5, 2026 03:27

claude[bot] wants to merge 9 commits into main from chore/claude-learn-from-229
