Fixing reward normalizer by ashah645 · Pull Request #3 · TensorAuto/OpenTau

ashah645 · 2025-12-29T19:54:48Z

What this does

Explain what this PR does. Feel free to tag your PR with the appropriate label(s).

Examples:

Title	Label
Fixes #[issue]	(🐛 Bug)
Adds new dataset	(🗃️ Dataset)
Optimizes something	(⚡️ Performance)

How it was tested

Explain/show how you tested your changes.

Examples:

Added test_something in tests/test_stuff.py.
Added new_feature and checked that training converges with policy X on dataset/environment Y.
Optimized some_function, it now runs X times faster than previously.

How to checkout & try? (for the reviewer)

Provide a simple way for the reviewer to try out your changes.

Examples:

pytest -sx tests/test_stuff.py::test_something

python lerobot/scripts/train.py --some.option=true

SECTION TO REMOVE BEFORE SUBMITTING YOUR PR

Note: Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR. Try to avoid tagging more than 3 people.

Note: Before submitting this PR, please read the contributor guideline.

Fixes three distinct correctness issues in the Gemma 3 4B backbone + Gemma-v1 action expert wiring that a careful second pass uncovered: 1. Vision projector crashes at 448×448. `Gemma3MultiModalProjector` hard-codes `patches_per_image = image_size // patch_size`, so the default config's `vision_config.image_size = 896` made the projector reshape `(B, 1024, 64, 64)` when SigLIP actually emits `B × 1024` patches at 448 input — a runtime crash on the first forward pass. Set `image_size = 448` to match π0.6's stated resolution (→ 32 patches/side, 256 mm tokens/image, which also matches the model card). 2. RoPE θ asymmetry between backbone and expert for global layers. Gemma 3 interleaves local layers (θ = 10 000) with global layers (θ = 1 000 000). The expert's own `rope_theta = 10 000` was being applied to its Q/K even on global layers, so the shared cross-attention ran backbone-Q (rotated at 1M) against expert-K (rotated at 10k) — the `q·R(Δpos)·k` invariant breaks when the two rotations live in different RoPE bases. Fix: both streams now use the backbone's per-layer θ; the expert's fallback θ is ignored at runtime and documented as such. 3. Sliding-window mask used dense indices instead of absolute positions. During expert cross-attention the Q tokens sit at `prefix_offsets + chunk_idx`, but `_build_sliding_window_mask` built its mask from `torch.arange(seq_len)` and the call site sliced `[:, :T_suffix, :]`. Result: every prefix key farther than `window` from the dense suffix row index was dropped — i.e. the sliding layers saw essentially no prefix during cross-attention. Fix: take `(query_positions, key_positions)` and compute `|pos_q - pos_k| < window` over absolute positions; stash the prefix positions in the KV cache so the expert can reconstruct them on each denoising step. Tests: * `TestSlidingWindowMask.test_cross_attention_uses_absolute_positions` — regression guard for #3. * `TestGemma3WithExpertConfig.test_vision_image_size_matches_input_resolution` and `.test_projector_accepts_448_inputs` — regression guards for #1 (config invariant + end-to-end projector forward). * `TestRopeThetaSymmetryDuringForward.test_expert_uses_backbone_per_layer_theta` — regression guard for #2, uses monkeypatched `apply_rope` to observe exactly which θ each layer asks for. Local pytest: 29 passed / 1 deselected (up from 25). Full policies/ CPU suite: 75 passed / 2 skipped / 6 deselected. Pre-commit: all hooks green (ruff, ruff-format, pyupgrade, typos, bandit, gitleaks). https://claude.ai/code/session_01MibvjbcZo38nxrx6n9giLi

…ring Addresses inline review on PR #337: - #1 (test tautology): extract the strip predicate to a module-level `_is_normalize_buffer_key` helper in `modeling_pi07_low_level.py`. The production filter now calls it; the two new tests (`test_predicate_matches_all_eight_saved_buffer_keys`, `test_predicate_anchored_at_key_start`) call the same helper directly, so any future edit to the predicate trips them instead of passing vacuously. - #4 (config persistence): after the strip fires, reset `model.config.skip_normalization_weights = False` so that subsequent `save_pretrained` / resume / inference loads do not re-strip the now-correct finetuned buffers. - #2 (inf-buffer trap) + #3 (other policies share the trap): expanded the field docstring to call out the `dataset_stats` precondition, the one-shot semantic, and the current pi07_paligemma_low_level scope.

Addresses the new findings on PR #340: - (#1, high priority) strict=True + skip_normalization_weights=True used to raise RuntimeError("Missing key(s) ...") in pi0/value because the strip removed Normalize buffer keys before load_state_dict, but the buffers were still registered on the model from __init__. The other eight policies hardcode strict=False on the load_state_dict call so they didn't hit this. Fix: pass effective_strict = strict and not stripped_keys so the strict semantics apply unchanged on the default-load path and only relax when the strip actually fired. - (#2) pi0/value's _load_as_safetensor now resolves is_main_process via get_proc_accelerator() and threads it to the strip helper, so the INFO/WARNING fires once per load (not once per rank under DDP/FSDP). - (#3, optional) Added a note in _assert_normalize_buffers_initialized explaining why "any normalize buffer is inf" is a safe proxy for "the stripped buffer is still inf" (create_stats_buffers rejects partial stats, so buffers are all-finite or all-inf — no reachable mixed state). Pre-commit clean on --all-files (CI parity); CPU suite 460 passed.

Addresses the third-pass review on PR #340: - (#1) Defensive _materialize() before torch.isinf in the inline check inside _assert_normalize_buffers_initialized. Today from_pretrained runs before accelerator.prepare so params are plain Tensors, but exposing the helper without _materialize would break on a future caller invoking it on a fully_shard'd model. _materialize is a no-op on plain Tensors. Added a comment noting both the FSDP2 motivation and the named_parameters-vs-named_buffers convention (Normalize uses nn.Parameter(requires_grad=False), not register_buffer). - (#2) Clarified the one-shot reset docstring on both PreTrainedConfig.skip_normalization_weights and the helper itself: the reset is in-memory only. Persistence requires save_pretrained; re-running from_pretrained on the original source path reloads True and re-strips. The common in-process resume path (train.py → save_checkpoint → cfg.save_pretrained) is unaffected. - (#3) The named_parameters clarification is now part of the inline comment block in _assert_normalize_buffers_initialized. - (#4 logging vs print inconsistency) Pre-existing in pi05; not in scope for this PR. Pre-commit clean on --all-files; 30 targeted tests pass.

…ck warn) Two low-priority items from the re-review of 85c535b: - Replace `[{}] * n` with `[{} for _ in range(n)]` for `per_ds_info`. The multiplied-literal form creates N references to the same dict, so a future contributor doing `per_ds_info[idx]["override_X"] = ...` (a legitimate- looking accumulator pattern) would silently fan out across every slot. The `[None] * n` and `["<no-repo-id>"] * n` callsites stay as-is because their element types are immutable; comment notes the distinction so the pattern doesn't regress. - Add a tighter fit-time warning for the specific divergence the deferred fallback-name finding (#3 in the original review) flagged. When a fallback-keyed `repo_id` appears more than once in the mixture, fit-time POOLS those rows under one shared key while training keeps them as separate singleton heads (via `_make_dataset_names`'s `#N` dedup). The new Counter-based warning surfaces the exact duplicate `repo_id -> count` map so an operator can fix it before the misalignment ships. Mechanical; no new infra. Closes the divergence-detection gap while the proper dedup follow-up (tracked separately) is pending.

Fixing reward normalizer

738d75d

ashah645 requested a review from WilliamYue37 December 29, 2025 19:54

ashah645 self-assigned this Dec 29, 2025

WilliamYue37 added the bug Something isn't working label Dec 29, 2025

WilliamYue37 requested changes Dec 29, 2025

View reviewed changes

shuheng-liu requested changes Dec 29, 2025

View reviewed changes

Comment thread lerobot/common/policies/value/configuration_value.py Outdated

Comment thread lerobot/common/policies/value/modeling_value.py Outdated

Fixing reward normalizer

a1c116c

WilliamYue37 approved these changes Dec 29, 2025

View reviewed changes

WilliamYue37 merged commit 5654a7e into main Dec 29, 2025

shuheng-liu deleted the fix/value_func branch December 29, 2025 21:27

WilliamYue37 unassigned ashah645 Dec 30, 2025

shuheng-liu mentioned this pull request Apr 22, 2026

feat: attach metadata and expose optional standard data format keys #168

Merged

3 tasks

This was referenced Apr 29, 2026

profile_step.py missing the gradient_checkpointing backend guard that train.py has #200

Open

fix(pi05): DDP fp32 Adam state + SDPA / grad-ckpt opt-ins (#181, #177) #182

Merged

claude Bot mentioned this pull request May 2, 2026

fix(pi07): gate optional prefix tokens in low- and high-level planners #229

Merged

3 tasks

shuheng-liu mentioned this pull request May 4, 2026

feat(grounding): add <locNNNN> codec + ensure_loc_tokens for pi05/pi06 #237

Merged

3 tasks

This was referenced May 4, 2026

chore(claude): learn from #237 #241

Closed

fix(train): re-seed new ranks deterministically when resuming on more GPUs #180

Merged

test(pi06): verify pi06 ckpt loads + matches gemma-3-4b-pt #245

Merged

claude Bot mentioned this pull request May 5, 2026

refactor(pi07_paligemma): layout-agnostic ContextItem prefix/suffix #262

Merged

3 tasks

This was referenced May 12, 2026

feat(policies): support training and selecting FAST tokenizer #290

Merged

fix(datasets): memory-map parquet instead of rewriting to Arrow #301

Merged

This was referenced May 23, 2026

fix(envs): zero-fill missing camera slots at eval; libero num_cams #328

Merged

Verify per-dataset normalization PR end-to-end on GPU #335

Closed

feat(policies): per-dataset normalization + skip-stats safetensors flag #336

Merged

This was referenced May 27, 2026

feat(pi07_paligemma): skip saved normalization buffers on load #337

Merged

feat(policies): elevate skip_normalization_weights to all policies #340

Merged

shuheng-liu mentioned this pull request May 28, 2026

feat(scripts): per-(robot_type, control_mode) normalization in fit_fast_tokenizer #348

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing reward normalizer#3

Fixing reward normalizer#3
WilliamYue37 merged 2 commits into
mainfrom
fix/value_func

ashah645 commented Dec 29, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ashah645 commented Dec 29, 2025

What this does

How it was tested

How to checkout & try? (for the reviewer)

SECTION TO REMOVE BEFORE SUBMITTING YOUR PR

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants