Skip to content

refactor(policies): remove init_strategy#188

Merged
shuheng-liu merged 7 commits into
mainfrom
claude/review-weight-init-GQYCz
Apr 30, 2026
Merged

refactor(policies): remove init_strategy#188
shuheng-liu merged 7 commits into
mainfrom
claude/review-weight-init-GQYCz

Conversation

@shuheng-liu
Copy link
Copy Markdown
Member

@shuheng-liu shuheng-liu commented Apr 26, 2026

What this does

Removes the init_strategy config knob entirely from PI0Config, PI05Config, PI05MemConfig, and PI05ContinuousStateConfig, along with the _init_model / _init_weights methods that consumed it.

Why

The knob had three modes — "no_init", "full_he_init", "expert_only_he_init" — but in practice none of them did meaningful work in any shipped config:

  • Every shipped config sets pretrained_path. The post_init logic silently overrode init_strategy to "no_init" whenever pretrained_path was set, so "expert_only_he_init" (which appeared in 5 configs including the LIBERO repro) was always dead config.
  • "expert_only_he_init" was the only mode that did something the rest of the system couldn't already do — load google/paligemma-3b-pt-224 inside the model constructor for the random-expert + pretrained-backbone use case. We've decided to drop that use case; users who want a pretrained backbone go through pretrained_path exclusively. If someone wants random-expert + pretrained-backbone they can build that checkpoint and upload it themselves.
  • That leaves only "no_init" (skip init) and "full_he_init" (He-init everything). When from_pretrained is used, He-init is wasted work because the state dict overwrites it. When fresh-from-scratch is used, no shipped config exercises that path. So the knob is dead either way.

Changes

  • Drop init_strategy field, docstring entry, and __post_init__ resolution from all four policy configs. Drop now-unused logging / Literal imports.
  • Drop _init_model() call and _init_model / _init_weights method bodies from the four modeling_*.py constructors.
  • Drop load_pretrained_paligemma from PaliGemmaWithExpertConfig / PaliGemmaWithExpertModel in pi0 and pi05; PaliGemma is always built from config.
  • In policies.py: rename _TRANSIENT_POLICY_FIELDS to _REMOVED_POLICY_FIELDS and call strip_deprecated_fields_from_json(Path(config_file)) before draccus.parse in both PreTrainedConfig.from_pretrained and TrainPipelineConfig.from_pretrained. This is needed because draccus errors on unknown fields, and old saved configs (e.g. TensorAuto/pi05_base/config.json) still contain "init_strategy".
  • Strip init_strategy entries from all 9 shipped JSON configs.

Backward-compat note

Old saved configs (HF Hub or local) that still contain "init_strategy" continue to load — the load-time strip drops the key in place before draccus sees it. The next save writes a clean config.

How it was tested

  • Each of the four config dataclasses re-loaded with pretrained_path=None, pretrained_path="ckpt", and confirmed no init_strategy attribute exists post-construction.
  • Integration test with real draccus: a synthetic JSON containing "init_strategy": "expert_only_he_init" raises DecodingError on direct draccus.parse, but parses cleanly after strip_deprecated_fields_from_json runs first.
  • _strip_keys / strip_deprecated_fields_from_json exercised at top-level and nested-under-policy.
  • All 9 modified JSON configs round-trip through json.load cleanly.
  • All modified Python files compile (python -m py_compile).
  • GPU tests not run in this sandbox — flagging for reviewer.

How to checkout & try? (for the reviewer)

git checkout claude/review-weight-init-GQYCz
# Confirm an old-style config (with init_strategy) still loads:
python -c "
import json, tempfile
from pathlib import Path
from opentau.configs.policies import strip_deprecated_fields_from_json
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump({'chunk_size': 50, 'init_strategy': 'expert_only_he_init'}, f); p=f.name
strip_deprecated_fields_from_json(Path(p))
print(open(p).read())  # init_strategy stripped
"

Checklist

  • I have added Google-style docstrings to important functions and ensured function parameters are typed.
  • My PR includes policy-related changes.
    • If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Note: this PR does touch policy code, but the GPU pytests / regression tests were not run in this sandbox. Leaving the policy box unchecked to reflect that honestly — please flag if a GPU run is required before merge.

https://claude.ai/code/session_01KaAzgu1xy7io6Etf6MXYnJ

claude added 2 commits April 26, 2026 04:00
The "expert_only_he_init" path loaded google/paligemma-3b-pt-224 inside the
model constructor (random expert + pretrained backbone), but every shipped
config that named it also set pretrained_path, which silently overrode
init_strategy to "no_init" before construction. The path was dead config
in practice. Users who want a pretrained backbone now go through
pretrained_path exclusively.

- Drop expert_only_he_init from PI0/PI05/PI05Mem/PI05ContinuousState configs
  and the corresponding _init_model branches.
- Drop load_pretrained_paligemma from PaliGemmaWithExpertConfig/Model in pi0
  and pi05; PaliGemma is always built from config now.
- Make init_strategy default to None and resolve in __post_init__:
  pretrained_path set -> "no_init", else "full_he_init". Warn (not info)
  when a user-set non-no_init value is overridden.
- Strip init_strategy from saved config JSON and from the wandb upload dict
  alongside the existing deprecated-latency stripping; it's a
  construction-time-only field with no meaning in a saved record.
- Remove now-dead "init_strategy" entries from all 9 shipped configs.

https://claude.ai/code/session_01KaAzgu1xy7io6Etf6MXYnJ
Continue the cleanup begun in the previous commit by removing the
init_strategy field, the _init_model/_init_weights methods, and the
post_init resolution logic across all four policy variants. Without the
expert_only_he_init path, init_strategy was just a knob to choose
between He-init everything (full_he_init) or skip init (no_init), and
in every shipped config the post_init override was driving it to
no_init anyway because pretrained_path was set. With from_pretrained
loading state dict on top, custom He-init was either redundant
(overwritten) or unused (no fresh-from-scratch configs ship).

- Drop init_strategy field, docstring entries, and post_init resolution
  in PI0Config / PI05Config / PI05MemConfig / PI05ContinuousStateConfig.
  Drop now-unused logging and Literal imports where applicable.
- Remove _init_model() call and _init_model / _init_weights method
  bodies from the four modeling_*.py constructors.
- In policies.py: rename _TRANSIENT_POLICY_FIELDS to
  _REMOVED_POLICY_FIELDS and call strip_deprecated_fields_from_json
  on the config_file before draccus.parse in PreTrainedConfig and
  TrainPipelineConfig from_pretrained, so old saved configs that still
  contain "init_strategy" load without erroring on draccus's strict
  unknown-field check. Drop strip_transient_fields_in_place and revert
  TrainPipelineConfig.to_dict() to its pre-PR shape — the field no
  longer exists in the dataclass, so draccus.encode never emits it.

https://claude.ai/code/session_01KaAzgu1xy7io6Etf6MXYnJ
@shuheng-liu shuheng-liu changed the title refactor(policies): drop expert_only_he_init init strategy refactor(policies): remove init_strategy Apr 26, 2026
@shuheng-liu shuheng-liu marked this pull request as ready for review April 26, 2026 04:18
@shuheng-liu shuheng-liu self-assigned this Apr 26, 2026
Reconcile with #161 which deprecates the pi05_continuous_state policy
and merges its functionality into PI05Config via state_type.

- src/opentau/policies/pi05/configuration_pi05.py: kept main's state_type
  field, PI05ContinuousStateConfig deprecated wrapper, and warnings/Literal
  imports; dropped main's init_strategy validation block (the field is gone
  on this branch). Removed the now-unused logging import.
- src/opentau/policies/pi05_continuous_state/configuration_pi05.py and
  modeling_pi05.py: accepted main's deletion (modify/delete conflict).
- pi05/modeling_pi05.py and the rest auto-merged cleanly: this branch's
  removal of _init_model / _init_weights survived intact.
@shuheng-liu
Copy link
Copy Markdown
Member Author

@claude fix the merge conflicts

- addresses @shuheng-liu (fix merge conflicts): re-merge latest main into
  the branch and resolve two docstring conflicts. In pi05/paligemma_with_expert.py
  keep main's improved attention_implementation docstring (eager/sdpa/fa2)
  but drop the load_pretrained_paligemma line — that param was removed
  by this PR. In pi05_mem/configuration_pi05.py keep main's improved
  attention_implementation + freeze_vision_encoder docstrings but drop
  init_strategy from the args list — that field was removed by this PR.
- addresses @shuheng-liu (fix merge conflicts, runtime fallout): main
  added tests/policies/test_pi05_mem_gpu.py which constructed PI05MemConfig
  with init_strategy="no_init"; that's now an unknown kwarg on a dataclass
  config and crashes at construction. Drop the kwarg and update the
  comment to reference pretrained_path=None as the equivalent skip-loads
  signal.
tests: passed — pytest -m "not gpu" -n auto --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py tests/policies/ tests/configs/ tests/scripts/test_train.py (159 passed, 2 skipped)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

[claude-fix] Merged latest origin/main (8583dc4) into the PR branch in commit b0f5445. Two literal docstring conflicts and one runtime fallout from the merge:

  • src/opentau/policies/pi05/paligemma_with_expert.py — kept main's improved attention_implementation docstring (now lists "eager", "sdpa", "fa2", matching the validator). Dropped the load_pretrained_paligemma line main added — that parameter was removed by this PR.
  • src/opentau/policies/pi05_mem/configuration_pi05.py — kept main's improved attention_implementation and freeze_vision_encoder docstrings (the latter now correctly describes the SigLIP vision tower, not V-JEPA2). Dropped the init_strategy line main added — that field was removed by this PR.
  • tests/policies/test_pi05_mem_gpu.py (new from main) — its PI05MemConfig(init_strategy="no_init", …) would now raise TypeError because the dataclass no longer accepts init_strategy. Removed the kwarg and updated the adjacent comment to reference pretrained_path=None as the equivalent skip-loads signal.

CPU tests: passed locally — pytest -m "not gpu" -n auto --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py tests/policies/ tests/configs/ tests/scripts/test_train.py → 159 passed, 2 skipped. (The pi07 low-level test is the same one CI's cpu_test.yml skips per #209 — pre-existing breakage on main, unrelated to this PR.)

Two follow-up items I deliberately did NOT touch — they may warrant separate PRs:

  1. pi06 (added in feat(policies): add authentic pi06 policy with Gemma 3 4B backbone #178) still has its own init_strategy field/logic with all three modes including expert_only_he_init. Cleaning that up to match this PR's premise is out of scope for "fix the merge conflicts", but worth noting since it contradicts the PR title "remove init_strategy". Worse — _REMOVED_POLICY_FIELDS = ("init_strategy",) in policies.py will now silently strip init_strategy from saved pi06 configs at load time, dropping the user's choice on the floor. This is a real regression for pi06 users, and probably wants to be fixed before merge by either (a) dropping init_strategy from pi06 entirely, or (b) making the strip policy-type-aware.
  2. pi07_paligemma high/low planners still pass load_pretrained_paligemma=False to PaliGemmaWithExpertConfig. After this PR removes that parameter, the kwarg gets silently absorbed by **kwargs (no crash) but it's dead code. Minor.

Happy to address either or both of those if you'd like — let me know.

@shuheng-liu
Copy link
Copy Markdown
Member Author

@claude fix the first one in pi06. Leave pi07_paligemma alone. Also make sure the 2 breaking CPU tests skipped by main is also skipped here.

- addresses @shuheng-liu (pi06 init_strategy regression): drop init_strategy
  field, post_init resolution, _init_model/_init_weights, and the
  load_pretrained_gemma3 plumbing in pi06's Gemma3WithExpertConfig/Model.
  Also strip "init_strategy" from configs/examples/pi06_training_config.json
  so _REMOVED_POLICY_FIELDS no longer silently drops a user choice that
  used to be load-bearing.
- addresses @shuheng-liu (CPU test skips): verified the merge-in cpu_test.yml
  matches main's skip list verbatim — same --ignore for test_planner.py,
  test_libero_utils.py, test_pi07_paligemma_low_level_planner.py, and the two
  --deselect entries for test_make_env_{async,sync}_vector_env. Re-ran tests
  locally with that exact skip list (vs. the previous run which only ignored
  the pi07 file) and confirmed the suite passes.

tests: passed -- pytest -m "not gpu" -n auto --ignore=tests/planner/test_planner.py --ignore=tests/utils/test_libero_utils.py --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_async_vector_env --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_sync_vector_env tests/policies/ tests/configs/ tests/scripts/test_train.py (159 passed, 2 skipped)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

[claude-fix] Done — pushed in a595c8e.

1) pi06 — drop init_strategy entirely (matching the rest of this PR). Without this, _REMOVED_POLICY_FIELDS = ("init_strategy",) would silently strip the user's chosen mode from saved pi06 configs at load time, dropping their selection on the floor. Changes:

  • src/opentau/policies/pi06/configuration_pi06.py — remove the init_strategy field, its docstring entry, the __post_init__ validation/resolution, and now-unused logging / Literal imports.
  • src/opentau/policies/pi06/modeling_pi06.py — remove _init_model() / _init_weights() and the load_pretrained_gemma3 flag derivation in __init__.
  • src/opentau/policies/pi06/gemma3_with_expert.py — drop the load_pretrained_gemma3 parameter from Gemma3WithExpertConfig and the corresponding from_pretrained("google/gemma-3-4b-pt") branch in Gemma3WithExpertModel. Gemma 3 is now always built from config (mirrors the pi05 PaliGemma cleanup).
  • configs/examples/pi06_training_config.json — strip the "init_strategy": "expert_only_he_init" line so the config round-trips cleanly without going through _REMOVED_POLICY_FIELDS.

Per your instruction, pi07_paligemma is left alone — its load_pretrained_paligemma=False calls are dead but absorbed by **kwargs, harmless.

2) CPU test skips. Confirmed .github/workflows/cpu_test.yml on this branch already matches main verbatim: same three --ignore= (test_planner.py, test_libero_utils.py, test_pi07_paligemma_low_level_planner.py) and the same two --deselect= (test_make_env_async_vector_env, test_make_env_sync_vector_env). My previous local run only applied the pi07 ignore — I've now re-run with the full skip list to match CI.

Tests: passed locally — pytest -m "not gpu" -n auto --ignore=tests/planner/test_planner.py --ignore=tests/utils/test_libero_utils.py --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_async_vector_env --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_sync_vector_env tests/policies/ tests/configs/ tests/scripts/test_train.py → 159 passed, 2 skipped. tests/policies/test_pi06.py alone → 26 passed, 1 deselected. pre-commit run over the changed files passed.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Targeted review of #188 — see inline comments and the summary on the PR.

Comment thread src/opentau/configs/policies.py Outdated
Comment thread src/opentau/configs/policies.py
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

[claude-review] summary for commit c6befca

Scope: removal of init_strategy knob across pi0/pi05/pi05_mem/pi06 + load-time strip of removed/deprecated keys. Diff is mostly subtractive and the cleanup is consistent across the four policies.

Findings:

  • suggestionsrc/opentau/configs/policies.py:376 (and src/opentau/configs/train.py:405) — strip_deprecated_fields_from_json rewrites the source JSON in place. When config_file came from hf_hub_download, that path is a symlink to a content-addressed blob in ~/.cache/huggingface/hub/.../blobs/<sha> — the open(path, "w") follows the symlink and mutates the blob, leaving cache files whose contents no longer match their SHA. For user-supplied local paths, the side-effect rewrite (incl. reformatting) is also surprising. Strip in memory and write to a tempfile.NamedTemporaryFile before handing to draccus.parse. The on-write strip in _save_pretrained is fine.
  • suggestionsrc/opentau/configs/policies.py:58_REMOVED_POLICY_FIELDS is dropped silently, while _DEPRECATED_LATENCY_FIELDS emits a DeprecationWarning via warn_deprecated_latency_fields. A user loading an older config with init_strategy="expert_only_he_init" gets zero signal that their choice was discarded. Extend the warn helper (or add a sibling) so removed fields surface a deprecation warning the same way.
  • nitsrc/opentau/policies/pi05/configuration_pi05.py:62-63 — stale init_strategy: entry in the class Args: docstring. The field, the Literal import, and the __post_init__ resolution were removed but the docstring entry was missed. (The other three configs were scrubbed cleanly.)
  • nittests/policies/test_pi05_mem_gpu.py:234print("[e2e] device=cuda dtype=bfloat16 init=no_init n_obs_steps=4") still references init=no_init even though the kwarg is gone. Misleading at debug time.

Non-blocking notes:

  • src/opentau/policies/pi07_paligemma/{high,low}_level_planner/modeling_pi07_*.py still pass load_pretrained_paligemma=False to the now-renamed config — absorbed by **kwargs, harmless. The PR author chose to defer per the conversation in the comments.
  • No new tests covering _strip_keys / _REMOVED_POLICY_FIELDS round-trip in tests/configs/. The PR description describes manual verification but nothing pinned in CI. Worth a small unit test so the strip-on-load behavior doesn't silently regress.
  • GPU pytests not run in the sandbox per the PR description; reasonable, but the test_pi05_mem_gpu.py change does need a GPU runner to actually exercise it.

@shuheng-liu
Copy link
Copy Markdown
Member Author

@claude fix the suggestions and nits if they make sense

- addresses claude bot (in-place HF cache mutation): replace `strip_deprecated_fields_from_json(Path(config_file))` at the load sites in `from_pretrained` (`policies.py`, `train.py`) with new `load_stripped_config_to_tempfile()` that strips in memory and writes to a `tempfile.mkstemp` file. The HF cache symlink → blob is no longer mutated; user-supplied paths are no longer reformatted as a side effect of load. The on-write strip in `_save_pretrained` is retained (caller owns the directory).
- addresses claude bot (silent strip of removed fields): add `warn_removed_policy_fields()` sibling to `warn_deprecated_latency_fields()` and call it before the strip at both load sites, so users loading old configs containing `init_strategy=...` see a `DeprecationWarning` rather than having their choice silently dropped. Factored the `top-level + policy.<key>` lookup into `_find_present_keys()` to share between the two warn helpers.
- addresses claude bot (stale docstring): drop the `init_strategy:` entry from `PI05Config`'s class docstring (`pi05/configuration_pi05.py:62-63`).
- addresses claude bot (misleading print): change `init=no_init` → `pretrained_path=None` in the e2e debug print in `tests/policies/test_pi05_mem_gpu.py`.
- addresses claude bot (no test for strip-on-load): add unit tests in `tests/configs/test_utils_policies.py` covering `_strip_keys` (top-level + nested-under-policy + no-op), `load_stripped_config_to_tempfile` (does not mutate source, does not follow symlinks → cache-safe), `strip_deprecated_fields_from_json` (still in-place for owned files), and `warn_removed_policy_fields` (warns when present, silent when clean).

tests: passed — pytest -m "not gpu" -n auto --ignore=tests/planner/test_planner.py --ignore=tests/utils/test_libero_utils.py --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_async_vector_env --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_sync_vector_env tests/ → 720 passed, 16 skipped (3 unrelated test_hub.py errors require HUGGINGFACE_HUB_TOKEN — provided by CI, not the sandbox).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

[claude-fix] Done — pushed in 4505fac. Walked through the two suggestions and two nits from the review summary in comment 4348009825; replied inline on the two suggestions and addressed the two nits + the test gap here:

  • suggestion (in-place rewrite of HF cache symlinks) — addressed in discussion r3164662608. New load_stripped_config_to_tempfile() strips in memory and writes to a tempfile.mkstemp temp file at both load sites; the source path is never mutated.
  • suggestion (silent strip of removed fields) — addressed in discussion r3164663068. New warn_removed_policy_fields() emits a DeprecationWarning before the strip at both load sites.
  • nit (pi05/configuration_pi05.py:62-63 stale init_strategy: docstring entry) — dropped the two-line entry. The other three configs were already clean.
  • nit (tests/policies/test_pi05_mem_gpu.py:234 misleading init=no_init in the e2e debug print) — changed to pretrained_path=None, matching the actual signal that picks the skip-loads path.
  • non-blocking note (no CI test for _strip_keys / _REMOVED_POLICY_FIELDS round-trip) — added 7 unit tests in tests/configs/test_utils_policies.py:
    • test_strip_keys_top_level_and_nested / test_strip_keys_no_change — base helper
    • test_load_stripped_config_to_tempfile_does_not_mutate_source — source byte-for-byte unchanged
    • test_load_stripped_config_does_not_follow_symlink — explicitly pins HF-cache safety (creates a symlink to a "blob", strips through the symlink, asserts the linked-to file is unchanged)
    • test_strip_deprecated_fields_from_json_in_place_for_owned_files — the on-write path still works
    • test_warn_removed_policy_fields_emits_deprecation_warning / test_warn_removed_policy_fields_silent_when_clean — new warn helper
  • non-blocking note (pi07_paligemma still passes load_pretrained_paligemma=False) — left alone per your earlier instruction in comment 4347923227.

Tests: pytest -m "not gpu" -n auto --ignore=tests/planner/test_planner.py --ignore=tests/utils/test_libero_utils.py --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_async_vector_env --deselect=tests/envs/test_factory.py::TestMakeEnv::test_make_env_sync_vector_env tests/720 passed, 16 skipped in 127s. (3 errors in tests/utils/test_hub.py::test_save_pretrained require HUGGINGFACE_HUB_TOKEN which CI provides but the sandbox does not — unrelated to this PR.) pre-commit run over the changed files passed.

@shuheng-liu shuheng-liu merged commit 3786dde into main Apr 30, 2026
7 checks passed
@shuheng-liu shuheng-liu deleted the claude/review-weight-init-GQYCz branch April 30, 2026 00:21
claude Bot pushed a commit that referenced this pull request Apr 30, 2026
Merge origin/main into claude/sdpa-ckpt-pi0-pi06.

- addresses @shuheng-liu (merge conflicts): merged main (PR #188 removed
  init_strategy and load_pretrained_paligemma / load_pretrained_gemma3).
  Three docstring conflicts resolved in
  src/opentau/policies/pi0/paligemma_with_expert.py,
  src/opentau/policies/pi06/configuration_pi06.py, and
  src/opentau/policies/pi06/gemma3_with_expert.py — kept the SDPA additions
  from this PR while dropping the docstring lines for fields that main
  removed (init_strategy, load_pretrained_paligemma, load_pretrained_gemma3).
  The PR's __init__ signatures were already free of those parameters.

tests: passed — pytest tests/policies tests/configs -m "not gpu" --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py -n auto

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants