
fix(hy_worldplay): load base Wan checkpoint into the HY network (no --ckpt-path path)#227

Open
wenqingw-nv wants to merge 1 commit into NVIDIA:main from wenqingw-nv:wenqing/hy-worldplay-base-ckpt-load


@wenqingw-nv
Collaborator

Spun off from #203 (surfaced while GPU-verifying the HY-WorldPlay follow-ups).

Bug: running the HY pipeline without --ckpt-path — the path the README documents as "loads base Wan 2.2, HY conditioners stay zero-init, strict identity / parity-safe vs base Wan" — actually raises:

RuntimeError: Error(s) in loading state_dict for HyWorldPlayWanDiTNetwork:
  Missing key(s): blocks.*.self_attn.o_prope.{weight,bias}, action_embedding.{0,2}.{weight,bias}

The base Wan checkpoint has no action_embedding / o_prope params, and Wan21Transformer.__init__ does a strict load_state_dict, so the load fails on those keys. The bug stayed latent because CI/GPU runs always pass --ckpt-path (the distilled checkpoint), and the smoke tests only assert the static config; they never perform a real base load.
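To make the failure mode concrete, here is a rough sketch of the key comparison a strict load performs. It uses plain Python sets rather than the real model, and the key names are shortened, hypothetical examples modelled on the error message above (in particular `blocks.0.self_attn.o.*` is an assumed name for a base Wan parameter).

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Classify keys the way a strict state-dict load does.

    missing    = the model expects them, the checkpoint lacks them
    unexpected = the checkpoint has them, the model does not
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, reduced to a single block for illustration.
base_wan_keys = [
    "blocks.0.self_attn.o.weight",
    "blocks.0.self_attn.o.bias",
]
hy_model_keys = base_wan_keys + [
    "blocks.0.self_attn.o_prope.weight",
    "blocks.0.self_attn.o_prope.bias",
    "action_embedding.0.weight",
    "action_embedding.0.bias",
    "action_embedding.2.weight",
    "action_embedding.2.bias",
]

missing, unexpected = diff_state_dict_keys(hy_model_keys, base_wan_keys)
# A strict load raises as soon as either list is non-empty.
print("missing:", missing)
print("unexpected:", unexpected)
```

With a base checkpoint, every HY-only parameter lands in `missing` and nothing lands in `unexpected`, which matches the shape of the reported RuntimeError.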

Fix: override HyWorldPlayWanDiTNetwork.load_state_dict to tolerate only the HY-specific zero-init keys (action_embedding.*, *.o_prope.*) when absent — they keep their zero-init values (the intended identity). Any other missing key, or any unexpected key, still raises (not a blanket strict=False). The distilled checkpoint carries these keys, so that path is unchanged.
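A minimal sketch of the tolerance rule described above, not the code in this PR. It assumes the override first collects the missing and unexpected key lists (in PyTorch, `load_state_dict(..., strict=False)` returns an object with `missing_keys` and `unexpected_keys`) and then applies a check like this one. The function names and regex patterns are hypothetical, derived only from the key names quoted in the description.

```python
import re

# Assumption: the tolerated keys are exactly those matching
# action_embedding.* or *.o_prope.*, as stated in the PR description.
TOLERATED_MISSING = (
    re.compile(r"^action_embedding\."),
    re.compile(r"(^|\.)o_prope\."),
)


def is_tolerated(key):
    """True if a missing key is one of the HY-specific zero-init params."""
    return any(pattern.search(key) for pattern in TOLERATED_MISSING)


def check_load_result(missing_keys, unexpected_keys):
    """Raise unless every missing key is tolerated and nothing is unexpected.

    Returns the tolerated missing keys, which keep their zero-init values.
    """
    fatal_missing = [k for k in missing_keys if not is_tolerated(k)]
    if fatal_missing or unexpected_keys:
        raise RuntimeError(
            "Error(s) in loading state_dict: "
            f"missing={fatal_missing}, unexpected={list(unexpected_keys)}"
        )
    return [k for k in missing_keys if is_tolerated(k)]


# Base-checkpoint case: only HY zero-init keys are absent, so the load passes.
kept_zero = check_load_result(
    ["action_embedding.0.weight", "blocks.3.self_attn.o_prope.bias"], []
)
print(kept_zero)
```

Keeping the allow-list explicit is what distinguishes this from a blanket `strict=False`: a typo in a base key name or a stray key in the checkpoint still fails loudly.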

Verified:

  • New CPU tests: the load tolerates missing HY zero-init keys (and confirms they stay zero), and still rejects any other missing key and any unexpected key.
  • End-to-end on RTX 6000 Ada: flashdreams-run hy-worldplay-wan-i2v-5b --example-data True --num-chunk 1 --pose w-3 (no --ckpt-path) → valid 13-frame 704×1280 mp4.
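For readers wondering why "stay zero" implies parity with base Wan, here is a toy illustration in plain Python. It assumes the HY branch is added to the base output through the zero-initialized `o_prope` projection; the PR does not show the architecture, so the residual form and the `block_output` helper are hypothetical.

```python
def linear(x, weight, bias):
    """y = W x + b on plain lists; weight holds one row per output element."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def block_output(base_out, extra_branch_in, o_prope_weight, o_prope_bias):
    """Assumed form: base output plus the projected HY-specific branch."""
    extra = linear(extra_branch_in, o_prope_weight, o_prope_bias)
    return [b + e for b, e in zip(base_out, extra)]


base_out = [0.5, -1.25, 2.0]   # what the base network would produce
extra_in = [3.0, 4.0]          # arbitrary HY conditioning features

# Zero-init projection: the extra branch contributes exactly nothing.
zero_w = [[0.0, 0.0] for _ in base_out]
zero_b = [0.0] * len(base_out)
out = block_output(base_out, extra_in, zero_w, zero_b)
print(out == base_out)
```

Under that assumption, leaving the tolerated keys at zero is what keeps the no `--ckpt-path` path equivalent to base Wan, whatever the conditioning inputs are.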

Part of #203.

Running the HY pipeline without `--ckpt-path` (README's "conditioners
stay zero-init, parity-safe vs base Wan" path) raised
`RuntimeError: Missing key(s)` — the base Wan checkpoint has no
`action_embedding` / `o_prope` params and `Wan21Transformer.__init__`
does a strict `load_state_dict`. (Latent because CI/GPU runs always pass
`--ckpt-path`; the smoke tests only assert the static config, never a
real base load.)

Override `HyWorldPlayWanDiTNetwork.load_state_dict` to tolerate *only*
those HY-specific zero-init keys when absent (they keep their zero-init
identity); any other missing key, or any unexpected key, still raises.
HY's distilled checkpoint carries the keys, so that path is unaffected.

Verified end-to-end: `flashdreams-run hy-worldplay-wan-i2v-5b
--example-data True --num-chunk 1 --pose w-3` (no `--ckpt-path`) now
produces a valid mp4. Adds CPU tests for the tolerate / still-reject
behaviour.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented May 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@wenqingw-nv
Collaborator Author

/ok to test 0270385

@wenqingw-nv wenqingw-nv enabled auto-merge May 30, 2026 00:57
@liruilong940607
Collaborator

Why would it load Wan2.2 base checkpoint? We only consider inference so it should directly load the HY ckpt


2 participants