feat(wan22): native DiT checkpoint path -- drops the diffusers DiT remap #224
Open
wenqingw-nv wants to merge 3 commits into
Follow-up from PR #155 review (tracked in #203). Ruilong asked whether the DiT key remap could be simplified by loading a different checkpoint (as with the VAE ``.pth``).

Finding: yes -- entirely. Upstream's *native* ``Wan-AI/Wan2.2-TI2V-5B`` DiT checkpoint (sharded safetensors + index) uses byte-for-byte the ``WanDiTNetwork`` key names (``blocks.N.self_attn.q``, ``text_embedding.0``, ``head.head`` ...), because our network was ported from native Wan. Verified an 825<->825 name-identical bijection, so it loads with ``state_dict_transform=None`` -- no analogue of the ~25-rule ``_WAN22_TI2V_5B_DIT_KEY_REMAP`` is needed. ``load_checkpoint`` already handles the ``.safetensors.index.json`` shard format.

- Add ``WAN22_TI2V_5B_DIT_NATIVE_PATH`` (+ docstring documenting the zero-remap finding) and export it.
- Note the native alternative on the diffusers remap dict / module.
- ``test_native_dit_checkpoint_needs_no_remap`` (manual; fetches the ~250 KB index, no weights) guards the key-identity claim.

Kept opt-in: diffusers stays the production default until a GPU decode-parity smoke confirms the two checkpoints decode identically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
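The key-identity check described in this commit needs no weights, because a `.safetensors.index.json` file carries a `weight_map` from tensor name to shard filename. The following is a minimal sketch and not the PR's actual test: `check_key_bijection`, the toy index, and the shard names are hypothetical, and in a real test the expected names would presumably come from the `WanDiTNetwork` state dict.

```python
import json


def check_key_bijection(index_json: str, model_keys: set[str]) -> dict[str, list[str]]:
    """Compare checkpoint tensor names against the model's parameter names.

    Hypothetical helper: `index_json` is the text of a `.safetensors.index.json`
    file, whose `weight_map` maps each tensor name to the shard that stores it.
    """
    checkpoint_keys = set(json.loads(index_json)["weight_map"])
    return {
        "missing_from_checkpoint": sorted(model_keys - checkpoint_keys),
        "unexpected_in_checkpoint": sorted(checkpoint_keys - model_keys),
    }


# Toy index with made-up shard names; a real index lists every tensor.
toy_index = json.dumps({
    "metadata": {"total_size": 0},
    "weight_map": {
        "blocks.0.self_attn.q.weight": "shard-00001.safetensors",
        "text_embedding.0.weight": "shard-00001.safetensors",
        "head.head.weight": "shard-00002.safetensors",
    },
})
toy_model_keys = {
    "blocks.0.self_attn.q.weight",
    "text_embedding.0.weight",
    "head.head.weight",
}

report = check_key_bijection(toy_index, toy_model_keys)
# Both lists empty means the names match one-to-one, so no remap is needed.
is_bijection = not report["missing_from_checkpoint"] and not report["unexpected_in_checkpoint"]
```

Reporting the two difference lists separately, rather than a single boolean, makes a failure point directly at the keys that would need a remap rule.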
7 tasks
…aming

GPU-box verification of the native-DiT finding: the native `Wan-AI/Wan2.2-TI2V-5B` DiT and the diffusers checkpoint (after the ~25-rule remap) carry bit-identical weights -- 825/825 fp32 tensors, max |Δ| = 0.0. Adds `test_native_dit_matches_diffusers_weights` (manual) to lock that in.

Correct the earlier framing: the remap can NOT be deleted. The HY distilled checkpoint ships in diffusers-key format and routes through `wan22_ti2v_5b_dit_state_dict_transform` (hy_worldplay/_checkpoint.py), so the remap is load-bearing. The native path is a proven-equivalent, simpler source for the *base* (un-distilled) Wan 2.2 pipeline only; kept opt-in (diffusers default avoids the larger sharded fp32 download).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
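For readers unfamiliar with what a key remap does, here is a minimal sketch of a rule-based state-dict transform. The three rules are illustrative assumptions built around the native key names quoted in this PR; they are not the contents of `_WAN22_TI2V_5B_DIT_KEY_REMAP`, and `remap_keys` is a hypothetical stand-in for `wan22_ti2v_5b_dit_state_dict_transform`, whose real implementation is not shown here.

```python
# Illustrative (diffusers-style substring -> native-style substring) rules.
# ASSUMPTION: examples only, not the project's real ~25-rule table.
EXAMPLE_RULES = [
    (".attn1.to_q.", ".self_attn.q."),
    ("condition_embedder.text_embedder.linear_1.", "text_embedding.0."),
    ("proj_out.", "head.head."),
]


def remap_keys(state_dict: dict, rules=EXAMPLE_RULES) -> dict:
    """Return a new dict with each key rewritten by every matching rule."""
    remapped = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old, new in rules:
            new_key = new_key.replace(old, new)
        if new_key in remapped:
            # Two source keys mapping to one target would silently drop a tensor.
            raise KeyError(f"remap collision on {new_key!r}")
        remapped[new_key] = tensor
    return remapped


# String values stand in for tensors; only the key names matter here.
diffusers_style = {
    "blocks.0.attn1.to_q.weight": "W_q",
    "condition_embedder.text_embedder.linear_1.weight": "W_text",
    "proj_out.weight": "W_head",
}
native_style = remap_keys(diffusers_style)
```

A checkpoint that already uses the target names passes through such a transform unchanged, which is why the native path can load with `state_dict_transform=None`.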
`WAN22_TI2V_5B_DIT_DIFFUSERS_PATH` pointed at a single-file `transformer/diffusion_pytorch_model.safetensors` that does not exist: the diffusers repo ships 5 shards + a `.safetensors.index.json`, so the bare-filename URL returns 404 and any base-pipeline DiT load fails. (Went unnoticed because GPU CI always loads via `--ckpt-path`/distilled, never the base diffusers DiT.)

Point the constant at the index; `load_checkpoint` resolves shards from it. Found while running the `--example-data` rollout to verify the pose follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
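Resolving shards from an index can be sketched as follows, under the assumption that the `weight_map` values are shard filenames relative to the index file. `resolve_shard_urls`, the example URL, and the shard names are hypothetical; this is not the implementation of `load_checkpoint`.

```python
import json
import posixpath


def resolve_shard_urls(index_url: str, index_json: str) -> list[str]:
    """List the unique shard URLs referenced by a safetensors index.

    Shard filenames in `weight_map` are taken to be relative to the
    directory that holds the index file.
    """
    weight_map = json.loads(index_json)["weight_map"]
    base = posixpath.dirname(index_url)
    return [posixpath.join(base, name) for name in sorted(set(weight_map.values()))]


# Hypothetical URL and shard names, for illustration only.
example_index_url = "https://example.invalid/repo/transformer/model.safetensors.index.json"
example_index = json.dumps({
    "weight_map": {
        "a.weight": "model-00001-of-00002.safetensors",
        "b.weight": "model-00002-of-00002.safetensors",
        "c.weight": "model-00001-of-00002.safetensors",
    }
})
shard_urls = resolve_shard_urls(example_index_url, example_index)
```

Pointing a path constant at the index rather than at a shard means the set of files to fetch is discovered from the repository itself, so a change in shard count does not require a code change.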
Collaborator
Author
/ok to test be70dbe
Collaborator
liruilong940607
left a comment
Do we have any test that verifies the output is the same for the two different ways of loading the checkpoint?
Comment on lines +62 to +68
The ``transformer/`` subfolder ships 5 shards + a
``.safetensors.index.json``; there is **no** single-file
``diffusion_pytorch_model.safetensors`` (that bare-filename URL 404s --
it was the prior value of this constant, which broke any base-pipeline
load). ``load_checkpoint`` resolves the index directly. Loads via
:func:`wan22_ti2v_5b_dit_state_dict_transform` (the diffusers naming
differs from ours). This is the production default."""
Collaborator
Can you do a pass in a separate PR later to clean up all the chain-of-thought comments and docstrings left by the agent? Not only in this file but also in many places across the HY/wan22 integration.
Imagine you are a user reading the code and these docstrings. The user does not need to know many of the details here; the docstrings only need to explain enough for the user to know how to use the code. The history of how we got here doesn't need to be recorded.
One follow-up item from #203 (HY-WorldPlay #155).

Same question as the VAE follow-up, for the DiT remap (~25 rules).

Finding: the remap can be dropped entirely. Upstream's native `Wan-AI/Wan2.2-TI2V-5B` DiT checkpoint (sharded safetensors + index) uses byte-for-byte the `WanDiTNetwork` key names (`blocks.N.self_attn.q`, `text_embedding.0`, `head.head` ...), because our network was ported from native Wan. Verified an 825↔825 name-identical bijection, so it loads with `state_dict_transform=None`. `load_checkpoint` already handles the `.safetensors.index.json` shard format.

- `WAN22_TI2V_5B_DIT_NATIVE_PATH` (+ docstring documenting the zero-remap finding); note the native alternative on the diffusers remap dict.
- `test_native_dit_checkpoint_needs_no_remap` (manual; fetches the ~250 KB index, no weights) guards the key-identity claim.

Opt-in: diffusers stays the production default; a follow-up flips the default + deletes `_WAN22_TI2V_5B_DIT_KEY_REMAP` once a GPU decode-parity smoke confirms identical output. (Native DiT is sharded fp32 ~20 GB vs the diffusers single file; worth confirming download/merge cost in that smoke.)

Part of #203.