
feat(wan22): native DiT checkpoint path — drops the diffusers DiT remap#224

Open
wenqingw-nv wants to merge 3 commits into NVIDIA:main from wenqingw-nv:wenqing/hy-worldplay-dit-native-ckpt

Conversation

@wenqingw-nv
Collaborator

One follow-up item from #203 (HY-WorldPlay #155).

Same question as the VAE follow-up, for the DiT remap (~25 rules).

Finding: the remap can be dropped entirely. Upstream's native Wan-AI/Wan2.2-TI2V-5B DiT checkpoint (sharded safetensors + index) uses key names byte-for-byte identical to WanDiTNetwork's (blocks.N.self_attn.q, text_embedding.0, head.head ...), because our network was ported from native Wan. I verified an 825↔825 name-identical bijection, so it loads with state_dict_transform=None. load_checkpoint already handles the .safetensors.index.json shard format.
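A minimal sketch of the key-identity check, assuming only the standard Hugging Face index layout (a JSON object whose `weight_map` maps each tensor name to its shard file). `check_key_bijection`, the shard names, and the toy keys are hypothetical illustrations, not the code in `test_native_dit_checkpoint_needs_no_remap`; in practice `model_keys` would be the keys of the network's `state_dict()`.

```python
import json


def check_key_bijection(index_json: str, model_keys: set[str]) -> tuple[set[str], set[str]]:
    """Compare tensor names in a safetensors index against a model's keys.

    Returns (missing, unexpected): model keys the checkpoint lacks, and
    checkpoint keys the model does not define. Both empty means the names
    form a bijection, so the checkpoint can load without a key remap.
    """
    # "weight_map" maps each tensor name to the shard file that stores it;
    # reading it needs only the small index file, not the weights.
    ckpt_keys = set(json.loads(index_json)["weight_map"])
    return model_keys - ckpt_keys, ckpt_keys - model_keys


# Toy index with hypothetical shard names, for illustration only.
EXAMPLE_INDEX = json.dumps({
    "metadata": {"total_size": 0},
    "weight_map": {
        "text_embedding.0.weight": "model-00001-of-00002.safetensors",
        "head.head.weight": "model-00002-of-00002.safetensors",
    },
})

missing, unexpected = check_key_bijection(
    EXAMPLE_INDEX, {"text_embedding.0.weight", "head.head.weight"}
)
print(missing, unexpected)  # set() set()
```

Checking set differences in both directions is what makes this a bijection test rather than a subset test: a checkpoint with extra keys would also fail.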

  • Add WAN22_TI2V_5B_DIT_NATIVE_PATH (+ docstring documenting the zero-remap finding); note the native alternative on the diffusers remap dict.
  • test_native_dit_checkpoint_needs_no_remap (manual; fetches the ~250 KB index, no weights) guards the key-identity claim.

Opt-in: diffusers stays the production default; a follow-up flips the default + deletes _WAN22_TI2V_5B_DIT_KEY_REMAP once a GPU decode-parity smoke confirms identical output. (Native DiT is sharded fp32 ~20 GB vs the diffusers single file — worth confirming download/merge cost in that smoke.)

Part of #203.

Follow-up from PR #155 review (tracked in #203). Ruilong asked whether
the DiT key remap could be simplified by loading a different checkpoint
(as with the VAE ``.pth``).

Finding: yes -- entirely. Upstream's *native* ``Wan-AI/Wan2.2-TI2V-5B``
DiT checkpoint (sharded safetensors + index) uses key names byte-for-byte
identical to ``WanDiTNetwork``'s (``blocks.N.self_attn.q``,
``text_embedding.0``, ``head.head`` ...), because our network was
ported from native Wan. Verified an 825<->825 name-identical bijection,
so it loads with ``state_dict_transform=None`` -- no analogue of the
~25-rule ``_WAN22_TI2V_5B_DIT_KEY_REMAP`` is needed. ``load_checkpoint``
already handles the ``.safetensors.index.json`` shard format.
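The PR doesn't show how ``load_checkpoint`` consumes the index, so the following is only a sketch of what resolving a sharded checkpoint generally involves: group tensor names by shard, with each shard filename resolved relative to the index file's directory. ``shards_from_index`` and the file names are hypothetical. A real loader would then read each shard (for example with ``safetensors.torch.load_file(path)``) and merge the resulting dicts.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath


def shards_from_index(index_path: str, index_json: str) -> dict[str, list[str]]:
    """Group tensor names by the shard file that stores them.

    Shard filenames in "weight_map" are relative to the index file's
    directory, so each one is joined onto that directory.
    """
    weight_map = json.loads(index_json)["weight_map"]
    base = PurePosixPath(index_path).parent
    shards: dict[str, list[str]] = defaultdict(list)
    for name, shard_file in weight_map.items():
        shards[str(base / shard_file)].append(name)
    # Sort shards and names so the load order is deterministic.
    return {path: sorted(names) for path, names in sorted(shards.items())}
```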

- Add ``WAN22_TI2V_5B_DIT_NATIVE_PATH`` (+ docstring documenting the
  zero-remap finding) and export it.
- Note the native alternative on the diffusers remap dict / module.
- ``test_native_dit_checkpoint_needs_no_remap`` (manual; fetches the
  ~250 KB index, no weights) guards the key-identity claim.

Kept opt-in: diffusers stays the production default until a GPU
decode-parity smoke confirms the two checkpoints decode identically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented May 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

wenqingw-nv and others added 2 commits May 29, 2026 23:35
…aming

GPU-box verification of the native-DiT finding: the native
`Wan-AI/Wan2.2-TI2V-5B` DiT and the diffusers checkpoint (after the
~25-rule remap) carry bit-identical weights -- 825/825 fp32 tensors,
max |Δ| = 0.0. Adds `test_native_dit_matches_diffusers_weights` (manual)
to lock that in.
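The reported max |Δ| = 0.0 comparison can be sketched as below. This is not `test_native_dit_matches_diffusers_weights` itself: `max_abs_diff` is a hypothetical helper, and tensors are modeled as flat float lists so the sketch runs without torch. With real tensors the per-tensor term would be `(a - b).abs().max()`, or `torch.equal(a, b)` for a strict identity check.

```python
def max_abs_diff(a: dict[str, list[float]], b: dict[str, list[float]]) -> float:
    """Largest element-wise |a - b| across two state dicts with identical keys.

    Tensors are modeled as flat lists of floats for illustration.
    Raises if the key sets or any tensor sizes differ.
    """
    if a.keys() != b.keys():
        raise KeyError(f"key sets differ: {sorted(a.keys() ^ b.keys())}")
    worst = 0.0
    for name in a:
        ta, tb = a[name], b[name]
        if len(ta) != len(tb):
            raise ValueError(f"{name}: size mismatch {len(ta)} vs {len(tb)}")
        for x, y in zip(ta, tb):
            worst = max(worst, abs(x - y))
    return worst
```

Checking key sets and sizes first matters: a max |Δ| of 0.0 is only meaningful if every tensor was actually compared.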

Correct the earlier framing: the remap can NOT be deleted. The HY
distilled checkpoint ships in diffusers-key format and routes through
`wan22_ti2v_5b_dit_state_dict_transform` (hy_worldplay/_checkpoint.py),
so the remap is load-bearing. The native path is a proven-equivalent,
simpler source for the *base* (un-distilled) Wan 2.2 pipeline only; kept
opt-in (diffusers default avoids the larger sharded fp32 download).
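To illustrate why the remap stays load-bearing, here is a sketch of a regex-based key remap plus a selector that returns `None` for native checkpoints and the remap for diffusers-format ones (the base diffusers DiT and the HY distilled checkpoint). The three rules are a hypothetical, illustrative subset; the real `_WAN22_TI2V_5B_DIT_KEY_REMAP` is not shown in this PR, and `remap_keys` / `pick_transform` are made-up names.

```python
import re
from typing import Callable, Optional

# Hypothetical, illustrative diffusers -> native rename rules (pattern, replacement).
EXAMPLE_REMAP_RULES: list[tuple[str, str]] = [
    (r"\.attn1\.to_q\.", ".self_attn.q."),
    (r"\.attn1\.to_k\.", ".self_attn.k."),
    (r"^proj_out\.", "head.head."),
]


def remap_keys(state_dict: dict, rules=EXAMPLE_REMAP_RULES) -> dict:
    """Rename every key by applying each regex rule in order."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, repl in rules:
            new_key = re.sub(pattern, repl, new_key)
        # Two source keys mapping to one target would silently drop a tensor.
        if new_key in out:
            raise KeyError(f"remap collision on {new_key!r}")
        out[new_key] = value
    return out


def pick_transform(checkpoint_format: str) -> Optional[Callable[[dict], dict]]:
    """Choose the state-dict transform for a (hypothetical) checkpoint format tag."""
    if checkpoint_format == "native":
        return None  # native Wan keys already match the network
    if checkpoint_format == "diffusers":
        return remap_keys  # base diffusers DiT and the HY distilled checkpoint
    raise ValueError(f"unknown checkpoint format: {checkpoint_format!r}")
```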

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`WAN22_TI2V_5B_DIT_DIFFUSERS_PATH` pointed at a single-file
`transformer/diffusion_pytorch_model.safetensors` that does not exist —
the diffusers repo ships 5 shards + a `.safetensors.index.json`, so the
bare-filename URL returns 404 and any base-pipeline DiT load fails.
(Went unnoticed because GPU CI always loads via `--ckpt-path`/distilled,
never the base diffusers DiT.) Point the constant at the index;
`load_checkpoint` resolves shards from it. Found while running the
`--example-data` rollout to verify the pose follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wenqingw-nv
Collaborator Author

/ok to test be70dbe

wenqingw-nv enabled auto-merge May 30, 2026 00:56
Collaborator

liruilong940607 left a comment


Do we have a test that verifies the two ways of loading the checkpoint produce the same output?
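One way to answer this is an output-parity check: feed identical inputs through the network loaded each way and compare the outputs. The sketch below stands in a single linear layer for the DiT forward pass; `tiny_forward`, `outputs_match`, and the parameter names are hypothetical. Since the weights were already shown to be bit-identical, exact equality (`atol=0.0`) is a reasonable target only if the two runs are otherwise deterministic (same seed, same inputs, deterministic kernels); otherwise a small tolerance is needed.

```python
def tiny_forward(params: dict[str, list], x: list[float]) -> list[float]:
    """Stand-in for a DiT forward pass: y = W @ x + b for one linear layer."""
    weight, bias = params["head.head.weight"], params["head.head.bias"]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def outputs_match(params_a: dict, params_b: dict, inputs: list[list[float]], atol: float = 0.0) -> bool:
    """Run the same inputs through both parameter sets and compare outputs."""
    for x in inputs:
        ya, yb = tiny_forward(params_a, x), tiny_forward(params_b, x)
        if len(ya) != len(yb) or any(abs(p - q) > atol for p, q in zip(ya, yb)):
            return False
    return True
```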

Comment on lines +62 to +68
The ``transformer/`` subfolder ships 5 shards + a
``.safetensors.index.json``; there is **no** single-file
``diffusion_pytorch_model.safetensors`` (that bare-filename URL 404s --
it was the prior value of this constant, which broke any base-pipeline
load). ``load_checkpoint`` resolves the index directly. Loads via
:func:`wan22_ti2v_5b_dit_state_dict_transform` (the diffusers naming
differs from ours). This is the production default."""
Collaborator


Can you do a pass in a separate PR later to clean up all those chain-of-thought comments / docstrings from the agent? Not only in this file, but in many places across the HY/wan22 integration.

Imagine you are a user reading the code and these docstrings: many of the details here don't need to be known by the user. The docstrings only need to explain enough for the user to know how to use the code. The history of how we got here doesn't need to be recorded.
