feat(wan): Add text encoder batching and optional scan loop for diffusion #397

copybara-service[bot] merged 1 commit into `main`
Conversation
🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.
🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This PR introduces two valuable performance optimizations for the WAN pipeline: batched text encoding and a `jax.lax.scan`-based diffusion loop. These changes improve compute efficiency and reduce Python loop overhead during inference. The implementation is clean and integrates well with the existing architecture.
🔍 General Feedback
- Optimization Consistency: The batched text encoder logic correctly handles the partitioning of embeddings back into positive and negative sets, ensuring compatibility with the existing API.
- Robustness: I've identified one potentially unsafe access to the `config` object in the scan loop path, which could lead to a crash if `config` is `None`. A simple fix has been suggested.
- Performance: The use of `jax.lax.scan` for the non-cache path is a great addition for performance-sensitive workloads on TPU/GPU.
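The `None`-safety concern above could be addressed with a simple guard before any attribute access. A minimal sketch — `use_scan_loop` and its body are illustrative, not the PR's actual fix; only the `config` / `scan_diffusion_loop` names come from the PR:

```python
def use_scan_loop(config) -> bool:
    """Return True only when a config object exists and opts into the scan path.

    Hypothetical helper sketching the suggested fix: guard against a missing
    config before reading the flag, instead of accessing config.scan_diffusion_loop
    unconditionally.
    """
    if config is None:
        # Attribute access on None would raise; fall back to the plain loop.
        return False
    # getattr with a default also tolerates configs created before the flag existed.
    return getattr(config, "scan_diffusion_loop", False)
```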
Done! PTAL @mbohlool
This is great @Perseus14! Do we know how much speed gain we get from (1) text encoding batching and (2) the scan diffusion loop, respectively?
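One way to attribute the gains to each optimization is to time the two stages in isolation, toggling one flag at a time. A minimal timing helper sketch (the helper itself is hypothetical; for JAX calls, the result should be materialized, e.g. via `jax.block_until_ready`, before timing stops, so async dispatch doesn't hide the real cost):

```python
import time


def timed(label, fn, *args, **kwargs):
    """Run fn once, print its wall-clock time, and return (result, seconds).

    Call it around the stage of interest, e.g.
    timed("encode_prompt", pipeline.encode_prompt, ...) with the batching flag
    on and off, then around the denoise loop with scan on and off.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed
```

Comparing four runs (each flag on/off independently) would separate the two contributions.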
This PR introduces several key optimizations for the WAN pipelines (T2V 2.1/2.2 and I2V 2.1/2.2) to improve performance, CPU resource utilization, and TPU execution efficiency:

- **`bfloat16` loading**: Dynamically maps the JAX `config.weights_dtype` to a PyTorch `torch_dtype` (enabling `bfloat16` CPU execution) to cut memory bandwidth consumption in half on the CPU host.
- **CPU compilation (`torch.compile`)**: Compiles the T5 model for CPU using PyTorch's compiler to fuse kernels and maximize weight reuse in the CPU cache, achieving ultra-fast CPU inference.
- **Scan-based diffusion loop**: Uses `jax.lax.scan` for the non-cache path of the diffusion process in all four main WAN pipelines. This avoids Python loop overhead while remaining compatible with `scan_layers: true` at the layer level. For WAN 2.2 pipelines, it uses `jax.lax.cond` to switch between the dual transformers at each step.
- **Timing instrumentation**: Adds timing instrumentation (via a `trace` dictionary) to all pipelines to support the `TIMING SUMMARY` printout in `generate_wan.py`, providing visibility into Conditioning, Denoise Total, and VAE Decode times.

Changes
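The scan-based loop with a per-step transformer switch described in the summary can be sketched as follows. This is a toy illustration, not the pipeline's actual `run_inference`: the two step functions stand in for the WAN 2.2 dual transformers, and the boundary logic is an assumption.

```python
import jax
import jax.numpy as jnp


# Illustrative stand-ins for the two WAN 2.2 transformers; each "denoising
# step" here is a toy latent update, not a real model call.
def high_noise_step(latents, t):
    return latents - 0.1 * t * latents


def low_noise_step(latents, t):
    return latents - 0.05 * t * latents


def run_scan_loop(latents, timesteps, boundary_index):
    """Sketch of a jax.lax.scan diffusion loop that uses jax.lax.cond to
    switch between two transformers at a step boundary."""

    def body(carry, xs):
        step_idx, t = xs
        # lax.cond selects one branch per step inside the traced computation,
        # so the whole loop compiles to a single XLA while-loop with no
        # Python-level per-step overhead.
        new_carry = jax.lax.cond(
            step_idx < boundary_index,
            lambda c: high_noise_step(c, t),
            lambda c: low_noise_step(c, t),
            carry,
        )
        return new_carry, None

    xs = (jnp.arange(timesteps.shape[0]), timesteps)
    final, _ = jax.lax.scan(body, latents, xs)
    return final
```

For the single-transformer 2.1 pipelines, the `lax.cond` would simply be dropped and the body would call the one model directly.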
maxdiffusion/pipelines/wan

[MODIFY] wan_pipeline.py
- Maps `config.weights_dtype` to a PyTorch `torch_dtype` using `getattr`.
- Applies `torch.compile(text_encoder)` inside `load_text_encoder` for CPU optimization.
- Updates `encode_prompt` to batch positive and negative prompts when `use_batched_text_encoder` is enabled in the config.

[MODIFY] wan_pipeline_2_2.py, wan_pipeline_i2v_2p2.py
- Refactors the `run_inference` methods using `jax.lax.scan` and `jax.lax.cond`.

[MODIFY] wan_pipeline_2_1.py, wan_pipeline_i2v_2p1.py
- Refactors the diffusion loop with `jax.lax.scan` (without needing `lax.cond`, as they use a single transformer).
- Adds a `trace` dictionary return from `__call__` to support the timing summary.

maxdiffusion/configs
[MODIFY] All 5 WAN config files (`base_wan_*.yml`)
- Adds `use_batched_text_encoder: False` by default.
- Adds `scan_diffusion_loop: False` by default, with a warning that enabling it will disable per-step profiling.

Generation Time
Environment & Configuration:
Command: https://paste.googleplex.com/6221970925551616