Add Context parallelism to Wan 2.1 #200

Merged
entrpn merged 24 commits into main from wan_context_parallelism_inference on Jul 15, 2025

Add Context parallelism to Wan 2.1#200
entrpn merged 24 commits intomainfrom
wan_context_parallelism_inference

Conversation

entrpn (Collaborator) commented on Jul 9, 2025

  • Adds context parallelism to the flash attention fn.
  • Adds better sharding constraints when reshaping and padding to reduce all-gathers (AGs); a rough sketch of this idea follows below.
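
A minimal JAX sketch of the sharding-constraint idea above (not the PR's actual code; the mesh axis names, tensor shapes, and helper function are illustrative assumptions):

```python
# Sketch: keep attention inputs sharded along the sequence axis across a
# "context" mesh axis so the compiler does not all-gather the full sequence.
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

devices = mesh_utils.create_device_mesh((1, len(jax.devices())))
mesh = Mesh(devices, axis_names=("data", "context"))

def shard_for_context_parallel_attention(q, k, v):
  # Constrain [batch, seq, heads, head_dim] so seq is split on "context".
  spec = NamedSharding(mesh, P("data", "context", None, None))
  return tuple(jax.lax.with_sharding_constraint(x, spec) for x in (q, k, v))

batch, seq, heads, head_dim = 2, 1024, 8, 64
q = k = v = jnp.zeros((batch, seq, heads, head_dim))
q, k, v = jax.jit(shard_for_context_parallel_attention)(q, k, v)
```

A full context-parallel flash attention additionally has to exchange K/V blocks across the "context" axis (e.g. ring-style); the constraint above only keeps activations sequence-sharded so reshape and pad steps do not trigger all-gathers.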

entrpn requested a review from susanbao on Jul 9, 2025 20:29
Comment thread on src/maxdiffusion/max_utils.py (outdated):
```diff
 if multi_slice_env:
   dcn_parallelism = fill_unspecified_mesh_axes(dcn_parallelism, num_slices, "DCN")
-  mesh = mesh_utils.create_hybrid_device_mesh(ici_parallelism, dcn_parallelism, devices)
+  mesh = mesh_utils.create_hybrid_device_mesh(ici_parallelism, dcn_parallelism, devices, allow_split_physical_axes=config.allow_split_physical_axes)
```
A collaborator commented:

I also discovered that Cloud TPUs don't have a "slice_index" attribute, which is used in create_hybrid_device_mesh and also a couple of lines up to determine whether we're on DCN or not. Maybe we should pass process_is_granule=True to create_hybrid_device_mesh.
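A minimal sketch of that suggestion (illustrative parallelism shapes, not maxdiffusion's actual config): create_hybrid_device_mesh accepts process_is_granule=True, which groups devices by host process instead of by the slice_index attribute, and the allow_split_physical_axes flag from the diff above can be passed alongside it:

```python
# Sketch: build a hybrid ICI/DCN mesh on hosts whose devices lack
# slice_index, treating each host process as the DCN granule.
import jax
from jax.experimental import mesh_utils

ici_parallelism = [1, len(jax.local_devices()), 1]  # within-granule shape (illustrative)
dcn_parallelism = [jax.process_count(), 1, 1]       # one granule per process

devices = mesh_utils.create_hybrid_device_mesh(
    ici_parallelism,
    dcn_parallelism,
    jax.devices(),
    process_is_granule=True,  # group by process id instead of slice_index
    allow_split_physical_axes=True,
)
```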

Comment thread on src/maxdiffusion/models/attention_flax.py
coolkp (Collaborator) commented on Jul 14, 2025

Awesome PR! Excited for the perf gains!

@entrpn entrpn merged commit 4a6f807 into main Jul 15, 2025
2 of 3 checks passed