Fix: LazyTurtle materialization for non-square fused experts #2707

Merged
Qubitium merged 3 commits into main from zx_fix_offload_disk_True_when_defuser
Apr 10, 2026
@ZX-ModelCloud
Collaborator

This PR fixes LazyTurtle rematerialization for fused MoE expert checkpoints that do not match the previously hard-coded split layout.

The original logic could incorrectly materialize tensors such as gate_proj.weight by assuming a fixed split dimension. That breaks models with rectangular expert projections and models that store fused expert weights in transposed layouts.
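
To illustrate the idea of deriving the layout from the target leaf shape rather than from a fixed split dimension, here is a minimal shape-only sketch. It is not the code from this PR: the function name `infer_split`, the 2-D restriction, and the default of two fused parts (for example gate and up projections) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def infer_split(
    fused_shape: Sequence[int],
    target_shape: Sequence[int],
    num_parts: int = 2,
) -> Tuple[int, bool]:
    """Hypothetical helper: return (split_dim, needs_transpose).

    Given the shape of one fused 2-D expert weight and the shape expected by
    the target leaf (e.g. gate_proj.weight), find the dimension along which
    the fused tensor must be split into `num_parts` equal chunks, and whether
    each chunk must then be transposed to match the target.
    """
    fused = tuple(fused_shape)
    target = tuple(target_shape)
    if len(fused) != 2 or len(target) != 2:
        raise ValueError("this sketch only handles 2-D weights")

    candidates: List[Tuple[int, bool]] = []
    for dim in range(2):
        if fused[dim] % num_parts != 0:
            continue  # cannot split evenly along this dimension
        # Shape of one chunk if we split along `dim`.
        chunk = tuple(
            size // num_parts if i == dim else size
            for i, size in enumerate(fused)
        )
        if chunk == target:
            candidates.append((dim, False))
        elif chunk[::-1] == target:
            candidates.append((dim, True))

    if len(candidates) != 1:
        # Shapes alone cannot decide; a real loader would need extra metadata.
        raise ValueError(
            f"cannot infer a unique split for fused={fused}, target={target}"
        )
    return candidates[0]


# Fused along rows, rectangular projection: split dim 0, no transpose.
row_fused = infer_split((8, 3), (4, 3))
# Fused along columns in a transposed layout: split dim 1, then transpose.
col_fused = infer_split((3, 8), (4, 3))
```

Note that shapes alone do not always settle the question. In this sketch a fused shape of (8, 8) with a target of (4, 8) matches both a row split and a transposed column split, so the helper raises instead of guessing. A square chunk also cannot reveal from its shape whether it needs transposing, and the sketch treats it as not transposed.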

…final tensor layout from the target leaf shape instead of relying on a hard-coded split dimension

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Comment thread tests/test_offload_files.py Dismissed
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@Qubitium changed the title from "[FIX] LazyTurtle fused MoE materialization for rectangular/transposed expert layouts" to "Fix: LazyTurtle materialization for non-square fused experts" on Apr 10, 2026
@Qubitium merged commit a888ab8 into main on Apr 10, 2026
6 checks passed
@Qubitium deleted the zx_fix_offload_disk_True_when_defuser branch on April 10, 2026 11:14
2 participants