fix: enable ulysses sharding for custom kernels and improve scaling precision #396
Merged
copybara-service[bot] merged 1 commit into main on May 4, 2026
Conversation
Collaborator:
Thanks for fixing this @Perseus14! I tested on v6e (don't have v7x for now due to capacity) and the generation time is 204 sec. I think this generation time is expected, because on v7x we are seeing a 28% speed boost (140 vs 194.4 sec). From go/wan-dashboard the e2e generation time is 322 sec, so the speed gain is about 36%.
eltsai approved these changes on May 4, 2026
Description
This PR introduces two small but important fixes to the Ulysses attention implementation:
- In `src/maxdiffusion/pyconfig.py`, the check for Ulysses attention is changed from `attention == "ulysses"` to `"ulysses" in attention`. This ensures that custom attention implementations that include "ulysses" in their identifier (e.g., `custom_ulysses`) correctly trigger Ulysses sequence sharding instead of falling back to default sharding strategies.
- In `src/maxdiffusion/models/attention_flax.py`, padding was applied to the `query` variable, but the unpadded `query_scaled` was used in the attention calculation. This is fixed so that the padded variable is the one used in the attention calculation.
- In `src/maxdiffusion/models/attention_flax.py`, the hardcoded constant `1.44269504` used to scale queries for base-2 exponentiation is replaced with `math.log2(math.e)`. This provides better precision and makes the intent of the code clearer.

Generation Time
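The first change can be sketched as a simple predicate; the function name and surrounding context here are hypothetical, not the actual `pyconfig.py` code:

```python
def uses_ulysses(attention: str) -> bool:
    # Before the fix, an exact comparison (attention == "ulysses") missed
    # custom kernels such as "custom_ulysses", so they silently fell back
    # to the default sharding strategy.
    # After the fix, a substring check triggers Ulysses sequence sharding
    # for any attention identifier containing "ulysses".
    return "ulysses" in attention

print(uses_ulysses("ulysses"))         # True
print(uses_ulysses("custom_ulysses"))  # True
print(uses_ulysses("flash"))           # False
```

The trade-off of a substring match is that any future kernel name containing "ulysses" will opt into sequence sharding, which is exactly the behavior this PR wants for custom variants.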
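The precision point in the third change can be checked directly: the truncated literal differs from the true value of log2(e) in roughly the ninth decimal place. This is a standalone sketch, not maxdiffusion code:

```python
import math

LOG2_E = math.log2(math.e)   # full double-precision value of log2(e)
TRUNCATED = 1.44269504       # the hardcoded literal previously used

# Queries are scaled by log2(e) so that a base-2 exponential can replace
# the natural exponential in the softmax: exp(x) == 2 ** (x * log2(e)).
# The truncated literal is slightly too small, off by under 1e-9.
print(LOG2_E - TRUNCATED)
```

Using `math.log2(math.e)` also documents the intent (base conversion for `exp2`) rather than leaving an unexplained magic number in the kernel.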