Fix MaxText 22b.sh AOT compilation OOM on v4-128 #4081
Merged
This change enables vocabulary tiling (num_vocab_tiling=8) and forces flash attention (attention=flash) in 22b.sh. This resolves the HBM OOM when running AOT compilation for v4-128 with per_device_batch_size=13. By using vocabulary tiling, we save ~1.4 GB HBM, allowing the model to fit without requiring TP=2 or activation offloading, maintaining the original performance intent of the config.
Codecov Report: All modified and coverable lines are covered by tests.
khatwanimohit
approved these changes
Jun 5, 2026
YixuanWang-99
approved these changes
Jun 5, 2026
Description
This PR resolves the TPU HBM Out-of-Memory (OOM) error during Ahead-of-Time (AOT) compilation for the v4-128 topology with per_device_batch_size=13 by enabling Vocabulary Tiling and forcing Flash Attention in the 22b.sh configuration.
Details
22b.sh fails with an HBM OOM during compilation.
Enabling vocabulary tiling (num_vocab_tiling=8) and forcing flash attention (attention=flash) reduces peak memory usage and compiler fragmentation, allowing the compilation to succeed.
FIXED: b/517329766
BUGS: b/517329766
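The settings named in Details can be sketched as a set of key=value overrides. This is a minimal illustration only, not the actual contents of 22b.sh: the three key=value pairs come from this PR's description, while the variable name OVERRIDES and the way the script hands them to MaxText are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: NOT the real contents of 22b.sh.
# The key=value pairs are taken from the PR description; the variable
# name and how 22b.sh forwards them to MaxText are assumptions.
OVERRIDES="per_device_batch_size=13 num_vocab_tiling=8 attention=flash"

# Print one override per line, as they might be appended to the run command.
for kv in $OVERRIDES; do
  echo "$kv"
done
```

Keeping the overrides in one place makes it easy to see which settings differ from the defaults when comparing a failing and a passing compilation.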
Tests
The fix was verified by running the AOT compilation script directly on a TPU VM targeting the 2-slice v4-128 topology.
Command to Reproduce
Results
Compilation completed successfully:
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.