Fix MaxText 22b.sh AOT compilation OOM on v4-128 by darisoy · Pull Request #4081 · AI-Hypercomputer/maxtext

darisoy · 2026-06-05T18:17:12Z

Description

This PR resolves the TPU HBM Out-of-Memory (OOM) error during Ahead-of-Time (AOT) compilation for the v4-128 topology with per_device_batch_size=13 by enabling Vocabulary Tiling and forcing Flash Attention in the 22b.sh configuration.

Details

The default configuration of 22b.sh fails with an HBM OOM during compilation.
Enabling vocabulary tiling (num_vocab_tiling=8) and forcing flash attention (attention=flash) reduces peak memory usage and compiler fragmentation, allowing the compilation to succeed.
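
As a rough illustration of why tiling lowers peak memory, the sketch below estimates the size of a logits array computed in one piece versus in `num_tiles` equal chunks with only one chunk live at a time. It is back-of-the-envelope arithmetic, not MaxText's implementation: the helper names, `seq_len`, `vocab_size`, and the 4-byte element size are assumptions for illustration, and only `per_device_batch_size=13` and the tile count of 8 come from this PR. It makes no claim about which axis MaxText splits, and it is not meant to reproduce the measured savings.

```python
def logits_bytes(tokens_per_device, vocab_size, bytes_per_element=4):
    """Bytes needed to hold a [tokens, vocab] logits array."""
    return tokens_per_device * vocab_size * bytes_per_element


def peak_logits_bytes(tokens_per_device, vocab_size, num_tiles, bytes_per_element=4):
    """Peak logits bytes when the work is split into num_tiles chunks.

    Assumes only one chunk is materialized at a time, so the largest
    chunk determines the peak.
    """
    # Ceiling division: the last chunk may be smaller, the first is the largest.
    tokens_per_tile = -(-tokens_per_device // num_tiles)
    return logits_bytes(tokens_per_tile, vocab_size, bytes_per_element)


if __name__ == "__main__":
    per_device_batch_size = 13  # from the PR description
    num_vocab_tiling = 8        # from the PR description
    seq_len = 2048              # hypothetical, not stated in the PR
    vocab_size = 32_000         # hypothetical, not stated in the PR

    tokens = per_device_batch_size * seq_len
    full = logits_bytes(tokens, vocab_size)
    tiled = peak_logits_bytes(tokens, vocab_size, num_vocab_tiling)
    print(f"untiled logits: {full / 2**30:.2f} GiB")
    print(f"tiled peak:     {tiled / 2**30:.2f} GiB")
```

Under these assumptions the peak shrinks roughly in proportion to the tile count. The real saving depends on the model's actual sequence length, vocabulary size, dtype, and sharding.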

FIXED: b/517329766
BUGS: b/517329766

Tests

The fix was verified by running the AOT compilation script directly on a TPU VM targeting the 2-slice v4-128 topology.

Command to Reproduce

bash src/maxtext/configs/tpu/v4/22b.sh \
  EXECUTABLE=train_compile \
  M_COMPILE_TOPOLOGY=v4-128 \
  M_COMPILE_TOPOLOGY_NUM_SLICES=2 \
  DATASET_PATH=dummy-dataset \
  OUTPUT_PATH=dummy-output-dir \
  RUN_PREFLIGHT=false

Results

Compilation completed successfully:

Jitting and compilation complete!
Finished train_compile.py successfully!

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

This change enables vocabulary tiling (num_vocab_tiling=8) and forces flash attention (attention=flash) in 22b.sh, which resolves the HBM OOM when running AOT compilation for v4-128 with per_device_batch_size=13. Vocabulary tiling saves ~1.4 GB of HBM, so the model fits without requiring TP=2 or activation offloading, and the config keeps its original performance intent.

codecov · 2026-06-05T18:21:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


darisoy marked this pull request as ready for review June 5, 2026 18:26

khatwanimohit approved these changes Jun 5, 2026


YixuanWang-99 approved these changes Jun 5, 2026


github-actions Bot added the pull ready label Jun 5, 2026

copybara-service Bot merged commit b2153a3 into main Jun 5, 2026
48 checks passed

copybara-service Bot deleted the fix-22b-sh-oom branch June 5, 2026 22:45
