
Add custom mesh and logical rule support #3438

Merged
copybara-service[bot] merged 1 commit into main from chengnuojin-custom-logical
Mar 19, 2026

@NuojCheng NuojCheng commented Mar 18, 2026

Description

This pull request adds the ability to define and use custom physical meshes and logical axis rules in MaxText via external YAML configuration files. This makes complex parallelism strategies easier to express, such as those required for large-scale MoE setups.

Key Changes

Configuration Enhancements

  • Custom Config Loading: Added a custom_mesh_and_rule field to base.yml.
  • Dynamic YAML Parsing: Implemented logic in src/maxtext/configs/types.py to automatically load mesh_axes, logical_axis_rules, and data_sharding from specified YAML files located in src/maxtext/configs/custom_mesh_and_rule/.
  • Example Configs: Provided two initial custom configurations: pure-fsdp.yml and a more complex ds3-large-pp.yml.
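
As a rough illustration of the loading behavior described above, the sketch below overlays the three mesh-related keys from an already-parsed custom file onto a base config. The function name apply_custom_mesh_and_rule, the OVERRIDABLE_KEYS constant, and the example values are hypothetical; the actual logic in src/maxtext/configs/types.py and the real contents of pure-fsdp.yml may differ.

```python
# Hypothetical sketch; names and values are illustrative, not MaxText's actual API.
OVERRIDABLE_KEYS = ("mesh_axes", "logical_axis_rules", "data_sharding")


def apply_custom_mesh_and_rule(base_config: dict, custom: dict) -> dict:
  """Return a copy of base_config with the mesh-related keys taken from custom."""
  merged = dict(base_config)
  for key in OVERRIDABLE_KEYS:
    if key in custom:
      merged[key] = custom[key]
  return merged


# A simplified stand-in for the defaults in base.yml (values assumed).
base = {
    "mesh_axes": ["data", "fsdp", "tensor"],
    "logical_axis_rules": [["activation_batch", ["data", "fsdp"]], ["embed", "tensor"]],
    "data_sharding": [["data", "fsdp", "tensor"]],
    "learning_rate": 3e-4,
}

# What a parsed FSDP-only custom file might contain (contents assumed).
custom = {
    "mesh_axes": ["fsdp"],
    "logical_axis_rules": [["activation_batch", ["fsdp"]], ["embed", ["fsdp"]]],
    "data_sharding": [["fsdp"]],
}

config = apply_custom_mesh_and_rule(base, custom)
```

Only the three keys named in this PR are overridden in the sketch; unrelated settings such as learning_rate pass through unchanged, and the base dictionary itself is not mutated.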

Robust Utility Functions

  • Added helper functions in src/maxtext/utils/max_utils.py for safer access to mesh properties:
    • get_mesh_axes_size: Returns the size of a physical axis or 1 if it doesn't exist.
    • get_logical_rule_contents: Safely retrieves physical axes corresponding to a logical name.
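
A minimal sketch of what these two helpers could look like is given below. The function names come from this PR, but the signatures, the return types, and the handling of string and None rule values are assumptions, and the versions in src/maxtext/utils/max_utils.py may differ. The mesh is modeled as any object with a shape mapping from axis name to size.

```python
from types import SimpleNamespace


def get_mesh_axes_size(mesh, axis_name):
  """Size of a physical mesh axis, or 1 if the mesh has no such axis (assumed signature)."""
  return mesh.shape.get(axis_name, 1)


def get_logical_rule_contents(logical_axis_rules, logical_name):
  """Physical axes of the first rule for logical_name, as a tuple (assumed signature).

  Returns an empty tuple when no rule matches or the rule maps to None.
  """
  for name, physical in logical_axis_rules:
    if name == logical_name:
      if physical is None:
        return ()
      if isinstance(physical, str):
        return (physical,)
      return tuple(physical)
  return ()


# Stand-in for a mesh: only the `shape` mapping is used here.
mesh = SimpleNamespace(shape={"fsdp": 8})
rules = [["activation_batch", ["fsdp"]], ["embed", "fsdp"], ["norm", None]]

fsdp_size = get_mesh_axes_size(mesh, "fsdp")
data_size = get_mesh_axes_size(mesh, "data")  # axis absent from this mesh
embed_axes = get_logical_rule_contents(rules, "embed")
missing_axes = get_logical_rule_contents(rules, "not_a_rule")
```

Returning 1 for a missing axis keeps size products well defined, and returning an empty tuple for a missing rule lets callers iterate without a separate existence check.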

Code Refactoring for Mesh Flexibility

  • Updated src/maxtext/layers/attention_op.py, src/maxtext/utils/sharding.py, and src/maxtext/utils/train_utils.py to use the new utility functions. This ensures the model can compile even when standard axes (like data or fsdp) are absent or redefined in a custom mesh.
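
To show why this refactoring matters, the sketch below contrasts direct indexing of a mesh shape with a lookup that tolerates missing axes. The helper names axis_size_or_one and batch_parallelism and the mesh shape are hypothetical and are not taken from the changed files.

```python
import math


def axis_size_or_one(mesh_shape, axis_name):
  """Size of axis_name in mesh_shape, or 1 if the axis is not defined."""
  return mesh_shape.get(axis_name, 1)


def batch_parallelism(mesh_shape, axes=("data", "fsdp")):
  """Product of the sizes of the given axes, treating absent axes as size 1."""
  return math.prod(axis_size_or_one(mesh_shape, axis) for axis in axes)


# A custom mesh that defines no "data" axis (shape assumed for illustration).
custom_mesh_shape = {"fsdp": 8, "expert": 4}

try:
  # Code that hard-codes the standard axis names fails on this mesh.
  direct = custom_mesh_shape["data"] * custom_mesh_shape["fsdp"]
except KeyError:
  direct = None

# The tolerant lookup still yields a usable value.
safe = batch_parallelism(custom_mesh_shape)
```

On this mesh the direct lookup raises KeyError, while the tolerant version treats the missing data axis as size 1 and returns the fsdp size alone.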

Tests

Added tests/unit/custom_mesh_and_rule_test.py to verify successful compilation using the new custom mesh flags for both FSDP and multi-axis configurations.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov Bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Comment thread src/maxtext/configs/custom_mesh_and_rule/pipeline-large-moe.yml
Comment thread src/maxtext/configs/custom_mesh_and_rule/pure-fsdp.yml
Comment thread src/maxtext/configs/base.yml Outdated
Comment thread src/maxtext/utils/max_utils.py Outdated
Comment thread src/maxtext/utils/max_utils.py Outdated
Comment thread tests/unit/custom_mesh_and_rule_test.py

@gobbleturk gobbleturk left a comment

I love the idea of increased visibility and sharability!

@NuojCheng NuojCheng force-pushed the chengnuojin-custom-logical branch 2 times, most recently from 2b2826a to 77b676e Compare March 18, 2026 21:59
@NuojCheng NuojCheng requested a review from gobbleturk March 18, 2026 21:59
@NuojCheng NuojCheng force-pushed the chengnuojin-custom-logical branch from 77b676e to cf5ac08 Compare March 18, 2026 22:00

@richjames0 richjames0 left a comment

to the moon

@NuojCheng NuojCheng force-pushed the chengnuojin-custom-logical branch from cf5ac08 to 307bc11 Compare March 19, 2026 16:52

@gobbleturk gobbleturk left a comment

Let the sharding sharing begin!

@copybara-service copybara-service Bot merged commit e1a2ba7 into main Mar 19, 2026
31 of 32 checks passed
@copybara-service copybara-service Bot deleted the chengnuojin-custom-logical branch March 19, 2026 22:45
3 participants