
NNX train #3442

Draft

charlesli640 wants to merge 5 commits into AI-Hypercomputer:main from CIeNET-International:charlesli/nnx_train

Conversation

@charlesli640
Collaborator

Description

Implement pre-training in NNX style.

Tests

python3 src/maxtext/trainers/pre_train/nnx_train.py src/maxtext/configs/base.yml \
run_name="run_llama2_7b" \
model_name="llama2-7b" \
dataset_type=synthetic \
steps=10 \
scan_layers=True \
debug_sharding=True \
async_checkpointing=False \
remat_policy=full \
checkpoint_storage_use_zarr3=false \
enable_checkpointing=false \
enable_nnx=true \
pure_nnx_decoder=true
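The test command above layers `key=value` overrides on top of a base YAML config. As a minimal, hypothetical illustration of how such overrides can be parsed (MaxText's real parsing lives in its own config module; the function name `parse_overrides` and the coercion rules here are assumptions for illustration only):

```python
def parse_overrides(argv):
    """Turn CLI args like ["steps=10", "enable_nnx=true"] into a dict,
    with simple type coercion for booleans and integers.

    Illustrative sketch only; not MaxText's actual config parser.
    """
    overrides = {}
    for arg in argv:
        key, _, raw = arg.partition("=")
        if raw.lower() in ("true", "false"):
            overrides[key] = raw.lower() == "true"  # booleans like enable_nnx=true
        elif raw.isdigit():
            overrides[key] = int(raw)  # integers like steps=10
        else:
            overrides[key] = raw  # everything else stays a string
    return overrides
```

With this sketch, the overrides from the command above would coerce to `{"steps": 10, "enable_nnx": True, "run_name": "run_llama2_7b", ...}`.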

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Mar 18, 2026

Collaborator

@bvandermoon bvandermoon left a comment


Thank you @charlesli640. Is it possible to migrate train.py directly instead of forking this logic? I am concerned that these two files will get out of sync before they are merged, that this setup will skip running unit tests, and that it makes the code more complicated and harder to follow.

@charlesli640
Collaborator Author

Thank you @charlesli640. Is it possible to migrate train.py directly instead of forking this logic? I am concerned that these two files will get out of sync before they are merged, that this setup will skip running unit tests, and that it makes the code more complicated and harder to follow.

We can definitely move the logic into train.py and gate it behind an enable_nnx config flag. This PR is actually one of the experimental solutions I am exploring internally: creating a brand-new pre-training loop in pure NNX style while leaving the old Linen-style pre-training untouched and co-existing.
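For concreteness, the gating idea could look something like the following sketch. Everything here is hypothetical (`Config`, `train_linen`, `train_nnx` are placeholder names, not MaxText code); it only shows the dispatch pattern, with the Linen path staying the default.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Placeholder config; the real MaxText config carries many more fields.
    enable_nnx: bool = False
    pure_nnx_decoder: bool = False


def train_linen(config):
    # Stand-in for the existing Linen-style train loop.
    return "linen"


def train_nnx(config):
    # Stand-in for the new NNX-style train loop.
    return "nnx"


def train(config):
    # Single entry point: take the NNX path only when explicitly enabled,
    # so the existing Linen behavior is unchanged by default.
    if config.enable_nnx:
        return train_nnx(config)
    return train_linen(config)
```

The advantage of this shape is that both paths share one entry point and one set of unit tests, addressing the drift concern raised above.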

Another solution is submitted as PR #3427. That approach tries to keep and re-use the current Linen-style train loop as much as possible: it creates a TrainStateNNX class and makes the existing Linen functions compatible with both Linen and NNX models. Please also review PR #3427 so we can discuss which direction to take.
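The shape of that compatibility idea can be sketched roughly as follows: a wrapper state that exposes the attributes a Linen-style train loop expects (`step`, `params`, `apply_gradients`) while an NNX model could sit underneath. All names and the SGD-style update below are illustrative assumptions, not the actual PR #3427 code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TrainStateNNX:
    # Mirrors the interface of flax.training.train_state.TrainState closely
    # enough that an existing Linen-style train loop can consume it.
    step: int = 0
    params: Dict[str, Any] = field(default_factory=dict)

    def apply_gradients(self, grads: Dict[str, float], lr: float = 0.1) -> "TrainStateNNX":
        # Like TrainState.apply_gradients: return a new state with updated
        # params and an incremented step counter (plain SGD for illustration).
        new_params = {k: v - lr * grads.get(k, 0.0) for k, v in self.params.items()}
        return TrainStateNNX(step=self.step + 1, params=new_params)
```

Because the wrapper is immutable-in-style (each update returns a new state), it slots into a functional train loop the same way the Linen TrainState does.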

@charlesli640 charlesli640 marked this pull request as draft March 19, 2026 01:56
