Add total memory usage logging in GB and unit tests for activation offload #1813
Conversation
MaxText/train.py (Outdated)
@@ -963,11 +963,16 @@ def train_loop(config, recorder, state=None):
compiled = p_train_step.lower(state, example_batch, nextrng).compile()
compiled_stats = compiled.memory_analysis()
if compiled_stats is not None:
total = (compiled_stats.output_size_in_bytes +
I realize the existing code doesn't follow this pattern, but can we move this formula + print statement into max_utils? Other files (e.g. sft_trainer or grpo_trainer) may also want to call this function, and it cleans up train.py, which we prefer to keep lean.
Yes, moved this formula + print statement into max_utils. Done.
Also, as requested in G Chat ("printing the stats at the end of the first step instead of at the end of all of the steps"): this is done. The logging now happens before the first step.
The current logs are in this order:
Memstats: After params initialized:
Using (GB) XX / YY (ZZ%) on cuda:0
Total memory size: AA GB, Output size: BB GB, Temp size: CC GB, Argument size: DD GB, Host temp size: EE GB.
completed step: 0, seconds: xxx, TFLOP/s/device: yyy, Tokens/s/device: zzz, total_weights: www, loss: ooo
To see full metrics 'tensorboard --logdir=/tmp/tmp.Exyqpj9oUF/logdir/tensorboard/'
completed step: 1, seconds: xxx, TFLOP/s/device: yyy, Tokens/s/device: zzz, total_weights: www, loss: ooo
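For reference, a minimal sketch of what the formula-plus-print helper moved into max_utils could look like; the function name and exact log wording are assumptions inferred from this conversation, not the merged code.

```python
# Hypothetical helper for MaxText/max_utils.py; name and log format are assumed.
from MaxText import max_logging


def print_compiled_memory_stats(compiled_stats):
  """Log the compiled train step's memory footprint in GB, broken down by component."""
  if compiled_stats is None:
    return
  gb = 1024 ** 3  # bytes per GiB
  total = (compiled_stats.output_size_in_bytes
           + compiled_stats.temp_size_in_bytes
           + compiled_stats.argument_size_in_bytes
           + compiled_stats.host_temp_size_in_bytes)
  max_logging.log(
      f"Total memory size: {total / gb:.1f} GB, "
      f"Output size: {compiled_stats.output_size_in_bytes / gb:.1f} GB, "
      f"Temp size: {compiled_stats.temp_size_in_bytes / gb:.1f} GB, "
      f"Argument size: {compiled_stats.argument_size_in_bytes / gb:.1f} GB, "
      f"Host temp size: {compiled_stats.host_temp_size_in_bytes / gb:.1f} GB."
  )
```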
@pytest.mark.integration_test
@pytest.mark.gpu_only
def test_gpu_activation_offload_without_scan(self):
Curious why we have scan=False; is this something we really care about?
We also want to keep the explicit loop (non-scan) version working with good performance, as it may be needed for some non-LLM models. Also, because the two paths use different HLOs (copy-start/done vs. dynamic-update-start/done and dynamic-slice-start/done), we want to track both.
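For context, a rough sketch of how the two tests could be shaped. This is only an illustration: the entry point (train.main), config path, run_name values, and the remat_policy flag value are assumptions and may not match the tests actually added in this PR.

```python
# Rough test sketch; flag values below are placeholders, not the PR's exact test code.
import pytest

from MaxText import train


class TestGpuActivationOffload:

  @pytest.mark.integration_test
  @pytest.mark.gpu_only
  def test_gpu_activation_offload_with_scan(self):
    # Scanned layers: host offload shows up as dynamic-(update-)slice start/done HLOs.
    train.main([
        None, "MaxText/configs/base.yml",
        "run_name=gpu_activation_offload_with_scan",
        "scan_layers=True",
        "remat_policy=minimal_offloaded",  # assumed name for the offloading remat policy
        "steps=2", "enable_checkpointing=False",
    ])

  @pytest.mark.integration_test
  @pytest.mark.gpu_only
  def test_gpu_activation_offload_without_scan(self):
    # Explicit layer loop: host offload shows up as copy-start/done HLOs.
    train.main([
        None, "MaxText/configs/base.yml",
        "run_name=gpu_activation_offload_without_scan",
        "scan_layers=False",
        "remat_policy=minimal_offloaded",  # assumed name for the offloading remat policy
        "steps=2", "enable_checkpointing=False",
    ])
```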
MaxText/train.py (Outdated)
@@ -841,6 +841,10 @@ def train_loop(config, recorder, state=None):
if step == first_profiling_step or prof.should_activate_periodic_profile(step):
optional_postfix = f"step_{step}" if config.profile_periodically_period > 0 else ""
prof.activate(blocking_object=state, optional_postfix=optional_postfix)
with mesh, nn_partitioning.axis_rules(config.logical_axis_rules):
This fails our internal linter since nextrng may not be defined yet. Is it possible to move this before the main train for loop (before "for step in np.arange(start_step, config.steps):")? You may have to plug in a random rng key.
Yes, moved this before the main train for loop and defined it in max_utils.py to keep train.py lean.
The data_iterator is copied to get the "example_batch" without disrupting the training loop. Please take a look.
MaxText/max_utils.py (Outdated)
import orbax.checkpoint as ocp

from tensorboardX import writer

from MaxText import max_logging
from MaxText.train import load_next_batch
We don't want max_utils to depend on train. Can we instead get a shaped version of the batch, similar to how it's done in train_compile here:
maxtext/MaxText/train_compile.py, line 102 (at 3296117):
shaped_batch = maxtext_utils.get_shaped_batch(config)
That makes sense. I took the maxtext_utils.get_shaped_batch(config) approach. Done.
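Combining the two threads above, a sketch of how the pre-loop memory report could be wired up; only maxtext_utils.get_shaped_batch(config) comes from the review itself, while the helper name, its arguments, and the use of PRNGKey(0) are assumptions.

```python
# Hypothetical wiring, called from train_loop just before the
# "for step in np.arange(start_step, config.steps):" loop; names are assumed.
import jax
from flax.linen import partitioning as nn_partitioning

from MaxText import maxtext_utils


def log_compiled_memory_stats(config, p_train_step, state, mesh):
  """Compile one train step on an abstract (shaped) batch and log its memory stats."""
  shaped_batch = maxtext_utils.get_shaped_batch(config)  # no data_iterator copy needed
  rng = jax.random.PRNGKey(0)  # any key works; only shapes/dtypes matter for compilation
  with mesh, nn_partitioning.axis_rules(config.logical_axis_rules):
    compiled = p_train_step.lower(state, shaped_batch, rng).compile()
  print_compiled_memory_stats(compiled.memory_analysis())  # helper sketched earlier
```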
These changes are included in #1904, which has been merged. Closing this PR.
Add total memory usage logging in GB, which is more readable and useful to track, especially for offloading.
Also add two unit tests for activation offloading, with and without scan.