Forward max_memory_padding to _chunked_apply in optimize() #513

Merged
orionarcher merged 2 commits into TorchSim:main from niklashoelter:fix/forward-max-memory-padding
Mar 20, 2026

Conversation

Contributor

@niklashoelter niklashoelter commented Mar 18, 2026

The optimize() function extracts several attributes from the InFlightAutoBatcher and passes them to _chunked_apply(), which creates a BinningAutoBatcher for FIRE initialization. However, max_memory_padding was not forwarded, causing the BinningAutoBatcher to use its default of 1.0 (no safety margin). This can lead to OOM errors during optimizer initialization on large workloads, because the memory estimation fills 100% of GPU memory with a bare forward pass, leaving no headroom for the additional state allocated by fire_init() (velocities, dt, alpha, etc.).

Summary

When passing an InFlightAutoBatcher with a custom max_memory_padding to optimize(), the padding value is not forwarded to the internal _chunked_apply() call used for optimizer initialization (e.g. FIRE init). This causes the BinningAutoBatcher created inside _chunked_apply() to default to max_memory_padding=1.0, effectively using no safety margin during memory estimation for the init phase.

We observed OOM errors during FIRE initialization on large workloads (~4000 structures, 24 GB GPU) that we believe are caused by this. The memory estimator determines batch sizes that fill 100% of GPU memory based on a bare forward pass, leaving no headroom for the additional state allocated by fire_init() (velocities, dt, alpha, etc.). Reducing max_memory_padding had no effect, since the value was not reaching the BinningAutoBatcher.
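The role of the padding factor can be illustrated with a minimal sketch. This is not TorchSim's actual code; it only assumes the semantics described above, namely that max_memory_padding scales the estimated memory ceiling, so values below 1.0 reserve headroom for state allocated after the bare forward pass:

```python
# Illustrative sketch (not TorchSim source): how a padding factor below 1.0
# leaves headroom when turning a measured memory ceiling into a batch budget.
def effective_memory_budget(
    measured_max_scaler: float, max_memory_padding: float = 1.0
) -> float:
    """Scale the measured per-batch memory ceiling by a safety factor."""
    return measured_max_scaler * max_memory_padding

# With the default of 1.0, batches may fill 100% of the measured ceiling:
full = effective_memory_budget(1000.0)  # no headroom
# A padding of 0.9 reserves ~10% for state allocated later (e.g. velocities
# and other arrays created by fire_init):
padded = effective_memory_budget(1000.0, 0.9)
print(full, padded)
```

Because the padding never reached the BinningAutoBatcher, every init batch was sized against the unpadded ceiling, regardless of what the user configured.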

Fix

Forward max_memory_padding from the InFlightAutoBatcher to _chunked_apply() in runners.py, alongside the other attributes that are already forwarded (max_memory_scaler, memory_scales_with, max_atoms_to_try, oom_error_message).
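The shape of the change can be sketched as follows. The attribute and parameter names come from the PR description; the InFlightAutoBatcher stand-in and the body of _chunked_apply are simplified stubs, not TorchSim's actual implementations:

```python
# Hedged sketch of the forwarding fix; only the listed attributes are modeled.
from dataclasses import dataclass


@dataclass
class InFlightAutoBatcher:  # stand-in exposing only the forwarded attributes
    max_memory_scaler: float
    memory_scales_with: str
    max_atoms_to_try: int
    oom_error_message: str
    max_memory_padding: float = 1.0


def _chunked_apply(fn, state, **batcher_kwargs):
    # In TorchSim this constructs a BinningAutoBatcher from these kwargs;
    # here we return them so the forwarding is visible.
    return batcher_kwargs


batcher = InFlightAutoBatcher(400.0, "n_atoms", 500, "OOM during init", 0.9)
kwargs = _chunked_apply(
    None,  # fire_init in the real call
    None,  # state in the real call
    max_memory_scaler=batcher.max_memory_scaler,
    memory_scales_with=batcher.memory_scales_with,
    max_atoms_to_try=batcher.max_atoms_to_try,
    oom_error_message=batcher.oom_error_message,
    max_memory_padding=batcher.max_memory_padding,  # the fix: now forwarded
)
print(kwargs["max_memory_padding"])
```

Without the last keyword argument, the stand-in (like the real BinningAutoBatcher) would fall back to its default of 1.0.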

Before a pull request can be merged, the following items must be checked:

  • Doc strings have been added in the Google docstring format.
  • Run ruff on your code.
  • Tests have been added for any new functionality or bug fixes.

Collaborator

@orionarcher orionarcher left a comment


Good catch!

@orionarcher orionarcher merged commit 8c9ddea into TorchSim:main Mar 20, 2026
70 of 72 checks passed


Development

Successfully merging this pull request may close these issues.

max_memory_padding not forwarded to BinningAutoBatcher during optimizer init in optimize()

3 participants