Forward max_memory_padding to _chunked_apply in optimize() #513
Merged
orionarcher merged 2 commits into TorchSim:main on Mar 20, 2026
Conversation
The optimize() function extracts several attributes from the InFlightAutoBatcher and passes them to _chunked_apply(), which creates a BinningAutoBatcher for FIRE initialization. However, max_memory_padding was not forwarded, causing the BinningAutoBatcher to use its default of 1.0 (no safety margin). This can lead to OOM errors during optimizer initialization on large workloads, because the memory estimation fills 100% of GPU memory with a bare forward pass, leaving no headroom for the additional state allocated by fire_init() (velocities, dt, alpha, etc.).
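To make the role of the padding factor concrete, here is a minimal, self-contained sketch of how a memory-padding value bounds the batcher's memory budget. The function name and numbers are illustrative only, not the actual TorchSim API:

```python
def padded_memory_budget(total_gpu_memory_gb: float, max_memory_padding: float) -> float:
    """Fraction of total GPU memory the batcher is allowed to fill.

    Hypothetical helper: illustrates the effect of max_memory_padding,
    not a function from TorchSim itself.
    """
    return total_gpu_memory_gb * max_memory_padding


# With the default of 1.0, batch sizes estimated from a bare forward pass
# may fill the entire card, leaving nothing for optimizer state such as
# velocities, dt, and alpha allocated later by FIRE initialization.
budget_no_margin = padded_memory_budget(24.0, 1.0)    # full 24 GB, no headroom
budget_with_margin = padded_memory_budget(24.0, 0.8)  # ~19.2 GB, ~5 GB spare
```

A padding value below 1.0 is the intended safety margin; the bug is that a user-supplied value never reached the `BinningAutoBatcher` used during initialization.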
Summary
When passing an `InFlightAutoBatcher` with a custom `max_memory_padding` to `optimize()`, the padding value is not forwarded to the internal `_chunked_apply()` call used for optimizer initialization (e.g. FIRE init). This causes the `BinningAutoBatcher` created inside `_chunked_apply()` to default to `max_memory_padding=1.0`, effectively using no safety margin during memory estimation for the init phase.

We observed OOM errors during FIRE initialization on large workloads (~4000 structures, 24 GB GPU) that we believe are caused by this. The memory estimator determines batch sizes that fill 100% of GPU memory based on a bare forward pass, leaving no headroom for the additional state allocated by `fire_init()` (velocities, dt, alpha, etc.). Reducing `max_memory_padding` had no effect, since the value was not reaching the `BinningAutoBatcher`.

Fix
Forward `max_memory_padding` from the `InFlightAutoBatcher` to `_chunked_apply()` in `runners.py`, alongside the other attributes that are already forwarded (`max_memory_scaler`, `memory_scales_with`, `max_atoms_to_try`, `oom_error_message`).