Fix FIRE NaN velocity sentinel corrupting retained systems during autobatcher swaps #490
Merged
abhijeetgangan merged 2 commits into main on Mar 4, 2026
…ched optimization

The NaN-velocity sentinel in `_ase_fire_step` used an `if/else` that skipped force transformation AND FIRE mixing for ALL systems whenever ANY system had NaN velocities. When the InFlightAutoBatcher swaps in a new state (NaN velocities from fire_init), retained systems had their FIRE state updates (dt, alpha, n_pos, velocity mixing) skipped entirely.

Proven with a clone-and-compare test: on main, system 0's positions, dt, alpha, and n_pos all differ when system 1 has NaN velocities injected. With the fix, all diffs are exactly 0.

Fix: decouple NaN zeroing from the FIRE logic branch. Only skip FIRE mixing when ALL velocities are NaN (first step, matching ASE behavior). When a subset has NaN (autobatcher swap), zero them and proceed with normal FIRE logic: newly zeroed systems naturally get power=0 → negative mask → dt decrease and velocity reset.

Note: this is separate from Killian's FixSymmetry NaN error, which also reproduces with LBFGS and appears to be an autobatcher-level issue.
abhijeetgangan approved these changes Mar 4, 2026
Use nan_to_num_() instead of conditional masked zeroing in _ase_fire_step, fix _vv_fire_step using zeros_like(positions) instead of velocities, and add unit cell filter to NaN isolation test parametrization.
The failing CI is an intermittent MACE download issue. @orionarcher, do you want to review? If not, @abhijeetgangan, feel free to merge.
Fixes a bug in `_ase_fire_step` where the NaN-velocity initialization sentinel skips the entire FIRE step (force transformation, power calculation, dt/alpha/n_pos updates, velocity mixing) for all systems when any system has NaN velocities.

This matters when the `InFlightAutoBatcher` swaps in a new state mid-optimization: the new state has NaN velocities from `fire_init`, which causes retained systems to skip a FIRE step entirely.

Evidence
Clone-and-compare test: two identical systems, 10 FIRE steps, then inject NaN velocities into system 1 only and run one more step. System 0 should be completely unaffected:
| Diff on main | Diff with fix |
| --- | --- |
| 3.76e-04 | 0.0 |
| 1.33e-02 | 0.0 |
| 9.70e-04 | 0.0 |
| 1 | 0 |

`test_fire_nan_velocities_dont_affect_other_systems` fails on all 4 parametrizations on main, passes on all 4 with the fix.

Fix
Decouple NaN zeroing from the FIRE logic branch. Only skip FIRE mixing when `nan_velocities.all()` (first step, matching ASE). When a subset has NaN (autobatcher swap), zero them and run normal FIRE: newly zeroed systems naturally get `power=0` → negative mask → dt decrease and velocity reset.
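
The difference between the two branching rules can be illustrated with a small standalone toy. This is only a sketch, not the torch-sim implementation: it uses plain Python lists instead of batched tensors, a harmonic well instead of a real model, and it models only the FIRE state updates and velocity mixing (not the force transformation). `ToyState`, `toy_init`, `toy_fire_step`, `clone_and_compare`, and all parameter values are hypothetical names and assumptions made up for this example. Passing the built-in `any` as the rule mimics the old "skip for everyone if any system has NaN" behavior; passing `all` mimics the fixed rule.

```python
import copy
import math
from dataclasses import dataclass

# Toy parameters (assumed, ASE-like defaults) and an anisotropic harmonic well.
DT_MAX, N_MIN, F_INC, F_DEC, ALPHA_START, F_ALPHA = 1.0, 5, 1.1, 0.5, 0.1, 0.99
SPRINGS = (1.0, 3.0)


@dataclass
class ToyState:  # hypothetical container, one list entry per system
    pos: list
    vel: list  # NaN marks "velocities not initialized yet"
    dt: list
    alpha: list
    n_pos: list


def toy_init(positions):
    n = len(positions)
    nan_vel = [[math.nan] * len(p) for p in positions]
    return ToyState([list(p) for p in positions], nan_vel,
                    [0.1] * n, [ALPHA_START] * n, [0] * n)


def toy_fire_step(state, skip_fire_when):
    """skip_fire_when=any mimics the old branch; skip_fire_when=all the fixed one."""
    is_nan = [any(math.isnan(v) for v in vel) for vel in state.vel]
    skip_fire = skip_fire_when(is_nan)
    for i in range(len(state.pos)):
        force = [-k * x for k, x in zip(SPRINGS, state.pos[i])]
        # NaN velocities are zeroed regardless of which branch is taken.
        vel = [0.0] * len(force) if is_nan[i] else state.vel[i]
        if not skip_fire:
            power = sum(f * v for f, v in zip(force, vel))
            if power > 0:
                f_norm = math.sqrt(sum(f * f for f in force))
                v_norm = math.sqrt(sum(v * v for v in vel))
                a = state.alpha[i]
                vel = [(1 - a) * v + a * v_norm * f / f_norm
                       for f, v in zip(force, vel)]
                if state.n_pos[i] > N_MIN:
                    state.dt[i] = min(state.dt[i] * F_INC, DT_MAX)
                    state.alpha[i] *= F_ALPHA
                state.n_pos[i] += 1
            else:  # power <= 0, which includes freshly zeroed velocities
                vel = [0.0] * len(force)
                state.dt[i] *= F_DEC
                state.alpha[i] = ALPHA_START
                state.n_pos[i] = 0
        dt = state.dt[i]
        state.vel[i] = [v + dt * f for f, v in zip(force, vel)]
        state.pos[i] = [x + dt * v for x, v in zip(state.pos[i], state.vel[i])]


def clone_and_compare(skip_fire_when):
    state = toy_init([[1.0, 0.5], [1.0, 0.5]])
    for _ in range(10):
        toy_fire_step(state, skip_fire_when)
    reference = copy.deepcopy(state)
    dt1_before = state.dt[1]
    state.vel[1] = [math.nan, math.nan]  # simulate a swapped-in system
    toy_fire_step(state, skip_fire_when)
    toy_fire_step(reference, skip_fire_when)
    diffs = (
        max(abs(a - b) for a, b in zip(state.pos[0], reference.pos[0])),
        abs(state.dt[0] - reference.dt[0]),
        abs(state.alpha[0] - reference.alpha[0]),
        abs(state.n_pos[0] - reference.n_pos[0]),
    )
    return diffs, state.dt[1] / dt1_before


old_diffs, old_dt_ratio = clone_and_compare(any)
new_diffs, new_dt_ratio = clone_and_compare(all)
print("skip when any NaN:", old_diffs)
print("skip when all NaN:", new_diffs)
```

In this toy, the `all` rule leaves system 0 bit-for-bit identical to the reference, while the swapped-in system 1 takes the non-positive-power path and has its `dt` multiplied by `F_DEC`. Under the `any` rule, system 0 misses its FIRE update and at least one of its tracked quantities diverges from the reference.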