Skip to content

Fix FIRE NaN velocity sentinel corrupting retained systems during autobatcher swaps#490

Merged
abhijeetgangan merged 2 commits intomainfrom
fix-ase-fire-nan-velocity-autobatcher
Mar 4, 2026
Merged

Fix FIRE NaN velocity sentinel corrupting retained systems during autobatcher swaps#490
abhijeetgangan merged 2 commits intomainfrom
fix-ase-fire-nan-velocity-autobatcher

Conversation

@janosh
Copy link
Copy Markdown
Collaborator

@janosh janosh commented Mar 4, 2026

Fixes a bug in _ase_fire_step where the NaN-velocity initialization sentinel skips the entire FIRE step (force transformation, power calculation, dt/alpha/n_pos updates, velocity mixing) for all systems when any system has NaN velocities.

This matters when the InFlightAutoBatcher swaps in a new state mid-optimization — the new state has NaN velocities from fire_init, which causes retained systems to skip a FIRE step entirely.

Evidence

Clone-and-compare test: two identical systems, 10 FIRE steps, then inject NaN velocities into system 1 only and run one more step. System 0 should be completely unaffected:

Metric Main (buggy) With fix
Position diff 3.76e-04 0.0
dt diff 1.33e-02 0.0
alpha diff 9.70e-04 0.0
n_pos diff 1 0

test_fire_nan_velocities_dont_affect_other_systems fails on all 4 parametrizations on main, passes on all 4 with the fix.

Fix

Decouple NaN zeroing from the FIRE logic branch. Only skip FIRE mixing when nan_velocities.all() (first step, matching ASE). When a subset has NaN (autobatcher swap), zero them and run normal FIRE — newly zeroed systems naturally get power=0 → negative mask → dt decrease and velocity reset.

…ched optimization

The NaN-velocity sentinel in `_ase_fire_step` used an `if/else` that
skipped force transformation AND FIRE mixing for ALL systems whenever
ANY system had NaN velocities. When the InFlightAutoBatcher swaps in a
new state (NaN velocities from fire_init), retained systems got their
FIRE state (dt, alpha, n_pos, velocity mixing) skipped entirely.

Proven with a clone-and-compare test: on main, system 0's positions,
dt, alpha, and n_pos all differ when system 1 has NaN velocities
injected. With the fix, all diffs are exactly 0.

Fix: decouple NaN zeroing from the FIRE logic branch. Only skip FIRE
mixing when ALL velocities are NaN (first step, matching ASE behavior).
When a subset has NaN (autobatcher swap), zero them and proceed with
normal FIRE logic — newly zeroed systems naturally get power=0 →
negative mask → dt decrease and velocity reset.

Note: this is separate from Killian's FixSymmetry NaN error, which
also reproduces with LBFGS and appears to be an autobatcher-level issue.
@janosh janosh force-pushed the fix-ase-fire-nan-velocity-autobatcher branch from 2154c6c to caa7482 Compare March 4, 2026 17:05
@janosh janosh changed the title Fix ASE FIRE NaN velocity handling breaking FixSymmetry + autobatcher Fix FIRE NaN velocity sentinel corrupting retained systems during autobatcher swaps Mar 4, 2026
@janosh janosh marked this pull request as draft March 4, 2026 17:06
@janosh janosh marked this pull request as ready for review March 4, 2026 17:07
Use nan_to_num_() instead of conditional masked zeroing in _ase_fire_step,
fix _vv_fire_step using zeros_like(positions) instead of velocities, and
add unit cell filter to NaN isolation test parametrization.
@janosh
Copy link
Copy Markdown
Collaborator Author

janosh commented Mar 4, 2026

failing CI is intermittent MACE download issue. @orionarcher do you want to review? if not, @abhijeetgangan feel free to merge?

@abhijeetgangan abhijeetgangan merged commit 4a45e7e into main Mar 4, 2026
129 of 130 checks passed
@abhijeetgangan abhijeetgangan deleted the fix-ase-fire-nan-velocity-autobatcher branch March 4, 2026 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants