
Fix buffer overflow in H_Ewald_pw::rgen for very small unit cells / high ecutwfc #7025

Merged
mohanchen merged 5 commits into develop from
copilot/fix-sdft-simulation-crash
Mar 14, 2026

Conversation

Copilot AI commented Mar 13, 2026

SDFT simulations with very small unit cells (e.g., 4-atom deuterium at 1596 g/cc) and high plane-wave cutoffs cause heap corruption due to a fixed-size buffer overflow in the Ewald real-space sum.

Root cause

r, r2, irr arrays in compute_ewald() were allocated with a hardcoded mxr=200. For small unit cells, rmax = 4/√α/lat0 grows large, and rgen() can find >200 r-vectors within the sphere — writing past the array end. The old guard (nrm > mxr) also let one out-of-bounds write through silently before printing to std::cerr.
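The effect of the old guard can be modelled with a short sketch. It assumes the old check had the shape "warn if nrm > mxr, then write r[nrm]" on a 0-based index and never stopped the loop; `GuardStats` and `simulate_old_guard` are hypothetical names for illustration, not ABACUS code.

```cpp
#include <cassert>

// Outcome of filling an array of capacity mxr with `found` vectors
// when the guard only warns and never stops the loop.
struct GuardStats {
    int oob_writes;        // writes to an index >= mxr
    int unreported_writes; // out-of-bounds writes with no warning printed
};

// Model of the old check (assumed shape): warn when nrm > mxr, then write r[nrm].
inline GuardStats simulate_old_guard(int mxr, int found) {
    assert(mxr >= 0 && found >= 0);
    GuardStats s{0, 0};
    for (int nrm = 0; nrm < found; ++nrm) {
        const bool warned = nrm > mxr;         // old condition
        const bool out_of_bounds = nrm >= mxr; // valid indices are 0..mxr-1
        if (out_of_bounds) {
            ++s.oob_writes;
            if (!warned) {
                ++s.unreported_writes;
            }
        }
    }
    return s;
}
```

Under this model, `simulate_old_guard(200, 1024)` counts 824 out-of-bounds writes, of which exactly one (index 200) produces no warning. The warnings then cover nrm = 201 through 1023, which matches the range reported in the issue.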

Changes

  • Dynamic mxr sizing: Compute rmax and the loop bounds nm1/nm2/nm3 (same formula as rgen() uses) before allocating the work arrays. Set mxr = (2·nm1+1)(2·nm2+1)(2·nm3+1), the total number of lattice points rgen() visits. This is a safe upper bound on nrm, so the arrays cannot overflow for any unit cell geometry.
  • Remove duplicated rmax: Previously computed redundantly inside both #ifdef __MPI and #else branches after allocation; now computed once, before.
  • Harden rgen() guard: Changed nrm > mxr → nrm >= mxr and replaced silent std::cerr with ModuleBase::WARNING_QUIT including nrm/mxr diagnostics. This is now a true safety net that should never trigger.
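The dynamic sizing can be sketched as follows. This is a minimal illustration, not the merged code: it assumes each loop bound is floor(|b_i| · rmax) + 2 for reciprocal lattice vector b_i (the convention used by Quantum ESPRESSO's rgen, which may differ in detail from ABACUS). `Vec3`, `loop_bound`, and `max_r_vectors` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;

// Assumed loop bound along one direction: floor(|b| * rmax) + 2.
inline int loop_bound(const Vec3& b, double rmax) {
    assert(rmax >= 0.0);
    const double norm = std::sqrt(b[0] * b[0] + b[1] * b[1] + b[2] * b[2]);
    return static_cast<int>(norm * rmax) + 2;
}

// Upper bound on the number of r-vectors: every (i, j, k) the triple loop visits.
inline std::size_t max_r_vectors(const Vec3& b1, const Vec3& b2, const Vec3& b3,
                                 double rmax) {
    const std::size_t n1 = static_cast<std::size_t>(2 * loop_bound(b1, rmax) + 1);
    const std::size_t n2 = static_cast<std::size_t>(2 * loop_bound(b2, rmax) + 1);
    const std::size_t n3 = static_cast<std::size_t>(2 * loop_bound(b3, rmax) + 1);
    return n1 * n2 * n3;
}
```

For a unit cubic cell and rmax = 1.5 this gives nm = 3 in each direction and mxr = 7³ = 343. The work arrays would then be allocated with that size (for example as `std::vector<double> r2(mxr)`), so the bound grows with rmax instead of staying fixed at 200.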

Reminder

  • Have you linked an issue with this pull request?
  • Have you added adequate unit tests and/or case tests for your pull request?
  • Have you noticed possible changes of behavior below or in the linked issue?
  • Have you explained the changes of codes in core modules of ESolver, HSolver, ElecState, Hamilt, Operator or Psi? (ignore if not applicable)

Linked Issue

Fixes the "rgen, too many r-vectors" crash and associated heap corruption.

Unit Tests and/or Case Tests for my changes

Unit tests for H_Ewald_pw::rgen() are added in source/source_hamilt/test/rgen_test.cpp:

  • ZeroRmax: verifies nrm=0 when rmax=0 (early-return path)
  • SimpleCubicNearestNeighbors: for a unit cubic cell with rmax=1.5, verifies exactly 18 r-vectors (6 nearest + 12 next-nearest neighbors) are returned in ascending-magnitude order
  • SimpleCubicNonZeroDtau: verifies non-zero dtau correctly shifts all computed vectors by that offset
  • LargeRmaxExceedsOriginalLimit: regression test using rmax=4.0 (yielding ~499 vectors, far above the old mxr=200 hard limit); sets mxr via the same dynamic formula added in the fix and verifies all vectors are within the sphere, properly sorted, and that nrm > 200
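The count of 18 in SimpleCubicNearestNeighbors can be cross-checked with a brute-force reference. The sketch below is independent of H_Ewald_pw::rgen() and is not the test file itself; `cubic_r2_in_sphere` is a hypothetical helper, and the shift convention t = a·(i, j, k) − dtau and the 1e-10 zero-length cutoff are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Squared lengths, in ascending order, of all vectors t = a*(i,j,k) - dtau
// on a simple cubic lattice with 0 < |t| <= rmax.
inline std::vector<double> cubic_r2_in_sphere(double a, double rmax,
                                              const std::array<double, 3>& dtau) {
    assert(a > 0.0);
    std::vector<double> r2;
    if (rmax <= 0.0) {
        return r2; // nothing can lie inside an empty sphere
    }
    const double shift = std::sqrt(dtau[0] * dtau[0] + dtau[1] * dtau[1] + dtau[2] * dtau[2]);
    // Search range wide enough to cover the sphere even after the dtau shift.
    const int n = static_cast<int>(std::ceil((rmax + shift) / a)) + 1;
    for (int i = -n; i <= n; ++i) {
        for (int j = -n; j <= n; ++j) {
            for (int k = -n; k <= n; ++k) {
                const double x = a * i - dtau[0];
                const double y = a * j - dtau[1];
                const double z = a * k - dtau[2];
                const double tt = x * x + y * y + z * z;
                if (tt > 1e-10 && tt <= rmax * rmax) {
                    r2.push_back(tt);
                }
            }
        }
    }
    std::sort(r2.begin(), r2.end());
    return r2;
}
```

With a = 1, rmax = 1.5, and zero dtau, the result holds 6 entries equal to 1 followed by 12 entries equal to 2, which is the 6 + 12 = 18 split the test expects.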

What's changed?

mxr is now computed per-call from the actual rmax and reciprocal lattice vectors, replacing the static value of 200 that overflowed for dense/small systems. Memory allocation and safety bounds in rgen() are now consistent.

Any changes of core modules? (ignore if not applicable)

H_Ewald_pw::compute_ewald() in source_hamilt/module_ewald/H_Ewald_pw.cpp — Ewald energy used by all ESolver variants. Logic is equivalent for normal cases; only the array size changes (grows as needed for extreme cases).

Original prompt

This section details the original issue you should resolve

<issue_title>Crash in SDFT simulation: "ERROR! Current core has no plane waves! Please reduce the cores." and "rgen, too many r-vectors“</issue_title>
<issue_description>### Describe the bug

I am performing DFT simulations (using SDFT) of warm dense deuterium with ABACUS v3.9.0.22. My system contains 4 atoms in an NVT ensemble with a very high initial temperature (4e6 K) and density (1596.49 g/cc). When I set the plane‑wave cutoff ecutwfc = 70 Rydberg, the code immediately fails with "ERROR! Current core has no plane waves! Please reduce the cores." even though I am using only 1 MPI process. After increasing the cutoff to 700 Rydberg, the run starts but produces an enormous number of "rgen, too many r-vectors" warnings (from 201 up to 1023 repeatedly) and eventually crashes with a "double free or corruption" error.

Expected behavior

No response

To Reproduce

SDFT_D_ecut=70.zip

SDFT_D_ecut=700.zip

Environment

ABACUS version: 3.9.0.22
MPI: OpenMPI (mpirun)
Hardware: CPU-only, Intel Xeon Gold 6132

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).</issue_description>

<agent_instructions>Follow @Cstandardlib's instructions and fix this bug.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@Cstandardlib Similar to #1159; this case uses an even smaller unit cell, so `H_Ewald_pw::mxr` needs to be increased again.

… for small unit cells

Co-authored-by: Cstandardlib <49788094+Cstandardlib@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix crash in SDFT simulation with high ecutwfc value" to "Fix buffer overflow in H_Ewald_pw::rgen for very small unit cells / high ecutwfc" Mar 13, 2026
Copilot AI requested a review from Cstandardlib March 13, 2026 10:09
@mohanchen mohanchen added the Bugs and Refactor labels Mar 14, 2026
@Cstandardlib
Collaborator

@copilot Add a unit test for this part.

…-overflow regression

Co-authored-by: Cstandardlib <49788094+Cstandardlib@users.noreply.github.com>
Author

Copilot AI commented Mar 14, 2026

@copilot Add a unit test for this part.

Added unit tests for H_Ewald_pw::rgen in source/source_hamilt/test/rgen_test.cpp (commit d744de9). Four tests cover:

  • ZeroRmax: no vectors returned when rmax=0
  • SimpleCubicNearestNeighbors: correct count (18) and sorted order for a unit cubic cell with rmax=1.5
  • SimpleCubicNonZeroDtau: non-zero dtau correctly shifts the origin
  • LargeRmaxExceedsOriginalLimit: regression test that uses the same dynamic mxr formula added in the fix and verifies nrm > 200 is handled safely (the exact scenario that caused the original crash)

Co-authored-by: Cstandardlib <49788094+Cstandardlib@users.noreply.github.com>
Collaborator

@mohanchen mohanchen left a comment

Excellent job. LGTM

Collaborator

@Cstandardlib Cstandardlib left a comment

LGTM

@Cstandardlib Cstandardlib marked this pull request as ready for review March 14, 2026 03:39
@mohanchen mohanchen merged commit 7962634 into develop Mar 14, 2026
27 checks passed
@Cstandardlib Cstandardlib deleted the copilot/fix-sdft-simulation-crash branch March 14, 2026 05:39
Labels

Bugs, Refactor

Development

Successfully merging this pull request may close these issues.

Crash in SDFT simulation: "ERROR! Current core has no plane waves! Please reduce the cores." and "rgen, too many r-vectors“

3 participants