
Error in LCAO NSPIN4 calculation using large orbital and high parallelism #5328

@pxlxingliang

Describe the bug

I am trying to run an SCF calculation of Fe16 with nspin = 4 using LCAO, and ABACUS throws an error at the beginning of the SCF.

On the Bohrium machine c32_m64_cpu (64 GB memory), running in parallel on 16 cores, the SCF calculations with orbital radii of 6, 7, and 8 au succeed, while those with 9 and 10 au fail.
On the machine c32_m128_cpu (128 GB memory) running on 32 cores, however, the calculations fail for all orbital radii.
The error therefore appears to be triggered by a large orbital radius combined with a high degree of parallelism.

I also ran the 9 au calculation on a c32_m256_cpu machine and monitored memory usage during the SCF with htop. With 16 cores, memory usage stays at about 35 GB most of the time, with a peak of about 50 GB. With 32 cores, it rises to about 47 GB most of the time, with a peak of about 74 GB. After the peak, once usage has dropped back to about 47 GB, ABACUS terminates abnormally.
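The htop readings above can be turned into per-core figures with a little arithmetic. The sketch below only reproduces that arithmetic from the approximate numbers reported here, assuming the memory is split evenly across the parallel processes (an assumption, since htop shows totals); the dictionary layout and the `summarize` helper are hypothetical, not part of ABACUS. It shows that going from 16 to 32 cores lowers the memory held per core but raises the total steady-state memory by roughly 1.34x and the peak by roughly 1.48x.

```python
# Approximate memory figures (GB) read off htop for the 9 au orbital
# on c32_m256_cpu, keyed by the number of cores used.
measurements = {
    16: {"steady": 35.0, "peak": 50.0},
    32: {"steady": 47.0, "peak": 74.0},
}


def summarize(data):
    """Return per-core memory (assuming an even split across processes)
    and the growth of the total memory from 16 to 32 cores."""
    per_core = {
        n: {kind: gb / n for kind, gb in values.items()}
        for n, values in data.items()
    }
    growth = {kind: data[32][kind] / data[16][kind] for kind in ("steady", "peak")}
    return per_core, growth


per_core, growth = summarize(measurements)

for n in sorted(per_core):
    m = per_core[n]
    print(f"{n} cores: steady {m['steady']:.2f} GB/core, peak {m['peak']:.2f} GB/core")
for kind, factor in growth.items():
    print(f"total {kind} memory grows by x{factor:.2f} from 16 to 32 cores")
```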

Attachment: a (2).zip

 * * * * * *
 << Start SCF iteration.

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 39 RUNNING AT dp-lbg-471-15043057
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).

Labels: Bugs (Bugs that only solvable with sufficient knowledge of DFT)
