Bug: Segmentation fault during SCF calculation in v3.9.0.3+ for specific structure (MPI_ERR_TRUNCATE) #6229

@AsTonyshment

Describe the bug

While investigating the resolved Issue #6228 (LCAO pchg calculation), I encountered a new crash when running an SCF calculation on the same test structure with ABACUS v3.9.0.3 and later. The calculation completes successfully in v3.9.0.2, but in newer versions it aborts with an MPI_ERR_TRUNCATE error raised in MPI_Allreduce.

The program crashes with:

[ItzTony-Workstation:1319202] *** An error occurred in MPI_Allreduce
[ItzTony-Workstation:1319202] *** reported by process [290127873,3]
[ItzTony-Workstation:1319202] *** on communicator MPI_COMM_WORLD
[ItzTony-Workstation:1319202] *** MPI_ERR_TRUNCATE: message truncated
[ItzTony-Workstation:1319202] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ItzTony-Workstation:1319202] ***    and potentially your MPI job)
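For triage, the abort message above can be reduced to its key fields (failing call, reporting process, communicator, error class). The following is a minimal sketch, not part of ABACUS or Open MPI; the function name `summarize_mpi_abort` is hypothetical, and reading the bracketed pair in "reported by process [a,b]" as (job id, rank) is an assumption about the Open MPI message format.

```python
import re

# Matches Open MPI abort lines of the form "[host:pid] *** message".
LINE_RE = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] \*\*\* (?P<msg>.*)$")

# The abort message quoted in this report.
SAMPLE_LOG = """\
[ItzTony-Workstation:1319202] *** An error occurred in MPI_Allreduce
[ItzTony-Workstation:1319202] *** reported by process [290127873,3]
[ItzTony-Workstation:1319202] *** on communicator MPI_COMM_WORLD
[ItzTony-Workstation:1319202] *** MPI_ERR_TRUNCATE: message truncated
[ItzTony-Workstation:1319202] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ItzTony-Workstation:1319202] ***    and potentially your MPI job)
"""


def summarize_mpi_abort(log_text):
    """Extract the failing call, process ids, communicator and error class."""
    summary = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an Open MPI "***" line
        msg = match.group("msg").strip()
        summary.setdefault("host", match.group("host"))
        summary.setdefault("pid", int(match.group("pid")))
        if msg.startswith("An error occurred in "):
            summary["call"] = msg[len("An error occurred in "):]
        elif msg.startswith("reported by process "):
            # Assumption: the pair is [job id, rank within the job].
            ids = re.search(r"\[(\d+),(\d+)\]", msg)
            if ids:
                summary["job_id"] = int(ids.group(1))
                summary["rank"] = int(ids.group(2))
        elif msg.startswith("on communicator "):
            summary["communicator"] = msg[len("on communicator "):]
        elif re.match(r"MPI_ERR_[A-Z_]+:", msg):
            # Error class and its short description, e.g. MPI_ERR_TRUNCATE.
            error_class, _, text = msg.partition(":")
            summary["error_class"] = error_class
            summary["error_text"] = text.strip()
    return summary


if __name__ == "__main__":
    for key, value in summarize_mpi_abort(SAMPLE_LOG).items():
        print(f"{key}: {value}")
```

Running the same function over the logs from several runs or process counts would show whether the error is always reported from the same call and the same process.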

Expected behavior

The calculation should complete successfully as in v3.9.0.2, showing:
• Normal SCF convergence
• Final stress/pressure output
• Clean program termination

To Reproduce

  1. Use the input files from Issue #6228 ("LCAO Partial charge density calculation failed by get_pchg: Outputs NaN in v3.9.0.4, Works in older v3.8.5"), attached by the original reporter
  2. Run an SCF calculation with ABACUS v3.9.0.3 or later
  3. Observe the crash immediately after the SCF iterations finish

Environment

• First broken version: 3.9.0.3
• Last working version: 3.9.0.2
• System: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
• Scope: It is currently unclear whether this affects other structures or is specific to this case

My cmake output is as follows:

itztony@ItzTony-Workstation $ cmake -B build -DCMAKE_PREFIX_PATH="/home/itztony/Softwares/elpa-2024.05.001/lib;/home/itztony/Softwares/libxc-6.2.2-install" -DELPA_INCLUDE_DIR=/home/itztony/Softwares/elpa-2024.05.001/elpa -DELPA_LIBRARIES=/home/itztony/Softwares/elpa-2024.05.001/lib/libelpa_openmp.so -DLibxc_DIR=/home/itztony/Softwares/libxc-6.2.2-install -DENABLE_LIBXC=1 -DENABLE_LIBRI=1
-- The CXX compiler identification is GNU 12.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.43.0") 
-- Found git: attempting to get commit info...
-- Current commit hash: 62077982b
-- Last commit date: Tue Apr 1 20:24:01 2025 +0800
-- Found Cereal: /usr/include  
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1") 
-- Found ELPA: /home/itztony/Softwares/elpa-2024.05.001/lib/libelpa_openmp.so  
-- Performing Test ELPA_VERSION_SATISFIES
-- Performing Test ELPA_VERSION_SATISFIES - Success
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda-12.1/bin/nvcc
-- CUDA components detected, but USE_CUDA is set to OFF. NOT building CUDA version of ABACUS.
-- Found FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3_omp.so  
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so  
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;-lm;-ldl  
-- Found ScaLAPACK: /usr/lib/x86_64-linux-gnu/libscalapack-openmpi.so  
-- Populating libri
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-subbuild
[ 11%] Creating directories for 'libri-populate'
[ 22%] Performing download step (download, verify and extract) for 'libri-populate'
-- Downloading...
   dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-subbuild/libri-populate-prefix/src/v0.2.1.1.tar.gz'
   timeout='none'
   inactivity timeout='none'
-- Using src='https://github.com/abacusmodeling/LibRI/archive/refs/tags/v0.2.1.1.tar.gz'
-- Downloading... done
-- extracting...
     src='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-subbuild/libri-populate-prefix/src/v0.2.1.1.tar.gz'
     dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-src'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 33%] No update step for 'libri-populate'
[ 44%] No patch step for 'libri-populate'
[ 55%] No configure step for 'libri-populate'
[ 66%] No build step for 'libri-populate'
[ 77%] No install step for 'libri-populate'
[ 88%] No test step for 'libri-populate'
[100%] Completed 'libri-populate'
[100%] Built target libri-populate
-- Found LibRI: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-src  
-- Populating libcomm
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-subbuild
[ 11%] Creating directories for 'libcomm-populate'
[ 22%] Performing download step (download, verify and extract) for 'libcomm-populate'
-- Downloading...
   dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-subbuild/libcomm-populate-prefix/src/v0.1.1.tar.gz'
   timeout='none'
   inactivity timeout='none'
-- Using src='https://github.com/abacusmodeling/LibComm/archive/refs/tags/v0.1.1.tar.gz'
-- Downloading... done
-- extracting...
     src='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-subbuild/libcomm-populate-prefix/src/v0.1.1.tar.gz'
     dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-src'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 33%] No update step for 'libcomm-populate'
[ 44%] No patch step for 'libcomm-populate'
[ 55%] No configure step for 'libcomm-populate'
[ 66%] No build step for 'libcomm-populate'
[ 77%] No install step for 'libcomm-populate'
[ 88%] No test step for 'libcomm-populate'
[100%] Completed 'libcomm-populate'
[100%] Built target libcomm-populate
-- Found LibComm: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-src  
-- Checking for one of the modules 'libxc'
-- Found Libxc: /home/itztony/Softwares/libxc-6.2.2-install/lib/libxc.a  
-- Found Libxc: version 6.2.2
-- Configuring done (13.4s)
-- Generating done (0.1s)
-- Build files have been written to: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build
