forked from abacusmodeling/abacus-develop
Closed
Description
Describe the bug
While investigating the resolved Issue #6228 (LCAO pchg calculation), I encountered a new crash when running SCF calculations on the same test structure with ABACUS v3.9.0.3 and later. The calculation completes successfully in v3.9.0.2, but in newer versions it aborts with an MPI_ERR_TRUNCATE error raised in MPI_Allreduce.
The program crashes with:
[ItzTony-Workstation:1319202] *** An error occurred in MPI_Allreduce
[ItzTony-Workstation:1319202] *** reported by process [290127873,3]
[ItzTony-Workstation:1319202] *** on communicator MPI_COMM_WORLD
[ItzTony-Workstation:1319202] *** MPI_ERR_TRUNCATE: message truncated
[ItzTony-Workstation:1319202] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ItzTony-Workstation:1319202] *** and potentially your MPI job)
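When comparing logs from several ABACUS versions, it can help to reduce an abort message like the one above to a few fields. The following is a minimal sketch, not part of ABACUS: the function name `summarize_mpi_abort` is hypothetical, and it assumes the `[host:pid] *** message` layout shown above, with the second number in `reported by process [...]` taken to be the MPI rank.

```python
import re

# Abort lines copied from this report, used as sample input.
SAMPLE = """\
[ItzTony-Workstation:1319202] *** An error occurred in MPI_Allreduce
[ItzTony-Workstation:1319202] *** reported by process [290127873,3]
[ItzTony-Workstation:1319202] *** on communicator MPI_COMM_WORLD
[ItzTony-Workstation:1319202] *** MPI_ERR_TRUNCATE: message truncated
"""

# Assumed line layout: "[host:pid] *** message".
LINE_RE = re.compile(r"^\[([^:\]]+):(\d+)\] \*\*\* (.*)$")

# (pattern, field names) pairs tried in order against each message.
FIELD_PATTERNS = [
    (re.compile(r"An error occurred in (\w+)"), ("routine",)),
    (re.compile(r"reported by process \[(\d+),(\d+)\]"), ("job", "rank")),
    (re.compile(r"on communicator (\w+)"), ("communicator",)),
    (re.compile(r"(MPI_ERR_\w+): (.*)"), ("error_class", "detail")),
]


def summarize_mpi_abort(log_text):
    """Collect routine, rank, communicator and error class from abort lines."""
    summary = {}
    for line in log_text.splitlines():
        line_match = LINE_RE.match(line.strip())
        if line_match is None:
            continue  # not an abort line, e.g. regular ABACUS output
        message = line_match.group(3)
        for pattern, names in FIELD_PATTERNS:
            field_match = pattern.match(message)
            if field_match is not None:
                summary.update(zip(names, field_match.groups()))
                break
    return summary


if __name__ == "__main__":
    for key, value in summarize_mpi_abort(SAMPLE).items():
        print(f"{key}: {value}")
```

All values are kept as strings so they can be compared directly between runs of different versions.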
Expected behavior
The calculation should complete successfully as in v3.9.0.2, showing:
• Normal SCF convergence
• Final stress/pressure output
• Clean program termination
To Reproduce
- Use the input files from Issue #6228, "LCAO Partial charge density calculation failed by get_pchg: Outputs NaN in v3.9.0.4, Works in older v3.8.5" (attached by the original reporter)
- Run SCF calculation with ABACUS v3.9.0.3 or later
- Observe the crash right after the SCF iterations finish
Environment
• First broken version: 3.9.0.3
• Last working version: 3.9.0.2
• System: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
• Uncertainty: Currently unclear if this affects other structures or is specific to this case
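Since the last working and first broken versions are known, the offending change could be narrowed down by bisecting the commits between them. `git bisect` does this natively; the sketch below only illustrates the search logic. The names `first_bad` and `is_bad` are hypothetical, and `is_bad` stands for "build this commit, run the SCF case, and check whether it aborts". It assumes the list is ordered oldest to newest, the first entry is good, the last is bad, and every commit after the first bad one is also bad.

```python
def first_bad(candidates, is_bad):
    """Return the earliest candidate for which is_bad(candidate) is True.

    Assumes candidates[0] is good, candidates[-1] is bad, and that the
    good/bad status flips exactly once along the list.
    """
    good, bad = 0, len(candidates) - 1  # indices of a known-good / known-bad entry
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(candidates[mid]):
            bad = mid   # the first bad entry is at mid or earlier
        else:
            good = mid  # the first bad entry is after mid
    return candidates[bad]


if __name__ == "__main__":
    # Placeholder commit labels; real use would pass commit hashes.
    commits = [f"c{i}" for i in range(8)]
    broken_from = 5  # pretend the regression landed in c5
    print(first_bad(commits, lambda c: commits.index(c) >= broken_from))
```

Each call to `is_bad` costs a full build and SCF run, so the logarithmic number of checks is the main reason to bisect rather than test every commit.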
My cmake output is as follows:
itztony@ItzTony-Workstation $ cmake -B build -DCMAKE_PREFIX_PATH="/home/itztony/Softwares/elpa-2024.05.001/lib;/home/itztony/Softwares/libxc-6.2.2-install" -DELPA_INCLUDE_DIR=/home/itztony/Softwares/elpa-2024.05.001/elpa -DELPA_LIBRARIES=/home/itztony/Softwares/elpa-2024.05.001/lib/libelpa_openmp.so -DLibxc_DIR=/home/itztony/Softwares/libxc-6.2.2-install -DENABLE_LIBXC=1 -DENABLE_LIBRI=1
-- The CXX compiler identification is GNU 12.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.43.0")
-- Found git: attempting to get commit info...
-- Current commit hash: 62077982b
-- Last commit date: Tue Apr 1 20:24:01 2025 +0800
-- Found Cereal: /usr/include
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found ELPA: /home/itztony/Softwares/elpa-2024.05.001/lib/libelpa_openmp.so
-- Performing Test ELPA_VERSION_SATISFIES
-- Performing Test ELPA_VERSION_SATISFIES - Success
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda-12.1/bin/nvcc
-- CUDA components detected, but USE_CUDA is set to OFF. NOT building CUDA version of ABACUS.
-- Found FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3_omp.so
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;-lm;-ldl
-- Found ScaLAPACK: /usr/lib/x86_64-linux-gnu/libscalapack-openmpi.so
-- Populating libri
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-subbuild
[ 11%] Creating directories for 'libri-populate'
[ 22%] Performing download step (download, verify and extract) for 'libri-populate'
-- Downloading...
dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-subbuild/libri-populate-prefix/src/v0.2.1.1.tar.gz'
timeout='none'
inactivity timeout='none'
-- Using src='https://github.com/abacusmodeling/LibRI/archive/refs/tags/v0.2.1.1.tar.gz'
-- Downloading... done
-- extracting...
src='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-subbuild/libri-populate-prefix/src/v0.2.1.1.tar.gz'
dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-src'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 33%] No update step for 'libri-populate'
[ 44%] No patch step for 'libri-populate'
[ 55%] No configure step for 'libri-populate'
[ 66%] No build step for 'libri-populate'
[ 77%] No install step for 'libri-populate'
[ 88%] No test step for 'libri-populate'
[100%] Completed 'libri-populate'
[100%] Built target libri-populate
-- Found LibRI: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libri-src
-- Populating libcomm
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-subbuild
[ 11%] Creating directories for 'libcomm-populate'
[ 22%] Performing download step (download, verify and extract) for 'libcomm-populate'
-- Downloading...
dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-subbuild/libcomm-populate-prefix/src/v0.1.1.tar.gz'
timeout='none'
inactivity timeout='none'
-- Using src='https://github.com/abacusmodeling/LibComm/archive/refs/tags/v0.1.1.tar.gz'
-- Downloading... done
-- extracting...
src='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-subbuild/libcomm-populate-prefix/src/v0.1.1.tar.gz'
dst='/home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-src'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 33%] No update step for 'libcomm-populate'
[ 44%] No patch step for 'libcomm-populate'
[ 55%] No configure step for 'libcomm-populate'
[ 66%] No build step for 'libcomm-populate'
[ 77%] No install step for 'libcomm-populate'
[ 88%] No test step for 'libcomm-populate'
[100%] Completed 'libcomm-populate'
[100%] Built target libcomm-populate
-- Found LibComm: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build/_deps/libcomm-src
-- Checking for one of the modules 'libxc'
-- Found Libxc: /home/itztony/Softwares/libxc-6.2.2-install/lib/libxc.a
-- Found Libxc: version 6.2.2
-- Configuring done (13.4s)
-- Generating done (0.1s)
-- Build files have been written to: /home/itztony/Softwares/ABACUS_releases/abacus-develop/build