forked from abacusmodeling/abacus-develop
Closed
Labels
- Bugs: Bugs that only solvable with sufficient knowledge of DFT
- GPU & DCU & HPC: GPU and DCU and HPC related any issues
- collinear/non-collinear/SOC: Issues related to SOC
Description
Describe the bug
With device=gpu, nspin=4 calculations crash with a segmentation fault as soon as the SCF iteration starts (see log below).
<< Start SCF iteration.
[Workstation:842863] *** Process received signal ***
[Workstation:842863] Signal: Segmentation fault (11)
[Workstation:842863] Signal code: Address not mapped (1)
[Workstation:842863] Failing at address: 0x8
[Workstation:842863] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0xebbad842520]
[Workstation:842863] [ 1] /home/abacus-develop/build/abacus(+0x75d2a5)[0x60d5e747f2a5]
[Workstation:842863] [ 2] /home/abacus-develop/build/abacus(+0x75724d)[0x60d5e747924d]
[Workstation:842863] [ 3] /home/abacus-develop/build/abacus(+0x73270b)[0x60d5e745470b]
[Workstation:842863] [ 4] /home/abacus-develop/build/abacus(+0x683b58)[0x60d5e73a5b58]
[Workstation:842863] [ 5] /home/abacus-develop/build/abacus(+0x682811)[0x60d5e73a4811]
[Workstation:842863] [ 6] /home/abacus-develop/build/abacus(+0x67b1a5)[0x60d5e739d1a5]
[Workstation:842863] [ 7] /home/abacus-develop/build/abacus(+0x3e54b9)[0x60d5e71074b9]
[Workstation:842863] [ 8] /home/abacus-develop/build/abacus(+0x5b37b5)[0x60d5e72d57b5]
[Workstation:842863] [ 9] /home/abacus-develop/build/abacus(+0x56994d)[0x60d5e728b94d]
[Workstation:842863] [10] /home/abacus-develop/build/abacus(+0x34ef5c)[0x60d5e7070f5c]
[Workstation:842863] [11] /home/abacus-develop/build/abacus(+0x36416b)[0x60d5e708616b]
[Workstation:842863] [12] /home/abacus-develop/build/abacus(+0x3621c1)[0x60d5e70841c1]
[Workstation:842863] [13] /home/abacus-develop/build/abacus(+0x3638b7)[0x60d5e70858b7]
[Workstation:842863] [14] /home/abacus-develop/build/abacus(+0x99b64)[0x60d5e6dbbb64]
[Workstation:842863] [15] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0xebbad829d90]
[Workstation:842863] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0xebbad829e40]
[Workstation:842863] [17] /home/abacus-develop/build/abacus(+0x99a05)[0x60d5e6dbba05]
[Workstation:842863] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Workstation exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Further testing suggests that the issue is not related to ks_solver itself, since device=cpu with ks_solver=cusolver works correctly. The problem seems to stem from the GPU grid integration rather than from the solver.
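The frames in the log above are raw offsets into the abacus binary, so they do not show which routine crashed. A minimal sketch of how they could be turned into addr2line commands follows. It assumes the binary was built with debug information (for example a RelWithDebInfo build) and that the value in `(+0x...)` is the offset addr2line expects. The helper name `addr2line_commands` is hypothetical and not part of ABACUS.

```python
import re

# Matches frames such as
# "[Workstation:842863] [ 1] /home/abacus-develop/build/abacus(+0x75d2a5)[0x60d5e747f2a5]"
# and captures the frame index, the module path, and the offset inside the module.
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\(\+(0x[0-9a-fA-F]+)\)\[0x[0-9a-fA-F]+\]")


def addr2line_commands(log_text, binary_suffix="/abacus"):
    """Return one addr2line command per backtrace frame inside the abacus binary.

    Hypothetical helper for illustration only; it builds the commands
    as strings and does not run them.
    """
    commands = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a "module(+offset)[address]" frame
        frame, path, offset = match.groups()
        if not path.endswith(binary_suffix):
            continue  # skip libc and other shared libraries
        # -f prints the function name, -C demangles C++ symbols, -e selects the binary
        commands.append(f"addr2line -f -C -e {path} {offset}  # frame {frame}")
    return commands
```

Pasting the log into `addr2line_commands` and running the resulting commands on the machine that produced the crash should name the functions for frames 1 to 14, which would help confirm or rule out the grid integration path.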
Expected behavior
The calculation should run to completion with device=gpu, as it already does with device=cpu.
To Reproduce
- Set device=gpu and nspin=4 for any SCF calculation.
- Run the calculation.
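To make the comparison between the failing and working setups easier to repeat, the sketch below renders the INPUT parameter blocks for the three cases mentioned in this report. The report names only device, nspin, and ks_solver. The `calculation` and `basis_type` values are assumptions added for illustration, and the helper `render_input` is hypothetical. The structure and pseudopotential/orbital settings of a real test case still have to be supplied separately.

```python
def render_input(params):
    """Render a dict as ABACUS INPUT text: a header line, then one 'key  value' per line."""
    width = max(len(key) for key in params)
    lines = ["INPUT_PARAMETERS"]
    lines += [f"{key.ljust(width)}  {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


# Settings shared by all runs. nspin=4 comes from the report;
# calculation=scf and basis_type=lcao are assumptions, not stated by the reporter.
COMMON = {"calculation": "scf", "basis_type": "lcao", "nspin": 4}

# The three cases described in the report.
VARIANTS = {
    "gpu_segfault": {**COMMON, "device": "gpu"},
    "cpu_cusolver_ok": {**COMMON, "device": "cpu", "ks_solver": "cusolver"},
    "cpu_ok": {**COMMON, "device": "cpu"},
}

if __name__ == "__main__":
    for name, params in VARIANTS.items():
        print(f"# {name}")
        print(render_input(params))
```

Only the device and ks_solver lines differ between the variants, so a crash that appears in `gpu_segfault` alone points at the device setting rather than at the rest of the input.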
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)
- Verify the issue is not a duplicate.
- Describe the bug.
- Steps to reproduce.
- Expected behavior.
- Error message.
- Environment details.
- Additional context.
- Assign a priority level (low, medium, high, urgent).
- Assign the issue to a team member.
- Label the issue with relevant tags.
- Identify possible related issues.
- Create a unit test or automated test to reproduce the bug (if applicable).
- Fix the bug.
- Test the fix.
- Update documentation (if necessary).
- Close the issue and inform the reporter (if applicable).