{numlib,chem,toolchain}[NVHPC/23.7-CUDA-12.1.1] nvompi-2023a + QuantumESPRESSO-7.3.1 (GPU enabled) #20364

Open
wants to merge 9 commits into develop
Conversation

@Crivella (Contributor) commented Apr 15, 2024

Added easyconfig files for nvofbf toolchain + QE 7.3.1

local compilers:

  • GCC/12.3.0
  • CUDA/12.1.1

Added toolchain/numlib

  • nvofbf-2023a
    • nvompi-2023a
      • NVHPC-23.7-CUDA-12.1.1
      • OpenMPI-4.1.5
    • FlexiBLAS-3.3.1
      • OpenBLAS-0.3.24
    • FFTW-3.3.10
    • FFTW.MPI-3.3.10
    • ScaLAPACK-2.2.0-fb

Added easyconfigs (a rough sketch of their general shape follows this list):

  • HDF5-1.14.0-nvompi-2023a-CUDA-12.1.1.eb
  • libxc-6.2.2-NVHPC-23.7-CUDA-12.1.1.eb
  • QuantumESPRESSO-7.3.1-nvompi-2023a-CUDA-12.1.1.eb
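
For reference, a minimal, hypothetical sketch of the general shape these easyconfigs take, assuming standard EasyBuild easyconfig syntax; the names and values below are illustrative only and the actual .eb files added by this PR are authoritative:

# Hypothetical sketch, not one of the files in this PR.
name = 'QuantumESPRESSO'
version = '7.3.1'
versionsuffix = '-CUDA-%(cudaver)s'

homepage = 'https://www.quantum-espresso.org'
description = "Quantum ESPRESSO: electronic-structure calculations and materials modelling at the nanoscale."

toolchain = {'name': 'nvompi', 'version': '2023a'}

# source_urls/sources/checksums omitted from this sketch
dependencies = [
    ('CUDA', '12.1.1', '', SYSTEM),      # matches the NVHPC-23.7-CUDA-12.1.1 toolchain component
    ('HDF5', '1.14.0', versionsuffix),   # the HDF5 easyconfig listed above
    ('libxc', '6.2.2'),                  # built with NVHPC-23.7-CUDA-12.1.1
]

moduleclass = 'chem'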

NOTES:

Solved issues:

Open issues:

  • Occasional segfault in the QE test suite caused by FlexiBLAS when calling the ZHEEV LAPACK routine
    • The bug does not manifest when running the code under cuda-gdb
    • Tested starting from an nvompi build linked directly against OpenBLAS and the error was not present
  • Segfault in 3 test cases using RMM-DIIS diagonalization with k-points other than Gamma; most likely a QE bug (https://gitlab.com/QEF/q-e/-/issues/675)
  • Full CUDA libxc: https://gitlab.com/libxc/libxc/-/issues/135
    • Tested the patch from commit e648f37b
      • Compile time goes from ~5 min to ~3.5 h
      • Tests are unable to run
      • For now, since this is not officially supported with CMake and only experimental with autotools, and since it is not a widely used feature of QE, I would argue it is acceptable not to have the libxc routines run on the GPU

@migueldiascosta added this to the 4.x milestone Apr 16, 2024
@Crivella (Contributor, Author) commented:

Comparison of code efficiency when linked to the EasyBuild numlibs vs. linked to the NVHPC math_libs (-test suffix in the module name) shows no significant difference running on one node with an A100 GPU:

[ RUN      ] MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-test %threads=1 /bf4db141 @vega-gpu:default+default
[ RUN      ] MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-CUDA-12.1.1 %threads=1 /e4ce2bb2 @vega-gpu:default+default
[       OK ] (1/2) MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-test %threads=1 /bf4db141 @vega-gpu:default+default
P: extract_report_time: 0 s (r:0, l:None, u:None)
P: PWSCF_cpu: 231.98 s (r:0, l:None, u:None)
P: PWSCF_wall: 241.82 s (r:0, l:None, u:None)
P: electrons_cpu: 213.8 s (r:0, l:None, u:None)
P: electrons_wall: 216.04 s (r:0, l:None, u:None)
P: c_bands_cpu: 181.97 s (r:0, l:None, u:None)
P: c_bands_wall: 183.78 s (r:0, l:None, u:None)
P: cegterg_cpu: 142.72 s (r:0, l:None, u:None)
P: cegterg_wall: 144.01 s (r:0, l:None, u:None)
P: calbec_cpu: 0.12 s (r:0, l:None, u:None)
P: calbec_wall: 0.55 s (r:0, l:None, u:None)
P: fft_cpu: 0.12 s (r:0, l:None, u:None)
P: fft_wall: 0.14 s (r:0, l:None, u:None)
P: ffts_cpu: 0.0 s (r:0, l:None, u:None)
P: ffts_wall: 0.0 s (r:0, l:None, u:None)
P: fftw_cpu: 1.26 s (r:0, l:None, u:None)
P: fftw_wall: 77.36 s (r:0, l:None, u:None)
[       OK ] (2/2) MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-CUDA-12.1.1 %threads=1 /e4ce2bb2 @vega-gpu:default+default
P: extract_report_time: 0 s (r:0, l:None, u:None)
P: PWSCF_cpu: 232.44 s (r:0, l:None, u:None)
P: PWSCF_wall: 241.74 s (r:0, l:None, u:None)
P: electrons_cpu: 214.16 s (r:0, l:None, u:None)
P: electrons_wall: 216.18 s (r:0, l:None, u:None)
P: c_bands_cpu: 182.3 s (r:0, l:None, u:None)
P: c_bands_wall: 183.9 s (r:0, l:None, u:None)
P: cegterg_cpu: 143.11 s (r:0, l:None, u:None)
P: cegterg_wall: 144.18 s (r:0, l:None, u:None)
P: calbec_cpu: 0.12 s (r:0, l:None, u:None)
P: calbec_wall: 0.56 s (r:0, l:None, u:None)
P: fft_cpu: 0.0 s (r:0, l:None, u:None)
P: fft_wall: 0.01 s (r:0, l:None, u:None)
P: ffts_cpu: 0.0 s (r:0, l:None, u:None)
P: ffts_wall: 0.0 s (r:0, l:None, u:None)
P: fftw_cpu: 1.24 s (r:0, l:None, u:None)
P: fftw_wall: 77.23 s (r:0, l:None, u:None)
[----------] all spawned checks have finished

@Crivella changed the title from "{numlib,chem,toolchain}[NVHPC/23.7-CUDA-12.1.1] nvofbf-2023a + QuantumESPRESSO-7.3.1 (GPU enabled)" to "{numlib,chem,toolchain}[NVHPC/23.7-CUDA-12.1.1] nvompi-2023a + QuantumESPRESSO-7.3.1 (GPU enabled)" on Apr 22, 2024
@cgross95 (Contributor) commented:

Thanks for putting all of this together! Our site is interested in a GPU-enabled QuantumESPRESSO build, so we've been testing this.

Were you able to get around the "other errors" that occur in LAPACK testing when building OpenBLAS? Using the OpenBLAS-0.3.24-NVHPC-23.7-CUDA-12.1.1.eb easyconfig as provided gives us 55 other errors:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1328283         0       (0.000%)        0       (0.000%)        
DOUBLE PRECISION        1328013         10      (0.001%)        0       (0.000%)        
COMPLEX                 769507          159     (0.021%)        55      (0.007%)        
COMPLEX16               780654          116     (0.015%)        0       (0.000%)        

--> ALL PRECISIONS      4206457         285     (0.007%)        55      (0.001%)        

I saw that you had done some work on OpenBLAS issue #4652 to get some of the numerical failures down, but I was wondering whether you were ever able to get rid of the other errors that stop EasyBuild from finishing.

@Crivella (Contributor, Author) commented May 22, 2024

@cgross95
In my case, compiling on a machine with an A100 GPU, I ended up with only 148 numerical errors and no other errors:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1326099         12      (0.001%)        0       (0.000%)
DOUBLE PRECISION        1326921         36      (0.003%)        0       (0.000%)
COMPLEX                 762663          42      (0.006%)        0       (0.000%)
COMPLEX16               771518          58      (0.008%)        0       (0.000%)

--> ALL PRECISIONS      4187201         148     (0.004%)        0       (0.000%)

What hardware are you trying this on?
It would be interesting to find out whether this is strictly an OpenBLAS/LAPACK issue or whether it is a matter of setting more compiler flags for different architectures.

I think I was still getting some other errors as well with 0.3.27, but I didn't investigate much further since I was aiming at 0.3.24 for this release (in that case I was getting 14 errors related to the ZHSEQR and ZGEEV routines failing to find all eigenvalues).

The logs should give you further details on which LAPACK routine failed and with what error code (each routine documents the meaning of its error codes in comments in the source/documentation).
If you think those errors are not actually a problem, you could also raise the threshold of allowed LAPACK test failures by changing the value assigned to max_failing_lapack_tests_num_errors, and by additionally setting max_failing_lapack_tests_other_errors to allow the other errors, as sketched below.
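
For example, a minimal sketch of how those thresholds could be raised in the OpenBLAS easyconfig, assuming the two parameters behave as described above (the numbers are placeholders, not recommendations):

# Sketch only: additions to the OpenBLAS easyconfig to relax the LAPACK test thresholds.
# Pick the values based on the failures you actually observe in the logs.
max_failing_lapack_tests_num_errors = 150     # roughly the numerical-error count seen on an A100
max_failing_lapack_tests_other_errors = 55    # additionally tolerate the "other" errors reported above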

@cgross95 (Contributor) commented:

I'm compiling on a V100S with an Intel Xeon Skylake on Ubuntu 22.04. We also have some A100 cards, but we're in the midst of transferring everything in our cluster to Ubuntu, so they're not easily accessible at the moment. I'll dig into the LAPACK testing logs and see if I can produce some more useful debugging information.

@cgross95 (Contributor) commented Jun 7, 2024

I finally got access to our A100 cards, and can report that there were no "other errors" in the LAPACK tests. I ended up with 152 numerical errors, so I increased the max_failing_lapack_tests_num_errors easyconfig parameter and was able to successfully install OpenBLAS. I'm continuing on with the rest of the build now.
