{numlib,chem,toolchain}[NVHPC/23.7-CUDA-12.1.1] nvompi-2023a + QuantumESPRESSO-7.3.1 (GPU enabled) #20364

Open
wants to merge 9 commits into develop
Conversation

@Crivella (Contributor) commented Apr 15, 2024

Added easyconfig files for nvofbf toolchain + QE 7.3.1

local compilers:

  • GCC/12.3.0
  • CUDA/12.1.1

Added toolchain/numlib

  • nvofbf-2023a
    • nvompi-2023a
      • NVHPC-23.7-CUDA-12.1.1
      • OpenMPI-4.1.5
    • FlexiBLAS-3.3.1
      • OpenBLAS-0.3.24
    • FFTW-3.3.10
    • FFTW.MPI-3.3.10
    • ScaLAPACK-2.2.0-fb

Added easyconfigs (a rough sketch of their general shape follows this list):

  • HDF5-1.14.0-nvompi-2023a-CUDA-12.1.1.eb
  • libxc-6.2.2-NVHPC-23.7-CUDA-12.1.1.eb
  • QuantumESPRESSO-7.3.1-nvompi-2023a-CUDA-12.1.1.eb
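
For reference, a minimal, hypothetical sketch of the general shape these easyconfigs take, assuming standard EasyBuild easyconfig syntax; the names and values below are illustrative only and the actual .eb files added by this PR are authoritative:

# Hypothetical sketch, not one of the files in this PR.
name = 'QuantumESPRESSO'
version = '7.3.1'
versionsuffix = '-CUDA-%(cudaver)s'

homepage = 'https://www.quantum-espresso.org'
description = "Quantum ESPRESSO: electronic-structure calculations and materials modelling at the nanoscale."

toolchain = {'name': 'nvompi', 'version': '2023a'}

# source_urls/sources/checksums omitted from this sketch
dependencies = [
    ('CUDA', '12.1.1', '', SYSTEM),      # matches the NVHPC-23.7-CUDA-12.1.1 toolchain component
    ('HDF5', '1.14.0', versionsuffix),   # the HDF5 easyconfig listed above
    ('libxc', '6.2.2'),                  # built with NVHPC-23.7-CUDA-12.1.1
]

moduleclass = 'chem'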

NOTES:

Solved issues:

Open issues:

  • Occasional segfault in the QE test suite caused by FlexiBLAS when calling the ZHEEV LAPACK routine
    • The bug does not manifest when running the code under cuda-gdb
    • Tested starting from an nvompi build linked directly against OpenBLAS and the error was not present
  • Segfault in 3 test cases using RMM-DIIS diagonalization with k-points other than Gamma; most likely a QE bug (https://gitlab.com/QEF/q-e/-/issues/675)
  • Full CUDA libxc: https://gitlab.com/libxc/libxc/-/issues/135
    • Tested the patch from commit e648f37b
      • Compile time goes from ~5 min to ~3.5 h
      • Tests are unable to run
      • For now, since this is not officially supported with CMake and only experimental with autotools, and since it is not a widely used feature of QE, I would argue it is acceptable not to have the libxc routines run on the GPU

@migueldiascosta added this to the 4.x milestone Apr 16, 2024
@Crivella (Contributor, Author) commented:

Comparison of code efficiency when linked to the EasyBuild numlibs vs. linked to the NVHPC math_libs (-test suffix in the module name) shows no significant difference running on one node with an A100 GPU:

[ RUN      ] MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-test %threads=1 /bf4db141 @vega-gpu:default+default
[ RUN      ] MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-CUDA-12.1.1 %threads=1 /e4ce2bb2 @vega-gpu:default+default
[       OK ] (1/2) MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-test %threads=1 /bf4db141 @vega-gpu:default+default
P: extract_report_time: 0 s (r:0, l:None, u:None)
P: PWSCF_cpu: 231.98 s (r:0, l:None, u:None)
P: PWSCF_wall: 241.82 s (r:0, l:None, u:None)
P: electrons_cpu: 213.8 s (r:0, l:None, u:None)
P: electrons_wall: 216.04 s (r:0, l:None, u:None)
P: c_bands_cpu: 181.97 s (r:0, l:None, u:None)
P: c_bands_wall: 183.78 s (r:0, l:None, u:None)
P: cegterg_cpu: 142.72 s (r:0, l:None, u:None)
P: cegterg_wall: 144.01 s (r:0, l:None, u:None)
P: calbec_cpu: 0.12 s (r:0, l:None, u:None)
P: calbec_wall: 0.55 s (r:0, l:None, u:None)
P: fft_cpu: 0.12 s (r:0, l:None, u:None)
P: fft_wall: 0.14 s (r:0, l:None, u:None)
P: ffts_cpu: 0.0 s (r:0, l:None, u:None)
P: ffts_wall: 0.0 s (r:0, l:None, u:None)
P: fftw_cpu: 1.26 s (r:0, l:None, u:None)
P: fftw_wall: 77.36 s (r:0, l:None, u:None)
[       OK ] (2/2) MINE_QESPRESSO %ecut=250 %nbnd=400 %module_name=QuantumESPRESSO/7.3.1-nvompi-2023a-CUDA-12.1.1 %threads=1 /e4ce2bb2 @vega-gpu:default+default
P: extract_report_time: 0 s (r:0, l:None, u:None)
P: PWSCF_cpu: 232.44 s (r:0, l:None, u:None)
P: PWSCF_wall: 241.74 s (r:0, l:None, u:None)
P: electrons_cpu: 214.16 s (r:0, l:None, u:None)
P: electrons_wall: 216.18 s (r:0, l:None, u:None)
P: c_bands_cpu: 182.3 s (r:0, l:None, u:None)
P: c_bands_wall: 183.9 s (r:0, l:None, u:None)
P: cegterg_cpu: 143.11 s (r:0, l:None, u:None)
P: cegterg_wall: 144.18 s (r:0, l:None, u:None)
P: calbec_cpu: 0.12 s (r:0, l:None, u:None)
P: calbec_wall: 0.56 s (r:0, l:None, u:None)
P: fft_cpu: 0.0 s (r:0, l:None, u:None)
P: fft_wall: 0.01 s (r:0, l:None, u:None)
P: ffts_cpu: 0.0 s (r:0, l:None, u:None)
P: ffts_wall: 0.0 s (r:0, l:None, u:None)
P: fftw_cpu: 1.24 s (r:0, l:None, u:None)
P: fftw_wall: 77.23 s (r:0, l:None, u:None)
[----------] all spawned checks have finished

@Crivella changed the title from "{numlib,chem,toolchain}[NVHPC/23.7-CUDA-12.1.1] nvofbf-2023a + QuantumESPRESSO-7.3.1 (GPU enabled)" to "{numlib,chem,toolchain}[NVHPC/23.7-CUDA-12.1.1] nvompi-2023a + QuantumESPRESSO-7.3.1 (GPU enabled)" on Apr 22, 2024
@cgross95 (Contributor) commented:

Thanks for putting all of this together! Our site is interested in a GPU-enabled QuantumESPRESSO build, so we've been testing this.

Were you able to get around the "other errors" that occur in LAPACK testing when building OpenBLAS? Using the OpenBLAS-0.3.24-NVHPC-23.7-CUDA-12.1.1.eb easyconfig as provided gives us 55 other errors:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1328283         0       (0.000%)        0       (0.000%)        
DOUBLE PRECISION        1328013         10      (0.001%)        0       (0.000%)        
COMPLEX                 769507          159     (0.021%)        55      (0.007%)        
COMPLEX16               780654          116     (0.015%)        0       (0.000%)        

--> ALL PRECISIONS      4206457         285     (0.007%)        55      (0.001%)        

I saw that you had done some work on OpenBLAS issue #4652 to get some of the numerical failures down, but I was wondering whether you were ever able to get rid of the other errors that stop EasyBuild from finishing.

@Crivella (Contributor, Author) commented May 22, 2024

@cgross95
In my case, compiling on a machine with an A100 GPU, I ended up with only 148 numerical errors and no other errors:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1326099         12      (0.001%)        0       (0.000%)
DOUBLE PRECISION        1326921         36      (0.003%)        0       (0.000%)
COMPLEX                 762663          42      (0.006%)        0       (0.000%)
COMPLEX16               771518          58      (0.008%)        0       (0.000%)

--> ALL PRECISIONS      4187201         148     (0.004%)        0       (0.000%)

What hardware are you trying this on?
It would be interesting to find out whether this is strictly an OpenBLAS/LAPACK issue or whether it is a matter of setting more compiler flags for different architectures.

I think I was still getting some other errors as well with 0.3.27, but I didn't investigate much further since I was aiming at 0.3.24 for this release (in that case I was getting 14 errors related to the ZHSEQR and ZGEEV routines failing to find all eigenvalues).

The logs should give you further details on which LAPACK routine failed and with what error code (each routine documents the meaning of its error codes in comments in the source/documentation).
If you think those errors are not actually a problem, you could also raise the threshold of allowed LAPACK test failures by changing the value assigned to max_failing_lapack_tests_num_errors, and by additionally setting max_failing_lapack_tests_other_errors to allow the other errors, as sketched below.
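
For example, a minimal sketch of how those thresholds could be raised in the OpenBLAS easyconfig, assuming the two parameters behave as described above (the numbers are placeholders, not recommendations):

# Sketch only: additions to the OpenBLAS easyconfig to relax the LAPACK test thresholds.
# Pick the values based on the failures you actually observe in the logs.
max_failing_lapack_tests_num_errors = 150     # roughly the numerical-error count seen on an A100
max_failing_lapack_tests_other_errors = 55    # additionally tolerate the "other" errors reported above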

@cgross95 (Contributor) commented:

I'm compiling on a V100S with an Intel Xeon Skylake on Ubuntu 22.04. We also have some A100 cards, but we're in the midst of transferring everything in our cluster to Ubuntu, so they're not easily accessible at the moment. I'll dig into the LAPACK testing logs and see if I can produce some more useful debugging information.

@cgross95 (Contributor) commented Jun 7, 2024

I finally got access to our A100 cards, and can report that there were no "other errors" in the LAPACK tests. I ended up with 152 numerical errors, so I increased the max_failing_lapack_tests_num_errors easyconfig parameter and was able to successfully install OpenBLAS. I'm continuing on with the rest of the build now.
