OpenBLAS 0.3.18 LAPACK tests failed due to numerical errors #16504

Closed
Ezimkin opened this issue Oct 28, 2022 · 4 comments

@Ezimkin

Ezimkin commented Oct 28, 2022

Hello all,

I'm having some issues building the dependencies for LAMMPS-23Jun2022-foss-2021b-kokkos-CUDA-11.4.1.eb. In particular, OpenBLAS 0.3.18 fails its test step because too many numerical errors occur during testing.

I assume this is due to some imprecise compiler flags? In case it is system-dependent: this is on an AMD EPYC 7713 server.

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY             nb test run    numerical error    other error
================    ===========    ===============    ===========
REAL                  1294329      1233   (0.095%)    0   (0.000%)
DOUBLE PRECISION      1302917      1195   (0.092%)    0   (0.000%)
COMPLEX                756180      1005   (0.133%)    0   (0.000%)
COMPLEX16              768848       144   (0.019%)    0   (0.000%)

--> ALL PRECISIONS    4122274      3577   (0.087%)    0   (0.000%)
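As a quick check on the summary: each percentage is just the error count divided by the number of tests run for that precision. A minimal Python sketch, using the counts copied from the summary above (the `error_rate` helper is hypothetical and not part of the LAPACK test harness):

```python
# Counts copied from the LAPACK testing summary above:
# precision -> (tests run, numerical errors, other errors)
summary = {
    "REAL": (1294329, 1233, 0),
    "DOUBLE PRECISION": (1302917, 1195, 0),
    "COMPLEX": (756180, 1005, 0),
    "COMPLEX16": (768848, 144, 0),
}


def error_rate(errors: int, runs: int) -> float:
    """Hypothetical helper: errors as a percentage of tests run."""
    return 100.0 * errors / runs


# Totals over all four precisions, matching the "ALL PRECISIONS" row.
total_runs = sum(runs for runs, _, _ in summary.values())
total_num = sum(num for _, num, _ in summary.values())
total_other = sum(other for _, _, other in summary.values())

for name, (runs, num, other) in summary.items():
    print(f"{name:<17} {runs:>8} {num:>5} ({error_rate(num, runs):.3f}%) "
          f"{other:>3} ({error_rate(other, runs):.3f}%)")
print(f"{'ALL PRECISIONS':<17} {total_runs:>8} {total_num:>5} "
      f"({error_rate(total_num, total_runs):.3f}%) "
      f"{total_other:>3} ({error_rate(total_other, total_runs):.3f}%)")
```

Running it reproduces the rates shown in the summary, including 0.087% across all precisions.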

Here is the full log file from the build.

easybuild-OpenBLAS-0.3.18-20221027.182153.scXWI.log

@akesandgren
Contributor

If you're not using EasyBuild 4.6.2, upgrade to it and then recompile GCCcore and OpenBLAS with it (and any other toolchain you have that uses GCC 11/12).
There is a vectorizer bug in stock GCC 11/12 and an FMA problem in the LAPACK part of OpenBLAS, both of which are addressed in EasyBuild 4.6.2.
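For context on why an FMA problem can change numerical results at all: a fused multiply-add computes a*b + c with a single rounding, whereas a separate multiply followed by an add rounds twice. The Python sketch below emulates both using exact `fractions.Fraction` arithmetic. The helper names and input values are hypothetical, chosen only to expose the difference; this is an illustration of FMA rounding in general, not a reproduction of the OpenBLAS problem.

```python
from fractions import Fraction


def separate_mul_add(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings: the product is rounded to double first."""
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMA (illustrative only): compute a*b + c exactly, round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Hypothetical inputs: the exact product is 1 - 2**-60, which rounds to 1.0
# when stored as a double, so the separate version loses the small term.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(separate_mul_add(a, b, c))  # two roundings: result is 0.0
print(fused_mul_add(a, b, c))     # one rounding: result is -2**-60
```

Both results are "correct" for their rounding model; the point is only that code tuned for one sequence of roundings can behave differently when the compiler contracts a*b + c into an FMA.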

@Ezimkin
Author

Ezimkin commented Oct 28, 2022

Ah, that may be the solution. I'm not sure which EasyBuild version they were built with, since I upgraded eb a few days before this, but GCC and GCCcore were definitely built with the older version.

I will build it this weekend and report back if it resolves my issue.

@Ezimkin
Author

Ezimkin commented Oct 28, 2022

I got my results back a bit earlier than expected. I can confirm that recompiling GCC and GCCcore resolved this issue.

Thank you, @akesandgren!

Ezimkin closed this as completed Oct 28, 2022
@boegel
Member

boegel commented Oct 29, 2022

More details on this in #16380

boegel added this to the 4.6.2 milestone Oct 29, 2022