Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLAS misidentified as lp64 (it's ilp64) #206

Open
Jellby opened this issue Mar 2, 2021 · 16 comments
Open

BLAS misidentified as lp64 (it's ilp64) #206

Jellby opened this issue Mar 2, 2021 · 16 comments

Comments

@Jellby
Copy link

Jellby commented Mar 2, 2021

I would like to use a compiled Reference BLAS (and LAPACK), as part of another package. The libraries should be ilp64 and I add it as a required linalg component. However, I get, with CMake:

-- Could NOT find BLAS (missing: ilp64)

It seems https://github.com/GlobalArrays/ga/blob/develop/cmake/linalg-modules/util/ilp64_checker.c returns with:

 ** On entry to DGEMM parameter number  8 had an illegal value

but with return code of 0, and the result checksum is never done.

In https://github.com/Reference-LAPACK/lapack/blob/master/BLAS/SRC/xerbla.f, changing the STOP to STOP 1, causes at least the return code to be 1 and the BLAS library being identified as ilp64.

@ajaypanyala
Copy link
Contributor

@Jellby Could you please post the full CMake configure line you are using ? Also when you build/use an existing Reference LAPACK library, do you know if it was built with the ilp64 option -DBUILD_INDEX64=ON ? Could you also confirm if the libs libblas64.a and liblapack64.a are available in the existing Ref. BLAS/LAPACK installation.

@Jellby
Copy link
Author

Jellby commented Mar 3, 2021

It's a bit of an unorthodox build, I'm afraid... There's a file blas.F containing just lines like:

#include "caxpy.f"
#include "cgemm.f"
#include "dasum.f"
#include "daxpy.f"
#include "dcabs1.f"
...

which is compiled with:

$ gfortran -fdefault-integer-8 -I/path/to/lapack/BLAS/SRC -c blas.F
$ ar cr libblas.a blas.o

Then, for GA:

$ cmake -D ENABLE_FORTRAN=ON -D ENABLE_CXX=OFF -D ENABLE_TESTS=OFF -D ENABLE_I8=ON -D LINALG_REQUIRED_COMPONENTS=ilp64 -D ENABLE_BLAS=ON -D LINALG_PREFIX="/" -D BLAS_LIBRARIES=/path/to/libblas.a /path/to/ga
...
-- void size: 8, USE_I8: ON, ENABLE_I8: ON
-- Performing Test BLAS_LOWER_UNDERSCORE
-- Performing Test BLAS_LOWER_UNDERSCORE -- found
-- Could NOT find BLAS (missing: ilp64) 
CMake Error at cmake/ga-linalg.cmake:142 (message):
  ENABLE_BLAS=ON, but a LAPACK library was not found
Call Stack (most recent call first):
  CMakeLists.txt:154 (include)


-- Configuring incomplete, errors occurred!

And manually compiling the diagnostic file shows this:

$ gfortran -DDGEMM_NAME=dgemm_ /path/to/ga/cmake/util/ilp64_checker.c libblas.a
$ ./a.out ; echo $?
 ** On entry to DGEMM parameter number  8 had an illegal value
0

while compiling blas.F with fno-default-integer-8 gives just 0, without the ** On entry line (so a true lp64 dgemm is working correctly).

@ajaypanyala
Copy link
Contributor

We use the CMake LinAlg modules provided by @wavefunction91 and the modules only support the recommended Ref. BLAS/LAPACK/ScaLAPACK builds. In this case, the solution might just be to cp libblas.a libblas64.a and try again. Also, ENABLE_I8 does not have to be specified since it's set automatically.

$ gfortran -DDGEMM_NAME=dgemm_ /path/to/ga/cmake/util/ilp64_checker.c libblas.a
$ ./a.out ; echo $?
** On entry to DGEMM parameter number 8 had an illegal value
0

I think the way it's setup is if this test fails, the CMake BLAS discovery module determines it to be ILP64, otherwise LP64. See
https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/BLASUtilities.cmake#L93-L108
https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L25-L28

@wavefunction91
Copy link

If you want to use ILP64 reference ScaLAPACK, you'll have to specify it manually (e.g. specify ScaLAPACK_LIBRARIES=..., ScaLAPACK_IS_LP64=FALSE), we assume NETLIB ScaLAPACK is LP64, and would need to run an MPI program to validate this at configure (not generally possible on e.g. SLURM systems)

@Jellby
Copy link
Author

Jellby commented Mar 3, 2021

I think the way it's setup is if this test fails, the CMake BLAS discovery module determines it to be ILP64, otherwise LP64.

The point is the test does not "fail", but it should. The dgemm call aborts, but the program still returns 0, which makes it look successful. Perhaps the test should check that the return code is 0 and that stderr/stdout is empty?

@ajaypanyala
Copy link
Contributor

@Jellby Does this happen even after renaming libblas.a -> libblas64.a ? I think that is because ReferenceBLAS_ilp64_FOUND is set to FALSE because it could not find libblas64.a with -DLINALG_REQUIRED_COMPONENTS=ilp64 @wavefunction91 Is that right?

@wavefunction91
Copy link

@ajaypanyala That shouldn't matter, he's specifying it manually so it should just go through the validation.

@Jellby Checking output would not be robust in general, for example, it's possible to trigger call-tree print in MKL with an environment variable while still giving the correct result.

This is strange, if your BLAS library is ILP64, I can't see how it would give the correct results here
https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L37

@Jellby
Copy link
Author

Jellby commented Mar 3, 2021

@wavefunction91 It's not giving the correct result, the dgemm call (https://github.com/wavefunction91/linalg-cmake-modules/blob/main/util/ilp64_checker.c#L29-L32) aborts and the program returns with 0, the check after is never executed.

@wavefunction91
Copy link

wavefunction91 commented Mar 3, 2021

@Jellby Program aborts signal 0? That's not canonical at all, SIGABRT should be a return of 6, no?
EDIT: Nevermind, guess I should have read the entire issue - the issue is in XERBLA, OK. That's tedious, I'll have to give it some thought as to how to best handle that robustly. I guess we could change the exe return to be the checksum, that way 0 is a failure - 8 is a success?

@Jellby
Copy link
Author

Jellby commented Mar 5, 2021

Yeah, it's a Fortran STOP, I don't know if the standard says anything about the return code of that, but apparently it's a "normal termination" and a return code of 0 is recommended (https://stackoverflow.com/a/49545586/). Maybe, instead of the return code, something more specific (e.g. "dgemm call completed sucessfully") could be written to stdout and that could be tested.

@wavefunction91
Copy link

That's a good idea, I'll add that today and @ajaypanyala can update it here when its checked in

@ajaypanyala
Copy link
Contributor

@Jellby Could you please try issue206 branch and see if that fixes this issue. This branch has @wavefunction91's latest fixes.

@Jellby
Copy link
Author

Jellby commented Mar 8, 2021

It seems to work, but I'll have to change the infrastructure somewhat to be able to do a real test.

@ajaypanyala
Copy link
Contributor

@Jellby I have merged the latest fixes from @wavefunction91 to develop.

@Jellby
Copy link
Author

Jellby commented May 19, 2021

Somehow related to this (tell me if I should open a new issue): Now detection of LAPACK fails, apparently because:

a) The order of libraries matters. This for a case where I pass BLAS_LIBRARIES="foo.a;bar.a", and one has BLAS and the other LAPACK, and I don't necessarily know which is which. Could the finder figure out whether LAPACK needs the (already found) BLAS libraries and add them to the test (as it does with gfortran, etc.)?

b) libm is needed, but not identified as a requirement. Apparently this is because it just searches for logf in the error messages, but in my case (probably because I have only a subset of LAPACK), the missing symbol is cabs.

@Jellby
Copy link
Author

Jellby commented May 21, 2021

About b), actually the error I get is something like https://stackoverflow.com/questions/30013845, where a single function is mentioned, and that might be arbitrary, but at least there's an explicit libm.so in the error message. Perhaps

if( ${__compile_output} MATCHES "logf" )
could say something like this instead:

if( ${__compile_output} MATCHES "logf" OR ${__compile_output} MATCHES "/libm.so" )

And another suggestion. LAPACK is identified by searching for dpstrf, but this function doesn't seem to be used anywhere in GA. Wouldn't it make more sense to search for something that's actually used (like dsyev, which is already used for identifying the interface)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants