@mayeut
Contributor

@mayeut mayeut commented Nov 17, 2025

fix #5532

The n argument was assumed to be in x0 which is not always true.
Use %[N_] instead of x0 in the assembly code.

The clobber list listed x0->x7 but those are not used in the assembly code, remove them from the clobber list.
v0, z2 & z3 were missing in the clobber list.
d1 relabelled as v1 (v0 & v1 are used through aliases d0/d1, s0/s1)
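
The operand and clobber changes described above can be illustrated with a small sketch. This is not the OpenBLAS kernel: `add_two_floats` and the operand names `X_`, `Y_` and `OUT_` are hypothetical, and the portable fallback exists only so the example also builds on non-aarch64 hosts. The inputs are referenced through named operands instead of hard-coded registers such as x0, and the vector registers touched through the s0/s1 aliases are listed as v0/v1 in the clobber list.

```c
/* Hypothetical helper for illustration only; not taken from OpenBLAS. */
static float add_two_floats(const float *x, const float *y)
{
    float out;
#if defined(__aarch64__)
    __asm__ __volatile__ (
        "ldr  s0, [%[X_]]   \n"  /* load *x via the named operand, not a fixed xN */
        "ldr  s1, [%[Y_]]   \n"  /* load *y */
        "fadd s0, s0, s1    \n"  /* s0 = *x + *y */
        "str  s0, [%[OUT_]] \n"  /* store the result into out */
        : /* no register outputs; the result is written through memory */
        : [X_] "r" (x), [Y_] "r" (y), [OUT_] "r" (&out)
        : "memory", "v0", "v1"   /* s0/s1 are aliases of v0/v1 */
    );
#else
    /* Portable fallback so the sketch compiles on other architectures. */
    out = *x + *y;
#endif
    return out;
}
```

Because the compiler picks the registers behind `%[X_]`, `%[Y_]` and `%[OUT_]`, the assembly no longer depends on an argument happening to live in a particular register, and the clobber list tells the compiler which vector registers it must not keep live across the statement.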

CI tests added for Ubuntu 24.04 runner default clang (18.1.3) as well as clang-21 on both x86_64 & aarch64

@Thyre
Contributor

Thyre commented Nov 18, 2025

When I try your PR on the same system on which I initially reported the issue, all tests now fail with a segmentation fault.
Don't know if this is because of LLVM 20 though.


Environment:

[reuter1@jrc0901 OpenBLAS]$ ml

Currently Loaded Modules:
  1) GCCcore/14.3.0   5) XZ/5.8.1         9) Z3/4.15.1   13) LLVM/20.1.8            17) libreadline/8.2   21) OpenSSL/3
  2) zlib/1.3.1       6) libxml2/2.14.3  10) gzip/1.14   14) llvm-compilers/20.1.8  18) libtommath/1.3.0  22) Python/3.13.5
  3) binutils/2.44    7) ncurses/6.5     11) lz4/1.10.0  15) make/4.4.1             19) Tcl/9.0.1
  4) libffi/3.5.1     8) GMP/6.3.0       12) zstd/1.5.7  16) bzip2/1.0.8            20) SQLite/3.50.1

Build steps:

$ make -j 16 shared  BINARY='64'  CC='clang'  FC='flang'  INTERFACE64='1'  LIBPREFIX='libopenblas64' MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
$ OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 make tests  BINARY='64'  CC='clang'  FC='flang'  INTERFACE64='1'  LIBPREFIX='libopenblas64'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
[...]
 Real BLAS Test Program Results


 Test of subprogram number  1             SDOT
make[1]: *** [Makefile:80: level1] Segmentation fault (core dumped)
make[1]: Leaving directory '/p/project1/cswmanage/reuter1/OpenBLAS/test'
make: *** [Makefile:176: tests] Error 2

I'll provide a backtrace later, though it really doesn't provide much information.

@martin-frbg
Collaborator

@Thyre can you please check if using the A64FX sdot kernel (i.e. copying the SDOTKERNEL line from KERNEL_A64FX to KERNEL.NEOVERSEN2) works for you ? This is basically the same as the default sve one but manually unrolled...

@Thyre
Contributor

Thyre commented Nov 18, 2025

@martin-frbg I've tried the following replacement:

diff --git a/kernel/arm64/KERNEL.NEOVERSEN2 b/kernel/arm64/KERNEL.NEOVERSEN2
index 6431422fa..e608bacc6 100644
--- a/kernel/arm64/KERNEL.NEOVERSEN2
+++ b/kernel/arm64/KERNEL.NEOVERSEN2
@@ -97,7 +97,7 @@ CNRM2KERNEL    = znrm2.S
 ZNRM2KERNEL    = znrm2.S

 DDOTKERNEL     = dot.c
-SDOTKERNEL     = dot.c
+SDOTKERNEL     = dot_sve_v8.c
 CDOTKERNEL     = zdot_thunderx2t99.c
 ZDOTKERNEL     = zdot_thunderx2t99.c
 DSDOTKERNEL    = dot.S

Unfortunately, it still ended in the same error. With an added -g to the CFLAGS, I get this backtrace:

Thread 4 "sblat1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40009378f680 (LWP 2432921)]
0x0000aaaaaaacfeb8 in dot_compute ()
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-168.el9_6.23.aarch64
(gdb) bt
#0  0x0000aaaaaaacfeb8 in dot_compute ()
#1  0x0000aaaaaaad0164 in dot_thread_function ()
#2  0x0000aaaaaaace408 in exec_threads ()
#3  0x0000aaaaaaace644 in exec_blas.omp_outlined ()
#4  0x00004000004156cc in __kmp_invoke_microtask ()
   from /p/project1/cswmanage/reuter1/EasyBuild/jedi/apps/software/LLVM/20.1.8-GCCcore-14.3.0/lib/aarch64-unknown-linux-gnu/libomp.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@Thyre
Contributor

Thyre commented Nov 18, 2025

This is the backtrace for a build with -O0:

 Real BLAS Test Program Results


 Test of subprogram number  1             SDOT

Program received signal SIGSEGV, Segmentation fault.
0x0000aaaaaaad298c in dot_kernel_asimd (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297)
    at ../kernel/arm64/dot_kernel_asimd.c:268
268             __asm__ __volatile__ (
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-168.el9_6.23.aarch64
(gdb) bt
#0  0x0000aaaaaaad298c in dot_kernel_asimd (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297) at ../kernel/arm64/dot_kernel_asimd.c:268
#1  0x0000aaaaaaad1bf4 in dot_compute (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297)
    at ../kernel/arm64/dot.c:116
#2  0x0000aaaaaaad1a28 in sdot_k (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297)
    at ../kernel/arm64/dot.c:145
#3  0x0000aaaaaaacc890 in sdot_ (N=0xaaaaaab7bea4 <combla_+4>, x=0xffffffff3528, INCX=0xaaaaaab7bea8 <combla_+8>, y=0xffffffff3544, INCY=0xaaaaaab7beac <combla_+12>) at dot.c:65
#4  0x0000aaaaaaaca33c in check2_ ()
#5  0x0000aaaaaaac959c in _QQmain ()
#6  0x0000aaaaaaacbf24 in main ()

@martin-frbg
Collaborator

dot_kernel_asimd would suggest that you aren't using the SVE code path at all, what is this GH system autodetected as ? (Should be visible in the name of the library, or in the CORE entry in Makefile.conf)

@Thyre
Contributor

Thyre commented Nov 18, 2025

> dot_kernel_asimd would suggest that you aren't using the SVE code path at all, what is this GH system autodetected as ? (Should be visible in the name of the library, or in the CORE entry in Makefile.conf)

It is detected as Neoverse V2, so that should be correct:

$ grep CORE Makefile.conf
CORE=NEOVERSEV2
LIBCORE=neoversev2
NUM_CORES=72
$ ls libopenblas64*
libopenblas64.a   libopenblas64.so.0                       libopenblas64_neoversev2p-r0.3.30.dev.so
libopenblas64.so  libopenblas64_neoversev2p-r0.3.30.dev.a

@martin-frbg
Collaborator

Alright, then the test that is blowing up has to be a case with x and/or y increments not equal one. Perhaps there is more than one problem with LLVM20

@Thyre
Contributor

Thyre commented Nov 18, 2025

> Alright, then the test that is blowing up has to be a case with x and/or y increments not equal one. Perhaps there is more than one problem with LLVM20

I'll try to cross-check with LLVM 21 once I'm able to use our GH200 nodes again. If that's the case, we unfortunately cannot do much about this.

@Thyre
Contributor

Thyre commented Nov 18, 2025

Trying with the LLVM 21.1.6 tar archive from the releases page, I still run into a segmentation fault:

[reuter1@jrc0900 OpenBLAS]$ OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 gdb ./test/sblat1
(gdb) run
Starting program: /p/project1/cswmanage/reuter1/OpenBLAS/test/sblat1

 Real BLAS Test Program Results


 Test of subprogram number  1             SDOT

Program received signal SIGSEGV, Segmentation fault.
0x0000aaaaaaaa7f2c in dot_kernel_asimd (n=4294967296, x=0x10003ffffb244, inc_x=17179869188, y=0x10003ffffb264,
    inc_y=17179869188) at ../kernel/arm64/dot_kernel_asimd.c:268
268             __asm__ __volatile__ (
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-168.el9_6.23.aarch64
(gdb) bt
#0  0x0000aaaaaaaa7f2c in dot_kernel_asimd (n=4294967296, x=0x10003ffffb244, inc_x=17179869188, y=0x10003ffffb264,
    inc_y=17179869188) at ../kernel/arm64/dot_kernel_asimd.c:268
#1  dot_compute (n=<optimized out>, x=<optimized out>, inc_x=<optimized out>, y=<optimized out>, inc_y=<optimized out>)
    at ../kernel/arm64/dot.c:116
#2  sdot_k (n=4294967296, x=0x10003ffffb244, inc_x=17179869188, y=0x10003ffffb264, inc_y=17179869188)
    at ../kernel/arm64/dot.c:145
#3  0x0000aaaaaaaa36d0 in check2_ ()
#4  0x0000aaaaaaaa2c68 in _QQmain ()
#5  0x0000aaaaaaaa4244 in main ()
(gdb)

In this case, I had to disable OpenMP support, since this LLVM build doesn't support OpenMP.

@martin-frbg
Collaborator

The n, inc_x and inc_y are wild, as if you had parts built with INTERFACE64=1 and others without
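
Values like these are what one would expect when 32-bit Fortran integers are read through 64-bit pointers. A minimal sketch, assuming a little-endian layout and assuming the test stored N=0, INCX=1 and INCY=1 in adjacent 32-bit slots: the -O0 backtrace shows N, INCX and INCY at combla_+4, +8 and +12, but the stored values themselves are an inference, and `read_adjacent_as_int64` is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical helper: the value a 64-bit little-endian load would see when
 * it starts at a 32-bit slot `lo` that is immediately followed by slot `hi`. */
static int64_t read_adjacent_as_int64(int32_t lo, int32_t hi)
{
    /* The low half comes from the intended integer, the high half from its
     * neighbour in memory. */
    return (int64_t)(((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo);
}
```

Under these assumptions, `read_adjacent_as_int64(0, 1)` gives 4294967296, the reported n, and `read_adjacent_as_int64(1, 1)` gives 4294967297, the reported inc_x (and inc_y, if the slot after INCY also holds 1).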

@Thyre
Contributor

Thyre commented Nov 18, 2025

> The n, inc_x and inc_y are wild, as if you had parts built with INTERFACE64=1 and others without

This was a very good hint. Thanks.
I checked what compiler was detected in F_COMPILER. This resulted in FLANGNEW, even for flang 21.

$ flang --version
flang version 21.1.6 (https://github.com/llvm/llvm-project a832a5222e489298337fbb5876f8dcaf072c5cca)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /p/project1/cswmanage/reuter1/LLVM-21.1.6-Linux-ARM64/bin
$ grep -rn F_COMPILER Makefile.conf
7:F_COMPILER=FLANGNEW

Therefore, we didn't pass -i8 or -fdefault-integer-8 to the Fortran compilation units. Adding this to Makefile.system actually helped:

diff --git a/Makefile.system b/Makefile.system
index 6241006a8..42b67011a 100644
--- a/Makefile.system
+++ b/Makefile.system
@@ -893,6 +893,9 @@ endif
 ifeq ($(F_COMPILER), FLANG)
 FCOMMON_OPT += -i8
 endif
+ifeq ($(F_COMPILER), FLANGNEW)
+FCOMMON_OPT += -fdefault-integer-8
+endif
 endif
 endif
 endif

This diff is only for aarch64 though. With this, the tests actually passed, at least as far as I can tell.
The flag should have existed since at least LLVM 18, if I can trust the help output.

@martin-frbg
Collaborator

Eek, that's pretty bad. :-(
But probably separate from the original issue, as I could reproduce that before realizing that it was about INTERFACE64

@Thyre
Contributor

Thyre commented Nov 18, 2025

> Eek, that's pretty bad. :-( But probably separate from the original issue, as I could reproduce that before realizing that it was about INTERFACE64

I used this PR for all my testing, and this previously failed even in the first part (without INTERFACE64) IIRC. I'm preparing a commit updating the Makefile.system to test if I'm able to successfully do the "full" build we're doing (including tests).

Will report back 😄

I'll do individual tests with all the PRs to check every step...

@Thyre
Contributor

Thyre commented Nov 18, 2025

So, a test build using #5540, #5536 and #5534 via patches worked.
I'll do another build which uploads a test report and then move to individual builds of these three PRs for more testing, so that I can verify which PR fixes what exactly.


Test in the context of EasyBuild: easybuilders/easybuild-easyconfigs#24482 (comment)

@mayeut
Contributor Author

mayeut commented Nov 18, 2025

The only tests done on my end are the ones added to the CI workflow as I don't have direct access to these kinds of hardware.
I added the workflows first to verify the issue was reproduced in CI then amended the commit to include the dot_kernel_sve.c patch.

@Thyre
Contributor

Thyre commented Nov 18, 2025

> The only tests done on my end are the ones added to the CI workflow as I don't have direct access to these kinds of hardware. I added the workflows first to verify the issue was reproduced in CI then amended the commit to include the dot_kernel_sve.c patch.

I can confirm that your PR solves the exact problem I described in the issue. This workflow now works on a GH200:

$ make -j 16 shared  BINARY='64'  CC='clang'  FC='flang'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
$ OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=16   make tests  BINARY='64'  CC='clang'  FC='flang'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'

Adding INTERFACE64='1' still fails, but that is then solved by #5540:

$ make -j 16 shared  BINARY='64' INTERFACE64='1'  CC='clang'  FC='flang'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
$ OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=16   make tests  BINARY='64'  CC='clang'  FC='flang'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' INTERFACE64='1'
[...]
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1
 Real BLAS Test Program Results


 Test of subprogram number  1             SDOT
make[1]: *** [Makefile:80: level1] Segmentation fault (core dumped)
$ wget https://github.com/OpenMathLib/OpenBLAS/pull/5540.patch 
$ patch -p1 < 5540.patch
patching file Makefile.system
$ make clean
$ make -j 16 shared  BINARY='64' INTERFACE64='1'  CC='clang'  FC='flang'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
$ OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=16   make tests  BINARY='64'  CC='clang'  FC='flang'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' INTERFACE64='1'
[...]
 cblas_zsyr2k PASSED THE TESTS OF ERROR-EXITS

 cblas_zsyr2k PASSED THE COLUMN-MAJOR COMPUTATIONAL TESTS (  1764 CALLS)
 cblas_zsyr2k PASSED THE ROW-MAJOR    COMPUTATIONAL TESTS (  1764 CALLS)

 END OF TESTS
make[1]: Leaving directory '/p/project1/cswmanage/reuter1/OpenBLAS/ctest'

@martin-frbg
Collaborator

I can confirm that this fixes the segfault (or variously "just" NAN results) in the sblat1 SDOT test as well as the (obviously related) failures in the sblat2 tests seen when not compiling with INTERFACE64=1. Thank you very much for this PR.
The missing -fdefault-integer-8 when compiling with a recent LLVM and INTERFACE64=1 is an entirely separate problem - that would basically make every test blow up. Probably the worst aspect of it is that it could not be caught by CI.

@martin-frbg martin-frbg added this to the 0.3.31 milestone Nov 18, 2025
@martin-frbg martin-frbg merged commit 75ceb6c into OpenMathLib:develop Nov 18, 2025
97 of 102 checks passed
@mayeut mayeut deleted the clang-sve branch November 19, 2025 06:25
Development

Successfully merging this pull request may close these issues.

OpenBLAS tests fail potrf:smoketest_trivial with LLVM 20.1.8 on aarch64
