fix: dot_kernel_sve "n" usage & clobber list #5536
Trying your PR on the same system I initially reported the issue on, all tests now fail with a segmentation fault.
Environment:
Build steps:
I'll provide a backtrace later, though it really doesn't provide much information.
@Thyre, can you please check whether using the A64FX sdot kernel (i.e. copying the SDOTKERNEL line from KERNEL_A64FX to KERNEL.NEOVERSEN2) works for you? This is basically the same as the default SVE one, but manually unrolled...
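For readers unfamiliar with the term, "manually unrolled" means the loop body is replicated so that several elements are handled per iteration, often with independent accumulators. The C below is only a scalar sketch of that idea under that assumption; it is not the actual A64FX kernel, and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical reference: one multiply-add per iteration. */
static float dot_plain(size_t n, const float *x, const float *y)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical 4-way manual unroll with independent accumulators, so
   consecutive multiply-adds do not depend on each other; a tail loop
   handles the n % 4 leftover elements. */
static float dot_unrolled4(size_t n, const float *x, const float *y)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)  /* remainder */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```

Because the unrolled version sums in a different order, its result can differ from the plain loop in the last bits for general floating-point inputs; for small integer-valued inputs, as in a quick self-check, both are exact.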
@martin-frbg I've tried the following replacement:
diff --git a/kernel/arm64/KERNEL.NEOVERSEN2 b/kernel/arm64/KERNEL.NEOVERSEN2
index 6431422fa..e608bacc6 100644
--- a/kernel/arm64/KERNEL.NEOVERSEN2
+++ b/kernel/arm64/KERNEL.NEOVERSEN2
@@ -97,7 +97,7 @@ CNRM2KERNEL = znrm2.S
ZNRM2KERNEL = znrm2.S
DDOTKERNEL = dot.c
-SDOTKERNEL = dot.c
+SDOTKERNEL = dot_sve_v8.c
CDOTKERNEL = zdot_thunderx2t99.c
ZDOTKERNEL = zdot_thunderx2t99.c
DSDOTKERNEL = dot.S
Unfortunately, it still ended in the same error. With an added
Thread 4 "sblat1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40009378f680 (LWP 2432921)]
0x0000aaaaaaacfeb8 in dot_compute ()
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-168.el9_6.23.aarch64
(gdb) bt
#0 0x0000aaaaaaacfeb8 in dot_compute ()
#1 0x0000aaaaaaad0164 in dot_thread_function ()
#2 0x0000aaaaaaace408 in exec_threads ()
#3 0x0000aaaaaaace644 in exec_blas.omp_outlined ()
#4 0x00004000004156cc in __kmp_invoke_microtask ()
from /p/project1/cswmanage/reuter1/EasyBuild/jedi/apps/software/LLVM/20.1.8-GCCcore-14.3.0/lib/aarch64-unknown-linux-gnu/libomp.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
This is the backtrace for a build with
Real BLAS Test Program Results
Test of subprogram number 1 SDOT
Program received signal SIGSEGV, Segmentation fault.
0x0000aaaaaaad298c in dot_kernel_asimd (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297)
at ../kernel/arm64/dot_kernel_asimd.c:268
268 __asm__ __volatile__ (
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-168.el9_6.23.aarch64
(gdb) bt
#0 0x0000aaaaaaad298c in dot_kernel_asimd (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297) at ../kernel/arm64/dot_kernel_asimd.c:268
#1 0x0000aaaaaaad1bf4 in dot_compute (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297)
at ../kernel/arm64/dot.c:116
#2 0x0000aaaaaaad1a28 in sdot_k (n=4294967296, x=0xffffffff3528, inc_x=4294967297, y=0xffffffff3544, inc_y=4294967297)
at ../kernel/arm64/dot.c:145
#3 0x0000aaaaaaacc890 in sdot_ (N=0xaaaaaab7bea4 <combla_+4>, x=0xffffffff3528, INCX=0xaaaaaab7bea8 <combla_+8>, y=0xffffffff3544, INCY=0xaaaaaab7beac <combla_+12>) at dot.c:65
#4 0x0000aaaaaaaca33c in check2_ ()
#5 0x0000aaaaaaac959c in _QQmain ()
#6 0x0000aaaaaaacbf24 in main ()
dot_kernel_asimd would suggest that you aren't using the SVE code path at all. What is this GH system autodetected as? (It should be visible in the name of the library, or in the CORE entry in Makefile.conf.)
It is detected as Neoverse V2, so that should be correct:
Alright, then the test that is blowing up has to be a case with x and/or y increments not equal to one. Perhaps there is more than one problem with LLVM 20.
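To make "increments not equal to one" concrete: a reference-BLAS-style SDOT walks x and y with strides, and for a negative increment it starts from the far end of the vector. The sketch below encodes those semantics in plain C so a kernel's output could be compared against it; the name ref_sdot is hypothetical and it is not an OpenBLAS routine.

```c
#include <stdint.h>

/* Hypothetical reference SDOT following reference-BLAS semantics:
   element i of x is x[ix0 + i*incx], where ix0 = 0 for incx >= 0 and
   ix0 = (1 - n) * incx for incx < 0, so traversal starts at the far end.
   The same rule applies to y. n <= 0 returns 0 without touching memory. */
static float ref_sdot(int64_t n, const float *x, int64_t incx,
                      const float *y, int64_t incy)
{
    float sum = 0.0f;
    if (n <= 0)
        return sum;
    int64_t ix = (incx < 0) ? (1 - n) * incx : 0;
    int64_t iy = (incy < 0) ? (1 - n) * incy : 0;
    for (int64_t i = 0; i < n; i++) {
        sum += x[ix] * y[iy];
        ix += incx;
        iy += incy;
    }
    return sum;
}
```

Checking a kernel against such a reference over several (n, incx, incy) combinations, including negative increments, is the kind of test that exposes a stride-handling bug while unit-stride cases still pass.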
I'll try to cross-check with LLVM 21 once I'm able to use our GH200 nodes again. If that's the case, we unfortunately cannot do much about this.
Trying with the LLVM 21.1.6 tar archive from the releases page, I still run into a segmentation fault:
[reuter1@jrc0900 OpenBLAS]$ OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 gdb ./test/sblat1
(gdb) run
Starting program: /p/project1/cswmanage/reuter1/OpenBLAS/test/sblat1
Real BLAS Test Program Results
Test of subprogram number 1 SDOT
Program received signal SIGSEGV, Segmentation fault.
0x0000aaaaaaaa7f2c in dot_kernel_asimd (n=4294967296, x=0x10003ffffb244, inc_x=17179869188, y=0x10003ffffb264,
inc_y=17179869188) at ../kernel/arm64/dot_kernel_asimd.c:268
268 __asm__ __volatile__ (
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-168.el9_6.23.aarch64
(gdb) bt
#0 0x0000aaaaaaaa7f2c in dot_kernel_asimd (n=4294967296, x=0x10003ffffb244, inc_x=17179869188, y=0x10003ffffb264,
inc_y=17179869188) at ../kernel/arm64/dot_kernel_asimd.c:268
#1 dot_compute (n=<optimized out>, x=<optimized out>, inc_x=<optimized out>, y=<optimized out>, inc_y=<optimized out>)
at ../kernel/arm64/dot.c:116
#2 sdot_k (n=4294967296, x=0x10003ffffb244, inc_x=17179869188, y=0x10003ffffb264, inc_y=17179869188)
at ../kernel/arm64/dot.c:145
#3 0x0000aaaaaaaa36d0 in check2_ ()
#4 0x0000aaaaaaaa2c68 in _QQmain ()
#5 0x0000aaaaaaaa4244 in main ()
(gdb)
In this case, I had to disable OpenMP support, since this LLVM build doesn't support OpenMP.
The n, inc_x and inc_y values are wild, as if you had parts built with INTERFACE64=1 and others without.
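The "wild" values follow a recognisable pattern. Assuming the Fortran caller stores its INTEGER arguments as 32-bit values while the C side reads them as 64-bit (as with INTERFACE64=1), each 64-bit load picks up the intended value in its low half and the neighbouring 32-bit word in its high half on a little-endian target such as aarch64 Linux. The hypothetical helper below reproduces that misread with memcpy. Applied to the first backtrace, where N, INCX and INCY sit at combla_+4, +8 and +12, the values are consistent with N=0, INCX=1 and INCY=1 stored back to back.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: simulate a 64-bit load (what an ILP64 C routine does
   through a pointer to a 64-bit integer) from memory that actually holds two
   consecutive 32-bit Fortran INTEGERs. memcpy avoids aliasing and alignment
   problems. The numeric result assumes a little-endian target. */
static int64_t misread_as_int64(int32_t first, int32_t second)
{
    int32_t words[2] = { first, second };  /* two adjacent 32-bit INTEGERs */
    int64_t value;
    memcpy(&value, words, sizeof value);   /* low half = first, high half = second */
    return value;
}
```

With N=0, a correctly built library would return 0.0 without touching x or y, whereas a kernel that sees n=4294967296 would walk roughly 2^32 elements past the arrays, which would fit the segmentation fault.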
This was a very good hint. Thanks.
$ flang --version
flang version 21.1.6 (https://github.com/llvm/llvm-project a832a5222e489298337fbb5876f8dcaf072c5cca)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /p/project1/cswmanage/reuter1/LLVM-21.1.6-Linux-ARM64/bin
$ grep -rn F_COMPILER Makefile.conf
7:F_COMPILER=FLANGNEW
Therefore, we didn't pass
diff --git a/Makefile.system b/Makefile.system
index 6241006a8..42b67011a 100644
--- a/Makefile.system
+++ b/Makefile.system
@@ -893,6 +893,9 @@ endif
ifeq ($(F_COMPILER), FLANG)
FCOMMON_OPT += -i8
endif
+ifeq ($(F_COMPILER), FLANGNEW)
+FCOMMON_OPT += -fdefault-integer-8
+endif
endif
endif
endif
This diff is only for
Eek, that's pretty bad. :-(
I used this PR for all my testing, and this previously failed even in the first part (without
Will report back 😄 I'll do individual tests with all the PRs to check every step...
So, a test build using #5540, #5536 and #5534 via patches worked. Test in the context of EasyBuild: easybuilders/easybuild-easyconfigs#24482 (comment)
The only tests done on my end are the ones added to the CI workflow, as I don't have direct access to this kind of hardware.
I can confirm that your PR solves the exact problem I described in the issue. This workflow now works on a GH200:
$ make -j 16 shared BINARY='64' CC='clang' FC='flang' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
$ OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=16 make tests BINARY='64' CC='clang' FC='flang' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1'
Adding
$ make -j 16 shared BINARY='64' INTERFACE64='1' CC='clang' FC='flang' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
$ OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=16 make tests BINARY='64' CC='clang' FC='flang' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' INTERFACE64='1'
[...]
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1
Real BLAS Test Program Results
Test of subprogram number 1 SDOT
make[1]: *** [Makefile:80: level1] Segmentation fault (core dumped)
$ wget https://github.com/OpenMathLib/OpenBLAS/pull/5540.patch
$ patch -p1 < 5540.patch
patching file Makefile.system
$ make clean
$ make -j 16 shared BINARY='64' INTERFACE64='1' CC='clang' FC='flang' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' CFLAGS='-O2 -Wl,--undefined-version -Wno-unused-command-line-argument -Wno-error=int-conversion'
$ OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=16 make tests BINARY='64' CC='clang' FC='flang' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' INTERFACE64='1'
[...]
cblas_zsyr2k PASSED THE TESTS OF ERROR-EXITS
cblas_zsyr2k PASSED THE COLUMN-MAJOR COMPUTATIONAL TESTS ( 1764 CALLS)
cblas_zsyr2k PASSED THE ROW-MAJOR COMPUTATIONAL TESTS ( 1764 CALLS)
END OF TESTS
make[1]: Leaving directory '/p/project1/cswmanage/reuter1/OpenBLAS/ctest'
I can confirm that this fixes the segfault (or, variously, "just" NaN results) in the sblat1 SDOT test, as well as the (obviously related) failures in the sblat2 tests seen when not compiling with INTERFACE64=1. Thank you very much for this PR.
fix #5532
- The n argument was assumed to be in x0, which is not always true. Use %[N_] instead of x0 in the assembly code.
- The clobber list listed x0->x7, but those are not used in the assembly code; remove them from the clobber list.
- v0, z2 & z3 were missing in the clobber list.
- d1 relabelled as v1 (v0 & v1 are used through aliases d0/d1, s0/s1).
- CI tests added for the Ubuntu 24.04 runner with the default clang (18.1.3) as well as clang-21, on both x86_64 & aarch64.
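The two rules above, sketched outside OpenBLAS: refer to the element count through a named operand so the compiler may place it in any register, and list in the clobber list exactly the registers the assembly overwrites. The function below is a hypothetical, unit-stride scalar example using GCC/Clang extended asm on AArch64, with a plain C fallback on other targets; it is not the dot_kernel_sve code.

```c
#include <stdint.h>

/* Hypothetical unit-stride single-precision dot product (not OpenBLAS code). */
static float dot_asm_sketch(int64_t n, const float *x, const float *y)
{
    float result = 0.0f;
    if (n <= 0)
        return result;
#if defined(__aarch64__)
    int64_t count = n;            /* decremented by the loop, hence "+r" */
    const float *px = x, *py = y; /* post-incremented, hence "+r" */
    float *pr = &result;
    __asm__ __volatile__(
        "   fmov  s0, wzr              \n" /* accumulator = 0             */
        "1: ldr   s1, [%[x_]], #4      \n" /* s1 = *px++                  */
        "   ldr   s2, [%[y_]], #4      \n" /* s2 = *py++                  */
        "   fmadd s0, s1, s2, s0       \n" /* s0 += s1 * s2               */
        "   subs  %[n_], %[n_], #1     \n" /* count lives in whatever
                                              register the compiler chose,
                                              not necessarily x0          */
        "   b.ne  1b                   \n"
        "   str   s0, [%[r_]]          \n" /* store the sum into result   */
        : [n_] "+r"(count), [x_] "+r"(px), [y_] "+r"(py)
        : [r_] "r"(pr)
        /* Clobber only what the asm writes: v0-v2 (used via their s0-s2
           aliases), the condition flags, and memory (loads and the store). */
        : "v0", "v1", "v2", "cc", "memory");
#else
    for (int64_t i = 0; i < n; i++)
        result += x[i] * y[i];
#endif
    return result;
}
```

Had the template hard-coded x0 instead of %[n_], it would only work when the compiler happened to keep count in x0; and leaving v0 out of the clobber list would let the compiler keep a live value there that the assembly silently overwrites, which is one way to end up with garbage or NaN results.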