
FlexiBLAS causes core dump (simple test example) #16387

Closed
schiotz opened this issue Oct 11, 2022 · 31 comments

schiotz (Contributor) commented Oct 11, 2022

Hi EasyBuilders,

We have problems with a core dump inside our GPAW code, from within FlexiBLAS:

[i019:08729] [ 3] /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib/libflexiblas.so.3(flexiblas_real_cblas_zdotu_sub+0x66)[0x2ba851906406]

We can reproduce the bug with this four-line code snippet (pure numpy code) on most (but not all) of our machines:

import numpy as np
U = np.ones((28, 28), complex)
u = np.ones((28, 2, 27, 27, 70), complex)
a = np.dot(U, np.swapaxes(u, 0, 3))

We see the problem with SciPy-bundle/2021.10-foss-2021b and SciPy-bundle/2022.05-foss-2022a, but not with the corresponding intel-toolchain packages. Nor do we see the problem with SciPy-bundle/2020.11-foss-2020b, which does not use FlexiBLAS (I think...).

CC: @jjmortensen

boegel added this to the 4.x milestone Oct 12, 2022
boegel (Member) commented Oct 12, 2022

@grisuthedragon Any thoughts on this?

boegel (Member) commented Oct 12, 2022

I can reproduce this too; looks like it's a segfault problem.

GDB backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00001555512cc16a in zdot_compute () from /apps/gent/RHEL8/haswell-ib/software/OpenBLAS/0.3.18-GCC-11.2.0/lib/libopenblas.so.0
(gdb) bt
#0  0x00001555512cc16a in zdot_compute () from /apps/gent/RHEL8/haswell-ib/software/OpenBLAS/0.3.18-GCC-11.2.0/lib/libopenblas.so.0
#1  0x00001555512cc28e in zdotu_k () from /apps/gent/RHEL8/haswell-ib/software/OpenBLAS/0.3.18-GCC-11.2.0/lib/libopenblas.so.0
#2  0x0000155552df5e06 in flexiblas_real_cblas_zdotu_sub () from /apps/gent/RHEL8/haswell-ib/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib/libflexiblas.so.3
#3  0x0000155553116340 in CDOUBLE_dot (ip1=0x90e880 "", is1=16, ip2=0x155545434300 "", is2=1632960, op=0x1555427ea760 "", n=28, __NPY_UNUSED_TAGGEDignore=0x0) at build/src.linux-x86_64-3.9/numpy/core/src/multiarray/arraytypes.c:3628
#4  0x00001555531f2928 in PyArray_MatrixProduct2 (op1=<optimized out>, op2=<optimized out>, out=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:1083
#5  0x00001555531f3003 in array_matrixproduct (__NPY_UNUSED_TAGGEDdummy=<optimized out>, args=<optimized out>, kwds=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:2438
#6  0x00001555550a177e in cfunction_call (func=0x15555351b360, args=<optimized out>, kwargs=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/descrobject.c:539
#7  0x00001555550a0cbf in _PyObject_Call (tstate=0x407d20, callable=0x15555351b360, args=0x155553595e80, kwargs=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/genobject.c:281
#8  0x0000155553134298 in array_implement_array_function (__NPY_UNUSED_TAGGEDdummy=<optimized out>, positional_args=<optimized out>) at numpy/core/src/multiarray/arrayfunction_override.c:367
#9  0x00001555550a17a0 in cfunction_call (func=0x15555351bbd0, args=<optimized out>, kwargs=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/descrobject.c:548
#10 0x0000155555094932 in _PyObject_MakeTpCall (tstate=0x407d20, callable=0x15555351bbd0, args=<optimized out>, nargs=5, keywords=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/genobject.c:191
#11 0x000015555508d8ae in _PyObject_VectorcallTstate (kwnames=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, nargsf=9223372036854775813, args=0x155547e90ad8, callable=0x15555351bbd0,
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at ./Python/pycore_pyerrors.h:116
#12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775813, args=0x155547e90ad8, callable=0x15555351bbd0, tstate=<optimized out>) at ./Python/pycore_pyerrors.h:103
#13 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775813, args=0x155547e90ad8, callable=0x15555351bbd0) at ./Python/pycore_pyerrors.h:127
#14 call_function (kwnames=0x0, kwnames@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>,
    pp_stack@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, tstate=0x407d20, tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/ceval_gil.h:5072
#15 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x155547e90950, throwflag=<optimized out>) at Objects/ceval_gil.h:3518
#16 0x000015555508b8d9 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/codeobject.c:40
#17 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x479108, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0,
    name=0x1555486030f0, qualname=0x155548603170) at Objects/ceval_gil.h:4327
#18 0x0000155555098566 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/genobject.c:396
#19 0x000015555508d79d in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x4790f8, callable=0x155548602160, tstate=0x407d20) at ./Python/pycore_pyerrors.h:118
#20 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x4790f8, callable=<optimized out>) at ./Python/pycore_pyerrors.h:127
#21 call_function (kwnames=0x0, kwnames@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>,
    pp_stack@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, tstate=0x407d20, tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/ceval_gil.h:5072
#22 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x478f80, throwflag=<optimized out>) at Objects/ceval_gil.h:3487
#23 0x000015555508b8d9 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/codeobject.c:40
#24 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0,
    qualname=0x0) at Objects/ceval_gil.h:4327
#25 0x00001555550fe871 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=0x155553720a00, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0, name=0x0, qualname=0x0) at Objects/ceval_gil.h:4359
#26 0x00001555550fe819 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Objects/ceval_gil.h:4375
#27 0x00001555550fe7db in PyEval_EvalCode (co=co@entry=0x15555371e9d0, globals=globals@entry=0x155553720a00, locals=locals@entry=0x155553720a00) at Objects/ceval_gil.h:826
#28 0x000015555510d7d4 in run_eval_code_obj (tstate=0x407d20, co=0x15555371e9d0, globals=0x155553720a00, locals=0x155553720a00) at Modules/find.h:1219
#29 0x00001555551097c6 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x155553720a00, locals=0x155553720a00, flags=<optimized out>, arena=<optimized out>) at Modules/find.h:1240
#30 0x0000155555015d40 in pyrun_file (fp=fp@entry=0x403340, filename=filename@entry=0x1555537fced0, start=start@entry=257, globals=globals@entry=0x155553720a00, locals=locals@entry=0x155553720a00, closeit=closeit@entry=1, flags=0x7ffffffef498) at Modules/find.h:1138
#31 0x0000155555015005 in pyrun_simple_file (flags=0x7ffffffef498, closeit=1, filename=0x1555537fced0, fp=0x403340) at Modules/find.h:449
#32 PyRun_SimpleFileExFlags (fp=fp@entry=0x403340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7ffffffef498) at Modules/find.h:482
#33 0x000015555501797e in PyRun_AnyFileExFlags (fp=fp@entry=0x403340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7ffffffef498) at Modules/find.h:91
#34 0x000015555511ec65 in pymain_run_file (cf=0x7ffffffef498, config=0x409010) at Objects/fileutils.c:373
#35 pymain_run_python (exitcode=0x7ffffffef490) at Objects/fileutils.c:598
#36 Py_RunMain () at Objects/fileutils.c:677
#37 0x00001555550f2009 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Objects/fileutils.c:731
#38 0x0000155553873493 in __libc_start_main () from /lib64/libc.so.6
#39 0x00000000004006ce in _start ()

boegel (Member) commented Oct 12, 2022

@schiotz It looks like this could be a problem in recent versions of OpenBLAS, and that FlexiBLAS has nothing to do with it (but I'm not sure).

schiotz (Contributor, Author) commented Oct 12, 2022

@boegel Do you have any suggestions for workarounds or how to fix it? It is a showstopper for us, basically locking us on the foss/2020b toolchain. We see this core dump all over our code.

boegel (Member) commented Oct 12, 2022

Maybe you can try installing OpenBLAS-0.3.20-GCC-11.2.0.eb, then swap to that OpenBLAS module after loading SciPy-bundle, and see if the problem persists? It could be a bug that is already fixed in OpenBLAS...

If that doesn't help, it gets more interesting, we would need to hunt down the cause of the problem, and probably come up with a patch to fix it.

We should also see if the problem only happens if FlexiBLAS is used (by tweaking foss/2021b to include OpenBLAS directly).

bartoldeman (Contributor):

zdotu is one of the four functions affected by different Fortran calling conventions depending on the compiler, because of the complex return type (together with cdotc, zdotc, and cdotu). If the wrong one is used, you get a segmentation fault.
I can't reproduce this myself yet but will dig a bit; it'll probably give us a hint about what to look for.
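
For reference, the two conventions look roughly like this. A minimal sketch in C; zdotu_gnu_ and zdotu_f2c_ are illustrative names for the two ABIs, not real symbols (in reality each backend exports plain zdotu_ using one convention or the other):

    #include <complex.h>

    /* GNU Fortran default: the complex result is returned by value,
     * per the C ABI for double _Complex (in registers on x86-64). */
    double _Complex zdotu_gnu_(const int *n,
                               const double _Complex *x, const int *incx,
                               const double _Complex *y, const int *incy);

    /* f2c / g77-style convention: the result is written through a
     * hidden pointer passed as an extra first argument. */
    void zdotu_f2c_(double _Complex *result, const int *n,
                    const double _Complex *x, const int *incx,
                    const double _Complex *y, const int *incy);

If a caller expecting one convention is linked against a backend built with the other, every argument is shifted by one slot (n is read where result was expected, and so on), so the routine dereferences garbage and segfaults.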

bartoldeman (Contributor):

@boegel Can you compile OpenBLAS (same version as above) with debug info and then run your reproducer with

export FLEXIBLAS=/path/to/libopenblas.so

That will shed some more light on the issue in the backtrace.

schiotz (Contributor, Author) commented Oct 12, 2022

Maybe you can try installing OpenBLAS-0.3.20-GCC-11.2.0.eb, and then swapping

@boegel I see the problem also with foss/2022a, which uses OpenBLAS/0.3.20-GCC-11.3.0, so it looks like it is present in the 0.3.20 version of OpenBLAS.

However, I tried to use IMKL as a backend (I am not sure I know what I am doing) and it crashes in the same way. This could indicate that it is a FlexiBLAS issue, perhaps the calling-convention issue that @bartoldeman is referring to.

export FLEXIBLAS=/home/modules/software/imkl/2022.1.0/mkl/latest/lib/intel64/libmkl_rt.so

(gdb) where
#0  0x00007ff662b30d2f in zdotu_ ()
   from /home/modules/software/imkl/2022.1.0/mkl/latest/lib/intel64/libmkl_intel_lp64.so.2
#1  0x00007ff66e39b316 in flexiblas_real_cblas_zdotu_sub ()
   from /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib/libflexiblas.so.3
#2  0x00007ff66e6c4cf4 in CDOUBLE_dot ()
   from /home/modules/software/SciPy-bundle/2022.05-foss-2022a/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so

bartoldeman (Contributor):

@schiotz you should not use libmkl_rt.so as the backend, that's for sure (long complicated story).
The way to use MKL as the backend is export FLEXIBLAS=imkl; this will look at the configuration file $EBROOTFLEXIBLAS/etc/flexiblasrc.d/imkl.conf and then in the directory $FLEXIBLAS_LIBRARY_PATH (set by the imkl module).

Can you also try with BLIS?

module load BLIS/0.9.0-GCC-11.3.0
export FLEXIBLAS=blis
export FLEXIBLAS_VERBOSE=1

(last one just to confirm it's using BLIS)

schiotz (Contributor, Author) commented Oct 12, 2022

I can confirm it does not crash with IMKL or BLIS. Here is the output when running with BLIS:

15:41 [sylg] numpy-bug$ export FLEXIBLAS=blis
15:41 [sylg] numpy-bug$ export FLEXIBLAS_VERBOSE=1
15:41 [sylg] numpy-bug$ python bug.py 
<flexiblas> Load system config /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//BLIS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//NETLIB.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//OpenBLAS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//imkl.conf
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc does not exist.
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc.sylg.fysik.dtu.dk does not exist.
<flexiblas> Environment supplied config ((null)) does not exist.
<flexiblas> libflexiblas.so is /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib/libflexiblas.so.3
<flexiblas> Hook "DUMMY/DUMMY" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_dummy.so
<flexiblas> Hook "Profile/PROFILE" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_profile.so
<flexiblas>
<flexiblas> FlexiBLAS, version 3.2.0
<flexiblas> Copyright (C) 2013-2021 Martin Koehler and others.
<flexiblas> This is free software; see the source code for copying conditions.
<flexiblas> There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
<flexiblas> FITNESS FOR A PARTICULAR PURPOSE.
<flexiblas> 
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/libflexiblas_netlib.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_netlib.so
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/libflexiblas_fallback_lapack.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_fallback_lapack.so
<flexiblas> Trying to use the content of FLEXIBLAS: "blis" as shared library.
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/blis
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//blis
<flexiblas> "BLIS" does not seem to a shared library. Search inside the FlexiBLAS configuration..
<flexiblas> Trying to load  libflexiblas_blis.so
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/libflexiblas_blis.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_blis.so
<flexiblas> Set thread number function found ( func_name = bli_thread_set_num_threads ) at 0x7f4035698320
<flexiblas> Set thread number function found ( func_name = bli_thread_set_num_threads_ ) at 0x7f4035650500
<flexiblas> Get thread number function ( func_name = bli_thread_get_num_threads )  at 0x7f4035698290
<flexiblas> Available XERBLA ( backend: 0x7f4035696160, user defined: 0x7f4036598ef0, FlexiBLAS: 0x7f4036598ef0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> Available CBLAS_XERBLA ( backend: 0x7f403565aa20, user defined: 0x7f40365baea0, FlexiBLAS: 0x7f40365baea0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> The desired BLAS library is BLIS. We do not load their CBLAS wrapper since it might alter the behavior of your programs.
<flexiblas> BLAS info:
<flexiblas>  - intel_interface        = 0
<flexiblas>  - flexiblas_integer_size = 4
<flexiblas>  - backend_integer_size   = 4
<flexiblas>  - post_init              = 0
<flexiblas> cleanup
15:41 [sylg] numpy-bug$ 

bartoldeman (Contributor):

@schiotz can you run the original (crashing) testcase with FLEXIBLAS_VERBOSE=1 as well?

schiotz (Contributor, Author) commented Oct 12, 2022

16:37 [sylg] numpy-bug$ export FLEXIBLAS_VERBOSE=1
16:37 [sylg] numpy-bug$ python bug.py 
<flexiblas> Load system config /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//BLIS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//NETLIB.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//OpenBLAS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//imkl.conf
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc does not exist.
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc.sylg.fysik.dtu.dk does not exist.
<flexiblas> Environment supplied config ((null)) does not exist.
<flexiblas> libflexiblas.so is /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib/libflexiblas.so.3
<flexiblas> Hook "DUMMY/DUMMY" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_dummy.so
<flexiblas> Hook "Profile/PROFILE" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_profile.so
<flexiblas>
<flexiblas> FlexiBLAS, version 3.2.0
<flexiblas> Copyright (C) 2013-2021 Martin Koehler and others.
<flexiblas> This is free software; see the source code for copying conditions.
<flexiblas> There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
<flexiblas> FITNESS FOR A PARTICULAR PURPOSE.
<flexiblas> 
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_netlib.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_fallback_lapack.so
<flexiblas> Use default BLAS: OPENBLAS - libflexiblas_openblas.so from System Directory
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_openblas.so
<flexiblas> Set thread number function found ( func_name = openblas_set_num_threads ) at 0x7f6cc6d05160
<flexiblas> Set thread number function found ( func_name = openblas_set_num_threads_ ) at 0x7f6cc6d04a60
<flexiblas> Get thread number function ( func_name = openblas_get_num_threads )  at 0x7f6cc6d043c0
<flexiblas> Get thread number function ( func_name = openblas_get_num_threads_ )  at 0x7f6cc6d04a70
<flexiblas> Available XERBLA ( backend: 0x7f6cc6d04990, user defined: 0x7f6cc86c8ef0, FlexiBLAS: 0x7f6cc86c8ef0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> Available CBLAS_XERBLA ( backend: 0x7f6cc6b19470, user defined: 0x7f6cc86eaea0, FlexiBLAS: 0x7f6cc86eaea0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> BLAS info:
<flexiblas>  - intel_interface        = 0
<flexiblas>  - flexiblas_integer_size = 4
<flexiblas>  - backend_integer_size   = 4
<flexiblas>  - post_init              = 0
Segmentation fault (core dumped)
16:38 [sylg] numpy-bug$ 

boegel (Member) commented Oct 14, 2022

@boegel Can you compile OpenBLAS (same version as above) with debug info and then run your reproducer with export FLEXIBLAS=/path/to/libopenblas.so That will shine some more light on the issue in the backtrace.

Here's (the relevant part of) the GDB backtrace with OpenBLAS/0.3.20-GCC-11.2.0 built with debug symbols (enabled via the debug toolchain option):

Program received signal SIGSEGV, Segmentation fault.
0x00001555512cbd6a in zdot_compute (n=n@entry=28, x=<optimized out>, inc_x=2, inc_x@entry=1, y=0x155545438300, inc_y=<optimized out>, result=result@entry=0x7ffffffe9d50)
    at /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c:148
148     /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c: No such file or directory.
(gdb) bt
#0  0x00001555512cbd6a in zdot_compute (n=n@entry=28, x=<optimized out>, inc_x=2, inc_x@entry=1, y=0x155545438300, inc_y=<optimized out>, result=result@entry=0x7ffffffe9d50)
    at /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c:148
#1  0x00001555512cbe8e in zdotu_k (n=28, x=<optimized out>, inc_x=1, y=<optimized out>, inc_y=<optimized out>)
    at /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c:204
#2  0x0000155552df5e06 in flexiblas_real_cblas_zdotu_sub () from /user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/haswell-ib/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib/libflexiblas.so.3
#3  0x0000155553116340 in CDOUBLE_dot (ip1=0x86b5e0 "", is1=16, ip2=0x155545438300 "", is2=1632960, op=0x1555427ee760 "", n=28, __NPY_UNUSED_TAGGEDignore=0x0)
    at build/src.linux-x86_64-3.9/numpy/core/src/multiarray/arraytypes.c:3628
#4  0x00001555531f2928 in PyArray_MatrixProduct2 (op1=<optimized out>, op2=<optimized out>, out=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:1083
#5  0x00001555531f3003 in array_matrixproduct (__NPY_UNUSED_TAGGEDdummy=<optimized out>, args=<optimized out>, kwds=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:2438

schiotz (Contributor, Author) commented Oct 14, 2022

@bartoldeman wrote:

zdotu is one of the 4 functions affected by different Fortran calling conventions depending on the compiler, because of the complex return type (together with cdotc, zdotc, and cdotu). If the wrong one is used you get a segmentation fault.

It certainly only occurs with complex arrays, but it also matters that the axes of the second array are swapped, so it must somehow be related to the array being non-contiguous.

schiotz (Contributor, Author) commented Oct 27, 2022

Is there anything I can do to help make progress on this? Can I test something to see whether it is due to how EasyBuild builds it or to a bug in OpenBLAS? If the latter, I guess it should be reported upstream.

bartoldeman (Contributor):

I'm working on reproducing this; I suspect it's indeed something upstream in the assembly-language kernel of zdot, but I will isolate it a bit further.

bartoldeman (Contributor):

@boegel I still can't reproduce this; perhaps it's fixed by the patch to GCC?
@schiotz Have you recompiled GCCcore with the new patch (included in the new EasyBuild 4.6.2)?

schiotz (Contributor, Author) commented Oct 28, 2022

I'll try this. I assume I have to rebuild GCCcore, then OpenBLAS, and try again. I'll have to be careful to use the modules I built myself and not the ones on the system, but I can use FLEXIBLAS_VERBOSE to see which library it picks up. The tricky thing may be to check that EasyBuild itself uses the right GCCcore module, but I should be able to see that from the full paths shown by ps while compiling.

I'll report back once the builds are finished.

schiotz (Contributor, Author) commented Oct 28, 2022

@bartoldeman Unfortunately, recompiling GCCcore and then both OpenBLAS and FlexiBLAS did not change anything.

Regarding reproducibility: We have four different login nodes on our cluster, with four slightly different architectures. I only see the bug on three of them.

Edit: Affected:

  • Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
  • Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  • Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz

Not affected:

  • Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

bartoldeman (Contributor):

I've been able to reproduce it now, so I can debug the issue.

bartoldeman (Contributor) commented Oct 28, 2022

Checking the generated assembly, there's another compiler vectorization bug :(, where the loop in kernel/x86_64/zdot.c

    while (i < n)
    {
        dot[0] += x[ix]   * y[iy];
        dot[1] += x[ix+1] * y[iy+1];
        dot[2] += x[ix]   * y[iy+1];
        dot[3] += x[ix+1] * y[iy];

        ix += inc_x;
        iy += inc_y;
        i++;
    }

gets compiled (if I understand it correctly!) as the equivalent of

    while (i < n)
    {
        dot[0] += x[ix]       * y[iy];
        dot[1] += x[ix+inc_x] * y[iy+inc_y];
        dot[2] += x[ix]       * y[iy+inc_y];
        dot[3] += x[ix+inc_x] * y[iy];

        ix += inc_x;
        iy += inc_y;
        i++;
    }

(EDIT: NO, this isn't the case, it does load y[iy+inc_y] but discards it afterwards!)

On the final loop iteration, y[iy+inc_y], where inc_y is a fairly large value (27*27*70*2*2 = 204120), may not be readable, and so causes the segfault. Sometimes it is readable and won't crash, but will still produce the wrong result!

I'm checking whether the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212 fixes this.
OpenBLAS already put in a workaround, but only for GCC 12 and only on Windows and Mac OS X; since we compile with -ftree-vectorize, the bug also shows up with GCC 11.

OpenMathLib/OpenBLAS#3740
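
To make the failure mode concrete, here is a standalone C sketch of the over-read (hypothetical buffer handling, not the OpenBLAS code; 204120 is the stride in doubles quoted above, i.e. the element stride of the contracted axis after the swapaxes in the reproducer):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        const long n = 28, inc_y = 204120;   /* doubles per step */
        /* Map just enough pages to hold the n strided complex values. */
        size_t need = ((size_t)(n - 1) * inc_y + 2) * sizeof(double);
        size_t len  = (need + 4095) & ~(size_t)4095;
        double *y = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (y == MAP_FAILED) return 1;

        double re = 0.0, im = 0.0;
        for (long i = 0, iy = 0; i < n; i++, iy += inc_y) {
            re += y[iy];          /* real part: in bounds */
            im += y[iy + 1];      /* imaginary part: in bounds */
        }

        /* What the miscompiled kernel effectively does on the last
         * iteration: load one extra stride past the end and discard it.
         * y[n * inc_y] lies well past the mapping here; whether the read
         * faults depends on what happens to be mapped at that address,
         * which is why only some machines crash. */
        volatile double over = y[n * inc_y];
        (void)over;
        printf("re = %g, im = %g\n", re, im);
        return 0;
    }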

boegel modified the milestones: 4.x, next release (4.6.3?) Oct 28, 2022
bartoldeman (Contributor):

Adding

toolchainopts = {'vectorize': False}

to the OpenBLAS easyconfigs for GCC 11+ should fix this for now...

It's still bad to have another GCC issue, as it could affect other code. The fix mentioned above didn't solve the issue. I will try with a GCC snapshot; if that fails too, I'll see if it's another GCC issue and reduce it to a small test case.

bartoldeman (Contributor):

GCC bug report here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451
In the end the bug cannot produce wrong results, but it can produce a segmentation fault if you are unlucky and y[inc_y*n] isn't accessible.

schiotz (Contributor, Author) commented Oct 28, 2022

Thank you very much, @bartoldeman, for looking into this. Is the workaround to use toolchainopts = {'vectorize': False} (which sounds like it would hurt performance), or is there something else that can be done apart from giving up on foss/2022a (we hit the crash twenty-something times in the extended test suite for GPAW)?

bartoldeman (Contributor):

@schiotz please see #16510 for a workaround that is less of a hammer than turning off vectorization everywhere.

schiotz (Contributor, Author) commented Nov 1, 2022

Thank you very much indeed, @bartoldeman

I can confirm that this appears to fix our problems, at least for the test case. We are now rebuilding on all platforms, and testing our code. I would expect it to work now.

boegel (Member) commented Nov 9, 2022

@schiotz Can this issue be closed?

schiotz (Contributor, Author) commented Nov 9, 2022

I am not 100% sure; we still seem to have some issues and are trying to figure out whether they are related to this problem or to something else.

schiotz (Contributor, Author) commented Nov 9, 2022

After rebuilding GCCcore and OpenBLAS, we are seeing lots of segfaults in OpenMPI that we did not see before. I cannot in any way imagine how fixing a vectorization bug could cause that, but perhaps something else changed as well. I'll continue to investigate.

schiotz (Contributor, Author) commented Nov 9, 2022

@bartoldeman @boegel
Apparently, after recompiling GCCcore, OpenBLAS and FlexiBLAS due to this issue and #16510, our GPAW jobs would almost always crash in OpenMPI. Recompiling OpenMPI fixes it. It makes no sense that fixing a vectorization bug could have that effect, but perhaps something else also changed in GCCcore, leading to some kind of incompatibility between code compiled with the old and new versions. Does that even make sense? Should we just recompile everything built with the 2022a toolchains?

bartoldeman (Contributor):

It is a bit strange, since OpenBLAS and FlexiBLAS do not interact with Open MPI. It's possible that something in the Open MPI easyconfig changed between compilations as well. In any case, I wouldn't worry about it.
Recompiling everything would be prudent; some others have done it too.

I'll close this one though, as the segfault is fixed.
