glibc segfaulting on P10 with OpenBLAS #1780

Closed
tuliom opened this issue Oct 28, 2020 · 9 comments

@tuliom
Contributor

tuliom commented Oct 28, 2020

Report from @RajalakshmiSR:

Build and install OpenBLAS, then compile and run this test program against it:

$ cat blas.c
 #include <stdio.h>
 int main()
 {
   printf("stay safe");
   return 0;
 }
$ /opt/at14.0/bin/gcc -o blas blas.c -I./install/include -L./install/lib -lopenblas
$ ./blas
Segmentation fault (core dumped)
@tuliom
Contributor Author

tuliom commented Oct 28, 2020

According to the core dump:

Stack trace of thread 3511211:
    #0  0x00007fff9963ae14 __GI___libc_malloc (libc.so.6)
    #1  0x00007fff9963d3dc __libc_calloc (libc.so.6)
    #2  0x00007fff995fda54 __register_printf_type (libc.so.6)
    #3  0x00007fff98dc3b98 register_printf_flt128 (libquadmath.so.0)
    #4  0x00007fff9a5f24fc call_init (ld64.so.2)
    #5  0x00007fff9a5d144c _dl_start_user (ld64.so.2)

@tuliom tuliom added the bug label Oct 28, 2020
@RajalakshmiSR

Steps to recreate on P10 :
git clone https://github.com/xianyi/openblas.git
cd openblas
export PATH=/opt/at14.0/bin:$PATH
make
make PREFIX=/path/to/your/installation install

Then compile and link the test program as in the original report.

@mscastanho mscastanho self-assigned this Oct 28, 2020
@mscastanho
Contributor

mscastanho commented Oct 28, 2020

The program segfaults while loading libquadmath, an indirect dependency pulled in by libgfortran:

$ LD_LIBRARY_PATH=/home/mscastanho/build/openblas/lib /opt/at14.0/bin/ldd ./blas
        linux-vdso64.so.1 (0x00007fffa7330000)
        libopenblas.so.0 => /home/mscastanho/build/openblas/lib/libopenblas.so.0 (0x00007fffa6580000)
        libc.so.6 => /opt/at14.0/lib64/power10/libc.so.6 (0x00007fffa6360000)
        libm.so.6 => /opt/at14.0/lib64/power10/libm.so.6 (0x00007fffa6240000)
        libpthread.so.0 => /opt/at14.0/lib64/power10/libpthread.so.0 (0x00007fffa61f0000)
        libgfortran.so.5 => /opt/at14.0/lib64/power10/libgfortran.so.5 (0x00007fffa5c30000)
        libgomp.so.1 => /opt/at14.0/lib64/power10/libgomp.so.1 (0x00007fffa5bb0000)
        /opt/at14.0/lib64/ld64.so.2 (0x00007fffa7350000)
        libquadmath.so.0 => /opt/at14.0/lib64/power10/libquadmath.so.0 (0x00007fffa5b40000)
        libgcc_s.so.1 => /opt/at14.0/lib64/power10/libgcc_s.so.1 (0x00007fffa5b00000)
        libdl.so.2 => /opt/at14.0/lib64/libdl.so.2 (0x00007fffa5ad0000)

$ LD_LIBRARY_PATH=/home/mscastanho/build/openblas/lib LD_DEBUG=libs ./blas
[...]
   2285610:     find library=libgfortran.so.5 [0]; searching
   2285610:      search path=/home/mscastanho/build/openblas/lib                (LD_LIBRARY_PATH)
   2285610:       trying file=/home/mscastanho/build/openblas/lib/libgfortran.so.5
   2285610:      search path=/opt/at14.0/lib64/power10:/opt/at14.0/lib64/altivec/dfp:/opt/at14.0/lib64/altivec:/opt/at14.0/lib64/dfp:/opt/at14.0/lib64          (system search path)
   2285610:       trying file=/opt/at14.0/lib64/power10/libgfortran.so.5
[...]
   2285610:     calling init: /opt/at14.0/lib64/power10/libgcc_s.so.1
   2285610:
   2285610:
   2285610:     calling init: /opt/at14.0/lib64/power10/libm.so.6
   2285610:
   2285610:
   2285610:     calling init: /opt/at14.0/lib64/power10/libquadmath.so.0
   2285610:
Segmentation fault (core dumped)

$ LD_LIBRARY_PATH=/home/mscastanho/build/openblas/lib /opt/at14.0/bin/ldd /opt/at14.0/lib64/power10/libgfortran.so.5
        linux-vdso64.so.1 (0x00007fff8b6a0000)
        libquadmath.so.0 => /opt/at14.0/lib64/power10/libquadmath.so.0 (0x00007fff8b070000)
        libm.so.6 => /opt/at14.0/lib64/power10/libm.so.6 (0x00007fff8af50000)
        libgcc_s.so.1 => /opt/at14.0/lib64/power10/libgcc_s.so.1 (0x00007fff8af10000)
        libc.so.6 => /opt/at14.0/lib64/power10/libc.so.6 (0x00007fff8acf0000)
        /opt/at14.0/lib64/ld64.so.2 (0x00007fff8b6c0000)

So the initialization code for libquadmath is doing something wrong there...

@RajalakshmiSR

FYI, this works with at-next-15.0-0-alpha2.

@mscastanho
Contributor

More info: the issue is likely not in libquadmath itself; libquadmath just triggers it. libquadmath has a constructor that runs at load time and registers new printf formats. That registration calls calloc, so parts of the malloc infrastructure have to be initialized (malloc_hook_ini). During this initialization, the code reads SINGLE_THREAD_P from the TLS, and that load causes the invalid memory access.

glibc @ malloc.c:3076

  if (SINGLE_THREAD_P) <----- here
    {
      victim = _int_malloc (&main_arena, bytes);
      assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
	      &main_arena == arena_for_chunk (mem2chunk (victim)));
      return victim;
    }

The instruction that causes the segfault:

3076    in malloc.c                               
=> 0x00007ffff704ae14 <+148>:   lwz     r9,-30784(r13)              
   0x00007ffff704ae18 <+152>:   cmpwi   r9,0                                 
   0x00007ffff704ae1c <+156>:   bne     0x7ffff704b030 <malloc_hook_ini+688>

I tried reproducing the issue without linking to openblas, but using the same dependencies:

/opt/at14.0/bin/gcc -g -o use-gfortran blas.c -lgfortran -lpthread -lgomp -ldl

But the executable above works fine. I ran both programs side by side and single-stepped near the point where the issue happens, but couldn't spot anything out of the ordinary. I suspect this could be an issue with TLS initialization in glibc.

@austinpagan

I was able to replicate this exact segfault traceback when running one of the shipped tests in our internal OpenBLAS distribution.

After I extract and build OpenBLAS on a P10 (using at14.0), the executable is found in OpenBLAS/lapack-netlib/TESTING/EIG/xeigtstz, and the testing environment executes this test when I type "make complex16" in OpenBLAS/lapack-netlib/TESTING.

Interestingly, the other three data types ("single", "double", and "complex") do not run into this segfault. Only double-precision complex has the problem.

@mscastanho
Contributor

The TLS access was failing because the offset used in the instruction was incorrect. The same code in the power9-tuned glibc from AT 14 shows that the load should instead be lwz r9, -30720(r13). The offset is calculated in terms of sizeof() and alignof() of struct pthread and tcbhead_t. I found out that the size of struct pthread was actually different when compiled with -mcpu=power9 and -mcpu=power10, and the particular field changing size in it was struct _Unwind_Exception exc.

This struct is defined in glibc this way (sysdeps/generic/unwind.h):

struct _Unwind_Exception
{
  _Unwind_Exception_Class exception_class;
  _Unwind_Exception_Cleanup_Fn exception_cleanup;
  _Unwind_Word private_1;
  _Unwind_Word private_2;
  /* @@@ The IA-64 ABI says that this structure must be double-word aligned.
     Taking that literally does not make much sense generically.  Instead we
     provide the maximum alignment required by any type for the machine.  */
} __attribute__((__aligned__));

When no specific value is given to the aligned attribute, GCC uses the maximum alignment for the target. The maximum value was increased recently for P10 to enable MMA support. Compiling a separate test case with -mno-mma suppresses the issue: with that flag I see the same result for both the P9 and P10 builds.

The code above may still need to be changed in upstream glibc, though. I'll start a discussion there.

@mscastanho
Contributor

A bit more explanation on the problem: the issue happens only when linking against the P10 glibc multilib (compiled with -mcpu=power10, which implies -mmma). Since the alignment changed, the P10 libc.so.6 from AT saw a different struct pthread layout than the dynamic loader (which is built with default flags):

$ /opt/at14.0/bin/gdb -batch -ex "file /opt/at14.0/lib64/ld64.so.2" -ex "ptype /o struct pthread"
[...]
                           /* total size (bytes): 1936 */
                         }

$ /opt/at14.0/bin/gdb -batch -ex "file /opt/at14.0/lib64/power10/libc.so.6" -ex "ptype /o struct pthread"
[...]
                           /* total size (bytes): 1984 */
                         } 

So there was likely a mismatch between the memory the loader initialized and what glibc was trying to access.

The change to the default alignment value has been reverted in upstream GCC (gcc-mirror/gcc@a37b5bc) and will be backported to the GCC 10 branch (followed by AT 14). There is also a discussion about a glibc fix so that struct pthread does not suffer from this again in the future. That fix should be merged soon.

The fixes should land in AT 14 over the next few weeks and will be released as part of a future AT 14.0-2.

@mscastanho
Contributor

The upstream fix has also been backported to the glibc 2.32 branch and will be merged into the ibm branch before the next AT 14 release.
