gbmv! segfaults for large matrices #1698

Closed
dlfivefifty opened this issue Jul 25, 2018 · 28 comments

@dlfivefifty

I'm getting the following issue when calling gbmv! in Julia; it only occurs when Julia is built against OpenBLAS:

julia> n = 100_000_000; A = randn(3,n); x = randn(n); y = similar(x); BLAS.gbmv!('N', n, 1, 1, 1.0, A, x, 0.0, y);
ERROR: ReadOnlyMemoryError()
Stacktrace:
 [1] gbmv!(::Char, ::Int64, ::Int64, ::Int64, ::Float64, ::Array{Float64,2}, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}) at /Users/solver/Projects/julia7/usr/share/julia/stdlib/v0.7/LinearAlgebra/src/blas.jl:652
 [2] top-level scope at none:0

julia> n = 100_000_000; A = randn(3,n); x = randn(n); y = similar(x); BLAS.gbmv!('N', n, 1, 1, 1.0, A, x, 0.0, y);
Segmentation fault: 11

When Julia is built against Apple's system BLAS, the above examples work fine.
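
(Aside, for readers less familiar with gbmv: the 3 x n array A above is standard BLAS band storage for a tridiagonal matrix with kl = ku = 1. A small C sketch of the indexing convention, added here purely for illustration rather than taken from the report:)

#include <stddef.h>

/* BLAS band storage (column-major): full-matrix element (i, j), 0-based,
 * with j - ku <= i <= j + kl, sits in row (ku + i - j) of column j,
 * i.e. at offset (ku + i - j) + j * lda, where lda >= kl + ku + 1.
 * For kl = ku = 1 that is exactly a 3 x n array: superdiagonal in row 0,
 * main diagonal in row 1, subdiagonal in row 2. */
static size_t band_index(size_t i, size_t j, size_t ku, size_t lda) {
    return (ku + i - j) + j * lda;
}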

@martin-frbg
Collaborator

Which version of OpenBLAS is this?

@dlfivefifty
Author

According to the openblas.version file in Julia v0.7-beta, it's

OPENBLAS_BRANCH=v0.3.0
OPENBLAS_SHA1=939452ea9dcb57abdcc3f1278c6db668a4690465

@martin-frbg
Collaborator

Possibly PR #1389 (original issue #1089) was not sufficient. How many cores does your hardware have?

@dlfivefifty
Author

4 cores, I think. It's a 2017 MacBook Pro.

@martin-frbg
Collaborator

Reproduced with a git clone of Julia. gdb claims the crash happens in dscal_kernel_8_zero called from dscal_k_HASWELL (so pretty much the same signature as #1089 indeed).

@brada4
Contributor

brada4 commented Jul 26, 2018

It seems like macOS now adds mprotect-ed guard pages around allocations to catch over- and underflows.

I suspect https://github.com/xianyi/OpenBLAS/blob/25f2d25cfecf6cf850aa5b989f279e5023b6234b/driver/level2/gbmv_thread.c#L98, where I removed a buffer: the padding-looking adjustment should perhaps have gone to x instead (?), and the code may have been wrong both before and after that change (?).

@martin-frbg
Collaborator

Unlikely, as that line is inside an #ifdef TRANSA block that does not apply here. I suspect this is a different but related failure mode, with range_n rather than range_m going out of bounds on the last thread of the set.

@brada4
Contributor

brada4 commented Jul 26, 2018

Is L246 writing past the end of the queue? Same rationale as 529bfc3 (though not reflecting 1:1 onto the files there).
@dlfivefifty can you quickly check whether adding 1 to the queue size in https://github.com/xianyi/OpenBLAS/blob/25f2d25cfecf6cf850aa5b989f279e5023b6234b/driver/level2/gbmv_thread.c#L179 solves the issue?

@martin-frbg
Collaborator

Nope, that does not help. (And interestingly it appears to be a random thread from the middle of the queue array that experiences the segfault. Valgrind is no use because of JuliaLang/julia#27131.)

@brada4
Contributor

brada4 commented Jul 27, 2018

The work divider behaves somewhat counter-intuitively; in this particular case the split is 25(0) 25(0)1 25(0) 24(9).
I will try to plant the same call in another scriptable BLAS consumer.

@brada4
Contributor

brada4 commented Jul 27, 2018

...It does not crash with a single thread where it crashes with multiple threads.

@martin-frbg
Collaborator

Do you have a reproducer that does not need Julia?

@brada4
Contributor

brada4 commented Jul 27, 2018

@martin-frbg It should be identical to SciPy's dgbmv, but that goes through cblas by preference. R exports it via R_ext.h, but no standard or popular package wraps it into a scriptable resource.
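
For what it's worth, at the BLAS level the failing call reduces to a plain dgbmv, so a bare C program along the following lines ought to exercise the same code path (a sketch only, untested here; it assumes an LP64 build exposing the usual Fortran-style dgbmv_ symbol, and it needs several GB of RAM):

#include <stdio.h>
#include <stdlib.h>

/* Fortran-style BLAS prototype (column-major banded storage). */
extern void dgbmv_(const char *trans, const int *m, const int *n,
                   const int *kl, const int *ku, const double *alpha,
                   const double *a, const int *lda, const double *x,
                   const int *incx, const double *beta, double *y,
                   const int *incy);

int main(void) {
    /* Same shape as the Julia example: tridiagonal band, m = n = 100_000_000. */
    const int n = 100000000, kl = 1, ku = 1, lda = 3, inc = 1;
    const double alpha = 1.0, beta = 0.0;
    double *a = malloc((size_t)lda * n * sizeof(double)); /* band storage, 3 x n */
    double *x = malloc((size_t)n * sizeof(double));
    double *y = malloc((size_t)n * sizeof(double));
    if (!a || !x || !y) return 1;
    for (size_t i = 0; i < (size_t)lda * n; i++) a[i] = 1.0;
    for (size_t i = 0; i < (size_t)n; i++) x[i] = 1.0;
    dgbmv_("N", &n, &n, &kl, &ku, &alpha, a, &lda, x, &inc, &beta, y, &inc);
    printf("y[0] = %f\n", y[0]);
    free(a); free(x); free(y);
    return 0;
}

Compile with something like cc -O2 repro.c -o repro -lopenblas (the exact link line depends on the local install). Per the single-thread observation above, running it with OPENBLAS_NUM_THREADS=1 should presumably avoid the crash, while a default multithreaded run should reproduce it.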

@martin-frbg
Collaborator

Unfortunately, building with -fsanitize=address appears to make the bug go away. (It reappears when running Julia from gdb, but the address sanitizer does not provide any additional information in that context.)

@martin-frbg
Collaborator

Hmm. Perhaps BUFFER_SIZE as defined in common_x86_64.h is actually too small here?

@brada4
Contributor

brada4 commented Jul 30, 2018

It points to the threading threshold in interface/scal.c.
Once that is exceeded, there is a jump to an unmapped area from a place I could not find in the assembly listings.

   0x00007fffe528ecb9 <+201>:   mov    %rcx,%rdi
   0x00007fffe528ecbc <+204>:   lea    0x8(%rsp),%rsi
   0x00007fffe528ecc1 <+209>:   jp     0x7fffe528ed98 <dscal_k_PILEDRIVER+424>
   0x00007fffe528ecc7 <+215>:   jne    0x7fffe528ed98 <dscal_k_PILEDRIVER+424>
   0x00007fffe528eccd <+221>:   callq  0x7fffe528ead0 <- call to unmapped
   0x00007fffe528ecd2 <+226>:   movsd  0x8(%rsp),%xmm0 <- crash eip
   0x00007fffe528ecd8 <+232>:   movsd  (%rsp),%xmm1
   0x00007fffe528ecdd <+237>:   ucomisd %xmm1,%xmm0

I don't know if that is a thread-safety issue (in principle, threading at this level should mean a single-threaded dscal?).

@martin-frbg
Collaborator

Not sure I understand your find - when scal is called from gbmv_thread, it gets passed an invalid buffer pointer ("y") already. (All the setup code in gbmv_thread.c seems to be concerned with partitioning a buffer that is apparently assumed to always be big enough.) That is why I now suspect that setting BUFFER_SIZE to 64<<20 or 128<<20 (as seen in the ppc and ia64 headers respectively) might help. Unfortunately I will not be able to test this before next weekend.
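
Concretely, that experiment is a one-line change of roughly this shape (a sketch; the exact surrounding context in common_x86_64.h may differ between versions):

#define BUFFER_SIZE (128 << 20)   /* x86_64 default is 32 << 20; the ppc and ia64 headers use 64 << 20 and 128 << 20 */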

@martin-frbg
Collaborator

Increasing BUFFER_SIZE does not help. (Increasing the stack size limit does, same as in #1089)

@martin-frbg
Collaborator

martin-frbg commented Sep 1, 2018

Revisiting this with 0.3.3: for some weird reason I cannot reproduce this issue in a shell started from an X session, but it is still there when logged in without X (with the same ulimits set). Could this be another (non)alignment problem? (Actually it is something about the (size of the) environment - it works fine as a regular user or after a plain "su" from there, but crashes with a login shell of the root user. Ulimits appear to be the same in both settings, and there is no obvious difference in the paths or the environment variables, except that the regular user has more of both.)

Rebuilt Julia for the generic target to get around the valgrind issue, but I am still none the wiser.

@martin-frbg
Collaborator

Some of the valgrind traces look as if Julia's garbage collector moved the work array while the OpenBLAS threads were trying to access it, but I cannot really make sense of it yet. What seems clear, however, is that this condition already existed with 0.2.13 (and probably much earlier, but trying older versions gets tedious as they lacked the symbol suffixes for interface64 builds).

@martin-frbg
Collaborator

martin-frbg commented Sep 30, 2018

Revisiting this, it now looks to me as if this is a fundamental issue rather than a coding bug: each thread needs to allocate a memory range big enough to hold the entire matrix from within the "buffer" region that is defined by the BUFFER_SIZE constant in common_<platform>.h. It seems one would need to increase the default 32 << 20 in common_x86_64.h all the way to 1024UL << 20 to accommodate just 4 threads with the n=100_000_000 workload. (Perhaps there also needs to be an upper limit on parallelization in at least some of the interface routines, where some already have lower bounds now?)
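
For a sense of scale, a rough back-of-the-envelope check of these numbers (the exact per-thread requirement depends on how gbmv_thread.c partitions the buffer):

   one n-length double vector: 100_000_000 * 8 bytes ≈ 800 MB
   default BUFFER_SIZE:        32 << 20               =  32 MiB
   suggested BUFFER_SIZE:      1024UL << 20            =   1 GiB

So even a single n-length work vector is well over 20 times the default buffer, which is why nothing short of the 1 GiB setting helps for this workload.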

@martin-frbg martin-frbg added this to the 0.3.5 milestone Dec 2, 2018
@martin-frbg martin-frbg modified the milestones: 0.3.5, 0.3.6 Dec 30, 2018
@martin-frbg
Collaborator

Moving to the next milestone as this is still unclear to me; it could be similar in concept to bug #173.

@martin-frbg martin-frbg modified the milestones: 0.3.6, 0.3.7 Apr 27, 2019
@martin-frbg martin-frbg modified the milestones: 0.3.7, 0.3.8 Aug 11, 2019
@martin-frbg
Collaborator

Made BUFFER_SIZE configurable in the build system for now, to avoid having to hack common_x86_64.h.

@martin-frbg
Collaborator

Closing, as the original issue is fixed by increasing the default size and making it configurable. The requirement for such a big buffer is a fundamental design "feature" that will need revisiting at some point.

@dlfivefifty
Author

Segfaulting by design seems like a weird choice...

@martin-frbg
Collaborator

...a design choice from something like 20 years ago, when computer memory was scarce and it would have been the norm for users to configure the innards of the library for their particular workloads.

@dlfivefifty
Author

An error saying “increase the default size” seems much more sensible.

@martin-frbg
Collaborator

Assuming an appropriate single location for such a check can be found (one that does not cost performance during regular operation). Increasing the default to accommodate your case (and to match what the various internal parameters would have required in the first place) has already led to certain misgivings in the Python camp, so a more fundamental long-term resolution would certainly be desirable.
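
Just to illustrate the kind of guard being discussed (purely a sketch, not OpenBLAS code; the helper name and the per-thread size estimate are made up):

#include <stddef.h>

/* Hypothetical: cap the thread count so that the per-thread staging areas
 * still fit into the shared buffer region (e.g. BUFFER_SIZE). */
static int safe_thread_count(size_t bytes_per_thread, int requested_threads,
                             size_t buffer_size) {
    if (bytes_per_thread == 0) return requested_threads;
    size_t max_threads = buffer_size / bytes_per_thread;
    if (max_threads == 0) return 1;   /* fall back to the single-threaded path */
    return ((size_t)requested_threads <= max_threads) ? requested_threads
                                                      : (int)max_threads;
}

Where exactly such a computation could live without slowing down the common case is of course the open question.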
