fix(docker): unblock 8 GB VM target — OpenBLAS musl pthread + RTS memory discipline #60
Merged
Conversation
Statically linked Alpine/musl builds segfault inside MUMPS factorization on the first BLAS3 call from a worker thread (exit 139). musl's default pthread stack is 128 KB, vs glibc's 8 MB read from RLIMIT_STACK. OpenBLAS worker threads inherit that 128 KB and overflow on DYNAMIC_ARCH Fortran kernels that hold large auto-arrays — typically dgemm/dtrsm on dense frontal blocks during sparse LU factorization.

Reproduces on a 16 GB VM running volca-with-frontend, on the first impact request hitting Agribalyse 3.2 (21510 activities). The crash is not memory-bound: RSS stays low, and exit code 139 = SIGSEGV, not 137.

Patch driver/others/blas_server.c to call pthread_attr_setstacksize at 8 MB before pthread_create. The change aligns musl's behaviour with glibc's effective default and is a no-op on glibc rebuilds. Two grep guards bracket the sed: if a future OpenBLAS release moves the pthread_attr_init anchor, the Docker build fails loudly instead of silently producing a binary that crashes in production.
Two cheap RTS tweaks (cherry-picked from #59, minus the -M change):

- Add -Fd1.0 (GHC 9.10+): decay free heap blocks back to the OS over ~1 idle period instead of the default 4.0, which keeps RSS pinned near peak for minutes after a parsing spike.
- -I30 -> -I0.3 (GHC default): trigger idle-time major GC promptly. The previous 30 s deferral hid live-data drops and starved -Fd of free blocks to release.

Keeping -M at 75 % of RAM for now: dropping to 50 % may be the right call eventually, but the OpenBLAS musl crash that motivated the change in #59 is fixed independently in the previous commit. Re-evaluate -M once we have RSS curves on the 8 GB target.
ccomb added a commit that referenced this pull request on May 16, 2026
PR #60 shipped broken because nothing in the build pipeline verified the injected pthread_attr_setstacksize call survived compilation; the silent regression was only caught by a production SIGSEGV. The fix in this branch (-Wl,-z,stack-size=8388608) replaces that with a different invariant — "PT_GNU_STACK->p_memsz == 0x800000 in the shipped ELF" — which is again only checked manually. Add a readelf-based assertion right after UPX so a future linker-flag refactor (or a UPX-side header rewrite) fails the image build with a diagnostic naming the actual value found, instead of recurring as a runtime crash on a customer VM. The check runs on /build/output/volca (post-strip, post-UPX) because that's the binary that actually runs. binutils — which provides readelf — is already in the build-stage apk add.
ccomb added a commit that referenced this pull request on May 16, 2026
PR #60 diagnosed the SIGSEGV correctly (musl's 128 KB default pthread stack overflows on the first BLAS3 call from MUMPS factorization) but patched the wrong place: the injected pthread_attr_setstacksize sat inside #ifdef NEED_STACKATTR, which blas_server.c #undef's on Linux. The code compiled out; the crash reproduced unchanged on a 16 GB VM and `OPENBLAS_NUM_THREADS=1` worked around it. Real fix: bake the desired default into the ELF's PT_GNU_STACK header via -Wl,-z,stack-size=8388608 in LINK_MODE=musl. musl reads p_memsz at process start and uses it as __default_stacksize, so every pthread created with NULL attr — OpenBLAS workers, GHC RTS capabilities, anything else — starts with 8 MB. Linker flag covers more ground than a source patch would and avoids the second pthread_create site in goto_set_num_threads that lacks the `attr` symbol. Also adds a readelf assertion in docker/Dockerfile right after UPX so a future linker-flag refactor — or a UPX-side header rewrite — fails the image build loudly instead of recurring as a runtime crash on a customer VM.
Goal
Make VoLCA run on an 8 GB RAM VM with pre-loaded databases (no parser pressure). Two independent root causes were stopping us; both are addressed here, in two atomic commits.
Commit 1 — RTS memory return (subset of #59)
`docker/rts-flags.sh`:
`-M` left at 75 % of RAM for now. #59 also halves it to 50 %; that's likely the right call eventually (MUMPS Fortran workspace allocates outside the GHC heap so 75 % leaves no headroom on tight VMs), but the immediate motivator for #59 was the OOM-killer firing — which we now attribute to the OpenBLAS musl crash addressed in commit 2. Re-evaluate `-M` once we have RSS curves on the 8 GB target.
`-A`, `-c`, `-F1.5`, `-qg0`, `-n` left alone — they trade off with throughput and shouldn't move without benchmarking.
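A hypothetical sketch of the flag computation described above (the variable names and the `/proc/meminfo`-based sizing are assumptions, not the actual contents of `docker/rts-flags.sh`):

```shell
#!/bin/sh
# Size -M at 75 % of physical RAM (left at 75 % for now; see above).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
max_heap_mb=$(( mem_kb * 75 / 100 / 1024 ))

# -Fd1.0: decay free heap blocks back to the OS over ~1 idle period (default 4.0).
# -I0.3:  run the idle-time major GC promptly (GHC's own default), so -Fd
#         actually has freed blocks to return.
echo "+RTS -M${max_heap_mb}m -Fd1.0 -I0.3 -RTS"
```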
Commit 2 — OpenBLAS pthread stack on musl
Static Alpine/musl builds segfaulted (exit 139 / SIGSEGV) inside MUMPS factorization on the first dense BLAS3 call. musl's hardcoded 128 KB default pthread stack (vs glibc's 8 MB read from `RLIMIT_STACK`) is overflowed by OpenBLAS `DYNAMIC_ARCH` Fortran kernels with large auto-arrays.
Patch `driver/others/blas_server.c` during the Docker build to call `pthread_attr_setstacksize(&attr, 8 << 20)` right after `pthread_attr_init`. Two `grep` guards bracket the `sed` — a future upstream refactor fails the Docker build loudly instead of silently regressing the runtime.
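As a sketch of the guarded injection, against a stand-in file rather than the real OpenBLAS tree (the anchor text and the exact `sed` expression here are assumptions; only the `grep`-`sed`-`grep` bracketing and the 8 MB value come from the description above):

```shell
set -e
# Stand-in for driver/others/blas_server.c, reduced to the anchor and the create call.
cat > blas_server.c <<'EOF'
  pthread_attr_init(&attr);
  pthread_create(&thread, &attr, blas_thread_server, arg);
EOF

# Guard 1: fail loudly if the upstream anchor has moved.
grep -q 'pthread_attr_init(&attr);' blas_server.c

# Inject an 8 MB stack size right after the anchor.
sed -i 's/pthread_attr_init(&attr);/&\n  pthread_attr_setstacksize(\&attr, 8 << 20);/' blas_server.c

# Guard 2: fail loudly if the injection did not land.
grep -q 'pthread_attr_setstacksize' blas_server.c
echo "patched OK"
```

Either guard failing aborts the Docker build instead of letting an unpatched binary through.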
Workaround for verification: `OPENBLAS_NUM_THREADS=1` (no worker pthreads spawned) makes the crash disappear without the patch — proves the worker-thread stack is the only contributor.
Why combine in one PR
The two changes solve different layers (Haskell RTS hygiene vs native BLAS thread setup), but the goal is shared: shipping a binary that runs on an 8 GB VM with Agribalyse-class workloads. The OpenBLAS fix is the load-bearing change; the RTS tweaks are cheap incremental hygiene on top.
Test plan