fix(docker): unblock 8 GB VM target — OpenBLAS musl pthread + RTS memory discipline by ccomb · Pull Request #60 · ccomb/volca

ccomb · 2026-05-16T15:41:36Z

Goal

Make VoLCA run on an 8 GB RAM VM with pre-loaded databases (no parser pressure). Two independent root causes were stopping us; both are addressed here, in two atomic commits.

Commit 1 — RTS memory return (subset of #59)

`docker/rts-flags.sh`:

- Add `-Fd1.0` (GHC 9.10+). Decays free heap blocks back to the OS over ~1 idle period. Without it (default 4.0), RSS stays pinned near the peak for minutes after a spike.
- `-I30` → `-I0.3` (GHC default). A 30 s deferred idle GC was hiding live-data drops and starving `-Fd` of free blocks.

`-M` stays at 75 % of RAM for now. #59 also lowers it to 50 %, which is likely the right call eventually: the MUMPS Fortran workspace is allocated outside the GHC heap, so 75 % leaves no headroom on tight VMs. However, the immediate motivator for #59 was the OOM-killer firing, which we now attribute to the OpenBLAS musl crash addressed in commit 2. Re-evaluate `-M` once we have RSS curves on the 8 GB target.

`-A`, `-c`, `-F1.5`, `-qg0`, `-n` left alone — they trade off with throughput and shouldn't move without benchmarking.
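A minimal sketch of how a wrapper script could assemble the flags discussed above. It is not the contents of `docker/rts-flags.sh`: the function names `detect_limit_bytes` and `rts_flags_for` are hypothetical, and reading the limit from `/sys/fs/cgroup/memory.max` with a `/proc/meminfo` fallback is an assumption about a cgroup v2 Linux host. Only `-M`, `-Fd` and `-I` are shown; the flags this PR leaves alone are omitted.

```shell
#!/bin/sh
# Hypothetical sketch, not the real docker/rts-flags.sh.

# Print the memory limit in bytes: the cgroup v2 limit if one is set,
# otherwise the machine's MemTotal. (Assumption: cgroup v2 layout.)
detect_limit_bytes() {
    cg=/sys/fs/cgroup/memory.max
    if [ -r "$cg" ] && [ "$(cat "$cg")" != "max" ]; then
        cat "$cg"
    else
        kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
        echo $(( kib * 1024 ))
    fi
}

# Print an RTS flag string for a given memory limit in bytes.
rts_flags_for() {
    limit_bytes=$1
    # -M is set to 75 % of the limit, expressed in whole MiB.
    max_heap_mib=$(( limit_bytes * 75 / 100 / 1048576 ))
    printf '%s\n' "+RTS -M${max_heap_mib}m -Fd1.0 -I0.3 -RTS"
}

# Usage on a live host:
#   rts_flags_for "$(detect_limit_bytes)"
```

For an 8 GiB limit (8589934592 bytes), `rts_flags_for` prints `+RTS -M6144m -Fd1.0 -I0.3 -RTS`.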

Commit 2 — OpenBLAS pthread stack on musl

Static Alpine/musl builds segfaulted (exit 139 / SIGSEGV) inside MUMPS factorization on the first dense BLAS3 call. musl's hardcoded 128 KB default pthread stack (vs glibc's 8 MB read from `RLIMIT_STACK`) is overflowed by OpenBLAS `DYNAMIC_ARCH` Fortran kernels with large auto-arrays.

Patch `driver/others/blas_server.c` during the Docker build to call `pthread_attr_setstacksize(&attr, 8 << 20)` right after `pthread_attr_init`. Two `grep` guards bracket the `sed` — a future upstream refactor fails the Docker build loudly instead of silently regressing the runtime.

Workaround for verification: `OPENBLAS_NUM_THREADS=1` (no worker pthreads spawned) makes the crash disappear without the patch — proves the worker-thread stack is the only contributor.
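The guarded patch described above could look roughly like the following sketch. It is not the actual Dockerfile step: the function name `patch_stack_size` is hypothetical, and the anchor assumes the source contains the literal text `pthread_attr_init(&attr);`. The inserted call is placed on the same line as the anchor so the sketch does not rely on newline handling in `sed` replacements. Guards of this kind only inspect the source text; they cannot tell whether the patched line sits inside a preprocessor branch that is actually compiled.

```shell
#!/bin/sh
# Hypothetical sketch of a grep-guarded sed patch.

patch_stack_size() {
    src=$1
    anchor='pthread_attr_init(&attr);'
    injected='pthread_attr_setstacksize(&attr, 8 << 20);'

    # Guard 1: the anchor must exist, otherwise sed would silently do nothing.
    grep -qF "$anchor" "$src" || {
        echo "anchor not found in $src" >&2
        return 1
    }

    # Append the setstacksize call right after the anchor.
    # In the replacement, "&" is the matched text and "\&" a literal ampersand.
    sed 's/pthread_attr_init(&attr);/& pthread_attr_setstacksize(\&attr, 8 << 20);/' \
        "$src" > "$src.tmp" && mv "$src.tmp" "$src"

    # Guard 2: the injected call must now be present in the file.
    grep -qF "$injected" "$src" || {
        echo "patch did not apply to $src" >&2
        return 1
    }
}

# Usage (path as named in the text above, relative to the OpenBLAS tree):
#   patch_stack_size driver/others/blas_server.c
```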

Why combine in one PR

The two changes solve different layers (Haskell RTS hygiene vs native BLAS thread setup), but the goal is shared: shipping a binary that runs on an 8 GB VM with Agribalyse-class workloads. The OpenBLAS fix is the load-bearing change; the RTS tweaks are cheap incremental hygiene on top.

Test plan

- `./docker-build.sh --with-frontend` succeeds (both `grep` guards pass)
- On an 8 GB Alpine VM: `docker run … volca-with-frontend …`, load Agribalyse 3.2 from cache, request `/impacts/Environmental Footprint 3.1 (adapted)`; the request completes without exit 139
- No `OPENBLAS_NUM_THREADS` env override needed: parallel BLAS workers auto-scale to `nproc`
- Printed `RTS: ... -> +RTS ...` summary shows `-Fd1.0` and `-I0.3` (`-M` stays at 75 % of cgroup limit)
- Spot-check that post-request RSS drops within seconds of going idle (`-Fd1.0` working)
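For the RSS spot-check, one option on Linux is to sample `VmRSS` from `/proc/<pid>/status` around the request. This is a minimal sketch: the helper name `rss_kib` is hypothetical, and `$PID` stands for the process ID of the running server as seen from wherever the check is run.

```shell
#!/bin/sh
# Hypothetical helper: print the resident set size in KiB
# from a /proc/<pid>/status file passed as the first argument.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Usage: sample once per second for ten seconds after the request returns.
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       rss_kib "/proc/$PID/status"
#       sleep 1
#   done
```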

Statically-linked Alpine/musl builds segfault inside MUMPS factorization on the first BLAS3 call from a worker thread (exit 139). musl's default pthread stack is 128 KB, vs glibc's 8 MB read from RLIMIT_STACK. OpenBLAS worker threads inherit that 128 KB and overflow on DYNAMIC_ARCH Fortran kernels that hold large auto-arrays, typically dgemm/dtrsm on dense frontal blocks during sparse LU factorization.

Reproduces on a 16 GB VM running volca-with-frontend, on the first impact request hitting Agribalyse 3.2 (21510 activities). The crash is not memory-bound (RSS stays low, exit code 139 = SIGSEGV, not 137).

Patch driver/others/blas_server.c to call pthread_attr_setstacksize at 8 MB before pthread_create. The change aligns musl's behaviour with glibc's effective default and is a no-op on glibc rebuilds.

Two grep guards bracket the sed: if a future OpenBLAS release moves the pthread_attr_init anchor, the Docker build fails loudly instead of silently producing a binary that crashes in production.

Two cheap RTS tweaks (cherry-picked from #59, minus the -M change):

- Add -Fd1.0 (GHC 9.10+): decay free heap blocks back to the OS over ~1 idle period instead of the default 4.0, which keeps RSS pinned near peak for minutes after a parsing spike.
- -I30 -> -I0.3 (GHC default): trigger idle-time major GC promptly. The previous 30 s deferral hid live-data drops and starved -Fd of free blocks to release.

Keeping -M at 75 % of RAM for now: dropping to 50 % may be the right call eventually, but the OpenBLAS musl crash that motivated the change in #59 is fixed independently in the previous commit. Re-evaluate -M once we have RSS curves on the 8 GB target.

PR #60 shipped broken because nothing in the build pipeline verified the injected pthread_attr_setstacksize call survived compilation; the silent regression was only caught by a production SIGSEGV. The fix in this branch (-Wl,-z,stack-size=8388608) replaces that with a different invariant, "PT_GNU_STACK->p_memsz == 0x800000 in the shipped ELF", which is again only checked manually.

Add a readelf-based assertion right after UPX so a future linker-flag refactor (or a UPX-side header rewrite) fails the image build with a diagnostic naming the actual value found, instead of recurring as a runtime crash on a customer VM.

The check runs on /build/output/volca (post-strip, post-UPX) because that's the binary that actually runs. binutils, which provides readelf, is already in the build-stage apk add.
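A readelf-based assertion of this kind could be sketched as follows. This is an illustration, not the check added to docker/Dockerfile: the helper names `gnu_stack_memsz` and `check_stack_memsz` are hypothetical, and the parsing assumes the wide (`-W`) program-header listing of `readelf -l`, where each header is one line and MemSiz is the sixth column.

```shell
#!/bin/sh
# Hypothetical sketch of a PT_GNU_STACK size assertion.

# Read `readelf -lW` output on stdin and print the MemSiz column
# of the GNU_STACK program header (empty if there is none).
gnu_stack_memsz() {
    awk '$1 == "GNU_STACK" { print $6 }'
}

# Fail with a diagnostic unless the given value equals 8 MiB (0x800000).
check_stack_memsz() {
    got=$1
    if [ -z "$got" ]; then
        echo "no GNU_STACK program header found" >&2
        return 1
    fi
    if [ "$(( got ))" -ne 8388608 ]; then
        echo "GNU_STACK MemSiz is $got, expected 0x800000" >&2
        return 1
    fi
}

# Usage in a build step (binary path as named in the text above):
#   check_stack_memsz "$(readelf -lW /build/output/volca | gnu_stack_memsz)"
```

Comparing numerically rather than as a string keeps the check independent of how many leading zeros readelf prints.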

PR #60 diagnosed the SIGSEGV correctly (musl's 128 KB default pthread stack overflows on the first BLAS3 call from MUMPS factorization) but patched the wrong place: the injected pthread_attr_setstacksize sat inside #ifdef NEED_STACKATTR, which blas_server.c #undef's on Linux. The code compiled out; the crash reproduced unchanged on a 16 GB VM and `OPENBLAS_NUM_THREADS=1` worked around it.

Real fix: bake the desired default into the ELF's PT_GNU_STACK header via -Wl,-z,stack-size=8388608 in LINK_MODE=musl. musl reads p_memsz at process start and uses it as __default_stacksize, so every pthread created with NULL attr (OpenBLAS workers, GHC RTS capabilities, anything else) starts with 8 MB. The linker flag covers more ground than a source patch would and avoids the second pthread_create site in goto_set_num_threads that lacks the `attr` symbol.

Also adds a readelf assertion in docker/Dockerfile right after UPX so a future linker-flag refactor, or a UPX-side header rewrite, fails the image build loudly instead of recurring as a runtime crash on a customer VM.

ccomb changed the title ~~fix(docker): patch OpenBLAS pthread stack size for musl static build~~ fix(docker): unblock 8 GB VM target — OpenBLAS musl pthread + RTS memory discipline May 16, 2026

ccomb force-pushed the fix/openblas-musl-pthread-stack branch from 19a7e21 to bf4ffd7 Compare May 16, 2026 15:44

ccomb mentioned this pull request May 16, 2026

rts: halve max heap, return memory to OS after parsing spikes #59

Closed

3 tasks

ccomb merged commit 3b3cf5e into main May 16, 2026
5 checks passed

ccomb deleted the fix/openblas-musl-pthread-stack branch May 16, 2026 16:00

ccomb mentioned this pull request May 16, 2026

fix(docker): set musl pthread stack via PT_GNU_STACK (real fix for #60) #61

Merged

4 tasks

ccomb merged 2 commits into main from fix/openblas-musl-pthread-stack
