PureGo Version
v0.9.0 and v0.9.1 (both confirmed). Likely all releases from v0.8.1 onward (the first release including #282).
Operating System
Windows
macOS
Linux
FreeBSD
NetBSD
Android
iOS
Go Version (go version)
go 1.24 and go 1.26.1
What steps will reproduce the problem?
Two Go goroutines calling the same RegisterLibFunc-registered function from a shared library, signature func(uint64) *byte. Within seconds the goroutines observe each other's return pointers.
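ultra.c: 5 lines of payload, returns a malloc'd buffer with the seed embedded:
ultra.go: N goroutines, each with a per-worker disjoint seed range; each reads the JSON back and checks whether the embedded seed matches what it sent:
What is the expected result?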
Each goroutine's alloc(seed) call returns a pointer to a buffer whose embedded seed matches the seed that goroutine passed in. Across all workers × duration calls, the program prints mismatches=0 and exits cleanly. This is the observed behavior on v0.8.0 (153M dispatches at 32 workers, zero mismatches) and on plain C with 32 pthreads against the same kind of .so (16M+ ops clean).
What happens instead?
On v0.9.0 / v0.9.1 with workers ≥ 2, within seconds a goroutine reads back JSON whose embedded seed belongs to a different worker's seed range — the return pointer from another goroutine's in-flight alloc call has bled across. Often the run then aborts: two goroutines end up defer free-ing the same C pointer and glibc trips double free detected in tcache. Verbatim output from a -workers 32 -dur 15s run on v0.9.1:
Worker 16 and worker 12 swapped return pointers — each reads back the other's just-allocated buffer. The third line shows a further-degraded case: by the time worker 16 reads the buffer it's already been freed and reused, so the prefix is non-JSON garbage. The SIGABRT is glibc reacting to two goroutines free-ing the same pointer.
Anything else you feel useful to add?
What I've established
The race is introduced in #282 (all: improve memory usage). That PR replaced the stack-allocated syscall15Args in RegisterFunc's reflect closure with syscall := thePool.Get().(*syscall15Args) + defer thePool.Put(syscall) (func.go:310-311 in v0.9.0). Verified by direct version test: built the repro against v0.8.0 (the last release before #282 merged on 2024-10-17, with zero thePool references in func.go) and ran it at workers ∈ {2, 4, 8, 32} for 20s each: 153M total dispatches, zero mismatches. The same repro against v0.9.0 / v0.9.1 mismatches within seconds at any workers ≥ 2.
Reverting just those two lines at HEAD (back to the v0.8.0 stack allocation pattern) eliminates the race over 137M+ ops in 60s × 32 goroutines. Same workload that mismatches within seconds at HEAD goes mismatch-free.
The bug does not reproduce at workers=1 even at 19M+ ops — single-goroutine FFI is unaffected.
The bug does not reproduce when the same .so is called from plain C with 32 pthreads under identical workload (16M+ ops clean). So it's not the C side, not the system malloc, not the kernel.
The bug does reproduce with GOMAXPROCS=1 — Go still spawns separate OS threads for cgocalls.
A function returning (u64, u64) → u64 (no pointer return) at 32 workers does not race. The pointer-return path is required.
A function taking string args but returning u64 at 32 workers does not race. So arg marshaling is not the source — the racy path is the return-value extraction.
What I don't know
The precise interleaving. I haven't been able to construct from first principles how two sync.Pool.Get() calls could return overlapping items. sync.Pool is documented concurrent-safe, and *syscall.a1 is read before the deferred Put runs.
The regression points at the sync.Pool of *syscall15Args introduced in #282 (func.go:310-311) and mirrored on the parallel sysv path in #328 (syscall_sysv.go:18-19). Possibilities I considered but couldn't confirm:
The asm trampoline (syscall15XABI0) writes to the captured *syscall15Args after runtime_cgocall nominally returns to Go, opening a window where another goroutine has the same struct from the pool.
A subtle escape-analysis or stack-growth interaction (PR #328 notes that the nosplit annotation on the parallel syscall_sysv.go path was removed when the pool was reintroduced there).
Goroutine preemption between thePool.Get() and runtime_cgocall lets another goroutine Get the same item if the per-P cache misaligns with the in-flight call.
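To make the first possibility concrete, here is a deterministic single-threaded model of what a late write-back would do to a recycled struct. It is purely illustrative and is not evidence that purego behaves this way: `slot`, `freeList`, and `lateWrite` are hypothetical names, and a one-entry free list replaces sync.Pool because sync.Pool does not guarantee that a Put item is the next one returned by Get.

```go
package main

import "fmt"

// slot is a hypothetical stand-in for a pooled argument struct whose a1
// field holds the argument on entry and the return value on exit.
type slot struct{ a1 uint64 }

// freeList is a deterministic one-entry substitute for sync.Pool.
type freeList struct{ item *slot }

func (f *freeList) Get() *slot {
	if s := f.item; s != nil {
		f.item = nil
		return s
	}
	return new(slot)
}

func (f *freeList) Put(s *slot) { f.item = s }

// lateWrite replays one ASSUMED interleaving: caller A recycles its slot
// before A's write-back lands, caller B picks up the same slot, and A's
// late write then overwrites B's result.
func lateWrite(argA, resA, resB uint64) (aSaw, bSaw uint64, shared bool) {
	var pool freeList

	sa := pool.Get()
	sa.a1 = argA
	aSaw = sa.a1 // A returns to Go early and reads a1 before the write-back
	pool.Put(sa) // A's deferred Put recycles the slot

	sb := pool.Get() // B receives the slot A just released
	sb.a1 = resB     // B's call writes its own result
	sa.a1 = resA     // A's late write-back lands on the shared slot
	bSaw = sb.a1     // B now reads A's result
	pool.Put(sb)

	return aSaw, bSaw, sa == sb
}

func main() {
	aSaw, bSaw, shared := lateWrite(1, 0xA, 0xB)
	fmt.Printf("shared=%v aSaw=%#x bSaw=%#x\n", shared, aSaw, bSaw)
}
```

In this model B ends up holding A's result while A sees only its own stale argument, so a late write alone would not obviously explain the symmetric swap between workers 16 and 12 reported above; it would need to happen in both directions.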