func.go: Concurrent calls swap return-value pointers (sync.Pool of *syscall15Args, regressed in #282)

### PureGo Version

v0.9.0 and v0.9.1 (both confirmed). Likely all releases from v0.8.1 onward (the first release including #282).

### Operating System

- [ ] Windows
- [ ] macOS
- [x] Linux
- [ ] FreeBSD
- [ ] NetBSD
- [ ] Android
- [ ] iOS

### Go Version (`go version`)

go 1.24 and go 1.26.1

### What steps will reproduce the problem?

Call the same `RegisterLibFunc`-registered shared-library function, with signature `func(uint64) *byte`, from two or more goroutines concurrently. Within seconds the goroutines observe each other's return pointers.

**`ultra.c`** — 5 lines of payload, returns a malloc'd buffer with the seed embedded:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

char *ultra_alloc(uint64_t seed) {
    char *buf = (char *)malloc(256);
    if (!buf) return NULL;
    snprintf(buf, 256, "{\"seed\":%lu,\"_\":\"%064lu\"}",
             (unsigned long)seed, (unsigned long)seed);
    return buf;
}

void ultra_free(char *s) { free(s); }
```

```sh
cc -O2 -shared -fPIC -o libultra.so ultra.c
```

**`ultra.go`** spawns N goroutines, each with its own disjoint seed range; each reads the JSON back and checks whether the embedded seed matches the one it sent:

```go
package main

import (
    "flag"
    "fmt"
    "os"
    "strings"
    "sync"
    "sync/atomic"
    "time"
    "unsafe"

    "github.com/ebitengine/purego"
)

func main() {
    workers := flag.Int("workers", 32, "")
    flag.Parse()

    h, err := purego.Dlopen("./libultra.so", purego.RTLD_NOW|purego.RTLD_GLOBAL)
    if err != nil {
        fmt.Println(err)
        os.Exit(2)
    }

    var alloc func(uint64) *byte
    var free func(*byte)
    purego.RegisterLibFunc(&alloc, h, "ultra_alloc")
    purego.RegisterLibFunc(&free, h, "ultra_free")

    var ops, mismatches atomic.Uint64
    var wg sync.WaitGroup
    done := make(chan struct{})

    for i := 0; i < *workers; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            seed := uint64(id) * 1_000_000_000 // disjoint per worker
            for {
                select {
                case <-done:
                    return
                default:
                }
                seed++
                ptr := alloc(seed)
                if ptr == nil {
                    continue
                }
                s := string(unsafe.Slice(ptr, 96))
                want := fmt.Sprintf(`{"seed":%d,`, seed)
                if !strings.HasPrefix(s, want) {
                    if n := mismatches.Add(1); n <= 5 {
                        fmt.Fprintf(os.Stderr,
                            "worker %d sent=%d head=%q\n",
                            id, seed, s[:48])
                    }
                }
                free(ptr)
                ops.Add(1)
            }
        }(i)
    }

    time.Sleep(15 * time.Second)
    close(done)
    wg.Wait()
    fmt.Printf("ops=%d mismatches=%d\n", ops.Load(), mismatches.Load())
}
```

### What is the expected result?

Each goroutine's `alloc(seed)` call returns a pointer to a buffer whose embedded seed matches the seed that goroutine passed in. Across all `workers × duration` calls, the program prints `mismatches=0` and exits cleanly. This is the observed behavior on `v0.8.0` (153M total dispatches across runs at 2, 4, 8, and 32 workers, zero mismatches) and on plain C with 32 pthreads against the same kind of `.so` (16M+ ops clean).

### What happens instead?

On `v0.9.0` / `v0.9.1` with `workers ≥ 2`, within seconds a goroutine reads back JSON whose embedded seed belongs to a *different* worker's seed range: the return pointer from another goroutine's in-flight `alloc` call has bled across. Often the run then aborts, because two goroutines end up calling `free` on the same C pointer and glibc trips `double free detected in tcache`. Verbatim output from a 15-second run with `-workers 32` on `v0.9.1`:

```
worker 16 sent=16000003040 head="{\"seed\":12000007214,\"_\":\"00000000000000000000000"
worker 12 sent=12000007219 head="{\"seed\":16000003063,\"_\":\"00000000000000000000000"
worker 16 sent=16000003092 head="\xa1Lk%\x11y\x00\x00Ӗ\x8d\xdbo\xa53\x02312,\"_\":\"00000000000000000000000"
free(): double free detected in tcache 2
SIGABRT: abort
```

Worker 16 and worker 12 swapped return pointers — each reads back the other's just-allocated buffer. The third line shows a further-degraded case: by the time worker 16 reads the buffer it's already been freed and reused, so the prefix is non-JSON garbage. The SIGABRT is glibc reacting to two goroutines `free`-ing the same pointer.
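The attribution of each mismatch to a worker follows from the seed layout in the repro: worker `id` starts at `id * 1_000_000_000` and increments by one per call, so dividing an observed seed by that stride identifies the sender (assuming no worker makes a billion calls in one run). A minimal sketch of that arithmetic; `ownerOf` and `parseSeed` are hypothetical helpers and not part of `ultra.go`:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// seedsPerWorker matches the per-worker stride used in ultra.go.
const seedsPerWorker = 1_000_000_000

// ownerOf returns the id of the worker whose seed range contains seed.
func ownerOf(seed uint64) int { return int(seed / seedsPerWorker) }

// parseSeed extracts the seed from a buffer head such as `{"seed":123,"_":...`.
// It reports false when the head does not start with the expected JSON prefix.
func parseSeed(head string) (uint64, bool) {
	rest, ok := strings.CutPrefix(head, `{"seed":`)
	if !ok {
		return 0, false
	}
	digits, _, ok := strings.Cut(rest, ",")
	if !ok {
		return 0, false
	}
	seed, err := strconv.ParseUint(digits, 10, 64)
	return seed, err == nil
}

func main() {
	// Values taken from the first mismatch line in the log above.
	sent := uint64(16000003040)
	head := `{"seed":12000007214,"_":"00000000000000000000000`

	got, ok := parseSeed(head)
	fmt.Println(ownerOf(sent), ownerOf(got), ok) // prints: 16 12 true
}
```

A head that fails to parse, like the third log line, corresponds to the use-after-free case rather than a clean swap.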

### Anything else you feel useful to add?

#### What I've established

- **The race is introduced in [#282](https://github.com/ebitengine/purego/pull/282) (`all: improve memory usage`).** That PR replaced the stack-allocated `syscall15Args` in `RegisterFunc`'s reflect closure with `syscall := thePool.Get().(*syscall15Args)` + `defer thePool.Put(syscall)` (`func.go:310-311` in v0.9.0). Verified by direct version test: I built the repro against `v0.8.0` (the last release before #282 merged on 2024-10-17, with zero `thePool` references in `func.go`) and ran it at workers ∈ {2, 4, 8, 32} for 20s each: **153M total dispatches, zero mismatches**. The same repro against `v0.9.0` / `v0.9.1` produces mismatches within seconds at any `workers ≥ 2`.
- **Reverting just those two lines** at HEAD (back to the v0.8.0 stack allocation pattern) eliminates the race over 137M+ ops in 60s × 32 goroutines. Same workload that mismatches within seconds at HEAD goes mismatch-free.
- The bug does **not** reproduce at `workers=1` even at 19M+ ops — single-goroutine FFI is unaffected.
- The bug does **not** reproduce when the same `.so` is called from plain C with 32 pthreads under identical workload (16M+ ops clean). So it's not the C side, not the system malloc, not the kernel.
- The bug **does** reproduce with `GOMAXPROCS=1` — Go still spawns separate OS threads for cgocalls.
- A function with signature `(u64, u64) → u64` (no pointer return) at 32 workers does not race, so the pointer-return path appears to be required.
- A function taking string args but returning `u64` at 32 workers does not race. So arg marshaling is not the source — the racy path is the return-value extraction.
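For readers unfamiliar with the change, the two allocation patterns compared above have roughly the following shape. This is a simplified sketch, not purego's actual code: `callArgs`, `argsPool`, `fakeCall`, `callPooled`, and `callOnStack` are hypothetical stand-ins, and `fakeCall` replaces the real assembly trampoline with a pure-Go function.

```go
package main

import (
	"fmt"
	"sync"
)

// callArgs is a hypothetical stand-in for an argument/result block such as
// syscall15Args: inputs go in, and the result comes back in a1.
type callArgs struct {
	fn, a1, a2 uintptr
}

var argsPool = sync.Pool{New: func() any { return new(callArgs) }}

// fakeCall stands in for the foreign call; it writes its result into a1.
func fakeCall(args *callArgs) { args.a1 *= 2 }

// callPooled mirrors the pooled pattern: the block comes from a shared pool
// and goes back to it when the function exits.
func callPooled(x uintptr) uintptr {
	args := argsPool.Get().(*callArgs)
	defer argsPool.Put(args)
	*args = callArgs{a1: x} // clear whatever a previous user left behind
	fakeCall(args)
	return args.a1 // evaluated before the deferred Put runs
}

// callOnStack mirrors the pre-pool pattern: each call owns a fresh block.
func callOnStack(x uintptr) uintptr {
	args := callArgs{a1: x}
	fakeCall(&args)
	return args.a1
}

func main() {
	fmt.Println(callPooled(21), callOnStack(21)) // prints: 42 42
}
```

In pure Go both variants behave identically, so this sketch does not reproduce the race. It only shows which lifetime assumption differs: in the pooled variant, anything that still holds the pointer after `Put` is writing into a block another caller may already own.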

#### What I don't know

The precise interleaving. I haven't been able to construct from first principles how two `sync.Pool.Get()` calls could return overlapping items. `sync.Pool` is documented concurrent-safe, and `*syscall.a1` is read before the deferred `Put` runs.

The regression points at the `sync.Pool` of `*syscall15Args` introduced in [#282](https://github.com/ebitengine/purego/pull/282) (`func.go:310-311`) and mirrored on the parallel sysv path in [#328](https://github.com/ebitengine/purego/pull/328) (`syscall_sysv.go:18-19`). Possibilities I considered but couldn't confirm:

1. The asm trampoline (`syscall15XABI0`) writes to the captured `*syscall15Args` *after* `runtime_cgocall` nominally returns to Go, opening a window where another goroutine has the same struct from the pool.
2. A subtle escape-analysis or stack-growth interaction (PR [#328](https://github.com/ebitengine/purego/pull/328) notes that the `nosplit` annotation on the parallel `syscall_sysv.go` path was removed when the pool was reintroduced there).
3. Goroutine preemption between `thePool.Get()` and `runtime_cgocall` lets another goroutine `Get` the same item if the per-P cache misaligns with the in-flight call.
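One way to examine possibility 3 in isolation, without purego or any C code, is to check whether `sync.Pool` ever hands out an item that another goroutine still holds. The sketch below is only an assumption about how such a check could look; `item`, `inFlight`, and `countOverlaps` are hypothetical names. A result of zero overlaps would be consistent with `sync.Pool`'s documented guarantees and would shift attention to writes that happen outside the `Get`/`Put` window, as in possibility 1.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// item is non-zero-sized so that distinct allocations have distinct addresses.
type item struct{ a1, a2 uintptr }

// countOverlaps runs `workers` goroutines that each perform `iters` Get/Put
// cycles, and counts how often Get returns an item that is still marked as
// held by some goroutine.
func countOverlaps(workers, iters int) uint64 {
	pool := sync.Pool{New: func() any { return new(item) }}
	var inFlight sync.Map // *item -> struct{}: items currently held
	var overlaps atomic.Uint64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				it := pool.Get().(*item)
				if _, held := inFlight.LoadOrStore(it, struct{}{}); held {
					overlaps.Add(1)
				}
				runtime.Gosched() // widen the window between Get and Put
				inFlight.Delete(it) // unmark before the item becomes available again
				pool.Put(it)
			}
		}()
	}
	wg.Wait()
	return overlaps.Load()
}

func main() {
	fmt.Println("overlaps:", countOverlaps(32, 100_000))
}
```

This check covers only the pool's hand-out behavior. It says nothing about what the trampoline or the runtime does with the pointer during the foreign call.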

