Handle cross-thread slab frees by SS-42 · Pull Request #782 · DeusData/codebase-memory-mcp

SS-42 · 2026-07-02T18:42:17Z

Summary

This fixes slab allocator ownership for tree-sitter allocations that are freed on a different parser thread than the one that allocated them.

The allocator now keeps a global slab-page registry. Each page still has an owner thread for reuse, but cross-thread frees are recognized as slab pointers instead of falling through to plain free(). Reclaim/destroy detaches pages with no live chunks and retires pages that still have live chunks, freeing them when the final chunk is returned.

The slab payload is explicitly aligned to max_align_t, and resolve workers now tear down their thread-local parser/slab state before exiting so registered pages do not keep an owner pointer into dead TLS.

Observed environment: macOS 26.5.2 (25F84), Darwin 25.5.0 arm64, Apple clang 21.0.0.

Crash evidence

Local macOS DiagnosticReports show in-pipeline watcher/indexer aborts with SIGABRT / ___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED under tree-sitter frames, including:

2026-07-02 19:31:58 +0300, PID 74258: ts_stack_delete -> ts_parser_delete -> cbm_destroy_thread_parser -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread
2026-07-02 19:31:59 +0300, PID 92286: _realloc -> ts_subtree_compress -> ts_parser_parse -> cbm_extract_file -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread
2026-07-03 18:48:33 +0300, PID 96664: stack_node_release -> ts_stack_renumber_version -> ts_parser_parse -> cbm_extract_file -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run_incremental -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread

The 2026-07-03 reports came from local build UUID 459D6782-328E-308C-BDFB-A117FEC793ED. That build is useful provenance for the same watcher/tree-sitter failure class, but it is not byte-identical to this PR head: this PR head additionally has alignas(max_align_t) and resolve-worker terminal slab teardown.

Tests

make -f Makefile.cbm build/c/test-runner CC=clang CXX=clang++
ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem (34 passed)
ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem pipeline (243 passed)
red-first allocator regressions cover:
- cross-thread free of a slab chunk
- reclaim while a foreign thread still holds a live chunk
- destroy while a foreign thread still holds a live chunk

Local indexer timing smoke

Same macOS arm64 host, same checkout, fresh CBM_CACHE_DIR per run, cli index_repository --mode fast:

`CBM_WORKERS`	main	PR head	result
1	15.238s	14.949s	PR -1.9%
2	5.410s	5.329s	PR -1.5%
4	3.381s	3.307s	PR -2.2%
8	5.322s	5.536s	PR +4.0%
10	8.244s	9.246s	PR +12.2%

A separate default-worker smoke on the same checkout produced main 8.700s / 8.537s and PR head 13.675s / 9.116s; the first PR run appears cold/outlier, while the second is about +7% versus main. The global registry remains a deliberate correctness/performance tradeoff in this version rather than a proven no-cost fast-path change.

DeusData · 2026-07-03T13:59:55Z

Thanks for the allocator fix. Triage: high-priority memory-safety/stability PR.

Review will be fairly strict here: cross-thread slab ownership is low-level and can create subtle corruption if half-fixed. We will look for ASan/UBSan coverage, cross-thread free regression coverage, and confirmation that normal same-thread allocation/free behavior is unchanged.

DeusData · 2026-07-03T16:40:57Z

Thank you — this is a serious, well-written PR and the two unit tests are honest red-first reproductions of the allocator invariant (cross-thread free of a slab chunk falls through to plain free() of an interior pointer; foreign-live chunk across reclaim). Before we can take a redesign of this hot path, though, we need two things: (1) Evidence of an in-pipeline trigger. We traced the current pipeline and found parse-thread==free-thread holds everywhere (extract workers free trees before caching; resolve workers re-parse on their own thread; the sequential path is single-threaded) — so as far as we can tell this hardens a latent class rather than fixing an observed in-tree crash. Could you share the macOS malloc-diagnostics stack/repro that motivated it? If there IS a real path, that changes the calculus entirely. (2) Benchmarks. The fix replaces a lock-free O(1) TLS fast path with a global spinlock plus a linear global-page-registry scan on every small tree-sitter alloc/free across all workers — on large indexes that's exactly our hottest loop, and we gate on an indexer baseline (~4.8M nodes / 218s single-run). We'd want numbers showing parity, or a redesign that keeps the fast path lock-free (e.g. per-page atomic remote-free queue drained by the owner, or aligned pages giving O(1) pointer→page so no global scan). One residual within the current design: page->owner points into a thread's TLS and can dangle if a thread exits with registered pages (resolve workers have no terminal destroy) — the fix converts main's leak-corner into a potential write-into-freed-TLS. Also a small heads-up: cbm_slab_reset_thread silently changes semantics (rebuild→retire/free); no callers today, but worth documenting. Happy to work through this with you — allocator correctness matters a lot here, and your unit-level guards are exactly the right foundation. Thanks!

SS-42 · 2026-07-03T17:32:47Z

Updated in head a1e5aa7.

Changes since the review comment:

added terminal parser/slab teardown for resolve_worker, so registered pages do not retain an owner pointer into dead worker TLS;
added slab_destroy_with_foreign_live_chunk_is_safe alongside the existing cross-thread-free and reclaim-with-foreign-live-chunk regressions;
refreshed the PR body with the local macOS DiagnosticReports watcher/indexer stack evidence, including the build UUID caveat;
added a local CBM_WORKERS timing table. The 8/10-worker numbers show the global registry path is still a visible correctness/performance tradeoff, not a no-cost fast-path change.

Fresh local checks on macOS arm64:

ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem -> 34 passed
ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem pipeline -> 243 passed

Signed-off-by: SS-42 <noreply@incogni.to>

DeusData · 2026-07-03T22:32:43Z

Heads-up: the security / codeql-gate failure on this PR is not your change — it was a repo-side gate bug (the check scanned only the 5 newest CodeQL runs and lost track of PRs on a busy queue). That's fixed on main now (#820). Any push to your branch — or simply clicking GitHub's Update branch button — will trigger a fresh run with the fixed gate and it should go green. Sorry for the noise, and thanks for your patience!

SS-42 requested a review from DeusData as a code owner July 2, 2026 18:42

SS-42 force-pushed the fix/slab-cross-thread-free branch 3 times, most recently from 5f891fe to b391b01 Compare July 2, 2026 19:22

DeusData added bug Something isn't working stability/performance Server crashes, OOM, hangs, high CPU/memory priority/high Needs near-term maintainer attention; high-impact bug, regression, safety issue, or release blocker. labels Jul 3, 2026

SS-42 added 3 commits July 3, 2026 20:34

fix(memory): handle cross-thread slab frees

cc8553c

Signed-off-by: SS-42 <noreply@incogni.to>

fix(memory): destroy resolve worker slab state

76e8399

Signed-off-by: SS-42 <noreply@incogni.to>

test(memory): cover slab destroy with foreign live chunk

a1e5aa7

Signed-off-by: SS-42 <noreply@incogni.to>

SS-42 force-pushed the fix/slab-cross-thread-free branch from 48008c5 to a1e5aa7 Compare July 3, 2026 17:34

SS-42 mentioned this pull request Jul 3, 2026

Fix CodeQL gate lookup for busy queues #820

Merged

Merge branch 'main' into fix/slab-cross-thread-free

bac535b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle cross-thread slab frees#782

Handle cross-thread slab frees#782
SS-42 wants to merge 4 commits into
DeusData:mainfrom
SS-42:fix/slab-cross-thread-free

SS-42 commented Jul 2, 2026 •

edited

Loading

Uh oh!

DeusData commented Jul 3, 2026

Uh oh!

DeusData commented Jul 3, 2026

Uh oh!

SS-42 commented Jul 3, 2026 •

edited

Loading

Uh oh!

DeusData commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SS-42 commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Crash evidence

Tests

Local indexer timing smoke

Uh oh!

DeusData commented Jul 3, 2026

Uh oh!

DeusData commented Jul 3, 2026

Uh oh!

SS-42 commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DeusData commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SS-42 commented Jul 2, 2026 •

edited

Loading

SS-42 commented Jul 3, 2026 •

edited

Loading