Handle cross-thread slab frees#782
Conversation
5f891fe to
b391b01
Compare
|
Thanks for the allocator fix. Triage: high-priority memory-safety/stability PR. Review will be fairly strict here: cross-thread slab ownership is low-level and can create subtle corruption if half-fixed. We will look for ASan/UBSan coverage, cross-thread free regression coverage, and confirmation that normal same-thread allocation/free behavior is unchanged. |
|
Thank you — this is a serious, well-written PR and the two unit tests are honest red-first reproductions of the allocator invariant (cross-thread free of a slab chunk falls through to plain free() of an interior pointer; foreign-live chunk across reclaim). Before we can take a redesign of this hot path, though, we need two things: (1) Evidence of an in-pipeline trigger. We traced the current pipeline and found parse-thread==free-thread holds everywhere (extract workers free trees before caching; resolve workers re-parse on their own thread; the sequential path is single-threaded) — so as far as we can tell this hardens a latent class rather than fixing an observed in-tree crash. Could you share the macOS malloc-diagnostics stack/repro that motivated it? If there IS a real path, that changes the calculus entirely. (2) Benchmarks. The fix replaces a lock-free O(1) TLS fast path with a global spinlock plus a linear global-page-registry scan on every small tree-sitter alloc/free across all workers — on large indexes that's exactly our hottest loop, and we gate on an indexer baseline (~4.8M nodes / 218s single-run). We'd want numbers showing parity, or a redesign that keeps the fast path lock-free (e.g. per-page atomic remote-free queue drained by the owner, or aligned pages giving O(1) pointer→page so no global scan). One residual within the current design: page->owner points into a thread's TLS and can dangle if a thread exits with registered pages (resolve workers have no terminal destroy) — the fix converts main's leak-corner into a potential write-into-freed-TLS. Also a small heads-up: cbm_slab_reset_thread silently changes semantics (rebuild→retire/free); no callers today, but worth documenting. Happy to work through this with you — allocator correctness matters a lot here, and your unit-level guards are exactly the right foundation. Thanks! |
|
Updated in head Changes since the review comment:
Fresh local checks on macOS arm64:
|
Signed-off-by: SS-42 <noreply@incogni.to>
Signed-off-by: SS-42 <noreply@incogni.to>
Signed-off-by: SS-42 <noreply@incogni.to>
48008c5 to
a1e5aa7
Compare
|
Heads-up: the |
Summary
This fixes slab allocator ownership for tree-sitter allocations that are freed on a different parser thread than the one that allocated them.
The allocator now keeps a global slab-page registry. Each page still has an owner thread for reuse, but cross-thread frees are recognized as slab pointers instead of falling through to plain
free(). Reclaim/destroy detaches pages with no live chunks and retires pages that still have live chunks, freeing them when the final chunk is returned.The slab payload is explicitly aligned to
max_align_t, and resolve workers now tear down their thread-local parser/slab state before exiting so registered pages do not keep an owner pointer into dead TLS.Observed environment: macOS 26.5.2 (25F84), Darwin 25.5.0 arm64, Apple clang 21.0.0.
Crash evidence
Local macOS DiagnosticReports show in-pipeline watcher/indexer aborts with
SIGABRT/___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATEDunder tree-sitter frames, including:2026-07-02 19:31:58 +0300, PID 74258:ts_stack_delete -> ts_parser_delete -> cbm_destroy_thread_parser -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread2026-07-02 19:31:59 +0300, PID 92286:_realloc -> ts_subtree_compress -> ts_parser_parse -> cbm_extract_file -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread2026-07-03 18:48:33 +0300, PID 96664:stack_node_release -> ts_stack_renumber_version -> ts_parser_parse -> cbm_extract_file -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run_incremental -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_threadThe
2026-07-03reports came from local build UUID459D6782-328E-308C-BDFB-A117FEC793ED. That build is useful provenance for the same watcher/tree-sitter failure class, but it is not byte-identical to this PR head: this PR head additionally hasalignas(max_align_t)and resolve-worker terminal slab teardown.Tests
make -f Makefile.cbm build/c/test-runner CC=clang CXX=clang++ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem(34 passed)ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem pipeline(243 passed)Local indexer timing smoke
Same macOS arm64 host, same checkout, fresh
CBM_CACHE_DIRper run,cli index_repository --mode fast:CBM_WORKERSA separate default-worker smoke on the same checkout produced main
8.700s / 8.537sand PR head13.675s / 9.116s; the first PR run appears cold/outlier, while the second is about +7% versus main. The global registry remains a deliberate correctness/performance tradeoff in this version rather than a proven no-cost fast-path change.