SIGSEGV in lsp_cross pass when all_defs exceeds ~1189 entries (scale-dependent, not file-specific) #344

@Codyzzz-zach

Bug: SIGSEGV (exit 139) in lsp_cross pass when indexing TS/JS monorepos with 1189+ definitions

Summary

The lsp_cross pass crashes with SIGSEGV when indexing TypeScript/JavaScript monorepos where pxc_collect_all_defs produces a CBMLSPDef[] array exceeding ~1189 entries. The crash is scale-dependent, not file-specific — any subset of files that stays below the threshold indexes successfully.

Environment

  • OS: macOS 15 (Darwin 24.0.0, arm64)
  • Binary: v0.6.1 darwin-arm64 (prebuilt release)
  • Project: Vue 3 + TypeScript monorepo (pnpm workspace)
    • 1832 files total, 4864 defs extracted
    • 899 TS/JS/TSX files requiring lsp_cross processing
    • Backend (Kotlin) indexes fine in full mode — Kotlin has no cross-file LSP, so pxc_has_cross_lsp() returns false

Reproduction

# Clone a Vue 3 + TS monorepo with 1800+ files
codebase-memory-mcp index --project my-frontend --repo /path/to/frontend --mode full
# → SIGSEGV at lsp_cross pass (exit code 139)

Scale Threshold Isolation

Systematic binary-search testing isolates the crash to the total defs count passed to cbm_run_ts_lsp_cross:

| Scope | Files | Defs | Result |
|---|---|---|---|
| Minimal 2-file TS project | 2 | 4 | OK |
| apps/ alone | 917 | 1160 | OK |
| packages/ alone | 471 | 1189 | CRASH |
| packages/ui-kit alone | 337 | ~600 | CRASH |
| packages/ui-kit/shadcn-ui alone | 251 | 25 | OK |
| Any single packages/* sub-package | varies | <200 | OK |
| Combined sub-packages (total defs >1189) | varies | 1189+ | CRASH |

The crash triggers regardless of which specific files are included — it's the accumulated all_defs array size that matters.

Root Cause Analysis

The crash path is:

cbm_pipeline_pass_lsp_cross()           [pass_lsp_cross.c:400]
  → pxc_collect_all_defs()              [pass_lsp_cross.c:149]
      // Creates single shared CBMLSPDef[1189+] array
  → for each TS/JS file:
      → pxc_run_one_ts()                [pass_lsp_cross.c:381]
          → cbm_run_ts_lsp_cross()      [ts_lsp.c:4230]
              // Registers ALL defs into CBMTypeRegistry
              // → SIGSEGV

Key observations from source analysis (pass_lsp_cross.c):

  1. pxc_collect_all_defs (line 149) collects ALL project definitions into a single CBMLSPDef[] array — every Class, Interface, Function, Method, Enum, Type, Protocol, Trait across all files.

  2. cbm_pipeline_pass_lsp_cross (line 400) passes this entire all_defs array to every pxc_run_one_ts / pxc_run_one call. For 899 TS files, the same 1189+ def array is registered into 899 separate type registries.

  3. pxc_run_one_ts (line 381) uses a per-file scratch arena, but the all_defs array and the type registry built from it are not isolated per file, so memory corruption in one iteration could carry over into later iterations.

  4. The per-file scratch arena in pxc_run_one/pxc_run_one_ts was added to prevent O(N×project_size) memory growth (noted in the comment at lines 326-330), but it does not protect against corruption in the shared defs array or in the type registry registration path.
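
To make observation 2 concrete, here is a minimal C sketch of the loop shape described above. The names def_t, registry_t, registry_add, and count_registrations are hypothetical stand-ins and not the project's actual types or functions. The sketch only counts how many registrations the "register everything for every file" pattern performs; it does not reproduce the crash.

```c
#include <stddef.h>

/* Hypothetical stand-in for CBMLSPDef; only a name field is modeled. */
typedef struct {
    const char *name;
} def_t;

/* Hypothetical stand-in for a per-file type registry. */
typedef struct {
    size_t registered;
} registry_t;

/* Models registering one definition; a real registry would store it. */
static void registry_add(registry_t *reg, const def_t *def) {
    (void)def;
    reg->registered++;
}

/* Mirrors the described pattern: every file gets a fresh registry, and
 * every entry of the shared all_defs array is registered into it.
 * Returns the total number of registrations across all files. */
size_t count_registrations(const def_t *all_defs, size_t n_defs, size_t n_files) {
    size_t total = 0;
    for (size_t f = 0; f < n_files; f++) {
        registry_t reg = {0};
        for (size_t d = 0; d < n_defs; d++) {
            registry_add(&reg, &all_defs[d]);
        }
        total += reg.registered;
    }
    return total;
}
```

With the numbers from this report (899 TS/JS files, 1189 defs), count_registrations returns 899 × 1189 = 1,068,911, so the registration path runs over a million times in a single pass.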

Suggested Fix: Batch Processing

Split the TS/JS file loop in cbm_pipeline_pass_lsp_cross into batches. Each batch:

  1. Collects only the definitions relevant to its subset of files (direct defs + imported defs)
  2. Creates a fresh CBMLSPDef[] subset and scratch arena
  3. Processes its file subset independently

This limits the all_defs array size per batch, which should keep each batch below the size at which the crash appears while preserving cross-file resolution within each batch.
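
The batching steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a patch: def_t, its file_index field, collect_batch_defs, process_in_batches, and record_max_defs are all hypothetical names, and the sketch gathers only direct defs (those owned by files in the batch). Resolving imported defs from outside the batch would need import information that is not modeled here. Whether batching actually avoids the crash is untested.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CBMLSPDef, with the index of the owning file. */
typedef struct {
    const char *name;
    size_t file_index;
} def_t;

/* Callback invoked once per file with that batch's def subset. */
typedef void (*process_file_fn)(size_t file_index, const def_t *defs,
                                size_t n_defs, void *ctx);

/* Copies the defs owned by files in [first, end) into a fresh array.
 * The caller frees the result. Returns NULL on allocation failure. */
def_t *collect_batch_defs(const def_t *all_defs, size_t n_defs,
                          size_t first, size_t end, size_t *out_count) {
    *out_count = 0;
    def_t *subset = malloc((n_defs ? n_defs : 1) * sizeof *subset);
    if (!subset) return NULL;
    for (size_t i = 0; i < n_defs; i++) {
        if (all_defs[i].file_index >= first && all_defs[i].file_index < end) {
            subset[(*out_count)++] = all_defs[i];
        }
    }
    return subset;
}

/* Processes files in contiguous batches of at most batch_size files.
 * Each batch gets its own def subset, freed before the next batch starts.
 * Returns the number of batches completed (stops early if allocation fails). */
size_t process_in_batches(const def_t *all_defs, size_t n_defs, size_t n_files,
                          size_t batch_size, process_file_fn process, void *ctx) {
    size_t batches = 0;
    if (batch_size == 0 || process == NULL) return 0;
    for (size_t first = 0; first < n_files; ) {
        size_t remaining = n_files - first;
        size_t end = first + (batch_size < remaining ? batch_size : remaining);
        size_t count = 0;
        def_t *subset = collect_batch_defs(all_defs, n_defs, first, end, &count);
        if (!subset) return batches;
        for (size_t f = first; f < end; f++) {
            process(f, subset, count, ctx);
        }
        free(subset);
        batches++;
        first = end;
    }
    return batches;
}

/* Example callback: records the largest def subset any file was given.
 * ctx must point to a size_t. */
void record_max_defs(size_t file_index, const def_t *defs, size_t n_defs, void *ctx) {
    (void)file_index;
    (void)defs;
    size_t *max_seen = ctx;
    if (n_defs > *max_seen) *max_seen = n_defs;
}
```

The trade-off is that a file only sees definitions from its own batch, so how files are grouped (for example, by package) determines which cross-file references can still resolve.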

Workaround

Index with mode: "moderate" (skips lsp_cross pass) or index sub-packages individually below the ~1189 def threshold.
