MCP T18 — Tree-sitter symbol resolver for Python (replace jedi in second_pass) #689

## Context

`second_pass` in `api/analyzers/source_analyzer.py` dominates indexing wall-time on real Python repos (e.g. sympy ~30 min, pytest ~4 min). Most of that time is spent in `lsp.request_definition` calls funneled through multilspy's `SyncLanguageServer`, which in turn shells out to jedi.

Two structural problems make this expensive:

1. **~80% of jedi calls return `None`** ("Unexpected response from Language Server: None"). We pay 50–200 ms per call for nothing.
2. **multilspy serializes everything** through one asyncio loop on one daemon thread (verified empirically — see #687 comment). Threading the caller side gives zero speedup.

We already pay for tree-sitter to parse every file in `first_pass`. We have the parse tree, the entity table, and the import list. A tree-sitter-based static resolver can replace jedi for ~90% of resolution cases (module-local names, intra-project imports, attribute access on known classes) at orders-of-magnitude lower per-call cost.

## Goal

Add a `TreeSitterPythonResolver` that implements the same `resolve_symbol(...)` contract as the jedi path, selectable at runtime via an env var. When enabled, indexing should:

- Match jedi's edge counts within ~5% on a representative corpus (pytest, sympy, xarray).
- Run with no LSP server process at all.
- Be trivially parallelizable across files (closes the loop on #687).

## Scope (in)

1. **New `api/analyzers/python/tree_sitter_resolver.py`** — `TreeSitterPythonResolver` implementing the analyzer `resolve_symbol(files, lsp, file_path, project_path, key, symbol)` contract. The `lsp` argument is accepted but ignored.
2. **Project-wide symbol table** — built once in `first_pass`: `{(module_path, name) -> (file_path, node)}`. The resolver subclasses `TreeSitterAnalyzer` from T15 to share parser/query plumbing.
3. **Resolution rules (initial scope, Python only)**:
   - Module-local names (function / class defined in same file).
   - Names imported via `from X import Y` (resolve to module `X` then `Y`).
   - Names imported via `import X` then `X.Y`.
   - Method calls on instances whose class is statically inferable (assignment from `Cls()`).
   - Class-attribute / static method access (`Cls.method`).
4. **Runtime selection** — `CODE_GRAPH_PY_RESOLVER=tree_sitter|jedi` env var. Default `jedi` so existing behavior is unchanged until we A/B.
5. **Parallel indexing** — once tree-sitter resolver works, switch `second_pass` to `ProcessPoolExecutor` over files. Resolver is a pure function of `(file, project_symbol_table)`, no shared mutable state.
6. **Bench harness A/B** — extend `bench_index_test.py` to compare jedi vs tree-sitter on `node_count`, `edge_count`, and wall-time for pytest-6202, sympy-20154, xarray-3993.
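
The first three resolution rules in item 3 can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions, not the intended implementation: `SymbolTable`, `ImportMap`, and `resolve_name` are hypothetical names, a string stands in for the tree-sitter node, and dotted module imports such as `import a.b` are not handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical shape of the project symbol table:
# (module_path, name) -> (file_path, node).
# A plain string stands in for the tree-sitter node in this sketch.
SymbolTable = Dict[Tuple[str, str], Tuple[str, str]]

# Hypothetical per-file import bindings:
# local alias -> (module_path, imported name, or None for a module import).
ImportMap = Dict[str, Tuple[str, Optional[str]]]


def resolve_name(
    table: SymbolTable,
    imports: ImportMap,
    module: str,
    dotted: str,
) -> Optional[Tuple[str, str]]:
    """Resolve `dotted` as seen from `module`; return None on a miss."""
    head, _, rest = dotted.partition(".")

    # Rule 1: module-local function or class.
    # Simplification: a local definition always wins over an import.
    if not rest and (module, head) in table:
        return table[(module, head)]

    binding = imports.get(head)
    if binding is None:
        return None  # miss -> no edge, same as the jedi path
    target_module, imported_name = binding

    # Rule 2: `from X import Y`, referenced as `Y`.
    if imported_name is not None and not rest:
        return table.get((target_module, imported_name))

    # Rule 3: `import X`, referenced as `X.Y` (one attribute level only).
    if imported_name is None and rest and "." not in rest:
        return table.get((target_module, rest))

    return None
```

Because the function only reads its arguments and returns a value, it fits the "pure function of `(file, project_symbol_table)`" requirement in item 5.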

## Scope (out)

- Non-Python languages (JS/Kotlin/Java/C# stay on their current path).
- Dynamic resolution (decorators, metaclasses, `getattr`, monkey-patching) — falls back to "unresolved", matching jedi's `None` behavior.
- Type inference beyond direct assignment.
- Changes to graph schema.

## Files

- new `api/analyzers/python/tree_sitter_resolver.py`
- modified `api/analyzers/python/analyzer.py` — emit the project symbol table during `first_pass`
- modified `api/analyzers/source_analyzer.py` — resolver selection
- modified `bench_index_test.py` — A/B harness
- new `tests/analyzers/test_tree_sitter_resolver.py`
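
The resolver selection in `source_analyzer.py` could look roughly like the sketch below. Only the `CODE_GRAPH_PY_RESOLVER` variable, its two values, and the `jedi` default come from this issue; the function name `select_resolver_name` and the choice to raise on unknown values are assumptions.

```python
import os
from typing import Mapping

VALID_RESOLVERS = ("jedi", "tree_sitter")


def select_resolver_name(env: Mapping[str, str] = os.environ) -> str:
    """Return the resolver named by CODE_GRAPH_PY_RESOLVER, defaulting to jedi."""
    value = env.get("CODE_GRAPH_PY_RESOLVER", "jedi").strip().lower()
    if value not in VALID_RESOLVERS:
        # Assumption: fail loudly rather than silently falling back,
        # so a typo cannot invalidate an A/B run.
        raise ValueError(
            f"CODE_GRAPH_PY_RESOLVER must be one of {VALID_RESOLVERS}, got {value!r}"
        )
    return value
```

Passing the environment as a parameter keeps the function testable without mutating `os.environ`.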

## Acceptance criteria

- [ ] On pytest-6202: tree-sitter edge count within ±5% of jedi baseline (1976 CALLS / 4509 DEFINES / 71 EXTENDS).
- [ ] On the same repo: tree-sitter wall-time < 30% of jedi wall-time, single-threaded.
- [ ] With `CODE_GRAPH_INDEX_WORKERS=4`: additional 2–3× speedup over single-threaded tree-sitter.
- [ ] `make test` and `make lint` clean.
- [ ] No regression on the calibration corpus (token usage on the code_graph bench track stable within ±5%).
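
The first criterion can be checked mechanically. The sketch below assumes the ±5% tolerance applies to each edge type separately rather than to the total; the issue does not say which, and `edge_counts_within_tolerance` is a hypothetical helper name. The baseline numbers are the pytest-6202 figures quoted above.

```python
from typing import Dict

# jedi baseline for pytest-6202, as quoted in the acceptance criteria.
JEDI_BASELINE = {"CALLS": 1976, "DEFINES": 4509, "EXTENDS": 71}


def edge_counts_within_tolerance(
    baseline: Dict[str, int],
    candidate: Dict[str, int],
    tolerance: float = 0.05,
) -> Dict[str, bool]:
    """Report, per edge type, whether candidate is within ±tolerance of baseline."""
    result = {}
    for kind, expected in baseline.items():
        actual = candidate.get(kind, 0)  # a missing edge type counts as zero
        result[kind] = abs(actual - expected) <= tolerance * expected
    return result
```

Reporting per edge type makes it visible when a small category such as EXTENDS drifts while the total still looks healthy.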

## Dependencies

- **Depends on #663 (T15)** — uses the `TreeSitterAnalyzer` base class.
- **Unblocks #687** — once resolution is GIL-free, parallel indexing is trivial.
- **Related to #688** — the resolve/write split lands the structural scaffolding for parallel resolution; this PR replaces the resolver itself.

## Notes for the implementer

- Build the project symbol table in `first_pass` so `second_pass` doesn't re-traverse. Estimate ~1 KB per defined symbol.
- For imports, use tree-sitter queries on `import_statement` / `import_from_statement` nodes; resolve module paths via `sys.path`-like lookup rooted at the project root (no installed-deps resolution — out of scope).
- Match jedi's "miss → no edge" semantics exactly; don't fabricate edges to make numbers look better. The A/B compares quality honestly.
- The 5% edge-count tolerance accounts for jedi-only wins (dynamic dispatch) and tree-sitter-only wins (cases jedi returns None on).
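
For the module-path lookup rooted at the project root, a minimal sketch might look like the following. `module_to_file` is a hypothetical helper; it only handles absolute dotted names inside the project, and relative imports and namespace packages are left out.

```python
from pathlib import Path
from typing import Optional


def module_to_file(project_root: Path, module: str) -> Optional[Path]:
    """Map a dotted module name to a file under project_root, or None on a miss."""
    parts = module.split(".")
    if not module or any(not part for part in parts):
        return None  # empty or relative-looking name: not handled in this sketch

    # Check the package form first, then the plain module form.
    package_init = project_root.joinpath(*parts, "__init__.py")
    if package_init.is_file():
        return package_init

    module_file = project_root.joinpath(*parts[:-1], parts[-1] + ".py")
    if module_file.is_file():
        return module_file

    # Installed dependencies are out of scope, so anything else is a miss.
    return None
```

A miss here should translate to "no edge" downstream, in line with the jedi-matching semantics described above.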

