MCP T16 — Add 5 new tree-sitter languages (Go, Rust, TypeScript, Ruby, C++) + re-enable C #664

@DvirDukhan

Context

After T15 extracts a TreeSitterAnalyzer base class, adding new languages becomes a small subclass per language. This ticket adds 5 new languages (Go, Rust, TypeScript, Ruby, C++) and re-enables C (currently commented out at api/analyzers/source_analyzer.py:28). With this change the indexer covers 11 languages total: Python, JavaScript, TypeScript, Java, C#, Kotlin, Go, Rust, Ruby, C, C++.

This widens the set of repos that cgraph (and the new MCP server) can usefully index.

Scope (in)

Six small subclasses of TreeSitterAnalyzer:

  1. Go: api/analyzers/go/analyzer.py, deps tree-sitter-go, extension .go
  2. Rust: api/analyzers/rust/analyzer.py, deps tree-sitter-rust, extension .rs
  3. TypeScript: api/analyzers/typescript/analyzer.py, deps tree-sitter-typescript, extensions .ts and .tsx
  4. Ruby: api/analyzers/ruby/analyzer.py, deps tree-sitter-ruby, extension .rb
  5. C++: api/analyzers/cpp/analyzer.py, deps tree-sitter-cpp, extensions .cc, .cpp, .cxx, .hpp, .hh
  6. C (re-enable): uncomment and fix any rot in api/analyzers/c/analyzer.py; ensure it uses the new base class from T15

Each subclass:

  • Declares its tree-sitter Language instance, node-type-to-label map, query templates, and file extensions.
  • Registers itself in api/analyzers/source_analyzer.py's extension dispatch dict.
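A minimal sketch of what one such subclass and its registration might look like. The `TreeSitterAnalyzer` stand-in below, its attribute names (`extensions`, `node_labels`, `queries`), the label strings, and the `register` helper are all assumptions for illustration; the real base class comes from T15 and may expose a different interface. The real subclass would also hold a tree-sitter Language built from the tree-sitter-go package, which is left out here so the sketch runs without the grammar installed.

```python
from __future__ import annotations

from typing import ClassVar, Dict, Optional, Tuple


class TreeSitterAnalyzer:
    """Stand-in for the T15 base class; the real interface may differ."""

    extensions: ClassVar[Tuple[str, ...]] = ()
    node_labels: ClassVar[Dict[str, str]] = {}
    queries: ClassVar[Dict[str, str]] = {}

    def label_for(self, node_type: str) -> Optional[str]:
        # Map a tree-sitter node type to a graph label, if one is declared.
        return self.node_labels.get(node_type)


class GoAnalyzer(TreeSitterAnalyzer):
    extensions = (".go",)
    # Node types come from the tree-sitter-go grammar; labels are hypothetical.
    node_labels = {
        "function_declaration": "Function",
        "method_declaration": "Function",
        "type_declaration": "Class",
    }
    # Query templates in tree-sitter's S-expression query syntax.
    queries = {
        "definitions": "(function_declaration name: (identifier) @name)",
        "calls": "(call_expression function: (identifier) @callee)",
    }


# Hypothetical stand-in for the extension dispatch dict in source_analyzer.py.
ANALYZERS: Dict[str, TreeSitterAnalyzer] = {}


def register(analyzer_cls: type) -> type:
    """Register one analyzer instance under each of its file extensions."""
    analyzer = analyzer_cls()
    for ext in analyzer_cls.extensions:
        ANALYZERS[ext] = analyzer
    return analyzer_cls


register(GoAnalyzer)
```

With this shape, each new language is mostly declarative data, which is what keeps the per-language subclass small.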

Per-language test fixtures in tests/analyzers/fixtures/<lang>/ with a small known call graph and assertion contract (similar to T3 but per language).
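One way the per-language assertion contract could be expressed, as a sketch: each fixture declares an expected call chain, and the test checks that every consecutive pair appears as a direct call edge in what the indexer produced. The function name `has_call_chain`, the `EXPECTED_CHAINS` table, and the sample edge list are hypothetical; how edges are actually read back from the graph is not specified here.

```python
from typing import Iterable, List, Sequence, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def has_call_chain(edges: Iterable[Edge], chain: Sequence[str]) -> bool:
    """Return True if every consecutive pair in `chain` is a direct call edge."""
    edge_set = set(edges)
    return all((caller, callee) in edge_set
               for caller, callee in zip(chain, chain[1:]))


# Hypothetical contract: entry point -> service -> repo, one chain per fixture.
EXPECTED_CHAINS = {
    "go": ["main", "create_user", "save_user"],
    "ruby": ["main", "create_user", "save_user"],
}

# Stand-in for edges the indexer would extract from the Go fixture.
go_edges: List[Edge] = [
    ("main", "create_user"),
    ("create_user", "save_user"),
    ("create_user", "log"),
]

go_chain_found = has_call_chain(go_edges, EXPECTED_CHAINS["go"])
```

Checking only the declared chain, rather than the full edge set, keeps the contract stable if an analyzer later captures additional edges.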

Scope (out)

  • Replacing the LSP-based Java/C# analyzers with tree-sitter (these stay on multilspy).
  • Languages beyond the 6 listed (Swift, PHP, Scala, etc. — Phase 2).
  • Cross-language call resolution (e.g. JS calling a TS function).

Files

  • new api/analyzers/go/analyzer.py
  • new api/analyzers/rust/analyzer.py
  • new api/analyzers/typescript/analyzer.py
  • new api/analyzers/ruby/analyzer.py
  • new api/analyzers/cpp/analyzer.py
  • modified api/analyzers/c/analyzer.py (re-enable, port to base class)
  • modified api/analyzers/source_analyzer.py (register new extensions; uncomment C)
  • modified pyproject.toml (add tree-sitter-go, tree-sitter-rust, tree-sitter-typescript, tree-sitter-ruby, tree-sitter-cpp; check existing tree-sitter-c version)
  • new tests/analyzers/fixtures/go/, rust/, typescript/, ruby/, cpp/, c/
  • new tests/analyzers/test_new_languages.py
  • modified README.md (Supported languages section)

Acceptance criteria

  • Indexing a multi-language fixture covering all 6 new/re-enabled languages produces nodes for each.
  • Per-language test asserts a known call chain in each fixture is correctly captured (entry point → at least one downstream function).
  • All 11 languages appear in source_analyzer.py's extension dispatch.
  • README "Supported languages" section updated to list 11 languages.
  • make lint and make test clean.
  • Existing tests for Python/JS/Kotlin/Java/C# pass unchanged.
  • pyproject.toml declares all new tree-sitter grammar deps with sane version pins.

Dependencies

  • T15 (extracts the TreeSitterAnalyzer base class that these subclasses build on)

Notes for the implementer

  • Per-language fixtures should be tiny — 3-4 files with a known entrypoint → service → repo call chain is enough. Don't pull in real-world projects.
  • Tree-sitter grammar packages are pre-built wheels for most platforms. Verify the CI matrix can install all 5 new ones.
  • When re-enabling C, find out why it was commented out — look at git blame on source_analyzer.py:28. If there's a known bug, fix it before re-enabling.
  • TypeScript shares much of its grammar with JavaScript. Be careful that .ts files don't get routed to the JS analyzer by accident.
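The TypeScript routing concern above can be guarded with a small test against the extension dispatch. The sketch below assumes the dispatch is a plain dict keyed by file suffix; the `DISPATCH` contents, the string analyzer names, and the `analyzer_for` helper are hypothetical, and lowercasing the suffix is a design choice rather than known behavior of source_analyzer.py.

```python
from pathlib import Path
from typing import Dict, Optional

# Hypothetical slice of the extension dispatch; values stand in for analyzers.
DISPATCH: Dict[str, str] = {
    ".js": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
}


def analyzer_for(path: str) -> Optional[str]:
    """Pick an analyzer by the file's final suffix, ignoring case."""
    return DISPATCH.get(Path(path).suffix.lower())


ts_route = analyzer_for("src/app.ts")
js_route = analyzer_for("src/app.js")
```

Because lookup uses only the final suffix, a declaration file such as types.d.ts still resolves through ".ts", and unknown extensions return None instead of falling through to another analyzer.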

Labels: enhancement (New feature or request), mcp (MCP server (model context protocol) work)
