context-link

Local MCP Context Gateway for AI Coding Agents

context-link is a local MCP server that serves structured code context to AI agents, dramatically reducing token consumption compared to reading entire files. It indexes codebases using a language-agnostic Tree-sitter adapter system, builds a symbol + dependency graph, and exposes tools over the Model Context Protocol.

Supported languages: TypeScript (.ts), TSX/JSX (.tsx, .jsx), Go (.go), Python (.py), JavaScript (.js, .mjs), Rust (.rs), Java (.java), C (.c, .h), C++ (.cpp, .hpp, .cc, .cxx, .hxx, .hh), C# (.cs) — extensible via the LanguageAdapter interface.

The Problem

AI coding agents read entire files to understand context. This brute-force approach is expensive, slow, and prone to context-window overflow.

The Solution

context-link acts as a structural intermediary that extracts and serves only the relevant code symbols, dependencies, and historical notes an agent needs. By eliminating blind file reads, it reduces token consumption by over 85%.

Built-in token savings tracking: Every tool response includes tokens_saved_est, cost_avoided_est, session_tokens_saved, and session_cost_avoided in the metadata field—giving you real-time visibility into your efficiency gains.

Scenario	Avg Token Reduction	Target
Single symbol lookup (depth=0)	91.0%	>80%
Aggregate (all symbols vs all files)	92.7%	>80%
Symbol + dependencies (depth=1)	85.7%	>80%

How It Works: The Context Funnel

  Natural Language Query
         |
         v
  +-----------------------+
  | Stage 1: Semantic Scout |  Discovers matching symbol names via local
  | (Discovery)            |  vector embeddings — no file contents read
  +-----------------------+
         |
         v
  +-----------------------+
  | Stage 2: Structural    |  Extracts exact function/class body, call-graph
  | Surgeon (Extraction)   |  dependencies, and import statements via AST
  +-----------------------+
         |
         v
  +-----------------------+
  | Stage 3: The Historian |  Injects developer/agent memory notes linked
  | (Persistence)          |  to AST nodes, auto-flagged stale on changes
  +-----------------------+
         |
         v
  Minimal, precise context → LLM

Installation

Homebrew (macOS / Linux)

brew install context-link-mcp/tap/context-link

Download Binary

Pre-built binaries for Linux, macOS, and Windows (amd64) are available on the GitHub Releases page.

Build from Source

See the Building section below.

Building

Semantic search works out of the box using the built-in Model2Vec embedder (embedded in the binary). ONNX Runtime is only needed if you want to use a custom ONNX model as an override.

Prerequisites

Dependency	Version	Purpose
Go	1.22+	Runtime and build toolchain
GCC (C compiler)	Any recent	Required by `smacker/go-tree-sitter` CGo bindings
Git	Any	Dependency fetching

The SQLite driver (modernc.org/sqlite) is pure Go and does not require CGo.

Platform-Specific Setup

Windows

winget install -e --id niXman.mingw-w64-ucrt
# Restart terminal, then verify:
gcc --version

macOS

xcode-select --install

Linux (Debian/Ubuntu)

sudo apt-get install build-essential

Linux (Fedora/RHEL)

sudo dnf install gcc gcc-c++ make

Build Commands

# Development build
CGO_ENABLED=1 go build -o ./bin/context-link ./cmd/context-link

# Release build (stripped binary, with version)
CGO_ENABLED=1 go build -ldflags="-s -w -X main.version=v0.4.0" -o ./bin/context-link ./cmd/context-link

# Or use the Makefile (auto-detects version from git tags)
make build

On Windows, the output binary will be context-link.exe.

Running Tests

CGO_ENABLED=1 go test ./... -count=1

Update snapshot golden files:

CGO_ENABLED=1 go test ./internal/indexer/ -args -update-golden

Troubleshooting

CGO_ENABLED=0 or "gcc not found" — Tree-sitter grammars are C libraries compiled via CGo. Ensure a C compiler is installed and on PATH.

Windows: "cc1.exe: sorry, unimplemented: 64-bit mode not compiled in" — Use the 64-bit UCRT variant: winget install -e --id niXman.mingw-w64-ucrt

Slow first build — The first build compiles all Tree-sitter C sources. Subsequent builds use the Go build cache.

Quick Start

The fastest way to get started is using a pre-built binary. (If you prefer to compile from source, see the Building section).

Step 1: Install context-link

macOS / Linux:

brew install context-link-mcp/tap/context-link

Windows & Others: Download the latest executable for your OS from the GitHub Releases page and place it in your system's PATH.

Step 2: Index Your Project

Navigate to your codebase and run the indexer. This creates a local .context-link.db file containing your vector embeddings and AST mappings.

cd /path/to/your/project
context-link index --project-root .

(Note: You can re-run this anytime; it incrementally processes only changed files.)

Step 3: Connect Your AI IDE

context-link communicates via the Model Context Protocol (MCP) over stdio. Configure your preferred AI agent to launch the server:

For Claude Code: Create or edit .mcp.json in your project root:

{
  "mcpServers": {
    "context-link": {
      "command": "context-link",
      "args": ["serve", "--project-root", "."]
    }
  }
}

For Cursor:

Go to Settings > Features > MCP.

Click + Add New MCP Server.

Name: context-link

Type: command

Command: context-link serve --project-root /absolute/path/to/your/project

For Windsurf: Add to your mcp_config.json:

{
  "mcpServers": {
    "context-link": {
      "command": "context-link",
      "args": ["serve", "--project-root", "/absolute/path/to/your/project"]
    }
  }
}

Step 4: Start Coding

Once connected, your AI agent now has access to the full suite of context-link tools. You can instruct your agent to use the recommended workflow by adding this to your prompt or .cursorrules file:

The explore_codebase MCP prompt encodes this protocol and can be invoked directly in supported clients.

Once connected, your AI agent has access to the full suite of context-link tools. To ensure your agent uses them efficiently, apply the Recommended Agent Workflow (see below) to your .cursorrules or system prompt.

Recommended Agent Workflow

To achieve maximum token efficiency, your AI needs to know how to use the context-link tools.

You can either invoke the built-in explore_codebase MCP prompt directly in supported clients, or copy and paste the following block into your .cursorrules, agent custom instructions, or system prompt:

When exploring and modifying this codebase, you must prioritize the context-link MCP tools to minimize token consumption. Do not read raw files directly unless absolutely necessary. Follow this structural workflow:

1. Call `read_architecture_rules` at the start of a session to understand the project's constraints.
2. Call `reindex_project` after modifying files to keep the symbol graph current (safe to call repeatedly).
3. Use `get_modified_symbols` to discover what code has changed in the working tree (git-aware context).
4. Use `semantic_search_symbols` to discover relevant symbols by intent.
5. Use `get_file_skeleton` to understand a file's structure before diving in.
6. Use `get_code_by_symbol` to retrieve only the specific code you need, along with its dependencies.
7. Use `get_symbol_usages` and `get_call_tree` to explore call hierarchies and reverse dependencies.
8. Use `get_tests_for_symbol` to find test functions for a symbol you're modifying.
9. Use `find_dead_code` to discover unused symbols and `get_blast_radius` to assess the impact of your planned changes.
10. Use `find_http_routes` to discover REST route definitions and match them to call sites.
11. Always check the `memories` array in tool responses for prior human or agent findings about a symbol.
12. After completing a significant feature or fix, call `save_symbol_memory` to persist your architectural findings for future sessions.

Advanced Usage

While the Quick Start covers standard setups, context-link supports advanced workflows for complex environments.

Advanced Indexing

The index is stored in .context-link.db in your current directory. Re-run anytime — only changed files are re-processed. Semantic search embeddings are generated automatically.

./bin/context-link index --project-root /path/to/your/project

Force a full re-index (e.g., after switching embedder):

If you switch embedding models or want to cleanly rebuild the database (bypassing the incremental file hash check), use the --force flag:

./bin/context-link index --project-root /path/to/your/project --force

Index multiple repos into the same database:

./bin/context-link index --project-root /path/to/repo-a --repo-name repo-a
./bin/context-link index --project-root /path/to/repo-b --repo-name repo-b

Manual Server Testing

The server communicates via the MCP protocol over stdio. While it is meant to be launched by your IDE, you can start the server manually for debugging purposes or to verify initialization:

./bin/context-link serve --project-root /path/to/your/project

Custom ONNX Model (Optional)

The built-in Model2Vec embedder works out of the box. For higher-quality embeddings (all-MiniLM-L6-v2, 384-dim), override with ONNX:

Download all-MiniLM-L6-v2.onnx and vocab.txt from Hugging Face
Download OnnxRuntime from ONNX Runtime releases
Pass --model-path and --vocab-path to both index and serve

Switching between Model2Vec and ONNX requires --force re-indexing (128 vs 384 dimensions).

Maintenance & Uninstallation

context-link is entirely self-contained. It does not run background telemetry or scatter hidden configuration files across your system.

To clear a project's index:

Simply delete the local SQLite database in the root of your project. This will completely remove all vector embeddings and AST mappings for that repository.

rm .context-link.db

To completely uninstall:

Delete the context-link binary from your system path (or run brew uninstall context-link).
Remove the configuration entry from your IDE's MCP settings (e.g., mcp_config.json or .mcp.json).

Tech Stack

Component	Technology
Runtime	Go 1.22+
Protocol	MCP via stdio (`mcp-go` v0.44.1)
AST Parser	`go-tree-sitter` (language-agnostic via `LanguageAdapter` registry)
Database	SQLite 3 (WAL mode, pure-Go driver via `modernc.org/sqlite`)
Search Engine	Hybrid search: SQLite FTS5 (BM25) + Go-side Vector KNN, merged via Reciprocal Rank Fusion (RRF)
Embeddings	Built-in `potion-base-4M` Model2Vec (128-dim, zero-config); optional ONNX override (`all-MiniLM-L6-v2`)

CLI Reference

Global flags (all subcommands):

Flag	Default	Description
`--db-path`	`.context-link.db`	Path to SQLite database
`--project-root`	Current directory	Root directory of the project
`--log-level`	`info`	Log verbosity: `debug`, `info`, `warn`, `error`
`--config`	`.context-link.yaml`	Path to config file

index subcommand flags:

Category	Flag	Default	Description
Execution	`--force`	`false`	Force full re-index, bypassing incremental file hash checks
	`--repo-name`	Directory name	Repository name for multi-repo namespacing
Performance	`--workers`	`4`	Number of parallel worker goroutines for parsing
Embeddings (Advanced)	`--model-path`	(built-in)	Path to custom ONNX model (overrides built-in Model2Vec)
	`--vocab-path`	(built-in)	Path to `vocab.txt` for the ONNX tokenizer
	`--ort-lib-path`	(system)	Path to OnnxRuntime shared library

serve subcommand flags:

Category	Flag	Default	Description
Behavior	`--watch`	`false`	Auto re-index on file changes (fsnotify with 500ms debounce)
	`--tools`	(all)	Comma-separated list of MCP tools to enable (e.g., `ping,get_code_by_symbol`)
Embeddings (Advanced)	`--model-path`	(built-in)	Path to custom ONNX model (overrides built-in Model2Vec)
	`--vocab-path`	(built-in)	Path to `vocab.txt` for the ONNX tokenizer
	`--ort-lib-path`	(system)	Path to OnnxRuntime shared library

Configuration File

db_path: .context-link.db
project_root: .
log_level: info

# Semantic search uses built-in Model2Vec by default (zero-config).
# Uncomment below to override with a custom ONNX model:
# model_path: /path/to/all-MiniLM-L6-v2.onnx
# vocab_path: /path/to/vocab.txt
# ort_lib_path: ""

# Control which MCP tools are exposed (default: all).
# Use this to reduce prompt token budget by disabling unused tools.
# tools:
#   - ping
#   - reindex_project
#   - get_modified_symbols
#   - get_tests_for_symbol
#   - semantic_search_symbols
#   - get_code_by_symbol
#   - get_file_skeleton
#   - get_symbol_usages
#   - get_call_tree
#   - read_architecture_rules
#   - memory  # registers save_symbol_memory, get_symbol_memories, purge_stale_memories
#   - find_dead_code
#   - get_blast_radius
#   - find_http_routes

Environment variables with the CONTEXT_LINK_ prefix also work (e.g., CONTEXT_LINK_LOG_LEVEL=debug).

MCP Tool Reference

All tools return structured JSON with a metadata object including timing_ms for observability, plus tokens_saved_est, cost_avoided_est, session_tokens_saved, and session_cost_avoided for token savings tracking.

`ping`

Health-check tool.

Parameters: none

{ "status": "ok", "metadata": { "timing_ms": 1 } }

`reindex_project`

Triggers an incremental re-index of the project. Only re-parses files that changed since the last index. Call this after modifying files to ensure the symbol graph, dependencies, and search index are up to date.

Parameters: none

{
  "files_scanned": 142,
  "files_changed": 3,
  "files_deleted": 0,
  "files_unchanged": 139,
  "symbols_added": 7,
  "symbols_updated": 7,
  "dependencies_updated": 12,
  "fts_updated": true,
  "embeddings_updated": true,
  "duration_ms": 1240,
  "metadata": { "timing_ms": 1242 }
}

Note: This operation is idempotent — calling it twice with no file changes returns files_changed: 0 in ~10ms.

`get_modified_symbols`

Returns symbols (functions, methods, classes) that overlap with locally modified lines in the git working tree. Use this to orient yourself at the start of a session — it shows exactly what's being actively worked on.

Param	Type	Required	Default	Description
`base_ref`	string	no	`HEAD`	Git ref to diff against (use `main` for branch diff, or a commit SHA)
`include_staged`	boolean	no	`true`	Include staged (git add) changes in addition to unstaged

{
  "base_ref": "HEAD",
  "files_changed": 2,
  "symbols": [
    {
      "name": "ProcessOrder",
      "qualified_name": "OrderService.ProcessOrder",
      "kind": "method",
      "file_path": "internal/orders/service.go",
      "start_line": 42,
      "end_line": 78,
      "changed_lines": [45, 46, 47, 52],
      "change_type": "modified"
    }
  ],
  "metadata": { "timing_ms": 34 }
}

Tip: Call reindex_project first to ensure the index reflects the latest file state.

`get_tests_for_symbol`

Finds test functions associated with a given symbol. Uses the dependency graph (tests that call the target) and naming conventions as fallback. Helps locate tests to update after modifying a function.

Param	Type	Required	Default	Description
`symbol_name`	string	yes	—	Name or qualified name of the symbol to find tests for
`include_code`	boolean	no	`false`	Include full test function bodies (default: false saves tokens)

{
  "symbol": {
    "name": "ProcessOrder",
    "qualified_name": "OrderService.ProcessOrder",
    "kind": "method",
    "file_path": "internal/orders/service.go"
  },
  "tests": [
    {
      "name": "TestProcessOrder_Success",
      "qualified_name": "TestProcessOrder_Success",
      "file_path": "internal/orders/service_test.go",
      "start_line": 15,
      "end_line": 42,
      "match_reason": "calls_target"
    }
  ],
  "test_count": 1,
  "metadata": { "timing_ms": 8 }
}

Match reasons: calls_target (high confidence: proven call in dependency graph), name_match (lower confidence: naming convention only).

`read_architecture_rules`

Returns ARCHITECTURE.md as structured sections. Use at session start.

Parameters: none

{
  "sections": [
    { "title": "Overview", "content": "..." }
  ],
  "metadata": { "timing_ms": 3, "source": "/path/to/ARCHITECTURE.md" }
}

`get_code_by_symbol`

Extracts exact source code of one or more named symbols with transitive dependencies and import statements. Supports batch operations for multiple symbols.

Param	Type	Required	Default	Description
`symbol_name`	string OR array	yes	—	Single symbol name OR array of symbol names (max 50). Supports qualified names (e.g. `UserAuth.validateToken`)
`depth`	number	no	`1`	Dependency depth: `0` = symbol only, `1` = direct deps, max `3`. Applies to all symbols in batch

Single symbol (backward compatible):

{ "symbol_name": "validateToken", "depth": 1 }

Batch operation:

{ "symbol_name": ["validateToken", "UserAuth.login", "formatError"], "depth": 0 }

Response format (batch):

{
  "results": [
    {
      "input": "validateToken",
      "data": {
        "symbol": { "name": "validateToken", "code_block": "...", ... },
        "dependencies": [...],
        "memories": [...]
      }
    },
    {
      "input": "nonExistent",
      "error": "symbol \"nonExistent\" not found in repository \"repo\""
    }
  ],
  "total_symbols": 2,
  "success_count": 1,
  "error_count": 1,
  "metadata": { "timing_ms": 18, "tokens_saved_est": 4200 }
}

Single symbol response (legacy format for backward compatibility):

{
  "symbol": {
    "name": "validateToken", "qualified_name": "auth.validateToken",
    "kind": "function", "file_path": "internal/auth/token.go",
    "code_block": "func validateToken(tok string) error { ... }",
    "start_line": 42, "end_line": 61, "language": "go"
  },
  "dependencies": [ { "name": "parseJWT", "code_block": "..." } ],
  "dependency_count": 1,
  "memories": [ { "id": 7, "note": "Uses RS256.", "is_stale": false } ],
  "memory_count": 1,
  "metadata": { "timing_ms": 18 }
}

`semantic_search_symbols`

Discovers symbols by natural-language intent via vector embeddings. Does not return code — call get_code_by_symbol next.

Param	Type	Required	Default	Description
`query`	string	yes	—	Natural-language description of what you're looking for
`top_k`	number	no	`10`	Max results (max `50`)
`kind`	string	no	(all)	Filter: `function`, `class`, `interface`, `type`, `variable`
`file_path_prefix`	string	no	(all)	Filter by file path prefix
`min_similarity`	number	no	`0.3`	Minimum cosine similarity (0.0–1.0)

{
  "results": [
    {
      "symbol_name": "validateToken", "kind": "function",
      "file_path": "internal/auth/token.go",
      "similarity_score": 0.87, "memory_count": 1
    }
  ],
  "metadata": { "timing_ms": 0, "total_results": 1, "query": "token validation" }
}

`save_symbol_memory`

Attaches a persistent note to a symbol. Survives re-indexing; auto-flagged stale on changes. Duplicates are deduplicated.

Param	Type	Required	Default	Description
`symbol_name`	string	yes	—	Symbol to annotate
`note`	string	yes	—	The note (max 2000 chars)
`author`	string	no	`"agent"`	`"agent"` or `"developer"`

{
  "memory_id": 12, "symbol_name": "auth.validateToken",
  "file_path": "internal/auth/token.go",
  "metadata": { "timing_ms": 5 }
}

`get_symbol_memories`

Retrieves notes for a symbol or file. At least one of symbol_name or file_path required.

Param	Type	Required	Default	Description
`symbol_name`	string	no*	—	Symbol to look up
`file_path`	string	no*	—	File path for all symbols
`offset`	number	no	`0`	Pagination offset
`limit`	number	no	`20`	Max results (max `100`)

When is_stale is true: stale_reason explains why, last_known_symbol / last_known_file show the symbol's location at time of staling.

`purge_stale_memories`

Deletes stale memories. Use orphaned_only=true to only delete memories with no linked symbol.

Param	Type	Required	Default	Description
`orphaned_only`	boolean	no	`false`	Only delete orphaned memories

`get_file_skeleton`

Returns a structural outline of one or more files — symbol names, kinds, and signatures (first line of code block only). No full code bodies. Use to understand file structure before extracting specific symbols. Supports batch operations for multiple files.

Param	Type	Required	Default	Description
`file_path`	string OR array	yes	—	Single file path OR array of file paths (max 50). Relative paths (e.g. `internal/store/symbols.go`)

Single file (backward compatible):

{ "file_path": "internal/store/symbols.go" }

Batch operation:

{ "file_path": ["internal/store/symbols.go", "internal/store/files.go", "internal/store/deps.go"] }

Response format (batch):

{
  "results": [
    {
      "input": "internal/store/symbols.go",
      "data": {
        "file_path": "internal/store/symbols.go",
        "symbols": [
          { "name": "GetSymbolByName", "kind": "function", "signature": "func GetSymbolByName(...", "start_line": 42 }
        ],
        "symbol_count": 15
      }
    },
    {
      "input": "nonexistent.go",
      "error": "file does not exist: nonexistent.go"
    }
  ],
  "total_files": 2,
  "success_count": 1,
  "error_count": 1,
  "metadata": { "timing_ms": 5, "tokens_saved_est": 3200 }
}

Single file response (legacy format for backward compatibility):

{
  "file_path": "internal/store/symbols.go",
  "symbols": [
    { "name": "GetSymbolByName", "kind": "function", "signature": "func GetSymbolByName(ctx context.Context, db *DB, repoName, name string) (*models.Symbol, error) {", "start_line": 42 }
  ],
  "symbol_count": 15,
  "metadata": { "timing_ms": 2 }
}

`get_symbol_usages`

Reverse dependency lookup — finds all callers/references of a symbol.

Param	Type	Required	Default	Description
`symbol_name`	string	yes	—	Name or qualified name of the symbol

{
  "symbol": { "name": "hashFile", "qualified_name": "walker.hashFile", "kind": "function", "file_path": "internal/indexer/walker.go" },
  "usages": [
    { "caller_name": "Walk", "caller_qualified_name": "Walker.Walk", "caller_kind": "method", "file_path": "internal/indexer/walker.go", "start_line": 55, "dep_kind": "call" }
  ],
  "usage_count": 3,
  "metadata": { "timing_ms": 4 }
}

`get_call_tree`

Traverses the dependency graph to show a call hierarchy. Use direction='callees' to see what a symbol calls, or direction='callers' to see what calls it.

Param	Type	Required	Default	Description
`symbol_name`	string	yes	—	Root symbol name or qualified name
`direction`	string	no	`callees`	`callees` or `callers`
`depth`	number	no	`1`	Max traversal depth (max `3`)

{
  "root": { "name": "Walk", "qualified_name": "Walker.Walk", "kind": "method", "file_path": "internal/indexer/walker.go" },
  "edges": [
    { "depth": 1, "name": "hashFile", "qualified_name": "walker.hashFile", "kind": "function", "file_path": "internal/indexer/walker.go", "start_line": 120, "dep_kind": "call" }
  ],
  "edge_count": 5,
  "direction": "callees",
  "metadata": { "timing_ms": 3 }
}

`find_dead_code`

Discovers symbols with zero inbound dependency edges (no callers). Entry points (main, init) and variables are excluded by default.

Param	Type	Required	Default	Description
`file_path`	string	no	(all)	Limit search to a specific file
`kind`	string	no	(all)	Filter: `function`, `class`, `method`, `interface`, `type`
`exclude_exported`	boolean	no	`true`	Exclude exported symbols (uppercase-initial in Go)
`limit`	number	no	`50`	Max results (max `200`)

{
  "dead_symbols": [
    { "name": "unusedHelper", "qualified_name": "pkg.unusedHelper", "kind": "function",
      "file_path": "internal/pkg/helper.go", "start_line": 42, "language": "go" }
  ],
  "count": 1,
  "metadata": { "timing_ms": 5, "tokens_saved_est": 1200 }
}

`get_blast_radius`

BFS through callers to show everything affected by changing a symbol. Groups results by file and depth.

Param	Type	Required	Default	Description
`symbol_name`	string	yes	—	Name or qualified name of the symbol to analyze
`depth`	number	no	`2`	Max traversal depth (max `5`)

{
  "root": { "name": "hashFile", "kind": "function", "file_path": "internal/indexer/walker.go" },
  "affected_files": {
    "internal/indexer/walker.go": [
      { "name": "Walk", "kind": "method", "depth": 1, "dep_kind": "call" }
    ]
  },
  "total_affected": 3,
  "files_affected": 2,
  "by_depth": { "1": 2, "2": 1 },
  "metadata": { "timing_ms": 8, "tokens_saved_est": 2400 }
}

`search_code_patterns`

Database-driven regex search across indexed symbol code blocks. Useful for finding error sentinels, retry logic, specific function calls, or any code pattern matching a regex.

CRITICAL LIMITATION: This tool searches indexed symbol code blocks only (functions, classes, methods, types, variables). It does NOT search file-level code outside symbols such as:

Decorators (e.g., @app.route in Flask)
Top-level statements
Module docstrings
Configuration dictionaries
Import statements

For file-level patterns, read the file directly using the Read tool.

Param	Type	Required	Default	Description
`pattern`	string	yes	—	Regex pattern (Go RE2 syntax) to search for in code blocks
`file_path_prefix`	string	no	(all)	Filter to symbols in files starting with this prefix
`kind`	string	no	(all)	Filter: `function`, `class`, `interface`, `type`, `variable`, `method`
`limit`	number	no	`50`	Max results (max `200`)

{
  "results": [
    {
      "symbol_name": "ErrNotFound",
      "qualified_name": "store.ErrNotFound",
      "kind": "variable",
      "file_path": "internal/store/errors.go",
      "start_line": 12,
      "end_line": 12,
      "match_snippet": "var ErrNotFound = errors.New(\"not found\")",
      "match_indices": [18, 40]
    }
  ],
  "result_count": 1,
  "pattern": "errors\\.New\\(",
  "metadata": { "timing_ms": 8, "tokens_saved_est": 4200 }
}

Example patterns:

Find error sentinels: errors\\.New\\(
Find retry logic: retry.*(?:backoff|timeout)
Find SQL queries: SELECT.*FROM
Find unsafe string concatenation: \\+.*\\+

`find_http_routes`

Discovers HTTP route definitions and call sites in the codebase. Supports Express, Gin, FastAPI, Flask, and similar frameworks. Matches route definitions to their call sites with confidence scoring.

Param	Type	Required	Default	Description
`method`	string	no	(all)	Filter by HTTP method (GET, POST, PUT, DELETE, PATCH)
`path`	string	no	(all)	Filter routes by path substring (e.g., `/api/users`)
`file_path`	string	no	(all)	Limit search to a specific file path

{
  "routes": [
    { "method": "GET", "path_pattern": "/api/users/:id", "handler": "getUser",
      "file_path": "src/routes/users.ts", "start_line": 15, "framework": "express", "kind": "definition" }
  ],
  "route_count": 1,
  "matches": [
    { "definition": { "method": "GET", "path_pattern": "/api/users/:id", "file_path": "src/routes/users.ts" },
      "call_site": { "method": "GET", "path_pattern": "/api/users/123", "file_path": "src/client/api.ts" },
      "confidence": 0.9 }
  ],
  "match_count": 1,
  "metadata": { "timing_ms": 12 }
}

Limitations & Known Issues

While context-link dramatically reduces token consumption for symbol-based queries, it has specific limitations you should understand to use it effectively.

Symbol-Centric Index Architecture

What's Indexed:

Functions, classes, methods, interfaces, types, variables with AST nodes
Code blocks that Tree-sitter identifies as named declarations
Dependencies between these symbols (call graph)

What's NOT Indexed:

File-level code outside symbols (top-level statements in Python __main__, Go init blocks)
Import/require statements (not considered symbols)
Decorators on classes (e.g., @app.route('/api/users') in Flask)
Module docstrings (top-level documentation)
Configuration dictionaries at file scope (e.g., SETTINGS = {...} in Python)
Inline comments (only code blocks are indexed)

Impact: Tools like search_code_patterns only search indexed symbol code blocks. If you need to find decorators, imports, or top-level configuration, fall back to reading the file directly with the Read tool.

Example Scenario:

# This Flask route decorator is NOT indexed:
@app.route('/api/users', methods=['GET'])
def get_users():  # ← This function IS indexed
    return jsonify(users)

Searching for @app.route patterns will return zero results. Read routes.py directly to find decorator patterns.

Batch Parameters & Polymorphic Types

The batch-enabled tools (get_file_skeleton, get_code_by_symbol) accept either a single string OR an array of strings:

// Single file (backward compatible)
{ "file_path": "internal/store/symbols.go" }

// Multiple files (batch operation)
{ "file_path": ["internal/store/symbols.go", "internal/store/files.go"] }

Automatic JSON Array Parsing: If you accidentally pass a JSON-serialized array as a string (e.g., "[\"file1.go\", \"file2.go\"]"), the tool will automatically detect and parse it. This prevents confusing "not indexed" errors when parameters are incorrectly formatted.

Batch Limits:

Max 50 files per get_file_skeleton call
Max 50 symbols per get_code_by_symbol call
Per-item error handling: partial failures don't abort the entire batch

Tool Independence (No Compound Queries)

Each MCP tool call is independent—there are no compound queries like "find all callers of X that don't check the error return value."

Workaround: Chain tools manually:

search_code_patterns to find error-returning functions
get_symbol_usages to find callers
get_code_by_symbol to inspect each caller
Manually verify error handling logic

Example: To find functions that call store.GetUser() but don't use errors.Is():

1. search_code_patterns(pattern: "store\\.GetUser\\(")  → 12 callers
2. get_code_by_symbol(symbol_name: each caller)       → inspect code
3. Manually grep each code block for "errors.Is"

This is inherently token-expensive but still cheaper than reading all files.

No Negative Queries

You cannot search for the absence of a pattern. Queries like "functions that DON'T call errors.Is" are not supported.

Workaround:

Get the full list of target symbols (semantic_search_symbols)
Search for the positive pattern (search_code_patterns)
Manually compute the set difference

Trust the result count: If search_code_patterns returns result_count: 2, you can trust that only 2 symbols match. There's no hidden "maybe more" ambiguity.

search_code_patterns Scope Limitation

Tool: search_code_patterns

What it searches: The code_block column of the symbols table—only indexed symbol bodies.

What it CANNOT find:

Decorators (Flask @app.route, FastAPI @app.get)
Import statements (from typing import Optional)
Top-level variable assignments outside functions (e.g., logger = logging.getLogger() in Python)
Main entrypoint code (if __name__ == "__main__": blocks)

The tool description explicitly warns about this limitation to prevent confusion when searches return zero results.

When to fall back to direct file reads:

Searching for import patterns: Use Read + manual grep
Finding decorator patterns: Use find_http_routes (specialized) or Read
Analyzing main entrypoints: Read main.py, main.go, etc. directly

Performance Characteristics

Fast operations (<10ms):

semantic_search_symbols (in-memory vector cache, 0.2ms average)
get_file_skeleton (signature extraction only)
reindex_project (incremental, no file changes)

Moderate operations (10–100ms):

get_code_by_symbol with depth=1 (BFS dependency resolution)
search_code_patterns with broad regex (SQL LIKE prefiltering helps)

Slow operations (>100ms):

get_blast_radius with depth=5 (BFS through large call graphs)
reindex_project after modifying 50+ files (re-parses all changed files)

Tip: Use get_file_skeleton before get_code_by_symbol to understand file structure first. Avoids guessing symbol names.

Known Edge Cases

Overloaded symbols: If multiple symbols share the same name (e.g., validate in different classes), get_code_by_symbol(symbol_name: "validate") returns the first match by insertion order. Use qualified names (ClassName.validate) for disambiguation.
Stale embeddings: If you switch between Model2Vec (128-dim) and ONNX (384-dim) embeddings, you must run index --force to regenerate all embeddings. Dimension mismatch causes search to fail silently.
Case sensitivity: Symbol names are case-sensitive. GetUser and getUser are distinct. Use semantic_search_symbols with natural language queries if unsure of exact casing.
Multi-repo namespacing: When indexing multiple repos into one database (--repo-name), always specify repo_name in queries. Omitting it searches across all repos, which may return unexpected matches.

Best Practices for Token Efficiency

Built-in Observability: Every tool response includes real-time token savings metrics in the metadata field:

{
  "metadata": {
    "timing_ms": 18,
    "tokens_saved_est": 4200,
    "cost_avoided_est": "$0.05",
    "session_tokens_saved": 12500,
    "session_cost_avoided": "$0.15"
  }
}

Use these metrics to verify actual savings during your workflow. If tokens_saved_est is low or negative, you may be using the wrong tool for the task (e.g., searching for file-level patterns with search_code_patterns instead of Read).

DO:

✅ Use semantic_search_symbols for discovery (metadata only, ~388 tokens for 10 results)
✅ Use get_file_skeleton to understand file structure (signatures only, <200 tokens)
✅ Batch operations when inspecting multiple symbols (get_code_by_symbol with arrays)
✅ Trust result_count fields—if a search returns 2 matches, there are exactly 2 matches
✅ Check memories arrays for prior findings before re-analyzing code
✅ Monitor tokens_saved_est in responses—it's your efficiency compass

DON'T:

❌ Read entire files for symbol lookup (defeats the purpose)
❌ Search for file-level patterns with search_code_patterns (use Read instead)
❌ Expect compound queries ("callers that don't X")—chain tools manually
❌ Re-index unnecessarily—incremental updates are fast, --force is slow
❌ Ignore low or negative tokens_saved_est—it signals you're using the wrong approach

Token Savings Rule of Thumb:

Single symbol lookup: 91% reduction vs. reading full file
Symbol + dependencies (depth=1): 86% reduction
Aggregate discovery (search → skeleton → extract): 80–85% reduction

For a 70-80% token reduction across a real-world audit task (as reported in user feedback), the key is using the right tool for the job: semantic search for discovery, skeleton for structure, code extraction for implementation, and direct file reads only for file-level patterns.

Adding a New Language

Import the grammar — add the Tree-sitter C-binding (e.g., smacker/go-tree-sitter/python)
Write query files — create .scm queries in internal/indexer/adapters/queries/
Implement the adapter — satisfy LanguageAdapter interface in internal/indexer/adapters/
Register it — call registry.Register(adapter) in buildLanguageRegistry() in cmd/context-link/main.go
Add fixtures — create testdata/langs/<lang>/ and update golden snapshots

See ARCHITECTURE.md for the full component map and project structure.

Benchmarks

Measured after Phase 5 optimizations (batch DB operations, eliminated double-parsing). All benchmarks run on a single machine with SQLite WAL mode.

Indexing Performance

Repository	Language	Files	Symbols	Embeddings	Index Time	DB Size
context-link (self)	Go	59	539	528	1.05s	1.5 MB
echo	Go	90	1,901	1,576	1.85s	3.1 MB
gin	Go	99	1,892	1,722	3.54s	2.4 MB
tRPC	TypeScript	381	772	767	6.82s	—

Semantic Search Latency

Measured on context-link (560 symbols, 545 embeddings), 100 iterations per query, 10 diverse queries.

Mode	P50	Avg	Description
Uncached (DB scan)	1,880µs	1,914µs	Full SQLite BLOB read + decode per query
Cached cold	2,180µs	2,159µs	First call loads vectors into memory
Cached warm	197µs	187µs	In-memory KNN dot-product scan
End-to-end	202µs	196µs	Embed query + cached KNN search

150x improvement over pre-optimization baseline (~30ms → 0.2ms). The in-memory vector cache eliminates all SQLite I/O after the first query.

Semantic Search Quality

Example queries and top results:

Repo	Query	Top Results
echo	"static file serving"	`Static`, `StaticFileHandler`, `serveFile`
gin	"route parameter binding"	`Param`, `Params`, `ShouldBindUri`
tRPC	"middleware pipeline"	`createMiddlewareFactory`, `createBuilder`
context-link	"tree-sitter parsing"	`processFile`, `extractSymbolsAndDeps`

Optimization Impact

context-link is aggressively optimized for speed and low latency, ensuring it never bottlenecks your AI agent.

Core Benchmarks:

150x Search Speedup: Semantic search latency reduced from ~30ms to 0.2ms via in-memory vector caching and heap-based Top-K selection.
4.9x Indexing Speedup: Parallel file hashing and SQLite PRAGMA tuning process large repositories in ~1 second.
Instant Incremental Updates: Re-indexing detects changes and updates the graph in <10ms.

See ARCHITECTURE.md for detailed breakdown.

Semantic Search Token Efficiency

A semantic_search_symbols call returning 10 results averages ~388 tokens — this is a metadata-only listing (symbol names, kinds, file paths, similarity scores) with no code. The full end-to-end flow (search → extract top result with dependencies) compares favorably to reading source files directly:

Scenario	Avg Tokens	vs. Full File
Search only (10 results)	~388	Discovery without reading any file
Search + `get_code_by_symbol` (depth=0)	~615	Top result extraction
Search + `get_code_by_symbol` (depth=1)	~691	Top result with dependencies
Reading the full source file	~867	Baseline comparison

Measured on context-link itself (59 files, 1,056 symbols). A typical get_code_by_symbol call returns ~9% of what reading the full file would require.

For large files the savings are dramatic (86%+ reduction for a 3,200-token file). The comparison is conservative — in practice, an agent without semantic search would read multiple files to find the right symbol, making real-world savings significantly higher.

Security

Zero external network calls. The binary functions fully air-gapped.
All file paths are validated against the project root to prevent traversal attacks.
SQLite database is created with 0600 permissions (owner read/write only).
All SQL queries use parameterized statements.

License

Apache-2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github/workflows		.github/workflows
cmd/context-link		cmd/context-link
internal		internal
pkg/models		pkg/models
testdata		testdata
.gitignore		.gitignore
.golangci-lint.yml		.golangci-lint.yml
.goreleaser.yml		.goreleaser.yml
ARCHITECTURE.md		ARCHITECTURE.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Folders and files

Latest commit

History

Repository files navigation

context-link

The Problem

The Solution

How It Works: The Context Funnel

Installation

Homebrew (macOS / Linux)

Download Binary

Build from Source

Building

Prerequisites

Platform-Specific Setup

Build Commands

Running Tests

Troubleshooting

Quick Start

Step 1: Install context-link

Step 2: Index Your Project

Step 3: Connect Your AI IDE

Step 4: Start Coding

Recommended Agent Workflow

Advanced Usage

Advanced Indexing

Manual Server Testing

Custom ONNX Model (Optional)

Maintenance & Uninstallation

To clear a project's index:

To completely uninstall:

Tech Stack

CLI Reference

Configuration File

MCP Tool Reference

ping

reindex_project

get_modified_symbols

get_tests_for_symbol

read_architecture_rules

get_code_by_symbol

semantic_search_symbols

save_symbol_memory

get_symbol_memories

purge_stale_memories

get_file_skeleton

get_symbol_usages

get_call_tree

find_dead_code

get_blast_radius

search_code_patterns

find_http_routes

Limitations & Known Issues

Symbol-Centric Index Architecture

Batch Parameters & Polymorphic Types

Tool Independence (No Compound Queries)

No Negative Queries

search_code_patterns Scope Limitation

Performance Characteristics

Known Edge Cases

Best Practices for Token Efficiency

Adding a New Language

Benchmarks

Indexing Performance

Semantic Search Latency

Semantic Search Quality

Optimization Impact

Semantic Search Token Efficiency

Security

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

`ping`

`reindex_project`

`get_modified_symbols`

`get_tests_for_symbol`

`read_architecture_rules`

`get_code_by_symbol`

`semantic_search_symbols`

`save_symbol_memory`

`get_symbol_memories`

`purge_stale_memories`

`get_file_skeleton`

`get_symbol_usages`

`get_call_tree`

`find_dead_code`

`get_blast_radius`

`search_code_patterns`

`find_http_routes`

Packages