Local MCP Context Gateway for AI Coding Agents
context-link is a local MCP server that serves structured code context to AI agents, dramatically reducing token consumption compared to reading entire files. It indexes codebases using a language-agnostic Tree-sitter adapter system, builds a symbol + dependency graph, and exposes tools over the Model Context Protocol.
Supported languages: TypeScript (.ts), TSX/JSX (.tsx, .jsx), Go (.go), Python (.py), JavaScript (.js, .mjs), Rust (.rs), Java (.java), C (.c, .h), C++ (.cpp, .hpp, .cc, .cxx, .hxx, .hh), C# (.cs) — extensible via the LanguageAdapter interface.
AI coding agents read entire files to understand context. This brute-force approach is expensive, slow, and prone to context-window overflow.
context-link acts as a structural intermediary that extracts and serves only the relevant code symbols, dependencies, and historical notes an agent needs. By eliminating blind file reads, it reduces token consumption by over 85% in the benchmark scenarios below.
Built-in token savings tracking: Every tool response includes tokens_saved_est, cost_avoided_est, session_tokens_saved, and session_cost_avoided in the metadata field—giving you real-time visibility into your efficiency gains.
| Scenario | Avg Token Reduction | Target |
|---|---|---|
| Single symbol lookup (depth=0) | 91.0% | >80% |
| Aggregate (all symbols vs all files) | 92.7% | >80% |
| Symbol + dependencies (depth=1) | 85.7% | >80% |
        Natural Language Query
                  |
                  v
+---------------------------+
| Stage 1: Semantic Scout   |  Discovers matching symbol names via local
| (Discovery)               |  vector embeddings; no file contents read
+---------------------------+
                  |
                  v
+---------------------------+
| Stage 2: Structural       |  Extracts exact function/class body, call-graph
| Surgeon (Extraction)      |  dependencies, and import statements via AST
+---------------------------+
                  |
                  v
+---------------------------+
| Stage 3: The Historian    |  Injects developer/agent memory notes linked
| (Persistence)             |  to AST nodes, auto-flagged stale on changes
+---------------------------+
                  |
                  v
    Minimal, precise context → LLM
brew install context-link-mcp/tap/context-link
Pre-built binaries for Linux, macOS, and Windows (amd64) are available on the GitHub Releases page.
See the Building section below.
Semantic search works out of the box using the built-in Model2Vec embedder (embedded in the binary). ONNX Runtime is only needed if you want to use a custom ONNX model as an override.
| Dependency | Version | Purpose |
|---|---|---|
| Go | 1.22+ | Runtime and build toolchain |
| GCC (C compiler) | Any recent | Required by smacker/go-tree-sitter CGo bindings |
| Git | Any | Dependency fetching |
The SQLite driver (modernc.org/sqlite) is pure Go and does not require CGo.
Windows
winget install -e --id niXman.mingw-w64-ucrt
# Restart terminal, then verify:
gcc --version
macOS
xcode-select --install
Linux (Debian/Ubuntu)
sudo apt-get install build-essential
Linux (Fedora/RHEL)
sudo dnf install gcc gcc-c++ make
# Development build
CGO_ENABLED=1 go build -o ./bin/context-link ./cmd/context-link
# Release build (stripped binary, with version)
CGO_ENABLED=1 go build -ldflags="-s -w -X main.version=v0.4.0" -o ./bin/context-link ./cmd/context-link
# Or use the Makefile (auto-detects version from git tags)
make build
On Windows, the output binary will be context-link.exe.
CGO_ENABLED=1 go test ./... -count=1
Update snapshot golden files:
CGO_ENABLED=1 go test ./internal/indexer/ -args -update-golden
CGO_ENABLED=0 or "gcc not found": Tree-sitter grammars are C libraries compiled via CGo. Ensure a C compiler is installed and on PATH.
Windows: "cc1.exe: sorry, unimplemented: 64-bit mode not compiled in": Use the 64-bit UCRT variant: winget install -e --id niXman.mingw-w64-ucrt
Slow first build: The first build compiles all Tree-sitter C sources. Subsequent builds use the Go build cache.
The fastest way to get started is using a pre-built binary. (If you prefer to compile from source, see the Building section.)
macOS / Linux:
brew install context-link-mcp/tap/context-link
Windows & Others: Download the latest executable for your OS from the GitHub Releases page and place it in a directory on your system's PATH.
Navigate to your codebase and run the indexer. This creates a local .context-link.db file containing your vector embeddings and AST mappings.
cd /path/to/your/project
context-link index --project-root .
(Note: You can re-run this anytime; it incrementally processes only changed files.)
context-link communicates via the Model Context Protocol (MCP) over stdio. Configure your preferred AI agent to launch the server:
For Claude Code:
Create or edit .mcp.json in your project root:
{
  "mcpServers": {
    "context-link": {
      "command": "context-link",
      "args": ["serve", "--project-root", "."]
    }
  }
}
For Cursor:
Go to Settings > Features > MCP.
Click + Add New MCP Server.
Name: context-link
Type: command
Command: context-link serve --project-root /absolute/path/to/your/project
For Windsurf:
Add to your mcp_config.json:
{
  "mcpServers": {
    "context-link": {
      "command": "context-link",
      "args": ["serve", "--project-root", "/absolute/path/to/your/project"]
    }
  }
}
Once connected, your AI agent has access to the full suite of context-link tools. To ensure your agent uses them efficiently, apply the Recommended Agent Workflow (see below) to your .cursorrules or system prompt.
To achieve maximum token efficiency, your AI needs to know how to use the context-link tools.
You can either invoke the built-in explore_codebase MCP prompt directly in supported clients, or copy and paste the following block into your .cursorrules, agent custom instructions, or system prompt:
When exploring and modifying this codebase, you must prioritize the context-link MCP tools to minimize token consumption. Do not read raw files directly unless absolutely necessary. Follow this structural workflow:
1. Call `read_architecture_rules` at the start of a session to understand the project's constraints.
2. Call `reindex_project` after modifying files to keep the symbol graph current (safe to call repeatedly).
3. Use `get_modified_symbols` to discover what code has changed in the working tree (git-aware context).
4. Use `semantic_search_symbols` to discover relevant symbols by intent.
5. Use `get_file_skeleton` to understand a file's structure before diving in.
6. Use `get_code_by_symbol` to retrieve only the specific code you need, along with its dependencies.
7. Use `get_symbol_usages` and `get_call_tree` to explore call hierarchies and reverse dependencies.
8. Use `get_tests_for_symbol` to find test functions for a symbol you're modifying.
9. Use `find_dead_code` to discover unused symbols and `get_blast_radius` to assess the impact of your planned changes.
10. Use `find_http_routes` to discover REST route definitions and match them to call sites.
11. Always check the `memories` array in tool responses for prior human or agent findings about a symbol.
12. After completing a significant feature or fix, call `save_symbol_memory` to persist your architectural findings for future sessions.
While the Quick Start covers standard setups, context-link supports advanced workflows for complex environments.
The index is stored in .context-link.db in your current directory. Re-run anytime — only changed files are re-processed. Semantic search embeddings are generated automatically.
./bin/context-link index --project-root /path/to/your/project
Force a full re-index (e.g., after switching embedder):
If you switch embedding models or want to cleanly rebuild the database (bypassing the incremental file hash check), use the --force flag:
./bin/context-link index --project-root /path/to/your/project --force
Index multiple repos into the same database:
./bin/context-link index --project-root /path/to/repo-a --repo-name repo-a
./bin/context-link index --project-root /path/to/repo-b --repo-name repo-b
The server communicates via MCP over stdio. While it is meant to be launched by your IDE, you can start the server manually for debugging purposes or to verify initialization:
./bin/context-link serve --project-root /path/to/your/project
The built-in Model2Vec embedder works out of the box. For higher-quality embeddings (all-MiniLM-L6-v2, 384-dim), override with ONNX:
- Download `all-MiniLM-L6-v2.onnx` and `vocab.txt` from Hugging Face
- Download OnnxRuntime from ONNX Runtime releases
- Pass `--model-path` and `--vocab-path` to both `index` and `serve`
Switching between Model2Vec and ONNX requires --force re-indexing (128 vs 384 dimensions).
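To see why vectors from the two embedders cannot share one index, here is a minimal sketch of a cosine-similarity helper that refuses to compare vectors of different dimensions. It is not context-link's actual code; the function name and the toy vectors are illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        # A 128-dim vector and a 384-dim vector live in different spaces,
        # so a similarity score between them would be meaningless.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


stored = [0.1] * 128   # stand-in for a vector indexed with Model2Vec
query = [0.1] * 384    # stand-in for a query embedded with the ONNX model

try:
    cosine_similarity(stored, query)
except ValueError as err:
    print(err)
```

Failing loudly like this is one possible design. Re-indexing with `--force` avoids the situation entirely because every stored vector is regenerated with the current embedder.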
context-link is entirely self-contained. It does not run background telemetry or scatter hidden configuration files across your system.
Simply delete the local SQLite database in the root of your project. This will completely remove all vector embeddings and AST mappings for that repository.
rm .context-link.db
- Delete the context-link binary from your system path (or run `brew uninstall context-link`).
- Remove the configuration entry from your IDE's MCP settings (e.g., `mcp_config.json` or `.mcp.json`).
| Component | Technology |
|---|---|
| Runtime | Go 1.22+ |
| Protocol | MCP via stdio (mcp-go v0.44.1) |
| AST Parser | go-tree-sitter (language-agnostic via LanguageAdapter registry) |
| Database | SQLite 3 (WAL mode, pure-Go driver via modernc.org/sqlite) |
| Search Engine | Hybrid search: SQLite FTS5 (BM25) + Go-side Vector KNN, merged via Reciprocal Rank Fusion (RRF) |
| Embeddings | Built-in potion-base-4M Model2Vec (128-dim, zero-config); optional ONNX override (all-MiniLM-L6-v2) |
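Reciprocal Rank Fusion merges several ranked lists by giving each item a score of 1/(k + rank) per list and summing those scores. The sketch below illustrates the general technique only. The constant k = 60 (a commonly used default), the function name, and the sample symbol names are assumptions, not details taken from context-link's source.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; each item scores sum(1 / (k + rank)), rank from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


bm25_hits = ["validateToken", "parseJWT", "refreshToken"]     # keyword ranking
vector_hits = ["parseJWT", "checkSession", "validateToken"]   # embedding ranking

print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Items that appear in both lists rise to the top even when neither list ranks them first, which is the main reason to fuse keyword and vector results.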
Global flags (all subcommands):
| Flag | Default | Description |
|---|---|---|
| `--db-path` | `.context-link.db` | Path to SQLite database |
| `--project-root` | Current directory | Root directory of the project |
| `--log-level` | `info` | Log verbosity: debug, info, warn, error |
| `--config` | `.context-link.yaml` | Path to config file |
index subcommand flags:
| Category | Flag | Default | Description |
|---|---|---|---|
| Execution | `--force` | `false` | Force full re-index, bypassing incremental file hash checks |
| Execution | `--repo-name` | Directory name | Repository name for multi-repo namespacing |
| Performance | `--workers` | `4` | Number of parallel worker goroutines for parsing |
| Embeddings (Advanced) | `--model-path` | (built-in) | Path to custom ONNX model (overrides built-in Model2Vec) |
| Embeddings (Advanced) | `--vocab-path` | (built-in) | Path to vocab.txt for the ONNX tokenizer |
| Embeddings (Advanced) | `--ort-lib-path` | (system) | Path to OnnxRuntime shared library |
serve subcommand flags:
| Category | Flag | Default | Description |
|---|---|---|---|
| Behavior | `--watch` | `false` | Auto re-index on file changes (fsnotify with 500ms debounce) |
| Behavior | `--tools` | (all) | Comma-separated list of MCP tools to enable (e.g., `ping,get_code_by_symbol`) |
| Embeddings (Advanced) | `--model-path` | (built-in) | Path to custom ONNX model (overrides built-in Model2Vec) |
| Embeddings (Advanced) | `--vocab-path` | (built-in) | Path to vocab.txt for the ONNX tokenizer |
| Embeddings (Advanced) | `--ort-lib-path` | (system) | Path to OnnxRuntime shared library |
db_path: .context-link.db
project_root: .
log_level: info
# Semantic search uses built-in Model2Vec by default (zero-config).
# Uncomment below to override with a custom ONNX model:
# model_path: /path/to/all-MiniLM-L6-v2.onnx
# vocab_path: /path/to/vocab.txt
# ort_lib_path: ""
# Control which MCP tools are exposed (default: all).
# Use this to reduce prompt token budget by disabling unused tools.
# tools:
# - ping
# - reindex_project
# - get_modified_symbols
# - get_tests_for_symbol
# - semantic_search_symbols
# - get_code_by_symbol
# - get_file_skeleton
# - get_symbol_usages
# - get_call_tree
# - read_architecture_rules
# - memory # registers save_symbol_memory, get_symbol_memories, purge_stale_memories
# - find_dead_code
# - get_blast_radius
# - find_http_routes
Environment variables with the CONTEXT_LINK_ prefix also work (e.g., CONTEXT_LINK_LOG_LEVEL=debug).
All tools return structured JSON with a metadata object including timing_ms for observability, plus tokens_saved_est, cost_avoided_est, session_tokens_saved, and session_cost_avoided for token savings tracking.
`ping`: Health-check tool.
Parameters: none
{ "status": "ok", "metadata": { "timing_ms": 1 } }
`reindex_project`: Triggers an incremental re-index of the project. Only re-parses files that changed since the last index. Call this after modifying files to ensure the symbol graph, dependencies, and search index are up to date.
Parameters: none
{
"files_scanned": 142,
"files_changed": 3,
"files_deleted": 0,
"files_unchanged": 139,
"symbols_added": 7,
"symbols_updated": 7,
"dependencies_updated": 12,
"fts_updated": true,
"embeddings_updated": true,
"duration_ms": 1240,
"metadata": { "timing_ms": 1242 }
}
Note: This operation is idempotent. Calling it twice with no file changes returns `files_changed: 0` in ~10ms.
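The incremental behaviour can be pictured as a comparison between the hashes stored by the previous run and the hashes of the files currently on disk. This is a sketch under assumptions: the use of SHA-256, the `classify_files` helper, and the in-memory dictionaries are illustrative and are not taken from context-link's implementation.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest used to detect whether a file's contents changed."""
    return hashlib.sha256(data).hexdigest()


def classify_files(previous: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored hashes with current contents and bucket each path."""
    result: dict[str, list[str]] = {"changed": [], "unchanged": [], "deleted": []}
    for path, data in current.items():
        if previous.get(path) == content_hash(data):
            result["unchanged"].append(path)
        else:
            result["changed"].append(path)  # new or modified: needs re-parsing
    result["deleted"] = [p for p in previous if p not in current]
    return result


# Hashes recorded by the previous index run (hypothetical data).
previous = {
    "a.go": content_hash(b"package a"),
    "b.go": content_hash(b"package b"),
    "old.go": content_hash(b"package old"),
}
# Files found on disk now: b.go was edited, old.go was removed.
current = {"a.go": b"package a", "b.go": b"package b // edited"}

report = classify_files(previous, current)
print(report)
```

Running the same comparison again after the stored hashes are updated yields no changed files, which is the idempotent behaviour described in the note.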
`get_modified_symbols`: Returns symbols (functions, methods, classes) that overlap with locally modified lines in the git working tree. Use this to orient yourself at the start of a session; it shows exactly what is being actively worked on.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `base_ref` | string | no | `HEAD` | Git ref to diff against (use `main` for branch diff, or a commit SHA) |
| `include_staged` | boolean | no | `true` | Include staged (`git add`) changes in addition to unstaged |
{
"base_ref": "HEAD",
"files_changed": 2,
"symbols": [
{
"name": "ProcessOrder",
"qualified_name": "OrderService.ProcessOrder",
"kind": "method",
"file_path": "internal/orders/service.go",
"start_line": 42,
"end_line": 78,
"changed_lines": [45, 46, 47, 52],
"change_type": "modified"
}
],
"metadata": { "timing_ms": 34 }
}
Tip: Call reindex_project first to ensure the index reflects the latest file state.
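Mapping changed lines to symbols comes down to an interval check: a symbol is reported when at least one changed line falls inside its `start_line` to `end_line` range. The helper below is a hypothetical illustration of that idea, with made-up symbols and line numbers, and does not reproduce the tool's real diff handling.

```python
def symbols_touching(symbols: list[dict], changed_lines: list[int]) -> list[dict]:
    """Return symbols whose [start_line, end_line] range contains a changed line."""
    hits = []
    for sym in symbols:
        overlap = [n for n in changed_lines if sym["start_line"] <= n <= sym["end_line"]]
        if overlap:
            # Attach the overlapping lines, mirroring the changed_lines field.
            hits.append({**sym, "changed_lines": overlap})
    return hits


symbols = [
    {"name": "ProcessOrder", "start_line": 42, "end_line": 78},
    {"name": "CancelOrder", "start_line": 80, "end_line": 95},
]
changed = [45, 46, 47, 52]  # line numbers taken from a diff (hypothetical)

print(symbols_touching(symbols, changed))
```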
`get_tests_for_symbol`: Finds test functions associated with a given symbol. Uses the dependency graph (tests that call the target), with naming conventions as a fallback. Helps locate tests to update after modifying a function.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `symbol_name` | string | yes | — | Name or qualified name of the symbol to find tests for |
| `include_code` | boolean | no | `false` | Include full test function bodies (default: false saves tokens) |
{
"symbol": {
"name": "ProcessOrder",
"qualified_name": "OrderService.ProcessOrder",
"kind": "method",
"file_path": "internal/orders/service.go"
},
"tests": [
{
"name": "TestProcessOrder_Success",
"qualified_name": "TestProcessOrder_Success",
"file_path": "internal/orders/service_test.go",
"start_line": 15,
"end_line": 42,
"match_reason": "calls_target"
}
],
"test_count": 1,
"metadata": { "timing_ms": 8 }
}
Match reasons: `calls_target` (high confidence: proven call in dependency graph), `name_match` (lower confidence: naming convention only).
`read_architecture_rules`: Returns ARCHITECTURE.md as structured sections. Use at session start.
Parameters: none
{
"sections": [
{ "title": "Overview", "content": "..." }
],
"metadata": { "timing_ms": 3, "source": "/path/to/ARCHITECTURE.md" }
}
`get_code_by_symbol`: Extracts exact source code of one or more named symbols with transitive dependencies and import statements. Supports batch operations for multiple symbols.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `symbol_name` | string OR array | yes | — | Single symbol name OR array of symbol names (max 50). Supports qualified names (e.g. `UserAuth.validateToken`) |
| `depth` | number | no | `1` | Dependency depth: 0 = symbol only, 1 = direct deps, max 3. Applies to all symbols in batch |
Single symbol (backward compatible):
{ "symbol_name": "validateToken", "depth": 1 }
Batch operation:
{ "symbol_name": ["validateToken", "UserAuth.login", "formatError"], "depth": 0 }
Response format (batch):
{
"results": [
{
"input": "validateToken",
"data": {
"symbol": { "name": "validateToken", "code_block": "...", ... },
"dependencies": [...],
"memories": [...]
}
},
{
"input": "nonExistent",
"error": "symbol \"nonExistent\" not found in repository \"repo\""
}
],
"total_symbols": 2,
"success_count": 1,
"error_count": 1,
"metadata": { "timing_ms": 18, "tokens_saved_est": 4200 }
}
Single symbol response (legacy format for backward compatibility):
{
"symbol": {
"name": "validateToken", "qualified_name": "auth.validateToken",
"kind": "function", "file_path": "internal/auth/token.go",
"code_block": "func validateToken(tok string) error { ... }",
"start_line": 42, "end_line": 61, "language": "go"
},
"dependencies": [ { "name": "parseJWT", "code_block": "..." } ],
"dependency_count": 1,
"memories": [ { "id": 7, "note": "Uses RS256.", "is_stale": false } ],
"memory_count": 1,
"metadata": { "timing_ms": 18 }
}
`semantic_search_symbols`: Discovers symbols by natural-language intent via vector embeddings. Does not return code; call get_code_by_symbol next.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | yes | — | Natural-language description of what you're looking for |
| `top_k` | number | no | `10` | Max results (max 50) |
| `kind` | string | no | (all) | Filter: function, class, interface, type, variable |
| `file_path_prefix` | string | no | (all) | Filter by file path prefix |
| `min_similarity` | number | no | `0.3` | Minimum cosine similarity (0.0–1.0) |
{
"results": [
{
"symbol_name": "validateToken", "kind": "function",
"file_path": "internal/auth/token.go",
"similarity_score": 0.87, "memory_count": 1
}
],
"metadata": { "timing_ms": 0, "total_results": 1, "query": "token validation" }
}
`save_symbol_memory`: Attaches a persistent note to a symbol. Notes survive re-indexing and are auto-flagged stale when the symbol changes. Duplicate notes are stored only once.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `symbol_name` | string | yes | — | Symbol to annotate |
| `note` | string | yes | — | The note (max 2000 chars) |
| `author` | string | no | `"agent"` | `"agent"` or `"developer"` |
{
"memory_id": 12, "symbol_name": "auth.validateToken",
"file_path": "internal/auth/token.go",
"metadata": { "timing_ms": 5 }
}
`get_symbol_memories`: Retrieves notes for a symbol or file. At least one of symbol_name or file_path is required.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `symbol_name` | string | no* | — | Symbol to look up |
| `file_path` | string | no* | — | File path; returns notes for all symbols in that file |
| `offset` | number | no | `0` | Pagination offset |
| `limit` | number | no | `20` | Max results (max 100) |
When is_stale is true: stale_reason explains why, and last_known_symbol / last_known_file show the symbol's location at the time it was flagged stale.
`purge_stale_memories`: Deletes stale memories. Use orphaned_only=true to only delete memories with no linked symbol.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `orphaned_only` | boolean | no | `false` | Only delete orphaned memories |
`get_file_skeleton`: Returns a structural outline of one or more files: symbol names, kinds, and signatures (first line of code block only). No full code bodies. Use to understand file structure before extracting specific symbols. Supports batch operations for multiple files.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `file_path` | string OR array | yes | — | Single file path OR array of file paths (max 50). Relative paths (e.g. `internal/store/symbols.go`) |
Single file (backward compatible):
{ "file_path": "internal/store/symbols.go" }
Batch operation:
{ "file_path": ["internal/store/symbols.go", "internal/store/files.go", "internal/store/deps.go"] }
Response format (batch):
{
"results": [
{
"input": "internal/store/symbols.go",
"data": {
"file_path": "internal/store/symbols.go",
"symbols": [
{ "name": "GetSymbolByName", "kind": "function", "signature": "func GetSymbolByName(...", "start_line": 42 }
],
"symbol_count": 15
}
},
{
"input": "nonexistent.go",
"error": "file does not exist: nonexistent.go"
}
],
"total_files": 2,
"success_count": 1,
"error_count": 1,
"metadata": { "timing_ms": 5, "tokens_saved_est": 3200 }
}
Single file response (legacy format for backward compatibility):
{
"file_path": "internal/store/symbols.go",
"symbols": [
{ "name": "GetSymbolByName", "kind": "function", "signature": "func GetSymbolByName(ctx context.Context, db *DB, repoName, name string) (*models.Symbol, error) {", "start_line": 42 }
],
"symbol_count": 15,
"metadata": { "timing_ms": 2 }
}
`get_symbol_usages`: Reverse dependency lookup that finds all callers/references of a symbol.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `symbol_name` | string | yes | — | Name or qualified name of the symbol |
{
"symbol": { "name": "hashFile", "qualified_name": "walker.hashFile", "kind": "function", "file_path": "internal/indexer/walker.go" },
"usages": [
{ "caller_name": "Walk", "caller_qualified_name": "Walker.Walk", "caller_kind": "method", "file_path": "internal/indexer/walker.go", "start_line": 55, "dep_kind": "call" }
],
"usage_count": 3,
"metadata": { "timing_ms": 4 }
}
`get_call_tree`: Traverses the dependency graph to show a call hierarchy. Use direction='callees' to see what a symbol calls, or direction='callers' to see what calls it.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `symbol_name` | string | yes | — | Root symbol name or qualified name |
| `direction` | string | no | `callees` | `callees` or `callers` |
| `depth` | number | no | `1` | Max traversal depth (max 3) |
{
"root": { "name": "Walk", "qualified_name": "Walker.Walk", "kind": "method", "file_path": "internal/indexer/walker.go" },
"edges": [
{ "depth": 1, "name": "hashFile", "qualified_name": "walker.hashFile", "kind": "function", "file_path": "internal/indexer/walker.go", "start_line": 120, "dep_kind": "call" }
],
"edge_count": 5,
"direction": "callees",
"metadata": { "timing_ms": 3 }
}
`find_dead_code`: Discovers symbols with zero inbound dependency edges (no callers). Entry points (main, init) and variables are excluded by default.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `file_path` | string | no | (all) | Limit search to a specific file |
| `kind` | string | no | (all) | Filter: function, class, method, interface, type |
| `exclude_exported` | boolean | no | `true` | Exclude exported symbols (uppercase-initial in Go) |
| `limit` | number | no | `50` | Max results (max 200) |
{
"dead_symbols": [
{ "name": "unusedHelper", "qualified_name": "pkg.unusedHelper", "kind": "function",
"file_path": "internal/pkg/helper.go", "start_line": 42, "language": "go" }
],
"count": 1,
"metadata": { "timing_ms": 5, "tokens_saved_est": 1200 }
}
`get_blast_radius`: Runs a breadth-first search (BFS) through callers to show everything affected by changing a symbol. Groups results by file and depth.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `symbol_name` | string | yes | — | Name or qualified name of the symbol to analyze |
| `depth` | number | no | `2` | Max traversal depth (max 5) |
{
"root": { "name": "hashFile", "kind": "function", "file_path": "internal/indexer/walker.go" },
"affected_files": {
"internal/indexer/walker.go": [
{ "name": "Walk", "kind": "method", "depth": 1, "dep_kind": "call" }
]
},
"total_affected": 3,
"files_affected": 2,
"by_depth": { "1": 2, "2": 1 },
"metadata": { "timing_ms": 8, "tokens_saved_est": 2400 }
}
`search_code_patterns`: Database-driven regex search across indexed symbol code blocks. Useful for finding error sentinels, retry logic, specific function calls, or any code pattern matching a regex.
CRITICAL LIMITATION: This tool searches indexed symbol code blocks only (functions, classes, methods, types, variables). It does NOT search file-level code outside symbols such as:
- Decorators (e.g., `@app.route` in Flask)
- Top-level statements
- Module docstrings
- Configuration dictionaries
- Import statements
For file-level patterns, read the file directly using the Read tool.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `pattern` | string | yes | — | Regex pattern (Go RE2 syntax) to search for in code blocks |
| `file_path_prefix` | string | no | (all) | Filter to symbols in files starting with this prefix |
| `kind` | string | no | (all) | Filter: function, class, interface, type, variable, method |
| `limit` | number | no | `50` | Max results (max 200) |
{
"results": [
{
"symbol_name": "ErrNotFound",
"qualified_name": "store.ErrNotFound",
"kind": "variable",
"file_path": "internal/store/errors.go",
"start_line": 12,
"end_line": 12,
"match_snippet": "var ErrNotFound = errors.New(\"not found\")",
"match_indices": [18, 40]
}
],
"result_count": 1,
"pattern": "errors\\.New\\(",
"metadata": { "timing_ms": 8, "tokens_saved_est": 4200 }
}
Example patterns:
- Find error sentinels: `errors\\.New\\(`
- Find retry logic: `retry.*(?:backoff|timeout)`
- Find SQL queries: `SELECT.*FROM`
- Find unsafe string concatenation: `\\+.*\\+`
`find_http_routes`: Discovers HTTP route definitions and call sites in the codebase. Supports Express, Gin, FastAPI, Flask, and similar frameworks. Matches route definitions to their call sites with confidence scoring.
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `method` | string | no | (all) | Filter by HTTP method (GET, POST, PUT, DELETE, PATCH) |
| `path` | string | no | (all) | Filter routes by path substring (e.g., `/api/users`) |
| `file_path` | string | no | (all) | Limit search to a specific file path |
{
"routes": [
{ "method": "GET", "path_pattern": "/api/users/:id", "handler": "getUser",
"file_path": "src/routes/users.ts", "start_line": 15, "framework": "express", "kind": "definition" }
],
"route_count": 1,
"matches": [
{ "definition": { "method": "GET", "path_pattern": "/api/users/:id", "file_path": "src/routes/users.ts" },
"call_site": { "method": "GET", "path_pattern": "/api/users/123", "file_path": "src/client/api.ts" },
"confidence": 0.9 }
],
"match_count": 1,
"metadata": { "timing_ms": 12 }
}
While context-link dramatically reduces token consumption for symbol-based queries, it has specific limitations you should understand to use it effectively.
What's Indexed:
- Functions, classes, methods, interfaces, types, variables with AST nodes
- Code blocks that Tree-sitter identifies as named declarations
- Dependencies between these symbols (call graph)
What's NOT Indexed:
- File-level code outside symbols (top-level statements in Python `__main__`, Go init blocks)
- Import/require statements (not considered symbols)
- Decorators (e.g., `@app.route('/api/users')` in Flask)
- Module docstrings (top-level documentation)
- Configuration dictionaries at file scope (e.g., `SETTINGS = {...}` in Python)
- Inline comments (only code blocks are indexed)
Impact: Tools like search_code_patterns only search indexed symbol code blocks. If you need to find decorators, imports, or top-level configuration, fall back to reading the file directly with the Read tool.
Example Scenario:
# This Flask route decorator is NOT indexed:
@app.route('/api/users', methods=['GET'])
def get_users():  # ← This function IS indexed
    return jsonify(users)
Searching for @app.route patterns will return zero results. Read routes.py directly to find decorator patterns.
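The effect can be reproduced with a small sketch. It assumes, as the scenario describes, that the stored code block for `get_users` begins at the `def` line. It uses Python's `re` module purely for illustration, whereas the tool itself accepts Go RE2 syntax.

```python
import re

source = '''@app.route('/api/users', methods=['GET'])
def get_users():
    return jsonify(users)
'''

# Assumption: the symbol's code block starts at its `def` line, so the
# decorator line above it is not part of any indexed block.
lines = source.splitlines()
def_index = next(i for i, line in enumerate(lines) if line.startswith("def "))
code_block = "\n".join(lines[def_index:])

pattern = re.compile(r"@app\.route")
print(bool(pattern.search(code_block)))  # symbol-scoped search
print(bool(pattern.search(source)))      # whole-file search
```

The first search misses because the decorator never reaches the searched text, while the second finds it in the raw file. That difference is why a direct file read is the right fallback here.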
The batch-enabled tools (get_file_skeleton, get_code_by_symbol) accept either a single string OR an array of strings:
// Single file (backward compatible)
{ "file_path": "internal/store/symbols.go" }
// Multiple files (batch operation)
{ "file_path": ["internal/store/symbols.go", "internal/store/files.go"] }
Automatic JSON Array Parsing: If you accidentally pass a JSON-serialized array as a string (e.g., "[\"file1.go\", \"file2.go\"]"), the tool will automatically detect and parse it. This prevents confusing "not indexed" errors when parameters are incorrectly formatted.
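A client-side version of the same normalization might look like the sketch below. The `normalize_batch_param` helper is hypothetical and only illustrates the three accepted input shapes; the server's own detection logic may differ.

```python
import json


def normalize_batch_param(value) -> list[str]:
    """Accept a string, a list of strings, or a JSON-encoded list in a string."""
    if isinstance(value, list):
        return [str(item) for item in value]
    text = str(value).strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None  # looks like an array but is not valid JSON
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    return [text]  # plain single value


print(normalize_batch_param("internal/store/symbols.go"))
print(normalize_batch_param(["a.go", "b.go"]))
print(normalize_batch_param('["file1.go", "file2.go"]'))
```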
Batch Limits:
- Max 50 files per `get_file_skeleton` call
- Max 50 symbols per `get_code_by_symbol` call
- Per-item error handling: partial failures don't abort the entire batch
Each MCP tool call is independent—there are no compound queries like "find all callers of X that don't check the error return value."
Workaround: Chain tools manually:
1. `search_code_patterns` to find error-returning functions
2. `get_symbol_usages` to find callers
3. `get_code_by_symbol` to inspect each caller
4. Manually verify error handling logic
Example: To find functions that call store.GetUser() but don't use errors.Is():
1. search_code_patterns(pattern: "store\\.GetUser\\(") → 12 callers
2. get_code_by_symbol(symbol_name: each caller) → inspect code
3. Manually grep each code block for "errors.Is"
This is inherently token-expensive but still cheaper than reading all files.
You cannot search for the absence of a pattern. Queries like "functions that DON'T call errors.Is" are not supported.
Workaround:
- Get the full list of target symbols (`semantic_search_symbols`)
- Search for the positive pattern (`search_code_patterns`)
- Manually compute the set difference
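The last step of the workaround is a plain set difference. In this sketch the two sets stand in for the qualified names collected from the two tool calls; the names themselves are made up for illustration.

```python
# Hypothetical tool output, reduced to qualified symbol names.
all_candidates = {"orders.Create", "orders.Cancel", "users.Get", "users.Delete"}
calls_errors_is = {"orders.Create", "users.Get"}  # positive-pattern matches

# Symbols that do NOT match = candidates minus positive matches.
missing_check = sorted(all_candidates - calls_errors_is)
print(missing_check)
```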
Trust the result count: If search_code_patterns returns result_count: 2, you can trust that only 2 symbols match. There's no hidden "maybe more" ambiguity.
Tool: search_code_patterns
What it searches: The code_block column of the symbols table—only indexed symbol bodies.
What it CANNOT find:
- Decorators (Flask `@app.route`, FastAPI `@app.get`)
- Import statements (`from typing import Optional`)
- Top-level variable assignments outside functions (e.g., `logger = logging.getLogger()` in Python)
- Main entrypoint code (`if __name__ == "__main__":` blocks)
The tool description explicitly warns about this limitation to prevent confusion when searches return zero results.
When to fall back to direct file reads:
- Searching for import patterns: Use `Read` + manual grep
- Finding decorator patterns: Use `find_http_routes` (specialized) or `Read`
- Analyzing main entrypoints: Read `main.py`, `main.go`, etc. directly
Fast operations (<10ms):
- `semantic_search_symbols` (in-memory vector cache, 0.2ms average)
- `get_file_skeleton` (signature extraction only)
- `reindex_project` (incremental, no file changes)
Moderate operations (10–100ms):
- `get_code_by_symbol` with `depth=1` (BFS dependency resolution)
- `search_code_patterns` with broad regex (SQL LIKE prefiltering helps)
Slow operations (>100ms):
- `get_blast_radius` with `depth=5` (BFS through large call graphs)
- `reindex_project` after modifying 50+ files (re-parses all changed files)
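To see why deep blast-radius queries cost more, consider a depth-limited BFS over reverse (caller) edges. The sketch below is an assumption-laden illustration: the `blast_radius` function, the adjacency dictionary, and the symbol names are hypothetical and do not mirror context-link's storage layer.

```python
from collections import deque


def blast_radius(callers: dict[str, list[str]], root: str, max_depth: int) -> dict[str, int]:
    """BFS over reverse edges; returns each affected symbol's depth from root."""
    depths: dict[str, int] = {}
    queue = deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for caller in callers.get(name, []):
            if caller != root and caller not in depths:
                depths[caller] = depth + 1
                queue.append((caller, depth + 1))
    return depths


# callers[x] lists the symbols that call x (hypothetical graph).
callers = {
    "hashFile": ["Walk", "Rehash"],
    "Walk": ["IndexProject"],
    "IndexProject": ["main"],
}

print(blast_radius(callers, "hashFile", max_depth=2))
```

Each extra level of depth expands every symbol found at the previous level, so the work grows with the number of reachable callers rather than with the depth value alone.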
Tip: Use get_file_skeleton before get_code_by_symbol to understand file structure first; this avoids guessing symbol names.
- Overloaded symbols: If multiple symbols share the same name (e.g., `validate` in different classes), `get_code_by_symbol(symbol_name: "validate")` returns the first match by insertion order. Use qualified names (`ClassName.validate`) for disambiguation.
- Stale embeddings: If you switch between Model2Vec (128-dim) and ONNX (384-dim) embeddings, you must run `index --force` to regenerate all embeddings. Dimension mismatch causes search to fail silently.
- Case sensitivity: Symbol names are case-sensitive. `GetUser` and `getUser` are distinct. Use `semantic_search_symbols` with natural language queries if unsure of exact casing.
- Multi-repo namespacing: When indexing multiple repos into one database (`--repo-name`), always specify `repo_name` in queries. Omitting it searches across all repos, which may return unexpected matches.
Built-in Observability:
Every tool response includes real-time token savings metrics in the metadata field:
{
"metadata": {
"timing_ms": 18,
"tokens_saved_est": 4200,
"cost_avoided_est": "$0.05",
"session_tokens_saved": 12500,
"session_cost_avoided": "$0.15"
}
}
Use these metrics to verify actual savings during your workflow. If tokens_saved_est is low or negative, you may be using the wrong tool for the task (e.g., searching for file-level patterns with search_code_patterns instead of Read).
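One way to act on these metrics is to scan collected responses and flag the calls that saved nothing. The helper below is a hypothetical sketch. It assumes responses were gathered as parsed JSON dictionaries and treats a missing `tokens_saved_est` field as zero.

```python
def check_savings(responses: list[dict]) -> list[int]:
    """Return indexes of responses whose tokens_saved_est is zero or negative."""
    flagged = []
    for i, resp in enumerate(responses):
        saved = resp.get("metadata", {}).get("tokens_saved_est", 0)
        if saved <= 0:
            flagged.append(i)
    return flagged


responses = [
    {"metadata": {"timing_ms": 18, "tokens_saved_est": 4200}},
    {"metadata": {"timing_ms": 9, "tokens_saved_est": -150}},
    {"metadata": {"timing_ms": 1}},  # field absent: counted as no savings
]

print(check_savings(responses))
```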
DO:
- ✅ Use `semantic_search_symbols` for discovery (metadata only, ~388 tokens for 10 results)
- ✅ Use `get_file_skeleton` to understand file structure (signatures only, <200 tokens)
- ✅ Batch operations when inspecting multiple symbols (`get_code_by_symbol` with arrays)
- ✅ Trust `result_count` fields: if a search returns 2 matches, there are exactly 2 matches
- ✅ Check `memories` arrays for prior findings before re-analyzing code
- ✅ Monitor `tokens_saved_est` in responses; it's your efficiency compass
DON'T:
- ❌ Read entire files for symbol lookup (defeats the purpose)
- ❌ Search for file-level patterns with `search_code_patterns` (use `Read` instead)
- ❌ Expect compound queries ("callers that don't X"); chain tools manually
- ❌ Re-index unnecessarily: incremental updates are fast, `--force` is slow
- ❌ Ignore low or negative `tokens_saved_est`; it signals you're using the wrong approach
Token Savings Rule of Thumb:
- Single symbol lookup: 91% reduction vs. reading full file
- Symbol + dependencies (depth=1): 86% reduction
- Aggregate discovery (search → skeleton → extract): 80–85% reduction
To reach the 70-80% token reduction reported in user feedback for a real-world audit task, use the right tool for each job: semantic search for discovery, skeleton for structure, code extraction for implementation, and direct file reads only for file-level patterns.
- Import the grammar: add the Tree-sitter C binding (e.g., smacker/go-tree-sitter/python)
- Write query files: create .scm queries in internal/indexer/adapters/queries/
- Implement the adapter: satisfy the LanguageAdapter interface in internal/indexer/adapters/
- Register it: call registry.Register(adapter) in buildLanguageRegistry() in cmd/context-link/main.go
- Add fixtures: create testdata/langs/<lang>/ and update golden snapshots
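The "implement" and "register" steps above can be pictured with a deliberately simplified sketch. The interface shape here (Name and Extensions), the Registry type, and ForFile are assumptions made for illustration. The real LanguageAdapter interface lives in internal/indexer/adapters/ and may look quite different.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// LanguageAdapter is a hypothetical, reduced shape of the real interface.
type LanguageAdapter interface {
	Name() string
	Extensions() []string
}

// Registry maps lower-cased file extensions to adapters (hypothetical).
type Registry struct {
	byExt map[string]LanguageAdapter
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{byExt: make(map[string]LanguageAdapter)}
}

// Register makes the adapter responsible for each of its extensions.
func (r *Registry) Register(a LanguageAdapter) {
	for _, ext := range a.Extensions() {
		r.byExt[strings.ToLower(ext)] = a
	}
}

// ForFile returns the adapter for a path, or false if none is registered.
func (r *Registry) ForFile(path string) (LanguageAdapter, bool) {
	a, ok := r.byExt[strings.ToLower(filepath.Ext(path))]
	return a, ok
}

// pythonAdapter is a stand-in adapter with no parsing logic.
type pythonAdapter struct{}

func (pythonAdapter) Name() string         { return "python" }
func (pythonAdapter) Extensions() []string { return []string{".py"} }

func main() {
	registry := NewRegistry()
	registry.Register(pythonAdapter{})
	if a, ok := registry.ForFile("app/models.py"); ok {
		fmt.Println("adapter:", a.Name())
	}
}
```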
See ARCHITECTURE.md for the full component map and project structure.
Measured after Phase 5 optimizations (batch DB operations, eliminated double-parsing). All benchmarks run on a single machine with SQLite WAL mode.
| Repository | Language | Files | Symbols | Embeddings | Index Time | DB Size |
|---|---|---|---|---|---|---|
| context-link (self) | Go | 59 | 539 | 528 | 1.05s | 1.5 MB |
| echo | Go | 90 | 1,901 | 1,576 | 1.85s | 3.1 MB |
| gin | Go | 99 | 1,892 | 1,722 | 3.54s | 2.4 MB |
| tRPC | TypeScript | 381 | 772 | 767 | 6.82s | — |
Measured on context-link (560 symbols, 545 embeddings), 100 iterations per query, 10 diverse queries.
| Mode | P50 | Avg | Description |
|---|---|---|---|
| Uncached (DB scan) | 1,880µs | 1,914µs | Full SQLite BLOB read + decode per query |
| Cached cold | 2,180µs | 2,159µs | First call loads vectors into memory |
| Cached warm | 197µs | 187µs | In-memory KNN dot-product scan |
| End-to-end | 202µs | 196µs | Embed query + cached KNN search |
150x improvement over pre-optimization baseline (~30ms → 0.2ms). The in-memory vector cache eliminates all SQLite I/O after the first query.
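The warm-cache path is described as an in-memory KNN dot-product scan. The sketch below shows one common way to write such a scan, using a size-k min-heap so that only the best k candidates are kept. All names are illustrative, and the actual implementation may differ.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// scored pairs a symbol name with its similarity score.
type scored struct {
	Name  string
	Score float32
}

// minHeap keeps the lowest score at the root so it can be evicted cheaply.
type minHeap []scored

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(scored)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// dot assumes both vectors have the same dimension.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// topK scans every cached vector once and returns the k best matches,
// highest score first.
func topK(query []float32, names []string, vectors [][]float32, k int) []scored {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for i, v := range vectors {
		s := scored{Name: names[i], Score: dot(query, v)}
		if h.Len() < k {
			heap.Push(h, s)
		} else if s.Score > (*h)[0].Score {
			(*h)[0] = s // replace the current worst candidate
			heap.Fix(h, 0)
		}
	}
	out := make([]scored, h.Len())
	copy(out, *h)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	names := []string{"Static", "Param", "serveFile"}
	vectors := [][]float32{{1, 0}, {0, 1}, {0.5, 0.5}}
	for _, r := range topK([]float32{1, 0}, names, vectors, 2) {
		fmt.Println(r.Name, r.Score)
	}
}
```

With n cached vectors the scan costs O(n log k) comparisons and touches no storage, which is consistent with the warm numbers being dominated by the dot products themselves.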
Example queries and top results:
| Repo | Query | Top Results |
|---|---|---|
| echo | "static file serving" | Static, StaticFileHandler, serveFile |
| gin | "route parameter binding" | Param, Params, ShouldBindUri |
| tRPC | "middleware pipeline" | createMiddlewareFactory, createBuilder |
| context-link | "tree-sitter parsing" | processFile, extractSymbolsAndDeps |
context-link is optimized for low latency so that context retrieval does not become the bottleneck for your AI agent.
Core Benchmarks:
- 150x Search Speedup: Semantic search latency reduced from ~30ms to 0.2ms via in-memory vector caching and heap-based Top-K selection.
- 4.9x Indexing Speedup: Parallel file hashing and SQLite PRAGMA tuning process large repositories in ~1 second.
- Instant Incremental Updates: Re-indexing detects changes and updates the graph in <10ms.
See ARCHITECTURE.md for detailed breakdown.
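Parallel hashing and change detection can be sketched as follows. The use of SHA-256, one goroutine per file, and in-memory file contents are assumptions that keep the example self-contained. The README does not state which hash or concurrency model context-link uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"sync"
)

// hashAll hashes each file's contents in its own goroutine.
// A real indexer would read from disk and likely bound concurrency
// with a worker pool.
func hashAll(files map[string][]byte) map[string]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]string, len(files))
	)
	for path, data := range files {
		wg.Add(1)
		go func(path string, data []byte) {
			defer wg.Done()
			sum := sha256.Sum256(data)
			mu.Lock()
			out[path] = hex.EncodeToString(sum[:])
			mu.Unlock()
		}(path, data)
	}
	wg.Wait()
	return out
}

// changedPaths returns the sorted paths whose hash is new or differs from
// the previous run. Deleted files are not handled in this sketch.
func changedPaths(prev, curr map[string]string) []string {
	var changed []string
	for path, h := range curr {
		if prev[path] != h {
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	prev := hashAll(map[string][]byte{"a.go": []byte("v1"), "b.go": []byte("v1")})
	curr := hashAll(map[string][]byte{"a.go": []byte("v1"), "b.go": []byte("v2")})
	fmt.Println("re-index:", changedPaths(prev, curr))
}
```

Only the paths returned by changedPaths need to be re-parsed, which is what keeps an incremental run cheap compared to a full --force rebuild.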
A semantic_search_symbols call returning 10 results averages ~388 tokens — this is a metadata-only listing (symbol names, kinds, file paths, similarity scores) with no code. The full end-to-end flow (search → extract top result with dependencies) compares favorably to reading source files directly:
| Scenario | Avg Tokens | vs. Full File |
|---|---|---|
| Search only (10 results) | ~388 | Discovery without reading any file |
| Search + get_code_by_symbol (depth=0) | ~615 | Top result extraction |
| Search + get_code_by_symbol (depth=1) | ~691 | Top result with dependencies |
| Reading the full source file | ~867 | Baseline comparison |
Measured on context-link itself (59 files, 1,056 symbols). A typical get_code_by_symbol call returns ~9% of what reading the full file would require.
For large files the savings are dramatic (86%+ reduction for a 3,200-token file). The comparison is conservative — in practice, an agent without semantic search would read multiple files to find the right symbol, making real-world savings significantly higher.
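The reduction percentages quoted here follow from a simple ratio of tokens served to tokens in the full file. The sketch below works that arithmetic. The 448-token response size is not a measured value; it is simply what an 86% reduction on a 3,200-token file implies.

```go
package main

import "fmt"

// reductionPct returns the percentage of tokens saved when `served` tokens
// are returned instead of the `full` file.
func reductionPct(full, served int) float64 {
	if full <= 0 {
		return 0
	}
	return 100 * float64(full-served) / float64(full)
}

func main() {
	// Illustrative numbers: a 3,200-token file answered with 448 tokens.
	fmt.Printf("%.1f%% reduction\n", reductionPct(3200, 448))
}
```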
- Zero external network calls. The binary functions fully air-gapped.
- All file paths are validated against the project root to prevent traversal attacks.
- SQLite database is created with 0600 permissions (owner read/write only).
- All SQL queries use parameterized statements.
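Validating paths against the project root can be done lexically with the standard library. The following is a minimal sketch of that idea and not context-link's actual check. The function name resolveWithinRoot is hypothetical, and symlinks are not resolved here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveWithinRoot joins a requested path onto root and rejects any result
// that escapes root. This is a lexical check only.
func resolveWithinRoot(root, requested string) (string, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	// Join cleans the result, collapsing any ".." segments.
	target := filepath.Join(absRoot, requested)
	rel, err := filepath.Rel(absRoot, target)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes project root", requested)
	}
	return target, nil
}

func main() {
	for _, p := range []string{"internal/indexer/a.go", "../../etc/passwd"} {
		if resolved, err := resolveWithinRoot("/srv/project", p); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("ok:", resolved)
		}
	}
}
```

A stricter variant would also call filepath.EvalSymlinks on the target before comparing, so that a symlink inside the root cannot point outside it.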
Apache-2.0 — see LICENSE.