This project is a prototype for reducing LLM context cost when analyzing large TypeScript repositories.
Instead of loading full source files up front, it builds a compact structural index of exports and reads implementation details only when needed.
Large monorepos can consume a massive number of tokens before an assistant answers a single question. This project demonstrates a practical workflow to keep that cost predictable:
- Build a typed repository skeleton from exported symbols.
- Let the agent reason over the skeleton first.
- Read only the files and line ranges needed for deeper answers.
The codebase currently has three primary pieces:

- `src/repoIndex.ts`: Walks a target directory, parses TypeScript with `ts-morph`, and extracts exported signatures for functions, classes, interfaces, type aliases, and enums.
- `src/LazyFileReader.ts`: Reads file content on demand with controls for maximum lines, optional line ranges, and symbols-only mode. It also enforces a base directory boundary to prevent path traversal.
- `src/demo.ts`: End-to-end demonstration script. It builds the skeleton, simulates selective file reads, and logs benchmark output to `benchmark-results.json`.
The project now also includes a standard Gemini CLI extension manifest at the repository root, so the repo can be linked directly as an extension during development.
It now includes layered caching for both repository indexing and on-demand file reads:

- In-memory + disk cache for `repo_index`
- In-memory cache for `read_file` raw snapshots and symbols-only views
- MCP tools for cache stats and explicit cache invalidation
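The layered-cache lookup order described above (memory first, then disk, then rebuild) can be sketched like this. This is a hedged illustration only: `cachedIndex` and its key scheme are hypothetical, though the content-hash fingerprint and `.cache`-style disk snapshot mirror the behavior the README describes.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Process-wide memory layer.
const memory = new Map<string, string>();

// Sketch: resolve a cached index by fingerprint, falling back from
// memory to disk, and rebuilding (and repopulating both layers) on a miss.
export function cachedIndex(
  key: string,
  content: string,          // stand-in for the indexable file contents
  cacheDir: string,
  build: () => string
): { layer: "memory" | "disk" | "miss"; value: string } {
  const fingerprint = createHash("sha256").update(content).digest("hex");
  const cacheKey = `${key}:${fingerprint}`;

  const hit = memory.get(cacheKey);
  if (hit !== undefined) return { layer: "memory", value: hit };

  const file = path.join(cacheDir, `${fingerprint}.json`);
  if (fs.existsSync(file)) {
    const value = JSON.parse(fs.readFileSync(file, "utf8")).value as string;
    memory.set(cacheKey, value);
    return { layer: "disk", value };
  }

  const value = build();
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(file, JSON.stringify({ value }));
  memory.set(cacheKey, value);
  return { layer: "miss", value };
}
```

Because the fingerprint is derived from content, any file change produces a new key and the stale entry is simply never consulted again, which matches the "invalidates and rebuilds automatically" behavior.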
```
src/
  repoIndex.ts
  LazyFileReader.ts
  demo.ts
gemini-extension.json
GEMINI.md
gemini-extension/
  mcp/mcp-server.example.json
  tools/index.ts
tests/
  repoIndex.test.ts
  mcpLazyServer.test.ts
benchmark-results.json
benchmark-results-mcp.json
```
Requirements:

- Node.js 18+
- npm

Install dependencies:

```
npm install
```

Build the project (required for extension runtime):

```
npm run build
```

Create local environment file:

```
copy .env.example .env
```

Then set your key in `.env`:

```
GEMINI_API_KEY=your-key-here
```

PowerShell alternative (session-only):

```powershell
$env:GEMINI_API_KEY = "your-key-here"
```

This method runs everything locally from this repository and is the baseline implementation.
Run the demo against a target directory:

```
npx tsx src/demo.ts ./src "What are the main exports in this codebase?"
```

Arguments:

- Arg 1: target directory (default: `.`)
- Arg 2: question string (default: a generic exports question)

What the demo does:

- Builds an index of exported symbols.
- Estimates skeleton token cost vs naive full-read cost.
- Simulates reading only selected files.
- Appends a run record to `benchmark-results.json`.
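The token-cost comparison can be sketched with a common rough heuristic of about four characters per token. This is illustrative only: the actual estimator in `src/demo.ts` may differ, and `estimateTokens`/`savings` are hypothetical names.

```typescript
// Rough heuristic: ~4 characters per token (an assumption, not the
// demo's exact estimator).
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Compare a naive full-read character budget against a skeleton budget.
export function savings(naiveChars: number, skeletonChars: number) {
  const naive = Math.ceil(naiveChars / 4);
  const skeleton = Math.ceil(skeletonChars / 4);
  return {
    naive,
    skeleton,
    savedPct: Math.round((1 - skeleton / naive) * 1000) / 10,
  };
}

console.log(savings(1_200_000, 56_000));
// → { naive: 300000, skeleton: 14000, savedPct: 95.3 }
```

The measured runs in `benchmark-results.json` show reductions in the same ballpark (hundreds of thousands of naive tokens down to the low tens of thousands).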
Option 2 moves repository reads to an MCP server so gemini-cli can call tools instead of loading large file sets directly into prompt context.
This is useful for questions like:
- How are messages sent in Rocket.Chat?
- How does user authentication work?
- How are permissions checked?
- What is the E2E encryption flow?
- `src/mcpLazyServer.ts`: MCP stdio server exposing four tools:
  - `repo_index` returns a typed skeleton for a target directory
  - `read_file` lazily fetches only the needed file content
  - `index_cache_stats` inspects index/read cache status
  - `index_cache_invalidate` clears stale cache state
- `gemini-extension/mcp/mcp-server.example.json`: Example MCP server registration file for gemini-cli style configurations.
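A registration file of this style typically looks like the following. This is a hedged sketch using the common `mcpServers` convention; the actual contents of `mcp-server.example.json` in this repo may differ in field names and paths.

```json
{
  "mcpServers": {
    "rocketChatLazyIndex": {
      "command": "node",
      "args": ["dist/mcpLazyServer.js"]
    }
  }
}
```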
Start the server:

```
npm run mcp:server
```

If you type `npm runmcp:server`, that command will fail. Use `npm run mcp:server`, with a space after `run`.
- Build the extension once:

  ```
  npm run build
  ```

- Link this repository as a Gemini extension:

  ```
  gemini extensions link .
  ```

- Restart gemini-cli.
- Verify the extension is active:

  ```
  gemini extensions list
  ```

- Ask gemini-cli to use MCP tools with an instruction like:

  ```
  Use MCP tools for code analysis.
  Call repo_index first for targetDir="<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>".
  Then call read_file only when implementation details are needed.
  ```
- Start Gemini CLI:

  ```
  gemini
  ```

- In the interactive prompt, ask a scoped question and force tool usage:

  ```
  How does message sending work in Rocket.Chat?
  Use MCP tools.
  Call repo_index first with targetDir="<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>".
  Then call read_file only for relevant files.
  ```
- Confirm the tool calls appear in the output (`repo_index`, then `read_file`).
- For deep architecture questions such as message flow, auth flow, permissions, and E2E encryption, keep the same pattern: `repo_index` once at the beginning, then `read_file` only for specific files and sections.
The `gemini-extension.json` manifest uses `${extensionPath}`, so it runs cross-platform without hardcoded absolute paths.
- Open PowerShell in this repo:

  ```
  cd "<ABSOLUTE_PATH_TO_CODE_ANALYZER>"
  ```

- Install dependencies once:

  ```
  npm install
  ```

- Build before starting the MCP server:

  ```
  npm run build
  ```

- Start the MCP server (correct command):

  ```
  npm run mcp:server
  ```

- If you typed `npm runmcp:server`, it fails because `run` and the script name must be separate.
- For extension-based integration, run `gemini extensions link .` once and restart gemini-cli.
- Ask one of your target questions and explicitly request MCP tool usage:

  ```
  How are messages sent in Rocket.Chat?
  Use MCP tools.
  Call repo_index first for targetDir="<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>".
  Then call read_file only for required files.
  ```

- Verify in the gemini-cli output that tool calls appear for `repo_index` and `read_file`.
- Record the MCP benchmark run separately:

  ```
  npx tsx src/demo.ts --mode mcp "<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>" "How are messages sent in Rocket.Chat?"
  ```

- MCP mode appends results to `benchmark-results-mcp.json` and keeps `benchmark-results.json` unchanged.
- First `repo_index` call on a target directory: cache miss (index build).
- Repeated `repo_index` call in the same process: memory cache hit.
- Repeated `repo_index` call after a restart with no relevant changes: disk cache hit.
- Any indexable file change: the cache invalidates and rebuilds automatically.
- Use `forceRefresh=true` in `repo_index` to bypass the cache manually.
- Use `index_cache_invalidate` to clear the index cache and optionally clear the `read_file` cache.

Cache metadata is returned in the `repo_index` response as:

- `cache.enabled`
- `cache.hit`
- `cache.layer`
- `cache.cacheFile`
- `cache.fingerprint`
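As a TypeScript shape, that metadata might look like the sketch below. The field names come from the list above; the optionality and exact types are assumptions, since the real response schema lives in `src/mcpLazyServer.ts`.

```typescript
// Hedged sketch of the cache metadata block in a repo_index response.
export interface IndexCacheMeta {
  enabled: boolean;
  hit: boolean;
  layer?: "memory" | "disk"; // assumed absent on a miss
  cacheFile?: string;        // disk snapshot path, when applicable
  fingerprint?: string;      // content hash used for invalidation
}

// Example fragment for a disk-cache hit (illustrative values).
const meta: IndexCacheMeta = {
  enabled: true,
  hit: true,
  layer: "disk",
  cacheFile: ".cache/repo-index/abc123.json",
  fingerprint: "abc123",
};
console.log(meta.hit && meta.layer); // "disk"
```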
- Start the server in one terminal:

  ```
  npm run mcp:server
  ```

- In gemini-cli, run a prompt that explicitly requests tool usage.
- You should see tool calls to `repo_index` and `read_file` instead of broad source dumps.
- Run the analysis with MCP enabled for your target question.
- Use `npx tsx src/demo.ts --mode mcp <targetDir> "<question>"` to append a run to `benchmark-results-mcp.json`.
- Keep `benchmark-results.json` as your local baseline and mock comparison.

The current MCP benchmark snapshot is included in `benchmark-results-mcp.json`.
Method 1 (Local sparse index + lazy reader):

- `benchmark-results.json` contains local baseline runs.
- Example measured run: 309,357 naive tokens reduced to 14,252 total session tokens.

Method 2 (MCP + Gemini CLI lazy loading):

- `benchmark-results-mcp.json` contains MCP-specific runs.
- The current snapshot preserves the same measured token profile while moving retrieval to MCP tool calls.
Why the reduction holds:

- Skeleton first: the model gets compact exported signatures instead of full source files.
- Lazy fetches: implementation is retrieved only when necessary.
- Scoped reads: `symbolsOnly`, `lineRange`, and `maxLines` keep payloads bounded.
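The bounding behavior of `lineRange` and `maxLines` can be sketched as below. This is a simplified stand-in, not the real `src/LazyFileReader.ts`: it omits symbols-only mode and the base-directory boundary, and `readBounded` is a hypothetical name. It assumes a 1-based inclusive `lineRange`, mirroring how such options are usually specified.

```typescript
import * as fs from "node:fs";

// Sketch: slice the file to an optional 1-based inclusive line range,
// then cap the result at maxLines so payloads stay bounded.
export function readBounded(
  filePath: string,
  opts: { lineRange?: [number, number]; maxLines?: number } = {}
): string {
  let lines = fs.readFileSync(filePath, "utf8").split("\n");
  if (opts.lineRange) {
    const [start, end] = opts.lineRange;
    lines = lines.slice(start - 1, end);
  }
  if (opts.maxLines !== undefined) {
    lines = lines.slice(0, opts.maxLines);
  }
  return lines.join("\n");
}
```

Applying the range before the cap means a caller can point at a specific region and still be protected from accidentally huge ranges.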
This project was validated against the Rocket.Chat codebase to trace how messages are sent through the system. The analysis demonstrates both the skeletal index approach and live MCP tool-calling.
The MCP server successfully extracted and analyzed the complete message pipeline:
- Entry Point (Meteor method): The client calls the `sendMessage` Meteor method, which performs initial checks, enforces rate limits, and triggers `executeSendMessage`.
- Validation & Preparation (`executeSendMessage`): This step validates the message size, ensures the room exists, checks timestamps, and confirms the sender's identity. It also verifies that the user has permission to send messages in the specific room.
- Core Logic (`sendMessage` function):
  - Apps-Engine hooks: triggers the `IPreMessageSentPrevent`, `IPreMessageSentExtend`, and `IPreMessageSentModify` events.
  - beforeSave hooks: executes various filters (bad words, markdown, mentions, etc.) through the `Message.beforeSave` service call.
  - Persistence: the message is inserted into the Messages collection.
  - Post-persistence Apps-Engine: triggers `IPostMessageSent` or `IPostSystemMessageSent`.
- Post-Save Actions (`afterSaveMessage`):
  - Callbacks: runs the `afterSaveMessage` callback, which includes `notifyUsersOnMessage`.
  - Notifications & updates: updates room activity trackers, adjusts user subscription unread counts/alerts, and broadcasts changes to clients via DDP (e.g., `notifyOnRoomChangedById`).
  - Service-level post-save: `Message.afterSave` handles additional asynchronous tasks like OEmbed link parsing.
The MCP server is running and successfully integrated with gemini-cli:
Configured MCP servers:
- rocketChatLazyIndex - Ready (4 tools)
Tools:
- `mcp_rocketChatLazyIndex_read_file`
- `mcp_rocketChatLazyIndex_repo_index`
- `mcp_rocketChatLazyIndex_index_cache_stats`
- `mcp_rocketChatLazyIndex_index_cache_invalidate`
- Latest measured run (`Rocket.Chat/apps/meteor/server`): 307,582 naive tokens reduced to 12,002 total session tokens.
- Files indexed: 148
- Index cache: enabled (`indexCacheHit: false` on the rebuild run)
- Session ID: f1718aad-c001-4b0f-9bbd-27b662c82aa0
- Tool Calls: 10 (9 successful, 1 duplicate)
- Success Rate: 90.0%
- Latest measured run (`Rocket.Chat/apps/meteor/server`): 307,582 naive tokens reduced to 11,595 total session tokens.
- Files indexed: 148
- Index cache: enabled (`indexCacheHit: true`)
- Wall Time: 2m 42s
- Agent Active: 47.7s
  - API Time: 24.0s (50.2%)
  - Tool Time: 23.7s (49.8%)
- Token Efficiency:
  - gemini-2.5-flash-lite: 1 request → 1,087 input tokens + 86 output tokens
  - gemini-3-flash-preview: 11 requests → 81,037 input tokens (207,415 from cache) + 1,412 output tokens
- Savings Highlight: 207,415 (71.6%) of input tokens were served from cache, directly demonstrating the lazy-loading efficiency of the MCP approach.
```
npm run demo
npm run mcp:server
npm run mcp:server:dev
npm test
npm run build
```

- Replace the mock loop in `src/demo.ts` with a live tool-calling flow so the model can decide when to call `read_file`.
- Add query intent routing (planned classifier layer) to scope indexing by domain before parsing, reducing the initial index size.
- Improve index fidelity with richer class details (constructors, overloads, visibility filters) while preserving compact output.
- Expand tests for `src/LazyFileReader.ts`, especially path boundary checks, symbols-only output, and line-range edge cases.
- Add optional TTL/size limits and cleanup for `.cache/repo-index` in long-running environments.
- Document a release checklist for publishing this extension with versioned GitHub releases.
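The path-boundary checks mentioned in the roadmap would exercise logic along these lines. This is a hedged, standalone sketch: the real check lives inside `src/LazyFileReader.ts`, and `isInsideBase` is an illustrative name.

```typescript
import * as path from "node:path";

// Sketch: resolve the requested path against the base directory and
// reject anything that escapes it (e.g. via "../" segments).
export function isInsideBase(baseDir: string, requested: string): boolean {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  return target === base || target.startsWith(base + path.sep);
}

console.log(isInsideBase("/repo", "src/demo.ts"));     // true
console.log(isInsideBase("/repo", "../outside/x.ts")); // false
```

Comparing against `base + path.sep` (rather than a bare prefix) avoids falsely accepting sibling directories like `/repo-extra`.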