This project is a prototype for reducing LLM context cost when analyzing large TypeScript repositories.
Instead of loading full source files up front, it builds a compact structural index of exports and reads implementation details only when needed.
Large monorepos can consume a massive number of tokens before an assistant answers a single question. This project demonstrates a practical workflow to keep that cost predictable:
- Build a typed repository skeleton from exported symbols.
- Let the agent reason over the skeleton first.
- Read only the files and line ranges needed for deeper answers.
The codebase currently has three primary pieces:

- `src/repoIndex.ts`: Walks a target directory, parses TypeScript with `ts-morph`, and extracts exported signatures for functions, classes, interfaces, type aliases, and enums.
- `src/LazyFileReader.ts`: Reads file content on demand with controls for maximum lines, optional line ranges, and symbols-only mode. It also enforces a base directory boundary to prevent path traversal.
- `src/demo.ts`: End-to-end demonstration script. It builds the skeleton, simulates selective file reads, and logs benchmark output to `benchmark-results.json`.
The project now also includes a standard Gemini CLI extension manifest at the repository root, so the repo can be linked directly as an extension during development.
It now includes layered caching for both repository indexing and on-demand file reads:

- In-memory + disk cache for `repo_index`
- In-memory cache for `read_file` raw snapshots and symbols-only views
- MCP tools for cache stats and explicit cache invalidation
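The layered-cache lookup order described above (memory first, then disk, then rebuild) can be sketched like this. This is a hedged illustration only: `cachedIndex` and its key scheme are hypothetical, though the content-hash fingerprint and `.cache`-style disk snapshot mirror the behavior the README describes.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Process-wide memory layer.
const memory = new Map<string, string>();

// Sketch: resolve a cached index by fingerprint, falling back from
// memory to disk, and rebuilding (and repopulating both layers) on a miss.
export function cachedIndex(
  key: string,
  content: string,          // stand-in for the indexable file contents
  cacheDir: string,
  build: () => string
): { layer: "memory" | "disk" | "miss"; value: string } {
  const fingerprint = createHash("sha256").update(content).digest("hex");
  const cacheKey = `${key}:${fingerprint}`;

  const hit = memory.get(cacheKey);
  if (hit !== undefined) return { layer: "memory", value: hit };

  const file = path.join(cacheDir, `${fingerprint}.json`);
  if (fs.existsSync(file)) {
    const value = JSON.parse(fs.readFileSync(file, "utf8")).value as string;
    memory.set(cacheKey, value);
    return { layer: "disk", value };
  }

  const value = build();
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(file, JSON.stringify({ value }));
  memory.set(cacheKey, value);
  return { layer: "miss", value };
}
```

Because the fingerprint is derived from content, any file change produces a new key and the stale entry is simply never consulted again, which matches the "invalidates and rebuilds automatically" behavior.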
```
src/
  repoIndex.ts
  LazyFileReader.ts
  demo.ts
gemini-extension.json
GEMINI.md
gemini-extension/
  mcp/mcp-server.example.json
  tools/index.ts
tests/
  repoIndex.test.ts
  mcpLazyServer.test.ts
benchmark-results.json
benchmark-results-mcp.json
```
Requirements:

- Node.js 18+
- npm

Install dependencies:

```
npm install
```

Build the project (required for extension runtime):

```
npm run build
```

Create local environment file:

```
copy .env.example .env
```

Then set your key in `.env`:

```
GEMINI_API_KEY=your-key-here
```

PowerShell alternative (session-only):

```powershell
$env:GEMINI_API_KEY = "your-key-here"
```

This method runs everything locally from this repository and is the baseline implementation.
Run the demo against a target directory:

```
npx tsx src/demo.ts ./src "What are the main exports in this codebase?"
```

Arguments:

- Arg 1: target directory (default: `.`)
- Arg 2: question string (default: a generic exports question)

What the demo does:

- Builds an index of exported symbols.
- Estimates skeleton token cost vs naive full-read cost.
- Simulates reading only selected files.
- Appends a run record to `benchmark-results.json`.
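The token-cost comparison can be sketched with a common rough heuristic of about four characters per token. This is illustrative only: the actual estimator in `src/demo.ts` may differ, and `estimateTokens`/`savings` are hypothetical names.

```typescript
// Rough heuristic: ~4 characters per token (an assumption, not the
// demo's exact estimator).
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Compare a naive full-read character budget against a skeleton budget.
export function savings(naiveChars: number, skeletonChars: number) {
  const naive = Math.ceil(naiveChars / 4);
  const skeleton = Math.ceil(skeletonChars / 4);
  return {
    naive,
    skeleton,
    savedPct: Math.round((1 - skeleton / naive) * 1000) / 10,
  };
}

console.log(savings(1_200_000, 56_000));
// → { naive: 300000, skeleton: 14000, savedPct: 95.3 }
```

The measured runs in `benchmark-results.json` show reductions in the same ballpark (hundreds of thousands of naive tokens down to the low tens of thousands).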
Option 2 moves repository reads to an MCP server so gemini-cli can call tools instead of loading large file sets directly into prompt context.
This is useful for questions like:
- How are messages sent in Rocket.Chat?
- How does user authentication work?
- How are permissions checked?
- What is the E2E encryption flow?
- `src/mcpLazyServer.ts`: MCP stdio server exposing four tools:
  - `repo_index` returns a typed skeleton for a target directory
  - `read_file` lazily fetches only the needed file content
  - `index_cache_stats` inspects index/read cache status
  - `index_cache_invalidate` clears stale cache state
- `gemini-extension/mcp/mcp-server.example.json`: Example MCP server registration file for gemini-cli style configurations.
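A registration file of this style typically looks like the following. This is a hedged sketch using the common `mcpServers` convention; the actual contents of `mcp-server.example.json` in this repo may differ in field names and paths.

```json
{
  "mcpServers": {
    "rocketChatLazyIndex": {
      "command": "node",
      "args": ["dist/mcpLazyServer.js"]
    }
  }
}
```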
Start the server:

```
npm run mcp:server
```

If you type `npm runmcp:server`, that command will fail. Use `npm run mcp:server`, with a space after `run`.
- Build the extension once:

  ```
  npm run build
  ```

- Link this repository as a Gemini extension:

  ```
  gemini extensions link .
  ```

- Restart gemini-cli.
- Verify the extension is active:

  ```
  gemini extensions list
  ```

- Ask gemini-cli to use MCP tools with an instruction like:

  ```
  Use MCP tools for code analysis.
  Call repo_index first for targetDir="<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>".
  Then call read_file only when implementation details are needed.
  ```
- Start Gemini CLI:

  ```
  gemini
  ```

- In the interactive prompt, ask a scoped question and force tool usage:

  ```
  How does message sending work in Rocket.Chat?
  Use MCP tools.
  Call repo_index first with targetDir="<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>".
  Then call read_file only for relevant files.
  ```
- Confirm the tool calls appear in the output (`repo_index`, then `read_file`).
- For deep architecture questions such as message flow, auth flow, permissions, and E2E encryption, keep the same pattern: `repo_index` once at the beginning, then `read_file` only for specific files and sections.
The `gemini-extension.json` manifest uses `${extensionPath}`, so it runs cross-platform without hardcoded absolute paths.
- Open PowerShell in this repo:

  ```
  cd "<ABSOLUTE_PATH_TO_CODE_ANALYZER>"
  ```

- Install dependencies once:

  ```
  npm install
  ```

- Build before starting the MCP server:

  ```
  npm run build
  ```

- Start the MCP server (correct command):

  ```
  npm run mcp:server
  ```

- If you typed `npm runmcp:server`, it fails because `run` and the script name must be separate.
- For extension-based integration, run `gemini extensions link .` once and restart gemini-cli.
- Ask one of your target questions and explicitly request MCP tool usage:

  ```
  How are messages sent in Rocket.Chat?
  Use MCP tools.
  Call repo_index first for targetDir="<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>".
  Then call read_file only for required files.
  ```

- Verify in the gemini-cli output that tool calls appear for `repo_index` and `read_file`.
- Record the MCP benchmark run separately:

  ```
  npx tsx src/demo.ts --mode mcp "<ABSOLUTE_PATH_TO_TARGET_REPO_SUBDIR>" "How are messages sent in Rocket.Chat?"
  ```

- MCP mode appends results to `benchmark-results-mcp.json` and keeps `benchmark-results.json` unchanged.
- First `repo_index` call on a target directory: cache miss (index build).
- Repeated `repo_index` call in the same process: memory cache hit.
- Repeated `repo_index` call after a restart with no relevant changes: disk cache hit.
- Any indexable file change: the cache invalidates and rebuilds automatically.
- Use `forceRefresh=true` in `repo_index` to bypass the cache manually.
- Use `index_cache_invalidate` to clear the index cache and optionally clear the `read_file` cache.

Cache metadata is returned in the `repo_index` response as:

- `cache.enabled`
- `cache.hit`
- `cache.layer`
- `cache.cacheFile`
- `cache.fingerprint`
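As a TypeScript shape, that metadata might look like the sketch below. The field names come from the list above; the optionality and exact types are assumptions, since the real response schema lives in `src/mcpLazyServer.ts`.

```typescript
// Hedged sketch of the cache metadata block in a repo_index response.
export interface IndexCacheMeta {
  enabled: boolean;
  hit: boolean;
  layer?: "memory" | "disk"; // assumed absent on a miss
  cacheFile?: string;        // disk snapshot path, when applicable
  fingerprint?: string;      // content hash used for invalidation
}

// Example fragment for a disk-cache hit (illustrative values).
const meta: IndexCacheMeta = {
  enabled: true,
  hit: true,
  layer: "disk",
  cacheFile: ".cache/repo-index/abc123.json",
  fingerprint: "abc123",
};
console.log(meta.hit && meta.layer); // "disk"
```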
- Start the server in one terminal:

  ```
  npm run mcp:server
  ```

- In gemini-cli, run a prompt that explicitly requests tool usage.
- You should see tool calls to `repo_index` and `read_file` instead of broad source dumps.
- Run the analysis with MCP enabled for your target question.
- Use `npx tsx src/demo.ts --mode mcp <targetDir> "<question>"` to append a run to `benchmark-results-mcp.json`.
- Keep `benchmark-results.json` as your local baseline and mock comparison.

The current MCP benchmark snapshot is included in `benchmark-results-mcp.json`.
Method 1 (Local sparse index + lazy reader):

- `benchmark-results.json` contains local baseline runs.
- Example measured run: 309,357 naive tokens reduced to 14,252 total session tokens.

Method 2 (MCP + Gemini CLI lazy loading):

- `benchmark-results-mcp.json` contains MCP-specific runs.
- The current snapshot preserves the same measured token profile while moving retrieval to MCP tool calls.
Why the reduction holds:

- Skeleton first: the model gets compact exported signatures instead of full source files.
- Lazy fetches: implementation is retrieved only when necessary.
- Scoped reads: `symbolsOnly`, `lineRange`, and `maxLines` keep payloads bounded.
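The bounding behavior of `lineRange` and `maxLines` can be sketched as below. This is a simplified stand-in, not the real `src/LazyFileReader.ts`: it omits symbols-only mode and the base-directory boundary, and `readBounded` is a hypothetical name. It assumes a 1-based inclusive `lineRange`, mirroring how such options are usually specified.

```typescript
import * as fs from "node:fs";

// Sketch: slice the file to an optional 1-based inclusive line range,
// then cap the result at maxLines so payloads stay bounded.
export function readBounded(
  filePath: string,
  opts: { lineRange?: [number, number]; maxLines?: number } = {}
): string {
  let lines = fs.readFileSync(filePath, "utf8").split("\n");
  if (opts.lineRange) {
    const [start, end] = opts.lineRange;
    lines = lines.slice(start - 1, end);
  }
  if (opts.maxLines !== undefined) {
    lines = lines.slice(0, opts.maxLines);
  }
  return lines.join("\n");
}
```

Applying the range before the cap means a caller can point at a specific region and still be protected from accidentally huge ranges.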
This project was validated against the Rocket.Chat codebase to trace how messages are sent through the system. The analysis demonstrates both the skeletal index approach and live MCP tool-calling.
The MCP server successfully extracted and analyzed the complete message pipeline:
- Entry Point (Meteor method): The client calls the `sendMessage` Meteor method, which performs initial checks, enforces rate limits, and triggers `executeSendMessage`.
- Validation & Preparation (`executeSendMessage`): This step validates the message size, ensures the room exists, checks timestamps, and confirms the sender's identity. It also verifies that the user has permission to send messages in the specific room.
- Core Logic (`sendMessage` function):
  - Apps-Engine hooks: triggers the `IPreMessageSentPrevent`, `IPreMessageSentExtend`, and `IPreMessageSentModify` events.
  - beforeSave hooks: executes various filters (bad words, markdown, mentions, etc.) through the `Message.beforeSave` service call.
  - Persistence: the message is inserted into the Messages collection.
  - Post-persistence Apps-Engine: triggers `IPostMessageSent` or `IPostSystemMessageSent`.
- Post-Save Actions (`afterSaveMessage`):
  - Callbacks: runs the `afterSaveMessage` callback, which includes `notifyUsersOnMessage`.
  - Notifications & updates: updates room activity trackers, adjusts user subscription unread counts/alerts, and broadcasts changes to clients via DDP (e.g., `notifyOnRoomChangedById`).
  - Service-level post-save: `Message.afterSave` handles additional asynchronous tasks like OEmbed link parsing.
The MCP server is running and successfully integrated with gemini-cli:
Configured MCP servers:
- rocketChatLazyIndex - Ready (4 tools)
Tools:
- `mcp_rocketChatLazyIndex_read_file`
- `mcp_rocketChatLazyIndex_repo_index`
- `mcp_rocketChatLazyIndex_index_cache_stats`
- `mcp_rocketChatLazyIndex_index_cache_invalidate`
- Latest measured run (`Rocket.Chat/apps/meteor/server`): 307,582 naive tokens reduced to 12,002 total session tokens.
- Files indexed: 148
- Index cache: enabled (`indexCacheHit: false` on the rebuild run)
- Session ID: f1718aad-c001-4b0f-9bbd-27b662c82aa0
- Tool Calls: 10 (9 successful, 1 duplicate)
- Success Rate: 90.0%
- Latest measured run (`Rocket.Chat/apps/meteor/server`): 307,582 naive tokens reduced to 11,595 total session tokens.
- Files indexed: 148
- Index cache: enabled (`indexCacheHit: true`)
- Wall Time: 2m 42s
- Agent Active: 47.7s
  - API Time: 24.0s (50.2%)
  - Tool Time: 23.7s (49.8%)
- Token Efficiency:
  - gemini-2.5-flash-lite: 1 request → 1,087 input tokens + 86 output tokens
  - gemini-3-flash-preview: 11 requests → 81,037 input tokens (207,415 from cache) + 1,412 output tokens
- Savings Highlight: 207,415 (71.6%) of input tokens were served from cache, directly demonstrating the lazy-loading efficiency of the MCP approach.
```
npm run demo
npm run mcp:server
npm run mcp:server:dev
npm test
npm run build
```

- Replace the mock loop in `src/demo.ts` with a live tool-calling flow so the model can decide when to call `read_file`.
- Add query intent routing (planned classifier layer) to scope indexing by domain before parsing, reducing the initial index size.
- Improve index fidelity with richer class details (constructors, overloads, visibility filters) while preserving compact output.
- Expand tests for `src/LazyFileReader.ts`, especially path boundary checks, symbols-only output, and line-range edge cases.
- Add optional TTL/size limits and cleanup for `.cache/repo-index` in long-running environments.
- Document a release checklist for publishing this extension with versioned GitHub releases.
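The path-boundary checks mentioned in the roadmap would exercise logic along these lines. This is a hedged, standalone sketch: the real check lives inside `src/LazyFileReader.ts`, and `isInsideBase` is an illustrative name.

```typescript
import * as path from "node:path";

// Sketch: resolve the requested path against the base directory and
// reject anything that escapes it (e.g. via "../" segments).
export function isInsideBase(baseDir: string, requested: string): boolean {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  return target === base || target.startsWith(base + path.sep);
}

console.log(isInsideBase("/repo", "src/demo.ts"));     // true
console.log(isInsideBase("/repo", "../outside/x.ts")); // false
```

Comparing against `base + path.sep` (rather than a bare prefix) avoids falsely accepting sibling directories like `/repo-extra`.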