UnitBuilds-CC/MCP-Lite

Agentic Browser (MCP Lite) 🚀

A highly efficient, standalone, graph-native version of the Agentic Browser MCP Server. It gives an LLM direct control over Google Chrome via the Model Context Protocol (MCP) using chromedp. It integrates with Neo4j to build site graphs, query shortest action paths, and automate complex workflows with looping and parameter binding.

Features

  1. Raw MCP Browser Automation: Perform standard interactions (navigate, click, type, scroll, wait, drag, screenshot, list frames) using standard logical pixel offsets.
  2. Neo4j Graph Logging: Automatically logs page navigation data (titles, links, scripts, and cookies) and AOM state-transition graphs directly to a Neo4j database.
  3. BFS Site Crawling: Traverses a site using Breadth-First Search (BFS) to map structural accessibility states and link action transitions.
  4. Shortest Path Execution: Query Neo4j for the shortest sequence of actions from any start state/URL to a target state/URL, and execute the transitions automatically.
  5. Workflow Recording & Execution: Record sequences of interactions and replay them. Supports looping over iterations and binding parameters from a dataset (e.g., running a bulk task such as 1000+ tax returns automatically).

Quantitative Performance Benchmarks 📊

We benchmarked the AOM (Accessibility Object Model) context extraction of MCP Lite against standard full-HTML extraction methods (the default way standard LLMs fetch web pages) across different domains.

You can run the official benchmark tool locally at any time:

# Build the benchmarking utility (if needed)
go build -o benchmark.exe ./cmd/benchmark/...

# Run benchmark against any URL
.\benchmark.exe -url https://github.com

Benchmark Results

| Target URL | Metric | Standard Full-HTML | MCP Lite (AOM) | Context Reduction / Speedup |
|---|---|---|---|---|
| wikipedia.org | Payload Size / Est. Tokens | 435 KB / 108,863 | 47 KB / 11,956 | 89.02% reduction in LLM input cost and context usage |
| wikipedia.org | Latency (3 Steps) | 9.0s (100% Agent Loop) | 3.0s (0% Agent Loop) | 3.0x execution speedup using workflow scripts |
| github.com | Payload Size / Est. Tokens | 594 KB / 148,625 | 19 KB / 4,774 | 96.79% reduction (Perfect for large modern SPAs) |
| github.com | Latency (3 Steps) | 9.0s (100% Agent Loop) | 3.0s (0% Agent Loop) | 3.0x execution speedup |

Note: Tokens are estimated using a ratio of 4 bytes per token. The full-HTML baseline assumes the LLM receives the raw outer HTML and reasons in a sequential loop, deciding on each step in turn. Workflow script playback executes the recorded steps natively in the browser, with no round-trips to the LLM during execution.

Architectural Comparison: How We Compare to Traditional Browsing Methods

| Capability / Metric | Puppeteer/Playwright HTML Scraping | VLM-based Screenshotting | MCP Lite (AOM + Workflows) |
|---|---|---|---|
| Payload Size & Cost | Very High (100K+ tokens of raw HTML, scripts, CSS tags, inline SVG/styles) | High (Requires uploading large images, high VLM tile/token cost) | Extremely Low (Average 90%+ token reduction, text-only pruned AOM) |
| Execution Latency | Slow (Requires full DOM parsing and round-trip LLM loop for every step) | Very Slow (High image upload latency + slow VLM vision reasoning times) | Instant (Local workflow playbacks execute actions with 0% LLM overhead) |
| Interaction Accuracy | Moderate (Prone to clicking wrong selectors, dynamic classes, or hidden elements) | Low (VLM coordinate hallucination on high-density or scrollable pages) | Perfect (Surgical clicks on backend-bound logical AOM nodeIds) |
| Page Layout Mapping | Poor (Requires LLMs to build page relationships from flat code trees) | Fair (Visual layout is clear, but lacks structured semantic hierarchies) | Excellent (Deterministic AOM state-hashing logs paths to Neo4j graph) |
| Agent Loop Overhead | 100% (Requires the LLM to process and decide at every single action step) | 100% (Requires the LLM to process and decide at every single action step) | 0% (Recorded workflows execute locally with dynamic dataset variables) |

Installation & Setup

Prerequisites

  • Go 1.24 or newer
  • Google Chrome (standard installation path)
  • Neo4j Database (optional, defaults to bolt://localhost:7687)

Clone & Build

Since this project is standalone, navigate to the mcp-lite directory and build:

cd mcp-lite
go mod tidy
go build -o mcp-server.exe ./cmd/mcp/...

Running with Neo4j

Configure the connection properties using environment variables before executing:

# Windows (PowerShell)
$env:NEO4J_URI="bolt://localhost:7687"
$env:NEO4J_USER="neo4j"
$env:NEO4J_PASS="agentic_secure_password"
.\mcp-server.exe

If Neo4j is not running or is unreachable, the server logs a warning to mcp_debug.log and automatically falls back to Standalone Browser Mode. In this mode all normal browser MCP tools keep working, while Neo4j-dependent tools return warnings.


Model Context Protocol (MCP) Integration

To use this server inside an MCP client (such as Claude Desktop), add the following configuration to your claude_desktop_config.json:

{
  "mcpServers": {
    "agentic-browser-mcp-lite": {
      "command": "E:\\LLM-Browser\\mcp-lite\\mcp-server.exe",
      "env": {
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USER": "neo4j",
        "NEO4J_PASS": "agentic_secure_password"
      }
    }
  }
}

Available Tools

1. Standard Browser Navigation & Actions

  • navigate (url): Navigate to a webpage and retrieve its accessibility tree (AOM).
  • fast_navigate (url): Navigate without rendering the AOM.
  • get_aom (withSpatial, withStyles): Retrieve the current pruned AOM.
  • click (nodeId): Perform a surgical logical click on an AOM node.
  • type_text (nodeId, text): Type text into a textbox.
  • scroll (direction, amount): Scroll the viewport.
  • wait (ms): Wait for the specified number of milliseconds.
  • press_key (key): Send a keyboard keystroke (e.g. Enter).
  • take_screenshot: Capture a PNG of the viewport.
  • take_node_screenshot (nodeId): Take a cropped screenshot of a specific element.
  • inspect_node (nodeId): Retrieve computed CSS styling and coordinates.
  • click_xy (x, y) / drag_xy (x1, y1, x2, y2): Coordinate-based mouse actions.
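MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for `navigate`. The envelope follows the MCP specification; the argument name `url` comes from the tool list above, while the Go type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall models a JSON-RPC 2.0 request using the MCP "tools/call" method.
type toolCall struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// callParams carries the tool name and its named arguments.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newToolCall serializes a tools/call request for the given tool.
func newToolCall(id int, name string, args map[string]any) ([]byte, error) {
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: name, Arguments: args},
	})
}

func main() {
	req, err := newToolCall(1, "navigate", map[string]any{"url": "https://example.com"})
	if err != nil {
		panic(err)
	}
	// An MCP client would write this line to the server's stdin.
	fmt.Println(string(req))
}
```

In practice an MCP client such as Claude Desktop generates these messages itself; the example is only meant to show what travels over the wire.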

2. Workflow Recording & Scripting

  • workflow_record_start: Start recording user actions.
  • workflow_record_stop (name): Stop recording and save to the ./workflows/ folder as JSON.
  • workflow_list: List saved workflows.
  • workflow_delete (name): Remove a saved workflow.
  • workflow_play (name, iterations, dataset): Replay a workflow script.
    • If a dataset is provided (array of JSON objects), the workflow will loop and bind variable placeholders ({{variable_name}}) to the values in each row.

3. Graphing & BFS Site Traversal

  • graph_start_crawl (url, maxDepth): Start BFS crawler mapping the site graph to Neo4j.
  • graph_get_shortest_path (start, end): Find action steps between states or URLs.
  • graph_traverse_path (startUrl, steps): Replay steps on the browser context.

Directory Structure

  • cmd/mcp/main.go: JSON-RPC protocol parser and crawler orchestrator.
  • cmd/benchmark/main.go: Benchmarking tool comparing standard browser payloads with AOM.
  • cmd/test_mcp_lite/main.go: End-to-end local validation test script.
  • pkg/browser/: Browser CDP session control, stealth emulation, and AOM serialization.
  • pkg/graph/: Neo4j graph database interface and action mapping logic.
  • extension/: Stealth / fingerprint-masking extension injected into Chrome.
  • workflows/: Local workspace folder storing JSON workflow templates.

Technical Details

1. Stealth Architecture & Anti-Bot Bypass

The browser layer automatically registers an anti-bot injection extension (extension/) at startup. This extension injects a high-priority content script (inject.js) into all frames before any page scripts load, hardening Chrome against bot detectors:

  • navigator.webdriver Masking: Replaces the default webdriver getter with a mocked version. Its .toString() representation is fully sanitized to mimic a native-code function (function get webdriver() { [native code] }), bypassing advanced string-comparison tests.
  • Plugin Prototype Alignment: Mocks navigator.plugins and navigator.mimeTypes using actual prototype links to PluginArray.prototype and Plugin.prototype. This ensures checks like navigator.plugins instanceof PluginArray resolve to true.
  • Permissions API Mocking: Hardens Notification.permission and navigator.permissions.query to always return prompt rather than denied or inconsistent state flags common in automated browsers.
  • WebRTC & Device Info: Controls hardware concurrency reports and WebGL renderer descriptors to resemble a real user's environment.

2. Neo4j Graph Schema

When Neo4j logging is enabled, every interaction dynamically constructs and expands a site mapping graph using a state-transition model:

graph LR
    A["(:State {url, hash, title})"] -- "[:ACTION {type: 'click', target: '280', label: 'Go'}]" --> B["(:State {url, hash, title})"]
  • State Node: Represents a unique page state defined by its URL, page title, and a deterministic SHA-256 hash of its structural AOM hierarchy.
  • ACTION Edge: Captures the transition path between states. Attributes logged include action type (click, type, etc.), target nodeId, coordinate positions (x, y), values typed, and timestamp metrics.
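One way to derive a deterministic structural hash is to serialize the tree depth-first and feed the result to SHA-256. The following is a sketch under stated assumptions: the `Node` type is a simplified, hypothetical AOM node, and hashing only roles and nesting (ignoring names and text) is an illustrative choice, not necessarily what the project hashes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Node is a simplified, hypothetical accessibility-tree node.
type Node struct {
	Role     string
	Name     string
	Children []*Node
}

// writeStructure serializes roles and nesting depth-first in document
// order, so the same structure always produces the same string.
func writeStructure(sb *strings.Builder, n *Node) {
	sb.WriteString(n.Role)
	sb.WriteByte('(')
	for _, c := range n.Children {
		writeStructure(sb, c)
	}
	sb.WriteByte(')')
}

// stateHash returns the hex-encoded SHA-256 digest of the tree structure.
// Node names are deliberately excluded in this sketch.
func stateHash(root *Node) string {
	var sb strings.Builder
	writeStructure(&sb, root)
	sum := sha256.Sum256([]byte(sb.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	page := &Node{Role: "document", Children: []*Node{
		{Role: "textbox", Name: "Username"},
		{Role: "button", Name: "Go"},
	}}
	fmt.Println(stateHash(page))
}
```

Because only structure is hashed here, two visits to the same page with different text content would map to the same State node, which is one plausible reading of "structural AOM hierarchy".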

This graph model enables the BFS crawling engine to automatically query optimal navigation paths via Cypher shortest-path queries.
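To illustrate what a shortest action path means in this model, here is a small in-memory breadth-first search over states and actions. It is a stand-in for explanation only: the server is described as delegating this to Neo4j via Cypher, and the `Action` type, state names, and node IDs below are hypothetical.

```go
package main

import "fmt"

// Action is a hypothetical edge: performing Type on Target leads to state To.
type Action struct {
	Type, Target, To string
}

// shortestPath runs BFS over the state graph and returns the fewest
// actions needed to get from start to end, or false if end is unreachable.
func shortestPath(graph map[string][]Action, start, end string) ([]Action, bool) {
	if start == end {
		return nil, true
	}
	type hop struct {
		from string
		via  Action
	}
	prev := map[string]hop{} // how each state was first reached
	visited := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, a := range graph[cur] {
			if visited[a.To] {
				continue
			}
			visited[a.To] = true
			prev[a.To] = hop{from: cur, via: a}
			if a.To == end {
				// Walk backwards from end to start, prepending each action.
				var path []Action
				for s := end; s != start; s = prev[s].from {
					path = append([]Action{prev[s].via}, path...)
				}
				return path, true
			}
			queue = append(queue, a.To)
		}
	}
	return nil, false
}

func main() {
	graph := map[string][]Action{
		"home":  {{"click", "12", "about"}, {"click", "15", "docs"}},
		"about": {{"click", "40", "team"}},
		"team":  {{"click", "50", "api"}},
		"docs":  {{"click", "30", "api"}},
	}
	path, ok := shortestPath(graph, "home", "api")
	fmt.Println(ok, path)
}
```

BFS guarantees the fewest transitions because it explores states in order of distance from the start, which is the same property an unweighted shortest-path query provides.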

3. Workflow JSON Structure & Variables

Recorded workflows are stored in JSON format inside the ./workflows/ folder. Placeholders inside workflow scripts can be dynamically bound to dataset values.

Example workflows/login_test.json:

{
  "name": "login_test",
  "steps": [
    {
      "type": "navigate",
      "url": "https://example.com/login"
    },
    {
      "type": "type_text",
      "nodeId": "145",
      "text": "{{username}}"
    },
    {
      "type": "type_text",
      "nodeId": "146",
      "text": "{{password}}"
    },
    {
      "type": "click",
      "nodeId": "148"
    }
  ]
}

When playing this workflow with a dataset:

[
  {"username": "user1", "password": "passSecure1"},
  {"username": "user2", "password": "passSecure2"}
]

The engine runs the sequence twice, dynamically interpolating the double-brace placeholders for each loop iteration.
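The placeholder binding can be sketched as a regular-expression substitution applied once per dataset row. This is an illustration, not the engine's code: `bind` is a hypothetical helper, and leaving unknown placeholders untouched is an assumption about how missing keys are handled.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches double-brace variables such as {{username}}.
var placeholder = regexp.MustCompile(`\{\{(\w+)\}\}`)

// bind replaces every {{name}} in text with the matching value from row.
// Placeholders with no matching key are left as-is (an assumption).
func bind(text string, row map[string]string) string {
	return placeholder.ReplaceAllStringFunc(text, func(m string) string {
		key := placeholder.FindStringSubmatch(m)[1]
		if v, ok := row[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	// The same dataset as in the example above.
	dataset := []map[string]string{
		{"username": "user1", "password": "passSecure1"},
		{"username": "user2", "password": "passSecure2"},
	}
	// One iteration per row, binding the "text" fields of the type_text steps.
	for i, row := range dataset {
		fmt.Printf("iteration %d: type %q then %q\n",
			i+1, bind("{{username}}", row), bind("{{password}}", row))
	}
}
```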

4. Local Testing & Verification

You can run the end-to-end verification client to confirm that the local MCP server builds and passes the standard and stealth checks:

# Build the test harness
go build -o test_mcp_lite.exe ./cmd/test_mcp_lite/...

# Execute tests
.\test_mcp_lite.exe

This automatically initializes mcp-server.exe, runs handshake validation, creates a mock workflow, replays it, and tears down test files.

