Skip to content

dickwu/ai-web-debugger

Repository files navigation

AI Web Debugger

CI Release

AI Web Debugger is an Electron desktop application that embeds a sandboxed browser (via Chromium DevTools Protocol) and exposes a structured tool layer so a large-language model can observe and diagnose live web pages. The LLM never touches DOM, network, or storage directly — everything flows through a ToolRegistry with Zod schemas, risk labels, ActionPolicy gating, ContentBoundaryWrapper nonce isolation, and a Redactor that strips secrets before any data reaches the model. Designed for developers and QA engineers who want an AI co-pilot for debugging unfamiliar or flaky pages.

Quickstart

This project uses bun as the package manager and script runner.

# 1 — Install dependencies
bun install

# 2 — Start the app in development mode
bun run dev

# 3 — In a separate terminal, start the fixture server (optional but useful for testing)
bun run fixture

# 4 — In the app address bar navigate to:
#     http://127.0.0.1:4321/

The fixture server at http://127.0.0.1:4321/ serves a pre-wired debug site with intentional errors, slow endpoints, and secret-leaking routes so you can exercise every tool without a real target.

Install With Homebrew

brew tap dickwu/tap
brew install --cask ai-web-debugger

The macOS build is unsigned and not notarized. If macOS shows a damaged app or quarantine warning, run:

sudo xattr -d com.apple.quarantine /Applications/AI\ Web\ Debugger.app/

Architecture

The app is split into three Electron processes. The main process owns all privileged work: CDP attachment, network/console recording, the ToolRegistry, AgentRunner, and all file I/O. It communicates with the renderer through a narrow, typed IPC whitelist exposed by contextBridge in the preload script — raw ipcRenderer is never exposed. The target web page runs in a fully sandboxed WebContentsView with no preload and no Node access; it is treated as untrusted at all times.

CDP events flow from the target page's WebContents into NetworkRecorder and ConsoleRecorder, which redact secrets and write records to CaptureStore. The ToolRegistry handlers read from CaptureStore and write back through ContentBoundaryWrapper before results reach the LLM. The AgentRunner drives the tool-use loop: it calls the LLM provider, dispatches tool calls through ToolRegistry, and accumulates evidence references from every assistant message.

┌─────────────────────────────────────────────────────────────────────┐
│  Main Process                                                        │
│                                                                      │
│  AppController ──► BrowserManager ──► TargetPage (WebContentsView)  │
│       │                  │                  │                        │
│       │            CaptureStore         CDP attach                   │
│       │          NetworkRecorder   ◄────────┘                        │
│       │          ConsoleRecorder                                     │
│       │                  │                                           │
│       │            ToolRegistry (Zod + ActionPolicy + Redactor)      │
│       │                  │                                           │
│       └──────────► AgentRunner ──► LLMProvider (mock / anthropic)   │
│                          │                                           │
│                    ContentBoundaryWrapper (nonce per run)            │
└────────────────────────────┬────────────────────────────────────────┘
                             │ typed IPC (contextBridge only)
┌────────────────────────────▼────────────────────────────────────────┐
│  Preload  uiPreload.ts — exposes window.debuggerApp                  │
└────────────────────────────┬────────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────────┐
│  Renderer (React)  panels: Network / Console / Snapshot / Agent     │
└─────────────────────────────────────────────────────────────────────┘
                                         (separate, sandboxed view)
┌─────────────────────────────────────────────────────────────────────┐
│  Target WebContentsView  — nodeIntegration:false, sandbox:true      │
│  No preload. Treated as untrusted. CDP attached from main process.  │
└─────────────────────────────────────────────────────────────────────┘

Security Model

Five invariants are enforced unconditionally and tested by tests/integration/security-invariants.test.ts:

  1. Sandboxed target pagenodeIntegration: false, contextIsolation: true, sandbox: true, no preload. typeof require is always undefined inside the target page.
  2. All CDP in main process — no renderer or target-page code ever calls CDP directly. CDPClient lives entirely in src/main/cdp/.
  3. ToolRegistry-only LLM accessAgentRunner calls toolRegistry.run() exclusively. No direct CDP, fs, or network calls from within the agent loop.
  4. ContentBoundary on all page text — every tool whose output is 'page-data' is wrapped by ContentBoundaryWrapper.wrapJson() with a per-run nonce before the data reaches the LLM, preventing prompt injection from page content.
  5. ActionPolicy default deny for risky categorieseval, network, state, download, upload are denied; click, fill, interact, eval-readonly require user confirmation; navigate, snapshot, get, wait, dialog are auto-allowed.

Configured Tools

Tool Category Risk
browser.open_url navigate page_action
browser.open_blank navigate page_action
browser.reload navigate page_action
browser.back navigate page_action
browser.forward navigate page_action
browser.stop navigate page_action
browser.wait_for_load_state wait read
browser.batch navigate page_action
page.snapshot snapshot read
page.screenshot snapshot read
page.get_interactive_elements snapshot read
page.get_by_ref snapshot read
page.find snapshot read
page.query_selector snapshot read
page.wait_for wait read
page.scroll interact page_action
page.click click page_action
page.type fill page_action
page.press interact page_action
page.evaluate_readonly eval-readonly read
network.list_requests get read
network.get_request get read
network.get_response_body get read
console.list_messages get read
storage.get_cookies get read
storage.get_local_storage get read
storage.get_session_storage get read
dialog.status dialog read
dialog.accept dialog page_action
dialog.dismiss dialog page_action
diagnostics.summarize_current_page get read
diagnostics.doctor get read

Configuration

Environment variables

Variable Description
LLM_PROVIDER Provider to use: mock (default) or anthropic
LLM_API_KEY API key for the selected provider
LLM_MODEL Model name (e.g. claude-3-5-sonnet-20241022)
AI_WEB_DEBUGGER_LLM_PROVIDER Override — takes precedence over LLM_PROVIDER
AI_WEB_DEBUGGER_LLM_API_KEY Override — takes precedence over LLM_API_KEY
AI_WEB_DEBUGGER_LLM_MODEL Override — takes precedence over LLM_MODEL

Config files

Settings are loaded in this order (later entries win):

  1. Built-in defaults
  2. <userData>/config.json — persisted user settings
  3. ./ai-web-debugger.json — project-level override (checked into your repo)
  4. Environment variables with AI_WEB_DEBUGGER_* prefix

<userData> on macOS is ~/Library/Application Support/ai-web-debugger.

Artifact locations

Artifact Path
Session screenshots <userData>/sessions/<sessionId>/screenshots/*.png
JSONL capture events <userData>/sessions/<sessionId>/events.jsonl
Session metadata <userData>/sessions/<sessionId>/session.json
Main process log <userData>/logs/main.jsonl

Each app launch creates a new <sessionId> (UUID v4).

Scripts

Script Description
bun run dev Start app in development mode with hot reload
bun run build Typecheck + compile with electron-vite
bun run preview Preview the production build locally
bun run start Run the compiled app (electron .)
bun test (or bun run test) Run unit and integration tests with vitest
bun run test:watch Watch mode for tests
bun run lint Run ESLint (flat config, TypeScript-aware)
bun run typecheck Run tsc --noEmit only
bun run fixture Start the local debug-site fixture server on port 4321
bun run dist Package the app with electron-builder (do not run in CI without signing)

Testing

# Run the full test suite (unit + integration)
bun run test

# For manual end-to-end testing, start the fixture server first:
bun run fixture
# Then launch the app and navigate to http://127.0.0.1:4321/
bun run dev

The integration tests under tests/integration/ are mock-driven — they do not launch a real Electron process. agent-mock-flow.test.ts exercises the full AgentRunner → ToolRegistry → MockLLMProvider pipeline with an in-memory CaptureStore. security-invariants.test.ts performs static source-file checks and runtime Redactor assertions.

Known limitations

The following items are out of scope for MVP v0.1 and are tracked in the implementation plan (§7 Out of scope for MVP):

  • Multi-tab support (only one target WebContentsView per session)
  • Full iframe / OOPIF multi-frame tracking
  • SQLite persistence (currently in-memory CaptureStore only)
  • Real code-signing and notarization for distribution builds
  • Network request interception / mutation tools
  • Playwright-style assertions in page.wait_for (subset implemented)
  • OpenAI provider adapter (only Anthropic and mock are wired)

License

MIT — see LICENSE.

About

Electron-based observable browser for LLM-driven webpage debugging

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors