AI Web Debugger is an Electron desktop application that embeds a sandboxed browser view, driven through the Chrome DevTools Protocol (CDP), and exposes a structured tool layer so a large language model can observe and diagnose live web pages. The LLM never touches the DOM, network, or storage directly: everything flows through a ToolRegistry with Zod schemas, risk labels, ActionPolicy gating, ContentBoundaryWrapper nonce isolation, and a Redactor that strips secrets before any data reaches the model. It is designed for developers and QA engineers who want an AI co-pilot for debugging unfamiliar or flaky pages.
This project uses bun as the package manager and script runner.

    # 1. Install dependencies
    bun install

    # 2. Start the app in development mode
    bun run dev

    # 3. In a separate terminal, start the fixture server (optional but useful for testing)
    bun run fixture

    # 4. In the app address bar, navigate to:
    #    http://127.0.0.1:4321/

The fixture server at http://127.0.0.1:4321/ serves a pre-wired debug site with intentional errors, slow endpoints, and secret-leaking routes so you can exercise every tool without a real target.
On macOS, the app can also be installed with Homebrew:

    brew tap dickwu/tap
    brew install --cask ai-web-debugger

The macOS build is unsigned and not notarized. If macOS shows a damaged app or quarantine warning, run:

    sudo xattr -d com.apple.quarantine /Applications/AI\ Web\ Debugger.app/

The app is split into three Electron processes. The main process owns all privileged work: CDP attachment, network/console recording, the ToolRegistry, AgentRunner, and all file I/O. It communicates with the renderer through a narrow, typed IPC whitelist exposed by contextBridge in the preload script; raw ipcRenderer is never exposed. The target web page runs in a fully sandboxed WebContentsView with no preload and no Node access; it is treated as untrusted at all times.
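The whitelist idea can be sketched without Electron. In the block below, the channel names, the `IpcChannels` map, and `createBridge` are hypothetical and are not taken from the project; `invoke` stands in for `ipcRenderer.invoke` so the example runs on its own.

```typescript
// Hypothetical channel map: every whitelisted channel with its request/response types.
type IpcChannels = {
  "browser:navigate": { req: { url: string }; res: { ok: boolean } };
  "agent:start": { req: { prompt: string }; res: { runId: string } };
};
type Channel = keyof IpcChannels;

// Stand-in for ipcRenderer.invoke, injected so the sketch runs without Electron.
type Invoke = (channel: string, payload: unknown) => Promise<unknown>;

const ALLOWED_CHANNELS: readonly string[] = ["browser:navigate", "agent:start"];

// Builds the only object the renderer would ever see. The raw invoke function
// stays captured in this closure and is never handed out.
function createBridge(invoke: Invoke) {
  function call<C extends Channel>(
    channel: C,
    payload: IpcChannels[C]["req"],
  ): Promise<IpcChannels[C]["res"]> {
    // Runtime guard in addition to the compile-time types.
    if (ALLOWED_CHANNELS.indexOf(channel) === -1) {
      return Promise.reject(new Error("IPC channel not whitelisted: " + channel));
    }
    return invoke(channel, payload) as Promise<IpcChannels[C]["res"]>;
  }

  return {
    navigate: (url: string) => call("browser:navigate", { url }),
    startAgent: (prompt: string) => call("agent:start", { prompt }),
  };
}
```

In a real preload script, the returned object would presumably be passed to `contextBridge.exposeInMainWorld`, so the renderer gets named functions rather than a generic send primitive.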
CDP events flow from the target page's WebContents into NetworkRecorder and ConsoleRecorder, which redact secrets and write records to CaptureStore. The ToolRegistry handlers read from CaptureStore and write back through ContentBoundaryWrapper before results reach the LLM. The AgentRunner drives the tool-use loop: it calls the LLM provider, dispatches tool calls through ToolRegistry, and accumulates evidence references from every assistant message.

    ┌─────────────────────────────────────────────────────────────────────┐
    │  Main Process                                                       │
    │                                                                     │
    │  AppController ──► BrowserManager ──► TargetPage (WebContentsView)  │
    │       │                  │                  │                       │
    │       │            CaptureStore         CDP attach                  │
    │       │            NetworkRecorder ◄────────┘                       │
    │       │            ConsoleRecorder                                  │
    │       │                  │                                          │
    │       │            ToolRegistry (Zod + ActionPolicy + Redactor)     │
    │       │                  │                                          │
    │       └──────────► AgentRunner ──► LLMProvider (mock / anthropic)   │
    │                          │                                          │
    │                  ContentBoundaryWrapper (nonce per run)             │
    └────────────────────────────┬────────────────────────────────────────┘
                                 │  typed IPC (contextBridge only)
    ┌────────────────────────────▼────────────────────────────────────────┐
    │  Preload   uiPreload.ts - exposes window.debuggerApp                │
    └────────────────────────────┬────────────────────────────────────────┘
                                 │
    ┌────────────────────────────▼────────────────────────────────────────┐
    │  Renderer (React)   panels: Network / Console / Snapshot / Agent    │
    └─────────────────────────────────────────────────────────────────────┘

             (separate, sandboxed view)
    ┌─────────────────────────────────────────────────────────────────────┐
    │  Target WebContentsView - nodeIntegration:false, sandbox:true       │
    │  No preload. Treated as untrusted. CDP attached from main process.  │
    └─────────────────────────────────────────────────────────────────────┘
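The last step of that data flow, redacting and then wrapping page-derived data before it reaches the model, might look roughly like the sketch below. The key-matching rule, the marker format, and the free functions `redact` and `wrapJson` are assumptions made for illustration; the project's Redactor and ContentBoundaryWrapper may work differently. The nonce is passed in here, and a real implementation would presumably generate it once per run from a cryptographically secure source.

```typescript
// Hypothetical rule: mask the values of keys that look secret-bearing.
const SECRET_KEY = /authorization|cookie|token|secret|api[-_]?key|password/i;

// Recursively copies a JSON-like value, replacing secret-looking fields.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = SECRET_KEY.test(key) ? "[REDACTED]" : redact(source[key]);
    }
    return out;
  }
  return value;
}

// Wraps redacted page data between markers that carry the per-run nonce.
// The marker syntax is made up for this sketch.
function wrapJson(nonce: string, data: unknown): string {
  const body = JSON.stringify(redact(data));
  return "<<page-data " + nonce + ">>\n" + body + "\n<<end-page-data " + nonce + ">>";
}
```

The idea behind the nonce is that page content cannot know it in advance, so text inside a response body cannot forge a closing marker and pose as instructions outside the boundary.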
Five invariants are enforced unconditionally and tested by tests/integration/security-invariants.test.ts:
- Sandboxed target page: `nodeIntegration: false`, `contextIsolation: true`, `sandbox: true`, no preload. `typeof require` is always `undefined` inside the target page.
- All CDP in main process: no renderer or target-page code ever calls CDP directly. `CDPClient` lives entirely in `src/main/cdp/`.
- ToolRegistry-only LLM access: `AgentRunner` calls `toolRegistry.run()` exclusively. No direct CDP, fs, or network calls from within the agent loop.
- ContentBoundary on all page text: every tool whose `output` is `'page-data'` is wrapped by `ContentBoundaryWrapper.wrapJson()` with a per-run nonce before the data reaches the LLM, preventing prompt injection from page content.
- ActionPolicy default deny for risky categories: `eval`, `network`, `state`, `download`, `upload` are denied; `click`, `fill`, `interact`, `eval-readonly` require user confirmation; `navigate`, `snapshot`, `get`, `wait`, `dialog` are auto-allowed.
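The ActionPolicy rule in the last invariant maps naturally onto a lookup table. The sketch below copies the category-to-decision mapping from the list above; the type names, the `decide` function, and the choice to deny unknown categories are assumptions, not the project's actual API.

```typescript
type Decision = "deny" | "confirm" | "allow";

type Category =
  | "eval" | "network" | "state" | "download" | "upload"
  | "click" | "fill" | "interact" | "eval-readonly"
  | "navigate" | "snapshot" | "get" | "wait" | "dialog";

// Mapping taken from the invariant list; Record forces every category to be covered.
const POLICY: Record<Category, Decision> = {
  eval: "deny",
  network: "deny",
  state: "deny",
  download: "deny",
  upload: "deny",
  click: "confirm",
  fill: "confirm",
  interact: "confirm",
  "eval-readonly": "confirm",
  navigate: "allow",
  snapshot: "allow",
  get: "allow",
  wait: "allow",
  dialog: "allow",
};

// Assumption: a category that is not in the table is denied rather than allowed.
function decide(category: string): Decision {
  return Object.prototype.hasOwnProperty.call(POLICY, category)
    ? POLICY[category as Category]
    : "deny";
}
```

Typing the table as `Record<Category, Decision>` means that adding a category to the union without assigning it a decision fails at compile time.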
| Tool | Category | Risk |
|---|---|---|
| `browser.open_url` | navigate | page_action |
| `browser.open_blank` | navigate | page_action |
| `browser.reload` | navigate | page_action |
| `browser.back` | navigate | page_action |
| `browser.forward` | navigate | page_action |
| `browser.stop` | navigate | page_action |
| `browser.wait_for_load_state` | wait | read |
| `browser.batch` | navigate | page_action |
| `page.snapshot` | snapshot | read |
| `page.screenshot` | snapshot | read |
| `page.get_interactive_elements` | snapshot | read |
| `page.get_by_ref` | snapshot | read |
| `page.find` | snapshot | read |
| `page.query_selector` | snapshot | read |
| `page.wait_for` | wait | read |
| `page.scroll` | interact | page_action |
| `page.click` | click | page_action |
| `page.type` | fill | page_action |
| `page.press` | interact | page_action |
| `page.evaluate_readonly` | eval-readonly | read |
| `network.list_requests` | get | read |
| `network.get_request` | get | read |
| `network.get_response_body` | get | read |
| `console.list_messages` | get | read |
| `storage.get_cookies` | get | read |
| `storage.get_local_storage` | get | read |
| `storage.get_session_storage` | get | read |
| `dialog.status` | dialog | read |
| `dialog.accept` | dialog | page_action |
| `dialog.dismiss` | dialog | page_action |
| `diagnostics.summarize_current_page` | get | read |
| `diagnostics.doctor` | get | read |
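To illustrate how an entry in this table could be represented, here is a minimal registry sketch. A hand-written `parse` function stands in for the Zod schema so the block has no dependencies; `SketchToolRegistry`, `ToolDefinition`, and the handler's return value are hypothetical and only the tool name, category, and risk come from the table.

```typescript
type Risk = "read" | "page_action";

interface ToolDefinition<I> {
  name: string;
  category: string;
  risk: Risk;
  // Stand-in for a Zod schema: returns the validated input or throws.
  parse: (raw: unknown) => I;
  handler: (input: I) => unknown;
}

interface RegisteredTool {
  name: string;
  category: string;
  risk: Risk;
  run: (raw: unknown) => unknown;
}

class SketchToolRegistry {
  private readonly tools = new Map<string, RegisteredTool>();

  register<I>(def: ToolDefinition<I>): void {
    this.tools.set(def.name, {
      name: def.name,
      category: def.category,
      risk: def.risk,
      // Validation always runs before the handler sees the input.
      run: (raw: unknown) => def.handler(def.parse(raw)),
    });
  }

  run(name: string, raw: unknown): unknown {
    const tool = this.tools.get(name);
    if (tool === undefined) throw new Error("Unknown tool: " + name);
    return tool.run(raw);
  }
}

const registry = new SketchToolRegistry();
registry.register({
  name: "browser.open_url",
  category: "navigate",
  risk: "page_action",
  parse: (raw: unknown): { url: string } => {
    if (typeof raw !== "object" || raw === null) throw new Error("input must be an object");
    const url = (raw as { url?: unknown }).url;
    if (typeof url !== "string") throw new Error("url must be a string");
    return { url };
  },
  // Hypothetical handler: a real one would ask the browser layer to navigate.
  handler: (input: { url: string }) => ({ requestedUrl: input.url }),
});
```

Because the registry stores only the combined `run` closure, there is no code path that reaches a handler with unvalidated input.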
| Variable | Description |
|---|---|
| `LLM_PROVIDER` | Provider to use: mock (default) or anthropic |
| `LLM_API_KEY` | API key for the selected provider |
| `LLM_MODEL` | Model name (e.g. claude-3-5-sonnet-20241022) |
| `AI_WEB_DEBUGGER_LLM_PROVIDER` | Override; takes precedence over `LLM_PROVIDER` |
| `AI_WEB_DEBUGGER_LLM_API_KEY` | Override; takes precedence over `LLM_API_KEY` |
| `AI_WEB_DEBUGGER_LLM_MODEL` | Override; takes precedence over `LLM_MODEL` |
Settings are loaded in this order (later entries win):
- Built-in defaults
- `<userData>/config.json`: persisted user settings
- `./ai-web-debugger.json`: project-level override (checked into your repo)
- Environment variables with the `AI_WEB_DEBUGGER_*` prefix

`<userData>` on macOS is `~/Library/Application Support/ai-web-debugger`.
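The "later entries win" rule can be sketched as a layered merge. The `Settings` shape and the function names below are assumptions made for illustration; only the layer order and the `AI_WEB_DEBUGGER_*` variable names come from this document.

```typescript
// Hypothetical settings shape covering only the LLM-related keys.
interface Settings {
  llmProvider: string;
  llmApiKey?: string;
  llmModel?: string;
}

// Applies layers left to right. Only defined values overwrite, so a key that
// is absent from a later layer does not erase an earlier value.
function mergeLayers(defaults: Settings, ...layers: Array<Partial<Settings>>): Settings {
  const merged: Settings = { ...defaults };
  for (const layer of layers) {
    if (layer.llmProvider !== undefined) merged.llmProvider = layer.llmProvider;
    if (layer.llmApiKey !== undefined) merged.llmApiKey = layer.llmApiKey;
    if (layer.llmModel !== undefined) merged.llmModel = layer.llmModel;
  }
  return merged;
}

// Maps the prefixed environment variables onto the settings shape.
function fromEnv(env: Record<string, string | undefined>): Partial<Settings> {
  return {
    llmProvider: env["AI_WEB_DEBUGGER_LLM_PROVIDER"],
    llmApiKey: env["AI_WEB_DEBUGGER_LLM_API_KEY"],
    llmModel: env["AI_WEB_DEBUGGER_LLM_MODEL"],
  };
}

const settings = mergeLayers(
  { llmProvider: "mock" }, // built-in defaults
  { llmProvider: "anthropic", llmModel: "model-a" }, // <userData>/config.json
  { llmModel: "model-b" }, // ./ai-web-debugger.json
  fromEnv({ AI_WEB_DEBUGGER_LLM_PROVIDER: "mock" }), // environment
);
```

In this example the environment layer decides the provider, the project file decides the model, and the API key stays unset because no layer provides one.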
| Artifact | Path |
|---|---|
| Session screenshots | `<userData>/sessions/<sessionId>/screenshots/*.png` |
| JSONL capture events | `<userData>/sessions/<sessionId>/events.jsonl` |
| Session metadata | `<userData>/sessions/<sessionId>/session.json` |
| Main process log | `<userData>/logs/main.jsonl` |

Each app launch creates a new `<sessionId>` (UUID v4).
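For scripts that post-process these artifacts, the layout can be captured in a small helper. This is a sketch under assumptions: `sessionPaths` and `parseJsonl` are hypothetical names, paths are joined with "/" for brevity where real code would presumably use `path.join`, and the shape of individual capture events is not specified here.

```typescript
interface SessionPaths {
  screenshotsDir: string;
  events: string;
  metadata: string;
  mainLog: string;
}

// Derives the artifact locations from the table above.
function sessionPaths(userData: string, sessionId: string): SessionPaths {
  const root = userData + "/sessions/" + sessionId;
  return {
    screenshotsDir: root + "/screenshots",
    events: root + "/events.jsonl",
    metadata: root + "/session.json",
    mainLog: userData + "/logs/main.jsonl",
  };
}

// Parses JSONL text: one JSON document per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```

A caller would read the file at `sessionPaths(...).events` and hand its contents to `parseJsonl` to get one object per captured event.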
| Script | Description |
|---|---|
| `bun run dev` | Start app in development mode with hot reload |
| `bun run build` | Typecheck + compile with electron-vite |
| `bun run preview` | Preview the production build locally |
| `bun run start` | Run the compiled app (`electron .`) |
| `bun run test` | Run unit and integration tests with vitest (plain `bun test` invokes Bun's built-in test runner instead) |
| `bun run test:watch` | Watch mode for tests |
| `bun run lint` | Run ESLint (flat config, TypeScript-aware) |
| `bun run typecheck` | Run `tsc --noEmit` only |
| `bun run fixture` | Start the local debug-site fixture server on port 4321 |
| `bun run dist` | Package the app with electron-builder (do not run in CI without signing) |

    # Run the full test suite (unit + integration)
    bun run test

    # For manual end-to-end testing, start the fixture server first:
    bun run fixture

    # Then launch the app and navigate to http://127.0.0.1:4321/
    bun run dev

The integration tests under tests/integration/ are mock-driven; they do not launch a real Electron process. agent-mock-flow.test.ts exercises the full AgentRunner → ToolRegistry → MockLLMProvider pipeline with an in-memory CaptureStore. security-invariants.test.ts performs static source-file checks and runtime Redactor assertions.
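The mock-driven pipeline can be pictured with a small, dependency-free sketch of a tool-use loop. Everything named here (`runAgent`, `scriptedProvider`, the turn shape, the evidence reference format) is hypothetical and not the project's API, and the provider call is synchronous for brevity where a real one would presumably be async.

```typescript
interface ToolCall { tool: string; input: unknown }
interface AssistantTurn { text: string; toolCalls: ToolCall[]; evidenceRefs: string[] }

interface SketchProvider { next(transcript: string[]): AssistantTurn }
interface SketchTools { run(name: string, input: unknown): unknown }

// Mock provider that replays a fixed script of assistant turns.
function scriptedProvider(turns: AssistantTurn[]): SketchProvider {
  let index = 0;
  return {
    next: () => {
      const turn = turns[index];
      index += 1;
      if (turn === undefined) throw new Error("script exhausted");
      return turn;
    },
  };
}

// Calls the provider, dispatches any tool calls, and repeats until a turn
// arrives without tool calls. Evidence references are collected from every turn.
function runAgent(
  provider: SketchProvider,
  tools: SketchTools,
  prompt: string,
  maxSteps = 8,
): { answer: string; evidence: string[] } {
  const transcript = ["user: " + prompt];
  const evidence: string[] = [];
  for (let step = 0; step < maxSteps; step += 1) {
    const turn = provider.next(transcript);
    transcript.push("assistant: " + turn.text);
    for (const ref of turn.evidenceRefs) {
      if (evidence.indexOf(ref) === -1) evidence.push(ref);
    }
    if (turn.toolCalls.length === 0) return { answer: turn.text, evidence };
    for (const call of turn.toolCalls) {
      const result = tools.run(call.tool, call.input);
      transcript.push("tool " + call.tool + ": " + JSON.stringify(result));
    }
  }
  throw new Error("agent exceeded " + maxSteps + " steps");
}
```

A test in this style scripts the provider's turns, stubs the tool layer with an in-memory object, and asserts on the final answer and collected evidence, so no Electron process or network access is needed.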
The following items are out of scope for MVP v0.1 and are tracked in the implementation plan (§7 Out of scope for MVP):
- Multi-tab support (only one target `WebContentsView` per session)
- Full iframe / OOPIF multi-frame tracking
- SQLite persistence (currently in-memory `CaptureStore` only)
- Real code-signing and notarization for distribution builds
- Network request interception / mutation tools
- Playwright-style assertions in `page.wait_for` (subset implemented)
- OpenAI provider adapter (only Anthropic and mock are wired)
MIT — see LICENSE.