agarg0627/BOB

BOB — Build your Own Browser

Press ⌘K on any webpage, describe what you want in plain English, and an AI agent generates, previews, and installs a content-script feature that changes the page to your liking — in under 10 seconds.

License TypeScript Chrome MV3 LA Hacks 2026

📺 Watch the demo

BOB is provider-agnostic (Anthropic, OpenAI, Google) and bring-your-own-key. The extension runs entirely in your browser and talks directly to the provider you choose: no backend, no proxy, no telemetry. Built at LA Hacks 2026.

Architecture

[Figure: BOB architecture]

A Chrome MV3 extension. The service worker hosts a tool-using LLM agent (Reflexion-style retry loop) behind a unified provider contract; content scripts handle the ⌘K overlay, DOM pruning, behavior tracking, and feature execution in MAIN world.
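
A unified provider contract can be pictured as one interface with a small adapter per vendor. The sketch below is illustrative only: the names `Provider`, `Prompt`, `HttpRequest`, and `buildRequestFor` are hypothetical and not taken from the BOB source, the `max_tokens` value is a placeholder, and the endpoints and field names follow each vendor's public REST documentation as understood here, so check them before relying on them.

```typescript
// Hypothetical contract for illustration; BOB's real one lives in
// src/background/providers/types.ts and may look different.
type ProviderId = "anthropic" | "openai" | "google";

interface Prompt {
  system: string; // shared system prompt
  user: string;   // user request plus pruned DOM snapshot
}

interface HttpRequest {
  url: string;
  headers: { [name: string]: string };
  body: unknown;
}

interface Provider {
  id: ProviderId;
  buildRequest(prompt: Prompt, model: string, apiKey: string): HttpRequest;
}

const anthropic: Provider = {
  id: "anthropic",
  buildRequest(prompt, model, apiKey) {
    return {
      url: "https://api.anthropic.com/v1/messages",
      headers: { "x-api-key": apiKey, "anthropic-version": "2023-06-01" },
      body: {
        model,
        max_tokens: 4096, // placeholder budget
        system: prompt.system,
        messages: [{ role: "user", content: prompt.user }],
      },
    };
  },
};

const openai: Provider = {
  id: "openai",
  buildRequest(prompt, model, apiKey) {
    return {
      url: "https://api.openai.com/v1/chat/completions",
      headers: { Authorization: "Bearer " + apiKey },
      body: {
        model,
        messages: [
          { role: "system", content: prompt.system },
          { role: "user", content: prompt.user },
        ],
      },
    };
  },
};

const google: Provider = {
  id: "google",
  buildRequest(prompt, model, apiKey) {
    return {
      url: "https://generativelanguage.googleapis.com/v1beta/models/" + model + ":generateContent",
      headers: { "x-goog-api-key": apiKey },
      body: {
        systemInstruction: { parts: [{ text: prompt.system }] },
        contents: [{ role: "user", parts: [{ text: prompt.user }] }],
      },
    };
  },
};

const providers: { [K in ProviderId]: Provider } = { anthropic, openai, google };

// The agent only sees the Provider interface, so switching providers is a lookup.
function buildRequestFor(id: ProviderId, prompt: Prompt, model: string, apiKey: string): HttpRequest {
  return providers[id].buildRequest(prompt, model, apiKey);
}
```

With this shape, the agent loop never branches on the vendor; only the adapter knows the wire format.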

Setup

npm install
npm run build

Then load dist/ as an unpacked extension at chrome://extensions.

Open the BOB options page (right-click the extension icon → Options) and paste an API key for one of: Anthropic, OpenAI, or Google. Pick your provider on the same page. The extension uses that provider for all feature generation until you change it.

API key sources:

Dev

npm run dev        # Vite watch mode
npm run typecheck  # tsc --noEmit

How it works

  1. User presses ⌘K on any page → shadow-DOM overlay opens.
  2. User types a request ("hide YouTube Shorts", "dim the sidebar").
  3. Content script captures a pruned ≤4 KB DOM snapshot of the current page (stable identifiers prioritized, autogenerated class noise stripped).
  4. Background service worker sends prompt + snapshot to the configured LLM, with a tool registry the model can call:
    • query_dom — run targeted CSS-selector probes against the live page when the snapshot isn't enough.
    • test_code — execute candidate JS in the user's tab and read back the DOM delta plus any thrown error before committing.
  5. LLM returns JSON: { code, name, description, urlPattern }.
  6. Overlay shows a syntax-highlighted preview with editable name and URL pattern. The user reviews and clicks Install (nothing is auto-applied).
  7. Feature is saved to chrome.storage.local and executed immediately via chrome.scripting.executeScript with world: 'MAIN', using a per-injection Trusted Types policy so it survives strict-CSP sites.
  8. On future visits to matching URLs, the content script asks the background worker to re-inject the saved code — including across SPA navigations (history-API patch) and DOM mutations (window.__bobObserve, a debounced slug-keyed MutationObserver helper installed in MAIN world).
  9. If the injected code throws, BOB feeds the runtime error and a fresh DOM snapshot back to the agent and asks it to fix the root cause — up to two retries (Reflexion-style).
  10. Popup lists installed features with run/error counts, expandable stack traces, toggles, and JSON import/export.
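
Steps 5 and 9 above can be sketched together: validate the model's JSON against the output contract, run it, and retry with the error on failure. This is a minimal sketch under stated assumptions. The four field names come from step 5; the validation rules, the `RetryHooks` interface, and `generateWithRetries` are hypothetical. The hooks are synchronous here for brevity, whereas real model calls and injections would be asynchronous, and a real `generate` hook would also receive a fresh DOM snapshot.

```typescript
// Output contract from step 5. Validation rules below are assumptions.
interface GeneratedFeature {
  code: string;
  name: string;
  description: string;
  urlPattern: string;
}

function parseFeature(raw: string): GeneratedFeature {
  const data = JSON.parse(raw); // throws on malformed JSON
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const keys = ["code", "name", "description", "urlPattern"];
  for (const key of keys) {
    if (typeof data[key] !== "string" || data[key].length === 0) {
      throw new Error("missing or non-string field: " + key);
    }
  }
  return {
    code: data.code,
    name: data.name,
    description: data.description,
    urlPattern: data.urlPattern,
  };
}

// Hypothetical hooks: `generate` asks the model for raw JSON (given the last
// runtime error, if any); `run` executes the code and returns an error
// message, or null on success.
interface RetryHooks {
  generate(previousError: string | null): string;
  run(feature: GeneratedFeature): string | null;
}

const MAX_RETRIES = 2; // "up to two retries" per step 9

function generateWithRetries(hooks: RetryHooks): { feature: GeneratedFeature; attempts: number } {
  let lastError: string | null = null;
  // One initial attempt plus MAX_RETRIES retries.
  for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
    const feature = parseFeature(hooks.generate(lastError));
    lastError = hooks.run(feature);
    if (lastError === null) {
      return { feature, attempts: attempt };
    }
  }
  throw new Error("feature still failing after " + MAX_RETRIES + " retries: " + lastError);
}
```

Passing the previous error back into `generate` is what makes the loop Reflexion-style: each retry sees why the last attempt failed instead of starting from scratch.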

After install, the overlay flips into refine mode ("smaller", "apply more broadly", "undo this") with the prior code and a capped history of past turns in context, so refinement is a real conversation rather than a re-prompt.
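
A capped history can be as simple as keeping the last few turns and replaying them alongside the current code. The sketch below assumes a cap of five turns and a hypothetical `Turn` shape; the source only says the history is capped, not how.

```typescript
// Hypothetical turn shape and cap, for illustration only.
interface Turn {
  request: string; // what the user asked for
  code: string;    // the code produced for that request
}

const MAX_TURNS = 5; // assumed cap

// Returns a new history with the turn appended, dropping the oldest entries.
function appendTurn(history: Turn[], turn: Turn, cap: number = MAX_TURNS): Turn[] {
  const next = history.concat([turn]);
  return next.length > cap ? next.slice(next.length - cap) : next;
}

// Builds the user prompt for a refinement: past requests, current code, new request.
function buildRefinePrompt(history: Turn[], request: string): string {
  const lines: string[] = [];
  history.forEach((t, i) => lines.push("Turn " + (i + 1) + ": " + t.request));
  const last = history[history.length - 1];
  lines.push("Current code:", last ? last.code : "", "New request: " + request);
  return lines.join("\n");
}
```

Capping by turn count keeps the prompt size predictable; a byte or token budget would be an equally reasonable choice.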

Features

  • ⌘K overlay — prompt → preview → install on any page.
  • Three providers, one interface — Anthropic Claude, OpenAI GPT, Google Gemini / Gemma. The reasoning setting carries over when you switch providers: a single effortMode: 'high' flag maps to Anthropic extended thinking, OpenAI reasoning_effort, and Gemini thinkingBudget / Gemma thinkingLevel.
  • Tool-using agent — query_dom and test_code let the model verify against the real page instead of hallucinating selectors.
  • Reflexion retry loop — runtime errors trigger an automatic fix-the-root-cause pass with the live DOM and stack trace.
  • Idempotent generated code — every mutation is tagged data-bob='<slug>' and every observer is keyed by slug, so re-running on SPA navigation or toggle cycles doesn't compound side effects.
  • Conversational refinement — refine installed features in the same overlay, with iteration history (parentFeatureId, iterationNumber).
  • Behavior-driven suggestions — BOB watches click patterns (privacy-bounded: no input/contenteditable/password reads, sensitive hostnames skipped) and proactively proposes automations like "auto-dismiss this banner" after three manual dismissals. Three-state dismissal: Try / Later (3-day cooldown) / Never.
  • Voice input — Web Speech API on the prompt textarea.
  • Import / export — features serialize to portable JSON, so curated feature packs can be shared with teammates.
  • Bring-your-own-key, on-device — keys and features live in chrome.storage.local. No server.
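
The single-flag effort mapping from the provider bullet could look roughly like the table below. This is a sketch, not BOB's code: `effortFields` and the `"default"` mode name are hypothetical, the budget numbers are placeholders, and the request fields follow each vendor's public API documentation as understood here. The Gemma thinkingLevel mapping is left out because its exact request shape is not given in this README.

```typescript
type ProviderId = "anthropic" | "openai" | "google";
type EffortMode = "default" | "high"; // "default" is an assumed name for the non-high mode

// Extra request-body fields per provider when effortMode is 'high'.
const HIGH_EFFORT: { [P in ProviderId]: { [key: string]: unknown } } = {
  // Anthropic extended thinking; budget_tokens must stay below max_tokens.
  anthropic: { thinking: { type: "enabled", budget_tokens: 8192 } },
  // OpenAI reasoning models accept a reasoning_effort level.
  openai: { reasoning_effort: "high" },
  // Gemini thinking budget, nested under generationConfig.
  google: { generationConfig: { thinkingConfig: { thinkingBudget: 8192 } } },
};

// Returns the fields to merge into the provider's request body.
function effortFields(provider: ProviderId, mode: EffortMode): { [key: string]: unknown } {
  return mode === "high" ? HIGH_EFFORT[provider] : {};
}
```

Keeping the mapping in one table means the rest of the extension only ever deals with the flag, not with vendor-specific parameters.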

Screenshots

  • Refinement & Sandbox: refinement and sandboxed test_code.
  • Generated Chess Game: chess game generated on the fly.
  • Stack Overflow Rewrite: Stack Overflow message rewrite.
  • Features & Suggestions: installed features and proactive suggestions.
  • Settings & API Keys: provider and API key configuration.

File tree

src/
  background/
    index.ts              Service worker, message router,
                          MAIN-world injector, __bobObserve helper
    llm.ts                Provider-agnostic feature generation entry
    agent.ts              Tool-loop controller + output contract parsing
    tools.ts              query_dom, test_code tool dispatcher
    suggestions-engine.ts Behavior → suggestion synthesis, dismissal
    settings.ts           API keys, provider, effort mode persistence
    error-recorder.ts     Per-feature run/error tracking
    providers/
      types.ts            Provider interface
      prompt.ts           Shared system prompt + user prompt builder
      anthropic.ts        Anthropic Messages API client
      openai.ts           OpenAI Chat Completions client
      google.ts           Gemini / Gemma generateContent client
  content/
    index.ts              Content script entry, runs on every page
    injector.ts           Dispatches feature execution to background
    dom-prune.ts          ≤4 KB DOM snapshot for LLM context
    behavior-tracker.ts   Privacy-bounded click pattern observer
    spa.ts                History-API patch for SPA navigation
    lifecycle.ts          Per-feature install/teardown plumbing
    observer-helper.ts    Shared mutation-observer utilities
    page-badge.ts         On-page status badge
    quick-toggle.ts       Per-feature on-page toggle UI
    voice-input.ts        Web Speech API for prompt entry
    overlay/
      overlay.ts          ⌘K UI (shadow DOM, state machine)
      overlay.css         Overlay styles
      preview.ts          Diff/preview view rendering
  popup/
    popup.{html,ts,css}   Installed-features panel
    suggestions-section.* Proactive suggestion UI
    iteration.ts          Refinement-tree rendering
    import-export.ts      JSON feature pack import/export
  options/
    options.{html,ts,css} Provider + API key configuration
  shared/
    types.ts              Locked type contracts
    storage.ts            chrome.storage CRUD with per-key locks
    messages.ts           Message-passing helper
    hotkey.ts             ⌘K binding
    keybind-conflicts.ts  Detect collisions with site shortcuts
    ui-scale.ts           Overlay scaling

~6,000 lines of TypeScript. Zero runtime dependencies.

FAQ

Why bring-your-own-key? We don't run a backend, and we're not paying your inference bills. Your API key is stored in chrome.storage.local on your machine and is sent only to the provider you selected.

Why not just ask ChatGPT to write a userscript? Because the agent verifies against your live DOM, retries on errors, and persists the result as an installable, idempotent feature. A chat tab can't do any of that.

Why a Chrome extension instead of a Chromium fork? Scope. MV3 + content scripts + MAIN-world injection is enough surface area to do most of what a "real" browser modification needs.

Known limitations

  • Sites with strict Trusted Types policies (some Google properties) may still reject injected scripts despite the per-injection policy fallback. If a site blocks injection, try a different site rather than working around its CSP.
  • Generated code runs in the page's MAIN world and is subject to the page's CSP. Most sites work; a few do not.
  • No cloud sync. Features and settings live in chrome.storage.local on the device they were created on. Use Import / Export in the popup to move feature packs between machines.
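
Moving features between machines comes down to a JSON round trip. The sketch below assumes a hypothetical pack format (`version` plus a `features` array) and a hypothetical `Feature` shape; BOB's actual export format may differ. Importing features as disabled is a design assumption made here so that shared code does not run until the user opts in.

```typescript
// Hypothetical shapes for illustration only.
interface Feature {
  id: string;
  name: string;
  urlPattern: string;
  code: string;
  enabled: boolean;
}

interface FeaturePack {
  version: 1;
  features: Feature[];
}

function exportPack(features: Feature[]): string {
  const pack: FeaturePack = { version: 1, features };
  return JSON.stringify(pack, null, 2);
}

// Merges an imported pack into the existing features, keyed by id.
function importPack(json: string, existing: Feature[]): Feature[] {
  const pack = JSON.parse(json); // throws on malformed JSON
  if (!pack || pack.version !== 1 || !Array.isArray(pack.features)) {
    throw new Error("not a feature pack");
  }
  const byId: { [id: string]: Feature } = {};
  existing.forEach((f) => { byId[f.id] = f; });
  // Imported features replace same-id entries and arrive disabled.
  pack.features.forEach((f: Feature) => { byId[f.id] = { ...f, enabled: false }; });
  return Object.keys(byId).map((id) => byId[id]);
}
```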

Stack

TypeScript · Vite · @crxjs/vite-plugin · Chrome MV3 service worker · shadow-DOM overlay · Anthropic / OpenAI / Google APIs · Web Speech API · chrome.scripting.executeScript (MAIN world) · Trusted Types · chrome.storage.local. Zero runtime dependencies.

License

MIT — see LICENSE.
