- What is OpenBrowser?
- Features
- Built-In Tools
- Setup
- Usage
- Contributing
- Acknowledgements
OpenBrowser is a free, open-source Chrome extension that brings AI-powered browser automation to your fingertips — without subscriptions, usage limits, or data collection. Connect your own API key and describe what you want in plain English. OpenBrowser handles the rest.
"Book me a flight to NYC under $400." → Done.
"Compare the specs on these three laptops I have open." → Done.
"Fill in this form with my details and submit it." → Done.
Unlike SaaS-based browser agents, OpenBrowser runs entirely inside your browser. Your API keys and conversation history never touch a third-party server.
Connect to 12 providers with your own API key:
| Provider | Notes |
|---|---|
| Anthropic (Claude) | claude-sonnet-4-5, claude-opus-4, claude-haiku-4-5 |
| OpenAI (GPT-4) | gpt-4o, gpt-4-turbo, gpt-3.5-turbo |
| Google Gemini | gemini-2.0-flash, gemini-1.5-pro |
| Groq | llama-3.3-70b, mixtral — fast free tier |
| Ollama (Local) | Fully offline — llama3.2, mistral, deepseek-r1 |
| OpenRouter | 200+ models via a single key |
| Cloudflare Workers AI | Low-latency edge inference |
| HuggingFace | Open-weight models |
| MiniMax / Moonshot / Qwen | Chinese providers with global access |
| Custom API | Any OpenAI-compatible endpoint |
OpenBrowser gives the AI a full toolkit for web automation:
Navigation & Interaction
- `navigate`: Go to any URL or site name ("YouTube", "my Gmail", "search for...")
- `click`, `type`, `scroll`, `select_option`: Interact with any page element
- `open_tab`, `switch_tab`, `list_tabs`: Manage multiple browser tabs
Content & Data
- `screenshot`: Capture the current page
- `get_page_content`, `scrape_page`: Extract structured page content
- `extract_data`, `export_data`: Pull data into tables, CSV, or JSON
- `download_csv`: Save extracted data as a file
Forms
- `smart_fill_form`: Semantic form filling (understands "first name" = "given name" = "fname")
- `scan_forms`: Discover all form fields on a page
Research & Analysis
- `summarize_tabs`: Digest all open tabs at once
- `cross_site_research`: Compare data across multiple open tabs simultaneously
- `auto_highlight`: Highlight relevant passages based on your goal
- `remove_highlights`: Clear all highlights
Planning
- `create_task_plan`: Render a live visual checklist in the chat
- `update_task_step`: Mark steps done/active/failed in real time
- `reason`, `think`: Explicit chain-of-thought before acting
Memory & Knowledge
- `memorize`, `recall`: Persist facts across sessions (optional)
Files (Virtual Filesystem)
- `write_file`, `read_file`, `list_files`, `delete_file`: AI-managed file storage in IndexedDB
Bookmarks & Citations
- `save_bookmark`: Auto-tagged smart bookmarking
- `show_bookmarks`: Browse saved bookmarks with tag filtering
- `add_citation`: APA/MLA/Chicago/URL format source collection
- `show_citations`, `clear_citations`
Utilities
- `run_javascript`: Execute arbitrary JS on the current page
- `browse_intent`: Intent-based navigation ("find me a cheap flight")
- `wait`: Pause for slow-loading pages
- `finish`: Conclude the task with a summary
AI answers appear token by token, just like Claude.ai. No waiting for the full response.
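Click the microphone button to dictate your instructions. Live transcription uses the Web Speech API, so no extra API key is needed. Note that Chrome typically sends audio to a server-based recognition engine, so dictation may not work offline.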
Press Ctrl+Shift+P on any webpage to open a floating command palette. Type your instruction and press Enter — OpenBrowser opens the side panel and runs it automatically.
Save any task as a reusable macro. Run manually or schedule it to repeat automatically (every 15 min / 30 min / 1h / 6h / daily) using Chrome alarms.
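As a rough illustration of how a repeat interval could map onto Chrome alarms, here is a minimal sketch. The names `INTERVAL_MINUTES` and `scheduleMacro`, the interval keys, and the `macro:` alarm-name prefix are assumptions made for this example, not OpenBrowser's actual code. `chrome.alarms.create` with `periodInMinutes` is the standard extension API; it is passed in as a parameter so the function can be exercised outside the browser.

```javascript
// Hypothetical mapping from the interval options above to minutes.
const INTERVAL_MINUTES = {
  "15min": 15,
  "30min": 30,
  "1h": 60,
  "6h": 360,
  "daily": 1440,
};

// Schedule a repeating alarm for a macro.
// `alarmsApi` is expected to look like chrome.alarms (injected for testability).
function scheduleMacro(alarmsApi, macroId, interval) {
  const minutes = INTERVAL_MINUTES[interval];
  if (minutes === undefined) {
    throw new Error(`Unknown interval: ${interval}`);
  }
  const alarmName = `macro:${macroId}`; // assumed naming scheme
  alarmsApi.create(alarmName, { periodInMinutes: minutes });
  return alarmName;
}
```

In a service worker this might be called as `scheduleMacro(chrome.alarms, "daily-report", "daily")`, with a `chrome.alarms.onAlarm` listener running the macro whenever the alarm fires.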
The AI can memorize facts and recall them across different conversations and sessions. A visual Memory Dashboard lets you edit or delete stored entries.
An IndexedDB-backed file store lets the AI generate, save, read, and delete files. A "Files" tab in the panel shows a tree view with download support.
Dark mode (default), Light mode, and a Custom mode with a full accent color picker and preset swatches.
When the AI outputs a mermaid code block, it renders as a live diagram — flowcharts, sequence diagrams, ER diagrams, Gantt charts — styled to match the OpenBrowser palette.
One-click access to saved prompts. 6 built-in templates (Summarize, Extract data, Fill form, etc.) plus unlimited custom templates. Open with Ctrl+K.
Right-click any page for instant OpenBrowser actions: Summarize, Ask about selected text, Translate, Save as citation, Screenshot, Fill forms.
Set RPM and RPD caps to protect your API quota. OpenBrowser stops 5 calls before the limit with a warning.
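To make this behaviour concrete, here is a minimal sketch of a limiter that refuses calls once usage comes within a safety margin of either cap. The class name, the sliding-window bookkeeping, and the rule that at least one call is always allowed are assumptions for illustration. Only the RPM/RPD caps and the 5-call margin come from the description above.

```javascript
const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

class RateLimiter {
  constructor({ rpm, rpd, margin = 5 }) {
    // Stop `margin` calls before each cap. Allowing at least one call when the
    // cap is smaller than the margin is an assumption of this sketch.
    this.minuteBudget = Math.max(1, rpm - margin);
    this.dayBudget = Math.max(1, rpd - margin);
    this.timestamps = []; // times (ms) of calls made in the last 24 hours
  }

  // Returns true and records the call if it fits within both budgets.
  tryAcquire(now = Date.now()) {
    this.timestamps = this.timestamps.filter((t) => now - t < DAY_MS);
    const lastMinute = this.timestamps.filter((t) => now - t < MINUTE_MS).length;
    if (lastMinute >= this.minuteBudget || this.timestamps.length >= this.dayBudget) {
      return false; // caller should warn the user and pause
    }
    this.timestamps.push(now);
    return true;
  }
}
```

A caller would check `limiter.tryAcquire()` before each API request and show the warning when it returns `false`.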
Configure a secondary provider/model. If the primary hits its quota, the agent automatically switches mid-task.
- Google Chrome (version 114 or later)
- An API key from at least one supported provider
(Or Ollama running locally for a fully free experience)
Option A — Install from source (recommended)
- Download the latest release ZIP from the Releases page (or clone the repository)
- Open Chrome and navigate to `chrome://extensions`
- Enable Developer mode (toggle in the top-right corner)
- Click Load unpacked and select the `ctrl-browser/` folder (or drag-and-drop the unzipped folder)
- The OpenBrowser icon will appear in your toolbar
Option B — Chrome Web Store
Coming soon.
- Click the OpenBrowser icon in your toolbar (or press `Ctrl+Shift+Y`)
- Click Settings in the bottom navigation
- Select your Provider (e.g. Anthropic)
- Select your Model (e.g. claude-sonnet-4-5)
- Paste your API Key
- Click Save Settings
⚡ Free option: Select Ollama (Local — Free), click Test Connection, and install any model with
`ollama pull llama3.2`. No API key needed.
- Install Ollama from ollama.ai
- Start the server: `ollama serve`
- Pull a model: `ollama pull llama3.2` (or `mistral`, `deepseek-r1`, `qwen2.5`)
- In OpenBrowser Settings → Provider → Ollama (Local — Free)
- Click Test Connection — OpenBrowser will auto-discover your installed models
Open the side panel, type your instruction, and press Enter.
"Go to YouTube and search for the latest OpenAI keynote"
"Fill in the contact form with: name=John Smith, email=john@example.com"
"Summarize all my open tabs and tell me the most important one"
"Extract the pricing table from this page as CSV"
For complex tasks, the AI creates a live task plan you can watch execute step by step:
"Research the top 5 Python web frameworks, compare their GitHub stars,
performance benchmarks, and learning curves, then save a comparison
table to a file"
"Fill in this checkout form.
Name: Jane Doe, Email: jane@example.com,
Country: Canada, Province: Ontario, ZIP: M5V 3A8"
OpenBrowser matches fields semantically — "given name", "first name", "prénom" all resolve correctly.
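One simple way to picture this kind of matching is a synonym table keyed by a canonical field name. The sketch below is an assumption-heavy illustration: `FIELD_SYNONYMS`, `normalizeLabel`, and `canonicalField` are hypothetical names, and the real extension may rely on the model itself rather than a fixed table.

```javascript
// Hypothetical synonym table: canonical key -> labels that should map to it.
const FIELD_SYNONYMS = {
  firstName: ["first name", "given name", "fname", "prenom"],
  lastName: ["last name", "family name", "surname", "lname"],
  email: ["email", "e-mail", "email address"],
  postalCode: ["zip", "zip code", "postal code", "postcode"],
};

// Lowercase, strip accents and punctuation so "Prénom:" and "prenom" compare equal.
function normalizeLabel(label) {
  return label
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Return the canonical key for a form label, or null if nothing matches.
function canonicalField(label) {
  const normalized = normalizeLabel(label);
  for (const [key, synonyms] of Object.entries(FIELD_SYNONYMS)) {
    if (synonyms.some((s) => normalizeLabel(s) === normalized)) return key;
  }
  return null;
}
```

With this table, `canonicalField("Given Name")` and `canonicalField("Prénom:")` both return `"firstName"`, while an unrecognized label returns `null`.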
"I have three laptop tabs open. Compare their RAM, storage, price,
display size, and battery life in a table"
Press Ctrl+Shift+P on any page → type your instruction → press Enter.
You don't need to open the side panel first: it opens automatically and the agent starts running.
| Shortcut | Action |
|---|---|
| Ctrl+Shift+Y | Toggle side panel |
| Ctrl+Shift+P | Quick command palette (on page) |
| Enter | Send message |
| Shift+Enter | New line in input |
| Escape | Stop running agent |
| Ctrl+K | Open prompt templates |
| Ctrl+M | Open memory dashboard |
| Ctrl+N | New conversation |
| Ctrl+? or ? | Show all shortcuts |
| Alt+1–5 | Switch to Chat / History / Files / Macros / Settings |
Check Settings → Max Steps. The default is 20. For complex tasks, increase it to 50 or 100. Each step uses one API call.
- Make sure you copied the full key including any prefix (e.g. `sk-ant-...`)
- Check that your account has credits or is on an active plan
- For Anthropic free tier: set RPM = 5 and RPD = 25 in Settings → Rate Limits to stay within limits
Some pages (bank websites, certain Chrome pages) block extensions from injecting scripts. This is a browser security restriction and cannot be overridden. Try navigating to the page manually first, then giving the instruction.
See the Using Ollama section above. Ollama runs a local HTTP server on port 11434. OpenBrowser communicates with it the same way it does with cloud providers — no data leaves your machine.
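For reference, Ollama's HTTP API exposes the installed models at `GET /api/tags`. The sketch below shows one way model discovery could work against that endpoint. Whether OpenBrowser uses this exact call is an assumption, and `parseInstalledModels` and `listOllamaModels` are hypothetical names.

```javascript
const OLLAMA_BASE_URL = "http://localhost:11434"; // default Ollama port

// Extract model names from the JSON body returned by GET /api/tags.
function parseInstalledModels(tagsResponse) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  return models.map((m) => m.name);
}

// Ask the local Ollama server which models are installed.
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function listOllamaModels(baseUrl = OLLAMA_BASE_URL, fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/api/tags`);
  if (!response.ok) {
    throw new Error(`Ollama responded with HTTP ${response.status}`);
  }
  return parseInstalledModels(await response.json());
}
```

Calling `listOllamaModels()` while `ollama serve` is running should resolve to an array of model names such as `llama3.2:latest`, depending on what you have pulled.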
No. OpenBrowser is a pure client-side extension. Your API keys, conversation history, memories, and files are stored only in your browser's local storage (chrome.storage.local and IndexedDB). No analytics, no telemetry, no cloud sync.
This is an Anthropic model quirk — some models fall back to an XML-style tool-calling format. OpenBrowser detects and handles all known fallback formats automatically. You should never see raw <function=...> blocks in the chat; if you do, please open an issue.
Open Chrome DevTools on the side panel (right-click → Inspect), then run:
`chrome.storage.local.get(null, data => console.log(JSON.stringify(data)));`

Copy the output. To restore, paste it back and use `chrome.storage.local.set(...)`.
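If you prefer reusable helpers over one-off console commands, the backup and restore steps can be sketched as two small functions. `backupStorage` and `restoreStorage` are hypothetical names, not part of OpenBrowser. They take the storage area as a parameter and rely only on the standard `get(null, callback)` and `set(items, callback)` calls.

```javascript
// Serialize everything in a storage area to a JSON string.
// `storageArea` is expected to behave like chrome.storage.local.
function backupStorage(storageArea, onDone) {
  storageArea.get(null, (data) => onDone(JSON.stringify(data)));
}

// Parse a backup string and write its keys back into the storage area.
function restoreStorage(storageArea, backupJson, onDone) {
  const data = JSON.parse(backupJson); // throws on malformed input
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new Error("Backup must be a JSON object");
  }
  storageArea.set(data, onDone);
}
```

From the side panel's DevTools console, `backupStorage(chrome.storage.local, (json) => console.log(json))` prints the backup string. This covers `chrome.storage.local` only; files kept in IndexedDB are not included.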
Make sure Edge Glow is enabled in Settings → Features. The glow injects a fixed overlay onto the active tab's DOM — it won't appear on chrome:// pages or extension pages.
Yes! Select Custom API as the provider and enter any OpenAI-compatible base URL. This works with LM Studio, vLLM, LocalAI, Jan, and most self-hosted inference servers.
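"OpenAI-compatible" here generally means the server accepts a `POST` to `<base URL>/chat/completions` with a `model` and a `messages` array. The sketch below builds such a request. `buildChatRequest` is a hypothetical helper, not OpenBrowser's own code, and the base URL in the usage note is only an example value.

```javascript
// Build the URL and fetch options for an OpenAI-style chat completion call.
function buildChatRequest({ baseUrl, apiKey, model, messages, stream = true }) {
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const headers = { "Content-Type": "application/json" };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`; // local servers often need no key
  return {
    url,
    options: {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages, stream }),
    },
  };
}
```

For example, `buildChatRequest({ baseUrl: "http://localhost:1234/v1", model: "my-local-model", messages: [{ role: "user", content: "Hello" }] })` returns a `url` and `options` pair that can be passed to `fetch(url, options)`.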
The primary model handles all requests normally. If it returns a 429 (rate limit), 402 (payment required), or a quota error, OpenBrowser automatically switches to the backup model for the rest of that run. The switch is seamless — the task continues without interruption.
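The switching rule described above can be sketched as a small per-run selector. `isQuotaError` and `createModelSelector` are hypothetical names, and matching the word "quota" in the error message is an assumption about how a quota error might be detected.

```javascript
// True for the conditions described above: HTTP 429, HTTP 402, or a quota message.
function isQuotaError(status, message = "") {
  return status === 429 || status === 402 || /quota/i.test(message);
}

// Tracks which model is active for a single agent run.
function createModelSelector(primary, backup) {
  let active = primary;
  return {
    current: () => active,
    // Report a failed call; returns true if the selector switched to the backup.
    reportError(status, message) {
      if (backup && active === primary && isQuotaError(status, message)) {
        active = backup; // stays on the backup for the rest of the run
        return true;
      }
      return false;
    },
  };
}
```

Creating a fresh selector for each run means the next run starts on the primary model again, which matches the "for the rest of that run" behaviour.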
OpenBrowser/
├── manifest.json MV3 manifest — permissions, commands, CSP
├── background.js Service worker: sidebar toggle, context menu,
│ page-change detector, macro scheduler, quick palette
├── sidepanel.html Main panel UI (HTML + embedded CSS)
├── sidepanel.js ~3500 lines: agent loop, 50+ tools, streaming,
│ VFS, themes, Mermaid, memory, macros
├── welcome.html Onboarding page (shown on first install)
├── welcome.js Open Panel button handler
├── sandbox.html Isolated iframe for safe JS evaluation
├── content-scripts/
│ └── content.js DOM interaction layer (injected on demand)
└── icons/ Extension icons (16, 32, 48, 128px)
How the agent loop works:
- User sends a message → `runAgent()` starts
- The full conversation history + system prompt → `callAIStreaming()`
- AI response streams in token by token → rendered live in the chat
- If the response contains tool calls → `executeTool()` runs each one
- Tool results feed back into the conversation → next AI call
- Loop continues until `finish` tool, max steps reached, or user stops
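The steps above can be sketched as a short loop. This is a simplified illustration, not the code in `sidepanel.js`: `runAgentSketch` is a hypothetical name, `callAI` and `executeTool` are injected stand-ins for the real functions, and the response and history shapes are assumptions.

```javascript
// Minimal agent loop. `callAI` and `executeTool` are injected stand-ins for
// the real callAIStreaming() and executeTool(); their shapes here are assumed.
async function runAgentSketch({ userMessage, callAI, executeTool, maxSteps = 20, shouldStop = () => false }) {
  const history = [{ role: "user", content: userMessage }];

  for (let step = 1; step <= maxSteps; step++) {
    if (shouldStop()) return { status: "stopped", history };

    // Assumed response shape: { text, toolCalls: [{ name, args }] }
    const response = await callAI(history);
    history.push({ role: "assistant", content: response.text });

    // Assumption: a reply without tool calls ends the run.
    const toolCalls = response.toolCalls ?? [];
    if (toolCalls.length === 0) return { status: "done", history };

    for (const call of toolCalls) {
      if (call.name === "finish") {
        return { status: "finished", summary: call.args?.summary, history };
      }
      const result = await executeTool(call.name, call.args);
      history.push({ role: "tool", name: call.name, content: JSON.stringify(result) });
    }
  }
  return { status: "max_steps", history };
}
```

Each pass through the loop makes one AI call, which is why the Max Steps setting also bounds the number of API calls in a run.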
Contributions are warmly welcome! Here's how to get involved:
- Check the existing issues first
- Open a new issue with:
  - Chrome version (`chrome://version`)
  - OpenBrowser version (visible in Settings)
  - Provider and model being used
  - Steps to reproduce
  - What you expected vs. what happened
  - Console errors (right-click the panel → Inspect → Console)
Open an issue with the enhancement label. Describe the use case, not just the feature — "I want to do X but currently can't because Y" is more useful than "Add feature Z".
- Fork the repository
- Clone your fork: `git clone https://github.com/YOUR_NAME/OpenBrowser.git`
- Make your changes. The extension loads directly from the folder, no build step needed
- Test by loading the folder as an unpacked extension
- Open a pull request with a clear description of what changed and why
- All JS is vanilla ES2022 — no build tools, no bundler, no dependencies
- New tools go in the `TOOLS` array (with a description) and the `executeTool()` switch
- Follow existing naming conventions: `camelCase` for functions, `snake_case` for tool names
- Keep tool descriptions accurate: they go directly into the API request
- New settings fields need entries in both `loadSettingsUI()` and the save handler
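As a rough guide to what adding a tool might look like under these conventions, here is a minimal sketch. The `count_words` tool, its `parameters` shape, and the return value are all hypothetical. Check the existing entries in `sidepanel.js` for the real structure before copying this.

```javascript
// Assumed shape of a tool definition; the real TOOLS entries may differ.
const TOOLS = [
  {
    name: "count_words", // snake_case tool name (hypothetical tool)
    description: "Count the words in a piece of text.",
    parameters: { text: "string" },
  },
];

// camelCase helper that implements the tool.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Dispatch a tool call by name, mirroring the switch described above.
function executeTool(name, args) {
  switch (name) {
    case "count_words":
      return { count: countWords(args.text) };
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```

Keeping the description short and literal matters here, because the model only sees the text in the `TOOLS` entry when deciding which tool to call.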
- Add an entry to the `PROVIDERS` object in `sidepanel.js`
- Include: `name`, `baseUrl`, `models[]`, `format` (`'anthropic'` | `'openai'` | `'gemini'`), `requiresKey`, `keyPlaceholder`, `keyHint`
- Add the provider to both select dropdowns in `sidepanel.html`
- If it needs a custom request format, add a branch in `buildProviderRequest()`
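A provider entry with the fields listed above might look like the sketch below. Every value is a placeholder for a fictional provider, keying `PROVIDERS` by a short id is an assumption, and `validateProvider` is a hypothetical helper for catching missing fields before you open a pull request.

```javascript
// Hypothetical provider entry; every value below is a placeholder.
const PROVIDERS = {
  exampleai: {
    name: "ExampleAI",
    baseUrl: "https://api.example.com/v1",
    models: ["example-small", "example-large"],
    format: "openai", // one of 'anthropic' | 'openai' | 'gemini'
    requiresKey: true,
    keyPlaceholder: "ex-...",
    keyHint: "Create a key in your ExampleAI dashboard",
  },
};

const REQUIRED_FIELDS = ["name", "baseUrl", "models", "format", "requiresKey", "keyPlaceholder", "keyHint"];
const KNOWN_FORMATS = ["anthropic", "openai", "gemini"];

// Return a list of problems with a provider entry (empty when it looks complete).
function validateProvider(entry) {
  const problems = REQUIRED_FIELDS.filter((f) => !(f in entry)).map((f) => `missing ${f}`);
  if ("format" in entry && !KNOWN_FORMATS.includes(entry.format)) {
    problems.push(`unknown format ${entry.format}`);
  }
  if ("models" in entry && !Array.isArray(entry.models)) {
    problems.push("models must be an array");
  }
  return problems;
}
```

Running `validateProvider(PROVIDERS.exampleai)` returns an empty array for the entry above; an entry with a missing field or an unsupported `format` returns one message per problem.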
- Firefox/Edge support (Manifest V3 side panel API is Chrome-only today)
- Extension sync across devices via Chrome Sync
- Plugin/extension API for third-party tools
- Session replay — watch the agent's actions as a visual recording
- Headless mode — run agents without the side panel open
- Multi-agent — spawn parallel agents for different tabs
MIT License — see LICENSE for details.
You are free to use, modify, and distribute OpenBrowser for any purpose. If you build something cool with it, a mention or a star is appreciated but never required.
- Anthropic for Claude, the model that powers most of our testing
- Ollama for making local LLMs accessible to everyone
- Do Browser for the inspiration
- Mermaid.js for diagram rendering
- Every contributor who has reported bugs, suggested features, or submitted PRs
Built with ❤️ and 🤖 · github.com/Prof-MAN9/OpenBrowser
