OpenBrowser — AI Browser Automation

By: Prof_MAN

Control your browser with plain English. Free, open-source, no subscriptions.

What is OpenBrowser?

OpenBrowser is a free, open-source Chrome extension that brings AI-powered browser automation to your fingertips — without subscriptions, usage limits, or data collection. Connect your own API key and describe what you want in plain English. OpenBrowser handles the rest.

"Book me a flight to NYC under $400." → Done.
"Compare the specs on these three laptops I have open." → Done.
"Fill in this form with my details and submit it." → Done.

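Unlike SaaS-based browser agents, OpenBrowser runs entirely inside your browser. Requests go straight from the extension to the provider you configure, so your API keys and conversation history never pass through an OpenBrowser server or any other intermediary.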

Features

🤖 Multi-Provider AI Support

Connect to 12 providers with your own API key:

| Provider | Notes |
| --- | --- |
| Anthropic (Claude) | claude-sonnet-4-5, claude-opus-4, claude-haiku-4-5 |
| OpenAI (GPT-4) | gpt-4o, gpt-4-turbo, gpt-3.5-turbo |
| Google Gemini | gemini-2.0-flash, gemini-1.5-pro |
| Groq | llama-3.3-70b, mixtral; fast free tier |
| Ollama (Local) | Fully offline: llama3.2, mistral, deepseek-r1 |
| OpenRouter | 200+ models via a single key |
| Cloudflare Workers AI | Low-latency edge inference |
| HuggingFace | Open-weight models |
| MiniMax / Moonshot / Qwen | Chinese providers with global access |
| Custom API | Any OpenAI-compatible endpoint |

🧰 50+ Built-in Agent Tools

OpenBrowser gives the AI a full toolkit for web automation:

Navigation & Interaction

navigate — Go to any URL or site name ("YouTube", "my Gmail", "search for...")
click, type, scroll, select_option — Interact with any page element
open_tab, switch_tab, list_tabs — Manage multiple browser tabs

Content & Data

screenshot — Capture the current page
get_page_content, scrape_page — Extract structured page content
extract_data, export_data — Pull data into tables, CSV, or JSON
download_csv — Save extracted data as a file

Forms

smart_fill_form — Semantic form filling (understands "first name" = "given name" = "fname")
scan_forms — Discover all form fields on a page

Research & Analysis

summarize_tabs — Digest all open tabs at once
cross_site_research — Compare data across multiple open tabs simultaneously
auto_highlight — Highlight relevant passages based on your goal
remove_highlights — Clear all highlights

Planning

create_task_plan — Render a live visual checklist in the chat
update_task_step — Mark steps done/active/failed in real time
reason, think — Explicit chain-of-thought before acting

Memory & Knowledge

memorize, recall — Persist facts across sessions (optional)

Files (Virtual Filesystem)

write_file, read_file, list_files, delete_file — AI-managed file storage in IndexedDB

Bookmarks & Citations

save_bookmark — Auto-tagged smart bookmarking
show_bookmarks — Browse saved bookmarks with tag filtering
add_citation — APA/MLA/Chicago/URL format source collection
show_citations, clear_citations

Utilities

run_javascript — Execute arbitrary JS on the current page
browse_intent — Intent-based navigation ("find me a cheap flight")
wait — Pause for slow-loading pages
finish — Conclude the task with a summary

🌊 Streaming Responses

AI answers appear token by token, just like Claude.ai. No waiting for the full response.

🎤 Voice Input

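Click the microphone button to dictate your instructions. Live transcription uses the browser's built-in Web Speech API, so no extra API key is required. Chrome's speech recognition has typically relied on a server-side engine, so dictation may need a network connection.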

⌨️ Quick Command Palette

Press Ctrl+Shift+P on any webpage to open a floating command palette. Type your instruction and press Enter — OpenBrowser opens the side panel and runs it automatically.

🔖 Macros & Scheduling

Save any task as a reusable macro. Run manually or schedule it to repeat automatically (every 15 min / 30 min / 1h / 6h / daily) using Chrome alarms.
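
As a rough illustration of how the schedule options above could map onto Chrome alarms, here is a minimal sketch. The `SCHEDULE_MINUTES` keys, the `macro:` alarm-name prefix, and the `scheduleMacro` function are assumptions made for this example, not names taken from OpenBrowser's source. Only `chrome.alarms.create` and `chrome.alarms.onAlarm` are real extension APIs.

```javascript
// Maps the schedule choices listed above to minutes for chrome.alarms.
// The keys are illustrative labels, not OpenBrowser's actual setting values.
const SCHEDULE_MINUTES = { '15min': 15, '30min': 30, '1h': 60, '6h': 360, daily: 1440 };

// Registers a repeating alarm for a macro. `alarms` defaults to chrome.alarms;
// passing it in keeps the function usable outside the browser.
function scheduleMacro(macroId, schedule, alarms = globalThis.chrome?.alarms) {
  const periodInMinutes = SCHEDULE_MINUTES[schedule];
  if (!periodInMinutes) throw new Error(`Unknown schedule: ${schedule}`);
  if (!alarms) throw new Error('chrome.alarms is not available in this context');
  const name = `macro:${macroId}`;
  alarms.create(name, { periodInMinutes });
  return name;
}

// In the service worker, a listener would then look up and run the macro:
// chrome.alarms.onAlarm.addListener((alarm) => {
//   if (alarm.name.startsWith('macro:')) runMacro(alarm.name.slice(6)); // runMacro is hypothetical
// });
```

Encoding the macro ID in the alarm name lets the listener recover it without extra storage. The extension needs the `alarms` permission in its manifest for this API to be available.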

🧠 Persistent Memory

The AI can memorize facts and recall them across different conversations and sessions. A visual Memory Dashboard lets you edit or delete stored entries.

📁 Virtual Filesystem

An IndexedDB-backed file store lets the AI generate, save, read, and delete files. A "Files" tab in the panel shows a tree view with download support.

🎨 Themes

Dark mode (default), Light mode, and a Custom mode with a full accent color picker and preset swatches.

🔷 Mermaid Diagrams

When the AI outputs a mermaid code block, it renders as a live diagram — flowcharts, sequence diagrams, ER diagrams, Gantt charts — styled to match the OpenBrowser palette.

📝 Prompt Templates

One-click access to saved prompts. 6 built-in templates (Summarize, Extract data, Fill form, etc.) plus unlimited custom templates. Open with Ctrl+K.

🖱 Context Menu

Right-click any page for instant OpenBrowser actions: Summarize, Ask about selected text, Translate, Save as citation, Screenshot, Fill forms.

📊 Rate Limiting

Set RPM and RPD caps to protect your API quota. OpenBrowser stops 5 calls before the limit with a warning.
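
The behaviour described above (refusing calls once usage is within 5 of a cap) can be sketched as a small limiter. How OpenBrowser counts calls internally is not documented here, so the sliding windows, the `RateLimiterSketch` class, and the injectable clock are all assumptions made for illustration.

```javascript
// Tracks calls per minute (RPM) and per day (RPD) and refuses new calls once
// either window is within `margin` calls of its cap. All names are illustrative.
class RateLimiterSketch {
  constructor({ rpm, rpd, margin = 5, now = () => Date.now() }) {
    this.rpm = rpm;
    this.rpd = rpd;
    this.margin = margin;
    this.now = now;
    this.timestamps = []; // call times in ms, oldest first
  }

  // Returns { allowed, reason } without recording a call.
  check() {
    const t = this.now();
    const DAY = 24 * 60 * 60 * 1000;
    // Calls older than a day no longer count toward either cap.
    this.timestamps = this.timestamps.filter((ts) => t - ts < DAY);
    const lastMinute = this.timestamps.filter((ts) => t - ts < 60_000).length;
    if (lastMinute >= Math.max(0, this.rpm - this.margin)) {
      return { allowed: false, reason: 'rpm' };
    }
    if (this.timestamps.length >= Math.max(0, this.rpd - this.margin)) {
      return { allowed: false, reason: 'rpd' };
    }
    return { allowed: true, reason: null };
  }

  // Records a call if allowed and returns the same verdict as check().
  tryAcquire() {
    const verdict = this.check();
    if (verdict.allowed) this.timestamps.push(this.now());
    return verdict;
  }
}
```

With `rpm: 10` and the default margin, the sixth call inside one minute is refused with reason `'rpm'`, and calls are accepted again once the earlier ones fall out of the 60-second window.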

🔄 Backup Model

Configure a secondary provider/model. If the primary hits its quota, the agent automatically switches mid-task.

Setup

Prerequisites

Google Chrome (version 114 or later)
An API key from at least one supported provider
(Or Ollama running locally for a fully free experience)

Installation

Option A — Install from source (recommended)

Download the latest release ZIP from the Releases page
(Or clone the repository)
Open Chrome and navigate to chrome://extensions
Enable Developer mode (toggle in the top-right corner)
Click Load unpacked and select the ctrl-browser/ folder (the unzipped folder that contains manifest.json)
(Or drag-and-drop the unzipped folder)
The OpenBrowser icon will appear in your toolbar

Option B — Chrome Web Store
Coming soon.

Configuring Your First Provider

Click the OpenBrowser icon in your toolbar (or press Ctrl+Shift+Y)
Click Settings in the bottom navigation
Select your Provider (e.g. Anthropic)
Select your Model (e.g. claude-sonnet-4-5)
Paste your API Key
Click Save Settings

⚡ Free option: Select Ollama (Local — Free), click Test Connection, and install any model with ollama pull llama3.2. No API key needed.

Using Ollama (Fully Local & Free)

Install Ollama from ollama.ai
Start the server: ollama serve
Pull a model: ollama pull llama3.2 (or mistral, deepseek-r1, qwen2.5)
In OpenBrowser, open Settings → Provider and select Ollama (Local — Free)
Click Test Connection — OpenBrowser will auto-discover your installed models
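
For readers curious what model discovery might involve, the sketch below queries Ollama's `GET /api/tags` endpoint, which returns the locally installed models. Whether Test Connection uses exactly this call is an assumption; `listOllamaModels` and its injectable `fetchImpl` parameter are hypothetical names for this example.

```javascript
// Lists locally installed Ollama models via GET /api/tags.
// fetchImpl is injectable so the function can be exercised without a server.
async function listOllamaModels(baseUrl = 'http://localhost:11434', fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const body = await res.json();
  // The response has the shape { models: [{ name: 'llama3.2:latest', ... }] }.
  return (body.models ?? []).map((m) => m.name);
}
```

If the call fails with a network error, the usual cause is that `ollama serve` is not running.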

Usage

Basic Chat

Open the side panel, type your instruction, and press Enter.

"Go to YouTube and search for the latest OpenAI keynote"
"Fill in the contact form with: name=John Smith, email=john@example.com"  
"Summarize all my open tabs and tell me the most important one"
"Extract the pricing table from this page as CSV"

Multi-Step Tasks

For complex tasks, the AI creates a live task plan you can watch execute step by step:

"Research the top 5 Python web frameworks, compare their GitHub stars, 
 performance benchmarks, and learning curves, then save a comparison 
 table to a file"

Smart Form Filling

"Fill in this checkout form. 
 Name: Jane Doe, Email: jane@example.com, 
 Country: Canada, Province: Ontario, ZIP: M5V 3A8"

OpenBrowser matches fields semantically — "given name", "first name", "prénom" all resolve correctly.
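
To make the idea of semantic matching concrete, here is a deliberately simple stand-in: a synonym table plus label normalization. OpenBrowser's actual matching is not described in this README and may rely on the AI model rather than a lookup table. The `FIELD_SYNONYMS` data and both function names are assumptions for this example.

```javascript
// Canonical field names mapped to labels that should resolve to them.
// The synonym lists are illustrative, not OpenBrowser's actual data.
const FIELD_SYNONYMS = {
  first_name: ['first name', 'given name', 'fname', 'prenom'],
  last_name: ['last name', 'family name', 'surname', 'lname', 'nom'],
  email: ['email', 'e mail', 'email address'],
  postal_code: ['zip', 'zip code', 'postal code', 'postcode'],
};

// Lowercase, strip accents, and collapse punctuation so "Prénom*" becomes "prenom".
function normalizeLabel(label) {
  return label
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ' ')
    .trim();
}

// Returns the canonical field name for a form label, or null if unknown.
function resolveField(label) {
  const key = normalizeLabel(label);
  for (const [canonical, synonyms] of Object.entries(FIELD_SYNONYMS)) {
    if (synonyms.includes(key)) return canonical;
  }
  return null;
}
```

Here `resolveField('Given Name:')`, `resolveField('fname')`, and `resolveField('Prénom')` all resolve to `'first_name'`, while an unrecognized label returns `null`.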

Cross-Site Research

"I have three laptop tabs open. Compare their RAM, storage, price, 
 display size, and battery life in a table"

Quick Command Palette

Press Ctrl+Shift+P on any page → type your instruction → press Enter.
You don't need to open the side panel first: OpenBrowser opens it and starts the task automatically.

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| `Ctrl+Shift+Y` | Toggle side panel |
| `Ctrl+Shift+P` | Quick command palette (on page) |
| `Enter` | Send message |
| `Shift+Enter` | New line in input |
| `Escape` | Stop running agent |
| `Ctrl+K` | Open prompt templates |
| `Ctrl+M` | Open memory dashboard |
| `Ctrl+N` | New conversation |
| `Ctrl+?` or `?` | Show all shortcuts |
| `Alt+1–5` | Switch to Chat / History / Files / Macros / Settings |

Help & FAQs

Why does the agent stop after a few steps?

Check Settings → Max Steps. The default is 20. For complex tasks, increase it to 50 or 100. Each step uses one API call.

My API key isn't working

Make sure you copied the full key including any prefix (e.g. sk-ant-...)
Check that your account has credits or is on an active plan
For Anthropic free tier: set RPM = 5 and RPD = 25 in Settings → Rate Limits to stay within limits

The agent can't interact with the page

Some pages (bank websites, certain Chrome pages) block extensions from injecting scripts. This is a browser security restriction and cannot be overridden. Try navigating to the page manually first, then giving the instruction.

How do I use Ollama with OpenBrowser?

See the Using Ollama section above. Ollama runs a local HTTP server on port 11434. OpenBrowser communicates with it the same way it does with cloud providers — no data leaves your machine.

Does OpenBrowser collect any data?

No. OpenBrowser is a pure client-side extension. Your API keys, conversation history, memories, and files are stored only in your browser's local storage (chrome.storage.local and IndexedDB). No analytics, no telemetry, no cloud sync.

Why does the AI sometimes use `<function=...>` syntax in its response?

This is a model quirk: some models fall back to an XML-style tool-calling format instead of using the provider's structured tool-call fields. OpenBrowser detects and handles all known fallback formats automatically. You should never see raw <function=...> blocks in the chat; if you do, please open an issue.
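
As an illustration of what handling one such fallback could look like, the sketch below extracts calls written as `<function=name>{...}</function>` from a response. The exact tag layout (JSON arguments between the tags) is an assumption, and `parseFunctionTags` is a hypothetical helper rather than OpenBrowser's own parser.

```javascript
// Extracts fallback tool calls of the assumed form
//   <function=tool_name>{"arg": "value"}</function>
// and returns them along with the text that remains once they are removed.
function parseFunctionTags(text) {
  const pattern = /<function=([\w-]+)>([\s\S]*?)<\/function>/g;
  const calls = [];
  for (const match of text.matchAll(pattern)) {
    const raw = match[2].trim();
    let args = {};
    if (raw) {
      try {
        args = JSON.parse(raw);
      } catch {
        args = { _raw: raw }; // keep unparseable arguments for debugging
      }
    }
    calls.push({ name: match[1], args });
  }
  const cleaned = text.replace(pattern, '').trim();
  return { calls, cleaned };
}
```

Returning the cleaned text alongside the calls lets the chat show the model's prose without the raw tags.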

How do I back up my macros, memory, and settings?

Open Chrome DevTools on the side panel (right-click → Inspect), then run:

chrome.storage.local.get(null, data => console.log(JSON.stringify(data)));

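Copy the output and keep it somewhere safe. To restore, parse the saved JSON and pass the resulting object to chrome.storage.local.set(...). This captures only chrome.storage.local; files kept in the IndexedDB-backed virtual filesystem are not included and should be downloaded from the Files tab.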
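
A restore helper could look like the sketch below, run from the same DevTools console. `restoreBackup` is a hypothetical name. It assumes the backup is the JSON string printed by the command above and that overwriting existing keys is acceptable.

```javascript
// Restores a backup produced by the export command above. `storage` defaults
// to chrome.storage.local and is injectable so the logic can be tested.
function restoreBackup(json, storage = globalThis.chrome?.storage?.local) {
  const data = JSON.parse(json);
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new Error('Backup must be a JSON object of key/value pairs');
  }
  if (!storage) throw new Error('chrome.storage.local is not available here');
  // set() merges the given keys into storage and overwrites keys that already exist.
  return storage.set(data);
}
```

Usage: `restoreBackup(pastedJsonString)`. Keys that exist in storage but not in the backup are left untouched.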

The page glow effect isn't appearing

Make sure Edge Glow is enabled in Settings → Features. The glow injects a fixed overlay onto the active tab's DOM — it won't appear on chrome:// pages or extension pages.

Can I add my own provider?

Yes! Select Custom API as the provider and enter any OpenAI-compatible base URL. This works with LM Studio, vLLM, LocalAI, Jan, and most self-hosted inference servers.
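
For reference, an OpenAI-compatible endpoint accepts a POST to `/chat/completions` under its base URL. The sketch below builds such a request. `buildChatRequest` is a hypothetical helper, and it assumes the base URL you enter already includes any version prefix such as `/v1`.

```javascript
// Builds the fetch arguments for an OpenAI-style chat completion request.
// Assumes baseUrl already contains the version prefix (for example ".../v1").
function buildChatRequest({ baseUrl, apiKey, model, messages, stream = true }) {
  const url = `${baseUrl.replace(/\/+$/, '')}/chat/completions`;
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`; // local servers often need no key
  return {
    url,
    options: { method: 'POST', headers, body: JSON.stringify({ model, messages, stream }) },
  };
}
```

The result can be passed to `fetch(url, options)`. If a server rejects the request, check its documentation for the exact base path it expects.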

What's the difference between the backup model and the primary model?

The primary model handles all requests normally. If it returns a 429 (rate limit), 402 (payment required), or a quota error, OpenBrowser automatically switches to the backup model for the rest of that run. The switch is seamless — the task continues without interruption.
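
The switching behaviour described above can be sketched as a thin wrapper around two call functions. The names `createFallbackCaller` and `isQuotaError`, and the assumption that failed calls throw an error carrying a `status` field, are illustrative only.

```javascript
// Status codes the text above names as triggers for switching models.
const FALLBACK_STATUSES = new Set([429, 402]);

// Assumes failed calls throw an error object with a numeric `status`
// and/or a message that mentions "quota".
function isQuotaError(err) {
  return FALLBACK_STATUSES.has(err?.status) || /quota/i.test(err?.message ?? '');
}

// Wraps two call functions. After the first quota-style failure on the
// primary, every later call in this run goes to the backup.
function createFallbackCaller(callPrimary, callBackup) {
  let useBackup = false;
  return async function call(request) {
    if (!useBackup) {
      try {
        return await callPrimary(request);
      } catch (err) {
        if (!callBackup || !isQuotaError(err)) throw err;
        useBackup = true; // sticky for the rest of the run
      }
    }
    return callBackup(request);
  };
}
```

Creating a fresh caller for each run resets the switch, which matches the "rest of that run" wording: the next task starts on the primary model again.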

Architecture

OpenBrowser/
├── manifest.json          MV3 manifest — permissions, commands, CSP
├── background.js          Service worker: sidebar toggle, context menu,
│                          page-change detector, macro scheduler, quick palette
├── sidepanel.html         Main panel UI (HTML + embedded CSS)
├── sidepanel.js           ~3500 lines: agent loop, 50+ tools, streaming,
│                          VFS, themes, Mermaid, memory, macros
├── welcome.html           Onboarding page (shown on first install)
├── welcome.js             Open Panel button handler
├── sandbox.html           Isolated iframe for safe JS evaluation
├── content-scripts/
│   └── content.js         DOM interaction layer (injected on demand)
└── icons/                 Extension icons (16, 32, 48, 128px)

How the agent loop works:

User sends a message → runAgent() starts
The full conversation history + system prompt → callAIStreaming()
AI response streams in token by token → rendered live in the chat
If the response contains tool calls → executeTool() runs each one
Tool results feed back into the conversation → next AI call
Loop continues until finish tool, max steps reached, or user stops
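
The steps above can be sketched as follows. This is a simplified illustration, not the code in sidepanel.js: streaming is omitted, `callAI` and `executeTool` are passed in as plain callbacks, and the message and tool-call shapes are assumptions chosen for the example.

```javascript
// Simplified agent loop. callAI and executeTool are injected so the sketch
// stays self-contained; their shapes here are assumptions, not the real API.
async function runAgentSketch({ messages, callAI, executeTool, maxSteps = 20, shouldStop = () => false }) {
  const history = [...messages];
  for (let step = 1; step <= maxSteps; step++) {
    // The user can stop the run between steps.
    if (shouldStop()) return { status: 'stopped', steps: step - 1, history };

    // One model call per step: full history in, text plus optional tool calls out.
    const reply = await callAI(history);
    const toolCalls = reply.toolCalls ?? [];
    history.push({ role: 'assistant', content: reply.text, toolCalls });

    // No tool calls means the model answered directly.
    if (toolCalls.length === 0) return { status: 'done', steps: step, history };

    for (const call of toolCalls) {
      if (call.name === 'finish') {
        return { status: 'finished', steps: step, summary: call.args?.summary, history };
      }
      const result = await executeTool(call.name, call.args);
      // Tool results are fed back so the next model call can see them.
      history.push({ role: 'tool', name: call.name, content: JSON.stringify(result) });
    }
  }
  return { status: 'max_steps', steps: maxSteps, history };
}
```

The `maxSteps` default of 20 mirrors the Max Steps setting described in the FAQ. Because each iteration makes one model call, raising the limit raises the worst-case number of API calls by the same amount.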

Contributing

Contributions are warmly welcome! Here's how to get involved:

Reporting Bugs

Check the existing issues first
Open a new issue with:
- Chrome version (chrome://version)
- OpenBrowser version (visible in Settings)
- Provider and model being used
- Steps to reproduce
- What you expected vs. what happened
- Console errors (right-click the panel → Inspect → Console)

Suggesting Features

Open an issue with the enhancement label. Describe the use case, not just the feature — "I want to do X but currently can't because Y" is more useful than "Add feature Z".

Submitting Code

Fork the repository
Clone your fork: git clone https://github.com/YOUR_NAME/OpenBrowser.git
Make your changes — the extension loads directly from the folder, no build step needed
Test by loading the folder as an unpacked extension
Open a pull request with a clear description of what changed and why

Code Guidelines

All JS is vanilla ES2022 — no build tools, no bundler, no dependencies
New tools go in the TOOLS array (with a description) and in the executeTool() switch
Follow existing naming conventions: camelCase for functions, snake_case for tool names
Keep tool descriptions accurate — they go directly into the API request
New settings fields need entries in both loadSettingsUI() and the save handler
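
To show how these guidelines fit together, here is a minimal sketch of a new tool. The entry shape in `TOOLS`, the `executeTool` signature, and the `word_count` tool itself are hypothetical; check the existing entries in sidepanel.js for the real structure before copying this.

```javascript
// Hypothetical tool definition. The real TOOLS entries may carry other fields.
const TOOLS = [
  {
    name: 'word_count', // snake_case for tool names
    description: 'Count the words in a piece of text. Args: { text: string }. Returns { words: number }.',
  },
];

// camelCase for functions.
function wordCount(text) {
  const trimmed = String(text ?? '').trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Dispatches a tool call by name. Unknown tools return an error object
// rather than throwing, so the model can see what went wrong.
async function executeTool(name, args = {}) {
  switch (name) {
    case 'word_count':
      return { words: wordCount(args.text) };
    default:
      return { error: `Unknown tool: ${name}` };
  }
}
```

The description states the arguments and the return shape because, per the guideline above, it is sent to the model as written.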

Adding a New Provider

Add an entry to the PROVIDERS object in sidepanel.js
Include: name, baseUrl, models[], format ('anthropic'|'openai'|'gemini'), requiresKey, keyPlaceholder, keyHint
Add the provider to both select dropdowns in sidepanel.html
If it needs a custom request format, add a branch in buildProviderRequest()
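
A provider entry with the fields listed above might look like the sketch below. The `example` provider, its URL, and its model names are placeholders, and `validateProvider` is a hypothetical helper added here to catch a missing field before loading the extension.

```javascript
// Placeholder entry showing the fields named in the steps above.
const PROVIDERS = {
  example: {
    name: 'Example AI',
    baseUrl: 'https://api.example.com/v1',
    models: ['example-small', 'example-large'],
    format: 'openai', // 'anthropic' | 'openai' | 'gemini'
    requiresKey: true,
    keyPlaceholder: 'ex-...',
    keyHint: 'Create a key in your Example AI dashboard',
  },
};

const REQUIRED_FIELDS = ['name', 'baseUrl', 'models', 'format', 'requiresKey', 'keyPlaceholder', 'keyHint'];
const FORMATS = ['anthropic', 'openai', 'gemini'];

// Returns a list of human-readable problems; an empty list means the entry looks complete.
function validateProvider(id, entry) {
  const problems = REQUIRED_FIELDS
    .filter((field) => !(field in entry))
    .map((field) => `${id}: missing "${field}"`);
  if ('format' in entry && !FORMATS.includes(entry.format)) {
    problems.push(`${id}: format must be one of ${FORMATS.join(', ')}`);
  }
  if ('models' in entry && !Array.isArray(entry.models)) {
    problems.push(`${id}: models must be an array`);
  }
  return problems;
}
```

Running `validateProvider` over every entry in `PROVIDERS` is a quick way to spot a forgotten field while developing.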

Roadmap

Firefox/Edge support (Manifest V3 side panel API is Chrome-only today)
Extension sync across devices via Chrome Sync
Plugin/extension API for third-party tools
Session replay — watch the agent's actions as a visual recording
Headless mode — run agents without the side panel open
Multi-agent — spawn parallel agents for different tabs

License

MIT License — see LICENSE for details.

You are free to use, modify, and distribute OpenBrowser for any purpose. If you build something cool with it, a mention or a star is appreciated but never required.

Acknowledgements

Anthropic for Claude, the model that powers most of our testing
Ollama for making local LLMs accessible to everyone
Do Browser for the inspiration
Mermaid.js for diagram rendering
Every contributor who has reported bugs, suggested features, or submitted PRs

Built with ❤️ and 🤖 · github.com/Prof-MAN9/OpenBrowser
