Crawler

A small, opinionated browser agent that lives in your Chrome side panel.

Think of it as a free, hackable cousin of Perplexity Comet: a chat that can actually see the page you're on, answer questions about it, and — when you let it — drive the browser for you. Click things. Fill forms. Read across tabs. Pull structured data out of messy HTML.

It's also stubbornly cheap to run. Crawler doesn't pin you to one provider. You drop in a handful of free API keys (NVIDIA, OpenRouter, Groq, Gemini, Cerebras, Mistral, or your own Ollama box) and it juggles them for you. When one rate-limits, it slides to the next. You don't notice. You don't pay.

No build step, no npm install, no bundler. Vanilla JS + ES modules. The whole thing is ~60 KB of code you can read in an afternoon.

Get it running in 30 seconds

Open chrome://extensions.
Flip Developer mode on (top-right corner).
Click Load unpacked → pick this crawler/ folder.
Pin the puzzle-piece icon so it's one click away.
Hit Ctrl+Shift+U (or Cmd+Shift+U on Mac) to pop the side panel.
Click the gear, paste an API key, you're done.

The toolbar icon is Chrome's default puzzle piece — we don't ship custom art. If you want branding, drop 16/48/128 px PNGs into icons/ and add an icons block to manifest.json.

How it actually works (the fun part)

Most "use any AI" tools make you pick a provider in a dropdown. Crawler doesn't. It builds one giant rotation list out of every (provider × API key × model) you've configured, and walks it on every request.

Here's the life of one message:

Planner asks the pool: "give me a candidate."
We grab the next one that isn't sitting in cooldown.
We hit its endpoint, with its key, asking for its model.
Worked? Great — return the answer, remember the winner.
HTTP 429? Park that candidate for the duration of Retry-After (defaults to 60s) and move on.
HTTP 401 / 402? That key is dead or out of credits — bench it for 24 hours so we stop pestering it.
HTTP 5xx? 15–30s timeout. Try the next one.

Because every provider is in the same list, failover is cross-provider, not just cross-key. NVIDIA hits its quota → next call lands on OpenRouter free → that 429s → over to Groq → and you're none the wiser. The side panel quietly logs each rotation so you can see what's happening:

⟳ openrouter/deepseek-v3:free key #2 → HTTP 429 (cooldown 30s). Rotating…

Picking providers

Crank the gear icon open. There's one card per provider — flip on the ones you have keys for, paste the keys (one per line), tick the models you want in rotation. Use free-tier models only is on by default, so you have to opt in to spend money.

Provider	Free?	Where to get a key
NVIDIA NIM	Yes — 1000 credits/key/month	build.nvidia.com
OpenRouter	Yes — `:free`-tagged models	openrouter.ai/keys
Groq	Yes — generous, ridiculously fast	console.groq.com/keys
Cerebras	Yes — fastest tokens/sec on the planet	cloud.cerebras.ai
Google Gemini	Yes — Flash models are free	aistudio.google.com/apikey
Mistral	Yes — the small models	console.mistral.ai/api-keys
Ollama	Yes — local, no key needed	localhost only
OpenAI	No	platform.openai.com
Anthropic	No	console.anthropic.com
xAI / Custom	Varies	per provider

My recommended free stack

If you just want this to work, set up:

NVIDIA — 2 or 3 keys
OpenRouter — 2 or 3 keys (only :free models ticked)
Groq — 1 or 2 keys

That's roughly 30+ candidates in rotation. Hit Test Pool in Settings — it'll send a pong ping and tell you which candidate answered.

⚠️ What we deliberately won't ship: GitHub Copilot login and ChatGPT Plus / Pro login. Both depend on private internal APIs that violate their providers' terms — using them tends to get accounts banned. If you have ChatGPT Plus and want OpenAI models here, use a platform.openai.com API key (it's billed separately).

Two modes, same loop

Assistant mode — you ask, it answers. The current page's text gets pulled in as context (sanitised first), and the model replies. Good for "summarise this," "what's this article actually saying," "extract the prices into JSON."

Agent mode — you set a goal, it acts. The model has to call present_plan first (a one-line summary plus numbered steps) so you can see what it intends to do before it touches anything. Approve the plan, it proceeds. Reject it, it stops and says "ok."

Under the hood it's the same ReAct loop. The only difference is which tools are exposed.

Tools the agent can call

A small, deliberately lean set. Snapshots are how the model "sees" the page — it gets back a numbered list of every interactive element, then operates them by index, not by selector.

present_plan — show the plan, wait for user approval
snapshot — numbered list of links / buttons / inputs on the active tab
act({ index, action }) — click / type / select / hover that element
navigate, open_tab, list_tabs — tab control
read_page, extract, click, type_text, scroll, wait_for — the classic selector-based toolset (used in non-lean mode)

Adding a new tool is one append in lib/tools.js: a JSON schema and a handler. The model picks it up on the next run.

Try these prompts

Drop these into the side panel to get a feel for it:

"Summarise this page in 5 bullets."
"Extract every product name and price into a JSON array."
"Click the 'Sign in' link and tell me what fields are on the next page."
"Search Hacker News for 'agentic browsers' and list the top 5 results with links."
"Open the first three results in new tabs and tell me which one mentions pricing."

What's where

manifest.json             MV3 manifest — sidePanel + content scripts + Ctrl+Shift+U
background.js             service worker; port routing + agent loop driver
content.js                isolated-world DOM driver (click / type / scroll / extract)
sidepanel.{html,css,js}   chat UI; long-lived port to the worker
options.{html,js}         provider + key + safety settings
lib/llm.js                OpenAI + Anthropic streaming clients (SSE parsed by hand)
lib/pool.js               unified candidate rotation + cooldown bookkeeping
lib/presets.js            provider definitions (endpoint, models, free vs paid)
lib/planner.js            ReAct loop: plan → call tool → observe → repeat
lib/tools.js              tool registry exposed to the LLM
lib/injectionFilter.js    prompt-injection heuristics + page sanitiser
lib/storage.js            chrome.storage wrappers (Settings, History)

The side panel opens a long-lived chrome.runtime.connect port to the service worker. Each user message becomes a run event; the planner streams delta, action, observation, confirm, final, done, and error events back over the same port.

The content script is auto-injected by the manifest and re-injected on demand from lib/tools.js — that second path covers reload races and activeTab permission edge cases.

Safety stuff (please read this part)

This thing can click buttons on your behalf. A few guardrails:

Confirm before clicks is on by default. The agent has to ask before every click / type / navigation. Toggle off in Settings if you trust it for a particular run.
Plan approval — agent mode forces a present_plan call up front. No plan = no browser control.
Prompt-injection scanner — heuristic regex scan in lib/injectionFilter.js flags page text that's trying to override the system prompt or pull out secrets, strips <script> blocks, and wraps page content in an UNTRUSTED_PAGE_CONTENT block before forwarding it to the model.
No password / 2FA autofill — ever. The system prompt explicitly tells the agent to ask the user for those instead of typing them.
Local-first storage. API keys live in chrome.storage.local, sent to whichever provider you configured and nowhere else. Clear the lot from the side panel any time.

Extending it

It's small enough that "extending" mostly means "edit the file."

New tool → append to TOOLS in lib/tools.js. JSON schema + a handler function. The model gets it on the next run.
New connector (Gmail, Calendar, Notion…) → add identity to manifest.json, drop a lib/connectors/<name>.js that uses chrome.identity.launchWebAuthFlow, expose its functions as tools.
Local model → drop a WASM runtime (e.g. MLC Web-LLM) into lib/ and branch on settings.strictLocal inside lib/llm.js.
New provider → one entry in lib/presets.js is usually enough, since most providers speak OpenAI-compatible JSON.

Known rough edges

No icons shipped — Chrome falls back to the default puzzle piece.
The injection filter is heuristic, not a trained classifier. For production, swap in a small ONNX/WASM model.
extract and click-by-text don't traverse iframes or Shadow DOM yet (all_frames: false in the manifest, deliberately, to keep behaviour predictable).
OAuth connectors are scaffolding only — no Gmail / Calendar / Notion out of the box. See Extending if you want to wire one up.

Version

0.4.0 — pre-1.0, expect breaking changes between minor versions while the tool surface settles.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
lib		lib
README.md		README.md
background.js		background.js
content.js		content.js
manifest.json		manifest.json
options.html		options.html
options.js		options.js
sidepanel.css		sidepanel.css
sidepanel.html		sidepanel.html
sidepanel.js		sidepanel.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crawler

Get it running in 30 seconds

How it actually works (the fun part)

Picking providers

My recommended free stack

Two modes, same loop

Tools the agent can call

Try these prompts

What's where

Safety stuff (please read this part)

Extending it

Known rough edges

Version

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Crawler

Get it running in 30 seconds

How it actually works (the fun part)

Picking providers

My recommended free stack

Two modes, same loop

Tools the agent can call

Try these prompts

What's where

Safety stuff (please read this part)

Extending it

Known rough edges

Version

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages