Persistent local Chromium that a human and an LLM share. The LLM drives most flows; the human steps in for 2FA, judgment calls, and approvals. Sessions stay warm forever.
Most browser-automation tools today fall into two camps:
- Fully autonomous agents (Browser Use, Stagehand, Skyvern, MultiOn, Open Operator): they spin up their own headless browser, log in with stored credentials, and try to complete tasks unsupervised. Great for benchmarks; brittle on real-world flows that hit 2FA, anti-bot checks, or judgment calls. And the "stored credentials in someone else's cloud" model is a non-starter for serious operational work.
- Pure record-replay (Playwright Codegen, Selenium IDE): fast to build, no AI in the loop, breaks on the first DOM change.
krake_browser is the third camp: symbiosis.
A long-lived Chromium runs against your own user profile. You're already logged into WhatsApp, Instagram, LinkedIn, your FDA portal, your Google Workspace, whatever. The LLM attaches over CDP, runs a recipe, and pauses with an explicit human_intervention action whenever it needs you: to solve a CAPTCHA, complete a 2FA challenge, approve a draft message, or take over a step that's easier done by hand.
The key property: session inheritance in both directions.
- You 2FA into the FDA portal once; the LLM can now operate FDA pages.
- The LLM keeps your WhatsApp Web session warm indefinitely; you skip the QR-scan dance every Monday.
- When LinkedIn's anti-bot check challenges you, the LLM hands off, you click through, and the recipe resumes.
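The handoff described above can be sketched as a small executor loop. This is an illustrative sketch, not the engine's implementation: the step shape (a dict with an `action` key and an optional `prompt`) and the `perform` and `ask_human` callbacks are assumptions made for the example.

```python
from typing import Callable

Step = dict  # hypothetical step shape: {"action": "...", ...}; DSL.md defines the real format


def run_recipe(
    steps: list[Step],
    perform: Callable[[Step], None],
    ask_human: Callable[[str], bool],
) -> int:
    """Run steps in order, pausing at every human_intervention step.

    perform(step) executes one automated step (stand-in for a browser driver).
    ask_human(prompt) blocks until the human responds; True means "done, resume".
    Returns the index where the run stopped, or len(steps) if it completed.
    """
    for index, step in enumerate(steps):
        if step["action"] == "human_intervention":
            if not ask_human(step.get("prompt", "")):
                return index  # human declined or aborted; stop before later steps
            continue  # human handled this step; resume with the next one
        perform(step)
    return len(steps)


# Example run with stub callbacks instead of a real browser or chat client.
performed: list[str] = []
recipe = [
    {"action": "click", "selector": "#login"},
    {"action": "human_intervention", "prompt": "Complete 2FA, then confirm"},
    {"action": "click", "selector": "#inbox"},
]
finished_at = run_recipe(recipe, lambda s: performed.append(s["selector"]), lambda p: True)
```

Because the human acts in the same browser profile, whatever they do during the pause (a 2FA code, a CAPTCHA) is already in the session when the loop resumes.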
    +----------------------+        +-------------------------+
    | Chat / LLM Client    |<-MCP-->| krake_browser engine    |
    | (Claude, Cursor,     |        | (Sinatra + Playwright)  |
    |  custom UI)          |        +-------------------------+
    +----------------------+                     |
                                                 | CDP
                                                 v
                                   +--------------------------+
                                   | Persistent Chromium      |
                                   | (--user-data-dir=...)    |
                                   | Tabs: WA / IG / FDA /..  |
                                   +--------------------------+
                                                 ^
                                                 |
                                          Human, watching
                                           + co-piloting
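To make the bottom half of the diagram concrete, here is a minimal sketch of how an externally launched Chromium could be started with a persistent profile and a CDP port. `--user-data-dir` and `--remote-debugging-port` are standard Chromium switches; the helper names, the binary name, and the profile path are assumptions for illustration.

```python
import os


def chromium_argv(binary: str, profile_dir: str, port: int = 9222) -> list[str]:
    """Build the command line for a long-lived Chromium with CDP enabled.

    The same profile_dir must be reused on every launch so that cookies and
    logged-in sessions survive restarts.
    """
    return [
        binary,
        f"--user-data-dir={os.path.expanduser(profile_dir)}",
        f"--remote-debugging-port={port}",
    ]


def cdp_endpoint(port: int = 9222) -> str:
    """HTTP endpoint a CDP client would attach to; loopback only."""
    return f"http://127.0.0.1:{port}"


# Hypothetical binary name and profile location.
argv = chromium_argv("chromium", "/tmp/krake-profile")
endpoint = cdp_endpoint()
```

The list in `argv` can be passed to `subprocess.Popen`. A client then attaches to `endpoint`; Playwright's Python binding, for instance, offers `chromium.connect_over_cdp(endpoint_url)` for this.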
Recipes (sequences of click / insert / wait / human_intervention / etc.) come from sibling repos:
- krake_recipes — generic public recipes (WhatsApp, IG, LinkedIn, FDA, …). Community-contributable.
- TrueSightDAO/tdg_recipes — DAO-specific recipes (Partner Follow-ups, Edgar contribution check-ins, …) that consume the same engine.
Stage 0 — scaffold. This repo currently contains the design:
- README.md: what and why (this file)
- ARCHITECTURE.md: components and how they wire together
- DSL.md: recipe format, extending the Krake DSL from 2014 with formalized human_intervention
The engine itself is being built across subsequent commits. See the roadmap below.
- Recipe DSL spec (extends Krake DSL from 2014 with formalized human_intervention)
- CDP attach to externally launched Chromium
- Persistent profile management (--user-data-dir, restart-safe)
- Recipe executor (consumes recipes from local clones of krake_recipes / tdg_recipes)
- Chat-handoff transport (MCP server first; WebSocket fallback)
- Tab coordination (LLM works in named tabs; warns before touching human's active tab)
- Localhost auth on the CDP/MCP port (keys-to-the-kingdom; not optional)
- Reference client: minimal CLI that proxies a recipe + prompts user when human_intervention fires
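The tab-coordination item could look roughly like the registry below. This is a sketch under assumptions: the class name, the string tab ids, and the three return values (`ok`, `warn`, `unclaimed`) are invented for illustration and are not part of the planned engine API.

```python
from typing import Optional


class TabRegistry:
    """Tracks which tabs the LLM has claimed and which tab the human is using."""

    def __init__(self) -> None:
        self._claimed: dict[str, str] = {}         # tab name -> tab id claimed by the LLM
        self.human_active: Optional[str] = None    # id of the tab the human is currently in

    def claim(self, name: str, tab_id: str) -> None:
        """Register a named tab as one the LLM is allowed to drive."""
        self._claimed[name] = tab_id

    def check(self, tab_id: str) -> str:
        """Classify a tab before the LLM touches it."""
        if tab_id == self.human_active:
            return "warn"       # the human is here; ask before acting
        if tab_id in self._claimed.values():
            return "ok"         # a named LLM tab the human is not using
        return "unclaimed"      # not registered; the LLM should not assume ownership


tabs = TabRegistry()
tabs.claim("whatsapp", "tab-1")
tabs.claim("linkedin", "tab-2")
tabs.human_active = "tab-2"  # the human switched to the LinkedIn tab
```

Checking the human's active tab first means that even a tab the LLM has claimed produces a warning while the human is working in it.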
Direct evolution of the Krake.io DSL Gary Teh designed in 2014 for the original Krake scraper platform. The action vocabulary (click, insert, scroll_bottom, wait, trigger_change, solve_captcha) is preserved verbatim; the only new primitive is human_intervention with a prompt string and an ack channel, which formalizes what solve_captcha was already half-doing.
If you're familiar with the original Krake DSL, you already know 90% of krake_browser. See DSL.md for the delta.
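As a rough illustration of how small the delta is, the sketch below checks a recipe against the action vocabulary listed above. The action names come from this README; the dict-based step shape and the `action` and `prompt` keys are assumptions, and DSL.md remains the authoritative description of the format.

```python
# Actions carried over from the 2014 Krake DSL, as listed in this README.
LEGACY_ACTIONS = {"click", "insert", "scroll_bottom", "wait", "trigger_change", "solve_captcha"}
# The only new primitive.
ACTIONS = LEGACY_ACTIONS | {"human_intervention"}


def validate(recipe: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the recipe passes."""
    errors = []
    for i, step in enumerate(recipe):
        action = step.get("action")
        if action not in ACTIONS:
            errors.append(f"step {i}: unknown action {action!r}")
        elif action == "human_intervention" and not isinstance(step.get("prompt"), str):
            errors.append(f"step {i}: human_intervention needs a prompt string")
    return errors


good = [
    {"action": "click", "selector": "#compose"},
    {"action": "human_intervention", "prompt": "Approve the draft message"},
]
bad = [
    {"action": "hover", "selector": "#menu"},
    {"action": "human_intervention"},
]
good_errors = validate(good)
bad_errors = validate(bad)
```

Running a check like this before execution catches typos in action names before the browser is touched.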
The CDP port that the engine attaches to is keys-to-the-kingdom — it exposes every cookie and every active session in your browser to anything that can reach the port. The engine binds CDP to localhost only and requires a token on the MCP/WebSocket surface. Do not expose port 9222 to your LAN. Do not run this on a multi-user machine where other accounts are untrusted.
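The two safeguards named above, loopback-only binding and a token on the MCP/WebSocket surface, can be sketched with the Python standard library. The function names are hypothetical and the engine's actual checks may differ; the point is to reject non-loopback hosts and to compare tokens in constant time.

```python
import hmac
import ipaddress
import secrets


def is_loopback(host: str) -> bool:
    """True only for localhost or a loopback IP such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal (e.g. a hostname): refuse rather than guess


def new_token() -> str:
    """Generate a random token to hand to the local client at startup."""
    return secrets.token_urlsafe(32)


def token_ok(presented: str, expected: str) -> bool:
    """Constant-time comparison, so timing does not leak how much of the token matched."""
    return hmac.compare_digest(presented.encode(), expected.encode())


token = new_token()
```

A server would call `is_loopback` on its configured bind address at startup and `token_ok` on every incoming request, refusing to start or respond when either check fails.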
MIT — see LICENSE.