Mrbaeksang/deepcloak

🛡️ DeepCloak

The deep-research agent that reads the pages others can't.

Cloudflare · Datadome · Turnstile · reCAPTCHA — it walks straight through them.

License: MIT · Python 3.11+ · MCP native · PRs welcome

Quickstart · How it works · Use from an agent (MCP) · Why we built it · Changelog

DeepCloak running: it detects a Cloudflare Turnstile, escalates, and bypasses it — then writes a cited report


The problem

You ask a research tool a question. Half the best sources sit behind a Bot Wall: Cloudflare, Datadome, Turnstile, or reCAPTCHA. A typical tool gets a 403, silently drops those pages, and hands you a thinner report. You never even learn what it missed.

What DeepCloak does

When a plain fetch hits a Bot Wall, DeepCloak Escalates that one URL to a Stealth Fetch and Bypasses the wall — recovering the content other agents abandon. Then it tells you, at the bottom of every report, exactly how many walls it broke through.

It's a thin, local-first orchestrator over two great projects: local-deep-research (the research loop) and CloakBrowser (the stealth browser). Use it as a CLI, an MCP server, or a Claude skill. MIT.

🌑 Why we built this

The open web is quietly closing. More of the best writing now sits behind a bot check, and AI research agents — the tools we increasingly trust to read the web for us — go blind at exactly those doors, without ever saying so. A report that silently skips every walled source isn't neutral; it's wrong in a way you can't see.

DeepCloak's stance is simple: your agent should be able to read what a person with a browser can read, and it should be honest about how it got there. So it Bypasses the wall when it has to, can run fully local (with a local LLM via --provider ollama, no query or page is sent to a hosted LLM API), and prints an Evidence Record of every wall it crossed. Capability and transparency, MIT-licensed, no lock-in.

✨ Why it's different

| | Plain deep research | DeepCloak |
|---|---|---|
| Reads the open web | ✅ | ✅ |
| Reads Cloudflare / Datadome / Turnstile / reCAPTCHA pages | dropped silently | Bypassed |
| Tells you which sources were walled | ❌ | ✅ Evidence Record |
| Local-first (no API key required) | | ✅ |
| Fast on open pages | | plain-first, stealth only when needed |

Verified live — not mocked. The clip above is an unedited screen recording (captured with ffmpeg, no compositing) of a real deepcloak run against a local LLM (Qwen) + SearXNG — no API key. It Escalates on each Bot Wall and Bypasses 8 Cloudflare/Turnstile walls in one pass, then writes a cited report. Full clip: docs/media/demo-real.mp4; a raw asciinema session is also kept at docs/media/demo.cast. Wall counts vary per run (8–20) because the open web does.

🚀 Quickstart

pip install deepcloak
deepcloak setup                       # one-time: downloads the stealth browser
export OPENAI_API_KEY=...             # or ANTHROPIC_API_KEY / GEMINI_API_KEY — or --provider ollama
deepcloak "How does Cloudflare Turnstile detect bots?" --depth detailed --out report.md

You get a cited report.md ending with a 🛡️ Bypassed N bot-walled sources section, plus a report.md.evidence.json sidecar.

🧠 How it works

search (DuckDuckGo, no setup) ─▶ candidate URLs
        │
        ▼  for each page:
   plain fetch ─▶ Bot Wall detected? ──no──▶ use it (fast)
                        │ yes
                        ▼
                  Escalate ─▶ Stealth Fetch (CloakBrowser) ─▶ Bypass
        │
        ▼
research loop (local-deep-research) ─▶ cited report + Evidence Records

Stealth is heavy, so DeepCloak tries a cheap plain fetch first and only launches the stealth browser when it actually detects a Bot Wall (--stealth auto, the default). Bypasses happen on full-page fetches, so use --depth detailed or --depth report when you want them.
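
The plain-first, escalate-on-detection flow above can be sketched in a few lines of Python. This is a minimal illustration, not DeepCloak's actual code: `Page`, `FetchResult`, `looks_like_bot_wall`, `fetch_with_escalation`, and the marker strings are all hypothetical names, and the two fetchers are passed in as plain callables so the sketch runs without a network or a browser. Real Bot Wall detection is likely more involved (some challenge pages return 200, for example).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical challenge markers; real Bot Wall signatures are more involved.
WALL_MARKERS = ("just a moment", "cf-turnstile", "captcha-delivery", "g-recaptcha")


@dataclass
class Page:
    status: int
    body: str


@dataclass
class FetchResult:
    url: str
    page: Page
    escalated: bool  # True when the Stealth Fetch was used for this URL


def looks_like_bot_wall(page: Page) -> bool:
    """Crude heuristic: a blocking status code plus a known challenge marker."""
    body = page.body.lower()
    blocked = page.status in (403, 429, 503)
    return blocked and any(marker in body for marker in WALL_MARKERS)


def fetch_with_escalation(
    url: str,
    plain_fetch: Callable[[str], Page],
    stealth_fetch: Callable[[str], Page],
) -> FetchResult:
    """Try the cheap plain fetch first; escalate only this URL if it hits a wall."""
    page = plain_fetch(url)
    if not looks_like_bot_wall(page):
        return FetchResult(url, page, escalated=False)
    return FetchResult(url, stealth_fetch(url), escalated=True)


if __name__ == "__main__":
    # Fake fetchers stand in for an HTTP client and the stealth browser.
    walled = lambda url: Page(403, "<title>Just a moment...</title>")
    stealth = lambda url: Page(200, "<article>full text</article>")
    result = fetch_with_escalation("https://example.com/post", walled, stealth)
    print(result.escalated, result.page.status)
```

Counting the results with `escalated=True` gives the kind of number the report footer and Evidence Records describe.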

🤖 Connect it to your agent (MCP)

DeepCloak runs as a stdio MCP server exposing deep_research(query, depth), quick_summary(query), and get_evidence(run_id).

Claude Code — add to your project's .mcp.json (an example ships in this repo):

{ "mcpServers": { "deepcloak": { "command": "deepcloak", "args": ["mcp"] } } }

Codex — add to ~/.codex/config.toml:

[mcp_servers.deepcloak]
command = "deepcloak"
args = ["mcp"]

Then your agent can call deep_research and read bot-walled sources directly. Prefer a slash-style skill? Drop skill/SKILL.md into ~/.claude/skills/deepcloak/.
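
For readers wiring up their own client, the sketch below shows what a `deep_research` call looks like on the wire. MCP uses JSON-RPC 2.0, and the stdio transport sends one JSON message per line. The helper name `build_tool_call` is hypothetical, and the argument values assume that `depth` accepts the same strings as the `--depth` flag, which the README does not state explicitly. A real client must also complete the MCP `initialize` handshake before calling tools; that step is omitted here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single-line JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no newlines by default, which suits newline-delimited stdio.
    return json.dumps(message)


if __name__ == "__main__":
    request = build_tool_call(
        1,
        "deep_research",
        # Assumed argument shape, based on deep_research(query, depth).
        {"query": "How does Cloudflare Turnstile detect bots?", "depth": "detailed"},
    )
    print(request)
```

In practice an MCP client library handles this framing for you; the sketch is only meant to show which tool name and arguments end up in the request.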

⚙️ Configuration

| Flag | Default | Notes |
|---|---|---|
| --depth | detailed | quick / detailed / report |
| --engine | duckduckgo | searxng / auto |
| --stealth | auto | always / off |
| --provider / --model | auto-detected | OPENAI / ANTHROPIC / GEMINI, or ollama |
| --respect-robots | off | honor robots.txt |
| --proxy | | SOCKS5 for the Stealth Fetch |
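
If you drive the CLI from a script, a small wrapper can validate flag values before launching a run. The sketch below only assembles an argument list from the flags in the table; `build_command` is a hypothetical helper, and it assumes that `--respect-robots` is a bare switch and that `--proxy` takes a single URL-style value (the exact proxy format is not documented here).

```python
import shlex
from typing import Optional

# Allowed values, taken from the configuration table.
DEPTHS = ("quick", "detailed", "report")
ENGINES = ("duckduckgo", "searxng", "auto")
STEALTH_MODES = ("auto", "always", "off")


def build_command(
    query: str,
    *,
    depth: str = "detailed",
    engine: str = "duckduckgo",
    stealth: str = "auto",
    respect_robots: bool = False,
    proxy: Optional[str] = None,
    out: Optional[str] = None,
) -> list[str]:
    """Build an argv list for the deepcloak CLI, rejecting unknown flag values."""
    if depth not in DEPTHS:
        raise ValueError(f"depth must be one of {DEPTHS}")
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {ENGINES}")
    if stealth not in STEALTH_MODES:
        raise ValueError(f"stealth must be one of {STEALTH_MODES}")

    argv = ["deepcloak", query, "--depth", depth, "--engine", engine, "--stealth", stealth]
    if respect_robots:
        argv.append("--respect-robots")  # assumed to be a bare switch
    if proxy:
        argv += ["--proxy", proxy]  # assumed to take one value
    if out:
        argv += ["--out", out]
    return argv


if __name__ == "__main__":
    argv = build_command("How does Cloudflare Turnstile detect bots?", depth="report", out="report.md")
    print(shlex.join(argv))  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting issues with the query string.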

⚠️ Responsible use

DeepCloak Bypasses bot-detection. You are responsible for having the right to access whatever you fetch. robots.txt is ignored by default; pass --respect-robots to honor it (ADR-0002). Don't use it to violate sites' terms or the law.
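
To make concrete what honoring robots.txt means, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not how DeepCloak implements `--respect-robots`; the helper `allowed_by_robots`, the `deepcloak` user-agent string, and the sample rules are all illustrative assumptions. The rules are parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules: everything under /private/ is off limits to all agents.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("https://example.com/private/report", "https://example.com/blog/post"):
        print(url, allowed_by_robots(ROBOTS, "deepcloak", url))
```

A fetcher that respects robots.txt would run a check like this before both the plain fetch and any Escalation, and skip URLs that are disallowed.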

🗺️ Roadmap

  • More Bot Wall signatures + smarter Escalation heuristics
  • More search backends beyond DuckDuckGo / SearXNG
  • Cache Bypassed pages across runs
  • Richer Evidence Record export (HTML / JSON schema)

Ideas welcome — start a Discussion or open a feature request.

🛠️ Built on

local-deep-research (MIT) + CloakBrowser (MIT), via pip — no vendored code. Domain glossary in CONTEXT.md; design decisions in docs/adr/; contributing guide in CONTRIBUTING.md.

📄 License

MIT — see LICENSE and NOTICE.

If DeepCloak read a page your last tool gave up on, drop a ⭐ — it helps others find it.

Built by Mrbaeksang · baeksang.dev · contact@baeksang.dev
