Local-Web-Capture

Claude Code plugin for capturing web content from the user's own machine — so every request exits via the user's IP. Built for geo-restricted content (seeded with Israeli news + e-commerce) where hosted scrapers get blocked at the edge.

Principle

Cheapest method that works wins. Escalation ladder:

Rung | Method                  | Tool                                  | When
1    | Headless, static HTTP   | Scrapling Fetcher                     | Default. Server-rendered articles, simple prices.
2    | Headless, stealth       | Scrapling StealthyFetcher (Camoufox)  | Bot-blocked / Cloudflare.
2b   | Headless, JS render     | Scrapling PlayWrightFetcher           | Needs JS, no bot block.
3    | Real Chrome, logged in  | bb-browser                            | Paywalled / session-bound.
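
In code, the headless rungs map onto Scrapling's fetcher classes. The sketch below is illustrative only: it assumes Scrapling's class-method style (Fetcher.get, StealthyFetcher.fetch, PlayWrightFetcher.fetch), and import paths and response attributes differ between Scrapling versions, so treat the names as assumptions rather than the plugin's actual implementation.

    # Sketch of the escalation ladder (rungs 1 -> 2 -> 2b).
    # Assumes Scrapling's class-method fetcher interface; names may vary by version.
    from scrapling.fetchers import Fetcher, StealthyFetcher, PlayWrightFetcher

    def capture(url: str):
        page = Fetcher.get(url)               # Rung 1: plain headless HTTP
        if page.status == 200:
            return page
        page = StealthyFetcher.fetch(url)     # Rung 2: stealth (Camoufox) for bot-blocked pages
        if page.status == 200:
            return page
        return PlayWrightFetcher.fetch(url)   # Rung 2b: JS rendering, no bot block

Rung 3 (real Chrome via bb-browser) sits outside Scrapling and is not shown here.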

Backups (installed, not default): webclaw, lightpanda, camofox-browser, stealth-browser-mcp, Playwright MCP.

Per-domain choice is cached in sites.yaml so the right rung is picked on repeat captures.
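
To illustrate, a lookup against that cache might look like the sketch below. The real sites.yaml schema is defined by the plugin, not here; the key names and the example entry are made up for illustration.

    # Hypothetical read of the per-domain strategy cache (sites.yaml).
    # The schema in the comment is an assumption, not the plugin's actual format.
    from pathlib import Path
    import yaml

    def cached_rung(domain: str, root: Path) -> str | None:
        cache = root / "sites.yaml"
        if not cache.exists():
            return None
        sites = yaml.safe_load(cache.read_text()) or {}
        # e.g. {"example-news-site.co.il": {"rung": "2"}}  <- illustrative entry only
        return sites.get(domain, {}).get("rung")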

Skills

  • setup-tools — one-time install of Scrapling, bb-browser, data dir seeding.
  • scrape-article — Rungs 1–3, save markdown capture.
  • scrape-authenticated — Rung 3 only, real Chrome via bb-browser.
  • scrape-price — structured price snapshot for e-commerce URLs.
  • capture-translate — scrape + save a translated version (default he→en; arbitrary pairs supported).
  • batch-capture — list of URLs in, individual captures + human/agent summary reports out.
  • batch-to-pdf — compile a batch into a single typeset PDF (Typst) with source URL + capture date per article and page-numbered footer.
  • capture-list — browse/search/summarise the data directory.

Where captures are saved

Resolution order (see reference/save-location.md):

  1. --out <path> explicit override.
  2. Project-local — if the current working directory is inside a git repo, save to <repo_root>/captures/. Keeps research artefacts with the project they inform.
  3. Global fallback — ~/local-web-capture/.
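
That resolution order is simple enough to sketch. The flag name and paths below come straight from the list above; the git probe is just one possible way to find the repo root, not necessarily how the plugin does it.

    # Sketch of the save-location resolution order.
    # --out handling and the git probe are illustrative, not the plugin's actual code.
    import subprocess
    from pathlib import Path

    def resolve_root(out: str | None = None) -> Path:
        if out:                                    # 1. explicit --out override
            return Path(out).expanduser()
        probe = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True,
        )
        if probe.returncode == 0:                  # 2. project-local: <repo_root>/captures/
            return Path(probe.stdout.strip()) / "captures"
        return Path.home() / "local-web-capture"   # 3. global fallback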

Layout is the same in both modes:

<root>/
├── articles/YYYY/MM/*.md         # markdown + frontmatter
├── prices/<domain>/*.json        # structured price snapshots
├── translations/YYYY/MM/*.md     # stacked translations
├── batches/<batch-id>/           # batch-capture output
│   ├── articles/
│   ├── report.md                 # human-summary
│   ├── report.agent.json         # agent-summary (strict schema)
│   └── report.pdf                # optional, via batch-to-pdf
├── sites.yaml                    # per-domain strategy cache
└── .cache/

See reference/data-layout.md for the full schema.

Toolkit evaluation

See toolkit-options.md for the shortlist of ~22 candidate browser/extractor projects, ranked by stars, with a verdict for each.

Important: MCP bias note

Once the plugin is validated end-to-end, remove Playwright from user-level MCP config. If it stays globally visible, Claude will pull toward it on every scrape request regardless of what the skill ladder says. Leaving it at project level (or installing only when needed) keeps the ladder honest.
