Webfn is a web data extraction tool designed specifically for AI agents and automated pipelines.
Whether you need to bypass anti-bot protections, extract clean markdown from heavily JavaScript-rendered pages, or perform deep site crawls, Webfn provides a seamless and robust interface.
- Search the web and extract structured JSON results.
- Fetch rendered SPA pages and extract human-readable markdown.
- Crawl entire websites from sitemaps or internal links.
- Screenshot pages natively or via the cloud.
- Auto-detect agent mode: outputs pure JSON when piped, rich CLI UI when in a terminal.
Webfn supports both Chrome and Lightpanda engines out of the box.

End-to-end fetch benchmark (news.ycombinator.com):
| Engine | Average Time (ms) | Speedup |
|---|---|---|
| Chrome | 2630 | Baseline |
| Lightpanda | 2537 | 1.04x faster |
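
The speedup column is the ratio of the two average times. A minimal Python sketch of that arithmetic, using the figures from the table (the function name `speedup` is illustrative and not part of webfn):

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_ms <= 0:
        raise ValueError("candidate_ms must be positive")
    return baseline_ms / candidate_ms


# Average end-to-end fetch times from the table above.
chrome_ms = 2630
lightpanda_ms = 2537

ratio = speedup(chrome_ms, lightpanda_ms)
print(f"{ratio:.2f}x")  # prints "1.04x"
```
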
Official Engine Benchmarks (Raw Execution):
- Memory Footprint: Lightpanda uses 9x less memory than Chrome.
- Raw Execution Speed: Lightpanda is up to 11x faster than Chrome.
# Install globally via npm
npm install -g webfn-cli
# Or via pnpm
pnpm add -g webfn-cli

Once installed, the `webfn` command is available globally:
webfn fetch https://example.com
webfn search "ai agents"

No install needed with npx:
npx webfn-cli fetch https://example.com
npx webfn-cli screenshot https://example.com --full

Webfn works out of the box with Chrome/Chromium (auto-detected). For the best experience, you should also set up Lightpanda, a fast, lightweight headless browser that webfn uses by default.
Lightpanda is the default headless engine for search, fetch, and crawl commands. Without it, webfn falls back to your local Chrome installation.
Install Lightpanda:
Please refer to the official docs for installation instructions: https://lightpanda.io/docs/open-source/installation
After installing, verify it works:
webfn doctor

If you skip Lightpanda, all commands still work: webfn will automatically fall back to your local Chrome. Use `--engine chrome` to always force Chrome.
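
The fallback described above can be sketched as a small decision function. This is an illustration of the documented behavior only, not webfn's actual implementation; the function name `choose_engine` and the assumption that the Lightpanda binary is called `lightpanda` and lives on `PATH` are hypothetical.

```python
import shutil
from typing import Optional


def choose_engine(forced: Optional[str] = None,
                  lightpanda_available: Optional[bool] = None) -> str:
    """Mirror the documented fallback: Lightpanda by default, Chrome otherwise."""
    if forced == "chrome":
        # Corresponds to passing --engine chrome on the command line.
        return "chrome"
    if lightpanda_available is None:
        # Assumption: the Lightpanda binary is named "lightpanda" and is on PATH.
        lightpanda_available = shutil.which("lightpanda") is not None
    return "lightpanda" if lightpanda_available else "chrome"


print(choose_engine(lightpanda_available=True))                   # lightpanda
print(choose_engine(lightpanda_available=False))                  # chrome
print(choose_engine(forced="chrome", lightpanda_available=True))  # chrome
```
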
Make sure Google Chrome or Chromium is installed. Webfn will auto-detect it. You can also point to a custom binary:
webfn fetch https://example.com --chrome /path/to/chrome

To build from source, ensure you have Node.js installed, then:
pnpm install
pnpm build

Verify your environment dependencies:
pnpm dev doctor

Webfn is built to be controlled by LLMs and agents.
If you use an agent that supports the Skills CLI (like Claude Code, Cursor, Antigravity, or Trae), you can install the official webfn skill. This automatically provides your agent with full context on how to use the CLI efficiently, bypassing bot protections and extracting clean data.
# Install the webfn skill for your AI agent
npx skills add NormVg/webfn

Once installed, simply ask your agent to "look something up", "fetch a webpage", or "do research", and it will autonomously invoke webfn.
When stdout is piped (e.g., an AI agent calling the CLI), webfn automatically suppresses all progress bars and interactive UI, outputting a single, strictly formatted JSON envelope.
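
For readers unfamiliar with how a CLI can tell a pipe from a terminal, here is a minimal Python sketch of the general technique: check whether the output stream is a TTY. It is not webfn's code; `wants_json` and `emit` are hypothetical names, and the plain-text branch merely stands in for a rich terminal UI.

```python
import io
import json
import sys
from typing import TextIO


def wants_json(stream: TextIO, force_json: bool = False) -> bool:
    """Return True when output should be machine-readable JSON."""
    # isatty() is False when the stream is piped or redirected.
    return force_json or not stream.isatty()


def emit(result: dict, stream: TextIO = sys.stdout, force_json: bool = False) -> None:
    """Write either a single JSON document or a human-friendly line."""
    if wants_json(stream, force_json):
        # One JSON document, no progress bars or colour codes.
        stream.write(json.dumps(result) + "\n")
    else:
        # Stand-in for the rich terminal UI.
        stream.write(f"[{result['command']}] ok={result['ok']}\n")


# A StringIO buffer is not a TTY, so it behaves like a pipe.
buffer = io.StringIO()
emit({"ok": True, "command": "fetch"}, stream=buffer)
print(buffer.getvalue(), end="")
```

The `force_json` parameter plays the role of a flag such as `--json`, which overrides the terminal check.
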
# Agent mode (auto-detected):
webfn fetch https://example.com | jq .page.title
# Force JSON in a TTY:
webfn fetch https://example.com --json
# Human-friendly rich output (default in terminal):
webfn fetch https://example.com

All JSON output follows a consistent format:
{
  "ok": true,
  "command": "fetch",
  "page": { "title": "Example", "markdown": "..." },
  "browser": { "engine": "lightpanda", "headless": true },
  "files": [ ... ],
  "storage": { ... }
}

# Search for a topic
webfn search "ai agents"
# Search and download the full HTML/Markdown of the top results
webfn collect "ai agents"
# Fetch a specific page and convert to markdown
webfn fetch https://example.com
# Take a full-page screenshot
webfn screenshot https://example.com --full
# Crawl a site up to a specific depth
webfn crawl https://example.com --depth 2 --max-pages 50

For comprehensive configuration and documentation, see docs.md.
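
As a sketch of how a pipeline might call the CLI and read the JSON envelope, the Python below wraps `webfn fetch` with `subprocess`. It uses only the `--json` and `--chrome` flags shown in this README. The helper names are hypothetical, and the error handling assumes that failures are reported with `"ok": false`, which this README does not confirm.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional


def build_fetch_argv(url: str, chrome_path: Optional[str] = None) -> List[str]:
    """Build a `webfn fetch` command line using only flags shown in this README."""
    argv = ["webfn", "fetch", url, "--json"]
    if chrome_path:
        argv += ["--chrome", chrome_path]
    return argv


def parse_envelope(raw: str) -> Dict[str, Any]:
    """Parse the JSON envelope and raise if the command did not succeed."""
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        # Assumption: failures are reported with "ok": false.
        raise RuntimeError(f"webfn {envelope.get('command')} failed")
    return envelope


def fetch_markdown(url: str) -> str:
    """Run webfn and return the page markdown (requires webfn on PATH)."""
    proc = subprocess.run(build_fetch_argv(url), capture_output=True,
                          text=True, check=True)
    return parse_envelope(proc.stdout)["page"]["markdown"]


# Parsing a sample envelope does not require webfn to be installed.
sample = '{"ok": true, "command": "fetch", "page": {"title": "Example", "markdown": "..."}}'
print(parse_envelope(sample)["page"]["title"])  # prints "Example"
```

Because the subprocess captures stdout, the CLI should already detect piped output; passing `--json` as well simply makes the intent explicit.
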
Webfn leverages powerful open-source libraries:
- Puppeteer: For headless browser automation and CDP control.
- Lightpanda: For ultra-fast, memory-efficient headless execution.
- Cloudflare Browser Rendering: For edge-based, serverless page evaluation.
- Commander.js: For robust CLI routing and option parsing.
Contributions are always welcome!
- Fork the repository.
- Create a new branch (`git checkout -b feature/amazing-feature`).
- Commit your changes (`git commit -m 'Add some amazing feature'`).
- Push to the branch (`git push origin feature/amazing-feature`).
- Open a Pull Request.
Please run `pnpm run format` and verify that your changes don't break the agent JSON output structures.
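
One way to check the envelope structure before opening a pull request is a small shape test. The Python sketch below is only an illustration: the key list is taken from the sample `fetch` envelope in this README, and it assumes (without confirmation from the project) that a `fetch` envelope always carries all of those keys. `EXPECTED_FIELDS` and `envelope_problems` are hypothetical names.

```python
from typing import Any, Dict, List

# Top-level keys and types taken from the sample fetch envelope in this README.
# Assumption: a fetch envelope always carries all of them.
EXPECTED_FIELDS = {
    "ok": bool,
    "command": str,
    "page": dict,
    "browser": dict,
    "files": list,
    "storage": dict,
}


def envelope_problems(envelope: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means the shape matches."""
    problems = []
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in envelope:
            problems.append(f"missing key: {key}")
        elif not isinstance(envelope[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


good = {
    "ok": True,
    "command": "fetch",
    "page": {"title": "Example", "markdown": "..."},
    "browser": {"engine": "lightpanda", "headless": True},
    "files": [],
    "storage": {},
}
broken = {key: value for key, value in good.items() if key != "page"}

print(envelope_problems(good))    # []
print(envelope_problems(broken))  # ['missing key: page']
```
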
This project is licensed under the MIT License - see the LICENSE file for details.