Pull any public docs site into local markdown files.
```
$ webpull https://docs.example.com
⚡ webpull · 16 workers
docs.example.com → ./docs.example.com
●●●·●●●●·●●●●●●●·
├─ ✓ getting-started/installation.md
├─ ✓ api/authentication.md
├─ ✓ guides/deployment.md
█████████████░░░░░░░ 68% 102/150 · 6p/s · 17.2s
```
Install:

```sh
bun install -g webpull
```

Usage:

```
webpull <url> [options]
```
Options:
```
-o, --out <dir>   Output directory (default: ./<hostname>)
-m, --max <n>     Max pages to pull (default: 500)
```
Examples:

```sh
# Pull React docs
webpull https://react.dev/reference

# Custom output dir, limit to 100 pages
webpull https://docs.python.org -o ./python-docs -m 100
```

How it works:

- Discovers pages via sitemap.xml, nav link extraction, or link crawling
- Fetches in parallel using a worker pool sized to your CPU cores
- Converts to markdown using Defuddle for intelligent content extraction
- Writes to disk preserving the URL path structure with YAML frontmatter
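The parallel-fetch step above can be sketched as a small bounded worker pool. `mapPool` is a hypothetical helper written for illustration, not webpull's actual internals:

```typescript
// Minimal bounded worker pool: runs at most `limit` tasks concurrently.
// Sketches the "worker pool sized to your CPU cores" step; webpull's
// real implementation may differ.
async function mapPool<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker loops, claiming the next index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two workers share an index
      results[i] = await fn(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}

// Usage sketch: fetch all discovered URLs with a pool of workers, e.g.
// const pages = await mapPool(urls, 16, (u) => fetch(u).then((r) => r.text()));
```

Because each worker claims the next index synchronously before awaiting, results keep the input order and no URL is fetched twice.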
Each markdown file includes metadata:
```yaml
---
title: "Getting Started"
url: "https://docs.example.com/getting-started"
---
```

Requirements:

- Bun runtime
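The URL-to-path mapping and the frontmatter format above can be sketched as follows; `urlToFile` and `withFrontmatter` are hypothetical names for illustration, not webpull's API:

```typescript
// Map a page URL to an output file path, preserving the URL path
// structure under ./<hostname> (or a custom output dir).
function urlToFile(pageUrl: string, outDir?: string): string {
  const u = new URL(pageUrl);
  const dir = outDir ?? `./${u.hostname}`;
  const path = u.pathname.replace(/\/+$/, "") || "/index";
  return `${dir}${path}.md`;
}

// Prepend YAML frontmatter carrying the page title and source URL.
function withFrontmatter(title: string, url: string, markdown: string): string {
  return `---\ntitle: ${JSON.stringify(title)}\nurl: ${JSON.stringify(url)}\n---\n\n${markdown}`;
}
```

`JSON.stringify` is used here as a cheap way to produce a double-quoted, escaped YAML scalar matching the example above.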
License: MIT