Dhravya/webpull

webpull

Pull any public docs site into local markdown files.

$ webpull https://docs.example.com

  ⚡ webpull · 16 workers
  docs.example.com → ./docs.example.com

  ●●●·●●●●·●●●●●●●·
  ├─ ✓ getting-started/installation.md
  ├─ ✓ api/authentication.md
  ├─ ✓ guides/deployment.md
  █████████████░░░░░░░ 68% 102/150 · 6p/s · 17.2s

Install

bun install -g webpull

Usage

webpull <url> [options]

Options:
  -o, --out <dir>   Output directory (default: ./<hostname>)
  -m, --max <n>     Max pages to pull (default: 500)

Examples

# Pull React docs
webpull https://react.dev/reference

# Custom output dir, limit to 100 pages
webpull https://docs.python.org -o ./python-docs -m 100

How it works

  1. Discovers pages via sitemap.xml, nav link extraction, or link crawling
  2. Fetches in parallel using a worker pool sized to your CPU cores
  3. Converts to markdown using Defuddle for intelligent content extraction
  4. Writes to disk preserving the URL path structure with YAML frontmatter
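The parallel fetch in step 2 can be pictured as a fixed number of workers draining a shared queue. The sketch below is only an illustration of that pattern, not webpull's actual source: `runPool`, its parameters, and the way the limit is chosen are all assumptions made for the example.

```typescript
// Hypothetical helper (not from webpull): run `task` over `items`
// with at most `limit` tasks in flight at any time.
async function runPool<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims the next index until the queue is drained.
  // Claiming is safe without locks because JavaScript runs this
  // code on a single thread between awaits.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `items`
}
```

A caller might pass something like `runPool(urls, 16, (u) => fetch(u).then((r) => r.text()))`. In this sketch a single rejected task rejects the whole pool; a real crawler would more likely catch errors per page and continue.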

Each markdown file includes metadata:

---
title: "Getting Started"
url: "https://docs.example.com/getting-started"
---
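To make the output layout concrete, here is a minimal sketch of how a page URL could map to a file path and how the frontmatter block could be rendered. Both function names are hypothetical, and the handling of the site root (`index.md`) and of `.html` suffixes is an assumption; webpull's real rules may differ.

```typescript
// Hypothetical helper: map a page URL to a relative markdown path
// that mirrors the URL's path structure.
function urlToRelativePath(pageUrl: string): string {
  const { pathname } = new URL(pageUrl);
  const segments = pathname.split("/").filter((s) => s.length > 0);
  // Assumption: the site root is stored as index.md.
  if (segments.length === 0) return "index.md";
  // Assumption: a trailing .html/.htm suffix is dropped before adding .md.
  const last = segments[segments.length - 1].replace(/\.html?$/i, "");
  return [...segments.slice(0, -1), last].join("/") + ".md";
}

// Hypothetical helper: render the two metadata fields shown above.
// JSON.stringify produces a double-quoted, escaped string, which is
// also a valid YAML double-quoted scalar.
function renderFrontmatter(title: string, url: string): string {
  return [
    "---",
    `title: ${JSON.stringify(title)}`,
    `url: ${JSON.stringify(url)}`,
    "---",
    "",
  ].join("\n");
}
```

Under these assumptions, `https://docs.example.com/api/authentication/` would land at `api/authentication.md` inside the output directory, matching the shape of the paths in the progress display.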

Requirements

License

MIT
