SiteSavvy v0.6.0
Capture the web, your way.
v0.6.0 completes the feature set with 7 new modules covering pagination, authentication, proxy/Tor, stealth, recipes, docs-site mode, and offline full-text search, building on v0.5.0's AI/RAG/MCP capabilities.
Installation
pip install sitesavvy
Or download the stand-alone binary for your OS (no Python required) from the assets below.
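To confirm which version pip actually installed, one option is to query the package metadata from Python. This is a minimal sketch using the standard library's `importlib.metadata`. It assumes the distribution is registered under the name `sitesavvy` (the name used in the pip command above); `installed_version` is a helper name chosen for illustration and is not part of SiteSavvy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "sitesavvy") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("sitesavvy is not installed in this environment")
    else:
        print(f"sitesavvy {found}")
```

This check only applies to the pip install; the stand-alone binaries do not register package metadata with a Python environment.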
What's new in v0.6.0
| Feature | Flag | Module |
|---|---|---|
| 📄 Pagination awareness | --follow-pagination (default on) | pagination.py |
| 🔐 Authenticated crawling | --login-url / --login-user / --login-pass | auth.py |
| 🌐 Proxy / Tor / SOCKS5 | --proxy http://... or socks5://... | proxies.py |
| 🥸 Stealth mode | --stealth | stealth.py |
| 🍳 Recipe mode → cookbook EPUB | --recipe-mode | recipe.py |
| 📚 Docs-site mode | --docs-mode | docs_mode.py |
| 🔍 Offline full-text search | --offline-search | offline_search.py |
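When crawls are launched from a script, the flags in the table above can be assembled into an argument list instead of being typed by hand. The following is a rough sketch, not part of SiteSavvy: `build_crawl_command` and the `SITESAVVY_PASS` environment variable are hypothetical names, and the sketch assumes the flags can be combined freely and given in any order. Reading the password from the environment keeps it out of the script itself, though it is still passed to the process as a command-line argument.

```python
import os
import shlex


def build_crawl_command(url, out_dir, login_url=None, login_user=None,
                        password_env="SITESAVVY_PASS", proxy=None, stealth=False):
    """Assemble a `sitesavvy crawl` argument list from flags listed in the table."""
    cmd = ["sitesavvy", "crawl", url, "--out-dir", out_dir]
    if login_url:
        if not login_user:
            raise ValueError("login_user is required when login_url is set")
        # Hypothetical convention: the password lives in an environment variable.
        password = os.environ.get(password_env)
        if password is None:
            raise ValueError(f"environment variable {password_env} is not set")
        cmd += ["--login-url", login_url,
                "--login-user", login_user,
                "--login-pass", password]
    if proxy:
        cmd += ["--proxy", proxy]
    if stealth:
        cmd.append("--stealth")
    return cmd


if __name__ == "__main__":
    # Print the command for review rather than running it.
    tor_cmd = build_crawl_command("https://example.onion", "./out",
                                  proxy="socks5://127.0.0.1:9050", stealth=True)
    print(shlex.join(tor_cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the printed command looks right.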
Quick examples
# Offline-searchable mirror
sitesavvy crawl https://example.com --offline-search --format html --out-dir ./out
# → open ./out/search.html in any browser
# Recipe site → cookbook EPUB
sitesavvy crawl https://recipes.example.com --recipe-mode --out-dir ./out
# → ./out/sitesavvy-cookbook.epub
# Authenticated crawl
sitesavvy crawl https://private.example.com \
--login-url https://private.example.com/login \
--login-user alice --login-pass secret --out-dir ./out
# Tor + stealth
sitesavvy crawl https://example.onion \
--proxy socks5://127.0.0.1:9050 --stealth --out-dir ./out
Stats
- 38 source modules (7 new in v0.6.0)
- 534 tests passing (252 new), 90% coverage
- ruff check . and mypy sitesavvy both run clean
- Tested on Python 3.12
Release assets
| Asset | OS | Notes |
|---|---|---|
| sitesavvy-0.6.0-linux-x86_64.tar.gz | Linux x86_64 | Single-file PyInstaller binary |
| sitesavvy-0.6.0-macos-x86_64.tar.gz | macOS x86_64 | Single-file PyInstaller binary |
| sitesavvy-0.6.0-windows-x86_64.exe | Windows x86_64 | Single-file PyInstaller binary |
| sitesavvy-0.6.0-py3-none-any.whl | Universal | pip install wheel |
| sitesavvy-0.6.0.tar.gz | Universal | Source distribution |
Legal
SiteSavvy is provided for personal, non-commercial use only. Respect the copyright, terms of service, and robots.txt of every site you crawl. Licensed under the MIT License.