Multi-page extraction

@thevangelist thevangelist released this 30 Mar 20:34
· 138 commits to main since this release

Dembrandt now supports multi-page crawling and merges the results from all crawled pages, so data from across a domain is unified. Two new flags control the crawl:

--pages N: Crawls up to N pages, automatically prioritizing high-value links (such as /pricing or /features) and filtering out noise such as terms and privacy pages.

--sitemap: Discovers URLs via sitemap.xml instead of DOM scraping. It supports robots.txt directives, nested sitemap indexes, and domain variants.
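
To make the "nested sitemap indexes" part of --sitemap concrete, here is a minimal Python sketch of how a sitemap index could be expanded into page URLs. It is an illustration only, not Dembrandt's actual code: the function name collect_sitemap_urls and the fetch callback are hypothetical, and robots.txt handling and domain variants are left out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_sitemap_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return page URLs from a sitemap, following nested sitemap indexes.

    `fetch` is a hypothetical callback that returns the XML text for a URL,
    which keeps the sketch independent of any HTTP library.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against indexes that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]

    if root.tag == f"{SITEMAP_NS}sitemapindex":
        # Each <loc> points at another sitemap, so recurse into it.
        urls: List[str] = []
        for child_url in locs:
            urls.extend(collect_sitemap_urls(child_url, fetch, seen))
        return urls

    # A plain <urlset>: each <loc> is a page URL.
    return locs
```

Passing the fetcher in as a parameter means the same function can be exercised with canned XML strings or wired to a real HTTP client.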

Example:
dembrandt stripe.com --sitemap --pages 10
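
For intuition about what --pages N does with the discovered links, the following Python sketch filters out noise pages, moves high-value pages to the front, and caps the list at N. It is an assumption-laden illustration rather than Dembrandt's implementation: the function name pick_pages is hypothetical, the keyword lists contain only the examples named in this release note, and whether the start page counts toward N is not specified here.

```python
from typing import Iterable, List
from urllib.parse import urlparse

# Keyword lists are assumptions taken from the examples in the release note.
HIGH_VALUE = ("pricing", "features")
NOISE = ("terms", "privacy")


def path_segments(url: str) -> List[str]:
    """Lower-cased, non-empty path segments of a URL."""
    return [seg.lower() for seg in urlparse(url).path.split("/") if seg]


def pick_pages(urls: Iterable[str], limit: int) -> List[str]:
    """Drop noise pages, put high-value pages first, keep at most `limit`."""
    # dict.fromkeys removes duplicates while preserving discovery order.
    kept = [
        url
        for url in dict.fromkeys(urls)
        if not any(seg in NOISE for seg in path_segments(url))
    ]
    # list.sort is stable, so pages with equal priority keep discovery order.
    kept.sort(key=lambda url: 0 if any(seg in HIGH_VALUE for seg in path_segments(url)) else 1)
    return kept[:limit]
```

With a limit of 2 and the links /blog, /terms, /pricing, /privacy, and /features, this sketch would keep /pricing and /features.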