The reliability release. librecrawl-mcp now does a complete, word-by-word audit of every page on any site — however large or heavy — without dropping pages, overloading the origin, or crashing. Proven on real 700–1,900 page production sites.
What's in this release
✅ Word-by-word audit of every page
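Content analysis (readability, AI-tells, boilerplate, punctuation) + extended SEO checks (hreflang, schema.org, security headers, image performance) run across the crawl rather than a 50-page sample. By default these deep checks cover up to 500 pages (tunable up); core per-page checks cover every crawled page.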
✅ All pages — full coverage by default
sitemap_fill_cap defaults to 0, which means the entire sitemap. No caps to remember, no flags to pass. A plain librecrawl_start_chunked_audit(url=...) crawls the whole site: internally linked pages and sitemap-only orphans alike.
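To make the "0 = entire sitemap" convention concrete, here is a minimal sketch of how a fill cap with that meaning could behave. The function name `select_sitemap_fill` and its arguments are hypothetical and are not taken from librecrawl-mcp; only the `sitemap_fill_cap` name and its default of 0 come from the notes above.

```python
def select_sitemap_fill(sitemap_urls, crawled_urls, sitemap_fill_cap=0):
    """Return sitemap-only URLs (orphans) to add to the crawl.

    Hypothetical helper for illustration: a cap of 0 is treated as
    "no cap" (the whole sitemap); a positive cap limits the fill.
    """
    seen = set(crawled_urls)
    orphans = []
    for url in sitemap_urls:
        # Keep only URLs the link-following crawl did not already reach,
        # and skip duplicates inside the sitemap itself.
        if url not in seen:
            seen.add(url)
            orphans.append(url)
    if sitemap_fill_cap > 0:
        return orphans[:sitemap_fill_cap]
    return orphans
```

With the default cap, every sitemap URL that the link crawl missed is returned; passing a positive cap would reintroduce a limit.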
✅ Broken-link detection across every domain
Every outbound link — yours, third-party, social, CDN — is HTTP-validated and classified into 17 status classes. Dedup-then-validate (same as Screaming Frog): thousands of link appearances → unique URLs → each checked once.
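The dedup-then-validate idea can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `check_url` is a hypothetical injected callable that returns a status class, and stripping the URL fragment before deduplication is an assumption about what counts as the "same" URL.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def dedup_then_validate(page_links, check_url):
    """Validate each unique outbound URL once, then map results back.

    page_links: dict of page URL -> list of outbound link URLs.
    check_url:  hypothetical callable, URL -> status class.
    Returns a list of (page, url, status) rows, one per link appearance.
    """
    appearances = defaultdict(list)
    for page, links in page_links.items():
        for link in links:
            # Assumption: links differing only by #fragment are one URL.
            appearances[urldefrag(link).url].append(page)

    # One check per unique URL, however many pages link to it.
    status = {url: check_url(url) for url in appearances}

    return [
        (page, url, status[url])
        for url, pages in appearances.items()
        for page in pages
    ]
```

The number of HTTP checks then scales with the count of unique URLs, while the report can still list every page where a broken link appears.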
✅ No laziness — nothing silently dropped
Per-page core checks + external-link validation cover 100% of crawled pages. Deep content/extended checks cap at 500 pages for memory safety (tunable up), and the report says exactly what was covered.
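As a rough sketch of what "the report says exactly what was covered" could look like, the function below computes coverage figures for a capped deep-check pass. The function name, the field names, and the choice to take the first pages in crawl order are all hypothetical; only the 500-page default comes from the notes above.

```python
def plan_checks(pages, deep_check_cap=500):
    """Summarize core vs. deep check coverage for a crawl.

    Hypothetical helper: core checks cover every page, deep checks cover
    at most deep_check_cap pages (taken in crawl order as an assumption).
    """
    core = list(pages)
    deep = core[:deep_check_cap]
    return {
        "core_checked": len(core),
        "deep_checked": len(deep),
        "deep_skipped": len(core) - len(deep),
        # Report the deep-check share explicitly so nothing is hidden.
        "deep_coverage_pct": round(100 * len(deep) / len(core), 1) if core else 100.0,
    }
```

Reporting the skipped count alongside the checked count is what keeps a cap from becoming a silent drop.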
✅ Heavy / large sites crawlable
4–5 MB pages (Elementor / heavy-WP) now fetch successfully: the 25s per-page timeout gives heavy pages enough time to finish loading instead of timing out with status 0.
✅ Screaming-Frog-grade politeness — never overloads an origin
4 concurrent workers + 500ms jittered delay + 25s timeout. Heavy pages get more time, not more parallelism. Validated: full audits of 1,900-page sites with the origin staying healthy throughout.
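The combination of bounded workers, a jittered delay, and a generous timeout can be sketched with asyncio. This is a minimal illustration, not the crawler's actual code: `fetch` is a hypothetical injected coroutine, and applying the delay before each request with a ±50% jitter is an assumption about how the 500ms jittered delay is implemented.

```python
import asyncio
import random


async def polite_crawl(urls, fetch, workers=4, delay=0.5, jitter=0.5, timeout=25.0):
    """Fetch URLs with bounded concurrency, jittered pauses, and a timeout.

    fetch: hypothetical async callable, URL -> result.
    Returns a dict of URL -> result, with None for pages that timed out.
    """
    sem = asyncio.Semaphore(workers)  # at most `workers` requests in flight
    results = {}

    async def fetch_one(url):
        async with sem:
            # Jittered pause so requests do not hit the origin in lockstep.
            await asyncio.sleep(delay * random.uniform(1 - jitter, 1 + jitter))
            try:
                results[url] = await asyncio.wait_for(fetch(url), timeout)
            except asyncio.TimeoutError:
                results[url] = None  # slow page: recorded, no extra parallelism

    await asyncio.gather(*(fetch_one(u) for u in urls))
    return results
```

Raising `timeout` gives a heavy page longer to respond without increasing the number of simultaneous requests, which is the trade-off described above.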
✅ Re-scan anytime — zero history on the server (ephemeral)
After the client downloads the zip, the server wipes the session, every artifact file, and the upstream crawl record, leaving a per-audit footprint of 0 bytes and 0 rows. Re-scan as often as you like; nothing persists.
✅ OOM-safe (v2.1.1 fix)
v2.1.0's "check every page" exhausted memory on huge heavy sites and looped. Deep-checks now cap at 500 pages by default — full audits of 1,900+ page sites complete cleanly.
The fix arc (v2.0.5 → v2.1.1)
- v2.0.5 — hreflang false positives eliminated
- v2.0.7 — finalize works under force_advance (8 files always; event-loop fix)
- v2.0.8 — heavy 4–5 MB pages actually fetch
- v2.0.9 — Screaming-Frog politeness; never overload the origin
- v2.1.0 — full audit by default (every page, every text, every link)
- v2.1.1 — OOM-safe on very large heavy sites
Verified on real sites
| Site | Pages | Result |
|---|---|---|
| theplusaddons.com | 1,942 | all 8 files, full coverage, origin safe |
| nexterwp.com | 709 | 100% 200-OK, every external domain validated |
Output
Single zip, 8 files: branded PDF + Markdown + per-page CSV + extended-checks CSV + content-audit CSV + external-links CSV + sitemap-recon CSV + SUMMARY.
Roadmap
- Concurrent multi-site audits (3+): the 3-backend infrastructure is provisioned; pool routing is planned for the next release (currently one audit runs at a time).
- JavaScript rendering for SPA sites.
MIT · self-hosted · ephemeral · built on LibreCrawl.