Release v2.1.1 — Full word-by-word audit of every page, heavy-site-safe · adityaarsharma/librecrawl-technical-seo-audit-mcp

The reliability release. librecrawl-mcp now does a complete, word-by-word audit of every page on any site — however large or heavy — without dropping pages, overloading the origin, or crashing. Proven on real 700–1,900 page production sites.

What's in this release

✅ Word-by-word audit of every page

Content analysis (readability, AI-tells, boilerplate, punctuation) and extended SEO checks (hreflang, schema.org, security headers, image performance) run across the whole crawl rather than a 50-page sample, up to the default 500-page deep-check cap (tunable up).

✅ All pages — full coverage by default

sitemap_fill_cap defaults to 0, meaning the entire sitemap. No caps to remember, no flags to pass. A plain librecrawl_start_chunked_audit(url=...) crawls the whole site: internally linked pages and sitemap-only orphans alike.
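
To make the "0 = entire sitemap" convention concrete, here is a minimal sketch of how a crawl result might be merged with sitemap-only orphans. The function name `merge_crawl_with_sitemap` is hypothetical, and applying the cap to the orphan list (rather than to the sitemap as a whole) is an assumption; this is not the server's actual code.

```python
from typing import Iterable, List


def merge_crawl_with_sitemap(
    crawled_urls: Iterable[str],
    sitemap_urls: Iterable[str],
    sitemap_fill_cap: int = 0,
) -> List[str]:
    """Return crawled pages followed by sitemap-only orphans.

    sitemap_fill_cap == 0 means "no cap" (take every orphan in the sitemap).
    Any positive value limits how many orphans are added (assumed semantics).
    """
    crawled = list(dict.fromkeys(crawled_urls))  # de-duplicate, keep order
    seen = set(crawled)
    # Orphans: URLs listed in the sitemap that no internal link led to.
    orphans = [u for u in dict.fromkeys(sitemap_urls) if u not in seen]
    if sitemap_fill_cap > 0:
        orphans = orphans[:sitemap_fill_cap]
    return crawled + orphans
```

With the default of 0, every sitemap-only URL joins the audit; a positive cap would only make sense for a deliberately partial run.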

✅ Broken-link detection across every domain

Every outbound link — yours, third-party, social, CDN — is HTTP-validated and classified into 17 status classes. Dedup-then-validate (same as Screaming Frog): thousands of link appearances → unique URLs → each checked once.
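
The dedup-then-validate idea can be sketched roughly as follows. `dedup_then_validate` and the injected `check_url` callable are hypothetical names used for illustration; the real checker issues HTTP requests and maps results into its 17 status classes, which are not reproduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def dedup_then_validate(
    link_appearances: Iterable[Tuple[str, str]],
    check_url: Callable[[str], int],
) -> Dict[str, dict]:
    """Validate each unique outbound URL once, then map results back to pages.

    link_appearances: (source_page, target_url) pairs, one per link occurrence.
    check_url: hypothetical callable returning an HTTP status code for a URL.
    """
    # Step 1: collapse many link appearances into unique target URLs,
    # remembering which pages reference each one.
    pages_by_target: Dict[str, List[str]] = defaultdict(list)
    for source_page, target_url in link_appearances:
        pages_by_target[target_url].append(source_page)

    # Step 2: check each unique URL exactly once.
    results: Dict[str, dict] = {}
    for target_url, pages in pages_by_target.items():
        results[target_url] = {"status": check_url(target_url), "found_on": pages}
    return results
```

Because the result keeps a `found_on` list, one failed check can still be reported against every page that links to the broken URL.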

✅ No laziness — nothing silently dropped

Per-page core checks + external-link validation cover 100% of crawled pages. Deep content/extended checks cap at 500 pages for memory safety (tunable up), and the report says exactly what was covered.
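
As an illustration of "cap the deep checks, but say exactly what was covered", a coverage plan could look something like this. The function `plan_deep_checks` and the parameter name `deep_check_cap` are assumptions made for this sketch, as is using `None` to mean "no cap".

```python
from typing import Dict, List, Optional


def plan_deep_checks(
    crawled_urls: List[str],
    deep_check_cap: Optional[int] = 500,
) -> Dict[str, object]:
    """Choose pages for deep checks and report coverage explicitly.

    deep_check_cap is a hypothetical parameter; None means "no cap".
    """
    if deep_check_cap is None:
        deep = list(crawled_urls)
    else:
        deep = crawled_urls[:deep_check_cap]
    return {
        "core_checked": len(crawled_urls),  # core checks cover every crawled page
        "deep_checked": len(deep),
        "deep_skipped": len(crawled_urls) - len(deep),
        "deep_urls": deep,
    }
```

For a 1,942-page crawl with the default cap, this plan reports 1,942 pages core-checked, 500 deep-checked, and 1,442 skipped by the deep pass, so nothing is dropped without being counted.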

✅ Heavy / large sites crawlable

4–5 MB pages (Elementor / heavy-WP) now fetch successfully: the 25s per-page timeout gives heavy pages enough time to download instead of timing out and being recorded as status 0.

✅ Screaming-Frog-grade politeness — never overloads an origin

4 concurrent workers + 500ms jittered delay + 25s timeout. Heavy pages get more time, not more parallelism. Validated: full audits of 1,900-page sites with the origin staying healthy throughout.
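
The politeness settings above (4 workers, 500ms jittered delay, 25s timeout) can be sketched with asyncio. This is an illustrative sketch, not the crawler's real code: `polite_crawl` and the injected `fetch` coroutine are hypothetical, and the jitter range of 0.5x to 1.5x of the base delay is an assumption.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, List


async def polite_crawl(
    urls: List[str],
    fetch: Callable[[str], Awaitable[int]],
    workers: int = 4,
    base_delay: float = 0.5,
    timeout: float = 25.0,
) -> Dict[str, int]:
    """Fetch URLs with bounded concurrency, a jittered delay, and a per-page timeout."""
    semaphore = asyncio.Semaphore(workers)  # at most `workers` requests in flight
    statuses: Dict[str, int] = {}

    async def fetch_one(url: str) -> None:
        async with semaphore:
            # Jitter: sleep between 0.5x and 1.5x of the base delay (assumed range).
            await asyncio.sleep(base_delay * random.uniform(0.5, 1.5))
            try:
                statuses[url] = await asyncio.wait_for(fetch(url), timeout=timeout)
            except asyncio.TimeoutError:
                statuses[url] = 0  # a timed-out page is recorded as status 0

    await asyncio.gather(*(fetch_one(u) for u in urls))
    return statuses
```

The design point is that a heavy page holds its worker slot for longer rather than triggering extra parallel requests, so load on the origin stays bounded by the worker count.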

✅ Re-scan anytime — zero history on the server (ephemeral)

After the client downloads the zip, the server wipes the session, every artifact file, and the upstream crawl record. 0 bytes, 0 rows per-audit footprint. Re-scan as often as you like; nothing persists.

✅ OOM-safe (v2.1.1 fix)

v2.1.0's "check every page" exhausted memory on huge heavy sites and looped. Deep-checks now cap at 500 pages by default — full audits of 1,900+ page sites complete cleanly.

The fix arc (v2.0.5 → v2.1.1)

v2.0.5 — hreflang false positives eliminated
v2.0.7 — finalize works under force_advance (8 files always; event-loop fix)
v2.0.8 — heavy 4–5 MB pages actually fetch
v2.0.9 — Screaming-Frog politeness; never overload the origin
v2.1.0 — full audit by default (every page, every text, every link)
v2.1.1 — OOM-safe on very large heavy sites

Verified on real sites

Site	Pages	Result
theplusaddons.com	1,942	all 8 files, full coverage, origin safe
nexterwp.com	709	100% 200-OK, every external domain validated

Output

Single zip, 8 files: branded PDF + Markdown + per-page CSV + extended-checks CSV + content-audit CSV + external-links CSV + sitemap-recon CSV + SUMMARY.

Roadmap

Concurrent multi-site audits (3+): the 3-backend infrastructure is provisioned; pool routing is the next release (currently one audit at a time).
JavaScript rendering for SPA sites.

MIT · self-hosted · ephemeral · built on LibreCrawl.
