Release v1.1.1 — Chunked Crawling: Resume Huge Crawls Across Sessions · adityaarsharma/librecrawl-technical-seo-audit-mcp

New: librecrawl_resume_from_crawl_id(crawl_id) — the crawl a 50,000-page site politely, across multiple days, from any session feature.

The flow

Day 1:  Audit https://huge-site.com (max 5000 pages)
        → runs, returns crawl_id 42
        → pause anytime with librecrawl_pause_crawl()

Day 2:  list_crawls() → see crawl_id 42 with status "paused" at 1,847 pages
        librecrawl_resume_from_crawl_id(42)
        → picks up at page 1,848 — every previously-crawled URL is re-used
          from SQLite, ZERO duplicate HTTP requests to the target

Day 3:  Same thing until done → generate_report(42) for the full audit

State persists across PM2 restarts, server reboots, and entirely different MCP client sessions. The crawl is keyed by crawl_id, not by the MCP session.

How it stays polite

Token-bucket rate limiter — smooth distribution, no bursting. Default 2 req/sec.
Respects Crawl-Delay directive in robots.txt
Single connection, sequential by default (configurable up to 5 workers)
Identifiable User-Agent (LibreCrawl/1.0 (Web Crawler))
Idempotent resume — URLs already in the DB are never re-fetched

Implementation note

Tries two resume paths transparently:

In-session resume (/api/resume_crawl) — when the upstream crawler instance is still alive with is_paused=True. Just flips the flag.
Cross-session DB resume (/api/crawls/<id>/resume) — when the crawler has been GCed. Rebuilds from the SQLite snapshot.

The tool returns {\"mode\": \"in_session_resume\" | \"db_resume\"} so you can see which path fired.

Also in v1.1.1

Better auto-recovery — _ensure_crawler_ready() now uses crawl speed (instead of status string alone) to detect paused/zombie crawlers. Upstream LibreCrawl reports paused crawls as status=\"running\", which the old detection missed.
Tool count: 20 (was 19) — librecrawl_resume_from_crawl_id added.

Upgrade

```bash
cd ~/librecrawl-mcp && git pull && pm2 restart librecrawl-mcp
```

Or rerun the installer — it'll update in place.

Verified live on nexterwp.com: paused at 19 pages from session A, resumed from session B (fresh MCP session), continued cleanly to 35 pages with no duplicate requests.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.1.1 — Chunked Crawling: Resume Huge Crawls Across Sessions

Choose a tag to compare

Sorry, something went wrong.