PullMD v3 grows from a web-page reader into a general anything-to-Markdown service for agents, with a leaner default output. Everything beyond plain web extraction is opt-in and degrades gracefully — left unconfigured, v3 handles web pages exactly like v2, just with a cleaner body by default.
Breaking
- Clean markdown body by default. The inline source-attribution line is no longer emitted in the body — for web pages and Reddit posts (
**r/sub** · u/user · N ↑ · …is gone;subreddit,author,published,upvotesare frontmatter fields now). SetPULLMD_SOURCE_HEADER=trueto restore the legacy inline header. See MIGRATION.md.
Added
- Document conversion via the new markitdown sidecar: PDF, DOCX, PPTX, XLSX, EPUB, ZIP, CSV, JSON, XML — by URL (
GET /api?url=…) or upload (POST /api/file, 25 MB), with PWA drag-and-drop & file picker. Conversions run sandboxed (subprocess timeout + memory cap). - Opt-in media tier (
PULLMD_VISION_*/PULLMD_STT_*): image captioning and audio transcription via any OpenAI-compatible endpoint (cloud or local) — runs inside pullmd, no extra container. - Opt-in PDF OCR tier (
PULLMD_PDF_OCR_*+?pdf=ocr): table-grade PDF conversion via a vendor-neutral OCR provider (reference: Mistral OCR), with automatic fallback to the free path. Includes a PWA toggle. - Keyless YouTube transcripts (
MARKITDOWN_YOUTUBE=true): title + description + transcript with clickable timecodes; per-requestyt_timecodes/yt_chunk. PULLMD_FRONTMATTER_FIELDSallowlist to trim frontmatter, plus source-specific fields (LLM usage, image size, audio seconds, YouTube duration/views, Reddit meta,pdf_pages).
Changed
- Claude Code skill bundle renamed
web-reader→pullmd(GET /pullmd.zip; the old URL redirects). Existing installs:rm -rf ~/.claude/skills/web-readerbefore installing the new zip. - MCP
read_urldescription and skill instructions cover the v3 capabilities; the MCP server reports the real package version.
Fixed
- Relative image/link URLs are resolved against the source page (no more broken images on share pages).
- Media frontmatter survives the cache; provider errors degrade to plain extraction.
- PWA: permalink bar moved above the result header.
Full details in the CHANGELOG.
Images: aeternalabshq/{pullmd,pullmd-trafilatura,pullmd-playwright,pullmd-markitdown}:3.0.0 on Docker Hub and GHCR (multi-arch amd64+arm64). :latest now tracks v3 — pin :2 to stay on the v2 output format.