PDF to Markdown, powered by LLM vision.
Drop a PDF, get clean Markdown — tables, formulas, code, figures, all handled.
Demo · Installation · Quick Start · Configuration · CLI Reference
English | 简体中文
pdfmark-ai doesn't parse PDFs the traditional way. Instead, it renders each page as an image and lets multimodal LLMs (Claude, Kimi, Qwen, etc.) "read" it — just like a human would. The result? Clean, structured Markdown that handles what other tools simply can't: complex tables with merged cells, inline math formulas, source code blocks, embedded diagrams, and even blurry scans.
Real conversion results on academic papers and technical documents — no post-editing.
![]() |
![]() |
| PDF original — mixed figures, tables & code | Converted Markdown — images extracted, tables formatted |
![]() |
![]() |
| PDF original — tables & code blocks | Converted Markdown — syntax-highlighted code |
![]() |
![]() |
| PDF original — charts & formulas | Converted Markdown — chart images referenced |
![]() |
![]() |
| PDF original — dense math formulas | Converted Markdown — LaTeX `$...$` and `$$...$$` wrapping |
![]() |
![]() |
| PDF original — blurred / low-quality scan | Converted Markdown — content correctly recognized |
- 🖼️ Vision-based extraction — treats each page as an image, handles complex layouts that traditional parsers miss
- 🧮 Math formulas — LaTeX rendering with automatic
$...$and$$...$$wrapping - 📊 Complex tables — merged cells, multi-row headers, nested structures
- 💻 Code blocks — syntax-appropriate formatting for source code
- ✂️ Image extraction —
--crop-imagesto crop figures and diagrams as separate files - 🔍 Blur tolerance — handles low-quality and blurred scans with high recognition accuracy
- 🤖 Multi-provider — Claude, Kimi, Xiaomi, Qwen, and any OpenAI-compatible API
- ⚡ Incremental caching — SHA-256 progressive cache avoids re-processing unchanged pages
pip install pdfmark-ai- Python >= 3.10
- An LLM API key (Anthropic, Kimi, Qwen, or OpenAI-compatible)
# Step 1: Generate config templates in your current directory
pdfmark --init
# Step 2: Edit .env — uncomment ONE provider and fill in your API key
# e.g. LLM_API_KEY=your-xiaomi-api-key
# Step 3: Run
pdfmark input.pdf -o output.mdDefault provider is Xiaomi MiMo (mimo-v2.5). To use a different provider, either edit .env to set LLM_MODEL / LLM_BASE_URL, or change active_provider in pdfmark.toml.
💡 Tip: Configuration files (
.envandpdfmark.toml) are always read from your current working directory — not from the package installation directory. Place them alongside your PDF files or in your project root.
pdfmark-ai uses a 4-layer priority chain: CLI args > env vars > TOML config > defaults.
Config files live in your working directory (where you run pdfmark):
| File | Purpose | Contains |
|---|---|---|
.env |
API keys & overrides | LLM_API_KEY, LLM_MODEL, LLM_BASE_URL |
pdfmark.toml |
Provider presets & settings | providers, DPI, chunking, caching |
You can generate both files with pdfmark --init, or create them manually.
# Uncomment ONE provider and add your key:
LLM_API_KEY=your-xiaomi-api-key
# LLM_AUTH_TYPE=auth_token
# LLM_API_KEY=your-kimi-api-key
# LLM_API_KEY=your-anthropic-api-key
# LLM_API_KEY=your-qwen-api-key
# Optional: override model or base URL
# LLM_MODEL=mimo-v2.5
# LLM_BASE_URL=https://token-plan-cn.xiaomimimo.com/anthropicactive_provider = "xiaomi"
[providers.xiaomi]
base_url = "https://token-plan-cn.xiaomimimo.com/anthropic"
model = "mimo-v2.5"
[render]
dpi = 150
[cache]
enabled = true
dir = "~/.cache/pdfmark"| Provider | active_provider |
Notes |
|---|---|---|
| Anthropic Claude | anthropic |
Supports Opus 4.6, Sonnet 4.6 and other Claude models. Uses Anthropic Messages API natively. |
| Kimi (Moonshot) | kimi |
Anthropic-compatible API |
| Xiaomi (MiMo) | xiaomi |
Auth token required. Default provider (mimo-v2.5). |
| Qwen (Alibaba) | qwen |
OpenAI-compatible SDK |
| Any OpenAI-compatible | set LLM_BASE_URL |
Set LLM_SDK_TYPE=openai |
Usage: pdfmark [OPTIONS] [INPUT]
Arguments:
INPUT Path to the PDF file to convert
Options:
--init Generate .env and pdfmark.toml config templates
-f, --force Overwrite existing config files (use with --init)
-o, --output Output markdown file path
--lang Document language (e.g. 'en', 'zh', 'auto')
--crop-images Extract visual regions from pages as images
--refine (deprecated, ignored) Has no effect.
--no-cache Disable caching of rendered pages and chunks
--no-frontmatter Omit YAML frontmatter from output
--detect-only Detect document structure and print sections
--config Path to a TOML configuration file
--dpi Rendering DPI for PDF pages
--model LLM model identifier
--api-key LLM API key (or set LLM_API_KEY env var)
--base-url LLM API base URL
--max-concurrent Maximum concurrent LLM requests
Use --crop-images to extract figures and diagrams from the PDF as separate image files:
pdfmark input.pdf -o output.md --crop-imagesCropped images are saved alongside the output file (e.g., images/page_003_fig_001.png). Crop mode and plain mode use separate caches, so you can switch freely without needing --no-cache.
MIT









