dwocr batches PDF OCR requests against an OpenAI-compatible API, with defaults tuned for Doubleword-hosted Qwen OCR models. It renders each PDF page locally, submits page-level requests through autobatcher, and writes one markdown file per source PDF.
By default it uses:
- base URL:
https://api.doubleword.ai/v1 - model:
Qwen/Qwen3.5-397B-A17B-FP8 - prompt: the built-in Qwen 397 OCR benchmark prompt in
src/dwocr/prompts.py
pip install -e .This exposes two commands:
dwocrdwocr-web
Basic usage:
dwocr INPUT_PATHINPUT_PATH can be either:
- a single PDF file
- a directory, in which case
dwocrrecursively processes*.pdffiles below it
The CLI looks for an API key in this order:
--api-keyDOUBLEWORD_API_KEYOPENAI_API_KEY
If you do not pass --output-dir, output is written to a sibling dwocr_output/ directory next to the input root. Relative paths are preserved, so nested PDFs produce nested markdown files.
export DOUBLEWORD_API_KEY=...
dwocr ./papers \
--base-url https://api.doubleword.ai/v1 \
--model Qwen/Qwen3.5-397B-A17B-FP8 \
--output-dir ./ocr_output \
--render-images \
--batch-size 512 \
--batch-window-seconds 5 \
--poll-interval-seconds 5 \
--completion-window 24h \
--target-longest-image-dim 1024 \
--render-concurrency 8 \
--overwritedwocr INPUT_PATH [options]
--api-key TEXT
--base-url TEXT OpenAI-compatible API base URL
--model TEXT OCR model name
--output-dir TEXT Output directory for markdown files
--prompt-file TEXT Replace the built-in OCR prompt
--temperature FLOAT Default: 0.0
--max-tokens INT Default: 4096
--batch-size INT Default: 512
--batch-window-seconds FLOAT Default: 5.0
--poll-interval-seconds FLOAT Default: 5.0
--completion-window {1h,24h} Default: 24h
--target-longest-image-dim INT Default: 1024
--render-concurrency INT Default: min(16, max(4, cpu_count))
--render-images Save cropped image regions and rewrite markdown image tags
--overwrite Allow writing into a non-empty output directory
Each source PDF becomes one markdown file containing page outputs in order:
<!-- source: some/file.pdf -->
<!-- model: Qwen/Qwen3.5-397B-A17B-FP8 -->
<!-- generated_at: 2026-03-19T12:34:56+00:00 -->
<!-- page 1 -->
...page markdown...
<!-- page 2 -->
...page markdown...When --render-images is enabled and the model emits markers such as:
image[[120,300,520,700]] Figure caption
dwocr will:
- crop that region from the rendered source page
- write the crop into a sibling asset directory such as
document_images/ - rewrite the OCR output to a normal markdown image link
If any pages fail, dwocr still finishes the remaining work, exits non-zero, and prints the failed page list to stderr.
Run:
dwocr-webThen open http://127.0.0.1:8123.
The web UI lets you:
- submit OCR jobs for a PDF or directory
- set model, base URL, batching, and rendering options
- monitor active jobs and logs
- inspect recent job details and the exact generated CLI command