Lightweight PaddleOCR-VL inference for Windows and Linux systems that use an ONNXRuntime layout model plus a ROCm-backed OpenAI-compatible VLM server.
This repository keeps the inference path small:
- No PaddlePaddle runtime is required for inference.
- PP-DocLayoutV3 runs through ONNXRuntime.
- Visual language recognition is served by your ROCm vLLM or llama.cpp endpoint.
- Outputs are saved as PaddleOCR-VL-style JSON and Markdown files.
The lightweight ONNXRuntime path has been validated against the Paddle native pipeline on 1355 images.
| Item | Result |
|---|---|
| Full-run success | 1355 / 1355 |
| Payload alignment | 1355 / 1355 |
| Layout, crop, request order, request payload | Strictly aligned |
git clone <your-repo-url> PaddleOCR-VL-ROCm
cd PaddleOCR-VL-ROCm
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[dev]Linux/macOS:
git clone <your-repo-url> PaddleOCR-VL-ROCm
cd PaddleOCR-VL-ROCm
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"Place the PP-DocLayoutV3 ONNX files here:
models/PP-DocLayoutV3-onnx/
inference.onnx
inference.yml
Download the verified ONNX model directly from Hugging Face:
pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.pyModel link:
https://huggingface.co/AlexTransformer/PP-DocLayoutV3-onnx
If you already have the verified ONNX directory locally, copy it with:
python scripts/download_ppdoclayoutv3_onnx.py `
--source-dir C:\path\to\PP-DocLayoutV3-onnx `
--target-dir models/PP-DocLayoutV3-onnxStart or provide an OpenAI-compatible VLM server. For vLLM, the server should expose:
http://127.0.0.1:8000/v1/models
http://127.0.0.1:8000/v1/chat/completions
Check the endpoint:
paddleocr-vl-rocm-check-server --server-url http://127.0.0.1:8000/v1paddleocr-vl-rocm `
--input examples/input/handwrite_ch_demo.png `
--output outputs/smoke `
--layout-model models/PP-DocLayoutV3-onnx `
--server-url http://127.0.0.1:8000/v1 `
--api-model-name PaddleOCR-VL-1.5-0.9B `
--vlm-backend vllm-serverExpected outputs:
outputs/smoke/handwrite_ch_demo_res.json
outputs/smoke/handwrite_ch_demo.md
from paddleocr_vl_rocm import PaddleOCRVLROCm
pipeline = PaddleOCRVLROCm(
layout_model_dir="models/PP-DocLayoutV3-onnx",
vlm_server_url="http://127.0.0.1:8000/v1",
api_model_name="PaddleOCR-VL-1.5-0.9B",
)
result = pipeline.predict("examples/input/handwrite_ch_demo.png")
result.print()
result.save_to_json("outputs")
result.save_to_markdown("outputs", pretty=False)The smoke images are copied from ppocrv6_onnx/test_images:
handwrite_ch_demo.pnghandwrite_en_demo.pngancient_demo.pngjapan_demo.pngmagazine.pngmagazine_vetical.pngpinyin_demo.png
JSON contains:
input_pathwidth,heightlayout_det_resparsing_res_listmodel_settings
Markdown contains the recognized document content in reading order.
python -m compileall -q src/paddleocr_vl_rocm
python -m pytest -q
paddleocr-vl-rocm --helpROCm acceleration is provided by the VLM server. This Python package handles the ONNXRuntime layout stage, document crop routing, VLM requests, and result serialization.