PaddleOCR-VL-ROCm

Lightweight PaddleOCR-VL inference for Windows and Linux systems that use an ONNXRuntime layout model plus a ROCm-backed OpenAI-compatible VLM server.

This repository keeps the inference path small:

No PaddlePaddle runtime is required for inference.
PP-DocLayoutV3 runs through ONNXRuntime.
Visual language recognition is served by your ROCm vLLM or llama.cpp endpoint.
Outputs are saved as PaddleOCR-VL-style JSON and Markdown files.

Validation Result

The lightweight ONNXRuntime path has been validated against the Paddle native pipeline on 1355 images.

Item	Result
Full-run success	1355 / 1355
Payload alignment	1355 / 1355
Layout, crop, request order, request payload	Strictly aligned

Install

git clone <your-repo-url> PaddleOCR-VL-ROCm
cd PaddleOCR-VL-ROCm
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[dev]

Linux/macOS:

git clone <your-repo-url> PaddleOCR-VL-ROCm
cd PaddleOCR-VL-ROCm
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Prepare Models

Place the PP-DocLayoutV3 ONNX files here:

models/PP-DocLayoutV3-onnx/
  inference.onnx
  inference.yml

Download the verified ONNX model directly from Hugging Face:

pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py

Model link:

https://huggingface.co/AlexTransformer/PP-DocLayoutV3-onnx

If you already have the verified ONNX directory locally, copy it with:

python scripts/download_ppdoclayoutv3_onnx.py `
  --source-dir C:\path\to\PP-DocLayoutV3-onnx `
  --target-dir models/PP-DocLayoutV3-onnx

Start or provide an OpenAI-compatible VLM server. For vLLM, the server should expose:

http://127.0.0.1:8000/v1/models
http://127.0.0.1:8000/v1/chat/completions

Check the endpoint:

paddleocr-vl-rocm-check-server --server-url http://127.0.0.1:8000/v1

CLI Usage

paddleocr-vl-rocm `
  --input examples/input/handwrite_ch_demo.png `
  --output outputs/smoke `
  --layout-model models/PP-DocLayoutV3-onnx `
  --server-url http://127.0.0.1:8000/v1 `
  --api-model-name PaddleOCR-VL-1.5-0.9B `
  --vlm-backend vllm-server

Expected outputs:

outputs/smoke/handwrite_ch_demo_res.json
outputs/smoke/handwrite_ch_demo.md

Python API

from paddleocr_vl_rocm import PaddleOCRVLROCm

pipeline = PaddleOCRVLROCm(
    layout_model_dir="models/PP-DocLayoutV3-onnx",
    vlm_server_url="http://127.0.0.1:8000/v1",
    api_model_name="PaddleOCR-VL-1.5-0.9B",
)

result = pipeline.predict("examples/input/handwrite_ch_demo.png")
result.print()
result.save_to_json("outputs")
result.save_to_markdown("outputs", pretty=False)

Example Images

The smoke images are copied from ppocrv6_onnx/test_images:

handwrite_ch_demo.png
handwrite_en_demo.png
ancient_demo.png
japan_demo.png
magazine.png
magazine_vetical.png
pinyin_demo.png

Output Format

JSON contains:

input_path
width, height
layout_det_res
parsing_res_list
model_settings

Markdown contains the recognized document content in reading order.

Tests

python -m compileall -q src/paddleocr_vl_rocm
python -m pytest -q
paddleocr-vl-rocm --help

Notes

ROCm acceleration is provided by the VLM server. This Python package handles the ONNXRuntime layout stage, document crop routing, VLM requests, and result serialization.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
examples		examples
scripts		scripts
src/paddleocr_vl_rocm		src/paddleocr_vl_rocm
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PaddleOCR-VL-ROCm

Validation Result

Install

Prepare Models

CLI Usage

Python API

Example Images

Output Format

Tests

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PaddleOCR-VL-ROCm

Validation Result

Install

Prepare Models

CLI Usage

Python API

Example Images

Output Format

Tests

Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages