Skip to content

AIwork4me/PaddleOCR-VL-ROCm

Repository files navigation

PaddleOCR-VL-ROCm

Lightweight PaddleOCR-VL inference for Windows and Linux systems that use an ONNXRuntime layout model plus a ROCm-backed OpenAI-compatible VLM server.

This repository keeps the inference path small:

  • No PaddlePaddle runtime is required for inference.
  • PP-DocLayoutV3 runs through ONNXRuntime.
  • Visual language recognition is served by your ROCm vLLM or llama.cpp endpoint.
  • Outputs are saved as PaddleOCR-VL-style JSON and Markdown files.

Validation Result

The lightweight ONNXRuntime path has been validated against the Paddle native pipeline on 1355 images.

Item Result
Full-run success 1355 / 1355
Payload alignment 1355 / 1355
Layout, crop, request order, request payload Strictly aligned

Install

git clone <your-repo-url> PaddleOCR-VL-ROCm
cd PaddleOCR-VL-ROCm
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[dev]

Linux/macOS:

git clone <your-repo-url> PaddleOCR-VL-ROCm
cd PaddleOCR-VL-ROCm
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Prepare Models

Place the PP-DocLayoutV3 ONNX files here:

models/PP-DocLayoutV3-onnx/
  inference.onnx
  inference.yml

Download the verified ONNX model directly from Hugging Face:

pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py

Model link:

https://huggingface.co/AlexTransformer/PP-DocLayoutV3-onnx

If you already have the verified ONNX directory locally, copy it with:

python scripts/download_ppdoclayoutv3_onnx.py `
  --source-dir C:\path\to\PP-DocLayoutV3-onnx `
  --target-dir models/PP-DocLayoutV3-onnx

Start or provide an OpenAI-compatible VLM server. For vLLM, the server should expose:

http://127.0.0.1:8000/v1/models
http://127.0.0.1:8000/v1/chat/completions

Check the endpoint:

paddleocr-vl-rocm-check-server --server-url http://127.0.0.1:8000/v1

CLI Usage

paddleocr-vl-rocm `
  --input examples/input/handwrite_ch_demo.png `
  --output outputs/smoke `
  --layout-model models/PP-DocLayoutV3-onnx `
  --server-url http://127.0.0.1:8000/v1 `
  --api-model-name PaddleOCR-VL-1.5-0.9B `
  --vlm-backend vllm-server

Expected outputs:

outputs/smoke/handwrite_ch_demo_res.json
outputs/smoke/handwrite_ch_demo.md

Python API

from paddleocr_vl_rocm import PaddleOCRVLROCm

pipeline = PaddleOCRVLROCm(
    layout_model_dir="models/PP-DocLayoutV3-onnx",
    vlm_server_url="http://127.0.0.1:8000/v1",
    api_model_name="PaddleOCR-VL-1.5-0.9B",
)

result = pipeline.predict("examples/input/handwrite_ch_demo.png")
result.print()
result.save_to_json("outputs")
result.save_to_markdown("outputs", pretty=False)

Example Images

The smoke images are copied from ppocrv6_onnx/test_images:

  • handwrite_ch_demo.png
  • handwrite_en_demo.png
  • ancient_demo.png
  • japan_demo.png
  • magazine.png
  • magazine_vetical.png
  • pinyin_demo.png

Output Format

JSON contains:

  • input_path
  • width, height
  • layout_det_res
  • parsing_res_list
  • model_settings

Markdown contains the recognized document content in reading order.

Tests

python -m compileall -q src/paddleocr_vl_rocm
python -m pytest -q
paddleocr-vl-rocm --help

Notes

ROCm acceleration is provided by the VLM server. This Python package handles the ONNXRuntime layout stage, document crop routing, VLM requests, and result serialization.

About

Lightweight PaddleOCR-VL inference with ONNXRuntime layout and ROCm-backed OpenAI-compatible VLM serving.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages