On-device OCR models v1 (PP-OCRv5 mobile)

@ben-milanko ben-milanko released this 15 Jun 05:27
· 63 commits to main since this release
41f2acb

On-device OCR model bundle for pdf_ocr_ondevice.

PdfOcrModels.ppOcrV5Mobile downloads these files on first use, then runs OCR entirely on device with ONNX Runtime, so no per-page network call is made.
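The download-on-first-use behaviour described above can be sketched as a small cache helper. This is an illustration only, not the package's Dart implementation: `ensure_asset`, the cache directory, and the injected `fetch` callable are all hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Callable


def ensure_asset(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the cached path for `name`, calling `fetch` only if it is missing.

    `fetch` is a hypothetical callable that returns the asset's bytes
    (for example, an HTTP GET against the release's asset URL).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary file first so an interrupted download
        # never leaves a truncated model at the final path.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(fetch(name))
        tmp.replace(target)
    return target
```

After the first call the file is served from the cache, which is what makes later OCR runs work without a network connection.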

Assets

| File | Size | SHA-256 |
| --- | --- | --- |
| PP-OCRv5_mobile_det.onnx | 4.82 MB | d5de5df3…4f4f7d |
| PP-OCRv5_mobile_rec.onnx | 16.56 MB | 0030c6b0…d40a8d |
| ppocrv5_dict.txt | 74 KB | d1979e9f…42af1b |
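A downloaded asset can be checked against the digests listed above. The following is a minimal sketch, assuming the listed values are the first and last hex characters of the full SHA-256 digest; the helper names are hypothetical. Because the digests are abbreviated here, the check compares only the visible head and tail.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the lowercase hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_abbreviated(digest: str, abbreviated: str) -> bool:
    """Compare a full hex digest against a 'head…tail' abbreviation."""
    head, _, tail = abbreviated.partition("…")
    return digest.startswith(head) and digest.endswith(tail)
```

For example, `matches_abbreviated(sha256_of(Path("PP-OCRv5_mobile_det.onnx")), "d5de5df3…4f4f7d")` should hold for an intact download. A head-and-tail match is weaker than comparing the full 64-character digest, so prefer the full value where it is available.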

Provenance & license

These files are derived works of PaddleOCR PP-OCRv5 mobile (Copyright PaddlePaddle Authors) and are redistributed under the Apache License 2.0; see LICENSE.txt and NOTICE.txt in the assets.

The two .onnx files were converted from the official PaddlePaddle inference models (inference.json + inference.pdiparams) with paddle2onnx (opset 14); no weights were retrained or altered. ppocrv5_dict.txt is the recognizer's character dictionary (18383 entries) extracted verbatim from the official PP-OCRv5_mobile_rec config.

Sources: https://huggingface.co/PaddlePaddle/PP-OCRv5_mobile_det · https://huggingface.co/PaddlePaddle/PP-OCRv5_mobile_rec

Verified

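End-to-end recognition confirmed against the shipped Dart pipeline: the recognizer's 18385-class output aligns with the 18383 dictionary entries plus the CTC blank and a space token (18383 + 1 + 1, following PaddleOCR's blank-first, space-last convention), and a black-on-white sample decodes exactly.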
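To illustrate how the recognizer's class indices map back to dictionary characters, here is a minimal greedy CTC decoding sketch. It is not the shipped Dart pipeline. It assumes the usual PaddleOCR layout (index 0 is the CTC blank, the dictionary entries follow in file order, and a space is appended last), and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def build_vocab(dict_entries: Sequence[str]) -> List[str]:
    """Assumed layout: [blank] + dictionary entries + [space]."""
    return [""] + list(dict_entries) + [" "]


def ctc_greedy_decode(class_ids: Sequence[int], vocab: Sequence[str], blank: int = 0) -> str:
    """Collapse consecutive repeats, then drop blanks (greedy CTC decoding).

    `class_ids` is the per-timestep argmax over the recognizer's output classes.
    """
    out: List[str] = []
    prev: Optional[int] = None
    for idx in class_ids:
        # Emit a character only when the class changes and is not the blank.
        if idx != blank and idx != prev:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)
```

With a three-entry dictionary `["a", "b", "c"]`, the vocabulary has five classes, and the index sequence `[1, 1, 0, 1, 2, 2, 0, 4, 3]` decodes to `"aab c"`: the blank between the two `1`s is what allows a doubled character to survive the repeat-collapsing step. Under the same layout, an 18383-entry dictionary yields 18385 classes.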