feat(python): add PP-OCRv6 paddle backend support#696

Merged
SWHL merged 2 commits into
RapidAI:mainfrom
jaminmei:feat/ppocrv6-paddle
Jun 26, 2026
@jaminmei jaminmei commented Jun 23, 2026

Contributor

Adds PP-OCRv6 paddle backend support alongside the existing onnxruntime/openvino backends. Beyond registering model URLs, two changes were needed:

  1. init_predictor + device_config: v6 paddle models use the PIR format (inference.json), same as v5. Added PPOCRV6 to the existing PPOCRV5 branches in both paddle/main.py and paddle/device_config.py so v6 gets the same predictor configuration (enable_new_ir, enable_new_executor, set_optimization_level).

  2. Model registry: Added paddle.PP-OCRv6 section with det + rec (tiny/small/medium), pointing to the official files on RapidAI/RapidOCR master. rec entries include dict_url pointing to the dict files uploaded by the maintainer.

Verified: v6 paddle det/rec tests pass; v4/v5 paddle regression tests all pass (no behavior change).
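
A minimal sketch of the version branch described in item 1, for readers who want to see the shape of the change. It is not the code in paddle/main.py or paddle/device_config.py: `RecordingConfig`, `configure_cpu`, `model_file_name`, and `PIR_VERSIONS` are hypothetical names, the default optimization level of 3 is an assumption, and `inference.pdmodel` as the non-PIR file name is also an assumption. The stand-in config only logs calls so the example runs without paddle installed.

```python
from enum import Enum


class OCRVersion(Enum):
    PPOCRV4 = "PP-OCRv4"
    PPOCRV5 = "PP-OCRv5"
    PPOCRV6 = "PP-OCRv6"


# Versions whose paddle models ship in PIR format (inference.json).
PIR_VERSIONS = {OCRVersion.PPOCRV5, OCRVersion.PPOCRV6}


class RecordingConfig:
    """Hypothetical stand-in for the paddle predictor config; it only logs calls."""

    def __init__(self):
        self.calls = []

    def enable_new_ir(self, flag=True):
        self.calls.append(("enable_new_ir", flag))

    def enable_new_executor(self, flag=True):
        self.calls.append(("enable_new_executor", flag))

    def set_optimization_level(self, level):
        self.calls.append(("set_optimization_level", level))


def model_file_name(version):
    # Assumption: non-PIR versions use the older inference.pdmodel file.
    return "inference.json" if version in PIR_VERSIONS else "inference.pdmodel"


def configure_cpu(config, version, opt_level=3):
    # v6 takes the same branch as v5; other versions are left untouched.
    if version in PIR_VERSIONS:
        config.enable_new_ir(True)
        config.enable_new_executor(True)
        config.set_optimization_level(opt_level)  # level 3 is an assumed default
    return config
```

Keeping the PIR versions in one set means a later version only needs to be added in a single place, which is one way to avoid updating two separate `if` branches.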

- init_predictor + device_config: add PPOCRV6 to PPOCRV5 branches (PIR format, enable_new_ir/new_executor)

- have_key + get_character_list: read character_dict from inference.yml instead of always returning False

- default_models.yaml: add paddle.PP-OCRv6 det+rec (tiny/small/medium), rec has no dict_url (dict from yml)

- test_engine.py: add EngineType.PADDLE to v6 det/rec parametrize

- Verified: v6 paddle tests pass; v4/v5 paddle regression tests pass
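
For context, the inference.yml workaround in this commit could look roughly like the sketch below. It is an illustration, not the committed code: it takes an already-parsed YAML mapping rather than a file path, and it assumes the character list sits under `PostProcess.character_dict`, which is how PaddleOCR export configs are commonly laid out. The function names mirror have_key / get_character_list, but the signatures are hypothetical.

```python
def have_key(yml_cfg, key="character_dict"):
    """Return True when the parsed inference.yml carries a non-empty character list."""
    post_process = yml_cfg.get("PostProcess") or {}
    return bool(post_process.get(key))


def get_character_list(yml_cfg, key="character_dict"):
    """Return the embedded character list, or an empty list when it is absent."""
    post_process = yml_cfg.get("PostProcess") or {}
    return list(post_process.get(key) or [])


# Hypothetical parsed config, standing in for the contents of inference.yml.
sample_cfg = {"PostProcess": {"name": "CTCLabelDecode", "character_dict": ["a", "b", "c"]}}
empty_cfg = {"PostProcess": {"name": "CTCLabelDecode"}}
```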
@jaminmei

Contributor Author

Hi @SWHL, while working on the MNN backend for v6, I noticed that the v6 dict files haven't been uploaded to the RapidAI/RapidOCR ModelScope repo yet.

For v4 and v5, these dict files are hosted under paddle/PP-OCRvX/rec/<model_dir>/, and other backends that can't read embedded character info (MNN, torch) reference them via dict_url.

For v6, the dict files currently only exist in the PaddleOCR GitHub source repo: https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppocr/utils/dict

  • ppocrv6_dict.txt (18708 chars)
  • ppocrv6_tiny_dict.txt (6904 chars)

I think the missing v6 dict files cause two complications. First, PR #696 required more adjustments to the engine code (have_key / get_character_list) than would otherwise be necessary. Second, the MNN backend also needs dict_url for its rec models.

If the dict files are uploaded to paddle/PP-OCRv6/rec/ following the same structure as v4/v5, the engine workaround in PR #696 can be reverted, and all backends can reference them via the standard dict_url path without special handling.
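
To make the proposed layout concrete, here is a small sketch that builds a dict_url from the paddle/PP-OCRvX/rec/<model_dir>/ structure. `dict_url_for` and `BASE_URL` are hypothetical names, and the base URL is taken from the model_dir entries used elsewhere in this PR; which dict file belongs to which model is something the registry has to state explicitly.

```python
# Base of the ModelScope "resolve" URLs used by the registry entries in this PR.
BASE_URL = "https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/master"


def dict_url_for(version, model_dir, dict_name, base_url=BASE_URL):
    """Build <base>/paddle/<version>/rec/<model_dir>/<dict_name>."""
    parts = [base_url.rstrip("/"), "paddle", version, "rec", model_dir, dict_name]
    return "/".join(parts)


url = dict_url_for("PP-OCRv6", "PP-OCRv6_rec_tiny", "ppocrv6_tiny_dict.txt")
```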

… v6 rec

Now that dict files are available on RapidAI/RapidOCR, revert the have_key/get_character_list workaround that read character_dict from inference.yml. Add dict_url to paddle v6 rec entries pointing to the official dict files.

- paddle/main.py: have_key returns False, get_character_list returns [], remove yml_path logic

- default_models.yaml: add dict_url to paddle v6 rec (tiny/small/medium)

- init_predictor and device_config v6 branches remain (required for PIR format)
@jaminmei

Contributor Author

@SWHL Since the dict files are now available on the repo, I've submitted a new commit that reverts the have_key / get_character_list workaround and adds dict_url to the paddle v6 rec entries. The only remaining engine code changes are the init_predictor and device_config version branches needed for PIR format support.

Copilot AI left a comment

Contributor

Pull request overview

This PR extends the Python Paddle inference backend to support PP-OCRv6 models (in addition to existing onnxruntime/openvino support) and wires that support into the default model registry and tests.

Changes:

  • Treat OCRVersion.PPOCRV6 the same as PPOCRV5 for Paddle predictor initialization (enable_memory_optim() path).
  • Apply the same Paddle CPU “new IR / new executor / optimization level” configuration for PPOCRV6 as for PPOCRV5.
  • Register Paddle PP-OCRv6 det/rec default model bundles and run PP-OCRv6 det/rec tests with the Paddle engine.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/tests/test_engine.py | Adds Paddle to the PP-OCRv6 det/rec parametrized engine coverage. |
| python/rapidocr/inference_engine/paddle/main.py | Extends Paddle predictor init logic to include PP-OCRv6. |
| python/rapidocr/inference_engine/paddle/device_config.py | Applies PPOCRv5-style CPU IR/executor settings to PPOCRv6. |
| python/rapidocr/default_models.yaml | Adds Paddle PP-OCRv6 det/rec model registry entries. |

Comment on lines +655 to +660
ch_PP-OCRv6_rec_tiny:
model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/master/paddle/PP-OCRv6/rec/PP-OCRv6_rec_tiny
inference.pdiparams: bb2f8f54d1e25f28c71b6fa4fe23f5940e159cae27fbee96155c99f822156e57
inference.json: b5b14770c7dcf092781e92f4278a2ae5f95048f08b4b8a04140e88cb2745f147
inference.yml: 66170210bad538e83fff3c4a3867e547d6bf20b50d64b20347c4b913f3034ea1
dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/master/paddle/PP-OCRv6/rec/PP-OCRv6_rec_tiny/ppocrv6_tiny_dict.txt
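
The entry above pairs each file name with a 64-character hex string, which looks like a SHA-256 digest. Assuming that is what these values are, a download check could be sketched as follows. `sha256_of` and `verify_files` are hypothetical helpers, not RapidOCR functions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(model_dir, expected):
    """Return the names of files whose digest differs from the registry value."""
    mismatched = []
    for name, want in expected.items():
        if sha256_of(Path(model_dir) / name) != want.lower():
            mismatched.append(name)
    return mismatched
```

An empty return value means every listed file matched. A missing file raises `FileNotFoundError`, which a caller could treat as a signal to download again.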
@jaminmei

Contributor Author

Thanks for catching this. The initial commit did implement the inference.yml approach (hence the PR description), but after the dict files were uploaded to the repo, a follow-up commit reverted that workaround and switched to the standard dict_url path. The current behavior is intentional: have_key() returns False, dict_url is present in the rec entries, and the recognizer downloads the dict via get_dict_key_url() as expected. I'll update the PR description to match.
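
The resulting flow can be sketched as below, under stated assumptions. `resolve_character_list`, `build_character_list`, `PaddleLikeEngine`, and the injected `fetch_text` callable are hypothetical; the real recognizer goes through get_dict_key_url() and its own download code. Prepending "blank" and appending a space follows the usual CTC label convention and is an assumption about how the list is built.

```python
def build_character_list(dict_text, use_space_char=True):
    """Turn dict file text (one character per line) into a CTC label list."""
    chars = [line.rstrip("\r") for line in dict_text.split("\n")]
    if chars and chars[-1] == "":
        chars.pop()  # drop the empty item left by a trailing newline
    if use_space_char:
        chars.append(" ")
    return ["blank"] + chars  # index 0 is reserved for the CTC blank label


class PaddleLikeEngine:
    """Mirrors the merged behaviour: no embedded character list."""

    def have_key(self):
        return False

    def get_character_list(self):
        return []


def resolve_character_list(engine, dict_url, fetch_text):
    """Prefer an embedded list; otherwise load the dict referenced by dict_url."""
    if engine.have_key():
        return engine.get_character_list()
    if not dict_url:
        raise ValueError("rec entry has neither an embedded dict nor a dict_url")
    return build_character_list(fetch_text(dict_url))
```

Passing `fetch_text` in as a parameter keeps the sketch free of network access, so the same function can be exercised with a stub in tests.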

@SWHL SWHL added this to the v3.9.1 milestone Jun 26, 2026
@SWHL SWHL merged commit 29b812f into RapidAI:main Jun 26, 2026
@jaminmei jaminmei deleted the feat/ppocrv6-paddle branch June 29, 2026 01:27