docs: expand PaddleOCR advanced usage guide #11018

Merged
julian-risch merged 1 commit into deepset-ai:main from jimmyzhuu:codex/docs-paddleocr-advanced-guide-pr
Apr 2, 2026

Conversation

@jimmyzhuu
Contributor

Related Issues

Closes #11017

Summary

This PR expands the PaddleOCRVLDocumentConverter documentation so it is more useful for people working with real OCR ingestion pipelines.

It adds:

  • clearer guidance on when PaddleOCR is a good fit
  • grouped guidance for the most useful advanced parameters
  • scenario-based recommendations for common document types
  • a more realistic advanced example for structure-heavy PDFs
  • a short note on how raw_paddleocr_responses can help during tuning

What changed

  • Added a short explanation of why raw_paddleocr_responses is useful during tuning.
  • Added a When to use it section.
  • Added a Useful configuration areas section that groups high-value parameters by purpose.
  • Added a Typical scenarios section with practical configuration suggestions.
  • Added an advanced example for layout-heavy PDFs using options like:
    • use_doc_orientation_classify
    • use_doc_unwarping
    • use_layout_detection
    • use_ocr_for_image_block
    • merge_tables
    • restructure_pages
    • prettify_markdown
  • Synced the same docs update to version-2.27-unstable.
  • Added a release note.
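
The option names listed above can be collected into a single configuration for layout-heavy PDFs. The following is a minimal sketch, not the example from the docs themselves: the helper `build_converter_kwargs`, the `api_url` value, and the choice of `True` for every flag are assumptions made for illustration. Only the seven option names come from this PR description. The converter call is left commented out because it needs the PaddleOCR integration package and a reachable endpoint, and its import path is assumed.

```python
# Option names come from the PR description; the values are illustrative
# assumptions, not recommended defaults.
LAYOUT_HEAVY_PDF_OPTIONS = {
    "use_doc_orientation_classify": True,
    "use_doc_unwarping": True,
    "use_layout_detection": True,
    "use_ocr_for_image_block": True,
    "merge_tables": True,
    "restructure_pages": True,
    "prettify_markdown": True,
}


def build_converter_kwargs(api_url: str, **overrides: bool) -> dict:
    """Hypothetical helper: merge the layout-heavy options with overrides."""
    unknown = set(overrides) - set(LAYOUT_HEAVY_PDF_OPTIONS)
    if unknown:
        # Reject misspelled option names early instead of passing them on.
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    return {"api_url": api_url, **LAYOUT_HEAVY_PDF_OPTIONS, **overrides}


# Placeholder endpoint; flat scans may not need unwarping, so it is disabled here.
kwargs = build_converter_kwargs(
    "http://localhost:8080/layout-parsing",
    use_doc_unwarping=False,
)

# Assumed usage (import path and signature not confirmed by this PR):
# from haystack_integrations.components.converters.paddleocr import (
#     PaddleOCRVLDocumentConverter,
# )
# converter = PaddleOCRVLDocumentConverter(**kwargs)
# result = converter.run(sources=["report.pdf"])
```

Keeping the options in one dictionary makes it easier to toggle a single flag and compare results while tuning.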

Checklist

  • I have read the contributors guidelines.
  • I have updated the related issue with new insights and changes.
  • I have added tests or explained why they are not needed.
  • I have added release notes.
  • I have updated the documentation accordingly.

Notes for reviewers

This is a docs-only change.
No runtime behavior was changed.

Validation run locally:

  • pre-commit on the changed docs files and release note
  • npm run build in docs-website

@jimmyzhuu jimmyzhuu requested a review from a team as a code owner April 1, 2026 13:58
@jimmyzhuu jimmyzhuu requested review from julian-risch and removed request for a team April 1, 2026 13:58
@vercel

vercel Bot commented Apr 1, 2026

@jimmyzhuu is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@anakin87
Member

anakin87 commented Apr 1, 2026

@jimmyzhuu just FYI: 2.27 has just been released, so you should make these changes in docs and versioned_docs/version-2.27.

@jimmyzhuu
Contributor Author

Thanks for the heads-up. I rebased the branch onto the latest main and updated the versioned docs target from versioned_docs/version-2.27-unstable to versioned_docs/version-2.27.

@jimmyzhuu jimmyzhuu force-pushed the codex/docs-paddleocr-advanced-guide-pr branch from c680d5b to 3dd471d Compare April 1, 2026 14:14
@vercel

vercel Bot commented Apr 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: haystack-docs
Deployment: Ready
Updated (UTC): Apr 2, 2026 1:52pm

Member

@julian-risch julian-risch left a comment

Looks good to me! Thank you for opening this pull request, @jimmyzhuu. The Vercel Preview looks good too.

@julian-risch julian-risch merged commit 865e7a5 into deepset-ai:main Apr 2, 2026
17 of 18 checks passed
@jimmyzhuu jimmyzhuu deleted the codex/docs-paddleocr-advanced-guide-pr branch April 3, 2026 09:03

Development

Successfully merging this pull request may close these issues.

Expand PaddleOCR docs with advanced parameters and real-world scenarios