lazy import PyMuPDF #11685

Merged
merged 1 commit into from
Mar 7, 2024

jzhang533
Collaborator

This PR addresses the compatibility issues caused by the PyMuPDF dependency.

This pull request proposes a lazy import for PyMuPDF. This means:

  • No PyMuPDF dependency: PyMuPDF will be removed from the requirements.txt file.
  • Error handling: An informative error message will guide the user to install PyMuPDF if it's not already present in their environment. This error will only occur when PyMuPDF functionality is actually required.
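
A minimal sketch of the lazy-import pattern described above, using only the standard library. The helper name `lazy_import`, the function `pdf_page_count`, and the error wording are hypothetical illustrations, not the code in this PR; `fitz` is the module name that PyMuPDF installs.

```python
import importlib


def lazy_import(module_name, package_name=None):
    """Import a module on first use; raise an informative error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The pip package name can differ from the module name (PyMuPDF vs. fitz).
        pip_name = package_name or module_name
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f"Install it with: pip install {pip_name}"
        ) from exc


def pdf_page_count(path):
    # Hypothetical PDF code path: PyMuPDF is imported only when it is needed,
    # so users who never process PDFs never need it installed.
    fitz = lazy_import("fitz", package_name="PyMuPDF")
    with fitz.open(path) as doc:
        return doc.page_count
```

With this shape, importing the package itself never touches PyMuPDF; the error surfaces only when a PDF code path such as `pdf_page_count` is called.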

Additionally, if this PR is accepted, the following actions will be necessary:

  • Release a new version of PaddleOCR.
  • Update relevant documentation within this repository.

I've used paddle.utils.try_import in the implementation; this choice is also open for discussion.

PR types

Others

PR changes

Others

Description

See above.

Check-list

  • This PR is pushed to the dygraph branch or cherry-picked from the dygraph branch. Otherwise, please push your changes to the dygraph branch.
  • This PR fully describes what it does, so that reviewers can review it faster.
  • This PR can be covered by existing tests or has been verified locally.

paddle-bot bot commented Mar 6, 2024

Thanks for your contribution!

Collaborator

@tink2123 tink2123 left a comment

LGTM

@jzhang533
Collaborator Author

will merge with an approval from @dyning

@jzhang533 jzhang533 merged commit 69832ab into PaddlePaddle:release/2.7 Mar 7, 2024
1 of 2 checks passed
@jzhang533 jzhang533 mentioned this pull request Mar 7, 2024
3 tasks
@dhruv-anand-aintech

Thanks a lot for fixing this!

Do you know which version this will be available in?
I'm trying to install the latest version from PyPI (2.7.0.3), and it still lists PyMuPDF as a requirement, so the installation fails with the wheel error.

@jzhang533
Collaborator Author

Try this: https://pypi.org/project/paddleocr/2.7.2/

@dcfabian

Hello,

I've noticed that PyMuPDF is still a dependency of PaddleOCR, albeit indirectly, and I wanted to raise this in the context of this pull request. The dependency chain goes through pdf2docx, which is listed as a requirement in ppstructure/recovery/requirements.txt.

Specifically, pdf2docx has PyMuPDF>=1.19.0 as a direct dependency. Since pdf2docx is required for PaddleOCR's ppstructure recovery functionality, it inherently makes PyMuPDF an indirect yet crucial dependency for the project.

jzhang533 added a commit to jzhang533/PaddleOCR that referenced this pull request Mar 28, 2024
6 participants