Possible speed up #161
Comments
See my comments in the original thread. Once the
I made the same local changes in my setup
arnocandel added a commit to h2oai/camelot that referenced this issue Aug 28, 2023
bosd pushed a commit to bosd/pypdf_table_extraction that referenced this issue Aug 28, 2024
@majd1239 wrote in atlanhq/camelot#427:
While using camelot to extract tables from PDFs, I noticed it's really slow. I profiled the code, and it turns out that 60% of the time is spent in np.isclose here and here, as well as in multiple other places in core.py:
camelot/camelot/core.py
camelot/camelot/core.py
The slowdown makes sense since there is a very big overhead with np.isclose if we are dealing with native python floats instead of numpy types.
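For anyone who wants to check the overhead on their own machine, here is a minimal timing sketch. It is not taken from camelot: the values `a` and `b` and the helper `time_calls` are made up for illustration, and the absolute numbers will vary with hardware and NumPy version.

```python
import math
import timeit

import numpy as np

# Two native Python floats (hypothetical values, not taken from camelot).
a, b = 100.0, 100.5


def time_calls(func, number=100_000):
    """Return the total seconds taken by `number` calls of func(a, b)."""
    return timeit.timeit(lambda: func(a, b), number=number)


if __name__ == "__main__":
    # np.isclose converts its scalar inputs to arrays before comparing,
    # while math.isclose works on the floats directly.
    print(f"np.isclose:   {time_calls(np.isclose):.3f} s")
    print(f"math.isclose: {time_calls(math.isclose):.3f} s")
```

Both functions are called with their default tolerances here, so this only measures call overhead, not whether the two give the same answer.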
I switched to math.isclose instead, and the processing time dropped by more than half!
I can submit a pull request with the changes if the devs agree this is a safe change to make.
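On the "safe change" question, one thing worth checking is that the two functions do not share defaults: np.isclose uses rtol=1e-05 and atol=1e-08 and tests abs(a - b) <= atol + rtol * abs(b), while math.isclose uses rel_tol=1e-09 and abs_tol=0.0 and is symmetric in its arguments. The sketch below shows how a drop-in swap with default arguments can change a result. The helper `isclose_numpy_style` and the coordinates are hypothetical, and the helper ignores NumPy's special handling of NaN and infinity.

```python
import math


def isclose_numpy_style(a, b, rtol=1e-05, atol=1e-08):
    """Scalar check using NumPy's documented formula and default tolerances.

    Hypothetical helper for illustration; no NaN or infinity handling.
    """
    return abs(a - b) <= atol + rtol * abs(b)


# Hypothetical coordinates that differ by about 0.001.
x, y = 612.0, 612.001

print(isclose_numpy_style(x, y))                         # True
print(math.isclose(x, y))                                # False (stricter defaults)
print(math.isclose(x, y, rel_tol=1e-05, abs_tol=1e-08))  # True
```

Passing explicit tolerances to math.isclose, as in the last call, is probably the simplest way to keep the comparisons close to the NumPy behaviour, though the two formulas are still not identical at the boundary.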
Thanks