Possible speed up #161

Open
vinayak-mehta opened this issue Jul 9, 2020 · 2 comments

Comments

@vinayak-mehta
Member

@majd1239 wrote in atlanhq/camelot#427:

While using camelot to extract tables from PDFs, I noticed it's really slow. I profiled the code, and it turns out that 60% of the time is spent in np.isclose, at the two lines below as well as in multiple other places in core.py:

camelot/camelot/core.py

Line 103 in cd8ac79
 if np.isclose(te.x, x_coord, atol=0.5): 

camelot/camelot/core.py

Line 67 in cd8ac79
 if np.isclose(self.y0, y0, atol=edge_tol): 

The slowdown makes sense since there is a very big overhead with np.isclose if we are dealing with native python floats instead of numpy types.
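
The overhead can be checked with a rough micro-benchmark. This is only a sketch: the coordinate values and the `time_call` helper are made up for illustration and are not taken from camelot, and the absolute numbers will vary by machine and NumPy version.

```python
import math
import timeit

import numpy as np

# Hypothetical coordinates, stored as native Python floats.
a, b = 100.2, 100.4


def time_call(fn, number=20_000):
    """Return the average number of seconds per call of fn."""
    return timeit.timeit(fn, number=number) / number


# Same comparison, once through NumPy and once through the standard library.
np_time = time_call(lambda: np.isclose(a, b, atol=0.5))
math_time = time_call(lambda: math.isclose(a, b, abs_tol=0.5))

print(f"np.isclose:   {np_time * 1e6:.2f} us per call")
print(f"math.isclose: {math_time * 1e6:.2f} us per call")
```

np.isclose has to convert both scalars to arrays before comparing them, which is where most of the per-call cost is likely to come from when the inputs are plain floats.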

I switched these calls to math.isclose instead, and the processing time dropped by more than half!

I can submit a pull request with the changes if the devs agree this is a safe change to make.

Thanks
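
On whether the swap is safe: the two functions do not use exactly the same test. np.isclose is documented as checking `|a - b| <= atol + rtol * |b|` with a default `rtol=1e-05`, while math.isclose checks `|a - b| <= max(rel_tol * max(|a|, |b|), abs_tol)` with a default `rel_tol=1e-09`. The sketch below writes the NumPy formula out in plain Python (the helper name `numpy_style_isclose` and the sample values are hypothetical) to show that results can differ right at the edge of the tolerance.

```python
import math


def numpy_style_isclose(a, b, rtol=1e-05, atol=1e-08):
    """Plain-Python version of the formula documented for np.isclose."""
    return abs(a - b) <= atol + rtol * abs(b)


# Typical case: well inside the tolerance, both tests agree.
print(numpy_style_isclose(100.2, 100.4, atol=0.5))  # True
print(math.isclose(100.2, 100.4, abs_tol=0.5))      # True

# Edge case: the difference is about 0.504, just over atol.
# The NumPy formula adds rtol * |b| = 0.005 to the tolerance, so it passes;
# math.isclose uses abs_tol alone here, so it fails.
print(numpy_style_isclose(500.504, 500.0, atol=0.5))  # True
print(math.isclose(500.504, 500.0, abs_tol=0.5))      # False
```

For page coordinates compared with tolerances like 0.5, the extra `rtol * |b|` term is small, so the two should agree except for values sitting almost exactly on the boundary. If identical behavior is required, the plain-Python formula above is one way to keep the NumPy semantics without the array overhead.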

@FrancoisHuet

See my comments in the original thread. Once the hybrid and network parsers are merged in, this should no longer be an issue.

@ivoytov

ivoytov commented Jul 16, 2020

I made the same local changes in my setup

arnocandel added a commit to h2oai/camelot that referenced this issue Aug 28, 2023
bosd pushed a commit to bosd/pypdf_table_extraction that referenced this issue Aug 28, 2024