Possible speed up #161

vinayak-mehta · 2020-07-09T22:48:56Z

While using camelot to extract tables from PDFs, I noticed it's really slow. I profiled the code, and it turns out that 60% of the processing time is spent in np.isclose, in the two places below as well as several other places in core.py:

camelot/camelot/core.py, line 103 (commit cd8ac79):

    if np.isclose(te.x, x_coord, atol=0.5):

camelot/camelot/core.py, line 67 (commit cd8ac79):

    if np.isclose(self.y0, y0, atol=edge_tol):

The slowdown makes sense: np.isclose has a large per-call overhead when it is given native Python floats instead of NumPy arrays, because it converts its inputs to arrays and runs its general vectorized comparison on every call.
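A rough way to see this per-call overhead is a micro-benchmark with timeit on plain Python floats. This is only a sketch: the coordinate values are illustrative (not taken from camelot), and timings depend on the machine and NumPy version, so no specific numbers are claimed here.

```python
import math
import timeit

import numpy as np

# Illustrative native Python floats, similar in kind to the coordinates compared in core.py
a, b = 101.3, 101.6

n = 20_000
# Time n scalar comparisons with each function, using the same 0.5 absolute tolerance
t_np = timeit.timeit(lambda: np.isclose(a, b, atol=0.5), number=n)
t_math = timeit.timeit(lambda: math.isclose(a, b, abs_tol=0.5), number=n)

print(f"np.isclose:   {t_np / n * 1e6:.2f} us per call")
print(f"math.isclose: {t_math / n * 1e6:.2f} us per call")
```

Both calls agree here (the values are 0.3 apart, within the 0.5 tolerance); only the cost per call differs.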

I switched these calls to math.isclose instead, and the processing time dropped by more than half!

I can submit a pull request with the changes if the devs agree this is a safe change to make.
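Whether the swap is safe depends on the tolerance rules, which differ slightly. np.isclose tests `abs(a - b) <= atol + rtol * abs(b)` with a default `rtol` of 1e-05, while math.isclose tests `abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)` with a default `rel_tol` of 1e-09; the keyword is also `abs_tol`, not `atol`. A minimal sketch, assuming finite inputs; `numpy_style_isclose` is a hypothetical helper, not part of camelot or NumPy.

```python
import math


def numpy_style_isclose(a: float, b: float, rtol: float = 1e-05, atol: float = 1e-08) -> bool:
    """Pure-Python version of np.isclose's test for finite scalars.

    Uses np.isclose's asymmetric formula abs(a - b) <= atol + rtol * abs(b).
    NaN/inf handling of np.isclose is not reproduced here.
    """
    return abs(a - b) <= atol + rtol * abs(b)


# Hypothetical coordinates about 0.502 apart, just over a 0.5 tolerance
a, b = 500.0, 500.502

# math.isclose: effective tolerance is max(1e-09 * 500.502, 0.5) = 0.5
print(math.isclose(a, b, abs_tol=0.5))        # False
# np.isclose-style: tolerance is 0.5 + 1e-05 * 500.502, about 0.505
print(numpy_style_isclose(a, b, atol=0.5))    # True
```

For comparisons like the ones above, the two functions can only disagree within a band of width about `rtol * abs(b)`, which is on the order of a few thousandths of a unit for page-sized coordinates. If exact parity with the old results matters, a pure-Python helper like this keeps NumPy's formula while still avoiding the array overhead.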

Thanks


FrancoisHuet · 2020-07-10T20:07:02Z

See my comments in the original thread. Once the hybrid and network parsers are merged in, this should no longer be an issue.

ivoytov · 2020-07-16T14:35:06Z

I made the same changes locally in my setup.

arnocandel added a commit to h2oai/camelot that referenced this issue Aug 28, 2023

Speedup as in camelot-dev/camelot#161

c769b09

bosd pushed a commit to bosd/pypdf_table_extraction that referenced this issue Aug 28, 2024

Speedup as in camelot-dev#161

bd365b2

bosd mentioned this issue Aug 28, 2024

Speedup as in https://github.com/camelot-dev/camelot/issues/161 py-pdf/pypdf_table_extraction#94

Open
