
Problems with pages that have no tables: variable total number of pages, no good way to index pages #491

Open
Bernardo-Hazan opened this issue Nov 1, 2022 · 2 comments


@Bernardo-Hazan

Hello,

There is a daily PDF report whose last page has no tables and whose total number of pages varies from day to day.

I cannot extract tables with pages='all' when the document contains a page with no table:

File ~\anaconda3\lib\site-packages\camelot\io.py:113 in read_pdf
tables = p.parse(

File ~\anaconda3\lib\site-packages\camelot\handlers.py:176 in parse
t = parser.extract_tables(

File ~\anaconda3\lib\site-packages\camelot\parsers\stream.py:456 in extract_tables
self._generate_table_bbox()

File ~\anaconda3\lib\site-packages\camelot\parsers\stream.py:310 in _generate_table_bbox
table_bbox = self._nurminen_table_detection(hor_text)

File ~\anaconda3\lib\site-packages\camelot\parsers\stream.py:287 in _nurminen_table_detection
table_bbox = textedges.get_table_areas(textlines, relevant_textedges)

File ~\anaconda3\lib\site-packages\camelot\core.py:221 in get_table_areas
average_textline_height = sum_textline_height / float(len(textlines))

ZeroDivisionError: float division by zero


I also cannot restrict the pages to read to the range from the first page to the second-to-last page, because I cannot know the total number of pages beforehand.

Can you help me out?
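A possible workaround until this is handled in the library: count the pages at runtime and read them one at a time, treating a ZeroDivisionError as "no table on this page". This is a minimal sketch, not a Camelot feature: `read_tables_per_page` and `read_report` are hypothetical helpers, and it assumes the `pypdf` package (for `PdfReader`) is installed alongside Camelot and that this error is only raised for pages without detectable text lines, as in the traceback above.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def read_tables_per_page(
    page_numbers: Iterable[int],
    read_page: Callable[[int], Iterable[T]],
) -> Tuple[List[T], List[int]]:
    """Read each page separately; return (tables, skipped_page_numbers).

    Pages whose table detection fails with ZeroDivisionError (assumed to
    mean "no text lines, so no table") are skipped instead of aborting.
    """
    tables: List[T] = []
    skipped: List[int] = []
    for page in page_numbers:
        try:
            tables.extend(read_page(page))
        except ZeroDivisionError:
            skipped.append(page)
    return tables, skipped


def read_report(path: str):
    """Hypothetical entry point: read every page of the daily report."""
    # Imported lazily so read_tables_per_page can be used and tested alone.
    import camelot
    from pypdf import PdfReader

    # Page count is read at runtime, so it never has to be known in advance.
    num_pages = len(PdfReader(path).pages)
    return read_tables_per_page(
        range(1, num_pages + 1),
        lambda page: camelot.read_pdf(path, pages=str(page), flavor="stream"),
    )
```

Because each read_pdf call covers a single page, a failure on a table-less page no longer discards the tables found on the other pages, and the last page is still read on days when it does contain a table.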

@Bernardo-Hazan
Author

Flavor used: stream

@Bernardo-Hazan
Author

I just noticed that the last page may or may not contain a table, depending on the day, so always ignoring the last page is not an option. Nonetheless, checking whether textlines is empty before dividing by len(textlines), and skipping such pages, would solve the problem.
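The guard proposed above could look roughly like this. It is an illustrative sketch only, not Camelot's actual code: `TextLine` and `average_textline_height` are hypothetical names, and the sketch merely assumes that each text line exposes bottom and top coordinates `y0` and `y1`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TextLine:
    """Hypothetical stand-in for a detected text line (bottom y0, top y1)."""
    y0: float
    y1: float


def average_textline_height(textlines: Sequence[TextLine]) -> Optional[float]:
    """Return the mean text-line height, or None when the page has no lines."""
    if not textlines:
        # Guard for the case that raised ZeroDivisionError in get_table_areas.
        return None
    total = sum(tl.y1 - tl.y0 for tl in textlines)
    return total / len(textlines)
```

A caller would then treat None as "no table areas on this page" and return an empty result for that page instead of dividing by zero.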
