Specify Table Areas Returns Full Page #149
Looks like a bug, let me look into this.
I'm having the same issue and can reproduce it using the example in the docs: https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-areas

tables = camelot.read_pdf('table_areas.pdf', flavor='stream', table_areas=['316,499,566,337'])

tables[0].df returns the entire page.
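As a point of reference for readers, here is a minimal sketch of how a camelot-style 'x1,y1,x2,y2' area string can be interpreted, assuming PDF user space with the origin at the bottom-left (so the top y-coordinate is larger than the bottom one). `parse_table_area` is a hypothetical helper for illustration, not part of camelot's API:

```python
def parse_table_area(area):
    """Split a camelot-style 'x1,y1,x2,y2' string into floats.

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right,
    in PDF space where y grows upward -- so a valid area has
    x1 < x2 and y1 > y2.
    """
    x1, y1, x2, y2 = (float(v) for v in area.split(','))
    if not (x1 < x2 and y1 > y2):
        raise ValueError(f'invalid table area: {area!r}')
    return x1, y1, x2, y2

# The area from the reproduction above:
print(parse_table_area('316,499,566,337'))  # (316.0, 499.0, 566.0, 337.0)
```

If the y-values are given bottom-first by mistake, the area is inverted and a parser like this would reject it, which is one easy way such a selection can silently misbehave.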
Sorry for the late response on this, and sorry again for the typo in the docs. The keyword argument to specify table areas is not the one shown there; will change the docs to match.
It works. Thank you!
Table_area still reads the whole page |
As per the advanced uses section in the documentation, I would like to define a portion of a page for table extraction using the stream parsing method. I am using the 3rd page of the following pdf...
SziniczToxicol.pdf
I read the PDF like this:

table = camelot.read_pdf('SziniczToxicol.pdf', pages='3', flavor='stream', flag_size=True)
Then I visualize the text to understand the table boundaries:

table[0].plot('text')
I observed the upper-left and bottom-right boundaries, which I estimated to be (79,727) and (537,383) respectively.
Now I attempt to parse this section along with column demarcations (353 and 474).
table2 = camelot.read_pdf('SziniczToxicol.pdf', pages='3', flavor='stream', table_areas=['79,727,537,384'], columns=['353,473'], flag_size=True)
The attached output CSV file includes text beyond my selection; in fact, it seems to be the full page in three-column format. Is this due to stream treating the whole page as one table? Am I specifying my selected area correctly? Any help would be great. Thanks for making this great tool.
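One sanity check worth doing with coordinates like these is confirming that every column separator x-coordinate actually falls inside the horizontal span of the table area, since a separator outside the area cannot split it as intended. A small sketch under that assumption; `columns_within_area` is a hypothetical helper, not camelot's API:

```python
def columns_within_area(area, columns):
    """Return True if every column separator x-coordinate lies inside
    the horizontal span [x1, x2] of a camelot-style area string."""
    x1, _y1, x2, _y2 = (float(v) for v in area.split(','))
    return all(x1 <= float(c) <= x2 for c in columns.split(','))

# The values from the call above:
print(columns_within_area('79,727,537,384', '353,473'))  # True
```

Here the separators check out, so if the full page still comes back, the area string itself (or the keyword name it is passed under) is the more likely culprit.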
Toxicol-page-3-table-1.zip