Specify Table Areas Returns Full Page #149

Tavisius25 · 2018-10-15T21:15:28Z

As per the advanced uses section in the documentation, I would like to define a portion of a page for table extraction using the stream parsing method. I am using the 3rd page of the following pdf...
SziniczToxicol.pdf

I read the PDF like this:
table = camelot.read_pdf('SziniczToxicol.pdf', pages='3', flavor='stream', flag_size=True)

Then I plot the text to understand the table boundaries:
table[0].plot('text')

Observe the upper left and bottom right boundaries which I estimated to be (79,727) and (537,383) respectively.

Now I attempt to parse just this area, also passing the column separators (353 and 474):
table2 = camelot.read_pdf('SziniczToxicol.pdf', pages='3', flavor='stream', table_areas=['79,727,537,384'], columns=['353,473'], flag_size=True)

The attached output CSV file includes text beyond my selection. In fact, it seems to be the full page in a 3-column format. Is this because stream treats the whole page as one table? Am I specifying my selected area correctly? Any help would be great. Thanks for making this great tool.
Toxicol-page-3-table-1.zip

vinayak-mehta · 2018-10-16T16:24:58Z

Looks like a bug, let me look into this.

charles-haynes · 2018-10-22T00:05:05Z

I'm having the same issue, can reproduce it using the example in the docs:

https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-areas

tables = camelot.read_pdf('table_areas.pdf', flavor='stream', table_areas=['316,499,566,337'])
tables[0].df

returns the entire page.

vinayak-mehta · 2018-10-22T22:28:40Z

Sorry for the late response on this and sorry again for a typo in the docs. The keyword argument to specify table areas is table_area and not table_areas. Though now that I think of it, table_areas sounds more right. I've fixed the docs.

Will change it to table_areas in a later release.
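
For anyone landing here, a minimal sketch of the corrected call, assuming a release in which the keyword is still table_area as described above. The helpers area_string and read_area are hypothetical names used only for illustration and are not part of Camelot. The keyword name is kept as a parameter so it can be switched to table_areas once the rename lands.

```python
def area_string(left, top, right, bottom):
    """Format an area as 'x1,y1,x2,y2': top-left corner, then bottom-right."""
    # PDF coordinates have their origin at the bottom-left of the page,
    # so the top edge carries the larger y value.
    if left >= right or bottom >= top:
        raise ValueError("expected left < right and bottom < top")
    return f"{left},{top},{right},{bottom}"


def read_area(path, page, area, columns, keyword="table_area"):
    """Hypothetical helper: parse one area of one page with flavor='stream'."""
    import camelot  # imported lazily so area_string works without Camelot

    # `keyword` is an assumption about the installed release:
    # "table_area" per the comment above, "table_areas" after the rename.
    kwargs = {keyword: [area], "columns": [columns]}
    return camelot.read_pdf(path, pages=page, flavor="stream", **kwargs)


# Area from the original report: top-left (79, 727), bottom-right (537, 384).
area = area_string(79, 727, 537, 384)
# tables = read_area("SziniczToxicol.pdf", "3", area, "353,473")
```

If the result still covers the whole page, the keyword name for the installed version is the first thing to check.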

felipeacsi · 2018-10-25T16:07:21Z

It works. Thank you!

cfrejlach · 2021-03-12T03:52:45Z

table_area still reads the whole page.

vinayak-mehta closed this as completed Oct 22, 2018
