
Unable to read merged column headers #51

Closed
dr333 opened this issue Aug 31, 2017 · 1 comment

dr333 commented Aug 31, 2017

This is a follow-up to the earlier issue #43, which was closed. All the details provided in #43 still apply; here is the updated information:

I upgraded tabula-py to use the latest jar (tabula-1.0.1-jar-with-dependencies.jar), and while that reduced these warnings, I still get some:

Aug 31, 2017 11:42:02 AM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont
WARNING: Using fallback font 'LiberationSans' for 'TimesNewRomanPS-ItalicMT'
Aug 31, 2017 11:42:03 AM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont
WARNING: Using fallback font 'LiberationSans' for 'TimesNewRomanPS-ItalicMT'
Aug 31, 2017 11:42:03 AM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont
WARNING: Using fallback font 'LiberationSans' for 'TimesNewRomanPS-ItalicMT'

The main issue is that common (merged) cell headers do not get read in, and I am not sure whether the warnings are related. The PDF file is here: ufile.io/5xuti
For instance, in the table on page 1, the common cell header 'Three Months Ended March 31' gets dropped.
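As a hedged workaround sketch (not part of tabula-py itself), a dropped spanning label can be re-attached by hand in pandas as an outer column level once you know which columns it covers. The column names and values below are hypothetical stand-ins, not data taken from the linked PDF.

```python
import pandas as pd

# Hypothetical stand-in for a table extracted from page 1: the spanning
# header 'Three Months Ended March 31' was dropped, leaving only year columns.
df = pd.DataFrame(
    {"Item": ["Revenue", "Net income"], "2017": [100, 20], "2016": [90, 15]}
)

# Re-attach the known spanning label as the outer level over the year
# columns; the 'Item' column gets an empty outer label.
span = "Three Months Ended March 31"
df.columns = pd.MultiIndex.from_tuples(
    [("", "Item")] + [(span, col) for col in df.columns[1:]]
)
print(df)
```

Selecting `df["Three Months Ended March 31"]` then returns just the two year columns.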

chezou (Owner) commented Sep 1, 2017

Could you follow the issue template?

I don't think tabula-java can extract a "common cell header". Even if it could, tabula-py converts the result into a DataFrame, and I don't think a DataFrame can handle combined (merged) cells.
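A pandas DataFrame indeed has no merged-cell concept; the closest approximation is a `pd.MultiIndex` on the columns, where the spanning label is repeated for every column it covers. The minimal sketch below assumes (hypothetically) that the label survives extraction in only one cell of a header row, with the cells it spans left empty; forward-filling that row recovers the span. The header rows and values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw header rows as they might come out of extraction:
# the spanning label occupies one cell and the cells it covers are empty.
top = [None, "Three Months Ended March 31", None]
sub = ["Item", "2017", "2016"]

# Forward-fill the top row so each covered column repeats the label.
# The leading None has nothing before it, so it is replaced with "".
top_filled = pd.Series(top).ffill().fillna("").tolist()

# Use the filled top row as the outer level and the year row as the inner level.
columns = pd.MultiIndex.from_arrays([top_filled, sub])
df = pd.DataFrame([["Revenue", 100, 90], ["Net income", 20, 15]], columns=columns)
print(df)
```

Note that forward-filling is a heuristic: it assumes every empty cell to the right of a label belongs to that label's span, which may not hold for every table layout.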

chezou closed this as completed Sep 1, 2017