Can't read the embedded font #31

Closed
GGPay opened this issue May 18, 2017 · 6 comments
Comments

GGPay commented May 18, 2017

Hi

I've run into a problem when trying to read one of my PDFs. Could you take a look and tell me where I went wrong?

from tabula import convert_into
# Raw strings keep the backslash literal; otherwise "\t" is read as a tab character.
convert_into(r"data\test1.pdf", r"data\test1.csv", output_format="csv")

Output:

May 18, 2017 2:56:53 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2Font getawtFont
INFO: Can't read the embedded font Arial-BoldMT
May 18, 2017 2:56:53 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2Font getawtFont
INFO: Using font Arial Bold instead
May 18, 2017 2:56:53 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2Font getawtFont
INFO: Can't read the embedded font ArialMT
May 18, 2017 2:56:53 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2Font getawtFont
INFO: Using font Arial instead

Owner

chezou commented May 18, 2017

It seems you're facing this bug.
https://issues.apache.org/jira/browse/PDFBOX-2818

The master branch of tabula-java has already been upgraded to PDFBox 2.0.0, so this should be fixed after the next tabula-java release.

Author

GGPay commented May 18, 2017

Thanks. I googled it too.

The package works fine despite that bug.
I also had another issue: it looks like tabula-java doesn't support big PDFs. When I split the PDF into chunks of 100 pages each, the package works fine.

Thank you sooo much for your package. You saved me tons of time.
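
A minimal sketch of the chunking idea described above. It does not split the file itself; instead it passes page ranges to convert_into through tabula-py's pages argument, which is a different technique from the one GGPay used. The helper names page_ranges and convert_in_chunks, the output file naming, and the need to know total_pages in advance are all assumptions made for illustration. Whether this reduces memory use as much as physically splitting the PDF is not confirmed anywhere in this thread.

```python
def page_ranges(total_pages, chunk_size=100):
    """Return page-range strings such as '1-100', '101-200' (hypothetical helper)."""
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        ranges.append(f"{start}-{end}")
    return ranges


def convert_in_chunks(pdf_path, total_pages, chunk_size=100):
    """Write one CSV per page range (hypothetical helper, output naming is an assumption)."""
    # Imported lazily so page_ranges() can be used without tabula-py or Java installed.
    from tabula import convert_into

    for page_range in page_ranges(total_pages, chunk_size):
        csv_path = f"{pdf_path}.{page_range}.csv"
        convert_into(pdf_path, csv_path, output_format="csv", pages=page_range)
```

For a 250-page file, page_ranges(250) yields three ranges, the last one covering pages 201 to 250.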

Owner

chezou commented May 18, 2017

As I mentioned in #27, you can set the -Xmx option.

tabula-py now lets you pass Java options using java_options=["-Xmx2048g"].
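
A minimal sketch of how that option could be passed to convert_into. The helper names heap_option and convert_with_heap are hypothetical, and the sketch uses a megabyte suffix (m), as in the value GGPay reports working later in this thread; the right heap size depends on the PDF and the machine.

```python
def heap_option(megabytes):
    """Build a JVM maximum-heap flag such as '-Xmx256m' (hypothetical helper)."""
    if megabytes <= 0:
        raise ValueError("heap size must be positive")
    return f"-Xmx{megabytes}m"


def convert_with_heap(pdf_path, csv_path, megabytes):
    """Run convert_into with a larger JVM heap (hypothetical helper)."""
    # Imported lazily so heap_option() can be used without tabula-py or Java installed.
    from tabula import convert_into

    convert_into(
        pdf_path,
        csv_path,
        output_format="csv",
        java_options=[heap_option(megabytes)],
    )
```

A call such as convert_with_heap(r"data\test1.pdf", r"data\test1.csv", 256) would then run the conversion with -Xmx256m.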

Author

GGPay commented May 19, 2017

Thank you. The large file works with java_options=["-Xmx256m"].

Owner

chezou commented Aug 8, 2017

@GGPay I released tabula-py v1.0.0. Could you try it?

Owner

chezou commented Aug 14, 2017

Using tabula-py v1.0.0, convert_into() on your PDF succeeded.

chezou closed this as completed Aug 14, 2017