pandas.io.common.CParserError: Error tokenizing data. #17
It seems to be related to #2. I guess you should set the `area` option (see https://github.com/chezou/tabula-py#how-can-i-ignore-useless-area). Could you show me your command with its options? I'd also like to know the result of extracting with tabula-java. Could you try it?
We run this script:

```
#!/usr/bin/env python
url = "https://resource.holdan.co.uk/Holdan/gbp/BMD.pdf"
```

It seems to work in the vanilla Tabula web interface.
As mentioned in #2, although tabula-java exports multiple tables with different numbers of columns, there is no delimiter between the tables. So with the current version of tabula-py, you should specify each table's area.
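The traceback in the title suggests that tabula-py hands tabula-java's CSV output to `pandas.read_csv`. The minimal sketch below reproduces the same tokenizing error with pandas alone. The CSV text is made up for illustration: a 2-column table followed directly by a 3-column table with no separator, which is the situation described above. Recent pandas raises `pandas.errors.ParserError`; older versions exposed it as `pandas.io.common.CParserError`.

```python
import io

import pandas as pd

# Hypothetical CSV mimicking one page that holds two tables:
# a 2-column table immediately followed by a 3-column table,
# with no delimiter line between them.
csv_text = (
    "Product,Price\n"
    "A,10\n"
    "B,20\n"
    "Model,Price,Stock\n"
    "C,30,5\n"
)

try:
    # The header fixes the expected field count at 2, so the
    # 3-field row of the second table makes the C parser fail.
    pd.read_csv(io.StringIO(csv_text))
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)
```

Restricting each extraction to a single table's `area` avoids this, because every CSV chunk pandas sees then has one consistent column count.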
OK, thanks for the help. Kieran
Hi,
We get the following error when parsing a certain PDF file from a URL.
This is using the latest tabula-py from git.
The URL is https://resource.holdan.co.uk/Holdan/gbp/BMD.pdf