Unable to read hyphen (-) properly #29

Closed
catchcharan opened this issue May 17, 2017 · 4 comments


catchcharan commented May 17, 2017

Summary of your issue

The text value X410-SATA-S28 in a PDF is converted to X410?SATA?S28 in the CSV output. This issue occurs with both the Python and Java versions of tabula.

Environment

  • python --version: 2.7
  • java -version: 1.7
  • OS and its version: Windows/Linux
  • Your PDF URL: just a PDF with one cell containing the value X410-SATA-S28

What did you do when you faced the problem?

As a temporary workaround, I replace each ? with a hyphen in code.
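
A minimal sketch of that kind of workaround, for illustration only. The function name `restore_hyphens` and the rule it applies are assumptions, not the reporter's actual code: it treats a `?` as a lost hyphen only when it sits directly between two letters or digits, so a genuine question mark at the end of a cell is left alone. It can still corrupt legitimate values such as `a?b`, so it is a stopgap rather than a fix.

```python
import re

# Assumption: a lost hyphen shows up as "?" directly between two
# letters or digits, as in "X410?SATA?S28".
_LOST_HYPHEN = re.compile(r"(?<=[A-Za-z0-9])\?(?=[A-Za-z0-9])")


def restore_hyphens(text: str) -> str:
    """Replace '?' with '-' when it sits between two alphanumeric characters."""
    return _LOST_HYPHEN.sub("-", text)


print(restore_hyphens("X410?SATA?S28"))  # X410-SATA-S28
```

The same function can be applied to each cell, or to the whole CSV text, after tabula has written the output file.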

Example code:


java -Xmx4080m -jar C:\Python27\lib\site-packages\tabula\tabula-0.9.2-jar-with-dependencies.jar --pages all --guess --format CSV --outfile C:\Meher\pricelistoutput.csv --spreadsheet C:\Meher\pricelist.pdf

Output:

X410?SATA?S28 

What did you expect the output to be?

X410-SATA-S28 
chezou (Owner) commented May 18, 2017

This is basically a tabula-java problem, but I think you should set an appropriate locale.
tabulapdf/tabula-java#143
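
One way to try the encoding suggestion is to pass an explicit default encoding to the JVM when launching the jar. The sketch below only builds the command line; `build_tabula_command` and its parameters are hypothetical names introduced here. `-Dfile.encoding` is a standard JVM system property, and the tabula flags are the ones from the reporter's command. Whether this actually restores the hyphen is an assumption, since the thread does not confirm it.

```python
from typing import List


def build_tabula_command(jar_path: str, pdf_path: str, csv_path: str,
                         encoding: str = "UTF-8") -> List[str]:
    """Build a tabula-java command line with an explicit JVM file encoding."""
    return [
        "java",
        f"-Dfile.encoding={encoding}",  # JVM option, must come before -jar
        "-Xmx4080m",
        "-jar", jar_path,
        "--pages", "all",
        "--guess",
        "--format", "CSV",
        "--outfile", csv_path,
        "--spreadsheet",
        pdf_path,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting problems with Windows paths.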

chezou (Owner) commented Aug 8, 2017

@catchcharan I released tabula-py v1.0.0. I think this problem is resolved. Could you try it?

chezou (Owner) commented Aug 14, 2017

I see that your code calls tabula-java directly, so this is not a tabula-py issue.

chezou closed this as completed Aug 14, 2017

abedkhooli commented

I tried v1 of tabula-py with an Arabic PDF, and the text comes out as ??s (the same as with the tabula-java command line). I am using Windows 10, and I guess it has to do with the jar file.
