Can't Read Arabic file names #124

Closed
AhmadSawalhah opened this issue Nov 23, 2016 · 3 comments

AhmadSawalhah commented Nov 23, 2016

@chrismattmann
Reading a file fails if its name is in Arabic.
For example, with the file name احمد.docx, I get this error:

  File "C:\Python34\lib\http\client.py", line 1109, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 21-27: ordinal not in range(256)

When I rename the file to an English name, it works fine.
Thanks
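
The traceback shows `http.client.putheader` encoding a header value as Latin-1, which cannot represent Arabic characters. The following is a minimal sketch of that failure mode, assuming the file name ends up inside an HTTP header value; the `Content-Disposition`-style string below is a guess for illustration and is not taken from the traceback.

```python
# Hypothetical header value; the real header the client builds is not shown in the traceback.
file_name = "احمد.docx"
header_value = "attachment; filename=" + file_name


def encodes_as_latin1(value):
    """Return True if `value` survives the Latin-1 encoding that putheader applies."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(encodes_as_latin1(header_value))
print(encodes_as_latin1("attachment; filename=ahmad.docx"))
```

Any character above U+00FF triggers the same error, so the problem is not specific to Arabic.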

@chrismattmann
Owner

I'll check it out, @AhmadSawalhah, but the fix may have to come in 1.15.

@chrismattmann
Owner

In your program, can you force the encoding before calling Tika?

@chrismattmann
Owner

Closing this as there is a current workaround: convert the file name to English, then run. If you can answer my questions in the future, feel free to open a new issue and cite this one.
