
'ascii' codec can't encode character u'\u0645' in position 0: ordinal not in range(128) #63

Closed
mshakirDr opened this issue Mar 2, 2019 · 1 comment


mshakirDr commented Mar 2, 2019

I am trying to train on Urdu data (rendered with the Noori Nastaleeq font), but running make training for urd fails with the following:

python generate_line_box.py -i "data/ground-truth/longJameel_Noori_NastaleeqRegular1610.tif" -t "data/ground-truth/longJameel_Noori_NastaleeqRegular1610.gt.txt" > "data/ground-truth/longJameel_Noori_NastaleeqRegular1610.box"
Traceback (most recent call last):
  File "generate_line_box.py", line 41, in <module>
    print(u"%s %d %d %d %d 0" % (prev_char, 0, 0, width, height))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0645' in position 0: ordinal not in range(128)
Makefile:111: recipe for target 'data/ground-truth/longJameel_Noori_NastaleeqRegular1610.box' failed

I have attached the problematic line image along with its .gt.txt file. The files were generated on Windows using GDI and .NET and then imported to Linux. Placing urd.traineddata in advance doesn't help either.
output.zip
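The traceback can be reproduced in isolation. The sketch below is written for Python 3 and assumes the failure comes from the box line being encoded with the ASCII codec, which is what Python 2 does when print() writes unicode text to a redirected stdout (here, the .box file). The width and height values are hypothetical.

```python
# Minimal reproduction of the UnicodeEncodeError from the traceback (Python 3).
# Assumption: the box line is encoded with the ASCII codec, as Python 2 does
# when stdout is redirected to a file.
prev_char = "\u0645"        # ARABIC LETTER MEEM, the character in the error
width, height = 1000, 80    # hypothetical image size, for illustration only

# Same format string as line 41 of generate_line_box.py.
line = "%s %d %d %d %d 0" % (prev_char, 0, 0, width, height)

message = None
try:
    line.encode("ascii")    # ASCII covers only code points 0-127
except UnicodeEncodeError as err:
    message = str(err)
print(message)

# UTF-8 can encode every Unicode code point, so the same line succeeds.
encoded = line.encode("utf-8")
print(encoded)
```

Any Urdu character fails the same way, since the whole Arabic script block lies outside the 0-127 range.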

mshakirDr (author) commented:

Workaround: paste the following at the start of generate_line_box.py. Editing the file from the Windows Subsystem for Linux failed for me with a Permission denied error; use a virtual machine or a native Linux machine instead.

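# encoding=utf8
# Python 2 only: reload(sys) restores sys.setdefaultencoding, which site.py
# removes at startup. Setting the default encoding to UTF-8 means unicode text
# printed to a redirected stdout is encoded as UTF-8 instead of ASCII.
import sys
reload(sys)
sys.setdefaultencoding('utf8')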
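sys.setdefaultencoding only exists in Python 2, and the reload(sys) trick is usually described as a hack because it changes the default encoding for the whole process. As an alternative sketch, assuming Python 3.7 or later, the output stream itself can be switched to UTF-8 with io.TextIOWrapper.reconfigure(). Setting the environment variable PYTHONIOENCODING=utf-8 before running make has a similar effect without editing the script, in both Python 2 and 3. The helper emit_box_line is hypothetical, and the ASCII-encoded stdout is simulated with io.BytesIO so the effect can be checked.

```python
import io
import sys


def emit_box_line(out, char, left, bottom, right, top, page=0):
    """Write one box-file line: <char> <left> <bottom> <right> <top> <page>.

    Hypothetical helper for illustration; it mirrors the print() call in
    generate_line_box.py but takes an explicit output stream.
    """
    out.write("%s %d %d %d %d %d\n" % (char, left, bottom, right, top, page))


# Simulate a redirected stdout whose encoding defaulted to ASCII.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

# Python 3.7+: switch the stream to UTF-8 before anything is written.
fake_stdout.reconfigure(encoding="utf-8")
emit_box_line(fake_stdout, "\u0645", 0, 0, 1000, 80)
fake_stdout.flush()
written = raw.getvalue()
print(written)

# The same switch applied to the real stdout, guarded because some
# environments replace sys.stdout with an object lacking reconfigure().
if isinstance(sys.stdout, io.TextIOWrapper):
    sys.stdout.reconfigure(encoding="utf-8")
```

Fixing the stream rather than the process-wide default keeps the change local to the output that actually needs UTF-8.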
