UnicodeDecodeError in Python version when using Cyrillic characters. #9

Closed
GoogleCodeExporter opened this issue Nov 25, 2015 · 1 comment

Comments

@GoogleCodeExporter

I want to use Cyrillic characters with diff_match_patch (Python version,
release), but I get errors like:
"UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0:
unexpected end of data"

Appending ".decode("utf-8").encode("utf-8")" to strings in some places
seems to solve the problems, but I guess not 100%.
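
The quoted error is what Python reports when a multi-byte UTF-8 sequence is cut off before its last byte. Below is a minimal sketch in Python 3 (the issue itself concerns Python 2). The sample string and the `try_decode` helper are assumptions for illustration and are not taken from the attached patch.

```python
# Cyrillic letters take two bytes each in UTF-8; for many of them the first byte is 0xd0.
encoded = "привет".encode("utf-8")
assert encoded[0] == 0xD0


def try_decode(data):
    """Return the decoded text, or the decoder's error reason if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.reason


# Decoding the complete byte string works.
whole = try_decode(encoded)

# Decoding only the first byte fails because the sequence is incomplete.
truncated = try_decode(encoded[:1])
```

Slicing an encoded byte string at an arbitrary offset is one plausible way to end up with such a truncated sequence, which may be why re-decoding in some places appeared to help.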

See the attached patch (and, just in case, the new file).

Alexandr.

Original issue reported on code.google.com by sashul...@gmail.com on 11 May 2008 at 1:12

Attachments:

@GoogleCodeExporter

Thank you Alexandr for the bug report and the patches. Sorry for the delay. I
have fixed the Unicode issues in diff_fromDelta and patch_fromText. In both
cases I added:
    if type(text) == unicode:
      text = text.encode("ascii")
These are two functions which are expecting a subset of ASCII characters.
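
A rough Python 3 analogue of that coercion is sketched below. The `to_ascii` helper and the sample delta string are hypothetical; in Python 3 `str` plays the role of Python 2's `unicode`, and the real functions may handle input differently.

```python
def to_ascii(text):
    """Hypothetical helper: coerce text to ASCII bytes, mirroring the two-line fix."""
    if isinstance(text, str):
        text = text.encode("ascii")
    return text


# A delta whose Cyrillic content is percent-encoded ("%D0%B6" is "ж") is pure ASCII.
delta = to_ascii("=3\t-2\t+%D0%B6")

# Raw Cyrillic is outside ASCII, so the coercion rejects it.
try:
    to_ascii("+ж")
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

Under these assumptions, only input that is already restricted to ASCII passes through, which matches the expectation stated for the two functions.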

However, your patch also made changes to diff_text1, diff_text2, patch_apply
and patch_obj.__str__. Despite many tests, I am unable to find scenarios where
the existing code fails when passed Unicode. An example test case would be most
appreciated.

In the meantime, I've pushed out a new version which includes the Unicode
fixes for diff_fromDelta and patch_fromText in the Python version, as well as
a new unit test in all three versions which verifies the behaviour of invalid
Unicode sequences (e.g. %c3%xy).
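
To show why a sequence such as %c3%xy is invalid, here is a small sketch using Python 3's standard `urllib.parse` module. It is not the library's unit test, and it makes no claim about how diff_match_patch itself reports the error.

```python
from urllib.parse import unquote, unquote_to_bytes

# "%c3" is a valid escape for byte 0xC3; "%xy" is not valid hex and stays literal.
raw = unquote_to_bytes("%c3%xy")

# 0xC3 starts a two-byte UTF-8 sequence, but "%" is not a continuation byte,
# so strict decoding fails.
try:
    unquote("%c3%xy", errors="strict")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```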

Original comment by neil.fra...@gmail.com on 14 May 2008 at 7:47

  • Changed state: Fixed
