Work around Unicode encoding error on Windows #703
For Python <3.6 on Windows, encoding strings for output to the Windows command prompt may result in a UnicodeEncodeError. VOC does not take encoding into consideration, and so the output differs. Until proper encoding support is implemented in VOC, use the environment variable that Python provides for overriding the IO encoding (PYTHONIOENCODING). With this set to UTF-8, the output may appear garbled, but the error is avoided and the output matches the run-as-Java output. For consistency, pass the environment variable to Java as well. Addresses beeware#610 and beeware#237.
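As a rough illustration of the workaround described above, the sketch below runs a child Python process with PYTHONIOENCODING forced to UTF-8 and captures its raw output bytes. The helper name `run_with_utf8_io` and its parameters are hypothetical and are not taken from the VOC test harness; only `PYTHONIOENCODING`, `os.environ`, and `subprocess.run` are real.

```python
import os
import subprocess
import sys


def run_with_utf8_io(args, extra_env=None):
    """Run a command with PYTHONIOENCODING=utf-8 and return its raw stdout bytes.

    Hypothetical helper for illustration; not part of VOC.
    """
    # Copy the parent environment so PATH and similar settings survive.
    env = dict(os.environ)
    # Override the encoding Python uses for stdin/stdout/stderr in the child.
    env["PYTHONIOENCODING"] = "utf-8"
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        args, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    return result.stdout


# The child prints a character that a legacy console code page cannot encode.
out = run_with_utf8_io([sys.executable, "-c", "print('\\u4e2d')"])
```

Because the bytes are captured from a pipe rather than rendered by a console, they can be compared directly against the run-as-Java output even if the same bytes would look garbled in the Windows command prompt.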
Thanks for looking into this. I'm intrigued what you'd consider "proper" encoding support in this context. I can't deny that Windows is definitely having difficulties here; I imagine there would be many Linux configurations that have similar problems due to odd codepage configurations. However, in the Linux space at least, my understanding is that this is something that is considered an error of usage - somewhere between "you're doing it wrong" and "you're doing it in a way that makes it impossible to know what is right". What is the right approach here?
For one thing, to really match what CPython does, sys.stdout.write() should encode from string to stream using sys.stdout.encoding. Typically, for Linux and for Windows with Python >=3.6, that will be UTF-8, but it still should not be assumed. With VOC, sys.stdout.encoding does not even exist. I believe this is at least part of the cause of #395, and if that's the case, that one can only be solved with some semblance of encoding support.
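To make the CPython behaviour concrete, here is a minimal sketch that stands in for sys.stdout with an `io.TextIOWrapper` over an in-memory buffer, so the encoding can be chosen explicitly. The function name `write_like_cpython` is made up for illustration, and cp437 is used only as an example of a legacy Windows console code page.

```python
import io


def write_like_cpython(text, encoding, errors="strict"):
    """Encode text the way a text stream does and return the resulting bytes.

    Hypothetical helper for illustration; it mimics writing to a stream
    whose .encoding is the given value.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()  # push the encoded bytes into the underlying buffer
    return buffer.getvalue()


utf8_bytes = write_like_cpython("\u4e2d", "utf-8")

try:
    write_like_cpython("\u4e2d", "cp437")
    strict_failed = False
except UnicodeEncodeError:
    # The character has no mapping in cp437, so a strict stream raises.
    strict_failed = True

# With errors="replace", unencodable characters are written as "?" instead.
replaced_bytes = write_like_cpython("\u4e2d", "cp437", errors="replace")
```

The same string therefore produces different bytes, or an exception, depending on the stream's encoding, which is why an implementation that always emits UTF-8 can differ from CPython's output.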
If I understood correctly, the issue is that VOC output is always UTF-8, and the CPython output depends on the environment. @freakboy3742 any objections about merging this?
I don't have any particular objection to merging, other than wanting to have a slightly better understanding of what the "real" fix is. If it's as simple as defining
There is more to it than the portion I mentioned in my previous comment. I had thought it would require creating translation tables for each encoding we wish to support. However, I may take a shot at using Java's encoding support to make it work.
I am on a Windows machine, and test_title and test_case_changes in test_str.py were failing earlier. After making these changes, they pass.
Hi there! It looks like this PR might be dead, so we're closing it for now. Feel free to re-open it if you'd like to continue, or think about directing your efforts to https://github.com/beeware/briefcase or https://github.com/beeware/toga. Both of these have more active development right now. 😄