A Python package to enable Unicode input and display when running Python from Windows console.
When running Python in the standard console on Windows, there are several problems when one tries to enter or display Unicode characters. The relevant issue is http://bugs.python.org/issue1602. This package solves some of them.
First, when you want to display Unicode characters in Windows console, you have to select a font able to display them. Similarly, if you want to enter Unicode characters, you have to have you keyboard properly configured. This has nothing to do with Python, but is included here for completeness.
The standard stream objects (
sys.stderr) are not capable of reading and displaying Unicode characters in Windows console. This has nothing to do with encoding, since even
αand there is no encoding under which
streamsmodule provides several alternative stream objects.
stderr_raware raw stream objects using WinAPI functions
WriteConsoleWto interact with Windows console through UTF-16-LE encoded bytes. The
stderr_textare standard text IO wrappers over standard buffered IO over our raw streams, and are intended to be primary replacements to
sys.std*streams. Unfortunately, other wrappers around
std*_textare needed (see below), so there are more stream objects in
streams.enableinstalls chosen stream objects instead of the original ones. By default, it chooses appropriate stream objects itself. The function
streams.disablerestores the original stream objects (these are stored in
sys.__std*__attributes by Python).
After replacing the stream objects, also using
input, see below.
Python interactive loop doesn't use
sys.stdinto read input so fixing it doesn't help. Also the
inputfunction may or may not use
sys.stdindepending on whether
sys.stdouthave the standard filenos and whether they are interactive. See http://bugs.python.org/issue17620 for more information.
To solve this, we install a custom readline hook. Readline hook is a function which is used to read a single line interactively by Python REPL. It may also be used by
inputfunction under certain conditions (see above). On Linux, this hook is usually set to GNU readline function, which provides features like autocompletion, history, …
readline_hookprovides our custom readline hook, which uses
sys.stdinto get the input and is (de)activated by functions
As we said, readline hook can be called from two places – from the REPL and from
inputfunction. In the first case the prompt is encoded using
sys.stdin.encoding, but in the second case
sys.stdout.encodingis used. So Python currently makes an assumption that these two encodings are equal.
Python tokenizer, which is used when parsing the input from REPL, cannot handle UTF-16 or generally any encoding containing null bytes. Because UTF-16-LE is the encoding of Unicode used by Windows, we have to additionally wrap our text stream objects (
streamsmodule contains also stream objects
stderr_text_transcoded. They basically just hide the underlying UTF-16-LE encoded buffered IO, and sets encoding to UTF-8. These transcoding wrappers are used by default by
There are additional issues on Python 2.
Since default Python 2 strings correspond to
unicode, people are usually calling
bytesargument. That is why we add
stderr_text_strstream objects to
streamsmodule. They are used by default on Python 2.
When we enter a Unicode literal into interactive interpreter, it gets processed by the Python tokenizer, which is bytes-based. When we enter
u"\u03b1"into the interactive interpreter, the tokenizer gets essentially
b'u"\xce\xb1"'plus the information that the encoding used is UTF-8. The problem is that the tokenizer uses the encoding only if
sys.stdinis a file object (see https://hg.python.org/cpython/file/d356e68de236/Parser/tokenizer.c#l797). Hence, we introduce another stream object
stdin_text_transcodedand also is structurally compatible with Python file object. This object is used by default on Python 2.
The check for interactive streams done by
raw_inputunfortunately requires that both
sys.stdoutare file objects. Besides
stdin_text_fileobjfor stdin we could use also
stdout_text_str_fileobjfor stdout. Unfortunately, that breaks
sys.stdoutas argument. Unfortunately, its generic
writemethod is used only if it is not a file object. Otherwise,
PyObject_Printis called, and this function is file-based, so it ends with a
fprintfcall, which is not something we want. In conclusion, we need stdout not to be a file object.
Given the situation described, the best solution seems to be reimplementing
inputbuiltin functions and monkeypatching
__builtins__. This is done by our
raw_inputmodule on Python 2.
Similarly to the input from from
sys.stdinthe arguments in
byteson Python 2 and the original ones may not be reconstructable. To overcome this we add
unicode_argvmodule. The function
unicode_argv.get_unicode_argvreturns Unicode version of
sys.argvobtained by WinAPI functions
CommandLineToArgvW. The function
sys.argvwith the Unicode arguments.
Install the package from PyPI via
pip install win-unicode-console (recommended), or download the archive and install it from the archive (e.g.
pip install win_unicode_console-0.x.zip), or install the package manually by placing directory
win_unicode_console and module
run.py from the archive to the
site-packages directory of your Python installation.
win_unicode_console module contains a function
enable, which install various fixes offered by
win_unicode_console modules, and a function
disable, which restores the original environment. By default, custom stream objects are installed as well as a custom readline hook. On Python 2,
input functions are monkeypatched.
sys.argv is not monkeypatched by default since unfortunately some Python 2 code strictly assumes
str instances in
sys.argv list. Use
enable(use_unicode_argv=True) if you want the monkeypathcing. For further customization, see the sources. The logic should be clear.
Generic usage of the package is just calling
win_unicode_console.enable() whenever the fixes should be applied and
win_unicode_console.disable() to revert all the changes. Note that it should be a responsibility of a Python user on Windows to install
win_unicode_console and fix his Python environment regarding Unicode interaction with console, rather than of a third-party developer enabling
win_unicode_console in his application, which adds a dependency. Our package should be seen as an external patch to Python on Windows rather than a feature package for other packages not directly related to fixing Unicode issues.
Different ways of how
win_unicode_console can be used to fix a Python environment on Windows follow.
- Python patch (recommended). Just call
usercustomizemodule (see https://docs.python.org/3/tutorial/appendix.html#the-customization-modules for more information). This will enable
win_unicode_consoleon every run of the Python interpreter (unless
siteis disabled). Doing so should not break executed scripts in any way. Otherwise, it is a bug of
win_unicode_consolethat should be fixed.
- Opt-in runner. You may easily run a script with
win_unicode_consoleenabled by using our
runnermodule and its helper
runscript. To do so, execute
py -i -m run script.pyinstead of
py -i script.pyfor interactive mode, and similarly
py -m run script.pyinstead of
py script.pyfor non-interactive mode. Of course you may provide arguments to your script:
py -i -m run script.py arg1 arg2. To run the bare interactive interpreter with
py -i -m run.
- Opt-out runner. In case you are using
win_unicode_consoleas Python patch, but you want to run a particular script with
win_unicode_consoledisabled, you can also use the runner. To do so, execute
py -i -m run --init-disable script.py.
- Customized runner. To move arbitrary initialization (e.g. enabling
win_unicode_consolewith non-default arguments) from
sitecustomizeto opt-in runner, move it to a separate module and use
py -i -m run --init-module module script.py. That will import a module
moduleon startup instead of enabling
win_unicode_consolewith default arguments.
win_unicode_console package was tested on Python 3.4, Python 3.5, and Python 2.7. 32-bit or 64-bit shouldn't matter. It also interacts well with the following packages:
coloramapackage (https://pypi.python.org/pypi/colorama) makes ANSI escape character sequences (for producing colored terminal text and cursor positioning) work under MS Windows. It does so by wrapping
win_unicode_consolereplaces the streams in order to support Unicode,
win_unicode_console.enablehas to be called before
colorama.initso everything works as expected.
coloramav0.3.3, there was an early binding issue (https://github.com/tartley/colorama/issues/32), so
win_unicode_console.enablehas to be called even before importing
colorama. Note that is already the case when
win_unicode_consoleis used as Python patch or as opt-in runner. The issue was already fixed.
pyreadlinepackage (https://pypi.python.org/pypi/pyreadline/2.0) implements GNU readline features on Windows. It provides its own readline hook, which actually supports Unicode input.
pyreadlineis active, and in that case, by default, reuses its readline hook rather than installing its own, so GNU readline features are preserved on top of our Unicode streams.
IPython(https://pypi.python.org/pypi/ipython) can be also used with
IPython3.2.1, there is an early binding issue (https://github.com/ipython/ipython/issues/8669), so
win_unicode_console.enablehas to be called even before importing
IPython. That is the case when
win_unicode_consoleis used as Python patch.
There was also an issue that IPython was not compatible with the builtin function
raw_inputreturning unicode on Python 2 (https://github.com/ipython/ipython/issues/8670). If you hit this issue, you can make
win_unicode_console.raw_input.raw_inputreturn bytes by enabling it as
win_unicode_console.enable(raw_input__return_unicode=False). This was fixed in IPython 4.
Since version 0.4, the signature of
streams.enablehas been changed because there are now more options for the stream objects to be used. It now accepts a keyword argument for each
stderr, setting the corresponding stream.
Nonemeans “do not set”,
Ellipsismeans “use the default value”.
streams.enable_onlywas added. It works the same way as
streams.enable, but the default value for each parameter is
streams.enable_error_writerhave been removed. Example: instead of
There are also corresponding changes in top-level
Since version 0.3, the custom stream objects have the standard filenos, so calling
inputdoesn't handle Unicode without custom readline hook.
- The code of
streamsmodule is based on the code submitted to http://bugs.python.org/issue1602.
- The idea of providing custom readline hook and the code of
readline_hookmodule is based on https://github.com/pyreadline/pyreadline.
- The code related to
unicode_argv.get_full_unicode_argvis based on http://code.activestate.com/recipes/572200/.
- The idea of using path hooks and the code related to
unicode_argv.argv_setter_hookis based on https://mail.python.org/pipermail/python-list/2016-June/710183.html.