Make console windows fully UTF-8 by default on Windows, in line with the behavior on Unix-like platforms - character encoding, code page #7233
It is a platform default, so we need to do …
Thanks for the sleuthing, @iSazonov. Yes, I think the fix is also appropriate for Windows 7: while you're more likely to run into problems there with standard console programs that can even break with UTF-8 input, I think it's more important for PowerShell Core to exhibit consistent encoding behavior and to support modern, cross-platform utilities that natively speak UTF-8 by default.
I hope @JamesWTruher can comment. I think he considered this when writing and implementing the Encoding RFC.
Since Windows 7 has reached end of life and the community is migrating to Windows 10, it seems time to switch the console default to UTF-8 on Windows. /cc @SteveL-MSFT
Let me try to summarize, now that we (hopefully) have the full picture. I've hidden my previous comments in favor of this one, @nu8; I encourage you to do the same, as appropriate. This comment also corrects my incorrect earlier claim that you cannot set the ANSI code page to …

This issue is about making UTF-8 support in PowerShell on Windows complete by ensuring that PowerShell also uses UTF-8 when communicating with external programs (the built-in cmdlets already invariably default to UTF-8), which requires setting …

Currently, in the absence of PowerShell doing that itself, there are two workarounds:

Option 1: Put the following statement in your …
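To make the receiving-side problem concrete: the bytes an external program writes are only as good as the encoding the receiver assumes when decoding them. The Java sketch below (class and method names are illustrative, not taken from PowerShell) encodes a string as UTF-8, as a modern CLI would, and decodes it once with a legacy OEM code page and once with UTF-8. It assumes the JDK ships the IBM437 charset and otherwise falls back to ISO-8859-1, which also misreads multi-byte UTF-8 sequences.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatch {
    // Decodes bytes written by an external program using the encoding the
    // receiving side assumes (per this discussion, the console's code page).
    static String receive(byte[] externalOutput, Charset assumedEncoding) {
        return new String(externalOutput, assumedEncoding);
    }

    // OEM code page 437 if this JDK includes it; otherwise ISO-8859-1, which is
    // always available and likewise misreads multi-byte UTF-8 sequences.
    static Charset legacyCodePage() {
        return Charset.isSupported("IBM437")
                ? Charset.forName("IBM437")
                : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // Non-ASCII Latin and CJK characters, written as escapes.
        String text = "caf\u00e9 \u4f60\u597d";
        // The external program emits UTF-8, as modern cross-platform CLIs do.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        String asLegacy = receive(bytes, legacyCodePage());
        String asUtf8 = receive(bytes, StandardCharsets.UTF_8);

        System.out.println("legacy code page round-trips: " + asLegacy.equals(text)); // false
        System.out.println("UTF-8 round-trips:            " + asUtf8.equals(text));   // true
    }
}
```

Only when the sending and receiving sides agree on UTF-8 does the text survive intact, which is the symmetry this issue asks PowerShell to establish.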
Is it a good idea to implement this inside PowerShell? Changing the encoding inside … Those changes remain after …

So (unless there is a way to decouple the encoding and the code page... why are they coupled in the first place?), should the current code page be changed by a console app? I don't think so. IMO, it should be either a system-wide setting or a setting in Windows Terminal / ConHost, not the responsibility of a console app or a shell...
Good point, @gerardog: If PowerShell is called from another shell, or more generally, from an existing console window, it wouldn't be appropriate to change the console window's code page without also restoring it on exit.
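The "change it, but restore it on exit" requirement maps naturally onto a scoped resource. The Java sketch below only illustrates that pattern: ConsoleCodePage, CodePageScope and FakeConsole are hypothetical names, and the actual Windows console calls that would read and set the code page are deliberately not shown; an in-memory fake stands in for them.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePageRestore {
    // Hypothetical abstraction over the console code page. A real Windows
    // implementation would call the console APIs, which are not shown here.
    interface ConsoleCodePage {
        int get();
        void set(int codePage);
    }

    // Switches the code page for the lifetime of the scope and restores the
    // previous value on close, leaving the calling shell's window as it was.
    static final class CodePageScope implements AutoCloseable {
        private final ConsoleCodePage console;
        private final int previous;

        CodePageScope(ConsoleCodePage console, int codePage) {
            this.console = console;
            this.previous = console.get();
            console.set(codePage);
        }

        @Override
        public void close() {
            console.set(previous);
        }
    }

    // In-memory stand-in used only to exercise the scope in this sketch.
    static final class FakeConsole implements ConsoleCodePage {
        int current;
        final List<Integer> history = new ArrayList<>();

        FakeConsole(int initial) {
            current = initial;
        }

        public int get() {
            return current;
        }

        public void set(int codePage) {
            current = codePage;
            history.add(codePage);
        }
    }

    public static void main(String[] args) {
        FakeConsole console = new FakeConsole(437);
        try (CodePageScope scope = new CodePageScope(console, 65001)) {
            System.out.println("inside session: " + console.get()); // 65001
        }
        System.out.println("after exit: " + console.get());         // 437
    }
}
```

try-with-resources guarantees the restore runs even if the session ends with an exception; a process that is killed outright would, of course, skip it.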
The system-wide change to UTF-8, as discussed in detail above, doesn't require any changes to PowerShell and is already an option, but it has far-reaching consequences that may not work for everyone. Notably, both the OEM and the ANSI code page are then set to …
That is an option, but a very cumbersome one: for ConHost you'd have to do it on a per-window-title basis via the registry, and individual shortcut files and Windows Terminal profiles would have to be modified with startup commands.

The point is that PowerShell internally defaults to UTF-8, and externally it already defaults to UTF-8 when sending (piping) data, but not when receiving it, which makes for an awkward asymmetry. In order to make external programs use UTF-8 too, it must set the console code page(s), because those are what well-behaved CLIs consult to decide which character encoding to use.

A simple solution, both conceptually simple and easy to document, would be to make PowerShell switch to UTF-8 (including changing the console code page) if and only if:
Conversely, that means that non-interactive CLI calls (via …
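As a rough illustration of what "consulting the console code page" means for a well-behaved CLI, here is a Java sketch that maps a code page number, such as the one reported by chcp, to a character set. It assumes the number is already known (querying it from Windows is out of scope) and relies on the JDK's cpNNN charset aliases; forCodePage is a hypothetical helper, not part of any Windows or PowerShell API.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class CodePageCharset {
    // Maps a Windows code page number to a Java Charset: 65001 is UTF-8;
    // other numbers are looked up via the JDK's "cpNNN" aliases
    // (e.g. "cp1252" -> windows-1252), falling back to the supplied default.
    static Charset forCodePage(int codePage, Charset fallback) {
        if (codePage == 65001) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName("cp" + codePage);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        for (int cp : new int[] {65001, 1252, 437}) {
            System.out.println(cp + " -> " + forCodePage(cp, StandardCharsets.US_ASCII).name());
        }
    }
}
```

This is why changing the console code page is the lever that matters: a program written this way switches to UTF-8 as soon as the code page reads 65001, without any program-specific configuration.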
Oh MY GOD, thanks for your solution! I had been troubled by this issue for a long time, even though I had switched the code page to UTF-8 (65001). I also tried this Java program:

```java
public class EncodingCheck {
    public static void main(String[] args) {
        System.out.println("""
                Hello world!
                你好,世界!
                こんにちは世界!
                안녕 세상!
                """.stripTrailing());
        // PrintStream.charset() requires Java 18 or later.
        System.out.println("System.out.charset(): " + System.out.charset());
        System.out.println("properties:");
        // Print every system property whose name mentions "encod", e.g. file.encoding.
        System.getProperties().forEach((k, v) -> {
            if (k instanceof String ks && ks.contains("encod")) {
                System.out.printf(" %s = %s%n", k, v);
            }
        });
    }
}
```
I strongly recommend providing an official UTF-8 configuration guide for the Windows platform. Otherwise, many developers will not be able to easily find the correct answer through Google/Bing/ChatGPT...
PowerShell Core now commendably defaults to UTF-8 encoding, including when sending strings to external programs, as reflected in `$OutputEncoding`'s default value.

However, because the console-window shortcut file / taskbar entry still defaults to the OEM code page implied by the legacy system locale (e.g., `437` on US-English systems), the console misinterprets strings from external programs; e.g., with Node.js installed:

This currently requires the following workaround (in addition to requiring the console window to use a TrueType font, which is the default on Windows 10):
Prepend `$OutputEncoding =` to make a Windows PowerShell console fully UTF-8-aware.

The above implicitly switches to the UTF-8 code page (`65001`), as then reflected in `chcp`.

This obscure workaround shouldn't be necessary, and I think it would make sense for PowerShell to automatically set `[console]::InputEncoding` and `[console]::OutputEncoding`
to (BOM-less) UTF-8 on startup.

Update: When this issue was originally created, there was no mechanism for presetting code page `65001` (UTF-8) system-wide, which necessitated the awkward workaround. In recent versions of Windows 10, it is now possible to switch to code page `65001` as the system locale and therefore system-wide, although as of Windows 10 version 1909 that feature is still in beta; see this SO answer.

… `65001` in all console windows (including `cmd.exe` windows), this invariably also makes Windows PowerShell's ANSI-encoding-default cmdlets default to UTF-8, notably `Get-Content` and `Set-Content`, which can be problematic from a backward-compatibility perspective. Additionally, there is a bug; see below.

The change, which can also be made programmatically (see below), requires administrative privileges and a reboot.
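The backward-compatibility concern can be reproduced in a few lines: a file written with the ANSI code page stores é as the single byte 0xE9, which is not valid UTF-8 on its own. This Java sketch assumes windows-1252 (the ANSI code page on US-English systems) as the legacy encoding; roundTrip is a hypothetical helper, not a PowerShell API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnsiAsUtf8 {
    // Writes text with one encoding and reads the bytes back with another,
    // mimicking a file created under the old ANSI default but read as UTF-8.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252"); // ANSI code page on US-English systems
        String original = "caf\u00e9";                 // the accented e is byte 0xE9 in windows-1252

        String sameEncoding = roundTrip(original, ansi, ansi);
        String readAsUtf8 = roundTrip(original, ansi, StandardCharsets.UTF_8);

        System.out.println("ANSI -> ANSI intact:  " + sameEncoding.equals(original)); // true
        // 0xE9 on its own is malformed UTF-8, so the decoder substitutes U+FFFD.
        System.out.println("ANSI -> UTF-8 intact: " + readAsUtf8.equals(original));   // false
        System.out.println("contains U+FFFD:      " + (readAsUtf8.indexOf('\uFFFD') >= 0)); // true
    }
}
```

ASCII-only files are unaffected, since ASCII bytes decode identically under both encodings; it is existing files containing accented or other non-ASCII characters that break.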
Environment data