Proposal: Make Console.Input/OutputEncoding default to UTF-16 on Windows #70168
Comments
Tagging subscribers to this area: @dotnet/area-system-text-encoding |
On Windows you can set the default code page to UTF-8, and this will be reflected in all .NET applications. You can do that by running … On non-Windows platforms, most terminals already use UTF-8 encoding. |
I know this option. Unfortunately, there are still tons of encoding issues with it, both existing and newly introduced. This option doesn't solve any issue at all. Using UTF-16 has the further benefit that the console is driven through the W variants of the console API instead of the file API.
runtime/src/libraries/System.Console/src/System/ConsolePal.Windows.cs Lines 1193 to 1204 in 45589f2
|
Is this a mega breaking change? |
I have the same problem with CP932. However, I want the default to be UTF-8 instead of UTF-16. |
GetConsoleCP is OK. What I want is just for the F5 debug console to run with CP 65001.
In fact I don't know. It also depends on how Windows handles the relationship between console file and the console APIs. In other words, I want to switch to |
I did some tests with redirection: The … There is no magic involved. Both sides of the pipe need to agree on the encoding. Changing the default to UTF-16 would break a lot, since UTF-16 isn't widely used as a file or communication encoding. The current behavior is far from ideal. Having watched PowerShell garble things, I understand how encoding issues happen. |
Today I read on The Old New Thing that the default encoding can be set to UTF-8 through the manifest. Although we don't own the manifest for any binaries, we could consider setting this property in the default template. Setting this on the default |
Yes, it is a big breaking change. Windows didn't make this option the default, and marking it as |
Tagging subscribers to this area: @dotnet/area-system-console |
Closing as a duplicate of #31466. |
Background
Currently, System.Console calls GetConsoleCP on Windows to get the console encoding, which has caused enormous problems:
Characters outside the current code page cannot be displayed or entered in the console under the default settings. Without explicitly specifying Encoding.Unicode, the console can't display emoji (via Windows Terminal) or any other script the code page does not represent. (A Windows-1252 system, for example, should not be able to display Chinese.)
Characters are frequently transcoded the wrong way and get garbled. See "C# Interactive is broken in VS16.8 preview5" (roslyn#48874). Like the person in that thread, I'm pretty annoyed.
Garbling also happens with the latest dotnet SDK. This issue is new and appeared with an SDK update this month (May).
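The first point above, that text outside the active code page cannot be represented, can be sketched with a round-trip check. This is only an illustration under assumptions: CodePageRoundTrip, Survives, and Report are hypothetical names, and Encoding.Latin1 (available in .NET 5 and later) stands in for a single-byte ANSI code page such as Windows-1252, which it resembles but does not equal.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of System.Console.
public static class CodePageRoundTrip
{
    // True when text is unchanged after string -> bytes -> string in the given encoding.
    public static bool Survives(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes) == text;
    }

    // Prints, for a few samples, whether each one survives the encoding.
    // Samples are printed as code points so the report itself cannot be garbled.
    public static void Report(Encoding encoding)
    {
        // ASCII, Latin letter with accent, Chinese, emoji (written as escapes).
        string[] samples = { "hello", "caf\u00E9", "\u4F60\u597D", "\U0001F600" };
        foreach (string sample in samples)
        {
            var codePoints = new StringBuilder();
            foreach (Rune rune in sample.EnumerateRunes())
            {
                codePoints.Append($"U+{rune.Value:X4} ");
            }
            Console.WriteLine(
                $"{encoding.WebName}: {codePoints}survives = {Survives(sample, encoding)}");
        }
    }
}
```

Calling CodePageRoundTrip.Report(Console.OutputEncoding) from a program shows what the current console encoding can carry. With the Latin1 stand-in, the Chinese and emoji samples should not survive, while Encoding.Unicode should carry all four.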
Proposal
ANSI code pages are entirely legacy. We should get rid of them and use some variant of Unicode everywhere.
The internal encoding of Windows NT is UTF-16, the same as .NET. We can also save the time spent transcoding from UTF-16 to a code page and then back to UTF-16.
This would be a breaking change for those who work with Console.OpenStandardXXX and redirected IO, which can be addressed by setting the console encoding at the program entry point. We may also add a compatibility switch for this. For ASCII interoperability, we should suggest setting the encoding to UTF-8.
Additionally, setting the default encoding to UTF-16 would expose encoding problems even when only English is used. Since most code pages, as well as UTF-8, share the ASCII range, English text is always output correctly even under a misconfigured encoding. Since most development is done in English, encoding problems stay silent.
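The mitigation mentioned above, setting the console encoding at the program entry point, might look roughly like the following. ConsoleEncodingSetup and TrySet are hypothetical names; the only real APIs used are the Console.OutputEncoding and Console.InputEncoding setters. The sketch assumes that a failure to switch (for example, when no console is attached) surfaces as an IOException, and it does not handle other failure modes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not an existing .NET API.
public static class ConsoleEncodingSetup
{
    // Tries to switch both console directions to the given encoding.
    // Returns false instead of throwing when the console rejects the change.
    public static bool TrySet(Encoding encoding)
    {
        try
        {
            Console.OutputEncoding = encoding;
            Console.InputEncoding = encoding;
            return true;
        }
        catch (IOException)
        {
            // Assumption: this is how a missing or unusable console is reported.
            return false;
        }
    }
}
```

An application that needs ASCII-compatible redirected IO could call ConsoleEncodingSetup.TrySet(new UTF8Encoding(false)) as the first statement of Main, while one that only talks to an interactive console could pass Encoding.Unicode instead.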
Additional words
I'd really like you to treat encoding problems as severe bugs. They are never a problem for English users, but they have frustrated users of other languages for decades, since the start of multi-language Windows. Fixing such a problem in a minor release of VS instead of a patch release is unacceptable to me, as well as to other Chinese users.
Multi-byte encoding systems suffer more from non-coding elements. Characters in the wrong encoding appear as broken multi-byte sequences (#69781).
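Why such problems stay invisible to English users, and why they show up as broken multi-byte sequences elsewhere, can be sketched by encoding text with one encoding and decoding it with another. EncodingMismatch, Reinterpret, and Demo are hypothetical names, and Encoding.Latin1 (.NET 5 and later) is again only a stand-in for a legacy single-byte code page.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class EncodingMismatch
{
    // Encodes text as the writer would, then decodes it as a misconfigured reader would.
    public static string Reinterpret(string text, Encoding writer, Encoding reader)
    {
        return reader.GetString(writer.GetBytes(text));
    }

    // Prints whether each sample is intact after a mismatched round trip.
    public static void Demo()
    {
        // ASCII survives a Latin1/UTF-8 mix-up because both share the ASCII range.
        Console.WriteLine(Reinterpret("hello", Encoding.Latin1, Encoding.UTF8) == "hello");

        // A lone Latin1 byte for e-acute is not valid UTF-8 and decodes to U+FFFD.
        Console.WriteLine(Reinterpret("caf\u00E9", Encoding.Latin1, Encoding.UTF8).Contains('\uFFFD'));

        // UTF-16 bytes read as UTF-8 do not reproduce even plain English text,
        // so the misconfiguration becomes visible immediately.
        Console.WriteLine(Reinterpret("hello", Encoding.Unicode, Encoding.UTF8) == "hello");
    }
}
```

The last case is the point made in the proposal: with a UTF-16 default, a mismatch would no longer be masked by the shared ASCII range.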
There is a Spanish build in roslyn CI. Can we add a CI leg to verify that the runtime builds (and test runs?) work correctly on a non-English system?