Consider changing the default console encoding on Windows to UTF-16 #31466

Open
Serentty opened this issue Nov 9, 2019 · 5 comments

Comments

@Serentty
Contributor

Serentty commented Nov 9, 2019

In my tests on Windows, .NET Core's console I/O defaults to the user's local codepage. This adds quite a bit of overhead, as .NET's UTF-16 strings have to be converted to the codepage, just for the console to convert them back to UTF-16. On top of this, it means that a lot of characters can't be properly represented, which I discovered by accident when I was unable to enter a euro sign in my program because my codepage (932) doesn't have it. There's an easy fix for this, which is to change the console encoding manually, but unless the programmer is doing fuzz testing, this problem could be hard to discover.

With that in mind, I propose that the default encoding for console I/O on Windows be changed to UTF-16, allowing low-overhead and lossless passing around of strings between the console and .NET programs.

On Mac and Linux, the default already seems to be UTF-8, which is a sensible choice on those platforms, and I think Windows should be brought in line with those other two by using a Unicode encoding (UTF-16) for its I/O as well.
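
The lossy round trip described in this comment can be sketched outside .NET. The snippet below is a minimal illustration in Python, chosen only because it runs anywhere; it is not the .NET console code path. The helper `round_trip` and the sample string are hypothetical names introduced for this example, and Python's `cp932` codec is assumed to be a reasonable stand-in for Windows codepage 932.

```python
def round_trip(text: str, encoding: str) -> str:
    """Encode text to bytes and decode it back.

    Characters the encoding cannot represent are replaced with '?',
    which is roughly what a codepage-bound console does to them.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# "\u20ac" is the euro sign; it is written as an escape so the
# source file itself stays ASCII-only.
sample = "Price: 5 \u20ac"

via_cp932 = round_trip(sample, "cp932")      # codepage path: lossy
via_utf16 = round_trip(sample, "utf-16-le")  # UTF-16 path: lossless

print("cp932 preserved the text: ", via_cp932 == sample)
print("UTF-16 preserved the text:", via_utf16 == sample)
```

In .NET itself, the manual fix mentioned above presumably corresponds to assigning `Console.InputEncoding` and `Console.OutputEncoding` (for example to `Encoding.Unicode`) at program start; whether that is suitable as a default is the question this issue raises.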

@scalablecory
Contributor

This seems reasonable. We'd need to understand it from a compat perspective. Most console apps are expecting -- purely by default, so they may not even realize it -- the user's codepage when reading stdin. Changing the default here might create issues when piping between apps.
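
To make the piping concern concrete, here is a small sketch in Python (used only for illustration; it does not model the actual .NET or Windows console behaviour). It assumes a hypothetical producer that writes UTF-16 bytes to its standard output and a hypothetical consumer that still decodes its standard input with a single-byte codepage, here `cp1252`.

```python
# Bytes a producer would emit if its output encoding were UTF-16 (little endian).
message = "hi"
piped_bytes = message.encode("utf-16-le")

# A consumer that still assumes a single-byte codepage decodes every byte
# as its own character, so each UTF-16 code unit turns into two characters.
misread = piped_bytes.decode("cp1252")

# A consumer that agrees on the encoding recovers the original text.
correctly_read = piped_bytes.decode("utf-16-le")

print(repr(misread))         # each letter is followed by a NUL character
print(repr(correctly_read))
```

The mismatch only shows up when the two ends of the pipe disagree, which is why a changed default could affect existing pipelines even when each program works correctly on its own.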

@scalablecory scalablecory transferred this issue from dotnet/core Nov 11, 2019
@huoyaoyuan
Member

Shouldn't modern applications be totally agnostic to code pages? All strings should only be processed in some variation of Unicode.

@Serentty
Contributor Author

@huoyaoyuan I agree. The issue here is deciding whether it's okay to make this change, which is possibly breaking for a few very specific use cases. I think user I/O over the console should be mostly unaffected, but as was brought up, this could break applications which assume codepages when piping the standard input and output. However, such applications are already incredibly fragile since they rely on the user having the correct codepage set system-wide, so they're likely to break all the time even without this change. In my opinion, this change would fix more than it breaks for the vast majority of users.

@Serentty
Contributor Author

I've seen some other threads complaining that the console is defaulting to UTF-8 instead of codepages for them on Windows, but these are a few years old, so I'm a bit confused now. Was the default behaviour changed to use codepages instead?

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@maryamariyan maryamariyan added the untriaged label (New issue has not been triaged by the area owner) Feb 23, 2020
@Serentty
Contributor Author

@scalablecory Would a change like this be a good fit for a major release like .NET 5 or 6? After all, those usually have a list of migration notes, compared to more minor releases.


5 participants