Terminal: [console]::InputEncoding and [console]::OutputEncoding are set to the wrong UTF-8 encoding (*with* BOM) #7634

Closed
mklement0 opened this issue Aug 25, 2018 · 9 comments
Labels
Resolution-External The issue is caused by external component(s). WG-Interactive-Console the console experience

Comments

@mklement0
Contributor

Note: As of v6.1.0-rc.1, the console on Windows is fundamentally not configured to use UTF-8 yet - see #7233

While [console]::InputEncoding and [console]::OutputEncoding on macOS and Linux are set to UTF-8, the specific encoding variant used is the one with a BOM, which is the wrong one (though I'm unclear on what the practical implications are, given that streams, not files, are typically involved).

This contrasts with automatic variable $OutputEncoding which correctly uses the BOM-less UTF-8 encoding.

Steps to reproduce (macOS and Linux)

[console]::InputEncoding.GetPreamble().Count, 
[console]::OutputEncoding.GetPreamble().Count, 
$OutputEncoding.GetPreamble().Count

Expected behavior

0
0
0

Actual behavior

3
3
0

That is, the 3-byte BOM is unexpectedly present in [console]::InputEncoding and [console]::OutputEncoding.
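
For readers unfamiliar with the two UTF-8 variants, here is a minimal sketch of how they differ at the .NET level. It only uses System.Text.UTF8Encoding, System.IO.MemoryStream and System.IO.StreamWriter; the variable names are illustrative, and the sketch says nothing about how the console itself consumes the encoding.

```powershell
# Two UTF-8 encoding objects that differ only in whether they advertise a BOM (preamble).
$withBom    = [System.Text.UTF8Encoding]::new($true)    # encoderShouldEmitUTF8Identifier: true
$withoutBom = [System.Text.UTF8Encoding]::new($false)   # encoderShouldEmitUTF8Identifier: false

$withBom.GetPreamble().Count                      # 3
[BitConverter]::ToString($withBom.GetPreamble())  # EF-BB-BF
$withoutBom.GetPreamble().Count                   # 0

# GetBytes() itself never prepends the preamble, whichever variant is used.
$withBom.GetBytes('a').Count                      # 1

# A writer that consults the preamble emits it at the start of a fresh stream.
$stream = [System.IO.MemoryStream]::new()
$writer = [System.IO.StreamWriter]::new($stream, $withBom)
$writer.Write('a')
$writer.Flush()
$bytesWritten = $stream.ToArray()
$bytesWritten.Count                               # 4 (EF BB BF 61)
```

So the BOM only reaches the output when the code writing the stream asks the encoding for its preamble; plain string-to-byte conversion is identical for both variants.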

Environment data

PowerShell Core v6.1.0-rc.1 on macOS 10.13.6
PowerShell Core v6.1.0-rc.1 on Ubuntu 16.04.4 LTS
@PetSerAl
Contributor

Well, it is not a PowerShell issue, but a .NET one.

@mklement0
Contributor Author

I haven't looked into it, @PetSerAl - do you know where these encodings are assigned?

@PetSerAl
Contributor

@mklement0 I will refer to the Windows implementation. I am not sure whether it behaves the same on Linux or macOS, since it uses a platform-dependent function on Windows.
Here it calls the native GetConsoleOutputCP function. For UTF-8 it returns 65001. There is no way to distinguish the BOM and BOM-less flavors of UTF-8; the underlying console simply does not support that.
On the next line it just calls Encoding.GetEncoding, which returns UTF-8 with a BOM if you pass 65001 as the argument:

[Text.Encoding]::GetEncoding(65001).GetPreamble().Length # 3

@PetSerAl
Contributor

Though, at a quick glance at the .NET Core implementation, it looks like it is supposed to return BOM-less UTF-8 in this case, so maybe I was wrong and there is some PowerShell-related issue.

@mklement0
Contributor Author

Thanks, @PetSerAl. Indeed, given that [Text.Encoding]::Default.GetPreamble().Count is 0 (i.e., the BOM-less UTF-8 variant), it would stand to reason that the console uses the same encoding.

@iSazonov iSazonov added the WG-Interactive-Console the console experience label Aug 27, 2018
@iSazonov
Collaborator

It is here
https://github.com/dotnet/corefx/blob/e1a33f4a1cc582f42bd5ae0f887bf1aa9878d493/src/System.Console/src/System/ConsolePal.Unix.cs#L612
https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/Common/src/System/Text/EncodingHelper.Unix.cs#L34

So on Unix platforms, CoreFX looks at the environment variables "LC_ALL", "LC_MESSAGES", and "LANG".
If the variables are absent, the default is UTF-8 without a BOM ( return enc ?? new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);).
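
As a rough illustration of that lookup, here is a sketch in PowerShell. Get-LocaleCharset is a hypothetical helper written for this comment, not a CoreFX or PowerShell function, and the parsing of the locale string (language[_territory][.charset][@modifier]) is my approximation of what the linked code does, not a copy of it.

```powershell
# Hypothetical helper: returns the charset part of the first locale variable that is set.
function Get-LocaleCharset {
    param(
        # Injectable for experimentation; defaults to the real environment variables.
        [hashtable] $Environment = @{
            LC_ALL      = $env:LC_ALL
            LC_MESSAGES = $env:LC_MESSAGES
            LANG        = $env:LANG
        }
    )
    foreach ($name in 'LC_ALL', 'LC_MESSAGES', 'LANG') {
        $value = $Environment[$name]
        if ([string]::IsNullOrEmpty($value)) { continue }
        # Take the text after '.' and before an optional '@modifier'.
        if ($value -match '\.([^@]+)') { return $Matches[1] }
        return $null   # the first variable that is set carries no charset
    }
    return $null
}

# Mirrors the fallback quoted above: BOM-less UTF-8 only when no charset is found.
$charset  = Get-LocaleCharset
$encoding = if ($charset) { [System.Text.Encoding]::GetEncoding($charset) }
            else { [System.Text.UTF8Encoding]::new($false) }
$encoding.GetPreamble().Count
```

With a typical setting such as LANG=en_US.UTF-8, the helper returns UTF-8, and looking that name up via Encoding.GetEncoding yields the variant with a 3-byte preamble.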

@mklement0
Contributor Author

mklement0 commented Aug 28, 2018

Thanks for digging deeper, @iSazonov.

In the absence of said environment variables the behavior is correct, but it is broken if these environment variables are present and specify UTF-8 - and these environment variables are virtually never absent, because they are part of the current locale (culture) setting, and these days they indeed typically specify UTF-8.

I've filed a CoreFx bug - see https://github.com/dotnet/corefx/issues/32004

@BrucePay
Collaborator

@mklement0 So this should be resolved as external, correct?

@mklement0
Contributor Author

mklement0 commented Aug 28, 2018

@BrucePay: Indeed, thanks - should've made that clearer.
