System.Console unexpectedly uses a UTF-8 encoding *with BOM* on Windows #28929
Comments
@mklement0 would you like to work on this issue?
Thanks for asking, @carlossanlop, but I'm hoping someone else can take this on.
As far as
    using System;

    class Program
    {
        static void Main()
        {
            // Print the code page of the console's current output encoding.
            Console.Write(Console.OutputEncoding.CodePage);
        }
    }
Using the latest .NET 5.0 preview SDK:
Produces no BOM artifacts:
It should be noted however that
Perhaps I'm misunderstanding what is broken here?
My bad, @eiriktsarpalis - I missed that the referenced PR did fix the problem on Windows too, as you demonstrate. Sorry for the noise - I'm closing this.
Note: #27258 (since fixed) addressed the same issue, but for Unix-like platforms only, because at the time I didn't realize that it should be fixed on Windows also.
On Windows too, the preamble (BOM) should be removed from the UTF-8 encoding that the Console class uses when code page 65001 (chcp 65001) is in effect, given that cmd.exe - rightfully - has always operated without BOM in that case.
After all, any programs relying on the active code page should be able to blindly assume that stdin input / stdout output is encoded accordingly and shouldn't have to deal with a BOM (neither to redundantly imply the same code page nor to potentially signal a different encoding).
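To make the distinction concrete, here is a minimal sketch of how a UTF-8 encoding with and without a preamble differs in .NET. It only inspects the two encoding objects and does not reproduce how the Console class picks its encoding internally; the variable names are illustrative.

```csharp
using System;
using System.Text;

// Encoding.UTF8 advertises a 3-byte preamble (the BOM).
byte[] preambleWithBom = Encoding.UTF8.GetPreamble();

// A UTF8Encoding constructed with encoderShouldEmitUTF8Identifier: false
// advertises an empty preamble.
var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
byte[] preambleWithoutBom = utf8NoBom.GetPreamble();

Console.WriteLine(BitConverter.ToString(preambleWithBom)); // EF-BB-BF
Console.WriteLine(preambleWithoutBom.Length);              // 0

// Both variants map to the same code page, so the code page alone
// does not tell a caller whether a BOM will be written.
Console.WriteLine(Encoding.UTF8.CodePage); // 65001
Console.WriteLine(utf8NoBom.CodePage);     // 65001
```

This is also why printing Console.OutputEncoding.CodePage cannot by itself confirm or rule out a BOM: the preamble has to be checked separately.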
The presence of a BOM, in fact, breaks PowerShell's background-jobs feature (via Start-Job), as detailed here:
Follow-on bugs:
It's fair to assume that other applications/languages/frameworks may be affected as well, with consequences ranging from outright failure, as in PowerShell's case, to mistakenly considering the BOM part of the data.
Demonstration of cmd's current behavior:
cmd has always produced BOM-less stdout output with chcp 65001 in effect; e.g. the following creates t.txt as a BOM-less UTF-8 file.
The above shows that the file contains no BOM (and that the character ü was correctly encoded as UTF-8 as the 2-byte sequence 0xC3 0xBC).
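The same byte-level check can be sketched in C#. HasUtf8Bom is a hypothetical helper, not something from this thread or from .NET, and the BOM-prefixed array only simulates what a consumer would receive from a writer that emits the preamble.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: reports whether a byte sequence starts with
// the UTF-8 BOM, 0xEF 0xBB 0xBF.
static bool HasUtf8Bom(byte[] data) =>
    data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;

// "\u00FC" is ü; GetBytes encodes the text only and never adds a preamble.
byte[] bomless = Encoding.UTF8.GetBytes("\u00FC");

// Simulate output from a writer that prepends the preamble.
byte[] withBom = Encoding.UTF8.GetPreamble().Concat(bomless).ToArray();

Console.WriteLine(BitConverter.ToString(bomless)); // C3-BC
Console.WriteLine(HasUtf8Bom(bomless));            // False
Console.WriteLine(HasUtf8Bom(withBom));            // True
```

A consumer that assumes BOM-less input would treat the three extra bytes in the second array as part of the data.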