System.Console unexpectedly uses a UTF-8 encoding *with BOM* on Windows #28929

Closed
mklement0 opened this issue Mar 11, 2019 · 4 comments
Labels
area-System.Console enhancement Product code improvement that does NOT require public API changes/additions

Comments

@mklement0

Note: #27258 (since fixed) addressed the same issue, but for Unix-like platforms only, because at the time I didn't realize that it should be fixed on Windows also.

On Windows, too, the preamble (BOM) should be removed from the UTF-8 encoding that the Console class uses when code page 65001 (chcp 65001) is in effect, given that cmd.exe, rightfully, has always operated without a BOM in that case.

After all, any programs relying on the active code page should be able to blindly assume that stdin input / stdout output is encoded accordingly and shouldn't have to deal with a BOM (neither to redundantly imply the same code page nor to potentially signal a different encoding).

The presence of a BOM, in fact, breaks PowerShell's background-jobs feature (via Start-Job), as detailed here:

Follow-on bugs:

It's fair to assume that other applications/languages/frameworks may be affected as well, which can range from outright failure, as in PowerShell's case, to mistakenly considering the BOM part of the data.
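
To illustrate the "BOM mistaken for data" failure mode, here is a minimal Python sketch. It is not taken from any affected application; the byte string is built by hand under the assumption that a producer prepends the UTF-8 BOM (EF BB BF) to its output.

```python
import codecs

# Bytes a producer would emit for "hü\r\n" if it prepended the UTF-8 BOM (EF BB BF).
with_bom = codecs.BOM_UTF8 + "h\u00fc\r\n".encode("utf-8")

# A consumer that trusts the active code page decodes as plain UTF-8
# and ends up with U+FEFF as the first character of its data.
naive = with_bom.decode("utf-8")
print(repr(naive))

# A BOM-aware consumer has to strip it explicitly, here via the "utf-8-sig" codec.
aware = with_bom.decode("utf-8-sig")
print(repr(aware))
```

The first result carries an invisible leading U+FEFF, which is enough to break a string comparison or a parser that expects the payload to start at the first byte.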


Demonstration of cmd's current behavior:

cmd has always produced BOM-less stdout output with chcp 65001 in effect; e.g., the following creates t.txt as a BOM-less UTF-8 file:

C:\> chcp 65001
C:\> (echo hü)> t.txt
C:\> powershell -noprofile -c Format-Hex t.txt

           Path: C:\Users\jdoe\t.txt

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   68 C3 BC 0D 0A                                   hü..

The above shows that the file contains no BOM (and that the character ü was correctly encoded as the 2-byte UTF-8 sequence 0xC3 0xBC).
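
The same check can be done programmatically. The following Python sketch is only an illustration: `has_utf8_bom` and `hex_dump` are hypothetical helper names, and the bytes are copied from the Format-Hex output above rather than read from disk.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# The five bytes shown in the Format-Hex output for t.txt.
t_txt = bytes.fromhex("68 C3 BC 0D 0A")

print(has_utf8_bom(t_txt))                        # no BOM at the start
print(hex_dump("h\u00fc\r\n".encode("utf-8")))    # UTF-8 bytes of "hü" plus CRLF
```

Encoding "hü" followed by CRLF as UTF-8 reproduces exactly the bytes in the dump, which confirms that nothing precedes the payload.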

@msftgits transferred this issue from dotnet/corefx on Feb 1, 2020
@msftgits added this to the Future milestone on Feb 1, 2020
@maryamariyan added the "untriaged" label (New issue has not been triaged by the area owner) on Feb 23, 2020
@carlossanlop added the "enhancement" label (Product code improvement that does NOT require public API changes/additions) and removed the "untriaged" label on Apr 27, 2020
@carlossanlop
Member

@mklement0 would you like to work on this issue?

@carlossanlop modified the milestones: Future, 5.0 on Apr 27, 2020
@carlossanlop added the "help wanted" label ([up-for-grabs] Good issue for external contributors) on Apr 27, 2020
@mklement0
Author

Thanks for asking, @carlossanlop, but I'm hoping someone else can take this on.

@eiriktsarpalis
Member

As far as Console.Out is concerned, the current implementation seems to strip the preamble on Windows as well. I tested the following console app:

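    using System;

    class Program
    {
        static void Main()
        {
            // Write the active output code page; with stdout redirected to a file,
            // a hex dump of that file shows whether a BOM precedes the digits.
            Console.Write(Console.OutputEncoding.CodePage);
        }
    }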

Using the latest .NET 5.0 preview SDK:

chcp 65001
dotnet run > foo.txt
powershell -noprofile -c Format-Hex foo.txt

Produces no BOM artifacts:

           Path: C:\Users\eitsarpa\source\repos\ConsoleApp14\ConsoleApp14\foo.txt

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   36 35 30 30 31                                   65001

Note, however, that System.Console.OutputEncoding itself does not strip the preamble, so applying that encoding to the raw stdout stream should not have the same effect.
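
The distinction between a writer that strips the preamble and an encoding that still carries one can be sketched by analogy in Python. This is not .NET code and says nothing about how System.Console is implemented: `written_bytes` is a hypothetical helper, Python's "utf-8-sig" codec stands in for an encoding whose preamble is non-empty, and plain "utf-8" stands in for the preamble-stripped variant.

```python
import io

def written_bytes(encoding: str, text: str) -> bytes:
    """Write text through a text wrapper over a raw byte stream; return the raw bytes."""
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding=encoding, newline="")
    writer.write(text)
    writer.flush()          # push buffered text down into the raw stream
    return raw.getvalue()

# Stand-in for a preamble-stripped writer: only the payload reaches the stream.
plain = written_bytes("utf-8", "65001")

# Stand-in for wrapping the raw stream with an encoding that emits its preamble.
with_preamble = written_bytes("utf-8-sig", "65001")

print(plain.hex(" "))
print(with_preamble.hex(" "))
```

In this analogy the second stream starts with EF BB BF before the digits, which is the kind of artifact the Format-Hex output above would reveal if a preamble were being written.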

Perhaps I'm misunderstanding what is broken here?

@eiriktsarpalis modified the milestones: 5.0.0, 6.0.0 on Aug 17, 2020
@mklement0
Author

mklement0 commented Aug 17, 2020

My bad, @eiriktsarpalis - I missed that the referenced PR did fix the problem on Windows too, as you demonstrate.

Sorry for the noise - I'm closing this.

AS & ET Area Ownership automation moved this from To do to Done Aug 17, 2020
@adamsitnik removed this from the 6.0.0 milestone on Aug 17, 2020
@ghost locked as resolved and limited conversation to collaborators on Dec 13, 2020
6 participants