[Console]::OutputEncoding doesn't work to parse exe with unicode output #10789

Closed
SvenGroot opened this issue Oct 14, 2019 · 19 comments · Fixed by #10824

@SvenGroot

Steps to reproduce

I have a Windows executable that produces Unicode (UTF-16) output. In PowerShell 5.1, I can set the [Console]::OutputEncoding property so that the command's output is interpreted correctly. On PowerShell Core 6.2.3, that doesn't appear to work.

I've also tried setting [Console]::InputEncoding and $OutputEncoding, but the problem persists.

The example below uses the wsl.exe binary, so it should reproduce on any system that has the Windows Subsystem for Linux installed.

[Console]::OutputEncoding = [System.Text.Encoding]::Unicode
wsl.exe --list -v | ForEach-Object { $_ }

Expected behavior

PS C:\Users\svgroot> [Console]::OutputEncoding = [System.Text.Encoding]::Unicode
PS C:\Users\svgroot> wsl.exe --list -v | ForEach-Object { $_ }
  NAME            STATE           VERSION
* Ubuntu          Stopped         2
  Ubuntu-18.04    Stopped         2
  Alpine          Stopped         1

Actual behavior

PS C:\Users\svgroot> [Console]::OutputEncoding = [System.Text.Encoding]::Unicode
PS C:\Users\svgroot> wsl.exe --list -v | ForEach-Object { $_ }
    N A M E                         S T A T E                       V E R S I O N

 *   U b u n t u                     S t o p p e d                   2

     U b u n t u - 1 8 . 0 4         S t o p p e d                   2

     A l p i n e                     S t o p p e d                   1

Environment data

Name                           Value
----                           -----
PSVersion                      6.2.3
PSEdition                      Core
GitCommitId                    6.2.3
OS                             Microsoft Windows 10.0.19001
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

SvenGroot added the Issue-Question label Oct 14, 2019
@mklement0
Contributor

That's not good - the bug is still present as of PowerShell Core 7.0.0-preview.4.

Here's a repro that doesn't require WSL:

[Console]::OutputEncoding = [text.encoding]::unicode; sfc /? | Write-Output

As a Pester test:

[Console]::OutputEncoding = [text.encoding]::unicode; sfc /? | Write-Output | Should -Not -Match "`0"

@vexx32
Collaborator

vexx32 commented Oct 16, 2019

Has this ever worked in PowerShell Core, or has it been broken ever since 6.0.0?

@mklement0
Contributor

@vexx32: It's also broken in 6.0.0.

@0xd4d

0xd4d commented Oct 16, 2019

This is due to a breaking change in .NET Core. You should initialize ProcessStartInfo.StandardInputEncoding/StandardErrorEncoding/StandardOutputEncoding if they're redirected. .NET Framework defaults to using Console.OutputEncoding if you don't initialize StandardOutputEncoding, but .NET Core defaults to calling Process.GetEncoding((int)Interop.Kernel32.GetConsoleOutputCP()) which is UTF8 (on my system).

This is the code that creates ProcessStartInfo:

https://github.com/PowerShell/PowerShell/blob/master/src/System.Management.Automation/engine/NativeCommandProcessor.cs#L1088-L1150
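
The mismatch described above can be illustrated without starting any process. This is only a sketch, assuming the child writes UTF-16LE and the reader decodes as UTF-8; the sample string 'NAME' is borrowed from the wsl.exe output in the report. Every second byte of ASCII-range UTF-16LE text is 0x00, and UTF-8 decoding keeps each of those bytes as a NUL character, which would be consistent with the gaps between letters in the actual output shown earlier.

```powershell
# UTF-16LE bytes for a short ASCII-only sample: 4E 00 41 00 4D 00 45 00
$bytes = [System.Text.Encoding]::Unicode.GetBytes('NAME')

# Decoding with the wrong encoding (UTF-8) turns every 0x00 byte into a NUL character.
$wrong = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoding with the matching encoding (UTF-16LE) restores the original text.
$right = [System.Text.Encoding]::Unicode.GetString($bytes)

$wrong.Length               # 8 characters instead of 4
$wrong.IndexOf([char]0)     # 1, the first NUL sits right after 'N'
$right                      # NAME
```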

@vexx32
Collaborator

vexx32 commented Oct 16, 2019

If I understand correctly, then, a fix should be to set the ProcessStartInfo.StandardInput(/Output)Encoding to match [console]::Input(/Output)Encoding values explicitly?

Should this respect the [console] encoding settings, or $OutputEncoding? From what I recall, those values don't always align.

@0xd4d

0xd4d commented Oct 16, 2019

I don't know if it should use $OutputEncoding or Console.OutputEncoding, but the code would be something like this:

        bool redirectStdOut = true;
        bool redirectStdErr = true;
        bool redirectStdIn = false;
        var startInfo = new ProcessStartInfo();
        if (redirectStdOut)
        {
            startInfo.RedirectStandardOutput = true;
            startInfo.StandardOutputEncoding = Console.OutputEncoding;
        }
        if (redirectStdErr)
        {
            startInfo.RedirectStandardError = true;
            startInfo.StandardErrorEncoding = Console.OutputEncoding;
        }
        if (redirectStdIn)
        {
            startInfo.RedirectStandardInput = true;
            startInfo.StandardInputEncoding = Console.InputEncoding;
        }
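
The same idea can be tried from a script on an affected version by launching the process directly and setting the output encoding explicitly. This is a rough sketch, not the fix that went into PowerShell: Invoke-NativeWithEncoding is a hypothetical helper name, and the default of UTF-16LE is an assumption that only suits programs such as the ones discussed in this thread.

```powershell
# Hypothetical helper: run a native program with redirected stdout and decode
# its output with an explicitly chosen encoding instead of the runtime default.
function Invoke-NativeWithEncoding {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [string] $Arguments = '',
        # Assumption: the child writes UTF-16LE unless the caller says otherwise.
        [System.Text.Encoding] $Encoding = [System.Text.Encoding]::Unicode
    )

    $startInfo = [System.Diagnostics.ProcessStartInfo]::new($FilePath, $Arguments)
    $startInfo.UseShellExecute = $false          # required for stream redirection
    $startInfo.RedirectStandardOutput = $true
    $startInfo.StandardOutputEncoding = $Encoding

    $process = [System.Diagnostics.Process]::Start($startInfo)
    try {
        # Read everything first, then wait, so a full pipe cannot block the child.
        $text = $process.StandardOutput.ReadToEnd()
        $process.WaitForExit()
        $text
    }
    finally {
        $process.Dispose()
    }
}
```

With that helper defined, something like `Invoke-NativeWithEncoding -FilePath sfc.exe -Arguments '/?'` should return the help text without embedded NUL characters, assuming sfc.exe writes UTF-16 as reported in this thread.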

@0xd4d

0xd4d commented Oct 16, 2019

Actually to match PS 5.1 behavior it should not use $OutputEncoding

@mklement0
Contributor

mklement0 commented Oct 16, 2019

Agreed, @0xd4d: I don't know how .StandardInput comes into play, but on the output side it should definitely be [Console]::OutputEncoding, because that is how it has always worked in Windows PowerShell, where it determines how PowerShell decodes stream output from external programs.

$OutputEncoding controls what encoding is used to send data from PowerShell to external programs, via a pipe. It defaults to UTF-8 in PSCore and to ASCII(!) in WinPS. In either edition it can differ from [Console]::OutputEncoding.
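
To make the difference between those two defaults concrete, here is a small sketch of the bytes each encoding would produce for one non-ASCII character. The character (U+00E9) is an arbitrary choice for illustration, and the encodings are instantiated directly rather than read from $OutputEncoding, so the snippet behaves the same in either edition.

```powershell
# One non-ASCII character, U+00E9 (e with acute accent), built from its code point
# so the result does not depend on how this script file itself is encoded.
$sample = [string][char]0xE9

# UTF-8 (the PowerShell Core default for $OutputEncoding) uses two bytes.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($sample)

# ASCII (the Windows PowerShell default) cannot represent it and substitutes '?'.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($sample)

($utf8Bytes  | ForEach-Object { $_.ToString('X2') }) -join ' '   # C3 A9
($asciiBytes | ForEach-Object { $_.ToString('X2') }) -join ' '   # 3F
```

The values currently in effect in a session can be inspected with `$OutputEncoding.WebName` and `[Console]::OutputEncoding.WebName`.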

@SvenGroot
Author

Thanks for looking into this, everyone. Hopefully this can get fixed soon.

@jszabo98

Yep. Note that C:\Windows\system32\sfc.exe in Windows 10 outputs UTF-16. It's a PowerShell question that comes up occasionally.

@vexx32
Collaborator

vexx32 commented Oct 17, 2019

@mklement0 I guess that would mean .StandardInputEncoding should match $OutputEncoding, then? 🤔 On the assumption that we may be piping into such a command as well.


SteveL-MSFT added this to To do in Shell via automation Oct 17, 2019
@mklement0
Contributor

@vexx32 I've only glanced at the code, and I see that the pipe that is connected to the child process' stdin explicitly uses $OutputEncoding:

Encoding pipeEncoding = _command.Context.GetVariableValue(SpecialVariables.OutputEncodingVarPath) as System.Text.Encoding ??
    Utils.utf8NoBom;
_streamWriter = new StreamWriter(process.StandardInput.BaseStream, pipeEncoding);
_streamWriter.AutoFlush = true;

I don't fully understand how that relates to the default .StandardInput encoding - it looks like it may override it.


SteveL-MSFT added this to the 7.0-Consider milestone Oct 17, 2019
SteveL-MSFT self-assigned this Oct 17, 2019
@SteveL-MSFT
Member

@SvenGroot do you have any examples on the input side where we have issues with encoding?

@SvenGroot
Author

@SteveL-MSFT No, I only use OutputEncoding in my scenario.

@mklement0
Contributor

mklement0 commented Oct 17, 2019

Here's my guess as to what we should do:

  • When piping data from PowerShell to an external process, it is $OutputEncoding that already drives the standard input encoding for the child process (no change there - this was never broken).

  • When not piping (starting an interactive console application, for instance), i.e. when stdin is not redirected, we should set .StandardInput to [Console]::InputEncoding.

  • Whether redirected or not, .StandardOutput should always be set to [Console]::OutputEncoding
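
The three rules above can be restated as a small selection function. This is only a sketch of the proposal as worded in this comment, not of what PowerShell implements; Get-ProposedEncoding and its -StdinIsPiped switch are hypothetical names introduced for illustration.

```powershell
# Hypothetical helper that restates the proposal: which encoding each stream
# of a child process would get, depending on whether stdin is fed from a pipe.
function Get-ProposedEncoding {
    param([switch] $StdinIsPiped)

    [pscustomobject]@{
        # Piped input already follows $OutputEncoding; otherwise use the console's input encoding.
        StandardInput  = if ($StdinIsPiped) { $OutputEncoding } else { [Console]::InputEncoding }
        # Output is always decoded with the console's output encoding.
        StandardOutput = [Console]::OutputEncoding
    }
}

# Example: show the encoding names chosen for a piped invocation.
$choice = Get-ProposedEncoding -StdinIsPiped
$choice.StandardInput.WebName
$choice.StandardOutput.WebName
```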

@SteveL-MSFT
Member

@mklement0 when not piping, what is value of setting .StandardInput to any encoding? For your 3rd bullet, I believe you meant [Console]::OutputEncoding. For my PR, I'm focusing on output only unless someone brings a case where input encoding is a problem.

@mklement0
Contributor

mklement0 commented Oct 17, 2019

@SteveL-MSFT: Thanks for the correction re 3rd bullet point - I've fixed my previous comment.

what is value of setting .StandardInput to any encoding?

My thinking is: An interactive console application that reads from stdin probably expects the console's (terminal's) input encoding to be in effect (that's presumably how it works in Windows PowerShell).

Shell automation moved this from To do to Done Oct 19, 2019
@mklement0
Contributor

Glad to see this was fixed for .StandardOutput.

As for setting .StandardInput to [Console]::InputEncoding: please see #10907, @SteveL-MSFT.

iSazonov added the Resolution-Fixed label Oct 27, 2019
@ghost

ghost commented Nov 21, 2019

🎉 This issue was addressed in #10824, which has now been successfully released as v7.0.0-preview.6. 🎉
