Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using .NET Framework instead of .NET Core, there are problems with stdin encoding #215

Closed
prcdpr opened this issue Aug 4, 2023 · 10 comments
Labels

Comments

@prcdpr
Copy link

prcdpr commented Aug 4, 2023

Version

3.6.4

Details

Consider this minimal repro (it runs on Windows with Powershell 7 installed):

using System.Text;
using CliWrap;

var pwshScript = "Write-Host 'hello'";

var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
    .WithStandardInputPipe(PipeSource.FromBytes(Encoding.UTF8.GetBytes(pwshScript)))
    .WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
    .WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
    .WithValidation(CommandResultValidation.None)
    .WithArguments(new[] { "-NoProfile", "-Command", "-" })
    .ExecuteAsync();

This code will either work or not, depending whether it runs in .NET Framework or .NET Core.

Steps to reproduce

  • Launch code in .NET Framework 4.8 or .NET Core 7, it works and script completes with the following output:
stdout> hello
  • In Windows region settings, enable UTF-8 support (previously it was disabled)
    image

Code still works in .NET Core 7 but it breaks in .NET Framework with the error:

stderr> Write-Host: The term 'Write-Host' is not recognized as a name of a cmdlet, function, script file, or executable program.
stderr> Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

If you look closely, in the output above there is an invisible character so Powershell thinks the script is

\ufeffWrite-Host 'hello'

Which causes it to fail. The problem occurs in combination of .NET Framework and UTF-8 support enabled.
When looking at byte representation of the string which is provided to PipeSource, there are no extra bytes in the beginning of the array.

@prcdpr prcdpr added the bug label Aug 4, 2023
@Tyrrrz
Copy link
Owner

Tyrrrz commented Aug 4, 2023

Is it because Encoding.UTF8.GetBytes(...) produces a preamble on .NET Framework? Is it just the first few characters (i.e. the BOM) that are causing this issue?

@prcdpr
Copy link
Author

prcdpr commented Aug 4, 2023

I don't think so.

As I mentioned in report, byte representation is free of any extra bytes and looks valid.

The following example also doesn't work (.NET Framework 4.8):

using System.Text;
using CliWrap;

var pwshScript = "Write-Host 'hello'";

var utf8NoBom = new UTF8Encoding(false);
var bytes = utf8NoBom.GetBytes(pwshScript);

var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
    .WithStandardInputPipe(PipeSource.FromBytes(bytes))
    .WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
    .WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
    .WithValidation(CommandResultValidation.None)
    .WithArguments(new[] { "-NoProfile", "-Command", "-" })
    .ExecuteAsync();

image

@Tyrrrz
Copy link
Owner

Tyrrrz commented Aug 4, 2023

What if you do FromStream(new MemoryStream(bytes))?

@prcdpr
Copy link
Author

prcdpr commented Aug 4, 2023

Same error in .NET 4.8, works in .NET 7
image

@Tyrrrz
Copy link
Owner

Tyrrrz commented Aug 7, 2023

I reduced the repro to this snippet, which indicates that the bug is in .NET Framework's implementation of Process, rather than in CliWrap:

using var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = @"C:\Program Files\PowerShell\7\pwsh.exe",
        Arguments = "-NoProfile -Command -",
        RedirectStandardInput = true,
        RedirectStandardError = true,
        RedirectStandardOutput = true,
        UseShellExecute = false
    }
};

process.Start();

using (process.StandardInput)
    await process.StandardInput.WriteLineAsync("Write-Host 'Hello world!'");

var output = await process.StandardOutput.ReadToEndAsync();
var error = await process.StandardError.ReadToEndAsync();

await process.WaitForExitAsync();

error.Should().BeEmpty();
output.TrimEnd().Should().Be("Hello world!");

The aforementioned Windows feature is in beta after all, so it might not be supported by .NET Framework (yet).

I don't know what to suggest in this case, apart from raising the issue with Microsoft and hope they fix this issue in .NET Framework.

@Tyrrrz Tyrrrz added invalid and removed bug labels Aug 7, 2023
@Tyrrrz Tyrrrz closed this as completed Aug 7, 2023
@prcdpr
Copy link
Author

prcdpr commented Aug 8, 2023

Actually found a workaround - add

Console.InputEncoding = new UTF8Encoding(false);

before constructing and starting Process. I believe CliWrap should handle that by detecting current input encoding and changing to one without BOM, otherwise PipeSource.FromBytes isn't reliable at all and should throw an exception in Windows systems that have UTF-8 beta support enabled.

@Tyrrrz
Copy link
Owner

Tyrrrz commented Aug 8, 2023

Actually found a workaround - add

Console.InputEncoding = new UTF8Encoding(false);

before constructing and starting Process. I believe CliWrap should handle that by detecting current input encoding and changing to one without BOM, otherwise PipeSource.FromBytes isn't reliable at all and should throw an exception in Windows systems that have UTF-8 beta support enabled.

It's not about BOM, as we've identified earlier, since you provided the raw bytes you decoded yourself. I also did FromBytes("Write-Host 'Hello World'"u8.ToArray()) in my tests and the result was the same.

CliWrap shouldn't change the value of Console.InputEncoding as it's a global variable that has overreaching effects on other parts of the program.

@prcdpr
Copy link
Author

prcdpr commented Aug 8, 2023

It is about BOM but CliWrap makes developer think that PipeSource.FromBytes will provide clean byte input while it isn't true (if UTF-8 Beta is enabled and Console.InputEncoding is default UTF8 with BOM).

I agree that CliWrap shouldn't change Console.InputEncoding.
However I also believe that CliWrap shouldn't silently accept PipeSource.FromBytes in this scenario.

If:

  • runtime is NET Framework
  • UTF8 Beta is enabled in Windows
  • Console.InputEncoding is set to default UTF-8 with BOM
  • user provides PipeSource.FromBytes,

an exception with useful error message should be thrown because CliWrap cannot guarantee that stdin won't contain garbage bytes. Currently developer is led to believe that CliWrap's PipeSource.FromBytes is encoding agnostic while it isn't true.
Yes, this is Microsoft bug, but a good error message encouraging developer to change Console.InputEncoding will reduce a lot of headache and possibly further bug reports.

This sample doesn't work in .NET 4.8:

Console.InputEncoding = Encoding.UTF8;

using var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = @"C:\Program Files\PowerShell\7\pwsh.exe",
        Arguments = "-NoProfile -Command -",
        RedirectStandardInput = true,
        RedirectStandardError = true,
        RedirectStandardOutput = true,
        UseShellExecute = false,
    }
};

process.Start();

using (process.StandardInput)
{
    await process.StandardInput.WriteLineAsync("Write-Host 'Hello world!'");
}

var output = await process.StandardOutput.ReadToEndAsync();
var error = await process.StandardError.ReadToEndAsync();

process.WaitForExit();

And after changing encoding to

Console.InputEncoding = new UTF8Encoding(false);

it starts working.

@prcdpr
Copy link
Author

prcdpr commented Aug 8, 2023

Also this works when InputEncoding is changed:

Console.InputEncoding = new UTF8Encoding(false);

var pwshScript = "Write-Host 'hello'";

var utf8NoBom = new UTF8Encoding(false);
var bytes = utf8NoBom.GetBytes(pwshScript);

var cmd = await Cli.Wrap(@"C:\Program Files\PowerShell\7\pwsh.exe")
    .WithStandardInputPipe(PipeSource.FromStream(new MemoryStream(bytes)))
    .WithStandardOutputPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stdout> " + line)))
    .WithStandardErrorPipe(PipeTarget.ToDelegate(line => Console.WriteLine("stderr> " + line)))
    .WithValidation(CommandResultValidation.None)
    .WithArguments(new[] { "-NoProfile", "-Command", "-" })
    .ExecuteAsync();

It's definitely a problem that developer thinks they provide clean byte input but it's actually input encoding dependent. Even if CliWrap shouldn't change global settings, at least an useful error message should be thrown.

@Tyrrrz
Copy link
Owner

Tyrrrz commented Aug 8, 2023

It's definitely a problem that developer thinks they provide clean byte input but it's actually input encoding dependent. Even if CliWrap shouldn't change global settings, at least an useful error message should be thrown.

The issue is that we can't detect when the bug manifests on our side, because the extra bytes are added when they're written to the process's stdin. On CliWrap's side, the byte sequence is exactly what it's supposed to be (without any extra data).

If we just assume that if that Windows option is enabled, and the Console.InputEncoding contains a preamble, we might easily start throwing false positives. For example, if (or really, when) Microsoft fixes that bug in .NET Framework, CliWrap will continue throwing exceptions until a new version is released. Note that "fixing the bug" does not necessarily entail changing Console.InputEncoding but, for example, just stripping the BOM when it's written to the child process.

If there were a way to reliably identify that the byte sequence is corrupted by the time it gets to the child process, I'd be happy to add a corresponding exception. But I think the proposed heuristics are not deterministic enough.

Ultimately, you are using a beta (read: unstable) Windows feature on an old and largely unmaintained platform, which is .NET Framework. Combining cutting edge tech with legacy tech is bound to cause some issues eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants