How to setup "ProcessStartInfo.StandardOutputEncoding" #63

Closed
bernd5 opened this issue May 10, 2020 · 11 comments

bernd5 commented May 10, 2020

I would like to start a C# app via cmd that prints UTF-8 text.
For this to work correctly, I need to run chcp 65001 first or start the app with a non-ANSI output encoding.

This would be possible by setting StandardOutputEncoding of the class ProcessStartInfo.
The construction of this class is unfortunately handled in a private method. Could you make these internals somehow available? Best would be something like an "OnAdjustProcessStartInfo" event.

I have seen that you have an overload ExecuteAsync which affects the interpretation of the binary output... But it does not "inform" the application about that encoding.

Owner

Tyrrrz commented May 10, 2020

Depending on the execution model you're using, there should be overloads, for example:

// Treat both stdout and stderr as UTF8-encoded text streams
var result = await Cli.Wrap("path/to/exe")
    .WithArguments("--foo bar")
    .ExecuteBufferedAsync(Encoding.UTF8);

// Treat stdout as ASCII-encoded and stderr as UTF8-encoded
var result = await Cli.Wrap("path/to/exe")
    .WithArguments("--foo bar")
    .ExecuteBufferedAsync(Encoding.ASCII, Encoding.UTF8);

Owner

Tyrrrz commented May 10, 2020

There's no way to "inform" the application about the encoding. This is basically the same thing as setting StandardOutputEncoding.

Author

bernd5 commented May 10, 2020

Unfortunately it is not the same. Usually an application asks the kernel for the console output encoding. This is done with the function:

UINT WINAPI GetConsoleOutputCP(void);

In my environment this is code page 850. If you print something that simply does not exist in your code page, those characters are replaced with something like "?".

Have a look at System/ConsolePal.Windows. You can find the call at line 112.

This function must return 65001 to produce UTF-8.
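
The replacement behavior described above can be reproduced without a console. The following is a minimal sketch, not code from CliWrap or the .NET runtime. It uses Encoding.ASCII as a stand-in for a legacy code page such as 850 (an assumption, chosen only because it is always available), and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class LossyEncodingDemo
{
    // Encodes text to bytes and decodes it again with the same encoding.
    // Characters the encoding cannot represent are replaced during encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        const string text = "Россия";

        // ASCII (stand-in for a narrow code page) cannot represent Cyrillic,
        // so every character is replaced with '?'.
        Console.WriteLine(RoundTrip(text, Encoding.ASCII) == "??????");

        // UTF-8 can represent every character, so the text survives intact.
        Console.WriteLine(RoundTrip(text, Encoding.UTF8) == text);
    }
}
```

The point of the sketch is that the loss happens when the writer encodes the text, so no choice of decoding on the reading side can recover the original characters.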

Owner

Tyrrrz commented May 10, 2020

As far as I know, it just defaults to GetConsoleOutputCP but then you can set it to anything you want: https://docs.microsoft.com/en-us/dotnet/api/system.console.outputencoding?view=netcore-3.1

On the side of System.Diagnostics.Process, GetConsoleOutputCP is also used only as the default value for OutputEncoding, which can be overridden:

https://github.com/dotnet/runtime/blob/59a24e56e70f018eec13bc82d43f485b039fe2b4/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Windows.cs#L640-L644

Author

bernd5 commented May 10, 2020

Unfortunately I believe you are right, sorry.
The problem is just that Windows applications usually don't produce UTF-8 output.
=> Calling ExecuteBufferedAsync(Encoding.UTF8, Encoding.UTF8) is not really helpful.

Currently I create a CLI object / Command with the path to the exe. I could instead call cmd.exe and run chcp 65001 followed by the real exe, but that does not seem very clean.

The app I want to start is a dotnet exe. Take for example such a simple app:

using System;

namespace Tests
{
    public static class UnicodeSample
    {
        public static void Main()
        {
            System.Console.WriteLine("Some text with non - ascii: Российская Федерация!");
        }
    }
}

How can I call it with CliWrap?

Owner

Tyrrrz commented May 10, 2020

I'm not sure how to re-implement chcp in code (to be honest, I have never used it so not sure how it works).

I would assume that if in your app you do Console.OutputEncoding = Encoding.UTF8 first it should work. In worst case, you can always write the data directly to the stream via Console.OpenStandardOutput(). That, of course, assumes that you control the target executable.
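
Both options mentioned above can be sketched as follows, assuming you control the target executable. This is an illustration only: the class and helper names are hypothetical, and whether setting Console.OutputEncoding succeeds may depend on how the process is hosted.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8OutputSample
{
    // Writes the text to the given stream as raw UTF-8 bytes,
    // bypassing Console.Out and its configured encoding.
    public static void WriteUtf8(Stream stream, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text + Environment.NewLine);
        stream.Write(bytes, 0, bytes.Length);
        stream.Flush();
    }

    public static void Main()
    {
        // Option 1: switch the console output encoding before writing.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("Российская Федерация!");

        // Option 2: write the encoded bytes directly to standard output.
        using (Stream stdout = Console.OpenStandardOutput())
        {
            WriteUtf8(stdout, "Российская Федерация!");
        }
    }
}
```

With either option, the caller would then read the output as UTF-8, for example via ExecuteBufferedAsync(Encoding.UTF8).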

If you don't control the target executable, and it writes unicode characters without actually changing the encoding to unicode, I'm not sure there's anything that can be done, short of re-implementing chcp in C# somehow.

Author

bernd5 commented May 10, 2020

Thanks for your answer.

As far as I understand, CliWrap does it via redirection. To interpret the received data it uses these encoding settings, which is totally fine.
The problem is the OutputWriter, which causes a conversion to an OutputEncoding, see here. Here we can lose data.

If Console.OutputEncoding = Encoding.UTF8 were executed, then it should behave the same way (UTF-16 -> UTF-8) as chcp, which seems to set the encoding for the current console window.

Unicode handling on Windows is really a mess.

Owner

Tyrrrz commented May 10, 2020

> The problem is the OutputWriter, which causes a conversion to an OutputEncoding, see here. Here we can lose data.

Yeah. The only way to work around it (besides changing OutputEncoding to UTF8) is to write the raw data to the standard stream directly. Then the app won't care which encoding is set anywhere else.

> If Console.OutputEncoding = Encoding.UTF8 were executed, then it should behave the same way (UTF-16 -> UTF-8) as chcp, which seems to set the encoding for the current console window.

That seems to make sense. But it has to be done by the target executable.

Do you control the actual target executable you're trying to run with CliWrap?

Author

bernd5 commented May 10, 2020

I can't change the code of the "slave-apps".

I "solved" it now with the following code:

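// Redirect chcp's own message to the NUL device ("nul"); "null" would create a file with that name
var cli = Cli.Wrap("cmd").WithArguments($"/c chcp 65001 > nul && {appPath} {arguments}");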
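
For comparison, the same workaround can be sketched with System.Diagnostics.Process directly, which is where the StandardOutputEncoding property from the issue title lives. This is a rough sketch under assumptions: the class and method names are hypothetical, it is only meaningful on Windows, and it does no quoting of appPath or arguments, so paths containing spaces would need extra care.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ChcpWorkaround
{
    // Builds start info that switches the console code page to UTF-8 (65001)
    // via cmd before launching the target app, and decodes stdout as UTF-8.
    public static ProcessStartInfo BuildStartInfo(string appPath, string arguments)
    {
        return new ProcessStartInfo
        {
            FileName = "cmd",
            Arguments = $"/c chcp 65001 > nul && {appPath} {arguments}",
            UseShellExecute = false,
            RedirectStandardOutput = true,
            // Only affects how this process decodes the child's output;
            // chcp is what changes how the child encodes it.
            StandardOutputEncoding = Encoding.UTF8
        };
    }

    // Runs the target app and returns everything it wrote to stdout.
    public static string Run(string appPath, string arguments)
    {
        using (Process process = Process.Start(BuildStartInfo(appPath, arguments)))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }

    public static void Main(string[] args)
    {
        // Usage: pass the path of the target exe as the first argument.
        if (args.Length > 0)
        {
            Console.WriteLine(Run(args[0], string.Join(" ", args, 1, args.Length - 1)));
        }
    }
}
```

The two settings play different roles: chcp changes the code page the child sees through GetConsoleOutputCP, while StandardOutputEncoding only tells the parent how to interpret the bytes it receives.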

Owner

Tyrrrz commented May 10, 2020

I also looked into WinAPI, and there seems to be no way of changing the code page of another process, only that of the calling one. So it appears the chcp workaround is really the only way.

Author

bernd5 commented May 11, 2020

Anyway, thanks for your support. Perhaps it is something for the docs (especially for the ExecuteBufferedAsync method with Encoding parameters).
