Change $OutputEncoding to use iso-8859-1 encoding rather than ASCII #5361

JamesWTruher · 2017-11-07T00:31:04Z

This enables a number of useful scenarios.
With this change it is now possible to do the following:

get-content -raw -encoding iso8859 /tmp/archive.tgz | gunzip | tar xvf -

and expand a compressed tar archive.
It also ensures that the bytes/characters passed to a native executable are not changed by the pipeline.
What is passed into the pipeline is passed unchanged to the next native application.

This does not address #1908, as that requires a larger re-architecture of the pipeline. This PR still appends [Environment]::NewLine to the native output, which should be addressed when #1908 is fixed.

This is marked as a breaking change because the pipeline no longer alters its output the way it did previously. Anyone relying on the ASCII encoding (especially for extended ASCII sequences) will no longer get that behavior (broken though it was).
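The pass-through claim above can be checked with a small round-trip test. The following is only a sketch, not code from this PR: the class name `PassThroughDemo` and the method `RoundTrips` are hypothetical, and it assumes the runtime resolves code page 28591 without an extra encoding provider. It decodes every byte value from 0 to 255 with a given encoding, re-encodes the resulting string, and reports whether the bytes survived.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of the PR.
static class PassThroughDemo
{
    // Returns true if every byte value 0..255 survives decode + re-encode.
    public static bool RoundTrips(Encoding encoding)
    {
        byte[] original = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        string decoded = encoding.GetString(original);
        byte[] encoded = encoding.GetBytes(decoded);
        return original.SequenceEqual(encoded);
    }

    static void Main()
    {
        // iso-8859-1 maps each byte to the code point with the same value.
        Console.WriteLine("iso-8859-1 round-trips: " + RoundTrips(Encoding.GetEncoding(28591)));
        // ASCII cannot represent bytes above 127, so they are replaced.
        Console.WriteLine("ASCII round-trips:      " + RoundTrips(Encoding.ASCII));
    }
}
```

Under these assumptions only iso-8859-1 should report a successful round trip, which is the property the description relies on when it calls the encoding a pass-through.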


daxian-dbw

@JamesWTruher Per our discussion, you are going to add newlines when reading the redirected output from a native command. I don't see that change in this PR. Will that be a separate PR?

daxian-dbw · 2017-11-07T00:37:36Z

src/System.Management.Automation/engine/NativeCommandProcessor.cs

@@ -1797,8 +1797,7 @@ internal void Start(Process process, NativeCommandIOFormat inputFormat)
            //Get the encoding for writing to native command. Note we get the Encoding
            //from the current scope so a script or function can use a different encoding
            //than global value.
-            Encoding pipeEncoding = _command.Context.GetVariableValue(SpecialVariables.OutputEncodingVarPath) as System.Text.Encoding ??
-                                    Encoding.ASCII;
+            Encoding pipeEncoding = _command.Context.GetVariableValue(SpecialVariables.OutputEncodingVarPath) as System.Text.Encoding ??  Utils.iso8859;


Minor comment: there is an extra space before Utils.iso8859.

daxian-dbw · 2017-11-07T00:39:15Z

src/System.Management.Automation/engine/InitialSessionState.cs

@@ -4410,7 +4410,7 @@ .ForwardHelpCategory Cmdlet
            // Variable which controls the encoding for piping data to a NativeCommand


It would be great if you can add more comments about the characteristics of this encoding and mention that's why we use it as the default output encoding.

daxian-dbw · 2017-11-07T00:59:15Z

test/tools/TestExe/TestExe.cs

+                encodingToUse = Encoding.GetEncoding(28591);
+            }
+            Console.InputEncoding = encodingToUse;
+            string pipedText = Console.In.ReadToEnd();


Maybe we should just spit out the characters read in without having the encoding involved? For example, like this:

    int ch;
    while ((ch = Console.In.Read()) != -1)
    {
        Console.WriteLine(ch);
    }

I like that much better, thanks
updated

SteveL-MSFT · 2017-11-07T01:00:54Z

src/System.Management.Automation/engine/Utils.cs

-            new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
+        internal static readonly UTF8Encoding utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
+        // encoding which is essentially a passthru
+        internal static readonly Encoding iso8859 = Encoding.GetEncoding(28591);


You should use the overload that takes a string instead:

GetEncoding("iso-8859-1");
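As a quick check that the two overloads are interchangeable here, the sketch below looks up the encoding both by name and by code page and compares the results. The class name `EncodingLookupDemo` and the method `SameEncoding` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the PR.
static class EncodingLookupDemo
{
    // Returns true if the name lookup and the code page lookup agree.
    public static bool SameEncoding()
    {
        Encoding byName = Encoding.GetEncoding("iso-8859-1");
        Encoding byCodePage = Encoding.GetEncoding(28591);
        return byName.CodePage == byCodePage.CodePage
            && byName.WebName == byCodePage.WebName;
    }

    static void Main()
    {
        Encoding byName = Encoding.GetEncoding("iso-8859-1");
        // Print the identifiers so the mapping is visible at a glance.
        Console.WriteLine(byName.WebName + " / " + byName.CodePage);
        Console.WriteLine("Lookups agree: " + SameEncoding());
    }
}
```

Both calls should yield the same encoding, so the choice is about readability: the string names the encoding directly, while 28591 needs a comment to explain it.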

SteveL-MSFT · 2017-11-07T01:04:04Z

test/tools/TestExe/TestExe.cs

+        {
+            // write a string of characters (bytes), this closest resembles a binary
+            // write the string "test" with an accent over the e
+            byte[] testbytes = new byte[] { 116, 233, 115, 116 };


Why not just write all the bytes from 0 to 255?

no need
[char]233 is all that is needed to show the problem, and is readable from the command line if you run it.
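To make the effect of that single character concrete, here is a minimal sketch that encodes the same four-character string with three encodings and prints the resulting byte values. It is not taken from the PR: the class name `AccentDemo` and the method `Encode` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of the PR.
static class AccentDemo
{
    // Encodes "test" with an accented e (U+00E9) and lists the byte values.
    public static string Encode(Encoding encoding)
    {
        string text = "t\u00e9st";
        byte[] bytes = encoding.GetBytes(text);
        return string.Join(" ", bytes.Select(b => b.ToString()));
    }

    static void Main()
    {
        Console.WriteLine("ASCII:      " + Encode(Encoding.ASCII));              // 116 63 115 116
        Console.WriteLine("iso-8859-1: " + Encode(Encoding.GetEncoding(28591))); // 116 233 115 116
        Console.WriteLine("UTF-8:      " + Encode(Encoding.UTF8));               // 116 195 169 115 116
    }
}
```

ASCII replaces the accented character with a question mark (63), iso-8859-1 keeps it as the single byte 233 used in the test, and UTF-8 keeps the character but spends two bytes on it.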

SteveL-MSFT · 2017-11-07T01:06:36Z

test/tools/TestExe/TestExe.cs

+                    }
+                    catch
+                    {
+                        ;


Seems like you should fail here and return a non-zero exit code if an invalid encoding is provided. I also suggest using the string overload instead of the int one, which makes the code easier to read.

I've replaced this with something far simpler (see Dongbo's suggestion above).

SteveL-MSFT · 2017-11-07T01:06:57Z

test/tools/TestExe/TestExe.cs

+            }
+            if ( encodingToUse == null )
+            {
+                encodingToUse = Encoding.GetEncoding(28591);


Use GetEncoding("iso-8859-1")

mklement0 · 2017-11-07T03:24:33Z

This PR is fundamentally misguided, as discussed here.

SteveL-MSFT · 2017-11-07T06:57:44Z

Based on the discussion @mklement0 linked, it seems that for console text output, we should default to UTF-8. Binary pipeline is something we'll have to defer to later.

mklement0 · 2017-11-07T10:27:29Z

@SteveL-MSFT: That's a great summary - thank you.

JamesWTruher · 2017-11-07T19:02:21Z

Closing in favor of a new PR that simply changes the output encoding to utf8nobom.

JamesWTruher added the Breaking-Change label Nov 7, 2017

JamesWTruher assigned daxian-dbw Nov 7, 2017

JamesWTruher requested review from daxian-dbw, SteveL-MSFT, adityapatwardhan and PaulHigin November 7, 2017 00:31

JamesWTruher requested review from BrucePay, dantraMSFT, lzybkr, mirichmo and TravisEz13 as code owners November 7, 2017 00:31

daxian-dbw reviewed Nov 7, 2017

daxian-dbw added the Documentation Needed label Nov 7, 2017

SteveL-MSFT requested changes Nov 7, 2017

SteveL-MSFT added this to the 6.0.0-RC milestone Nov 7, 2017

JamesWTruher closed this Nov 7, 2017

joeyaiello mentioned this pull request Oct 15, 2018

Docs needed for 'Change $OutputEncoding to use iso-8859-1 encoding rather than ASCII' MicrosoftDocs/PowerShell-Docs#3066

Closed

joeyaiello removed the Documentation Needed label Oct 15, 2018

JamesWTruher deleted the jameswtruher/outputencoding02 branch May 11, 2022 18:27
