Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Default/OEM encoding behavior PowerShell Core #3467

Merged
merged 8 commits into from Apr 6, 2017

Conversation

Projects
None yet
7 participants
@iSazonov
Copy link
Collaborator

commented Mar 31, 2017

Based on conclusion in discussion #3248

  • - OEM

We need internal fix because .Net Core haven't "OEM" term.
The fix should provide PowerShell Core behavior:

  • on Windows - as Windows PowerShell based on GetOEMCP() and legacy encodings (System.Text.Encoding.CodePages)

  • on Unix - the same as default encoding in PowerShell Core

  • - Default

We need internal fix because CoreFX default for all platforms is UTF8
Until we have completed Encoding RFC we should restore behaivor as in Windows PowerShell.
The fix should provide PowerShell Core behavior:

  • on Windows - as Windows PowerShell based on GetACP() and legacy encodings (System.Text.Encoding.CodePages)

  • on Unix - UTF-8 without BOM (We are expecting that the same will be in CoreFX)

  • - Tests

Now we don't have a full set of encoding tests (only for redirections). I believe we need to create them during future RFC implementation, not now.

@iSazonov iSazonov requested a review from daxian-dbw Mar 31, 2017

@iSazonov iSazonov force-pushed the iSazonov:fixdefaultencoding branch to 9580b4a Mar 31, 2017

@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 31, 2017

I was able to build locally with
<PackageReference Include="System.Text.Encodings.Web" Version="4.0.11" />
in Microsoft.PowerShell.SDK.csproj
but on CI build failed 😕 I revert to Version="4.3.0" but build failed again.
@daxian-dbw Could you please help?

@@ -7569,8 +7569,7 @@ private static Encoding GetEncodingFromEnum(FileSystemCmdletProviderEncoding typ

case FileSystemCmdletProviderEncoding.Oem:
{
uint oemCP = NativeMethods.GetOEMCP();
encoding = System.Text.Encoding.GetEncoding((int)oemCP);
encoding = ClrFacade.GetOEMEncoding();

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

The NativeMethods.GetOEMCP in this file can be removed if it's no longer used.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

It is used in ClrFacade.GetOEMEncoding

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

ClrFacade has its own copy of GetOEMEncoding, so the one in FileSystemProvider.cs can be removed.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

#if CORECLR

#if UNUX // PowerShell Core on Unix
s_defaultEncoding = Encoding.GetEncoding(65001);

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Why not using Encoding.UTF8? Isn't 65001 the same as utf8?

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

#if UNUX // PowerShell Core on Unix
s_defaultEncoding = Encoding.GetEncoding(65001);
#else // PowerShell Core on Windows
uint aCp = NativeMethods.GetACP();

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Maybe rename the variable to currentAnsiCp? It's helpful to know what GetACP does.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

// We will revisit this if it causes any failures when running tests on Core PS.
s_defaultEncoding = Encoding.GetEncoding(28591);
#else
#if CORECLR

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

The if/def pattern used in other places is

#if UNIX
// unix behavior
#elif CORECLR
// win-core behavior
#else
// win-full behavior
#endif

Could you please follow the same?

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

uint aCp = NativeMethods.GetACP();
if (s_oemEncoding == null)
{
Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Is it better to separate this into a method to reduce code duplication?

@@ -17,6 +17,7 @@ internal static class PinvokeDllNames
internal const string QueryDosDeviceDllName = "api-ms-win-core-file-l1-1-0.dll"; /*1*/
internal const string CreateSymbolicLinkDllName = "api-ms-win-core-file-l2-1-0.dll"; /*2*/
internal const string GetOEMCPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*3*/
internal const string GetACPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*3*/

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

The /*3*/ at the end is important to track if there is a mismatch. Please add this API to the end of the list and use an incremented number.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

@@ -139,6 +140,7 @@ internal static class PinvokeDllNames
internal const string QueryDosDeviceDllName = "kernel32.dll"; /*1*/
internal const string CreateSymbolicLinkDllName = "kernel32.dll"; /*2*/
internal const string GetOEMCPDllName = "kernel32.dll"; /*3*/
internal const string GetACPDllName = "kernel32.dll"; /*3*/

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

same here.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

@@ -555,10 +564,20 @@ internal static Encoding GetOEMEncoding()
{
if (s_oemEncoding == null)
{
#if CORECLR // The OEM code page '437' is not supported by CoreCLR.
// Use the default encoding (ISO-8859-1, CodePage 28591) as the OEM encoding in OneCore powershell.
#if CORECLR

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Same here regarding the pattern.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

uint oemCp = NativeMethods.GetOEMCP();
if (s_defaultEncoding == null)
{
Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

same here regarding the code duplication.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

@daxian-dbw daxian-dbw requested a review from mirichmo Mar 31, 2017

@daxian-dbw

This comment has been minimized.

Copy link
Member

commented Mar 31, 2017

Add @mirichmo to review the new windows API set added to PInvokeDllNames.cs, to see if that's fine for downlevel machines like win7.

@iSazonov I don't see you change the .csproj file. Did you miss that?

@daxian-dbw

This comment has been minimized.

Copy link
Member

commented Mar 31, 2017

<PackageReference Include="System.Text.Encodings.Web" Version="4.0.11" />

Why are you using System.Text.Encoding.Web? The CodePagesEncodingProvider type is available in System.Text.Encoding.CodePages

iSazonov added some commits Mar 31, 2017

@@ -85,7 +85,7 @@
<PackageReference Include="System.ServiceModel.Primitives" Version="4.3.0" />
<PackageReference Include="System.ServiceModel.Security" Version="4.3.0" />
<PackageReference Include="System.ServiceProcess.ServiceController" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.0.11" />

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Why change it to 4.0.11? I don't even find such a version on nuget.org

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

With 4.0.11 I can compile locally on my computer. Now I changed to 4.0.1 but failed too.

EncodingRegisterProvider()

uint currentAnsiCp = NativeMethods.GetACP();
s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp);

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Encoding.Default is coming back in netstandard2.0, are we going to change to it at that point?

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

We have RFC for that. No we only recover "Windows PowerShell" behavior.

@@ -530,17 +530,21 @@ internal static object[] GetCustomAttributes<T>(Assembly assembly)
#region Encoding

/// <summary>
/// Facade for Encoding.Default
/// Facade for getting Encoding.Default

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Depending on what we will do when Encoding.Default is back in netstandard2.0. If we are not going to change to Encoding.Default in powershell core, then the comment should be Facade for getting default encoding.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

I agree.


uint currentAnsiCp = NativeMethods.GetACP();
s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp);

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Please remove the extra blank line.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

uint oemCp = NativeMethods.GetOEMCP();
s_oemEncoding = Encoding.GetEncoding((int)oemCp);

#else // Windows PowerShell
uint oemCp = NativeMethods.GetOEMCP();
s_oemEncoding = Encoding.GetEncoding((int)oemCp);
#endif

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

I didn't notice the duplicate code between coreclr and fullclr. In this case, maybe the following is better?

#if UNIX
         s_oemEncoding = GetDefaultEncoding();
#else
#if CORECLR
         EncodingRegisterProvider()
#endif
         uint oemCp = NativeMethods.GetOEMCP();
         s_oemEncoding = Encoding.GetEncoding((int)oemCp);
#end

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

I tried - it is difficult to read. In compile time no duplicate code is. So maybe leave as is?

uint oemCp = NativeMethods.GetOEMCP();
s_oemEncoding = Encoding.GetEncoding((int)oemCp);
#endif
}
return s_oemEncoding;
}

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Remove extra line, or move private static volatile Encoding s_defaultEncoding; here as well.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Moved.

private static volatile Encoding s_oemEncoding;

private static EncodingRegisterProvider()

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

This method should be enclosed with #if CoreCLR, as it won't build for fullclr

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done

{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
}

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

extra line

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Done.

{
if (s_defaultEncoding == null && s_oemEncoding == null)
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

It's possible this line will be called more than once when there are multiple runspace. Will it fail at the second call?

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

First I put the call in GetDefaultEncoding and GetOEMEncoding without "if" and second call was well.

@daxian-dbw daxian-dbw requested a review from JamesWTruher Mar 31, 2017

@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 31, 2017

@daxian-dbw
I change CodePages version on 4.0.11 but CodePagesEncodingProvider cannot be finded. 😕

@daxian-dbw

This comment has been minimized.

Copy link
Member

commented Mar 31, 2017

@JamesWTruher @mklement0, you have been closely following up with this issue (and related), could you please take a look at the fix?

s_defaultEncoding = Encoding.GetEncoding(28591);
#else
#if UNUX // PowerShell Core on Unix
s_defaultEncoding = Encoding.UTF8;

This comment has been minimized.

Copy link
@mklement0

mklement0 Mar 31, 2017

Contributor

I think it's a mistake to default to a UTF-8 encoding with BOM.
While .NET Core, at least currently, also reports the same BOM-based encoding - unofficially in Encoding.Default (not part of the contract yet) and officially in Encoding.GetDefaultEncoding(0) (and I consider that misguided too) -
at least it doesn't govern the actual default behavior in the sense of what happens if no encoding is specified. The methods of type System.IO.File default to BOM-less UTF-8 - both in .NET Framework since its inception and in .NET Core.
It think it's imperative that we use a BOM-less UTF-8 encoding in the Unix world (and, I would argue, even make that the default on Windows (Core) as well, with opt-in to legacy behavior).

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Fixed.

@@ -85,7 +85,7 @@
<PackageReference Include="System.ServiceModel.Primitives" Version="4.3.0" />
<PackageReference Include="System.ServiceModel.Security" Version="4.3.0" />
<PackageReference Include="System.ServiceProcess.ServiceController" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.0.1" />

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Mar 31, 2017

Member

Why not use 4.3.0? The type System.Text.CodePagesEncodingProvider is available in that package.
You need to update SMA.csproj to add this package reference since that's the project depending on it. Then you can remove this entry from Microsoft.PowerShell.SDK.csproj.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Mar 31, 2017

Author Collaborator

Many thanks! It is solved the problem and PowerShell is builded well.

But currently Travis is failed on test phase with

Import-Module : Unable to load DLL 'api-ms-win-core-localization-l1-2-0.dll'

Could you comment please?

This comment has been minimized.

Copy link
@daxian-dbw

daxian-dbw Apr 1, 2017

Member

#if UNUX

As Steve pointed out, this is why 😄

iSazonov added some commits Mar 31, 2017

@mklement0

This comment has been minimized.

Copy link
Contributor

commented Mar 31, 2017

@daxian-dbw, @JamesWTruher : I know this debate is cumbersome, but I think it's a very important one to have, and I don't think we're ready to move ahead yet - I've just added further thoughts here.

// We will revisit this if it causes any failures when running tests on Core PS.
s_defaultEncoding = Encoding.GetEncoding(28591);
#else
#if UNUX // PowerShell Core on Unix

This comment has been minimized.

Copy link
@SteveL-MSFT

SteveL-MSFT Mar 31, 2017

Member

shouldn't this be UNIX?

This comment has been minimized.

Copy link
@iSazonov

iSazonov Apr 1, 2017

Author Collaborator

Oh... fixed.

@@ -135,6 +135,7 @@ internal static class PinvokeDllNames
internal const string CreateToolhelp32SnapshotDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*120*/
internal const string Process32FirstDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*121*/
internal const string Process32NextDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*122*/
internal const string GetACPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*123*/

This comment has been minimized.

Copy link
@mirichmo

mirichmo Apr 1, 2017

Member

The API set matches the API on my laptop. I don't have access to a Nano VM at the moment to verify it there. On Nano, it must match the exporting API set exactly. You can verify it with dumpbin.exe /exports <path to api set>

This comment has been minimized.

Copy link
@iSazonov

iSazonov Apr 1, 2017

Author Collaborator

Now I don't run still dumpbin.exe /exports
MSDN docs say about:

What version we should use?

This comment has been minimized.

Copy link
@iSazonov

iSazonov Apr 3, 2017

Author Collaborator

GetACP is in api-ms-win-core-localization-l1-2-0.dll of runtime.win7-x64.Microsoft.NETCore.Windows.ApiSets dump.
But It seems Nano use runtime.win10-x64.Microsoft.NETCore.Windows.ApiSets

This comment has been minimized.

Copy link
@mirichmo

mirichmo Apr 3, 2017

Member

It is correct "as-is" here. No change needed.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Apr 3, 2017

Author Collaborator

Thanks! Clear.
Is Nano still under question?

This comment has been minimized.

Copy link
@mirichmo

mirichmo Apr 3, 2017

Member

No. Nano Server contains the 1-2-0 version of the API set

@iSazonov iSazonov force-pushed the iSazonov:fixdefaultencoding branch to 0f04279 Apr 1, 2017

@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 1, 2017

@daxian-dbw @SteveL-MSFT CI have passed - many thanks! Sorry for many typos (make the PR in the weekend night was bad my idea). Please continue code review.

@mklement0 Can you build the branch and check our expectations? If no, could you please attach files with your tests?

@mklement0

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2017

@iSazonov: I built on my macOS 10.12 machine and I can confirm that the following ad-hoc tests return true on macOS (and, judging by the source, therefore all Unix platforms), as desired:

# Setup: *create* a  BOM-less UTF-8 file.
'ö' | set-content -nonewline /tmp/$pid.txt

# Tests: Both should output $True

# Compare the raw bytes of the new file to the UTF-8 encoding of 'ö' (0xc3 0xb6)
# With the current alpha17, this would return $False, because Set-Content creates an
# ISO-8859-1 file.
$null -eq (Compare-Object (Get-Content -Encoding Byte -Raw /tmp/$pid.txt) (0xc3, 0xb6))

# See if the BOM-less UTF-8 file is *read* correctly.
'ö' -eq (Get-Content -Raw /tmp/$pid.txt)

(I wasn't able to run the tests - the powershell binary crashed at the following point:

Describing New-PSSession basic test
 [-] New-PSSession should not crash powershell 228ms
   Expected string length 66 but was 77. Strings differ at index 0.
   Expected: {InvalidOperation,Microsoft.PowerShell.Commands.NewPSSessionCommand}
   But was:  {System.DllNotFoundException,Microsoft.PowerShell.Commands.NewPSSessionCommand}
...

)

@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 1, 2017

@mklement0 Thanks for confirmation! It means the code is step in right direction!

All tests have passed on CI. I believe "New-PSSession basic test" too and you can ignore the error in context of the PR (or try discover a root of the error: copy-past test code in your session and see $error[0] - I think you don't install any package).

@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 2, 2017

I added comment about tests in first post.

@mklement0

This comment has been minimized.

Copy link
Contributor

commented Apr 2, 2017

@iSazonov:

Great idea, but I suggest, to avoid ambiguity, that you explicitly mention in your comment that this PR implements BOM-less UTF-8, especially given that your comment links to the .NET Core Encoding.Default property, which is currently UTF-8 with BOM (which, as stated, is not the actual default encoding (though, fortunately, it looks like getting Encoding.Default in line with that actual default is coming)).

To add some more fodder for upcoming tests, here's a simple one that tests whether PowerShell itself correctly reads BOM-less UTF-8 source code (such as scripts) by default in PS Core on Unix:

# Setup:
# Create a no-BOM UTF-8 file with the following literal content:
#   'ö' 
# which, when executed as a PS script, should echo 'ö' back, IF the script
# was correctly decoded from UTF-8 by PS.
# UTF-8 bytes: 0x27 (single quote), 0xc3 0xb6 (encoding of 'ö', U+00F6), 0x27 (single quote)
[byte[]] (0x27, 0xc3, 0xb6, 0x27) | Set-Content -Encoding Byte /tmp/$PID.ps1

# Test: See if the 'ö' is echoed back correctly.
#       Should return $True
'ö' -eq (& /tmp/$PID.ps1)
@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 2, 2017

@mklement0 I corrected first post about BOM-less, in the code it is obvious.

@daxian-dbw

This comment has been minimized.

Copy link
Member

commented Apr 4, 2017

@iSazonov are you going to add tests to this PR?

@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 4, 2017

I added in first post:

Now we don't have a full set of encoding tests (only for redirections). I believe we need to create them during future RFC implementation, not now.

I am confused. Partial test is meaningless. The full test also does not make sense until the approval of the Encoding RFC. What is compromise?

@daxian-dbw

This comment has been minimized.

Copy link
Member

commented Apr 4, 2017

@iSazonov the code changes look good to me. If the corresponding tests will come later, then we should keep that work item tracked in an issue. Is #3248 considered fixed with this PR? If not, the test work item should be tracked in the first post of that issue, similar to how you are tracking OEM, Default and tests in this PR.
And @mklement0 provides some good testing snippets here, we should also keep them in the same issue that tracks the test work item, so they can be easily referenced when it's time to complete the tests.

@iSazonov iSazonov referenced this pull request Apr 5, 2017

Open

Need a full set of encoding tests #3488

0 of 6 tasks complete
@iSazonov

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 5, 2017

@daxian-dbw Tracking Issue #3488.

@daxian-dbw

This comment has been minimized.

Copy link
Member

commented Apr 5, 2017

@iSazonov Thank you! I will merge this PR later today.

@daxian-dbw daxian-dbw merged commit 0883438 into PowerShell:master Apr 6, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@iSazonov iSazonov deleted the iSazonov:fixdefaultencoding branch Apr 6, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.