Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macOS: Marshal.PtrToStringAuto() is broken, it's not 'auto' (suspect similar issue in Linux) #12343

Closed
jpfreyen opened this issue Mar 26, 2019 · 17 comments
Labels
area-Interop-coreclr help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@jpfreyen
Copy link

Environment

macOS Mojave 14.3 (English)
.NETCore 2.2 with Rider 2018.3.4

Explanation

When doing an unmanaged native call, getpwuid(), to macOS to retrieve account stats, char encoding will be all screwed up if Marshal.PtrToStringAuto() is used. *Auto() is supposed to "figure it out" on what encoding to apply (ANSI, UTF8, Unicode, etc). Only if someone does a specific Marshal.PtrToStringAnsi() will they get legible, correct chars as represented on macOS system.

Note that the mono implementation of .NETFramework 4.6.2 has no issues with this and *Auto() works fine.

I've attached a screenshot highlighting the problem. I have also attached an example project to play with that replicates the problem. Should work out-of-the-box assuming you run this on a macbook. Please see the 'TODO' in the code in case you need to change the uid.

I suspect running this in a Linux environment will have the same problem, but this sample project is specifically set up to call libc.dylib (but if you just replace libc.dylib with libc.so.6 it will probably work)

Screen Shot 2019-03-26 at 12 32 11 PM

forMsft2-auto.tar.gz

@tannergooding
Copy link
Member

*Auto() is supposed to "figure it out" on what encoding to apply

AFAIK, PtrToStringAuto does not do any special logic to detect the encoding of a given string.

This existed in order to support differentiating between Windows 95/98 which were ANSI based and Windows 2000+ which were Unicode based.

The current implementation just forwards to PtrToStringUni and notes that ANSI based platforms are no longer supported: https://source.dot.net/#System.Private.CoreLib/shared/System/Runtime/InteropServices/Marshal.cs,93.

I imagine there are design notes somewhere on why this doesn't default to UTF-8 on Unix platforms, but I'm not sure where those would be.

@jpfreyen
Copy link
Author

jpfreyen commented Mar 26, 2019

Can the documentation be updated to reflect your comments please, esp the comment you highlighted in your code?

From Microsoft's documentation:

https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.marshal.ptrtostringauto?view=netcore-2.2

and this part:

Remarks
If the current platform is Unicode, each ANSI character is widened to a Unicode character and this method calls PtrToStringUni. Otherwise, this method calls PtrToStringAnsi.

I expect that PtrToStringAuto() to detect that macOS is an ANSI based platform and should have called PtrToStringAnsi.

So I have no reason to believe but to conclude that PtrToStringAuto() has a bug based on the documentation.

@tannergooding
Copy link
Member

I expect that PtrToStringAuto() to detect that macOS is an ANSI based platform and should have called PtrToStringAnsi.

I believe the default for Unix (both Linux and modern OSX) is considered to be UTF-8 rather than ANSI or ASCII, so it would likely be more appropriate for it to call PtrToStringUtf8.

@tannergooding
Copy link
Member

@jkotas, do you see any problems with the documentation being updated for PtrToStringAuto?

Also, do you know why it doesn't forward down to PtrToStringUtf8 or PtrToStringAnsi on Unix?

@jpfreyen
Copy link
Author

@tannergooding I tried PtrToStringUtf8(). Doesn't work. Only thing that works is *Ansi. The sample project is there for you to try. So if it PtrToStringUtf8() should be working correctly then that is a bug.

@jkotas
Copy link
Member

jkotas commented Mar 26, 2019

CoreCLR built-in interop treats Unix as UTF16 supporting system and maps "Auto" to UTF16 for this API and number of other places. I do not think we made it intentionally this way. Unix inherited the Windows behavior.

Mono is the opposite way. I would be fine with changing the default on Unix to match Mono. It would be a breaking change, but I think the risk is acceptable. I doubt that anybody is using the "Auto" marshaling with the current default on .NET Core today.

cc @jkoritzinsky @AaronRobinsonMSFT

@jkotas jkotas transferred this issue from dotnet/corefx Mar 26, 2019
@jkotas
Copy link
Member

jkotas commented Mar 26, 2019

https://github.com/dotnet/coreclr/issues/20504 is duplicate of this issue.

@jkotas
Copy link
Member

jkotas commented Mar 26, 2019

I tried PtrToStringUtf8(). Doesn't work.

Could you please share a repro for how PtrToStringUtf8 does not work?

@jpfreyen
Copy link
Author

jpfreyen commented Mar 26, 2019

The sample project is attached to this bug at the beginning, tar.gz.

Slight correction on PtrToStringUTF8(). It does work and build correctly in the sample project I attached on .NETCore 2.2. But it doesn't seem to build right targeting .NETStandard 2.0 and .NETFramework4.6.2 in our real project; I get the error:

"[CS0117] Marshal does not contain a defintion for "PtrToStringUTF8"

Not sure if the sample project I attached will be able to reproduce that problem. Neverless, we cannot use it, we have to use .PtrToStringAnsi() for macOS target.

@jpfreyen
Copy link
Author

CoreCLR built-in interop treats Unix as UTF16 supporting system and maps "Auto" to UTF16 for this API and number of other places. I do not think we made it intentionally this way. Unix inherited the Windows behavior.

Mono is the opposite way. I would be fine with changing the default on Unix to match Mono. It would be a breaking change, but I think the risk is acceptable. I doubt that anybody is using the "Auto" marshaling with the current default on .NET Core today.

cc @jkoritzinsky @AaronRobinsonMSFT

We are catching these things in .NETCore from our mono implementation being ported over to the .NETCore environment. So some "Auto"'s got used in the mono implementation which worked fine in macOS and Linux, and now I'm going through reworking this stuff for .NETCore because its not the same behavior.

@jpfreyen
Copy link
Author

Yes, back on "PtrToStringUTF8". We have a library targeting .NET Standard 2.0.0, at attempt for compliance between .NETFramework and .NETCore. "PtrToStringUTF8" doesn't exist in .NET Standard 2.0.0

@AaronRobinsonMSFT
Copy link
Member

If we are going to change this, it must be done in the 3.0 timeframe.

cc @jeffschwMSFT

@AaronRobinsonMSFT
Copy link
Member

@GrabYourPitchforks as an FYI, it looks like we are going to be defaulting to UTF-8 for Auto marshalling on non-Windows (matching Mono semantics). I doubt this has any impact on your UTF-8 work, but it is at least peripherally in your wheelhouse.

@jpfreyen
Copy link
Author

@AaronRobinsonMSFT Right now that won't work, there needs to be detection of what the OS supports. MacOS only works with PtrToStringAnsi().

@jpfreyen
Copy link
Author

Could I ask another PtrToString*() function and replacement?

Mono had Marshal.PtrToStringArray(). Could you please tell me what would be the equivalent in .NETCore? I have been reading the 'Marshal' .NETCore API documentation but I haven't found anything that tips me off that would be an equivalent call.

Thank you!

@jkotas
Copy link
Member

jkotas commented Mar 29, 2019

Mono had Marshal.PtrToStringArray()

This is a trivial helper function built out of other APIs available in .NET Core. You can either copy the implementation to your project from https://github.com/mono/mono/blob/master/mcs/class/Mono.Posix/Mono.Unix/UnixMarshal.cs, or get it via https://www.nuget.org/packages/Mono.Posix.NETStandard/ NuGet package that contains it as well. The NuGet package contains a ton of other stuff - if you need just this single function from it, copying it over is probably better option.

@jpfreyen
Copy link
Author

@jkotas Thanks for that pointer! I referred to UnixMarshal.cs for an idea.

jkoritzinsky referenced this issue in dotnet/coreclr Apr 3, 2019
Match Mono's behavior by changing the Auto character set to mean UTF-8 on non-Windows platforms (new behavior) and UCS-2/UTF-16 on Windows (current behavior).

Fixes #23464 
Fixes dotnet/corefx#32442

Impact of breaking change: It is highly unlikely that anyone is actively using current behavior since it is inconsistent with Mono and doesn't match any native system APIs on non-Windows platforms (they're all UTF-8 based).

We will need to update our documentation to reflect this updated behavior.
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Interop-coreclr help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests

5 participants