Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Expose POSIX functions #15289

Closed
eerhardt opened this issue Jan 18, 2017 · 25 comments

Comments

@eerhardt
Copy link
Member

@eerhardt eerhardt commented Jan 18, 2017

To allow developers to develop applications on Unix operating systems, .NET Core should contain a library exposing a set of common Unix and Linux APIs.

Rationale

Today, .NET Core programs running on Unix do not have the ability to make calls into operating system functions in a portable manor, without writing native code. There are Unix-specific functions/capabilities that are not exposed in our current set of APIs. For example, chmod and chown are commonly used functions on Unix to set file permissions, but we have no System.IO APIs that expose this functionality.

It is hard to P/Invoke directly into “libc” from managed code because of type-size differences across platforms, and constants/error codes being different across platforms.

Design tenets

A guiding tenet is to keep any additional functionality and policy out of these exposed APIs. Instead, we will simply be exposing the OS function as raw as possible.

  • If deemed necessary, we can create "helper" methods in the future to make the APIs easier to invoke.

One place where we will be adding functionality is to make the .NET APIs work across our supported platforms as seamless as possible.

  • An example here is error numbers may map to different values on different platforms. We will have a facility that will allow mapping raw platform error numbers to platform-independent error names like “EACCES”, “EPERM”, etc.

A naming tenet will be to use the same name and casing as the C API, this includes method, parameter, enum value, and field names. This allows people to consult the appropriate man page if necessary.

  • One exception is type names will still be PascalCased, to make consuming code read like standard managed code. Having a class named “group” would look confusing to most .NET developers.

Example Usages

using System.Unix.Native;

static void Main()
{
    // chmod
    int result = UnixFunctions.chmod(
        "/home/user/hello.txt", 
        FileModes.S_IRWXU | FileModes.S_IRGRP | FileModes.S_IROTH);

    if (result < 0)
    {
        UnixErrorInfo errorInfo = UnixErrorInfo.GetLastErrorInfo();
        if (errorInfo.Error == UnixError.EPERM)
            // handle permission issue
        ...
    }


    // getgrnam_r
    unsafe
    {
        Group group = new Group();
        Group* groupPointer;
        int bufferSize = 1024;
        byte* stackBuffer = stackalloc byte[bufferSize];
        result = UnixFunctions.getgrnam_r("somegroup", ref group, stackBuffer, bufferSize, out groupPointer);
        if (result != 0)
        {
            UnixErrorInfo errorInfo = new UnixErrorInfo(result);
            while (errorInfo.Error == UnixError.ERANGE)
            {
                bufferSize *= 2;
                byte[] heapBuf = new byte[bufferSize];
                fixed (byte* heapBuffer = heapBuf)
                {
                    result = UnixFunctions.getgrnam_r("somegroup", ref group, heapBuffer, bufferSize, out groupPointer);
                    if (result == 0)
                    {
                        break;
                    }

                    errorInfo = new UnixErrorInfo(result);
                }
            }
        }

        if (result != 0)
          // handle permission issue

        if (groupPointer == null)
        {
            // group wasn't found
        }
        else
        {
            string groupName = Marshal.PtrToStringAnsi(group.gr_name);
            string groupPassword = Marshal.PtrToStringAnsi(group.gr_passwd);
            uint groupId = group.gr_gid;

            List<string> groupMembers = new List<string>();
            IntPtr* memberPointer = group.gr_mem;
            IntPtr currentMember = *memberPointer;
            while (currentMember != IntPtr.Zero)
            {
                groupMembers.Add(Marshal.PtrToStringAnsi(currentMember));
                memberPointer++;
                currentMember = *memberPointer;
            }
        }
    }
}

APIs to Expose

The exposed functions will be implemented in phases. To determine the first phase of functions, I have analyzed Mono, Go, and Phython system functions. Phase 1 of this work will include functions that are:

  • Exposed by at least 2 out of the 3 platforms
  • POSIX compliant
  • Not Socket related (since there are already Socket APIs in .NET, these are lower priority)
  • Not Exec related (since there are already Process APIs in .NET)
  • Not Memory Map related (since there are already MemoryMappedFile APIs in .NET)
  • If there is an equivalent “_r” function, don’t expose the corresponding function that returns values that may point to a static area. This way our users don’t try executing non-thread safe functions. Ex. Expose “getgrnam_r()”, but don’t expose “getgrnam()”.

That results in the following list of functions for Phase 1:

  1. access
  2. chdir
  3. chmod
  4. chown
  5. close
  6. closelog
  7. confstr
  8. creat
  9. dup
  10. dup2
  11. faccessat
  12. fchdir
  13. fchmod
  14. fchmodat
  15. fchown
  16. fchownat
  17. fcntl
  18. fdatasync
  19. fpathconf
  20. fstat
  21. fstatvfs
  22. fsync
  23. ftruncate
  24. getcwd
  25. getegid
  26. geteuid
  27. getgid
  28. getgrgid_r
  29. getgrnam_r
  30. getgroups
  31. getlogin_r
  32. getpgid
  33. getpgrp
  34. getpid
  35. getppid
  36. getpriority
  37. getpwnam_r
  38. getpwuid_r
  39. getsid
  40. gettimeofday
  41. getuid
  42. isatty
  43. kill
  44. lchown
  45. link
  46. linkat
  47. lockf
  48. lseek
  49. lstat
  50. mkdir
  51. mkdirat
  52. mkfifo
  53. mknod
  54. mknodat
  55. mlock
  56. mlockall
  57. mprotect
  58. munlock
  59. munlockall
  60. nanosleep
  61. nice
  62. open
  63. openat
  64. openlog
  65. pathconf
  66. pause
  67. pipe
  68. poll
  69. posix_fadvise
  70. posix_fallocate
  71. posix_madvise
  72. pread
  73. pwrite
  74. read
  75. readlink
  76. readv
  77. rename
  78. renameat
  79. rmdir
  80. setegid
  81. seteuid
  82. setgid
  83. setlogmask
  84. setpgid
  85. setpgrp
  86. setpriority
  87. setregid
  88. setreuid
  89. setsid
  90. setuid
  91. stat
  92. statvfs
  93. strerror_r
  94. symlink
  95. symlinkat
  96. sync
  97. sysconf
  98. syslog
  99. tcgetpgrp
  100. tcsetpgrp
  101. time
  102. times
  103. truncate
  104. ttyname_r
  105. umask
  106. uname
  107. unlink
  108. unlinkat
  109. utime
  110. utimes
  111. wait
  112. waitpid
  113. write
  114. writev
@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Jan 18, 2017

Appendix A: Alternate names

Alternate Library/Namespace names :

  • System.Unix
  • System.Runtime.Unix[Runtime]
  • System.Runtime.InteropServices.Unix[Runtime]
  • Unix.Native

Alternate class names :

  • SysCall
  • Syscall
  • SystemCall
  • OS
  • OperatingSystem
  • SystemFunctions
  • OSFunctions

An option we have when introducing Linux-specific functions is to either put them in the same class as the POSIX functions, or group them in to a new class: LinuxFunctions .

@benaadams

This comment has been minimized.

Copy link
Collaborator

@benaadams benaadams commented Jan 18, 2017

If they are only POSIX, then wouldn't a variant on System.Posix make more sense rather than Unix?

Would they intertop with Windows or Windows Subsystem for Linux if available? Or throw PlatformNotSupported?

@danmosemsft

This comment has been minimized.

Copy link
Member

@danmosemsft danmosemsft commented Jan 18, 2017

@eerhardt I assume you've looked at Mono.Posix - is there any value in reusing their API surface? I figure we can't reuse the package, as it calls into their runtime.

@jonpryor

This comment has been minimized.

Copy link

@jonpryor jonpryor commented Jan 18, 2017

@danmosemsft wrote:

we can't reuse the package, as it calls into their runtime.

Mono.Posix.dll calls into MonoPosixHelper, which is just a C library, with no dependencies on the rest of mono. This shouldn't be a problem, IMHO.

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Jan 18, 2017

If they are only POSIX, then wouldn't a variant on System.Posix make more sense rather than Unix?

Only the initial phase of added APIs are in the POSIX standard (according to http://pubs.opengroup.org/onlinepubs/9699919799/nframe.html). I plan on adding Linux-specific APIs as well in the future, as the need arises.

Would they intertop with Windows or Windows Subsystem for Linux if available? Or throw PlatformNotSupported?

I don't think it is a goal to support these APIs on Windows proper. They will throw PlatformNotSupported exceptions.
On Windows Subsystem for Linux: these APIs will work as well as they would work from a "C" library running on Windows Subsystem for Linux.

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Jan 18, 2017

I assume you've looked at Mono.Posix - is there any value in reusing their API surface?

Yes, I've heavily based this proposal off of Mono.Posix and plan to reuse designs/approaches from it as much as possible.

As for reusing their API surface directly:

  1. I assume we don't want a "Mono" namespace in corefx.
  2. Some APIs that Mono exposes I don't feel should be exposed. One example is from the proposal above: getgrnam, which is not a thread-safe method since it returns pointers to static memory.
  3. Mono also exposes Stdlib APIs (fopen, printf), which this proposal doesn't include.
  4. Mono also exposes a lot of "higher-level" and convenience APIs, which I think are out of scope for this initial effort. The Phase 1 is strictly raw, portable APIs without convenience.
  5. Lastly, some of Mono's public APIs are direct "libc" DllImports, which I don't think we want.
@mellinoe

This comment has been minimized.

Copy link
Contributor

@mellinoe mellinoe commented Jan 18, 2017

One part that is missing (IMO) is what our rules for translating types would be. A char* could be marshaled in a variety of different ways, for example. It's not clear which rules you used to create the examples, other than ref and out seemed to be favored over pointer types. If we want as thin a layer as possible, then it makes sense to translate the pointer types directly, but that means consumers have to use ugly unsafe code to interact with otherwise simple calls (like chmod). I think we need some clear rules to clarify.

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Jan 19, 2017

One part that is missing (IMO) is what our rules for translating parameter types would be.

This is a good point. Here is my first "stab" at it, and once we solidify on an approach, I'll add it to the proposal.

For the most part, we will follow the same marshaling strategies employed internally in our existing System.Native shims:

  1. All parameters and return values must have constant sizes across all supported platforms.
  2. Functions that take string parameters will take string on the managed/extern signature and const char* on the native shim.
  3. When appropriate, simple pointer parameters should use ref and out on the managed signature, since this will guarantee the pointers are not garbage. This doesn't hold for all parameters, for example byte* buffers, since callers need to allocate buffers in a number of different ways.
  4. File descriptors should be marshaled as IntPtr or SafeHandle, not int.
  5. off_t is marshaled as int64_t
  6. size_t is marshaled as int32_t

In the example above, I was expecting the Group struct to have IntPtr members, but it could also have byte* members instead. I don't see a clear winner between the two. On the one hand, the thing you want to do with the member is turn it into a string, which is a Marshal call away if it is an IntPtr. But if you have a byte*, you need to first cast to IntPtr to pass it into Marshal. But I can see us going either way. Using pointers by default may make the most sense in the most general cases.

Another thing to note is that while we should have guiding rules and principles, each API will need to make appropriate trade-offs on a case-by-case basis. We should be as consistent across the board as possible, but also allow flexibility for each API to do what is the most correct thing for it.

@jonpryor

This comment has been minimized.

Copy link

@jonpryor jonpryor commented Jan 19, 2017

Regarding const char* marshaling, there is one additional complication: filenames.

Consider open(2):

int open(
        const char *pathname,
        int flags
);

Q: In what encoding is pathname?

A: "What's an encoding?"

Which is to say, the encoding is wholly unspecified, and it's not difficult to create a filesystem entry which may contain invalid UTF-8:

/* What does this create? */
int r = creat ("\xff", 0664);

...and I do mean "may". On macOS Sierra, the above creates a file named %FF, not a filename with bytes { 0xff }.

Update: What's Linux do? For a co-worker of mine in a Japanese locale, ls -lFtr prints:

-rw-r--r-- 1 atsushi atsushi      0  1月 19 13:42 ?

i.e. the filename is a question mark ?. Nautilus says -- correctly -- that the filename has an invalid encoding.

In short, it isn't necessarily sane to assume that filenames retrieved from e.g. readdir(3) are in any known format, and this is a problem doubly faced by System.IO. (I realize that readdir(3) isn't currently mentioned as a binding opportunity, but passing such a "bizarre" byte array to open(2) is certainly permitted by POSIX...)

Mono's System.IO largely ignores this problem, and just assumes that user space is UTF-8, as is good and proper.

Mono.Unix has a FileNameMarshaler which, along with custom marshaling (e.g. on Syscall.open(), allows a degree of supporting arbitrary byte strings in filenames, should one be retrieved from Syscall.readdir().

I'm less certain about what CoreFX should do about this. It's really tempting to just "require" that user space be UTF-8 -- though that also introduces questions around normalization -- and otherwise ignore the problem. Alternatively, it would be Really Handy™ if System.IO "somehow" handled non-UTF-8 filenames, at which point the POSIX functions could follow suit.

@jonpryor

This comment has been minimized.

Copy link

@jonpryor jonpryor commented Jan 19, 2017

Various replies to an earlier comment:

Some APIs that Mono exposes I don't feel should be exposed. One example is from the proposal above: getgrnam, which is not a thread-safe method since it returns pointers to static memory.

While getgrnam(3) returns a pointer into static memory, managed use of Syscall.getgrnam() uses a managed-side lock, so managed use of the API is thread-safe. (If you're running concurrently with native code that also uses getgrnam(3), all bets are obviously off, unfortunately.)

So I'd argue it's a possibly-thread-safe method. Which doesn't necessarily help anything...

Mono also exposes a lot of "higher-level" and convenience APIs, which I think are out of scope for this initial effort. The Phase 1 is strictly raw, portable APIs without convenience.

This makes lots of sense, limiting project scope. That said, considering them as a part of a future project should be done now, to attempt to help with naming and organization. In the original work for Mono.Posix and the "version 2" Mono.Unix, everything was placed into the same namespace, both high-level wrappers and low-level types. This was okay for Mono.Posix, as there are only 14 types in that namespace, but Mono.Posix.Syscall doesn't bind that many methods, certainly not that many methods that require new types.

During development of Mono.Unix, lots of additional methods were bound, which required binding a number of additional types. It was during this process that the Mono.Unix and Mono.Unix.Native "split" occurred, with low-level types in Mono.Unix.Native and higher-level types in the Mono.Unix namespace. The Mono.Unix.Native namespace currently contains 101 types; in such an environment, the "high level" types could get lost in code completion, particularly since there are only 36 higher-level types in the Mono.Unix namespace.

Thus, whether or not high-level types exist is of interest and concern in Phase 1, even if they won't be added until later (if ever). Perhaps the appropriate fix is to "invert" what Mono.Unix does, and have low-level types in a Unix namespace, and higher-level types in a Unix.Whatever sub-namespace, e.g. Unix.IO.UnixStream.

Whatever is done, we should keep the larger picture in mind.

File descriptors should be marshaled as IntPtr or SafeHandle, not int.

I'm not quite sure I understand the logic here. File descriptors are ints in POSIX. Not variable-sized integers like size_t or off_t, but plain old C ints. (Which could still change in size -- there are ILP64 platforms, after all, but those are crazy ;-).

"Convention" aside, such a choice/requirement means that close(2) can't be "just" a P/Invoke into libc.so!close, it would have to be into an internal library function. This is because on LP64 platforms (Linux, macOS), close(2) will be expecting a 32-bit int. IntPtr and SafeHandle, on the other hand, will marshal as 64-bit values. The result is a parameter misalignment, the only fix for which is to instead go through a C intermediary:

int
corefx_close (void *fd)
{
    return close ((int) fd);
}

(The above is certainly true for IntPtr. I believe it to be true for SafeHandle, because I'm not aware of a way to tell SafeHandle to marshal as a 32-bit value, but I might be unaware of something...)

size_t is marshaled as int32_t

...what?!

I do not understand why size_t should be a 32-bit value on 64-bit platforms, particularly when it's a 64-bit value on 64-bit platforms.

Additionally, size_t is an unsigned value, so binding as a signed value also seems odd.

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Jan 19, 2017

Regarding const char* marshaling, there is one additional complication: filenames.
...encoding

.NET Core's System.IO also ignores this problem. On the managed side, the DllImport uses string and const char* on the native C side.
My thought is that using string on the managed side supports 90+% of the use cases. If this ends up being an issue, we can add overloads that take byte* filename on the managed side, and the caller can do whatever encoding is appropriate.
But maybe this is breaking the "keep any additional functionality and policy out of these exposed APIs" tenet and the byte* overloads should be the first ones added. Then we can also create the string overloads that encode the string to UTF8 (or whatever the appropriate platform encoding is) and pass the byte[] into the raw API.

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Jan 19, 2017

While getgrnam(3) returns a pointer into static memory, managed use of Syscall.getgrnam() uses a managed-side lock, so managed use of the API is thread-safe.

My opinion is getgrnam_r provides for all the same use cases. So unless there is a scenario that can only be achieved by calling getgrnam, we shouldn't expose the API. We can add it later, if deemed necessary. But if we add it now, it can't be taken away.

This makes lots of sense, limiting project scope. That said, considering them as a part of a future project should be done now, to attempt to help with naming and organization.
Whatever is done, we should keep the larger picture in mind.

100% agreed. My current thinking is these low-level, native APIs go in System.Unix.Native, which leaves open the possibility of putting higher-level APIs in System.Unix or in a sibling namespace System.Unix.Whatever. I think it provides us with a lot of flexibility in the future, and makes sense for right now.

I'm not quite sure I understand the logic here. File descriptors are ints in POSIX. Not variable-sized integers like size_t or off_t, but plain old C ints. (Which could still change in size -- there are ILP64 platforms, after all, but those are crazy ;-).
"Convention" aside, such a choice/requirement means that close(2) can't be "just" a P/Invoke into libc.so!close, it would have to be into an internal library function. This is because on LP64 platforms (Linux, macOS), close(2) will be expecting a 32-bit int. IntPtr and SafeHandle, on the other hand, will marshal as 64-bit values.

See #2904 and #4791. Basically it assists in SafeHandle usages. If we used int32_t, managed code would have to do a lot of gymnastics in order to use SafeHandle and call these methjods, i.e. .DangerousGetHandle().ToInt32().

The result is a parameter misalignment, the only fix for which is to instead go through a C intermediary:

Yes, in .NET Core all of our interop with native Unix functions goes through a C intermediary, for example System.Native.so|dylib. We only P/Invoke directly into Windows APIs and OSX APIs that are guaranteed to have a consistent binary interface across all versions we support.

size_t is marshaled as int32_t
...what?!

Oops, this was a mistake to make a blanket statement. I had quickly looked at a couple of our internal interop calls and assumed that was the rule we used when first creating the shims. The rules we used are located at https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/interop-guidelines.md#unix-shims, which say "it is most convenient to line up with the managed int as int32_t (e.g. scratch buffer size for read/write), but sometimes we need to handle huge sizes (e.g. memory mapped files) and therefore use uint64_t.".

Maybe to be as consistent as possible across these APIs, we should just always map size_t to a 64-bit integer.

Additionally, size_t is an unsigned value, so binding as a signed value also seems odd.

I really struggle on what to do here because IMO the boat has sailed in .NET for using signed values for lengths/indexes into collections. If these APIs use an unsigned integer, callers will always need to cast when passing in array.Length, and other usages of int in their code.

Thoughts? Opinions? Should these be unsigned or signed?

@jonpryor

This comment has been minimized.

Copy link

@jonpryor jonpryor commented Jan 19, 2017

My opinion is getgrnam_r provides for all the same use cases. So unless there is a scenario that can only be achieved by calling getgrnam, we shouldn't expose the API.

The only reason that comes to mind, and the reason I bound getgrnam() in Mono.Unix, is that it's plausible that some random Unix platform you're trying to run on won't support the getgrnam_r() variant. (This is arguably a stupid reason, and maybe one should instead get a more recent OS, but that was the inane rationale. ;-)

Thoughts? Opinions? Should these be unsigned or signed?

Mono.Unix.Native has been using ulong for size_t for years, and I haven't heard anyone complain about the need to cast. I am thus of the opinion that it largely doesn't matter; the POSIX-documented API is for unsigned types, so the binding should follow suit, IMHO.

@jonpryor

This comment has been minimized.

Copy link

@jonpryor jonpryor commented Jan 19, 2017

But maybe this is breaking the "keep any additional functionality and policy out of these exposed APIs" tenet and the byte* overloads should be the first ones added.

The problem with byte* overloads is that they will be painful to use. You either have an extra Marshal.StringToHGlobalAnsi() + Marshal.FreeHGlobal() around everything dealing with paths, plus the required try/catch to avoid memory leaks in case of exceptions...or you stick with strings and have this filename marshaling corner case.

...or go with the Mono.Unix approach of a custom marshaler, but that certainly adds additional runtime overhead. It Works™, and allows you to have an easier-to-use string-based API, but it's not without it's issues.

"Better", arguably, would be to implement a solution that System.IO uses, but Mono hasn't done that either, so I'm not holding out much hope of that happening...

There's also the question of whether it's a scenario worth worrying about.

@migueldeicaza

This comment has been minimized.

Copy link
Member

@migueldeicaza migueldeicaza commented Jan 20, 2017

There is no reason to create a new API when a perfectly working, debugged, battle-tested version exists already in the form of Mono.Posix.

There are no good reasons to not use the existing Mono.Posix:

I assume we don't want a "Mono" namespace in corefx.

Not worth creating a different API for that reason.

Some APIs that Mono exposes I don't feel should be exposed. One example is from the proposal above: getgrnam, which is not a thread-safe method since it returns pointers to static memory.

It does not hurt anyone. Nothing prevents us from using [Obsolete] to give better guidance to our users.

Mono also exposes Stdlib APIs (fopen, printf), which this proposal doesn't include.

Not a reason to not bring it. A year from now, you will realize "Shoot, those would be useful, guess we should bind them".

You talk about this being "Phase 1". The problem is that you do not have a roadmap or understanding for what "Phase N" will be. So we can dismiss this.

Mono also exposes a lot of "higher-level" and convenience APIs, which I think are out of scope for this initial effort. The Phase 1 is strictly raw, portable APIs without convenience.

This is because you have a narrow view of the problem, summarized in this incremental approach at designing the API that you call "Phase 1". You are trying to bind APIs blindly and you think that this will get the job done. But it won't, it is only the start of a process. Once the API is out there, you will get real world feedback, you will come to the same conclusions that we did - except that you would have created a fork in the road, and slowed down everyone in the process by delivering your currently sub-standard API instead of embracing the battle tested implementation.

Higher-level convenience APIs that are there because they addressed real-world use cases and reduced user bugs. Use cases that exist from almost 15 years of building Unix applications with .NET, a space that you are only entering now.

You are not forced to use them if you do not want them.

Lastly, some of Mono's public APIs are direct "libc" DllImports, which I don't think we want.

This is good guidance for newcomers and people that are not familiar with the domain.

Let me explain why this is not a problem.

Let us start with the basics. This is an implementation detail, so if there is a problem with this approach, it can trivially be changed without affecting the API surface.

The core issue with P/Invoke into libc is whether you can express the ABI requirements of calling a function. This happens in particular with functions like stat that take a struct stat buffer which is platform, architecture and ABI specific. That is why you can not P/Invoke stat directly it is not possible to express in managed code what the underlying data structure will be at runtime.

On the other hand we have APIs like write, whose signature can be perfectly be expressed with P/Invoke. We have yet to find a platform where Mono.Posix signatures for invoking libc methods fail, and Mono runs on many more platforms, more architectures and more ABIs than .NET Core does, so we believe that our current use of P/Invoking into libc works.

If the day comes where we find a platform where P/Invoking a specific method of libc does not work, we would change the implementation and it would not affect the API surface, nor 99.9% of the existing implementation.

We are deeply familiar with this problem - after all, it was my team that provided the guidance to the .NET Core team on why they should not P/Invoke into libc directly for a series of APIs on the early days of the port. Work that had to be done later on, as the team discovered the very problem that we had described to them.

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Feb 2, 2017

@migueldeicaza - you have me convinced. I agree that we should port Mono.Posix to .NET Core instead of inventing a new API that does the same things, but not covering all the same scenarios.

I took a first stab at building it for .NET Core here: https://github.com/eerhardt/mono/commits/MonoPosixNETCore. I've run a few simple functions on .NET Core 2.0 on my mac, which worked. I also hit a bug in .NET Core 2.0: dotnet/coreclr#9287, which needs to be fixed.

Overall, I think an approach we can take would work like this:

  1. Keep the code building and maintained in the mono repo. That way everyone gets new bug fixes/features/etc, and we don't have to maintain a fork of the code.
  2. The mono repo would build a NuGet package that is published on nuget.org (and on myget.org while it is pre-release).
  3. The NuGet package would contain both the managed and native assets necessary for the code to run on .NET Core 2.0.

This NuGet package would be in the same category as the Newtonsoft.Json or Microsoft.Extensions.Configuration packages. They don't ship in the .NET Core runtime, but they are supported by Microsoft.

All the APIs of Mono.Posix that work on .NET Core 2.0 would be available.

If we agree on this overall approach, I can close out this issue and open a new issue on the mono repo to start the work of building this new NuGet package.

@migueldeicaza

This comment has been minimized.

Copy link
Member

@migueldeicaza migueldeicaza commented Feb 9, 2017

Hello @eerhardt

This is music to my ears. I do like the approach.

One thing that we have done in the past for some pieces of Mono code is to extract them into a separate top-level module (and we have a script that extracts the code with the complete Git history) and then we have taken a dependency from Mono into the module, and .NET core takes a dependency into that module as well.

We can start with your suggestion today, but we can also extract the code (it will take us just a couple of days to get it done).

@jnm2

This comment has been minimized.

Copy link
Collaborator

@jnm2 jnm2 commented Feb 11, 2017

  1. The NuGet package would contain both the managed and native assets necessary for the code to run on .NET Core 2.0.

I would be fascinated to learn how this works. I have been unable to achieve this with NuGet v3+. NuGet/Home#3970

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Feb 14, 2017

@jnm2 - The classic example is the libuv package, which is a "fat" package that contains native assets for all the runtimes it supports. See http://www.nuget.org/packages/Libuv/1.9.1 for the package, https://github.com/aspnet/libuv-build for building the source code, and then https://github.com/aspnet/libuv-package for building the package.

You would just also add in the managed IL assembly into the same package.

I'm unsure on a "clean and easy" way of enabling this using MSBuild or the dotnet pack command, but I've built packages like this in the past using a .nuspec file and using nuget.exe to pack the .nuspec file. Here's a snippet of the nuspec file:

  <files>
    <file src="MyRidLib/bin/Debug/netstandard1.3/MyRidLib.dll" target="ref\netstandard1.3\" />
    <file src="PathToWin7Native/NativeAssembly.dll" target="runtimes\win7-x64\native" />
    <file src="PathToWin10Native/NativeAssembly.dll" target="runtimes\win10-x64\native" />
  </files>

And just replace the win7-x64 and win10-x64 folders with the appropriate RIDs.

@jnm2

This comment has been minimized.

Copy link
Collaborator

@jnm2 jnm2 commented Feb 18, 2017

I'm relieved that I finally have something that works! .nuspec is perfect.

I have some complaints.

1. Docs

I can't find the magic strings runtimes\win\native, runtimes\win-x86\native and runtimes\win-x64\native documented anywhere, so the following is figured out by trial and error. Docs would be nice.

2. Concerning NuGet v2 (net462 .csproj + packages.config)

NuGet v2 (packages.config) projects will not copy the native asset to the build output at all. Not a problem because everyone has moved to NuGet v3 (.csproj + project.json) and soon v4, but I want a way to fail the package installation for NuGet v2 projects. minClientVersion doesn't have an effect, because the client version is higher than 2 even though it's operating on a .csproj using v2 API (packages.config).

3. Concerning NuGet v3 (net462 .csproj + project.json)

(To clarify, when I say project.json, I'm not talking about .NET Core. I'm talking about a traditional net462 .csproj + project.json.)

Ideally I'd use runtimes\win7-x86\native because that's what the binary is, or runtimes\win7\native because it's not loaded in-process so it doesn't matter what the consuming application's architecture is. The binary only supports Windows 7+.
However, this is not possible. In order for the native dll to appear in the build output, I must use the following three magic strings:

  • runtimes\win\native (for AnyCPU .NET apps)
  • runtimes\win-x86\native (for x86 .NET apps)
  • runtimes\win-x64\native (for x64 .NET apps)

This means that the same file is duplicated three times in the nupkg just in order to show up in the build output of .NET apps of each of the three types. I don't like the duplication, but mostly what I don't like is the fact that there's no way to constrain it to win7- if I do, it won't show up in any NuGet v3 project build output. .NET forces the project.json runtime to be one of win, win-x86, or win-x64 and fails the build if you try to use win7, and won't copy a native win7 DLL if project.json has win.

I would expect something hierarchical- anything in runtimes\win\native should get picked up by win*-* NuGet v3 projects.

4. Concerning NuGet v4 (net462 .csproj, CPS msbuild)

If I use the <RuntimeIdentifier> win, win-x86 or win7-x64, everything works properly.
But as soon as I specify win7, which I can't do in a NuGet v3 project but I can in a NuGet v4 project, the native asset is no longer copied to the build output. I have to add the DLL again for win10, apparently it's not good enough to add win7.

Now the nice thing is that unlike NuGet v3, NuGet v4 seems to let me skip adding the -x86 and -x64. Since NuGet v3 can't target win7 anyway, no point in adding win7-x86 because NuGet v4 grabs win7 for win7-x86 (as it should).

This means, in order to work properly with NuGet v4 projects, I must package the same DLL seven times at least, and I'm still probably missing something important:

  • runtimes\win\native
  • runtimes\win-x86\native
  • runtimes\win-x64\native
  • runtimes\win7\native
  • runtimes\win8\native
  • runtimes\win81\native
  • runtimes\win10\native

Something seems broken here.

In each of these scenarios, it seems that either I'm missing crucial documentation, or the design has not really been thought through yet. (If the design is still in progress, I'd appreciate knowing which issues to subscribe to.)


So I'm happy that I can hack together a nuspec that works. Is there any way to ease the pain?

@terrajobst terrajobst modified the milestones: 2.0.0, Future Feb 21, 2017
@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Mar 1, 2017

Docs

There are some "runtimes" sections here: https://docs.microsoft.com/en-us/nuget/create-packages/project-json-and-uwp that describe this feature when it was added in NuGet v3.

Another good doc on RIDs is here: https://github.com/dotnet/corefx/blob/master/pkg/Microsoft.NETCore.Platforms/readme.md

These docs should help solve your problems for NuGet v4. There is a "fallback" strategy as outlined in the "How does NuGet use RIDs?" section of the 2nd document.

Feel free to log issues in https://github.com/nuget/Home for things that don't work.

@Drawaes

This comment has been minimized.

Copy link
Collaborator

@Drawaes Drawaes commented Apr 18, 2017

I am wondering if this API shouldn't be cast in stone yet. There is discussion going on around native int And other native data types and if that goes through should this API use those types rather than fixing in cases like

File descriptors should be marshaled as IntPtr or SafeHandle, not int.
off_t is marshaled as int64_t
size_t is marshaled as int32_t

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Apr 18, 2017

The current plan is to package Mono.Posix into a NuGet package that is compatible with .NET Core 2.0. This would involve using the existing APIs exposed by this library.

My understanding is @bholmes is working on this. @bholmes - any update?

@bholmes

This comment has been minimized.

Copy link

@bholmes bholmes commented May 4, 2017

Yes there is some progress. There is a pull request in Mono that has the changes to Mono.Posix that are needed. Next I have a Mono.Posix folder in a fork that will handle build and test scripts for .Net Core 2.0. Finally here is the Jenkins location where the package is built. You can grab a nupkg there if you want to test.

@eerhardt

This comment has been minimized.

Copy link
Member Author

@eerhardt eerhardt commented Jun 21, 2017

I'm going to close out this issue, since this proposal is no longer relevant.

We now have a version of Mono.Posix that works on .NET Core 2.0 published to https://www.nuget.org/packages/Mono.Posix.NETStandard/1.0.0-beta1. Anyone who is interested, please try it out and give feedback.

Thanks, @bholmes for all the work you and your team did to making it easier to call these common Unix and Linux APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.