Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide Marshal.Alloc api that accepts alignment argument #33244

Closed
tmds opened this issue Mar 5, 2020 · 29 comments · Fixed by #54006
Closed

Provide Marshal.Alloc api that accepts alignment argument #33244

tmds opened this issue Mar 5, 2020 · 29 comments · Fixed by #54006
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Milestone

Comments

@tmds
Copy link
Member

tmds commented Mar 5, 2020

API proposal

namespace System.Runtime.InteropServices
{
    public static partial class Marshal
    {
          // Existing APIs
          public static System.IntPtr AllocCoTaskMem(int cb);
          public static System.IntPtr AllocHGlobal(int cb);
          public static System.IntPtr AllocHGlobal(System.IntPtr cb);
          public static void FreeCoTaskMem(System.IntPtr ptr);
          public static void FreeHGlobal(System.IntPtr hglobal);
          public static System.IntPtr ReAllocCoTaskMem(System.IntPtr pv, int cb);
          public static System.IntPtr ReAllocHGlobal(System.IntPtr pv, System.IntPtr cb);
+        // Portable wrapper for C-runtime malloc
+        public static IntPtr AllocMemory(nint byteCount);
+        // Portable wrapper for C-runtime realloc
+        public static IntPtr ReAllocMemory(IntPtr ptr, nint byteCount);
+        // Portable wrapper for C-runtime free
+        public static void FreeMemory(IntPtr ptr);
+        // Portable wrappers for equivalent C-runtime method (Windows: _aligned_malloc, Posix: posix_memalign)
+        public static IntPtr AllocAlignedMemory(nint byteCount, int alignment);
+        public static void FreeAlignedMemory(IntPtr ptr);
    }
}

Rationale

The alignment returned by Marshal.AllocHGlobal may differ based on machine architecture and OS, see #33228. Some native APIs require specific memory alignment, some operations offer much better performance when the memory is aligned.

Example

IntPtr alignedAddr = Marshal.AllocAlignedMemory(16 * 32, 16);
Marshal.FreeAlignedMemory(alignedAddr);

Implementation

On posix platform, this may be implemented using posix_memalign.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-Meta untriaged New issue has not been triaged by the area owner labels Mar 5, 2020
@jkotas
Copy link
Member

jkotas commented Mar 5, 2020

This needs to be a new pair of alloc / free APIs to make it cross-platform. There is no away to allocate aligned HGlobal memory on Windows. These APIs need to map to _aligned_malloc / _aligned_free on Windows. The names of the new APIs should not contain HGlobal.

@jkotas jkotas added area-System.Runtime.InteropServices api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed area-Meta labels Mar 5, 2020
@jeffschwMSFT jeffschwMSFT added this to the Future milestone Mar 5, 2020
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 5, 2020
@scalablecory
Copy link
Contributor

scalablecory commented Mar 6, 2020

Do you explicitly need an HGLOBAL (or whatever you get on *nix, presumably just void* from malloc)?

If not, and a pinnable array would work, then check out @Maoni0's coming APIs #27146

@jkotas
Copy link
Member

jkotas commented Mar 6, 2020

The downside of pinnable heap is that it does not provide explicit lifetime control, so it is not suitable for large buffers where you cannot depend on GC for lifetime control.

If we were to implement this API, I think we would make it a thin wrapper over AllocHGlobal and not depend on posix_memalign at all.

public static IntPtr AllocAlignedMemory(int cb, int alignment)
{
    if (alignment is not power of two) throw new ArgumentException();

    IntPtr block = AllocHGlobal(checked(cb + sizeof(IntPtr) + (alignment - 1)));

    // Align the pointer
    IntPtr aligned = (IntPtr)((nint)(block + sizeof(IntPtr) + (alignment - 1)) & ~(alignment - 1));

    // Store the pointer to the memory block to free right before the aligned pointer 
    *(((IntPtr*)aligned) - 1) = block;

    return aligned;
}

public static IntPtr FreeAlignedMemory(IntPtr p)
{
    if (p != IntPtr.Zero) FreeHGlobal(*(((IntPtr*)p) - 1));
}

@tmds tmds changed the title Provide overload for Marshal.AllocHGlobal that accepts alignment argument Provide Marshal.Alloc api that accepts alignment argument Mar 10, 2020
@tmds
Copy link
Member Author

tmds commented Mar 10, 2020

I've removed HGlobal from the name, and added a Free method also.

@EatonZ
Copy link
Contributor

EatonZ commented May 16, 2020

+1 for this, would come in handy when working with FILE_FLAG_NO_BUFFERING.

@sunkin351
Copy link

+1 from me as well, an overload of void* Alloc(int cb) would also be nice for just calling into the CRT's normal malloc() implementation.

@adamsitnik
Copy link
Member

adamsitnik commented Apr 28, 2021

this would help a lot with #27408 (which blocks #24847)

@tmds is there any chance that you would be interested in implementing that (assuming that I would get the API approved?)

@jkotas
Copy link
Member

jkotas commented Apr 28, 2021

@adamsitnik We need to get the API shape finalized first - see #33244 (comment).

The implementation itself is trivial. I have shared the portable version in #33244 (comment) . Optionally, we can also have a more optimized implementation for platforms where posix_memalign is available.

@tannergooding
Copy link
Member

I'd really rather we didn't implement this API in terms of AllocHGlobal, even just P/Invoking to malloc would be better (but calling _aligned_malloc or posix_memalign is better).

AllocHGlobal is very slow and calls into an API that is no longer recommend for use on Windows. When it was compared against my mimalloc port, which is almost entirely managed code it was nearly 20x slower (https://twitter.com/tannergooding/status/1365714307145629697?s=20):
image

@jkotas
Copy link
Member

jkotas commented Apr 28, 2021

I won't disagree that a fast explicit lifetime allocator would be nice. It is what the POH and ArrayPool should really be.

@Maoni0
Copy link
Member

Maoni0 commented Apr 30, 2021

POH was designed to solve a specific category of problems and being "a fast explicit lifetime allocator" was never one of its goals.

a good way for us to move forward is to illustrate the current problems with ArrayPool and what behavior you want it to have with usage examples. then we can talk about whether a GC mechanism is needed. I will open an issue for us to discuss that.

@Maoni0
Copy link
Member

Maoni0 commented Apr 30, 2021

I opened #52098 to track this discussion

@adamsitnik adamsitnik added the api-ready-for-review API is ready for review, it is NOT ready for implementation label May 5, 2021
@adamsitnik adamsitnik self-assigned this May 5, 2021
@adamsitnik adamsitnik removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 5, 2021
@adamsitnik adamsitnik modified the milestones: Future, 6.0.0 May 5, 2021
@adamsitnik
Copy link
Member

We need to get the API shape finalized first

@jkotas I've updated the proposal and marked it as ready for review. PTAL

When it comes to implementaiton it has to be simply as fast as possible because it would be used mostly for high perf scenarios.

@jkotas
Copy link
Member

jkotas commented May 5, 2021

I have edited the proposal above some more:

  • Added existing methods for reference
  • We should add wrappers for simple malloc/free/realloc at the same time. It will allow us to address the AllocHGlobal overhead on Windows. Also, it looks odd to use APIs with Windows-specific names like AllocHGlobal on non-Windows.
  • The byteCount should be native int

Some questions up for discussion:

  • Should the new APIs return void* instead of IntPtr? We have been leaning towards using actual pointers for pointers to buffers.
  • Should the new APIs take the buffer size as nuint to match the underlying unmanaged API?
  • Existing APIs use Mem abbreviation. Do we want to match it (Marshal.AllocMem/FreeMem)? Or should we omit Mem completely (Marshal.Alloc/Free)?

@tannergooding
Copy link
Member

Some questions up for discussion:

I'd like us to:

  • return void*
  • take nuint, because that's what the C APIs take/it avoids questions about negative inputs
  • omit Mem (Alloc or Allocate)

Would it be worth just having these on Unsafe and not on Marshal? Or potentially somewhere else?
They aren't really related or restricted to marshalling, they are just unmanaged memory allocation APIs and I'd bet they'll get usage in a few other scenarios as well.

Note on the below: I'm definitely not saying we need or should expose all of these, this is just giving an example that this is quite a "varied" space and there is room to do quite a bit here just from what POSIX/Windows expose.

For example, the set of "standard" APIs that mimalloc (https://github.com/microsoft/mimalloc/blob/master/include/mimalloc.h) provides is:

  • malloc - allocate memory
  • calloc - allocate array and clear to zero
  • realloc - reallocate memory and don't free previous allocation
  • expand - expand allocated memory, if space exists
  • free - free memory

It also provides some "extended" APIs:

  • malloc_small - allocate less than 128 * sizeof(void*) bytes
  • zalloc - allocate memory and clear to zero
  • zalloc_small
  • mallocn - allocate array
  • reallocn - reallocate array and don't free previous allocation
  • reallocf - reallocate memory and free previous allocation

It also provides some _aligned and aligned_at variants of the above (except for expand and free).
It also provides variants of the above that take an explicit heap.
It also provides variants of the above that explicitly zero the memory.
It also provides templated variants of the above that return T* rather than void*

It also provides some POSIX/Windows functions, such as:

  • malloc_size
  • malloc_usable_size
  • posix_memalign
  • memalign
  • valloc
  • pvalloc
  • aligned_alloc
  • reallocarray
  • free_size
  • free_size_aligned
  • free_aligned
  • new - compatible with C++ new

@GrabYourPitchforks
Copy link
Member

Agree with Tanner's comments about preferring to take nuint instead of nint.

We're not explicitly stating that these are equivalent to calling malloc, free, or any other CRT-provided API; or HeapAlloc or any other OS-provided API, correct? So calling AllocMemory in C# and free in C would be undefined behavior. (We might not even be compiled against the same CRT.) The contract would be that AllocMemory must always be paired with FreeMemory, and the consumer can't rely on the implementation.

@jkotas
Copy link
Member

jkotas commented May 7, 2021

We're not explicitly stating that these are equivalent to calling malloc, free, or any other CRT-provided API;

Why not? I have explicitly stated it in the comments in my edit of the proposal. I do not see enough value in .NET ever providing private implementation of malloc/free in the core platform.

@tannergooding
Copy link
Member

I'd agree here. I think documenting ourselves as calling malloc and free is fine. The same goes for APIs like _aligned_malloc, _aligned_free or the POSIX APIs, and other variants that are considered part of the platform's C/Core runtime
Such APIs provide minimal guarantees and are open to being changed in the future, because the CRT changes them from time to time as well.

I think its only problematic for us to tie ourselves to actual platform APIs, like LocalAlloc, GlobalAlloc, or HeapAlloc. Because these APIs tend to have much stricter guarantees/semantics and don't get changed.
This is the problem with AllocHGlobal and why we can't save 15% or more by switching to HeapAlloc (that was a bad pun, I'm sorry)

@GrabYourPitchforks
Copy link
Member

Why not? I have explicitly stated it in the comments in my edit of the proposal. I do not see enough value in .NET ever providing private implementation of malloc/free in the core platform.

I don't care if we wrap malloc under the covers as an implementation detail. I meant that we shouldn't make any statements that developers are able to call AllocMemory in C#, pass that to their C code, and call free in their own CRT. (Or vice versa.) That seems error-prone.

@jkotas
Copy link
Member

jkotas commented May 7, 2021

We can document that this does not work with statically linker CRT or legacy SxS CRTs on Windows.

Otherwise, I think it would be fine to document that this wraps C malloc/free. There is one-and-only CRT per process in any recent OS. It is not unusual for library APIs to depend on it, in particular on Unix.

@bartonjs
Copy link
Member

bartonjs commented May 18, 2021

Video

  • There was an argument that Alloc methods on Marshal should return IntPtr. So we made a new type to allow it to use void* and nuint.
  • We got rid of all of the "Memory" words, since that's part of the type name.
  • We aligned with C runtime names (Alloc => Malloc, AllocAligned => AlignedAlloc)
  • We added Calloc, because why not?
namespace System.Runtime.InteropServices
{
    public static class NativeMemory
    {
        public static unsafe void* Calloc(nuint elementCount, nuint elementSize);
        public static unsafe void* Malloc(nuint byteCount);
        public static unsafe void* Realloc(void* ptr, nuint byteCount);
        public static unsafe void Free(void* ptr);

        public static unsafe void* AlignedAlloc(nuint alignment, nuint byteCount);

        // This may not be needed, if all of our runtimes guarantee that AlignedAlloc can be passed straight to Free.
        public static unsafe void AlignedFree(void* ptr);
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 18, 2021
@jkotas
Copy link
Member

jkotas commented May 18, 2021

Are the APIs going to throw OOM or return NULL on failure? It would make sense for them to throw - everybody would have to check for null otherwise, but then it is confusing that they use the C names since the C functions are non-throwing.

We added Calloc, because why not?

The reason for why not - Calloc tends to be a pit of failure from performance point of view since it has to return zero-filled memory.

@GrabYourPitchforks
Copy link
Member

It would make sense for them to throw

Agree. They should throw rather than return nullptr.

then it is confusing that they use the C names since the C functions are non-throwing.

I don't think this will be an issue. The names indicate "this is a thin wrapper around the CRT functions." Adding minimal .NET-specific policy (like throwing instead of returning nullptr) shouldn't be a dealbreaker even if we keep these names.

Calloc tends to be a pit of failure from performance point of view since it has to return zero-filled memory

Wouldn't perf-oriented people naturally tend to shy away from calloc? The scenarios where I've used it are scenarios where I needed the zero-init guarantees.

@jkotas
Copy link
Member

jkotas commented May 18, 2021

Adding minimal .NET-specific policy (like throwing instead of returning nullptr) shouldn't be a dealbreaker even if we keep these names.

If it has mimimal .NET-specific policy, it should also have mimimal .NET-specific name. I would call it Alloc to avoid this confusion. It is also better looking name.

Wouldn't perf-oriented people naturally tend to shy away from calloc?

You need to read fine print in the manual to understand the differences. Many people see callloc as something that will conviently multiple two numbers for them, without realizing that it also comes with very non-trivial performance overhead.

@bartonjs
Copy link
Member

Many people see callloc as something that will conviently multiple two numbers for them, without realizing that it also comes with very non-trivial performance overhead.

I've always known it as "clear-alloc", so the multiplying thing confuses me, since it hides my intent of cleared memory. (So I just pass 1 for elsize)

@GrabYourPitchforks
Copy link
Member

If it has mimimal .NET-specific policy, it should also have mimimal .NET-specific name. I would call it Alloc to avoid this confusion. It is also better looking name.

I'm fine with this suggestion. We gave the hardware intrinsics .NET-friendly names, after all. If we rename 'Malloc', we should also rename 'Calloc'. I'm otherwise fine with 'AlignedAlloc' and 'Free'.

@jkotas
Copy link
Member

jkotas commented May 18, 2021

I've always known it as "clear-alloc

Other popular calloc mnemotechnics are "contiguous allocation" or "allocate memory for given count"

Do we really need calloc equivalent? I think it has marginal value. If we do need it, how about AllocZeroInitialized(nuint byteCount)?

@tannergooding
Copy link
Member

Do we really need calloc equivalent

Not strictly speaking, but it rounds out the exposed APIs to match those in C11 and C++17.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 10, 2021
@tannergooding
Copy link
Member

tannergooding commented Jun 16, 2021

As per #54006 the API surface has changed somewhat and was approved via an internal e-mail to @dotnet/fxdc

namespace System.Runtime.InteropServices
{
    public static class NativeMemory
    {
        public static unsafe void* Alloc(nuint byteCount);
        public static unsafe void* Alloc(nuint elementCount, nuint elementSize);

        public static unsafe void* AllocZeroed(nuint byteCount);
        public static unsafe void* AllocZeroed(nuint elementCount, nuint elementSize);

        public static unsafe void* Realloc(void* ptr, nuint byteCount);

        public static unsafe void Free(void* ptr);

        public static unsafe void* AlignedAlloc(nuint byteCount, nuint alignment);
        public static unsafe void AlignedFree(void* ptr);
        public static unsafe void* AlignedRealloc(void* ptr, nuint byteCount, nuint alignment);
    }
}

In particular:

  • Friendly .NET names are preferred (Calloc -> AllocZeroed, Malloc->Alloc)
  • Overloads that take byteCount and elementCount, elementSize were added to reach parity
  • Parameter order for AlignedAlloc was swapped
  • AlignedRealloc will be exposed since calling Realloc is not viable

Try* APIs were also raised but will wait for additional feedback before being exposed.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 18, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Projects
None yet
Development

Successfully merging a pull request may close this issue.