Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a generic API to read from and write to a pointer #16143

Closed
terrajobst opened this issue Jan 15, 2016 · 59 comments
Closed

Provide a generic API to read from and write to a pointer #16143

terrajobst opened this issue Jan 15, 2016 · 59 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Milestone

Comments

@terrajobst
Copy link
Member

C# doesn't support pointer operations on generic types. We can, however, express them in IL. A prototype could like this:

public static class Unsafe
{
      [MethodImpl(MethodImplOptions.AggressiveInlining)]
      public static T Read<T>(void * p)
      {
          // Can't be expressed in C#, thus in IL:
          ldarg.0
          ldobj !!T
          ret
      }

      [MethodImpl(MethodImplOptions.AggressiveInlining)]
      public static void Write<T>(void * p, T value)
      {
          // Can't be expressed in C#, thus in IL:
          ldarg.0
          ldarg.1
          stobj !!T
          ret
      }
}

This would also solve issues like #16026 without any additional work.

@nietras
Copy link
Contributor

nietras commented Jan 17, 2016

I assume the discussion should continue here. Since I can't wait to be able to use this I created an implementation of this and published it as a NuGet package called DotNetCross.Memory.Unsafe this contains just the static class Unsafe and the proposed functions. This package has been marked alpha so be sure to include prereleases if you want to check this out. I haven't had time to add unit tests and tried doing some benchmark tests but BenchmarkDotNet does not work out of the box for this.

I have constrained the type to be a value type, as compared to @jkotas code and the proposal above, so the definition is:

public static class Unsafe
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static unsafe T Read<T>(void* p) where T : struct

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static unsafe void Write<T>(void* p, T value)  where T : struct
    }

I assume this shouldn't work for reference types. I also have not changed the order of parameters for the Write method, although I would prefer it to be Write<T>(T value, void* p), as it seems this was chosen to match existing behaviour and the IL, so it might be more efficient this way (maybe?).

The implementation in IL, which I did by hand from a normal class library, can be found at https://github.com/DotNetCross/Memory.Unsafe/blob/master/il/DotNetCross.Memory.Unsafe.il which contains the following definition of the Unsafe class:

.class public abstract auto ansi sealed beforefieldinit DotNetCross.Memory.Unsafe
       extends [System.Runtime]System.Object
{
  .method public hidebysig static !!T  Read<valuetype .ctor ([System.Runtime]System.ValueType) T>(void* p) cil managed aggressiveinlining
  {
    // Can't be expressed in C#, thus in IL:
    .maxstack 1
    ldarg.0
    ldobj !!T
    ret
  } // end of method Unsafe::Read

  .method public hidebysig static void  Write<valuetype .ctor ([System.Runtime]System.ValueType) T>(void* p, !!T 'value') cil managed aggressiveinlining
  {
    // Can't be expressed in C#, thus in IL:
    .maxstack 2
    ldarg.0
    ldarg.1
    stobj !!T
    ret  
    } // end of method Unsafe::Write
}

With regard to alignment, I agree with @CarolEidt that unaligned read/write does only have a small extra cost for Intel Sandy Bridge or later. However, it is still an extra cost and also means higher cache usage if I remember correctly (I am not an assembly/micro-architecture expert). More importantly, what about other architectures such as ARM? Some ARM architectures don't even have unaligned read/wrire.

Additionally, what about dynamically allocated memory on the stack. Does this also assume aligned access? That is, a contrived exampled could be:

 public static unsafe void StackAllocAlignmentDoNotDoThis()
{
      var ptr = stackalloc byte[128];
      ptr += 3;
      var value = Unsafe.Read<double>(ptr);    
}

Now, you definitely shouldn't do this, but just trying to understand how alignment will work. Perhaps this should be discussed separately, as I have nothing against the proposed API. I would just prefer an API which also had explicit AlignedRead/AlignedWrite, UnalignedRead/UnalignedWrite, with fallback to whatever makes sense for a given platform. There still should be Read/Write which have sensible defaults as @CarolEidt have mentioned are:

  • Heap pointers (e.g. unmanaged or fixed managed: unaligned access.
  • Stack pointers: aligned access.

@jkotas
Copy link
Member

jkotas commented Jan 17, 2016

I assume this shouldn't work for reference types

Constraining T to reference type does not prevent one from using it for reference types. One just needs to go through extra hassle to wrap the reference type in struct:

struct Wrapper<T> { public T Value; };
Unsafe.Read<Wrapper<String>>(p).Value

There are valid situations for Unsafe.Read on reference type (not with Vector<T>). It is interesting question whether the Unsafe class should have artificial restrictions to make it "safer" ... the artificial restrictions in current C# unsafe constructs is why we are having this discussion.

how alignment will work

The abstraction level that this is coded against is IL. In IL, all access is assumed to be sufficiently aligned by default. The unaligned access has to be explicitly requested by unaligned prefix for IL instruction. So the simple natural way to implement UnalignedRead would be:

    ldarg.0
    unaligned. 1 
    ldobj !!T
    ret

It is JIT problem to figure out the most efficient way to translate this to machine instructions on given platform in given context. E.g. on platforms without direct instructions for unaligned access, the JIT has to expand this into series of byte loads unless it can prove that the pointer is aligned.

@nietras
Copy link
Contributor

nietras commented Jan 17, 2016

@jkotas of course, I wasn't trying to imply this would make it "safe" as such, just that this could indicate the expected use case, guess I wasn't imaginative enough :). Could you elaborate on the valid situations for using Unsafe.Read for reference types? And how this would work?

@jkotas
Copy link
Member

jkotas commented Jan 17, 2016

@nietras
Copy link
Contributor

nietras commented Jan 18, 2016

I have updated DotNetCross.Memory.Unsafe to no longer have the generic value type constraint. I have also added SizeOf<T> and AddressOf<T>.

@jkotas not sure I understand how the code you point to would benefit from Unsafe.Read for a reference type?

            fixed (IntPtr* pObj = &obj.m_pEEType)
            {
                IntPtr pData = (IntPtr)pObj;
                IntPtr pField = pData + fieldOffset;
                return LoadReferenceTypeField(pField);
            }

However, I do understand that it is beneficial to be able to cast a pointer (i.e. address) to be seen as a reference to a reference type. That makes sense. Despite all the issues that might follow with the GC etc.

The abstraction level that this is coded against is IL. In IL, all access is assumed to be sufficiently aligned by default. The unaligned access has to be explicitly requested by unaligned prefix for IL instruction.

From this I would assume that all Unsafe.Read/Write should be translated to aligned. However, according to @CarolEidt:

the JIT currently only assumes alignment for SIMD types on the stack. For reads or writes to non-stack locations it uses unaligned writes. It is my understanding that these incur
0-1 cycle of additional overhead for modern processors if the memory location is actually aligned.

Which means that no SIMD non-stack locations will be accessed unaligned. Which would be terrible if it had to do byte loads as mentioned:

It is JIT problem to figure out the most efficient way to translate this to machine instructions on given platform in given context. E.g. on platforms without direct instructions for unaligned access, the JIT has to expand this into series of byte loads unless it can prove that the pointer is aligned.

And this is exactly my concern. That we have no way of forcing through IL the JIT to do aligned accessing, but as you say we are expressing in IL that this should be aligned, the JIT, however, does what it thinks best. So I am not saying we should change the API, I'm just trying to express what we would ideally want. Full control ;)

Finally, there is one scenario that isn't covered. Casting a pointer to a value type pointer i.e. without loading the value type to the stack/register. This could be an issue for big value types e.g.

public struct BigValueType
{
    double M00;
    double M01;
    double M02;
    double M10;
    double M11;
    double M12;
    double M20;
    double M21;
    double M22;
}

Really, we do not want to load this completely if we only wanted to change a part of it e.g.

public void AddToM02(void* ptr, double value)
{
    BigValueType* big = Unsafe.Cast<BigValueType>(ptr);
    big->M02 += value;
}

Something we would like to do in a generic way. So would it even be possible to cast a pointer too? Like this:

   T* Cast<T>(void* p)

Pretty sure C# won't accept this or am I wrong? We would to have the unmanaged constraint for this right?

@mikedn
Copy link
Contributor

mikedn commented Jan 18, 2016

From this I would assume that all Unsafe.Read/Write should be translated to aligned. However, according to @CarolEidt:

The thing is that IL knows nothing about SIMD. As far as IL is concerned Vector<float> is a struct that needs to be 4 byte aligned.

I'm just trying to express what we would ideally want. Full control ;)

It would be better to investigate if aligned SIMD access is truly needed. AFAIK there's no perf penalty if you use unaligned loads/stores (movups & co.) if the address is in fact properly aligned.

Casting a pointer to a value type pointer i.e. without loading the value type to the stack/register. This could be an issue for big value types e.g.

Fixing this requires compiler support. Either ref returns or generic pointers.

We would to have the unmanaged constraint for this right?

I don't think that's needed in this case. The input pointer is already unmanaged.

@jkotas
Copy link
Member

jkotas commented Jan 18, 2016

not sure I understand how the code you point to would benefit from Unsafe.Read

LoadReferenceTypeField can be replaced with a clean Unsafe.Read<Object>(pField) instead of abusing internal Interlock overload that takes address as IntPtr.

we have no way of forcing through IL the JIT to do aligned accessing

This sounds like as optimizer hints, like what C/C++ compilers have. A potential way to express this in IL would be via void Unsafe.Assume(bool condition) that works very similar. So you would write:

    Unsafe.Assume((p & 0x1F) == 0);
    Unsafe.Write<Vector<double>>(p, v);

I would track the optimizer hints separately because of it is a non-trivial feature on its own, and it is unlikely to help the Vector<T> case much.

@nietras
Copy link
Contributor

nietras commented Jan 18, 2016

LoadReferenceTypeField can be replaced with a clean Unsafe.Read<Object>(pField) instead of abusing internal Interlock overload that takes address as IntPtr.

Yeah ok I thought the Interlocked was there for a reason. So the problem was in LoadReferenceTypeField where the Interlocked.Compare was. That makes sense then.

It would be better to investigate if aligned SIMD access is truly needed. AFAIK there's no perf penalty if you use unaligned loads/stores (movups & co.) if the address is in fact properly aligned.

It would definitely be much better to test this instead of my perhaps unfounded theoretical concerns. Although it is not unheard of that unaligned reads/writes can have significant consequences, see http://www.agner.org/optimize/blog/read.php?i=285 where:

Store-to-load forwarding is generally good, but in some unfortunate cases of an unaligned 256-bit read after a smaller write, there is an unusually large delay of more than 200 clock cycles.

But that is probably an outlier. Agner has more info at http://www.agner.org/optimize/#manuals

@mikedn
Copy link
Contributor

mikedn commented Jan 18, 2016

But that is probably an outlier. Agner has more info at http://www.agner.org/optimize/#manuals

Incidentally Agner's manual is one source for my "AFAIK":

13.5 Accessing unaligned data and partial vectors
...
On contemporary processors, there is no penalty for using the unaligned instruction MOVDQU rather than the aligned MOVDQA if the data are in fact aligned. Therefore, it is convenient to use MOVDQU if you are not sure whether the data are aligned or not.

unaligned 256-bit read after a smaller write

That's a bit more complicated and not solely caused by unaligned data. And again, there's a distinction between using unaligned load instructions and unaligned loads.

@nietras
Copy link
Contributor

nietras commented Jan 19, 2016

@jkotas so Unsafe.Assume is a proposed feature? Interesting. I agree, though, that this should be discussed separately.

@mikedn I agree and for x64 this discussion is probably mute, but given that .NET Core is targetting cross platform, we should at least consider consequences for ARM etc. However, this should probably be discussed separately too, I guess.

Fixing this requires compiler support. Either ref returns or generic pointers.
I don't think that's needed in this case. The input pointer is already unmanaged.

Not sure I understand this, isn't it the output pointer type that is the problem for T* Cast<T>(void* p)? For C# to accept this would they not want to make sure T is an unmanaged value type or? Does T* make sense for a reference type? Or is that why you say something like ref T Cast<T>(void* p)?

By the way I have noticed many other projects that have the Unsafe.Read/Write and other unsafe methods e.g. https://github.com/sharpdx/SharpDX/blob/master/Source/SharpDX/Interop.cs So I am sure a lot of people will find value in these.

With regards to API I am happy with its elegant simplicity, so perhaps to move forward placement of API is a more pertinent question. We would prefer it to be as central as possible, as this would also make it useable for most projects without extra dependencies.

@mikedn
Copy link
Contributor

mikedn commented Jan 19, 2016

we should at least consider consequences for ARM etc. However, this should probably be discussed separately too, I guess.

Yep, that's a long story since the SIMD support for ARM is simply non-existent at the moment. And before discussing about Unsafe.Read/Write there's a lot of other things to settle and that includes how SIMD vectors are loaded from normal arrays.

Not sure I understand this, isn't it the output pointer type that is the problem for T* Cast(void* p) ?

The devil is in the details. What does T* mean when T is a reference type? It can be a pointer to an object of type T (which may sound weird but it is technically doable provided that the object is pinned) or it can be a pointer to a reference of type T. I think that the later is what we need in this case.

Either way it works fine as long as the location pointed by the pointer is in stack or in a pinned GC heap object or in native memory. And that's expected since the input to these functions is a void*.

I don't think we'll see T* any time soon in C#. ref returns are more likely and ref T Cast<T>(void* p) makes perfect sense.

@jkotas
Copy link
Member

jkotas commented Jan 19, 2016

@jkotas so Unsafe.Assume is a proposed feature?

I have created one.

@nietras
Copy link
Contributor

nietras commented Jan 21, 2016

@mikedn I though ARM NEON was pretty wide spread in high end smart phones and server ARM.

I have been playing around a bit with Unsafe.Read/Write and for the work we do we think it is essential to add SizeOf to the Unsafe class e.g.:

  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  public static unsafe int SizeOf<T>()

Or in IL:

  .method public hidebysig static int32  SizeOf<T>() cil managed aggressiveinlining
  {
    sizeof !!T
    ret
  }

Not sure how this differs from System.Runtime.InteropServices.Marshal.SizeOf<T>() though, haven't had time to test it, but the Marshal.SizeOf<T> does not appear to be as efficient.

Not sure if SizeOfshould be added as a new issue or it can be handled here too.

@nietras
Copy link
Contributor

nietras commented Jan 21, 2016

A completely different issue I have with Unsafe.Read/Write for Vector<T> types, or probably with Vector<T> as such, is this only allows for reading the full or largest SIMD register i.e. for AVX(2) a register is 256-bit or 32 bytes. For image processing this means we can handle 32 bytes in one SIMD op.

However if we have say 31 bytes, AVX2 is as such not possible since we have less than 32 bytes, therefore, we often revert to using 128-bit (e.g. SSE2) registers instead, for the remaining, so at least 16 of the 32 bytes can be handled fast. As far as I can tell there is no way to use SIMD registers less than the largest available for the given architecture one is running on. I.e. if 256-bit registers are available, there is no way to access/use 128-bit registers, and with Unsafe.Read/Write there is no way to be able to do...

Unless one would have specific types for vectors with static sizes e.g. Vector256bit<T> (pretty bad naming), but I hope you understand. Coming from the world of C++ intrinsics where one can choose the exact instructions to use and one can leverage the fact that a AVX2 CPU also supports SSE2, it seems less than ideal that there is no way of doing the same in C# (and for our case in unmanaged memory). I presume this is actually an issue with how Vector<T> works. If so, I should file an issue for Vector<T>.

@mikedn
Copy link
Contributor

mikedn commented Jan 21, 2016

I though ARM NEON was pretty wide spread in high end smart phones and server ARM.

Yes, it is. But .NET's SIMD support for NEON does not exist at the moment.

Not sure how this differs from System.Runtime.InteropServices.Marshal.SizeOf() though, haven't had time to test it, but the Marshal.SizeOf does not appear to be as efficient.

Marshal.SizeOf<T> returns the object size after marshaling. That size can be very different from the actual object size.

@mikedn
Copy link
Contributor

mikedn commented Jan 21, 2016

However if we have say 31 bytes, AVX2 is as such not possible since we have less than 32 bytes, ...

Something like that is discussed in #16094

@mellinoe
Copy link
Contributor

Since we had some internal discussions regarding this API, and since we are still trying to finalize the API itself, I'm pushing this back to RTM as there's no reason to try to rush this to completion in the next few days.

@terrajobst
Copy link
Member Author

Makes sense to me.

@nietras
Copy link
Contributor

nietras commented Jan 31, 2016

Let us know if there is anything we could do to help with implementing it. I have added CopyBlock, InitBlock to the DotNetCross.Memory.Unsafe project and basic tests for all methods.

One thing that surprised me a bit was how explicit layout was not handled as I expected see https://github.com/DotNetCross/Memory.Unsafe/blob/master/src/DotNetCross.Memory.Unsafe.UnitTests/Unsafe.SizeOf.Test.cs#L29 basically SizeOf ignores the explicit layout attributes and only takes the size of actual members. I assume this is due to how sizeofworks in IL.

@JonHanna
Copy link
Contributor

Does cpblk still have poor performance on x86? The closing of https://connect.microsoft.com/VisualStudio/feedback/details/766977/il-bytecode-method-cpblk-badly-implemented-by-x86-clr isn't clear as to whether cpblk was improved, or no longer used in the cases where it had been.

@nietras
Copy link
Contributor

nietras commented Feb 1, 2016

@JonHanna I added cpblk in order to do some comparison on perf with this and ordinary loops, but haven't had time yet. That said, there if nothing as such wrong with rep movsb dest, source at least not since Ivy Bridge (see Intel Optimization Manual, e.g. section on ERMSB. However, I do agree that the CLR could do more in terms of optimizing foundational routines (e.g. memory copy, init, comparing, move) for specific architectures. And letting the coreclr decide on e.g. SIMD versions on these that are optimal for the platform would be great. However, there are many factors to consider incl. code size etc.

Regarding Unsafe I do have another question though. Right now it is not possible to pin or fix a generic array in C#. That is the following is not legal:

        public static Array1D<T> CreateCopy<T>(T[] source)
            where T : struct
        {
            fixed (void* sourcePtr = &source)  // Cannot fix or pin generic managed array
            {

Is there anyway this can be done using Unsafe hand-written IL? And would this not be interesting to add to the Unsafe class then?

@jkotas
Copy link
Member

jkotas commented Feb 1, 2016

Is there anyway this can be done using Unsafe hand-written IL

For now, it is not very straightforward. One way to do it - this pattern is actually used in several places in BCL:

public class Unsafe
{
    public class PinningHelper {
        public byte Data;
    }

    public static PinningHelper GetPinningHelper(Object o) {
        ldarg.0
        ret
    }
...

fixed (void * pPin = &Unsafe.GetPinningHelper(source).Data)
{
    void * pFirstElement = Unsafe.AddressOf(ref source[0]);
    ...
}

Once byref locals and returns are added to C#, it will become simpler:

public class Unsafe
{
    public static ref byte AsByteRef(ref T value)
    {
        ldarg.0
        ret
    }
...

fixed (void * pFirstElement = &Unsafe.AsByteRef(ref source[0]))
{
    ...
}

@nietras
Copy link
Contributor

nietras commented Feb 1, 2016

@jkotas thanks! So we need to reinterpret the reference type as a PinningHelper so we can then pin through this? I guess that makes sense. So the CLR pins the PinningHelper (which is actually the source just reinterpreted) when we "fix" the Data member. Wouldn't this then mean that a generic cast operation would be interesting? E.g.:

public T Cast<T>(object o)
{
     ldarg.0
     ret
}

which is basically what the GetPinningHelper does? And how does that differ from AddressOf? The IL is the same but the calling parameters are different, of course.

@jkotas
Copy link
Member

jkotas commented Feb 1, 2016

I agree that a generic unsafe casting operations are interesting in general. However, the above pinning pattern is special and so it may deserve a dedicated API - even though it is still just a unsafe cast underneath.

There are number of different interesting unsafe casting operations:

  • AddressOf casts from ref T to void*
  • Cast that you have proposed casts from object to T (it may make sense to constrain T to be a reference type in this case)
  • AsByteRef in my speculative example casts from ref T to ref byte

@nietras
Copy link
Contributor

nietras commented Feb 1, 2016

Yes in C++ terms I would have said we have a reinterpret_cast for reference types and value type references (e.g. AsByteRef). What about a static_cast for value types e.g. from TFrom to TTo as in TTo Cast<TFrom, TTo>(TFrom from) which would be useful for primitive castings e.g. Cast<int, double>(value), but not sure this is possible as such?

Additionally, I do not like the Cast<TFrom, TTo> since this requires specifying the TFrom type. If I would try to make an "ideal" API I would try to make it kind of fluent e.g.

object obj = "test";
var weirdObject = Unsafe.Cast(obj).To<SomeWeirdObject>();
int value = 42;
var valueAsDouble = Unsafe.Cast(value).To<double>();
var array = new double[32];
fixed (void* pinPtr = &Unsafe.Cast(array).ToPinnable().Pin)
{
     void* firstPtr = Unsafe.AddressOf(ref array[0]);
}

with Pinnable being defined as:

public class Pinnable
{
    public byte Pin;
}

I am not a big fan of the term Helper. Whether it should be To or As after Cast is another matter. Now this could be done by adding the following to Unsafe:

        public static ValueCaster<T> Cast<T>(T value)
            where T : struct
        {
            return new ValueCaster<T>(value);
        }

        public static ReferenceCaster Cast(object obj)
        {
            return new ReferenceCaster(obj);
        }

with these types doing the actual cast (that is we have to add the hand-written IL to them):

    public struct ValueCaster<T>
        where T : struct
    {
        public readonly T Value;

        public ValueCaster(T value) {
            Value = value;
        }

        public TTo To<TTo>() where TTo : struct {
              // Add IL
        }
    }

    public struct ReferenceCaster
    {
        public readonly object Object;

        public ReferenceCaster(object obj) {
            Object = obj;
        }

        public TTo To<TTo>() where TTo : class { 
              // Add IL
        }

        public Pinnable ToPinnable()
        { return To<Pinnable>(); }
    }

Not sure it makes sense to actually split these into two types but it allows us to only have ToPinnable for reference types. And not sure the generic constraints are too constrained ;)

I would assume the JIT can actually compile this to machine code without extra runtime costs despite the intermediary types.

@nietras
Copy link
Contributor

nietras commented Feb 1, 2016

I just realized the Cast for object will only be invoked if the type actually is object, so this isn't working as well as I was hoping.
Also I have no idea how the value type casting would be possible, this requires conv.* in IL I think for each specific type.

@jkotas
Copy link
Member

jkotas commented Feb 1, 2016

I like the Pinnable name - it is much better than Helper. I do not think it makes sense to try to combine it with casting using wrapper struct - the wrapper structs go against simplicity that is required to make Unsafe extremely lean and fast. So something like:

fixed (void* pinPtr = &Unsafe.AsPinnable(array).Pin)
{
     void* firstPtr = Unsafe.AddressOf(ref array[0]);
}

Also I have no idea how the value type casting would be possible

Right, it does not work too well. The closes thing you can do is to treat it as unsafe unboxing ... but unsafe unboxing would probably deserve a separate method.

@nietras
Copy link
Contributor

nietras commented Feb 2, 2016

I agree completely that the design goal for Unsafe should be extremely lean and fast, and adding these extra types did add quite a lot of fat to the assembly.

Since we are already talking about a whole family of As* the summarized API design could be the following (all with [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute):

    public static class Unsafe
    {
        public static unsafe T Read<T>(void* p)
        public static unsafe void Write<T>(void* p, T value)
        public static unsafe int SizeOf<T>()
        //
        // All the castings or "As" family of methods
        //
        public static T As<T>(object obj)

        // Perhaps leave out since very close to As<Pinnable>
        public static Pinnable AsPinnable(object obj)

        // Instead of AddressOf
        public static unsafe void* AsPointer<T>(ref T value)

        // Instead of AsByteRef a generic version? When possible in C#
        public static unsafe ref TTo AsRef<TFrom, TTo>(ref TFrom value)

        // If at all possible...
        public static unsafe TTo AsValue<TFrom, TTo>(TFrom value)
    }

    public class Pinnable
    {
        public byte Pin;
    }    

Could you explain how unbox IL might look for AsValue casting? How would this be able to cast from int to double for example?

Note that I put Pinnable in the namespace scope not inside Unsafe since this would allow writing As<Pinnable> and we could remove AsPinnable making it more lean.

@jkotas
Copy link
Member

jkotas commented Feb 2, 2016

public static T As<T>(object obj)

This should constrain T to be a class. It won't work well for valuetypes.

Perhaps leave out since it is very close to As

Agree.

// If at all possible...
public static unsafe TTo AsValue<TFrom, TTo>(TFrom value)

It cannot think of what a reasonable implementation of this would be. And all options that I can think of are just trivial combinations of the other Unsafe APIs (e.g. return Unsafe.AsRef(ref value)).

public class Pinnable

This may be sealed.

@KrzysFR
Copy link

KrzysFR commented Feb 5, 2016

Thanks for the precision. It is indeed the case for me.

Also, I think that getting rid of the fixed statement would remove one level of indentation:

public void WithFixed()
{
    fixed (SomeStruct* self = &this)
    {
        // ...
    }
}

versus

public void WithUnsafeAsPointer()
{
    var self = (SomeStruct*) Unsafe.AsPointer(ref this);
    // ...
}

@KrzysFR
Copy link

KrzysFR commented Feb 9, 2016

I've done some tests with BenchmarkDotNet, to compare the speed of the three methods (static with pointer, instance + fixed, instance + Unsafe.AsPointer), and the speed difference between fixed and Unsafe.AsPointer is almost inexistant, while the static version is a lot faster.

Result of BenchmarkDotNet (x64, RyuJIT)

Method AvrTime Error
Instance_AsPointer 1.4378 ns 0.0023 ns
Instance_Fixed 1.4347 ns 0.0034 ns
Static 0.3551 ns 0.0385 ns

Looking at the assembly code, I see that the version with fixed is not inlined (causing a call), which may be expected.

But when comparing the static version and the Unsafe.AsPointer version, both are inlined, except that there is an additional call in the later that does not seem to be required (maybe I'm reading the code wrong)

This gist contains the test code I used. tl;dr is:

  • I have a VarKey struct (from unmanaged heap) that has a variable byte buffer at the end (what the unmanaged code produces)
  • I have an USlice helper struct that wraps a byte* and a size (what the managed code consumes)
  • I am measuring the fastest way to return an USlice wrapping the "key" part of the VarKey

RyuJIT x64 on .NET 4.6.1

Static version:

; return VarKey.GetKeyStatic(m_ptr);
mov         rax, qword ptr [rcx+10h] ; access 'm_ptr'
cmp         dword ptr [rax], eax     ; attempt to trigger a nullref if null?
lea         rcx, [rax+6]             ; load address of this.Byte0
movzx       eax,word ptr [rax+4]     ; read this.Size
mov         qword ptr [rdx],rcx      ; inlined USlice ctor
mov         dword ptr [rdx+8],eax    ; (cont.)
ret  

AsPointer version:

; return m_ptr->GetKeyUnsafeAsPointer();
sub         rsp,28h
mov         rsi,rdx
mov         rdi, qword ptr [rcx+10h] ; access 'm_ptr'
cmp         dword ptr [rdi], edi     ; attempt to trigger a nullref if null?
lea         rcx, [rdi+6]             ; load address of this.Byte0
call        00007FF7CAE001E0         ; what is this call??
movzx       eax,word ptr [rax+4]     ; read this.Size
mov         qword ptr [rdx],rcx      ; inlined USlice ctor
mov         dword ptr [rdx+8],eax    ; (cont.)
ret  

I'm not sure what the additional call does, maybe Unsafe.AsPointer is not inlined correctly?

@mikedn
Copy link
Contributor

mikedn commented Feb 9, 2016

I'm not sure what the additional call does, maybe Unsafe.AsPointer is not inlined correctly?

It isn't inlined, the IL of AsPointer appears to be

ldarg.0
ret

The argument is a managed pointer but the declared return type is an unmanaged pointer. The JIT doesn't inline the method because the types do not match. I think there should be a conv.u after ldarg.0.

@nietras
Copy link
Contributor

nietras commented Feb 9, 2016

@mikedn I did not know that was an issue, but I haven't perf tested AsPointer much so it could definitely be an issue. I was wondering why this is the case for the JIT? Inlining rules are pretty important, why does this prevent inlining? Is it a general rule? Even in the face of AggresiveInlining?

@jkotas does this make sense? Should AsPointer be defined as?

        public static void* AsPointer<T>(ref T value)
        {
             ldarg.0
             conv.u
             ret
        }

I can easily change this in https://github.com/DotNetCross/Memory.Unsafe

@mikedn
Copy link
Contributor

mikedn commented Feb 9, 2016

Is it a general rule? Even in the face of AggresiveInlining ?

It is a general rule in the current JIT, there's some code in the JIT's importer that looks like:

if (returnType != originalCallType)
{
    impAbortInline(true, false, "Return types are not matching.");
    return false;
}

I suspect that the condition could be relaxed but that may require the JIT to automatically insert the conv.u to keep other compiler pieces happy. Ultimately conversion from a managed pointer to an unmanaged pointer is potentially dangerous so it makes sense that an explicit conversion is required.

@CarolEidt
Copy link
Contributor

@AndyAyersMS may want to comment, as he is working on refactoring (and ultimately tuning) the inlining code in the JIT. The JIT could potentially try to do something to adjust for a mismatch, but I think it could be potentially error-prone or unsafe, and of limited value.

@jkotas
Copy link
Member

jkotas commented Feb 9, 2016

@jkotas does this make sense? Should AsPointer be defined as?

Yes, it makes sense - it is the right way to address this problem.

@nietras
Copy link
Contributor

nietras commented Feb 9, 2016

Thanks to all for the valuable info.

@KrzysFR I have updated https://github.com/DotNetCross/Memory.Unsafe (and the nuget package new version 0.2.1.0) so AsPointer now has the conv.u as specified by Mike. However, I do not really see any reason for you to use this in the case you have described. new USlice(&key->Byte0, key->Size); is probably as efficient as it gets and there is no reason for generic code in this. I would, however, not rely on managed memory, not even on the LOH, to not move. LOH can be subject to compaction, so no guarantees. In my view, if your are using memory as if it is native then allocate native memory for it e.g. using Marshal.AllocHGlobal.

@mikedn @AndyAyersMS without context I would assume the type check here is whether or not the "type" is a reference type or a value type. And if the call changes from reference type to value type or vice versa then this would be a problem. Or are you in fact tracking whether it is a "managed" or "native" type? Because, when T is a value type in AsPointer<T>, in fact in this case a byte, what is the issue with returning the reference to a primitive value type as a void pointer? That is, why should it not be inlined in this case?

@KrzysFR
Copy link

KrzysFR commented Feb 9, 2016

@nietras Thanks I saw that, reran the benchmark, and now I'm getting the same performance as the static version. Though, curiously when I'm extracting the assembly code generated, I'm getting the exact same code as before (with the call) but performance points to it being inlined...? Probably an issue on my part, I'm not sure what is the most efficient way to extract the assembler from a test method with VS on .NET Desktop, without having to CTRL-F5, hookup the debugger, and then attempt to navigate to the test method....

I was using the LoH in my small contrived test case just to speed things up. In the actual code the memory is allocated on a native heap or comes from a memory mapped file.

@mikedn
Copy link
Contributor

mikedn commented Feb 9, 2016

And if the call changes from reference type to value type or vice versa then this would be a problem.

I think there's some sort of confusion between what the code is supposed to be doing and how it does it. It's not about what pointers point to (value types, references types etc.), it's solely about the type of the pointers.

The ldarg.0 instruction pushes a managed pointer (TYP_BYREF in JIT speak). The ret instructions expects to find an unmanaged pointer (TYP_LONG in JIT speak) on the evaluation stack because that's the return type of the method. The 2 types are different and the JIT gives up inlining because nobody put any effort into dealing with such code. And nobody put any effort into it because it's an unusual/rare/dubious case that should not occur in practice.

@KrzysFR
Copy link

KrzysFR commented Feb 9, 2016

And nobody put any effort into it because it's an unusual/rare/dubious case that should not occur in practice.

Except when you are writing interop code with native memory, or using memory mapped files, where this becomes critical (if you want to remove copying). But I understand that it was not an intended use case when .NET/C# was first developped. So yes it is unusual in the sense that only low-level/high performance code (such as the Kestrel http server, or games engines, or databases, or JSON parsers) would need to deal with such things. It would be nice if such code would be easier to write in C# though :)

@mikedn
Copy link
Contributor

mikedn commented Feb 9, 2016

Except when you are writing interop code with native memory, or using memory mapped files, where this becomes critical (if you want to remove copying).

I don't understand how that justifies having mismatched types in IL. Is there's something that prevents the use of conv.u?

@KrzysFR
Copy link

KrzysFR commented Feb 9, 2016

@mikedn Sorry, I did not understood that you were talking about conv.u specifically. I don't know much about IL and this goes well over my head.

@nietras I somehow was able to get VS to show me the proper assembly code for the AsPointer version, and it looks identical to the static one, except that it uses a different register. Thanks for fix!

@nietras
Copy link
Contributor

nietras commented Feb 10, 2016

there's something that prevents the use of conv.u?

No, just trying to understand. And trying to make sure the Unsafe methods are as lean and fast as possible also with regard to IL size. The JIT'ed code looks fine as far as I can tell.

not about what pointers point to (value types, references types etc.), it's solely about the type of the pointers.

ldarg.0 instruction pushes a managed pointer (TYP_BYREF in JIT speak). The ret instructions expects to find an unmanaged pointer (TYP_LONG in JIT speak) on the evaluation stack because that's the return type of the method.

Yeah I was way off :) Thanks for the explanation. I am asking, because this is interesting in the context of Unsafe where some methods are pretty much just "reinterpret casts" of references or pointers. That is, a couple of methods are just ldarg.0; ret as for example T As<T>(object obj) where T : class. Looking at https://github.com/dotnet/coreclr/blob/master/src/jit/typelist.h I would assume this would push a managed pointer TYP_REF (again I am guessing here since I could not find any doc explaining what exactly the different TYP_* are used for or which are in fact used), and returns TYP_REF so everything is fine here.

Looking at a possible future method ref TTo AsRef<TFrom, TTo>(ref TFrom value) this simply has input TYP_BYREF and output TYP_BYREF, so this should also be fine from an inlining perspective too. Right?

My question could then be why is TYP_PTR not used for unmanaged pointers, but TYP_LONG (64-bit I assume)?
In fact, why is TYP_PTR not used at all in the CLR? Which a simple search on github revealed and the comment "not currently used" says. Would a conversion from TYP_BYREF to TYP_PTR be considered legal? Or would this still be considered a conversion from managed to unmanaged?

I guess I do not understand how the distinction of managed pointer or unmanaged pointer is defined. That is, which pointers are considered managed.

Completely outside the scope of this and on a grander scope, have adding native integer types e.g. TYP_NINT and TYP_UNINT ever been considered?

DEF_TP(NINT     ,"nint" , TYP_NINT,     TYP_INT,PS, PS, PS, PST,PS, VTF_INT|VTF_I,  TYPE_REF_INT)

It would have been great to have nint and unint available as types in e.g. C#, which allow arithmetic on them and not like IntPtr or UIntPtr.

@mikedn
Copy link
Contributor

mikedn commented Feb 10, 2016

I would assume this would push a managed pointer TYP_REF (again I am guessing here since I could not find any doc explaining what exactly the different TYP_* are used for or which are in fact used), and returns TYP_REF so everything is fine here.

Yes, TYP_REF is used for object references (type O in IL speak). Note that the same TYP_REF is used for all reference types, in this context the compiler doesn't distinguish between, for example, string and object references.

Looking at a possible future method ref TTo AsRef<TFrom, TTo>(ref TFrom value) this simply has input TYP_BYREF and output TYP_BYREF, so this should also be fine from an inlining perspective too. Right?

Yes, ref parameters use TYP_BYREF and I expect ref returns to use TYP_BYREF as well.

My question could then be why is TYP_PTR not used for unmanaged pointers, but TYP_LONG (64-bit I assume)? In fact, why is TYP_PTR not used at all in the CLR?

TYP_PTR isn't very useful, ultimately unmanaged pointers are just integers. The JIT simply maps them to TYP_INT or TYP_LONG depending on the architecture. The IL spec itself doesn't actually have a distinct unmanaged pointer type, it uses native unsigned int for that.

Would a conversion from TYP_BYREF to TYP_PTR be considered legal? Or would this still be considered a conversion from managed to unmanaged?

It wouldn't be legal (though "legal" is too strong, the IL spec doesn't seem to state anywhere that such code is not legal, just unverifiable).

I guess I do not understand how the distinction of managed pointer or unmanaged pointer is defined. That is, which pointers are considered managed.

Managed pointers are reported to the GC, unmanaged pointers are not.

You, the author of the code, declare which pointers are managed (type & in IL speak) and which pointers are not (type native unsigned int in IL speak). It is irrelevant to what kind of memory they point to. Managed pointers may point to unmanaged memory and vice versa.

The GC needs to know about managed pointers so it can update them when objects move. Managed pointers may point to unmanaged memory in which case they still need to be reported to the GC but the GC will ignore them. Unmanaged pointers do not carry any such requirements. See III.1.1.5.1 and III.1.1.5.2 in ECMA 335 spec.

Completely outside the scope of this and on a grander scope, have adding native integer types e.g. TYP_NINT and TYP_UNINT ever been considered?

As far as the JIT is concerned there's no such thing as native int. When the IL code is imported native int is simply mapped to the appropriate integer type (TYP_INT/TYP_LONG) for the target architecture.

It would have been great to have nint and unint available as types in e.g. C#, which allow arithmetic on them and not like IntPtr or UIntPtr .

See dotnet/roslyn#2788

@nietras
Copy link
Contributor

nietras commented Feb 10, 2016

@mikedn Thank you very much that is very clear. I understand the GC requirements, but had been wondering why there was no concept of a "native pointer" in the jit, but this is just an integer type then, so that is how the distinction is made.

@nietras
Copy link
Contributor

nietras commented Feb 28, 2016

@mellinoe @terrajobst since the timeline for 1.0.0-rc2 has been changed is there any chance Unsafe could be added to this milestone again, instead of RTM? As far as I can tell the API is stable for Unsafe at least with regard to this thread. Not sure about your internal discussions.

I have been thinking about the statements (below) made by @mikedn and how this could perhaps allow an efficient definition of Span<T>.

It is irrelevant to what kind of memory they point to. Managed pointers may point to unmanaged memory and vice versa.
Managed pointers may point to unmanaged memory in which case they still need to be reported to the GC but the GC will ignore them.

Currently Span<T> has the following memory layout:

    public partial struct Span<T> : IEnumerable<T>, IEquatable<Span<T>>
    {
        /// <summary>A managed array/string; or null for native ptrs.</summary>
        internal readonly object Object;
        /// <summary>An byte-offset into the array/string; or a native ptr.</summary>
        internal readonly UIntPtr Offset;
        /// <summary>Fetches the number of elements this Span contains.</summary>
        public readonly int Length;
    }

This uses the UIntPtr Offset to mean an offset when Object != null or to mean a pointer (address) when Object == null, so I guess this is efficient as possible. But the sentence "Managed pointers may point to unmanaged memory" got me thinking whether it is possible to store the unmanaged pointer address in object Object? You would, of course, have to have some way to then query whether object Object is in fact a managed reference type or an unmanaged pointer. However, since the GC knows this shouldn't this be possible to query in some "magic" way? I noticed the tricks used in https://github.com/dotnet/corefxlab/blob/master/src/System.Slices/System/PtrUtils.cs discuss how the above Span<T> layout is used to Get/Set. Just wondering if Span<T> could actually be defined using less memory (4 bytes less only probably)?

I also noticed on dotnet/corefxlab#572 (comment) that an addition to Unsafe API for slices was discussed e.g. adding:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static T ReadUnaligned<T>(ref T p)
{
    ldarg.0          
    unaligned. 1 ldobj !!T
    ret
}

I am considering adding this to DotNetCross.Memory.Unsafe if need be.

By the way is ILSub a tool that will be made be available on nuget at some point? It seems useful. Thinking about rewriting DotNetCross.Memory.Unsafe to use this so it can be build/modified more automatically.

@mikedn
Copy link
Contributor

mikedn commented Feb 28, 2016

the sentence "Managed pointers may point to unmanaged memory" got me thinking whether it is possible to store the unmanaged pointer address in object Object ?

Nope. You can't store a managed pointer in a reference variable as such a pointer may point to the interior of a managed object. When the GC encounters a reference it expects that the reference points to the beginning of the object because it needs to access the method table and the object header.

I've never been curious enough to check how GC tracks managed pointers but they're likely problematic exactly due to the fact that they may point to the interior of a managed object. And probably this is why such pointers can only exist on the stack, you cannot have a struct/class field of managed pointer type.

I also noticed on dotnet/corefxlab#572 (comment) that an addition to Unsafe API for slices was discussed e.g. adding:

ReadUnaligned seems nice to have but maybe we should wait for someone to actually ask for it and provide a valid use case. I would hope that unaligned reads are a very rare.

@nietras
Copy link
Contributor

nietras commented Feb 28, 2016

it needs to access the method table and the object header.

Exactly, so I couldn't understand how this could be possible either but the sentence:

Managed pointers may point to unmanaged memory

Kept nagging me and I thought perhaps there was someway to define Span<T> like:

    public partial struct Span<T> : IEnumerable<T>, IEquatable<Span<T>>
    {
        /// <summary>A managed array/string; OR native pointer</summary>
        internal readonly object Object;
        /// <summary>An byte-offset into the array/string; or a native ptr.</summary>
        internal readonly int Offset;
        /// <summary>Fetches the number of elements this Span contains.</summary>
        public readonly int Length;
    }

That is, one could misuse the Object field to point to unmanaged memory too, the Offset would then be zero for the native pointer case.

@veikkoeeva
Copy link

With regard to alignment, I agree with @CarolEidt that unaligned read/write does only have a small extra cost for Intel Sandy Bridge or later. However, it is still an extra cost and also means higher cache usage if I remember correctly (I am not an assembly/micro-architecture expert). More importantly, what about other architectures such as ARM? Some ARM architectures don't even have unaligned read/wrire.

Without knowing the entire context of the reference, unaligned writes may induce more than "small extra cost". Alois Kraus has one such case examined and written in detail Is This A CPU Bug?. A few quotes:

That is a whooping factor of 42 faster. No wonder that the intel manual recommends to align the variables on page...

Then you will get not 1s but 1,5s of runtime because of the processor bus locking. In reality it is not as bad as this test suggests because if the other application uses correctly aligned variables they will operate at normal speed. But if two applications exhibit the same error they will slow each other down.

If you use an allocator which does not care about natural variable alignment rules such as the GC allocator you can run into issues which can be pretty hard to find...

It might be that unaligned cross-page operations might become expensive too, especially concerning other than x86. One example at How do modern cpus handle crosspage unaligned access?

So other than the potential to generate two page faults from a single operation, it's just another unaligned access. Of course, that still assumes all the caveats of "just another unaligned access" - namely it's only valid on normal (not device) memory, only for certain load/store instructions, has no guarantee of atomicity and may be slow - the microarchitecture will likely synthesise an unaligned access out of multiple aligned accesses1, which means multiple MMU translations, potentially multiple cache misses if it crosses a line boundary, etc.

It thought just to chip in to the discussion and hope this brings new perspective (also, a shameless plug, I think I add soon to a networking issue few points on how could one make the Orleans faster in general and networking more stable in particular, so if you're interested... <edit: at dotnet/orleans#307 (comment)).

@mellinoe
Copy link
Contributor

mellinoe commented Jun 6, 2016

We've added support for an initial set of "Unsafe" operations in RTM in dotnet/corefx#7966 . If anyone has some extra operations they think should be added, I encourage them to open a new issue with more specific details.

@mellinoe mellinoe closed this as completed Jun 6, 2016
@juliusfriedman
Copy link
Contributor

juliusfriedman commented Jun 7, 2016

I would like to see a CallIndirect and As.

I have implemented these as follows:

[System.Security.SuppressUnmanagedCodeSecurity]   
     [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
        internal static void CallIndirect<T>(System.IntPtr ptr, ref T result)
        {
            result = default(T);

            if (ptr == System.IntPtr.Zero)
            {
                return;
            }

            //The result in most cases is actually the value not a pointer to it...
            System.IntPtr resultPointer;

            //Get the address of the result or the result itself.
            CallIndirect(ptr, out resultPointer);

            //Determine the size of T (Unsafe.SizeOf<T>)
            int size = Unsafe.ArrayOfTwoElements<T>.AddressingDifference();

            unsafe
            {
                //Make a local reference to the result
                System.TypedReference tr = __makeref(result);

                //Make a pointer to the local reference
                System.IntPtr localPointer = *(System.IntPtr*)(&tr);

                //Make a pointer to the pointer, which when dereferenced can access the result.
                int* sourcePointer = (int*)&resultPointer;

                //Copy from the source pointer to the handle
                System.Buffer.MemoryCopy((void*)sourcePointer, (void*)localPointer, size, size);

            }

            return;
        }
/// <summary>
        /// Given a type and another type will copy the memory between the variables
        /// </summary>
        /// <typeparam name="TSource">Source Type</typeparam>
        /// <typeparam name="TResult">Destination Type</typeparam>
        /// <param name="t">The element to convert</param>
        /// <returns>An instance of U with the memory of T</returns>
        [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
        public unsafe static TResult As<TSource, TResult>(TSource t)
        {
            //Determine the size of T
            int sizeOfT = Unsafe.ArrayOfTwoElements<TSource>.AddressingDifference();

            //Make a local reference to the source input
            System.TypedReference trSource = __makeref(t);

            //Make a default value for the result, if TResult needs new a constraint should be added and overloads.
            TResult result = default(TResult);

            //Make a local reference to the result
            System.TypedReference trResult = __makeref(result);

            //Make a pointer to the local reference
            System.IntPtr localReferenceSource = *(System.IntPtr*)(&trSource);

            //Make a pointer to the local reference
            System.IntPtr localReferenceResult = *(System.IntPtr*)(&trResult);

            //Make a pointer to the pointer, which when dereferenced can access the result.
            int* sourcePointer = (int*)&trSource;

            //Make a pointer to the pointer, which when dereferenced can access the result.
            int* destPointer = (int*)&trResult;

            //Copy from the source pointer to the handle
            System.Buffer.MemoryCopy((void*)sourcePointer, (void*)destPointer, sizeOfT, sizeOfT);

            //Return the reference value of the result
            return __refvalue(trResult, TResult);
        }

@glenn-slayden
Copy link

@nietras wrote:
...we have no way of forcing, through IL, the JIT to do aligned accessing... [I] am not saying we should change the API, I'm just trying to express what we would ideally want, full control.

I don't understand what it would mean request that the JIT "force" an alignment. If you want "full control" of alignment as you say, you have to arrange for it in the user code:

// ...align 'dst' by skipping over up to 15 bytes deducted from the overall count 'n'
dst += (n -= -(long)dst & 0xF);
        ldarg dst
        ldarg n
        ldarg dst 
        conv.u8 
        neg 
        ldc.i4.s 0xF
        conv.i8 
        and 
        sub 
        dup 
        starg.s n
        conv.i 
        add 
        starg.s dst

// ...continue now using aligned 'dst'

For example, if this were the preparation for a memcpy operation, the JIT would have no way of knowing whether to align the source or the destination, in the case where it is not possible to have both (Intel recommends prefer the destination in such cases).

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 1.1.0 milestone Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Jan 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests