Read method without locals #65

buybackoff · 2022-02-21T23:05:35Z

I've compared the Read method with the Unsafe-based implementation I used before. It's faster than the Unsafe version, but I found cases in my code where performance dropped by 9%. Without locals, performance improved as expected: by 5% vs. the original, or 14% vs. the version with locals.

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T Read<T>(object array, int index)
        where T : class
    {
        // IL.DeclareLocals(false, typeof(byte).MakeByRefType());
        //
        // Ldarg(nameof(array));
        // Stloc_0(); // convert the object pointer to a byref
        // Ldloc_0(); // load the object pointer as a byref

        Ldarga(nameof(array));
        Ldind_Ref();

For the Disruptor benchmark the performance is identical. My theory is that it's basically the same code, but with too many IL locals the JIT sometimes gives up on optimizing, or something along those lines. This particular benchmark was always very sensitive to locals, even in inlined methods it does not directly go through.


ltrzesniewski · 2022-02-22T09:51:58Z

I'm surprised this doesn't throw an InvalidProgramException. This changes a & type on the stack to an O type, and arithmetic on those is not supported.

buybackoff · 2022-02-22T11:31:16Z

It's weird - the distinction between object reference and managed pointer is of little value. An object reference is a managed pointer to a method table. The math should work, only the verifier could complain.

But .NET itself uses RawArrayData, and supposedly it's as fast as it can be, and safe. They can do that because it's .NET Core only: they don't need to account for different layouts because they define the layout.

Calculating the array data offset and storing it in a static readonly field proves to be difficult: the JIT magic of treating it as a constant does not happen, at least not reliably. It requires tiered compilation, the value must be initialized in tier 0 to be treated as a constant in tier 1, and any long-running loops must be recompiled in tier 1.

However, we could change ArrayDataOffset to calculate the offset not from the method table, but from the first data byte. Like this:

public static unsafe int ArrayDataOffset2
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    get => sizeof(IntPtr) == 4
        ? RuntimeHelpers.OffsetToStringData == 8 ? 4 : 12
        : RuntimeHelpers.OffsetToStringData == 12 ? 8 : 24;
}

private class RawData<T>
{
    public T Data = default!;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static T Read1<T>(object array, int index)
    where T : class
{
    return Unsafe.AddByteOffset(ref Unsafe.As<RawData<T>>(array).Data, (nint)(uint)(ArrayDataOffset2 + index * Unsafe.SizeOf<T>()));
}

On my current noisy machine, with lots of other stuff running, this gives a very significant throughput improvement over current master in the OneToOneSequencedThroughputTest_ThreadAffinity benchmark, both on the same and on different cores.

Need to check ArrayDataOffset2 values for cases other than x64 .NET Core.

I will send a PR for that.

buybackoff · 2022-02-22T11:46:24Z

Brr, these throughput numbers mean very little with different batching on a noisy machine. Need more precise measurements and some extra work 😄

ltrzesniewski · 2022-02-22T12:20:40Z

> It's weird - the distinction between object reference and managed pointer is of little value. An object reference is a managed pointer to a method table. The math should work, only the verifier could complain.

Yes, I know about that, but the spec is pretty explicit about it:

> Managed pointers are not interchangeable with object references.

Though the current code is storing an O value in a & local, and I couldn't find a mention in ECMA-335 which would allow that in the first place...

ltrzesniewski · 2022-02-22T12:23:57Z

> However, we could change ArrayDataOffset to calculate the offset not from the method table, but from the first data byte.

You should calculate this using an array instead of a regular object (T[] instead of RawData<T>) as you should not assume the CLR uses the same layout for objects and for arrays.
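A minimal sketch of what measuring against a real array could look like, assuming `System.Runtime.CompilerServices.Unsafe` is available. The names `ArrayOffsetProbe`, `RawData` (a non-generic stand-in for the `RawData<T>` class above) and `MeasureArrayDataOffset` are hypothetical and not part of the library. It measures the byte distance from the first field slot of an object, as seen through `Unsafe.As`, to element 0 of an actual `object[]`, instead of assuming it:

```csharp
using System;
using System.Runtime.CompilerServices;

internal static class ArrayOffsetProbe
{
    // Hypothetical stand-in for RawData<T>: one field, so that .Data refers
    // to the first slot after the method table pointer.
    private sealed class RawData
    {
        public byte Data;
    }

    // Returns the distance in bytes between the first field slot of an object
    // and element 0 of an object[], measured on the current runtime.
    public static int MeasureArrayDataOffset()
    {
        var probe = new object[1];

        // View the array as a plain object; on the layouts discussed in this
        // thread, .Data then refers to the array's Length slot.
        ref byte firstField = ref Unsafe.As<RawData>(probe).Data;

        // Managed pointer to the first real element of the array.
        ref byte firstElement = ref Unsafe.As<object, byte>(ref probe[0]);

        return (int)Unsafe.ByteOffset(ref firstField, ref firstElement);
    }
}
```

On x64 .NET Core this would be expected to agree with the value ArrayDataOffset2 returns there. Note that a value measured at run time like this is exactly the kind of value that is hard to turn into a JIT constant.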

buybackoff · 2022-02-22T13:01:15Z

@ltrzesniewski Unsafe.As<RawData<T>>(array).Data points right after the method table. What we do now is point to the method table. In the array case, the .Data points to its Length slot. Do you know about differences in the method table size? Or other stuff that could be placed before .Data on different implementations?

ltrzesniewski · 2022-02-22T13:05:32Z

> In the array case, the .Data points to its Length slot.

Exactly. Don't we want the offset between the first element and the method table, thus skipping the length slot?

buybackoff · 2022-02-22T13:12:09Z

> Don't we want the offset between the first element and the method table, thus skipping the length slot?

We can calculate it, but we cannot make it a JIT constant in an easy way.

So now we have on x64: FirstOffset = MT_Ptr + ArrayDataOffset = MT_Ptr + 8 (MT_PtrSize) [.Data is here] + 4 (uint Length) + 4 (Padding). What I propose is to just point past MT_Ptr and use existing knowledge about different stuff after .Data on different implementations.
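The x64 arithmetic above can be written out as constants to make the two base points explicit. This is only an illustration of the numbers given in this comment. The class and member names are hypothetical, and the sizes are the x64 .NET Core values stated above, not something that holds on every runtime:

```csharp
internal static class X64ArrayLayout
{
    public const int MethodTablePointerSize = 8; // MT_PtrSize
    public const int LengthSize = 4;             // uint Length
    public const int PaddingSize = 4;            // padding before element 0

    // Offset of element 0 measured from the method table pointer
    // (what ArrayDataOffset currently expresses): 8 + 4 + 4 = 16.
    public const int FromMethodTable =
        MethodTablePointerSize + LengthSize + PaddingSize;

    // Offset of element 0 measured from the .Data slot, i.e. after
    // pointing past MT_Ptr: 4 + 4 = 8.
    public const int FromDataSlot = LengthSize + PaddingSize;
}
```

The second value, 8, is what ArrayDataOffset2 yields on x64 when RuntimeHelpers.OffsetToStringData is 12.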

ltrzesniewski · 2022-02-22T14:55:05Z

> we cannot make it a JIT constant in an easy way.

Oh, ok, I see 👍

> use existing knowledge about different stuff after .Data on different implementations.

But that's exactly what ArrayDataOffset does... how would you like to change that more precisely?

buybackoff · 2022-02-22T14:57:02Z

> But that's exactly what ArrayDataOffset does... how would you like to change that more precisely?

By using Unsafe and not Ldind.Ref and still avoiding locals.

ltrzesniewski · 2022-02-22T15:04:17Z

Oh, ok, sorry, I misunderstood what you were saying earlier 👍

buybackoff · 2022-02-23T17:30:59Z

@ltrzesniewski

Also this comment about managed pointers to zero: dotnet/coreclr#20386

So I'm confused.

buybackoff · 2022-02-23T18:27:37Z

dotnet/runtime#65793

buybackoff · 2022-02-24T20:01:39Z

The current implementation is optimal for x-plat.

For .NET Core it works even with a simple Ldarg(nameof(array)) + offset, and I think it should work like that: the O and & separation is artificial both conceptually and implementation-wise. But for this kind of thing there is the linked discussion.

ltrzesniewski · 2022-02-24T23:04:26Z

I suppose the reason for having both O and & types is performance: GC scans should be faster for O types, as the GC can assume the value is a pointer to a method table. This gets more complicated for & values, which can point to anywhere inside an object.

But I'm very interested in the answer to your linked question. 🙂

buybackoff closed this as completed Feb 24, 2022
