Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read method without locals #65

Closed
buybackoff opened this issue Feb 21, 2022 · 15 comments
Closed

Read method without locals #65

buybackoff opened this issue Feb 21, 2022 · 15 comments

Comments

@buybackoff
Copy link
Contributor

I've compared the Read method with Unsafe implementation I used to use before. It's faster than Unsafe, but I found cases in my code where perf dropped by 9%. Without locals the perf improved as expected by 5% vs original or 14% vs the version with locals.

   [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T Read<T>(object array, int index)
        where T : class
    {
        // IL.DeclareLocals(false, typeof(byte).MakeByRefType());
        //
        // Ldarg(nameof(array));
        // Stloc_0(); // convert the object pointer to a byref
        // Ldloc_0(); // load the object pointer as a byref

        Ldarga(nameof(array));
        Ldind_Ref();

Fir the Disruptor benchmark the performance is identical. My theory is that it's basically the same stuff, but with too many IL locals JIT sometimes gives up to optimize, or something from this genre. This particular benchmark was always very sensitive to locals even in the inlined method it does not directly go through.

@ltrzesniewski
Copy link
Contributor

I'm surprised this doesn't thow an InvalidProgramException. This changes a & type on the stack to an O type, and arithmethic on those is not supported:

image

@buybackoff
Copy link
Contributor Author

buybackoff commented Feb 22, 2022

It's weird - the distinction between object reference and managed pointer is of little value. Object reference is a manager pointer to a method table. The math should work, only the verifier could complain.

But, .NET itself uses RawArrayData and supposedly it's as fast as it could be, and safe. They could do that because it's only .NET Core, they do not need to account for different layout because they define it.

Calculating the array data offset and storing it in a static readonly proves to be difficult, JIT magic with treating it as a constant does not happen, at least reliably. It requires tiered compilation and the value must be initialized in tier 0 to be treated as a constant in tier 1, any long-running loops must be recompiled in tier 1.

However, we could change ArrayDataOffset to calculate the offset not from the method table, but from the first data byte. Like this:

public static unsafe int ArrayDataOffset2
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    get => sizeof(IntPtr) == 4
        ? RuntimeHelpers.OffsetToStringData == 8 ? 4 : 12
        : RuntimeHelpers.OffsetToStringData == 12 ? 8 : 24;
}

private class RawData<T>
{
    public T Data = default!;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static T Read1<T>(object array, int index)
    where T : class
{
    return Unsafe.AddByteOffset(ref Unsafe.As<RawData<T>>(array).Data, (nint)(uint)(ArrayDataOffset2 + index * Unsafe.SizeOf<T>()));
}

On my current noisy machine where lots of stuff running this gives very significant throughput improvement for current master in OneToOneSequencedThroughputTest_ThreadAffinity bench, both on the same and different cores.

Need to check ArrayDataOffset2 values for cases other than x64 .NET Core.

I will send a PR for that.

@buybackoff
Copy link
Contributor Author

Brr, this throughput numbers mean very little with different batching on a noisy machine. Need more precise measurement and some extra work 😄

@ltrzesniewski
Copy link
Contributor

It's weird - the distinction between object reference and managed pointer is of little value. Object reference is a manager pointer to a method table. The math should work, only the verifier could complain.

Yes, I know about that, but the spec is pretty explicit about it:

Managed pointers are not interchangeable with object references.

Though the current code is storing an O value in a & local, and I couldn't find a mention in ECMA-335 which would allow that in the first place...

@ltrzesniewski
Copy link
Contributor

However, we could change ArrayDataOffset to calculate the offset not from the method table, but from the first data byte.

You should calculate this using an array instead of a regular object (T[] instead of RawData<T>) as you should not assume the CLR uses the same layout for objects and for arrays.

@buybackoff
Copy link
Contributor Author

@ltrzesniewski Unsafe.As<RawData<T>>(array).Data points right after the method table. What we do now is pointing to the method table. In array case, the .Data points to its Length slot. Do you know about differences in the method table size? Or other stuff that could be placed before .Data on different implementations?

@ltrzesniewski
Copy link
Contributor

In array case, the .Data points to its Length slot.

Exactly. Don't we want the offset between the first element and the method table, thus skipping the length slot?

@buybackoff
Copy link
Contributor Author

Don't we want the offset between the first element and the method table, thus skipping the length slot?

We can calculate it, but we cannot make it a JIT constant in easy way.

So now we have on x64: FirstOffset = MT_Ptr + ArrayDataOffset = MT_Ptr + 8 (MT_PtrSize) [.Data is here] + 4 (uint Length) + 4 (Padding) . What I propose is to just point past MT_Ptr and use existing knowledge about different stuff after .Data on different implementations.

@ltrzesniewski
Copy link
Contributor

we cannot make it a JIT constant in easy way.

Oh, ok, I see 👍

use existing knowledge about different stuff after .Data on different implementations.

But that's exactly what ArrayDataOffset does... how would you like to change that more precisely?

@buybackoff
Copy link
Contributor Author

But that's exactly what ArrayDataOffset does... how would you like to change that more precisely?

By using Unsafe and not Ldind.Ref and still avoiding locals.

@ltrzesniewski
Copy link
Contributor

Oh, ok, sorry, I misunderstood what you were saying earlier 👍

@buybackoff
Copy link
Contributor Author

@ltrzesniewski
image

Also this comment about managed pointers to zero: dotnet/coreclr#20386

So I'm confused.

@buybackoff
Copy link
Contributor Author

dotnet/runtime#65793

@buybackoff
Copy link
Contributor Author

The current implementation is optimal for x-plat.

For .NET Core it works even with simple Ldarg(nameof(array)) + offset, and I think it should works like that and the O and & separation is artificial both conceptually and implementation-wise. But for this kind of things there is the linked discussion.

@ltrzesniewski
Copy link
Contributor

I suppose the reason for having both O and & types is performance: GC scans should be faster for O types, as the GC can assume the value is a pointer to a method table. This gets more complicated for & values, which can point to anywhere inside an object.

But I'm very interested in the answer to your linked question. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants