[WIP] Fast codegen-free field access #23783
[WIP] Fast codegen-free field access #23783
Conversation
src/jit/importer.cpp
Outdated
@@ -4269,6 +4283,13 @@ NamedIntrinsic Compiler::lookupNamedIntrinsic(CORINFO_METHOD_HANDLE method) | |||
result = NI_System_Collections_Generic_EqualityComparer_get_Default; | |||
} | |||
} | |||
else if (strcmp(namespaceName, "System.Runtime.CompilerServices") == 0) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need all this - why is not object.GetRawData()
+ Unsafe.Add
enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was resulting in weird unoptimized codegen. (The Memory.Span property getter today shows some of this weird codegen since it uses a very similar trick to what you propose.) This seemed like the safest / most straightforward way to address it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My usual question for this is how hard is it to just fix it in the JIT. You can always introduce special intrinsics, etc., but it does not scale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somebody more familiar with JIT would have to chime in, but a cursory inspection makes it seem like modifying the optimizer would be more work than adding this intrinsic. It seems JIT doesn't quite know how to optimize the two-step process of getting a field ref from an object (convert GCREF to BYREF), then further manipulating the BYREF by performing an addition operation. I suspect that there was simply never a scenario for this previously so it never made sense to invest in this seemingly bizarre optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about generated code such as:
3912 cmp dword ptr [rdx], edx
4883C208 add rdx, 8
8B4204 mov eax, dword ptr [rdx+4]
where the null check could be omitted and the addition could be folded into the address mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObjectHasComponentSize
I am not really happy that the current implementation blurs the boundaries between objectrefs and byrefs. E.g. it would prevent us from experimenting with removing syncblocks. The proper implementation of this method would avoid ever creating a byref pointing to offset 0. If we want to make this better, I would rather spend time on that; not on trying to fix the JIT to produce perfect code for access to negative field offsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel you that it's a bit of a hack and has definitely got some code smell to it. And I'd prefer a much cleaner way of doing it. :)
But before it was checked in there were discussions with people familiar with the runtime, object layout, the JIT, and the GC. And it was determined that accessing fields at offset 0 (pMethodTable) would be safe.
You're right that accessing fields at truly negative offsets would be dangerous, and it would interfere both with the GC's ability to ref track and the runtime team's ability to experiment with things like sync block removal. But nobody is proposing accessing fields at negative offsets, let alone optimizing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I'm experimenting with whether it'd be possible to do this using IL replacement rather than GenTree replacement. Would that alleviate some of the concerns people have?
Edit: Interestingly, using IL replacement (through method rewriting a la what we do with Unsafe
and other RuntimeHelpers
routines) not only resulted in worse codegen, it resulted in incorrect codegen.
mdToken tokRawData = MscorlibBinder::GetField(FIELD__RAW_DATA__DATA)->GetMemberDef();
static BYTE ilcode[] = { CEE_LDARG_0,
CEE_LDFLDA,0,0,0,0,
#ifdef _TARGET_64BIT_
CEE_LDC_I4_8,
#else // _TARGET_64BIT_
CEE_LDC_I4_4,
#endif // _TARGET_64BIT_
CEE_SUB,
CEE_LDARG_1,
CEE_ADD,
CEE_RET };
ilcode[2] = (BYTE)(tokRawData);
ilcode[3] = (BYTE)(tokRawData >> 8);
ilcode[4] = (BYTE)(tokRawData >> 16);
ilcode[5] = (BYTE)(tokRawData >> 24);
GetLastNameFromFieldAccessor(System.Reflection.FieldAccessor`2<Person,System.String>, Person):
00007ffc`74536ed0 0f1f440000 nop dword ptr [rax+rax]
00007ffc`74536ed5 488b4108 mov rax,qword ptr [rcx+8] ; rax := this._fieldOffset (also a null check on 'this')
00007ffc`74536ed9 3912 cmp dword ptr [rdx],edx ; null check on 'obj'
00007ffc`74536edb 4883c208 add rdx,8 ; rdx := ref to ((RawData)obj).Data
00007ffc`74536edf 4803c2 add rax,rdx ; ** INCORRECT ** adding _fieldOffset to ref data before subtracting 8
00007ffc`74536ee2 4883c0f8 add rax,0FFFFFFFFFFFFFFF8h
00007ffc`74536ee6 488b00 mov rax,qword ptr [rax] ; deref LastName field
00007ffc`74536ee9 c3 ret
Even though in IL I subtracted 8 first before adding fieldOffset, something in the JIT decided to reorder these operations, resulting in a scenario where rax
might point well off the end of the object and no longer be GC-trackable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you just do => ref Unsafe.As<byte, TField>(ref Unsafe.AddByteOffset(ref RuntimeHelpers.GetRawData(o), _offset));
? The codegen for that may have one extra instruction than the ideal codegen; but I do not think that it warrants workarounds like IL replacement.
it resulted in incorrect codegen
Yep, byref
arithmetic is very slippery slope. Many opportunities for hard to trace down GC holes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The codegen for that may have one extra instruction than the ideal codegen; but I do not think that it warrants workarounds like IL replacement.
Sure, I'll cede that point. Bonus: using the existing code allows the obj null check to avoid an extra register allocation. :)
It would be useful to have some examples that shows how serializers would be changed to use this. |
👍 |
I’ve noticed that recently we’ve been substantially affecting the readability and expressibility of the System.Private.CoreLib code smattering it with Unsafe reads, JIT intrinsics and other memory “hacks” I’m not discounting the benefits - performance that permeates through the stack, but it strikes me that some of these cases would be better done by improving the JIT to optimise more cases or do better. The reasons to me is that it’s “easier” to do this in c# because open source contributors have more knowledge of c# and the JIT Is hard to approach |
And when the JIT does improve, attempting to remove such hacks can require a ton of work. It's not like you can just remove them, you need to check the generated code, perhaps run some benchmarks, deal with unexpected interactions between multiple hacks and so on.
Perhaps the JIT would be easier to approach if the code was better written. Sadly, a lot was written just with short term gains in mind and targeting today's "real world" use cases with 0 consideration for the fact that tomorrow's "real world" might be different. |
Let me open a new issue to track the ldflda + add JIT optimization work. If that comes in, this PR would be able to take advantage of it, and the Memory.Span property getter (the only other similar usage I know of today) would immediately see the benefit of the simplified codegen. |
For the most part, many of these are things (like HWIntrinsics) are things that even native compilers expose because they are non-trivial things to handle in the compiler or (like Unsafe) are places that are limitations in the C# language today. However, I do agree that we shouldn't just be adding new intrinsics for any old case and that we should look at optimizing the JIT instead where possible or trying to provide an appropriate abstractions so that low level details don't bleed into all of our APIs. |
|
||
namespace System.Reflection | ||
{ | ||
public sealed class FieldAccessor<TObject, TField> where TObject : class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is FieldAccessor
not a struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would need to add an initalized field to ensure it had sensible values for default(FieldAccessor)
private readonly uint _fieldOffset;
private readonly bool _initalized;
and additionally need a check in the GetRef
public ref TField GetRef(TObject obj)
{
if (!_initalized) ThrowHelper.ThrowNotInitalized();
// ...
}
Question is does it add value in terms of the extra cost vs size saved in number of instances and one less indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, Ben and I had a brief conversation on this the other day. The type wouldn't be safe to use if it were a struct, as not only would uninitialized usage be a problem, the struct would also be subject to tearing concerns.
There's no real perf hit to making this a class instead of a struct. The intent is that the caller will cache the instance and reuse it. Since the accessor itself is inlined (it's a non-virtual method) and usage doesn't incur a null check on the accessor itself (see the comment at the top of GetRef
), this should take care of any perf concerns.
This might be usefull not just in serializers. Here are our case. In our project we have a set of POCO classes that needs to be copied/cloned. It is very easy to forget to copy the field if you write such code manually and to be fair, writing such code is not pleasant. So, we build our own "cloner" that generate copy/clone code on the fly by emiting IL. |
ae98c02
to
920be54
Compare
I updated the PR based on previous feedback. It's still WIP since I've not tested it to see what codegen is emitted into the callers. Updates since last iteration:
|
All these are methods and properties. This feature as it stands is for fields only. |
@benadams A good example is dotnet/corefx#39254. I will be happy if @GrabYourPitchforks will do the same for properties because it's the case you pointed, but also serializers and composite type handling in Npgsql would benefit from this. |
Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:
|
This isn't going to be finished before the coming repo merge. I've migrated it to a temporary holding area GrabYourPitchforks#1 so that it doesn't get lost in the shuffle. Will reopen at some point in the future once the dust settles. |
Contributes to https://github.com/dotnet/corefx/issues/24390. This is only a proof of concept to solicit discussion.
This is an experimental
FieldAccessor
type which provides read+write access to a class's instance fields in a manner almost as high performance as raw codegen.Lots of open issues here, especially regarding how readonly fields should be treated, whether the type checks in the
FieldAccessor
ctor are correct, and so on. But it serves as a proof of concept that we can pull this off to enable codegen-free fast serialization scenarios.(This also slightly optimizes the
Memory<T>.Span
property getter.)/cc @jkotas @steveharter @stephentoub @dotnet/jit-contrib as FYI