-
Notifications
You must be signed in to change notification settings - Fork 2.7k
JIT: after devirtualization try undoing box and invoking unboxed entry #14698
Conversation
@stephentoub, @jkotas, @briansull PTAL Diffs are a bit tricky to evaluate here since the changes to corelib have perturbed crossgen order and cut out a fair number of instantiations. @stephentoub if you have some simple test cases for the new paths in AwaitUnsafeOnCompleted please send them my way. I've run all the CoreFx tests locally without issue. |
Thanks, Andy. You're looking for tests cases to run or test cases to see the asm for? |
Runnable is probably better. |
First commit onlyThe fact we use R2R for most of the jit-diffs assemblies is an inhibitor. So we only see diffs here in the core library.
Second commit onlyNote most of the improvement here is from instantations of
Second commit only, common methodsIf we filter out methods that exist only in base or diff we can see common method changes. The regression in
Both commits togetherBasically the sum of the two separately.
Sample diffs (simple test case)The jit can now undo the box for [MethodImpl(MethodImplOptions.NoInlining)]
static int CompareTo<T>(T x, T y)
{
if (x is IComparable)
{
return ((IComparable)x).CompareTo(y);
}
return -1;
} -; Lcl frame size = 32
+; Lcl frame size = 48
G_M19816_IG01:
- 57 push rdi
56 push rsi
- 53 push rbx
- 4883EC20 sub rsp, 32
- 8BF1 mov esi, ecx
- 8BFA mov edi, edx
+ 4883EC30 sub rsp, 48
+ 8BF2 mov esi, edx
G_M19816_IG02:
- 48B950F8688CFD7F0000 mov rcx, 0x7FFD8C68F850
- E816B9825F call CORINFO_HELP_NEWSFAST
- 488BD8 mov rbx, rax
- 897308 mov dword ptr [rbx+8], esi
- 48B950F8688CFD7F0000 mov rcx, 0x7FFD8C68F850
- E801B9825F call CORINFO_HELP_NEWSFAST
+ 894C2428 mov dword ptr [rsp+28H], ecx
+ 48B908CE698CFD7F0000 mov rcx, 0x7FFD8C69CE08
+ E896BB825F call CORINFO_HELP_NEWSFAST
488BD0 mov rdx, rax
- 897A08 mov dword ptr [rdx+8], edi
- 488BCB mov rcx, rbx
- E85B2D245C call System.Int32:CompareTo(ref):int:this
+ 897208 mov dword ptr [rdx+8], esi
+ 488D4C2428 lea rcx, bword ptr [rsp+28H]
+ E816685A5C call System.Int32:CompareTo(ref):int:this
90 nop
G_M19816_IG03:
- 4883C420 add rsp, 32
- 5B pop rbx
+ 4883C430 add rsp, 48
5E pop rsi
- 5F pop rdi
C3 ret
-; Total bytes of code 70, prolog size 7 for method Ex:CompareTo(int,int):int
+; Total bytes of code 49, prolog size 5 for method Ex:CompareTo(int,int):int Sample diffs (in changed part of AsyncTaskMethodBuilder)Still working on it... |
@dotnet-bot test Windows_NT x64 corefx_baseline |
In the sample test case; the frame size went up though there seems to be less in it - is that expected? |
Due to using stack rather than heap? |
Does the optimization happen when the simple test case is written as below? I assume so, but just curious as the IL is slightly different between the two. [MethodImpl(MethodImplOptions.NoInlining)]
static int CompareTo<T>(T x, T y)
{
if (x is IComparable c)
{
return c.CompareTo(y);
}
return -1;
} |
@benaadams Yes, there is a copy on the stack now instead of on the heap. |
@justinvp no, not yet, it needs something like #14471 to capture the knowledge gained from a successful type test. |
@AndyAyersMS ah, ok. Thanks. |
Cross-linking to #14178 which has some good discussion. Latest version here mostly cleans up the special casing via type equality. Could go further and remove the unsafe cast bits from the earlier checks too.... |
Actually in we might be able to get to @justinvp's example. Looks like we aren't as proficient about propagating types through |
Improving flow of types through dup helps a little but more is needed. |
Sample diffs (in changed part of AsyncTaskMethodBuilder)Codegen for instantiation that follows the new Still working on verifying changes via an allocation profiler. Something about my somewhat ad-hoc test setup is getting in the way. Inlines into 060044EF System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Int64][System.Int64]:AwaitUnsafeOnCompleted(byref,byref):this
[0 IL=0002 TR=000003 060044F0] [FAILED: too many il bytes] System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Int64][System.Int64]:GetStateMachineBox(byref):ref:this
[1 IL=0173 TR=000153 0600458A] [below ALWAYS_INLINE size] System.Runtime.CompilerServices.ValueTaskAwaiter`1[Int64][System.Int64]:GetTask():ref:this
[2 IL=0006 TR=000178 060024E2] [profitable inline] System.Threading.Tasks.ValueTask`1[Int64][System.Int64]:AsTask():ref:this
[3 IL=0016 TR=000210 060044FB] [aggressive inline attribute] System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Int64][System.Int64]:GetTaskForResult(long):ref
[4 IL=0663 TR=000413 06002254] [profitable inline] System.Threading.Tasks.Task`1[Int64][System.Int64]:.ctor(long):this
[5 IL=0012 TR=000434 060022D1] [profitable inline] System.Threading.Tasks.Task:.ctor(bool,int,struct):this
[6 IL=0001 TR=000448 06000102] [below ALWAYS_INLINE size] System.Object:.ctor():this
[0 IL=0184 TR=000166 06004512] [FAILED: noinline per IL/cached result] System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
G_M44580_IG01:
57 push rdi
56 push rsi
53 push rbx
4883EC40 sub rsp, 64
C5F877 vzeroupper
33C0 xor rax, rax
4889442430 mov qword ptr [rsp+30H], rax
488BF2 mov rsi, rdx
G_M44580_IG02:
498BD0 mov rdx, r8
E804F4FFFF call System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Int64][System.Int64]:GetStateMachineBox(byref):ref:this
488BF8 mov rdi, rax
480FBE0E movsx rcx, byte ptr [rsi]
G_M44580_IG03:
C4E17A6F06 vmovdqu xmm0, qword ptr [rsi]
C4E17A7F442430 vmovdqu qword ptr [rsp+30H], xmm0 // copy the awaiter locally instead of boxing
G_M44580_IG04:
488B4C2430 mov rcx, gword ptr [rsp+30H] // from inlined AsTask/GetTaskForResult; check for task
4885C9 test rcx, rcx
754D jne SHORT G_M44580_IG08
488B742438 mov rsi, qword ptr [rsp+38H] // if no task, see result is zero so we can use cache
4885F6 test rsi, rsi
7523 jne SHORT G_M44580_IG06 // jump if we can't use cache
G_M44580_IG05:
48B928303A3CFD7F0000 mov rcx, 0x7FFD3C3A3028 // fetch cached task
BAD4000000 mov edx, 212
E80966375F call CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS
48B90029001075010000 mov rcx, 0x17510002900
488B19 mov rbx, gword ptr [rcx]
EB1D jmp SHORT G_M44580_IG07
G_M44580_IG06:
48B938C14B3CFD7F0000 mov rcx, 0x7FFD3C4BC138 // cons up new task
E8DBA2365F call CORINFO_HELP_NEWFAST
C7403400000001 mov dword ptr [rax+52], 0x1000000
48897038 mov qword ptr [rax+56], rsi
488BD8 mov rbx, rax
G_M44580_IG07:
488BCB mov rcx, rbx
G_M44580_IG08:
488BD7 mov rdx, rdi
41B801000000 mov r8d, 1
E8CC014257 call System.Runtime.CompilerServices.TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
90 nop
G_M44580_IG09:
4883C440 add rsp, 64
5B pop rbx
5E pop rsi
5F pop rdi
C3 ret |
This is ready for review now... |
src/jit/gentree.cpp
Outdated
GenTree* blkArg = copyDst->gtOp.gtOp1; | ||
assert(blkArg->gtOper == GT_ADD); | ||
|
||
// Todo: verify addends |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you wanted to add verification that you are adding +4/+8 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, thanks for catching that. Added with the latest commit.
virtual CORINFO_METHOD_HANDLE getUnboxedEntry( | ||
CORINFO_METHOD_HANDLE ftn, | ||
bool* requiresInstMethodTableArg = NULL /* OUT */ | ||
) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh Good, a Jit Interface change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, need a new guid, don't I....
If you use perfview adding |
@benaadams thanks, problem is that I'm not seeing the expected allocations in perfview, and can't get VS's profiler to work at all. Going to try a release build with perfview. Maybe checked builds do odd things. |
K, I was having issues because the ETW tracing causes it to take a different path; so you need to switch it off for Tasks |
On ubuntu, GC/Scenarios/LeakWheel/leakwheel/leakwheel.sh failed. At first glance it doesn't look like this change could be the cause, and earlier PR legs passed. Seems like also this test may have some random behavior. So am going to retry. @dotnet-bot retest Ubuntu x64 Checked Innerloop Build and Test |
Ubuntu now seems to be stuck running @sandreenko can you look over the SPMI changes I've made here and see if you spot anything wrong? I can try and repro but it may take a while. My local ubuntu box (which I dual boot with Win10) is offline stuck in "grub repair" after a Win10 update. I'm going to modify the shim collector to pass a local for the optional arg and then only report it back if the caller wants it. That way the optional result is always recorded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good,
Windows_NT arm64 Cross Checked Innerloop Build and Test JIT\Performance\CodeQuality\Roslyn\CscBench
|
ac45c0b
to
1471c9c
Compare
LGTM |
@stephentoub look ok to you now? |
Yup, thanks for doing this. I noticed in the asm you shared there are opportunities to pare that down further, in the implementation of the {Configured}ValueTaskAwaiter.GetTask, but I can do that subsequently. |
This may have regressed for #20936 (comment) ((IStateMachineBoxAwareAwaiter)awaiter).AwaitUnsafeOnCompleted(box) |
Added issue https://github.com/dotnet/coreclr/issues/20937 |
Take advantage of dotnet#14698 to avoid boxing the TRest argument and improve devirtualization.
Take advantage of dotnet#14698 to avoid boxing the TRest argument and improve devirtualization.
Take advantage of dotnet#14698 to avoid boxing the TRest argument and improve devirtualization.
Take advantage of dotnet#14698 to avoid boxing the TRest argument and improve devirtualization.
Take advantage of dotnet#14698 to avoid boxing the TRest argument and improve devirtualization.
Take advantage of dotnet#14698 to avoid boxing the TRest argument and improve devirtualization.
Take advantage of #14698 to avoid boxing the TRest argument and improve devirtualization.
Take advantage of dotnet/coreclr#14698 to avoid boxing the TRest argument and improve devirtualization. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Take advantage of dotnet/coreclr#14698 to avoid boxing the TRest argument and improve devirtualization. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Take advantage of dotnet/coreclr#14698 to avoid boxing the TRest argument and improve devirtualization. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Take advantage of dotnet/coreclr#14698 to avoid boxing the TRest argument and improve devirtualization. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Take advantage of dotnet/coreclr#14698 to avoid boxing the TRest argument and improve devirtualization. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
…) (#16844) Take advantage of dotnet/coreclr#14698 to avoid boxing the TRest argument and improve devirtualization. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
See commits for details.
As per usual I need to finish off the implementation for SPMI and prepare desktop for jit interface changes.
Also still looking at diffs.