New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Remove JTRUE(relop) invariant in the backend #82766
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsToday the backend (and much of the JIT) allows only JTRUE with a relop With this, we can now enable an optimization that turns Not so many diffs are expected, but they should come with #79283. Currently based on top of #82610. Example diffs: [MethodImpl(MethodImplOptions.NoInlining)]
public static int Foo(int a, int b, int c)
{
if (a < b & b < c & c < 10)
return a * 42 + b;
return 13;
} 6B01001F cmp w0, w1
7A42B024 ccmp w1, w2, z, lt
7A4AB844 ccmp w2, #10, z, lt
- 9A9FA7E2 cset x2, lt
- 340000A2 cbz w2, G_M35561_IG05
- ;; size=20 bbWeight=1 PerfScore 3.00
-G_M35561_IG03: ;; offset=001CH
+ 540000AA bge G_M35561_IG05
+ ;; size=16 bbWeight=1 PerfScore 2.50
+G_M35561_IG03: ;; offset=0018H
52800542 mov w2, #42
1B020400 madd w0, w0, w2, w1
;; size=8 bbWeight=0.50 PerfScore 1.25
[MethodImpl(MethodImplOptions.NoInlining)]
public static int Foo(int a, int b, int c)
{
if (a < b & b < c & c < 10)
return 42;
return 13;
} G_M35561_IG01: ;; offset=0000H
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M35561_IG02: ;; offset=0008H
+ 528001A3 mov w3, #13
+ 52800544 mov w4, #42
6B01001F cmp w0, w1
7A42B024 ccmp w1, w2, z, lt
7A4AB844 ccmp w2, #10, z, lt
- 9A9FA7E0 cset x0, lt
- 528001A1 mov w1, #13
- 52800542 mov w2, #42
- 7100001F cmp w0, #0
- 1A820020 csel w0, w1, w2, eq
- ;; size=32 bbWeight=1 PerfScore 4.00
-G_M35561_IG03: ;; offset=0028H
+ 1A84A060 csel w0, w3, w4, ge
+ ;; size=24 bbWeight=1 PerfScore 3.00
+G_M35561_IG03: ;; offset=0020H
A8C17BFD ldp fp, lr, [sp], #0x10
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
|
Today the backend (and much of the JIT) allows only JTRUE with a relop operand, and it is always assumed that this relop appears right before JTRUE in linear order. This change adds support for JTRUE nodes with arbitrary integral operands from lowering and onwards. The operand is also allowed to appear at arbitrary locations in LIR. With this, we can now enable an optimization that turns EQ/NE(relop/SETCC, 0) into just a (potentially) reversed condition. This improves codegen for some cases with SELECTCC and gives us a simple way to allow for compare chains that feed conditional branches in the future.
4ea9856
to
dd46020
Compare
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn |
Azure Pipelines successfully started running 3 pipeline(s). |
cc @dotnet/jit-contrib PTAL @kunalspathak @BruceForstall @a74nh With this PR #79283 should not need any (significant) backend work. |
// Return Value: | ||
// The next node to lower (usually nullptr). | ||
// | ||
GenTree* Lowering::LowerJTrue(GenTreeOp* jtrue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split out LA64 LowerJTrue
logic since it is very different from the rest of the backends. It lowers the new JTRUE(op)
nodes by turning them into JCMP(op, 0, NE)
. @shushanhf this is untested, can you please double check that it looks ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will check it for LA64.
Thanks.
use.ReplaceWith(op1); | ||
BlockRange().Remove(cmp->gtGetOp2()); | ||
BlockRange().Remove(cmp); | ||
return next; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difficulty I have with debugging the lowering phase, is that nodes can get renamed or moved, and there is no record in the JITDUMP except for the node dump before and after the phase. It's not always obvious what happened, especially when your work isn't focused on the lowering phase.
Not for this PR, but some extra debug throughout lowering would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should have more dump logging in lowering. I'll add some to this transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional dumping looks good. I think the "before" IR isn't required in these cases as that IR would have been printed the last time it was changed, but happy for it to stay like this.
I'm wondering (and this is unrelated to your change): in your examples, the C# code uses |
Yes, with |
@BruceForstall @a74nh I added some JITDUMP logging. I had to add a couple of new helpers to deal with the fact that there are no explicit flags edges, so there are new For the example above the jitdump is now: *************** Starting PHASE Lowering nodeinfo
[000006] is a candidate for CCMP:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- t2 = ▌ LT int $100
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- t5 = ▌ LT int $101
┌──▌ t2 int
├──▌ t5 int
N007 ( 13, 7) [000006] ----------- t6 = ▌ AND int $102
Conversion was legal. Result:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
N007 ( 13, 7) [000006] ----------- t6 = SETCC int cond=SLT
[000010] is a candidate for CCMP:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
N007 ( 13, 7) [000006] ----------- t6 = SETCC int cond=SLT
N008 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V02 arg2 u:1 (last use) $82
N009 ( 1, 2) [000008] -c--------- t8 = CNS_INT int 10 $44
┌──▌ t7 int
├──▌ t8 int
N010 ( 6, 4) [000009] ----------- t9 = ▌ LT int $103
┌──▌ t6 int
├──▌ t9 int
N011 ( 20, 12) [000010] ----------- t10 = ▌ AND int $104
Conversion was legal. Result:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
N008 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V02 arg2 u:1 (last use) $82
N009 ( 1, 2) [000008] -c--------- t8 = CNS_INT int 10 $44
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
┌──▌ t7 int
├──▌ t8 int
N010 ( 6, 4) [000009] ----------- ▌ CCMP void cond=SLT flags=z
N011 ( 20, 12) [000010] ----------- t10 = SETCC int cond=SLT |
I reworked the JITDUMP handling a bit more in the latest commit. [000006] is a candidate for CCMP:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- t2 = ▌ LT int $100
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- t5 = ▌ LT int $101
┌──▌ t2 int
├──▌ t5 int
N007 ( 13, 7) [000006] ----------- t6 = ▌ AND int $102
Conversion was legal. Result:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
N007 ( 13, 7) [000006] ----------- t6 = SETCC int cond=SLT
[000010] is a candidate for CCMP:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
N007 ( 13, 7) [000006] ----------- t6 = SETCC int cond=SLT
N008 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V02 arg2 u:1 (last use) $82
N009 ( 1, 2) [000008] -c--------- t8 = CNS_INT int 10 $44
┌──▌ t7 int
├──▌ t8 int
N010 ( 6, 4) [000009] ----------- t9 = ▌ LT int $103
┌──▌ t6 int
├──▌ t9 int
N011 ( 20, 12) [000010] ----------- t10 = ▌ AND int $104
Conversion was legal. Result:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
N008 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V02 arg2 u:1 (last use) $82
N009 ( 1, 2) [000008] -c--------- t8 = CNS_INT int 10 $44
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
┌──▌ t7 int
├──▌ t8 int
N010 ( 6, 4) [000009] ----------- ▌ CCMP void cond=SLT flags=z
N011 ( 20, 12) [000010] ----------- t10 = SETCC int cond=SLT
Lowering select:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
N008 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V02 arg2 u:1 (last use) $82
N009 ( 1, 2) [000008] -c--------- t8 = CNS_INT int 10 $44
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
┌──▌ t7 int
├──▌ t8 int
N010 ( 6, 4) [000009] ----------- ▌ CCMP void cond=SLT flags=z
N011 ( 20, 12) [000010] ----------- t10 = SETCC int cond=SGE
N014 ( 1, 2) [000014] ----------- t14 = CNS_INT int 13 $45
N015 ( 1, 2) [000016] ----------- t16 = CNS_INT int 42 $46
┌──▌ t10 int
├──▌ t14 int
├──▌ t16 int
N016 ( 25, 20) [000018] ----------- t18 = ▌ SELECT int
Converted to SELECTCC:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
N002 ( 1, 1) [000001] ----------- t1 = LCL_VAR int V01 arg1 u:1 $81
N004 ( 1, 1) [000003] ----------- t3 = LCL_VAR int V01 arg1 u:1 (last use) $81
N005 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V02 arg2 u:1 $82
N008 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V02 arg2 u:1 (last use) $82
N009 ( 1, 2) [000008] -c--------- t8 = CNS_INT int 10 $44
N014 ( 1, 2) [000014] ----------- t14 = CNS_INT int 13 $45
N015 ( 1, 2) [000016] ----------- t16 = CNS_INT int 42 $46
┌──▌ t0 int
├──▌ t1 int
N003 ( 6, 3) [000002] ----------- ▌ CMP void
┌──▌ t3 int
├──▌ t4 int
N006 ( 6, 3) [000005] ----------- ▌ CCMP void cond=SLT flags=z
┌──▌ t7 int
├──▌ t8 int
N010 ( 6, 4) [000009] ----------- ▌ CCMP void cond=SLT flags=z
┌──▌ t14 int
├──▌ t16 int
N016 ( 25, 20) [000018] ----------- t18 = ▌ SELECTCC int cond=SGE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
// markFlagsOperands - Whether the dataflow algorithm should also try to | ||
// mark operands that are defining flags. For example, | ||
// GT_JCC implicitly consumes the flags defined by the | ||
// previous node in linear order. If this argument is | ||
// true, then the algorithm will also include the flags | ||
// definition (which is e.g. a CMP node) in the returned | ||
// range. | ||
// This works on a best-effort basis; the user should not | ||
// rely on all flags defs to be found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this down to "Template arguments"?
src/coreclr/jit/lir.cpp
Outdated
// range contains only nodes in the dataflow tree and false | ||
// otherwise. | ||
// | ||
// Type arguments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this "Template arguments:" instead?
src/coreclr/jit/lir.cpp
Outdated
// root - The root of the dataflow tree. | ||
// isClosed - An output parameter that is set to true if the returned | ||
// range contains only nodes in the dataflow tree and false | ||
// otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're here, can you fix this? root
doesn't match the declaration. It's missing markCount
and sideEffects
.
Today the backend (and much of the JIT) allows only JTRUE with a relop
operand, and it is always assumed that this relop appears right before
JTRUE in linear order. This change adds support for JTRUE nodes with
arbitrary integral operands from lowering and onwards. The operand is
also allowed to appear at arbitrary locations in LIR.
With this, we can now enable an optimization that turns
EQ/NE(relop/SETCC, 0) into just a (potentially) reversed condition. This
improves codegen for some cases with SELECTCC and gives us a simple way
to allow for compare chains that feed conditional branches in the
future.
Not so many diffs are expected, but they should come with #79283.
Currently based on top of #82610.
Example diffs: