Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue where resolution move would overwrite the source of operand of JCMP #48833

Merged
merged 3 commits into from
Mar 3, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 36 additions & 13 deletions src/coreclr/jit/lsra.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7656,7 +7656,7 @@ void LinearScan::resolveRegisters()
printf("Resolution Candidates: ");
dumpConvertedVarSet(compiler, resolutionCandidateVars);
printf("\n");
printf("Has %sCritical Edges\n\n", hasCriticalEdges ? "" : "No");
printf("Has %sCritical Edges\n\n", hasCriticalEdges ? "" : "No ");

printf("Prior to Resolution\n");
foreach_block(compiler, block)
Expand Down Expand Up @@ -8277,31 +8277,46 @@ void LinearScan::handleOutgoingCriticalEdges(BasicBlock* block)
}
}

// Next, if this blocks ends with a switch table, we have to make sure not to copy
// into the registers that it uses.
regMaskTP switchRegs = RBM_NONE;
// Next, if this blocks ends with a switch table, or for Arm64, ends with JCMP instruction,
// make sure to not copy into the registers that are consumed at the end of this block.
//
// Note: Only switches and JCMP (for Arm4) have input regs (and so can be fed by copies), so those
// are the only block-ending branches that need special handling.
regMaskTP consumedRegs = RBM_NONE;
if (block->bbJumpKind == BBJ_SWITCH)
{
// At this point, Lowering has transformed any non-switch-table blocks into
// cascading ifs.
GenTree* switchTable = LIR::AsRange(block).LastNode();
assert(switchTable != nullptr && switchTable->OperGet() == GT_SWITCH_TABLE);

switchRegs = switchTable->gtRsvdRegs;
consumedRegs = switchTable->gtRsvdRegs;
GenTree* op1 = switchTable->gtGetOp1();
GenTree* op2 = switchTable->gtGetOp2();
noway_assert(op1 != nullptr && op2 != nullptr);
assert(op1->GetRegNum() != REG_NA && op2->GetRegNum() != REG_NA);
// No floating point values, so no need to worry about the register type
// (i.e. for ARM32, where we used the genRegMask overload with a type).
assert(varTypeIsIntegralOrI(op1) && varTypeIsIntegralOrI(op2));
switchRegs |= genRegMask(op1->GetRegNum());
switchRegs |= genRegMask(op2->GetRegNum());
consumedRegs |= genRegMask(op1->GetRegNum());
consumedRegs |= genRegMask(op2->GetRegNum());
}

#ifdef TARGET_ARM64
// Next, if this blocks ends with a JCMP, we have to make sure not to copy
// into the register that it uses or modify the local variable it must consume
// Next, if this blocks ends with a JCMP, we have to make sure:
// 1. Not to copy into the register that JCMP uses
// e.g. JCMP w21, BRANCH
// 2. Not to copy into the source of JCMP's operand before it is consumed
// e.g. Should not use w0 since it will contain wrong value after resolution
// call METHOD
// ; mov w0, w19 <-- should not resolve in w0 here.
// mov w21, w0
// JCMP w21, BRANCH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not obvious to me why the copy needs special casing here -- could we not insert the resolution move between the copy and jump?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserting the resolution move between the copy and jump should be fine for some cases, for example:

mov     w21, w0
mov     x0, x19  ; <-- resolution move
cbnz    w21, G_M2478_IG37

but when we generate code for JCMP, it is tied with the copy and so the resolution move happens before the copy/jmp, like this:

      mov     x0, x20
      ldr     x21, [x20]
      ldr     x21, [x21,#72]
      ldr     x21, [x21,#40]
      blr     x21
      mov     x0, x19   ; move resolution overwrites `x0` before it gets copied.
      mov     w21, w0
      cbnz    w21, G_M2478_IG37

G_M2478_IG05:              
      mov     x19, x0
      ....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that the GT_COPY gets special handling in codegen and is conceptually merged with the operation that consumes its result, so both its input and output regs must be excluded from the set available for resolution.

And only switches and jcmp have input regs (and so can be fed by copies), so those are the only block-ending branches that need special handling.

Which makes me wonder if we might need similar logic for switches. It looks like we don't check if the input regs for switches come from GT_COPY. Switches are rare enough that we might not have good stress coverage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment

Done.

wonder if we might need similar logic for switches

Possibly, but I am not confident to make that change before seeing a scenario, so lets postpone it for now.

// 3. Not to modify the local variable it must consume

// Note: GT_COPY has special handling in codegen and its generation is merged with the
// node that consumes its result. So both, the input and output regs of GT_COPY must be
// excluded from the set available for resolution.
LclVarDsc* jcmpLocalVarDsc = nullptr;
if (block->bbJumpKind == BBJ_COND)
{
Expand All @@ -8310,7 +8325,13 @@ void LinearScan::handleOutgoingCriticalEdges(BasicBlock* block)
if (lastNode->OperIs(GT_JCMP))
{
GenTree* op1 = lastNode->gtGetOp1();
switchRegs |= genRegMask(op1->GetRegNum());
consumedRegs |= genRegMask(op1->GetRegNum());

if (op1->OperIs(GT_COPY))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are locals and copies the only special cases that might happen here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also switch_regs is not as descriptive as one might like, given that we now set it for things that are not switches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the only ones I could think of. COPY are special because we defer the code generation until the consuming node like here:

case GT_COPY:
// This is handled at the time we call genConsumeReg() on the GT_COPY
break;

So in the test that I was investigating, the GT_COPY node's generation get deferred and happens after the generation of resolution move, at the time when the JCMP is generated.

Generating: N787 ( 34, 17) [000046] --CXGO------        t46 = *  CALLV vt-ind int    MiniBench.MyChannelWriter`1[Int32][System.Int32].TryWrite REG x0 $218
							GC regs: 100001 {x0 x20} => 100000 {x20}
Call: GCvars=0000000000000008 {V00}, gcrefRegs=100000 {x20}, byrefRegs=80000 {x19}
[15] Rec call GC vars = 0000000000000008
IN000f:                           blr     x21
                                                              /--*  t46    int    
Generating:                [000930] --CXGO------       t930 = *  COPY      int    REG x21 (currently: ***not unused***, should be unused)
Generating: N789 (  1,  2) [000047] -c----------        t47 =    CNS_INT   int    0 REG NA $40
; Generating: N001 (  1,  1) [000937] ------------       t937 =    LCL_VAR   byref  V00 this          x19 REG x19
                                                              /--*  t937   byref  
Generating: N002 (  2,  2) [000938] ------------       t938 = *  COPY      byref  REG x0 (currently: ***unused***, should be notunused)
IN0010:                           mov     x0, x19
							V00 in reg x19 is becoming dead  [000937]
							Live regs: 180000 {x19 x20} => 100000 {x20}
							Byref regs: 80000 {x19} => 0000 {}
							V00 in reg x0 is becoming live  [000938]
							Live regs: 100000 {x20} => 100001 {x0 x20}
							Byref regs: 0000 {} => 0001 {x0}
                                                              /--*  t930   int    
                                                              +--*  t47    int    
Generating: N791 ( 36, 20) [000048] CNE---XGO-N----              *  JCMP      void   REG NA
IN0011:                           mov     w21, w0
IN0012:                           cbnz    (LARGEJMP)L_M2478_BB37

Regarding switchRegs, I will rename it to consumedRegs.

{
GenTree* srcOp1 = op1->gtGetOp1();
consumedRegs |= genRegMask(srcOp1->GetRegNum());
}

if (op1->IsLocal())
{
Expand Down Expand Up @@ -8388,9 +8409,11 @@ void LinearScan::handleOutgoingCriticalEdges(BasicBlock* block)
{
sameToReg = REG_NA;
}
// If this register is used by a switch table at the end of the block, we can't do the copy
// in this block (since we can't insert it after the switch).
if ((sameToRegMask & switchRegs) != RBM_NONE)
// If this register is busy because it is used by a switch table at the end of the block
// (or for Arm64, it is consumed by JCMP), we can't do the copy in this block since we can't
// insert it after the switch (or for Arm64, can't insert and overwrite the operand/source
// of operand of JCMP).
if ((sameToRegMask & consumedRegs) != RBM_NONE)
{
sameToReg = REG_NA;
}
Expand Down