-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
@dotnet-bot test Windows_NT jitstressregs1 |
@dotnet-bot test Windows_NT x86 ryujit Checked jitstress2_jitstressregs0x10 |
@dotnet-bot test this please |
@dotnet-bot help |
@dotnet-bot test this please |
@dotnet-bot test Windows_NT jitstressregs1 |
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs0x10 |
@dotnet-bot test Windows_NT jitstressregs1 |
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs0x10 |
@dotnet-bot test Ubuntu jitstressregs2 |
@dotnet-bot test Ubuntu jitstressregs1 |
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs8 |
@CarolEidt - Please review this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fix to issue 8286. Commit migrated from dotnet/coreclr@f26796f
Issue 1: the original issue uncovered by issue 8286.
This issue repros with JitStressRegs=4 against x86 RyuJIT.
The repro method has a InitBlock with size specified by a variable i.e. StoreDynBlk(InitValue, DstAddr, Size).
Lower/Codegen decides to use rep.Stosb which requires InitValue=eax, DstAddr=edi, Size=ecx. Rep.Stosb kill edi and ecx and leaves eax intact. The kill set of StoreDynBlk is modeled accordingly. That is at the point of Use of InitValue in StoreDynBlk, it has a fixedRef to eax.
Above V11 corresponds to InitVal. Note that prior to allocating Use of V11, we have the following reg allocation V11=edx, V3=eax.
Now refPos 40 is a fixedRegRef which is not a kill nor marked delayRegFree. Further it is in a register (edx) which is not the one (eax) we want. In this case we need eax to be spilled and assigned to refPos 40 with it marked as copyReg=true. But that logic in allocateRegisters() gets triggered only
if (!fixedRegRef || refPos marked delayRegFree)
.Fix:
This if condition is removed to fix the above case. Note that in resolveLocalRef, GT_COPY is inserted only
if (!fixedRegRef || moveReg)
Issue2:
This lead to an assert in verifyFinalAllocation() for the case of RefTypeKill that was asserting that the RegRecord that is being killed is not assigned to any interval at the point of kill. This assert is hit for CQPerf benchmark method PermuteArray on ubuntu.
Above the refPos of interest are 202 and 213. in verifyFinalAllocation(), while processing RefPos 213 (i.e. kill eax) finds eax RegRecord's assignedInterval is non-null and set to Interval of refPos 202 (i.e. V30). At the beginning of this basic block at location 399, LSRA inserted resolution move to swap V30 and V5 so that V30 will be eax. Prior to resolution move V30 came into basic block in reg R13. verifyResolutionMove() logic didn't set V30's interval->assignedReg to eax' RegRecord as result it remained pointing to R13. As a result verifyFinalAllocation() while processing refPos 202, has cleared assignedInterval field on R13 instead of on eax's RegRecord. And when refKill pos 213 is processed it finds eax's regRecord->assignedInterval is not cleared.
Fix:
verifyResolutionMove() is fixed to update interval->assignedReg field of both leftInterval and rightInterval in case of a swap.
Issue3:
Ref pos 202 is isFixedReg=true, copyReg=true and spillAfter=true. verifyFinalAllocation() is printing assigned reg as the register (esi) spilled in the dump. In case of copyreg=true and spillAfter=true, the register that is being is spilled is homeReg given by interval->physReg. Corrected dump to handle to this case.
AsmDiffs:
SPMI asm diffs on amd64 desktop indicate minor code size regression. These are due to a different register allocated or in some cases additional mov instruction or an additional resolution mov generated.
For example in case of mscorlib
Throughput:
Throughput measured on desktop shows that the regression is within noise limit
Fixes #8286