Handle MultiRegNode in gtGetRegMask() #93576
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@dotnet/jit-contrib
src/coreclr/jit/gentree.cpp
Outdated
#ifdef TARGET_ARM64
    else if (IsCopyOrReloadOfMultiRegNode())
    {
        // A multi-reg copy or reload will have valid regs for only those
        // positions that need to be copied or reloaded. Hence we need
        // to consider only those registers for computing reg mask.

        const GenTreeCopyOrReload* copyOrReload = AsCopyOrReload();
        const unsigned regCount = GetMultiRegCount(comp);

        resultMask = RBM_NONE;
        for (unsigned i = 0; i < regCount; i++)
        {
            regNumber reg = copyOrReload->GetRegNumByIdx(i);
            if (reg != REG_NA)
            {
                resultMask |= genRegMask(reg);
            }
        }
    }
#endif
It looks like this case could be combined with the previous case to make the TP cost smaller. Also, shouldn't this be handled for all platforms?
> It looks like this case could be combined with the previous case to make the TP cost smaller.
I would rather keep it separate, the way it is in other places in the codebase. Also, for TP, we will still need to do the check for IsCopyOrReloadOfMultiRegNode() and another check again when getting regCount.
> Also, shouldn't this be handled for all platforms?
Possibly, but I haven't seen cases where we hit this on other platforms, so I would rather defer removing the #ifdef.
> I would rather keep it separate, the way it is for other places in codebase. Also, for TP, we will still need to do the check for IsCopyOrReloadOfMultiRegNode() and another check again when getting regCount.
Why can't the previous case just be completely removed? It seems like this one handles it. If not, you can at least combine it to be
    if (IsCopyOrReload())
    {
        GenTree* op1 = gtGetOp1();
        if (op1->IsMultiRegCall())
        {
            // previous case
        }
        else if (op1->IsMultiRegNode())
        {
            // current case
        }
    }
The MinOpts TP costs right now look significant.
> Possibly, but I haven't seen cases where we hit this for other platforms, so would rather defer removing the #ifdef.
It seems like a bad idea to wait until we hit the bug on other platforms to fix it.
> If not, you can at least combine it to be
I think I misunderstood your previous comment. Yes, I can definitely combine it with IsCopyOrReload(). But thinking a bit more about this, I think it should also be handled in the other code paths where we currently check only for IsCopyOrReloadOfMultiRegCall() and IsMultiRegLclVar(); those should also have a case for IsMultiRegNode().
There are 3 places where we need to handle multiRegIdx, and they are fixed as part of this PR:
1. gtConsumeReg(m): This API already took multiRegIndex, and we should fetch the corresponding regNum from the tree. If it is not REG_NA, we should pass it to gcinfo.
2. gtProduceReg(): I handled two scenarios here: the tree is a multi-reg hwintrinsic node, or it is a GT_COPY or GT_RELOAD whose operand is a multi-reg hwintrinsic node.
3. gtHasReg(): Same as gtProduceReg().
> 1. gtConsumeReg(m): This API already took multiRegIndex, and we should fetch the corresponding regNum from the tree. If it is not REG_NA, we should pass it to gcinfo.
By fixing this, the main issue was resolved. The items below are just safety measures to make sure we handle it correctly in other places.
> 2. gtProduceReg(): I handled two scenarios here: the tree is a multi-reg hwintrinsic node, or it is a GT_COPY or GT_RELOAD whose operand is a multi-reg hwintrinsic node.
The only multi-reg intrinsic we have on xarch is DivRem, so yes, we were missing the gcinfo update when we produced the registers for this intrinsic.
> 3. gtHasReg(): Same as gtProduceReg().
We do not have multi-reg intrinsics to which we partially assign registers, so this can simply be checked using GetRegNum() != REG_NA and doesn't need special handling. It still needs to be handled for the CopyOrReload node, so I added that path.
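For readers following the three cases above, here is a minimal standalone sketch of the consume and has-register checks. It is a model only, not the JIT's code: `MultiRegNodeSketch`, `hasAnyReg`, `consumeReg`, `kRegNA`, and the `gcUpdates` vector are hypothetical names invented for illustration, and the GC-info update is reduced to recording the register number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kRegNA = -1; // hypothetical stand-in for REG_NA

// Hypothetical model of a node that may define several registers.
struct MultiRegNodeSketch
{
    bool             isCopyOrReload;
    std::vector<int> regs; // register per multi-reg index, or kRegNA
};

// Models the "has a register" check. A plain multi-reg intrinsic is assumed
// to be fully assigned, so its first register decides. A copy/reload may be
// assigned only partially, so any valid position counts.
inline bool hasAnyReg(const MultiRegNodeSketch& node)
{
    if (!node.isCopyOrReload)
    {
        return !node.regs.empty() && node.regs[0] != kRegNA;
    }
    for (int reg : node.regs)
    {
        if (reg != kRegNA)
        {
            return true;
        }
    }
    return false;
}

// Models the consume path: fetch the register for one multi-reg index and
// report it to the GC-tracking stand-in only when it is valid.
inline bool consumeReg(const MultiRegNodeSketch& node,
                       std::size_t               multiRegIndex,
                       std::vector<int>&         gcUpdates)
{
    assert(multiRegIndex < node.regs.size());
    const int reg = node.regs[multiRegIndex];
    if (reg == kRegNA)
    {
        return false; // nothing was copied or reloaded at this position
    }
    gcUpdates.push_back(reg);
    return true;
}
```

In this model, a copy/reload whose registers are `{kRegNA, 4}` has a register, and consuming index 0 reports nothing while consuming index 1 reports register 4.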
@jakobbotsch - can you take a look?
The code looks ok to me, but the TP cost seems high, especially in MinOpts where we don't expect to have multireg nodes (at least allocated into registers).
There are very few places we actually can produce/consume multireg nodes -- I wonder if some of these core methods like genProduceReg should be split into genProduceReg and genProduceMultiReg, with genProduceMultiReg called directly from the places we can produce multiple registers. That would probably help with TP (maybe even turn this into an improvement overall). I would also be fine with leaving this as a follow-up .NET 9 work item.
I agree with you. The immediate fix for the original issue is just https://github.com/dotnet/runtime/pull/93576/files#diff-426fb8beee6e663b3fb0f11330af0ce0ce21ff22a8ea819f39d12a95d9d6cffcR1456-R1460. To unblock superpmi replay clean runs, I am thinking of just merging this change and thinking more about handling the cases for
Sounds good to me.
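The split into a single-register path and a separate multi-register path, as proposed earlier in this thread, might look roughly like the sketch below. This is an assumption-laden illustration, not the follow-up work itself: `GcTrackerSketch`, `produceReg`, `produceMultiReg`, and `kNoReg` are hypothetical names, and the real methods take tree nodes rather than raw register numbers.

```cpp
#include <vector>

constexpr int kNoReg = -1; // hypothetical stand-in for REG_NA

// Hypothetical stand-in for the GC-info bookkeeping.
struct GcTrackerSketch
{
    std::vector<int> liveRegs;
    void markLive(int reg) { liveRegs.push_back(reg); }
};

// Common path: one register and no multi-reg checks at all, so callers
// that never define multiple registers pay nothing extra.
inline void produceReg(GcTrackerSketch& gc, int reg)
{
    if (reg != kNoReg)
    {
        gc.markLive(reg);
    }
}

// Rare path: called directly from the few sites known to define several
// registers. Positions without a register are skipped.
inline void produceMultiReg(GcTrackerSketch& gc, const std::vector<int>& regs)
{
    for (int reg : regs)
    {
        produceReg(gc, reg);
    }
}
```

The design point is that the multi-reg test moves to the call site, which already knows statically which kind of node it is producing, rather than being evaluated inside the hot single-register method.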
A minor adjustment to how we get the multi-reg information in https://github.com/dotnet/runtime/pull/93576/files#diff-426fb8beee6e663b3fb0f11330af0ce0ce21ff22a8ea819f39d12a95d9d6cffcR2244 and https://github.com/dotnet/runtime/pull/93576/files#diff-fba867eda1c745875748370c59da2195c3823b85a17d199444068d93f91c749bR839 yields a small TP gain.
Adding multi-def support exposed an existing shortcoming in gtGetRegMask, where we were not handling the case in which a multi-reg intrinsic or node is the source of GT_COPY or GT_RELOAD. We will only get the regMask if the multiRegIndex of the source holds a valid register. One of the things I don't like about this is that we have to pass Compiler* to this method to get the register count.
Fixes: #93527
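The mask computation described in this summary can be modeled outside the JIT as follows. This is a sketch under stated assumptions: `RegMask`, `regMaskOf`, `copyOrReloadRegMask`, and `REG_NA_SKETCH` are hypothetical stand-ins for the JIT's mask type, genRegMask, the new gtGetRegMask branch, and REG_NA. Registers are plain integers below 64.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the JIT's register mask type and REG_NA.
using RegMask = std::uint64_t;
constexpr int REG_NA_SKETCH = -1;

// Mask with only the bit for `reg` set (models genRegMask).
inline RegMask regMaskOf(int reg)
{
    assert(reg >= 0 && reg < 64);
    return RegMask{1} << reg;
}

// Union of the masks of every position that holds a valid register.
// Positions that were not copied or reloaded contribute nothing.
inline RegMask copyOrReloadRegMask(const std::vector<int>& regsByIdx)
{
    RegMask result = 0; // models RBM_NONE
    for (int reg : regsByIdx)
    {
        if (reg != REG_NA_SKETCH)
        {
            result |= regMaskOf(reg);
        }
    }
    return result;
}
```

For a copy whose positions hold registers 3 and 5 with one unassigned position in between, the result has exactly bits 3 and 5 set.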