Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle MultiRegNode in gtGetRegMask() #93576

Merged
merged 12 commits into from
Oct 20, 2023

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Oct 16, 2023

Adding multi-def support exposed the existing shortcoming in gtGetRegMask where we were not handling if the multi-reg intrinsic or node is source of GT_COPY or GT_RELOAD. We will only get the regMask if the multiRegIndex of source holds a valid register. One of the things I don't like about this is we have to find Compiler* to this method to get the register count.

Fixes: #93527

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 16, 2023
@ghost ghost assigned kunalspathak Oct 16, 2023
@ghost
Copy link

ghost commented Oct 16, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

Comment on lines 1116 to 1136
#ifdef TARGET_ARM64
else if (IsCopyOrReloadOfMultiRegNode())
{
// A multi-reg copy or reload, will have valid regs for only those
// positions that need to be copied or reloaded. Hence we need
// to consider only those registers for computing reg mask.

const GenTreeCopyOrReload* copyOrReload = AsCopyOrReload();
const unsigned regCount = GetMultiRegCount(comp);

resultMask = RBM_NONE;
for (unsigned i = 0; i < regCount; i++)
{
regNumber reg = copyOrReload->GetRegNumByIdx(i);
if (reg != REG_NA)
{
resultMask |= genRegMask(reg);
}
}
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this case could be combined with the previous case to make the TP cost smaller. Also, shouldn't this be handled for all platforms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this case could be combined with the previous case to make the TP cost smaller.

I would rather keep it separate, the way it is for other places in codebase. Also, for TP, we will still need to do the check for IsCopyOrReloadOfMultiRegNode() and another check again when getting regCount.

Also, shouldn't this be handled for all platforms?

Possibly, but I haven't seen cases where we hit this for other platforms, so would rather defer removing the #ifdef.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather keep it separate, the way it is for other places in codebase. Also, for TP, we will still need to do the check for IsCopyOrReloadOfMultiRegNode() and another check again when getting regCount.

Why can't the previous case just be completely removed? It seems like this one handles it. If not, you can at least combine it to be

if (IsCopyOrReload())
{
  GenTree* op1 = gtGetOp1();
  if (op1->IsMultiRegCall())
  {
    // previous case
  }
  else if (op1->IsMultiRegNode())
  {
    // current case
  }
}

The MinOpts TP costs right now look significant.

Possibly, but I haven't seen cases where we hit this for other platforms, so would rather defer removing the #ifdef.

It seems like a bad idea to wait until we hit the bug on other platforms to fix it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, you can at least combine it to be

I think I misunderstood your previous comment. Yes, I can definitely combine it with IsCopyOrReload(). But thinking a bit more about this, I think this should be handled at other code paths as well where we currently just check for IsCopyOrReloadOfMultiRegCall() and IsMultiRegLclVar() and should also have a case for IsMultiRegNode().

Copy link
Member Author

@kunalspathak kunalspathak Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 3 places where we need to handle multiRegIdx and fixed as part of this PR:

  1. gtConsumeReg(m): This API already took multiRegIndex and we should fetch the corresponding regNum from the tree. If it is not REG_NA, we should pass it to gcinfo.
  2. gtProduceReg(): I handled two scenarios here: If the tree is multi-reg hwintrinsic node or if it is a GT_COPYRELOAD whose operand is multi-reg hwintrinsic node.
  3. gtHasReg(): Same as gtProduceReg().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. gtConsumeReg(m): This API already took multiRegIndex and we should fetch the corresponding regNum from the tree. If it is not REG_NA, we should pass it to gcinfo.

By fixing this, the main issue was resolved. Below are just the safety measures to make sure we handle it correctly at other places.

2. gtProduceReg(): I handled two scenarios here: If the tree is multi-reg hwintrinsic node or if it is a GT_COPYRELOAD whose operand is multi-reg hwintrinsic node.

The only multi reg intrinsic we have in xarch is DivRem, so yes, we were missing updating the gcinfo when we produced in the registers for this intrinsic.

3. gtHasReg(): Same as gtProduceReg().

We do not have multi-reg intrinsic where we will partially assign registers to them. So this can be simply checked using GetRegNum() != REG_NA and doesn't need special handling. This still need to be handled for CopyOrReload node, so added that path.

@kunalspathak kunalspathak marked this pull request as draft October 18, 2023 16:03
@kunalspathak kunalspathak marked this pull request as ready for review October 20, 2023 01:21
@kunalspathak
Copy link
Member Author

@jakobbotsch - can you take a look?

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks ok to me, but the TP code seems high, especially in MinOpts where we don't expect to have multireg nodes (at least allocated into registers).

There are very few places we actually can produce/consume multireg nodes -- I wonder if some of these core methods like genProduceReg should be split into genProduceReg and genProduceMultiReg, with genProduceMultiReg called directly from the places we can produce multiple registers. That would probably help with TP (maybe even turn this into an improvement overall). I would also be fine with leaving this as a follow-up .NET 9 work item.

@kunalspathak
Copy link
Member Author

kunalspathak commented Oct 20, 2023

but the TP code seems high

I agree with you. The immediate fix for the original issue is just https://github.com/dotnet/runtime/pull/93576/files#diff-426fb8beee6e663b3fb0f11330af0ce0ce21ff22a8ea819f39d12a95d9d6cffcR1456-R1460. To unblock superpmi replay clean runs, I am thinking of just merging this change and thinking more about handling the cases for gtHasReg() and genProduceReg() separately. Thoughts?

@jakobbotsch
Copy link
Member

I agree with you. The immediate fix for the original issue is just https://github.com/dotnet/runtime/pull/93576/files#diff-426fb8beee6e663b3fb0f11330af0ce0ce21ff22a8ea819f39d12a95d9d6cffcR1456-R1460. To unblock superpmi replay clean runs, I am thinking of just merging this change and thinking more about handling the cases for gtHasReg() and genProduceReg() separately. Thoughts?

Sounds good to me.

@kunalspathak kunalspathak merged commit 8043dbd into dotnet:main Oct 20, 2023
127 of 129 checks passed
@kunalspathak kunalspathak deleted the multireg-copyorreload branch October 20, 2023 18:18
@ghost ghost locked as resolved and limited conversation to collaborators Nov 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SuperPMI replay failure: (unsigned)reg < ArrLen(regMasks)
2 participants