-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64-SVE: Allow LCL_VARs to store as mask #99608
Changes from 3 commits
6628904
ed574f9
02fa227
2e2e174
fcdb18a
687af37
1fc8d5b
9dbfe63
85f09bf
7945d51
bd5d951
ce61a40
b5502a6
39c02d0
d8dea0e
8ec8e38
3ec441c
f569512
0110170
ec05e34
71bcb48
24cd68b
5b995ae
3a82d5d
8baee38
b22755a
bd8db6e
e359c93
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6419,6 +6419,17 @@ void Compiler::impImportBlockCode(BasicBlock* block) | |
impSpillSideEffects(false, CHECK_SPILL_ALL DEBUGARG("Spill before store to pinned local")); | ||
} | ||
|
||
#if defined(TARGET_ARM64) && defined(FEATURE_MASKED_HW_INTRINSICS) | ||
// Masks must be converted to vectors before being stored to memory. | ||
// But, for local stores we can optimise away the conversion | ||
if (op1->OperIsHWIntrinsic() && op1->AsHWIntrinsic()->GetHWIntrinsicId() == NI_Sve_ConvertMaskToVector) | ||
a74nh marked this conversation as resolved.
Show resolved
Hide resolved
|
||
{ | ||
op1 = op1->AsHWIntrinsic()->Op(1); | ||
lvaTable[lclNum].lvType = TYP_MASK; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see we're doing this retyping here. But I don't see where we're doing any fixups to ensure that something which reads the Imagine for example, something like: Vector<int> mask = Vector.GreaterThan(x, y);
return mask + Vector.Create(5); Where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I might expect such logic to insert a general |
||
lclTyp = lvaGetActualType(lclNum); | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do they need to be converted before being stored to memory? Shouldn't this be entirely dependent on the target type, just with a specialization for locals since we want to allow efficient consumption in the typical case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this just a suggestion to change the comments, or a code change too? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just a change to the comment. I think we only need to clarify that we're optimizing masks stored to locals to allow better consumption in the expected typical case and that we have the handling in place to ensure that consumers which actually need a vector get the Noting, however, that this could be an incorrect optimization in some cases. For example, if it's a user-defined local where actual vectors are also stored then it would require a So I'm not entirely certain this is the "right" place to be doing this either. We might need to actually do this in a later phase where we can observe all uses of a local from the perspective of user defined code, so that we only do this optimization when all values being stored are masks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the right place either. Originally I wanted to avoid creating the conversion in the first place, but realised it has to be created and then removed after/during creation of the local. That's how I ended up putting it in importer. I can't see an obvious later phase this would be done in. Morph? Or maybe during lowering? Anything that already parses local vars would be better? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe the correct location for this would be However, @EgorBo or @jakobbotsch may have a better idea on where the code should go. The consideration in general is really just that we need to find non-address exposed This work needs to happen after import since that's the only code that would be using pre-existing locals. Any locals introduced by CSE or other phases involving a mask will already be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed this kind of brute force changing of a local's type is not safe. It will break all sorts of IR invariants we want to be able to rely on if you don't also do a full pass to fix up other uses of the local. It seems like this kind of optimization should be its own separate pass after local morph when we know whether or not it is address exposed. We have linked locals at that point, so finding the occurrences is relatively efficient. You can leave some breadcrumb around to only run the pass when there are opportunities. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'll add a new pass then. For now I've removed the importer changes. Technically, this PR could be merged as is. It won't make any jit difference by itself, but it's quite a lot of code that is a blocker for other hw intrinsic work I'm doing (the embedded masks). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can you expand a little on what you mean here? I want to make sure I'm parsing the right data in the pass. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doing it in a separate PR sounds good to me. Presumably it needs some heuristics to figure out if it's profitable to make the replacements as well.
See |
||
#endif // TARGET_ARM64 && FEATURE_MASKED_HW_INTRINSICS | ||
|
||
op1 = gtNewStoreLclVarNode(lclNum, op1); | ||
|
||
// TODO-ASG: delete this zero-diff quirk. Requires some forward substitution work. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this pass down
UNPREDICATED
rather than doing the inverse?That is, I imagine most instructions we encounter will end up unpredicated (or effectively unpredicated by using a
TrueMask
), so I'd expect we end up with overall less checks if we simply saidif (varTypeUsesMaskReg(targetType)) { insOpts |= INS_SCALABLE_OPTS_PREDICATED; }
Otherwise, we end up having to special case every single instruction that has a predicated and unpredicated form and additionally check if they use a mask register or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because the emit function is inconveniently the wrong way around. I was going to fix the emit function up in this PR, but, once register allocation is working we can get rid the enum and get rid of all these checks.
Given the register allocation work is going to take some time, then maybe I should fix up the emit code in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. Just wanted to get clarification as it did seem backwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched this around. It no longer matches some of the other emit_R_R_etc functions, but that's ok because it'll all vanish eventually.