-
Notifications
You must be signed in to change notification settings - Fork 32
[AIE2P] Add shuffle mask range checks in matchShuffleBcstToCopy combiner #410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not correct. The code was written with just one broadcast in mind. If we have two, it will get MinValue = 0 and MaxValue = (2 * NumSrcElems - 1) (full vector), and then it will select a copy from the first.
Please, test this case:
---
name: test
alignment: 16
body: |
bb.1.entry:
%1:_(s32) = COPY $r0
%6:_(s32) = COPY $r1
%2:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %1:_(s32)
%3:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %6:_(s32)
%0:_(<16 x s32>) = G_SHUFFLE_VECTOR %3:_(<16 x s32>), %2:_, shufflemask(1, 18, 18, 18, 18, 18, 18, 18, 18, 2, 18, 18, 18, 18, 18, 18)
PseudoRET implicit $lr, implicit %0
The easiest way to fix is:
if (IsSrc1Bcst == IsSrc2Bcst)
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or in one go if (IsSrc1Bst + IsSrc2Bcst != 1) return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right. I was just thinking about one broadcast source. I have updated the code to support any or both sources as broadcast. Thanks for catching this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we could also check both ranges separately.
f12f573 to
40008a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be further streamlined.
- We only have to check whether it is a valid src mask if the corresponding operand is a broadcast.
- We only have to check the second operand if the first operand fails.
if (broadcast1 && inrange(mask, 0, NumSrcElems))
SrcReg = Src1Reg;
else if (broadcas2 && inrange(mask, NumSrcElems. NumSrcElems)
SrcReg = Src2Reg;
else
return false;
(using {start, size} convention for ranges here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored.
4b6f61b to
f90f36e
Compare
f90f36e to
712e979
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think, we should take care of undef(-1) as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have any practical example generating the mix of undef and def masks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know we have this pattern in mllib or not. But there were many patterns with mask containing def and undef values. Same example can be like this
/// Match something like this:
/// %1:_(s32) = COPY $r0
/// %2:_(<16 x s32>) = COPY $x0
/// %3:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %1:_(s32)
/// %0:_(<16 x s32>) = G_SHUFFLE_VECTOR %3(<16 x s32>), %2(<16 x s32>),
/// shufflemask(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1)
/// To convert to:
/// %0:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %1(s32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I have updated the function to handle undef and added some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const bool IsValidSrc1Mask = IsSrc1Bcst ? MaskMatch(0).isMaskWithinRange(Mask, MinSrc1Value, MaxSrc1Value) : false;
4ce1308 to
3be6948
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need llvm::any_of(Mask, [](int v) { return v != -1; })?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I wanted it to return false when all of the masks are defined as -1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you could remove this check
if (MaskMatch::isMaskWithAllUndefs(Mask))
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
3be6948 to
268b8a5
Compare
niwinanto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
If either vector source of
G_SHUFFLE_VECTORis defined byG_AIE_BROADCAST_VECTOR, combine it into aCOPYoperation if the mask falls within a valid range.