@abnikant
Collaborator

If either vector source of G_SHUFFLE_VECTOR is defined by G_AIE_BROADCAST_VECTOR, combine it into a COPY operation if the mask falls within a valid range.

Collaborator

nit: const bool

Collaborator Author

fixed.

Collaborator

I think this is not correct. The code was written with just one broadcast in mind. If we have two, it will get MinValue = 0 and MaxValue = (2 * NumSrcElems - 1) (full vector), and then it will select a copy from the first.

Please test this case:

---
name:       test
alignment:       16
body:             |
  bb.1.entry:
    %1:_(s32) = COPY $r0
    %6:_(s32) = COPY $r1
    %2:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %1:_(s32)
    %3:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %6:_(s32)
    %0:_(<16 x s32>) = G_SHUFFLE_VECTOR %3:_(<16 x s32>), %2:_, shufflemask(1, 18, 18, 18, 18, 18, 18, 18, 18, 2, 18, 18, 18, 18, 18, 18)
    PseudoRET implicit $lr, implicit %0

The easiest way to fix is:

if (IsSrc1Bcst == IsSrc2Bcst)
   return false;
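
To make the failure mode concrete, here is a minimal standalone sketch in plain C++. It is a reconstruction of the behavior described above, not the code from the PR: `maskWithinRange`, `pickSourceNaive`, and `pickSourceGuarded` are hypothetical names, and the way the naive version widens the range is an assumption based on the MinValue/MaxValue description.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a single min/max range check over the whole mask.
bool maskWithinRange(const std::vector<int> &mask, int lo, int hi) {
  assert(!mask.empty() && "mask must have at least one element");
  auto [mn, mx] = std::minmax_element(mask.begin(), mask.end());
  return *mn >= lo && *mx <= hi;
}

// Naive selection (assumed shape of the problematic logic): the accepted
// range is widened for every operand that is a broadcast.
// Returns 1 or 2 for the chosen source, 0 for "no match".
int pickSourceNaive(const std::vector<int> &mask, int numSrcElems,
                    bool isSrc1Bcst, bool isSrc2Bcst) {
  if (!isSrc1Bcst && !isSrc2Bcst)
    return 0;
  const int lo = isSrc1Bcst ? 0 : numSrcElems;
  const int hi = isSrc2Bcst ? 2 * numSrcElems - 1 : numSrcElems - 1;
  if (!maskWithinRange(mask, lo, hi))
    return 0;
  return isSrc1Bcst ? 1 : 2;
}

// Same logic with the suggested guard: bail out unless exactly one operand
// is a broadcast.
int pickSourceGuarded(const std::vector<int> &mask, int numSrcElems,
                      bool isSrc1Bcst, bool isSrc2Bcst) {
  if (isSrc1Bcst == isSrc2Bcst)
    return 0;
  return pickSourceNaive(mask, numSrcElems, isSrc1Bcst, isSrc2Bcst);
}
```

With the mask from the test case above and both operands being broadcasts, the naive version accepts the range 0..31 and picks the first source even though the mask reads elements from both; the guarded version rejects the combine.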

Collaborator

or in one go: if (IsSrc1Bcst + IsSrc2Bcst != 1) return false;

Collaborator Author

yes, you are right. I was just thinking about one broadcast source. I have updated the code to support any or both sources as broadcast. Thanks for catching this.

Collaborator

But we could also check both ranges separately.

@abnikant force-pushed the aie2p.shuffle.bcst branch from f12f573 to 40008a1 on March 28, 2025 12:47
Collaborator

@martien-de-jong, Mar 28, 2025

This can be further streamlined.

  1. We only have to check whether it is a valid src mask if the corresponding operand is a broadcast.
  2. We only have to check the second operand if the first operand fails.
if (broadcast1 && inrange(mask, 0, NumSrcElems))
  SrcReg = Src1Reg;
else if (broadcast2 && inrange(mask, NumSrcElems, NumSrcElems))
  SrcReg = Src2Reg;
else
  return false;

(using {start, size} convention for ranges here)
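
The pseudocode above can be sketched as a self-contained C++ function. This is only an illustration under stated assumptions: `inRange`, `Source`, and `selectBroadcastSource` are hypothetical names, the mask is modeled as a `std::vector<int>`, and the result is an enum rather than a register.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// True if every mask element lies in [start, start + size), i.e. the
// {start, size} range convention used in the comment above.
bool inRange(const std::vector<int> &mask, int start, int size) {
  assert(size >= 0 && "range size must be non-negative");
  return std::all_of(mask.begin(), mask.end(), [=](int v) {
    return v >= start && v < start + size;
  });
}

enum class Source { Src1, Src2 };

// Checks an operand's range only if that operand is a broadcast, and looks
// at the second operand only if the first one did not match.
std::optional<Source> selectBroadcastSource(const std::vector<int> &mask,
                                            int numSrcElems, bool isSrc1Bcst,
                                            bool isSrc2Bcst) {
  if (isSrc1Bcst && inRange(mask, 0, numSrcElems))
    return Source::Src1;
  if (isSrc2Bcst && inRange(mask, numSrcElems, numSrcElems))
    return Source::Src2;
  return std::nullopt;
}
```

Because each range is checked separately, a mask that reads from both operands is rejected even when both are broadcasts, while a mask that stays entirely inside one broadcast operand is still accepted.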

Collaborator Author

refactored.

@abnikant force-pushed the aie2p.shuffle.bcst branch 3 times, most recently from 4b6f61b to f90f36e, on April 8, 2025 07:13
@abnikant force-pushed the aie2p.shuffle.bcst branch from f90f36e to 712e979 on April 8, 2025 08:27
Collaborator

Do you think we should take care of undef (-1) as well?

Collaborator Author

Do we have any practical example that generates a mix of undef and defined mask values?

Collaborator

I don't know whether we have this pattern in mllib, but there were many patterns with masks containing both defined and undef values. The same example could look like this:

/// Match something like this:
///  %1:_(s32) = COPY $r0
///  %2:_(<16 x s32>) = COPY $x0
///  %3:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %1:_(s32)
///  %0:_(<16 x s32>) = G_SHUFFLE_VECTOR %3(<16 x s32>), %2(<16 x s32>),
///  shufflemask(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1)
/// To convert to:
///  %0:_(<16 x s32>) = G_AIE_BROADCAST_VECTOR %1(s32)
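
One way to read the example above is that undef lanes act as wildcards in the range check. The sketch below shows that idea in plain C++; `kUndef` and `inRangeIgnoringUndef` are hypothetical names, and this is not the helper used in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Shuffle masks use -1 to mark an undef lane.
constexpr int kUndef = -1;

// True if every defined mask element lies in [start, start + size).
// Undef lanes are skipped, so they never cause a rejection.
bool inRangeIgnoringUndef(const std::vector<int> &mask, int start, int size) {
  assert(size >= 0 && "range size must be non-negative");
  return std::all_of(mask.begin(), mask.end(), [=](int v) {
    return v == kUndef || (v >= start && v < start + size);
  });
}
```

For the mask in the example (thirteen zeros followed by three -1 entries) with 16 source elements, the check for the first operand's range succeeds, whereas a strict check that treats -1 as an ordinary index would fail.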

Collaborator Author

okay, I have updated the function to handle undef and added some tests.

Collaborator

nit: const bool IsValidSrc1Mask = IsSrc1Bcst ? MaskMatch(0).isMaskWithinRange(Mask, MinSrc1Value, MaxSrc1Value) : false;

@abnikant force-pushed the aie2p.shuffle.bcst branch 2 times, most recently from 4ce1308 to 3be6948, on April 8, 2025 10:44
Collaborator

Why do we need llvm::any_of(Mask, [](int v) { return v != -1; })?

Collaborator Author

Because I wanted it to return false when all of the mask elements are -1 (undef).

Collaborator

Then you could remove this check

  if (MaskMatch::isMaskWithAllUndefs(Mask))
    return false;
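
A small sketch of why the separate check becomes redundant, assuming the range test is combined with an "at least one defined lane" test as discussed above. It uses `std::any_of` and `std::all_of` instead of the LLVM wrappers so that it compiles standalone, and `isValidBroadcastMask` is a hypothetical name, not the function from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Accepts a mask only if it has at least one defined lane and every defined
// lane lies in [start, start + size). An all-undef mask fails the first
// condition, so no separate all-undef early return is needed.
bool isValidBroadcastMask(const std::vector<int> &mask, int start, int size) {
  assert(size >= 0 && "range size must be non-negative");
  const bool hasDefinedLane =
      std::any_of(mask.begin(), mask.end(), [](int v) { return v != -1; });
  const bool definedLanesInRange =
      std::all_of(mask.begin(), mask.end(), [=](int v) {
        return v == -1 || (v >= start && v < start + size);
      });
  return hasDefinedLane && definedLanesInRange;
}
```

Keeping both conditions in one predicate means the all-undef case is handled in the same place as the range logic.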

Collaborator Author

done

@abnikant force-pushed the aie2p.shuffle.bcst branch from 3be6948 to 268b8a5 on April 8, 2025 12:09
Collaborator

@niwinanto left a comment

LGTM!

@abnikant merged commit c155d96 into aie-public on Apr 8, 2025
6 checks passed
mgehre-amd pushed a commit that referenced this pull request Aug 21, 2025