@vg0204 vg0204 commented Oct 24, 2025

In the case of the lateAMDGPUWaveTransform pipeline, SIOptimizeExecMaskingPreRA should be moved to just before SGPR allocation, after per-lane VGPR allocation has been handled. We therefore need to ensure that any optimization dealing with the EXEC mask around VGPRs is handled much earlier, right after instruction selection.

Thus, we migrate optimizeVcndVcmpPair from SIOptimizeExecMaskingPreRA into the SIFoldOperands pass, which is invoked during the MachineSSAOptimization pipeline.
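
For readers unfamiliar with the fold being migrated, here is a minimal sketch of why it is sound. It assumes the pattern is the one I recall from the upstream pass comment: a `V_CNDMASK_B32 0, 1, %cc` feeding `V_CMP_NE_U32 1, %sel`, whose result is ANDed with `$exec`, can be replaced by `S_ANDN2_B64 $exec, %cc`. The code is a toy per-lane bitmask model, not LLVM code, and the function names `vccUnfolded` and `vccFolded` are hypothetical.

```cpp
#include <cstdint>

// Toy per-lane model (NOT LLVM code). Bit i of each mask is lane i.
// Assumed unfolded pattern:
//   %sel = V_CNDMASK_B32 0, 1, %cc   ; sel = cc ? 1 : 0 in each lane
//   %cmp = V_CMP_NE_U32 1, %sel      ; cmp = (sel != 1) in each lane
//   $vcc = S_AND_B64 $exec, %cmp
uint64_t vccUnfolded(uint64_t exec, uint64_t cc) {
  uint64_t cmp = 0;
  for (unsigned lane = 0; lane < 64; ++lane) {
    uint32_t sel = ((cc >> lane) & 1u) ? 1u : 0u;
    if (sel != 1u)
      cmp |= uint64_t{1} << lane;
  }
  return exec & cmp;
}

// Assumed folded form:
//   $vcc = S_ANDN2_B64 $exec, %cc    ; exec AND (NOT cc)
uint64_t vccFolded(uint64_t exec, uint64_t cc) { return exec & ~cc; }
```

Both functions compute the same mask for any `exec` and `cc`, because the select-then-compare pair only negates the condition lane by lane.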

@vg0204 vg0204 requested review from cdevadas and jmmartinez October 24, 2025 07:16
@vg0204 vg0204 self-assigned this Oct 24, 2025
@z1-cciauto
Collaborator

; GCN-NEXT: [[V_CNDMASK_B32_e64_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, [[V_CMP_NEQ_F16_e64_]], implicit $exec
; GCN-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_CSELECT_B32 -1, 0, implicit undef $scc
; GCN-NEXT: [[S_AND_B32_1:%[0-9]+]]:sreg_32 = S_AND_B32 $exec_lo, [[S_CSELECT_B32_]], implicit-def dead $scc
; GCN-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $exec_lo, implicit-def $exec_lo

Is this new code sequence correct?

Author

@vg0204 vg0204 Oct 24, 2025

Yes, it is correct. The sequence differs because the following optimization, which SIOptimizeExecMaskingPreRA performed in addition, is not performed by the SIFoldOperands pass:

// If the only user of a logical operation is move to exec, fold it now
// to prevent forming of saveexec. I.e.:
//
//    %0:sreg_64 = COPY $exec
//    %1:sreg_64 = S_AND_B64 %0:sreg_64, %2:sreg_64
// =>
//    %1 = S_AND_B64 $exec, %2:sreg_64

Take a look at the input test MIR to see this clearly.
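
The fold quoted above can be sketched on a toy instruction list. This is only an illustration under simplifying assumptions: `ToyInst`, `countUses`, and `foldExecCopy` are hypothetical names, not LLVM's `MachineInstr` API, and the legality check is reduced to "the COPY of `$exec` has exactly one use, in an `S_AND_B64`". The real pass also has to verify things this sketch ignores, such as `$exec` not being modified between the COPY and its user.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (NOT LLVM's MachineInstr).
struct ToyInst {
  std::string def;               // register defined, e.g. "%0"
  std::string opcode;            // e.g. "COPY", "S_AND_B64"
  std::vector<std::string> uses; // registers read
};

// Count how many operands in the block read `reg`.
static int countUses(const std::vector<ToyInst> &block, const std::string &reg) {
  int n = 0;
  for (const ToyInst &inst : block)
    for (const std::string &op : inst.uses)
      if (op == reg)
        ++n;
  return n;
}

// Rewrite   %0 = COPY $exec ; %1 = S_AND_B64 %0, %2
// into      %1 = S_AND_B64 $exec, %2
// when %0 has a single use and that use is in an S_AND_B64.
std::vector<ToyInst> foldExecCopy(std::vector<ToyInst> block) {
  std::size_t i = 0;
  while (i < block.size()) {
    bool folded = false;
    const bool isExecCopy = block[i].opcode == "COPY" &&
                            block[i].uses.size() == 1 &&
                            block[i].uses[0] == "$exec";
    if (isExecCopy && countUses(block, block[i].def) == 1) {
      const std::string copied = block[i].def;
      for (ToyInst &user : block) {
        if (user.opcode != "S_AND_B64")
          continue;
        for (std::string &op : user.uses)
          if (op == copied) {
            op = "$exec"; // read $exec directly
            folded = true;
          }
      }
    }
    if (folded)
      block.erase(block.begin() + static_cast<std::ptrdiff_t>(i)); // drop the dead COPY
    else
      ++i;
  }
  return block;
}
```

Applied to the two-instruction input from the comment, the sketch leaves a single `S_AND_B64` reading `$exec`. A COPY with more than one user is left untouched.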

@z1-cciauto
Collaborator

@vg0204 vg0204 merged commit 0bba171 into amd-feature/wave-transform Oct 27, 2025
5 checks passed
@vg0204 vg0204 deleted the amd/dev/vikashgu/refactor-Vcndmask-execMask-fold-migrate-siFoldOperands branch October 27, 2025 11:06
vg0204 added a commit that referenced this pull request Oct 29, 2025
…w pipeline. (#412)

This patch introduces SIOptimizeExecMaskingPreRA after the
AMDGPUWaveTransform pass, just before SGPR allocation, to reduce
register pressure for the new pipeline. At the same time, it still
acts as a pre-RA pass optimizing EXEC-mask related instructions for the
legacy pipeline.

It is a follow-up that depends on #369.