[AMDGPU][WaveTransform] Migrate VcndmaskVcmpExecMask fold into SIFoldOperand pass #369
In the lateAMDGPUWaveTransform pipeline, SIOptimizeExecMaskingPreRA should run just before SGPR allocation, once per-lane VGPR allocation has been handled. Any optimization that deals with the EXEC mask around VGPRs therefore has to be handled much earlier, right after instruction selection. Thus, we migrate optimizeVcndVcmpPair from SIOptimizeExecMaskingPreRA into the SIFoldOperands pass, which is invoked during the MachineSSAOptimization pipeline.
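For readers unfamiliar with the fold being moved, here is a minimal per-lane sketch of why it is sound. It assumes the pattern is the one documented in upstream LLVM for optimizeVcndVcmpPair: `%sel = V_CNDMASK_B32_e64 0, 1, %cc`, then `%cmp = V_CMP_NE_U32 1, %sel`, then `S_AND_B64 $exec, %cmp`, rewritten to `S_ANDN2_B64 $exec, %cc`. The function names are hypothetical and the code only models the lane-mask arithmetic, not the pass itself.

```cpp
#include <cassert>
#include <cstdint>

// Models V_CNDMASK_B32 0, 1, cc followed by V_CMP_NE_U32 1, sel for 64 lanes.
// Simplification: inactive lanes are computed too; the AND with EXEC below
// masks them out, so the final result does not depend on them.
std::uint64_t cmpOfCndmask(std::uint64_t Cc) {
  std::uint64_t Cmp = 0;
  for (unsigned Lane = 0; Lane < 64; ++Lane) {
    // V_CNDMASK: select 1 where the condition bit is set, else 0.
    std::uint32_t Sel = ((Cc >> Lane) & 1u) ? 1u : 0u;
    // V_CMP_NE 1, sel: the lane bit is set when sel differs from 1.
    if (Sel != 1u)
      Cmp |= (std::uint64_t{1} << Lane);
  }
  return Cmp;
}

// Unfolded sequence: S_AND_B64 $exec, %cmp.
std::uint64_t unfoldedMask(std::uint64_t Exec, std::uint64_t Cc) {
  return Exec & cmpOfCndmask(Cc);
}

// Folded form: S_ANDN2_B64 $exec, %cc, i.e. exec AND NOT cc.
std::uint64_t foldedMask(std::uint64_t Exec, std::uint64_t Cc) {
  return Exec & ~Cc;
}
```

Both functions produce the same mask for any `Exec` and `Cc`, which is the property the fold relies on.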
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/7
; GCN-NEXT: [[V_CNDMASK_B32_e64_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, [[V_CMP_NEQ_F16_e64_]], implicit $exec
; GCN-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_CSELECT_B32 -1, 0, implicit undef $scc
; GCN-NEXT: [[S_AND_B32_1:%[0-9]+]]:sreg_32 = S_AND_B32 $exec_lo, [[S_CSELECT_B32_]], implicit-def dead $scc
; GCN-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $exec_lo, implicit-def $exec_lo
Is this new code sequence correct?
Yes, it is correct. The optimization below, which SIOptimizeExecMaskingPreRA performed in addition, does not happen in the SIFoldOperands pass:
// If the only user of a logical operation is move to exec, fold it now
// to prevent forming of saveexec. I.e.:
//
// %0:sreg_64 = COPY $exec
// %1:sreg_64 = S_AND_B64 %0:sreg_64, %2:sreg_64
// =>
// %1 = S_AND_B64 $exec, %2:sreg_64
Take a look at the input test MIR to see this clearly.
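The rewrite quoted in that comment can be sketched on a toy instruction list. This is only an illustration: `ToyInst`, `isLogicalOp`, and `foldExecCopies` are hypothetical names, not LLVM APIs, and the legality checks a real pass would need (for example, that EXEC is not redefined between the copy and its user) are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInst {
  std::string Def;               // register defined, e.g. "%1"
  std::string Opcode;            // e.g. "COPY", "S_AND_B64"
  std::vector<std::string> Uses; // source operands
};

// Scalar logical opcodes this sketch is willing to fold into.
static bool isLogicalOp(const std::string &Opcode) {
  return Opcode == "S_AND_B64" || Opcode == "S_OR_B64" ||
         Opcode == "S_XOR_B64";
}

// For every "%x = COPY $exec" whose only use is a single operand of a
// logical op, rewrite that operand to $exec and delete the COPY.
std::vector<ToyInst> foldExecCopies(const std::vector<ToyInst> &Block) {
  std::vector<ToyInst> Result = Block;
  for (std::size_t I = 0; I < Result.size();) {
    const ToyInst &Copy = Result[I];
    bool IsExecCopy = Copy.Opcode == "COPY" && Copy.Uses.size() == 1 &&
                      Copy.Uses[0] == "$exec";
    if (!IsExecCopy) { ++I; continue; }

    // Count uses of the copied register and remember the last user.
    int NumUses = 0;
    std::size_t UserIdx = 0;
    for (std::size_t J = 0; J < Result.size(); ++J)
      for (const std::string &U : Result[J].Uses)
        if (U == Copy.Def) { ++NumUses; UserIdx = J; }

    if (NumUses != 1 || !isLogicalOp(Result[UserIdx].Opcode)) { ++I; continue; }
    assert(UserIdx != I && "a COPY is never its own logical-op user");

    const std::string CopyDef = Copy.Def; // keep a copy before erasing
    for (std::string &U : Result[UserIdx].Uses)
      if (U == CopyDef) U = "$exec";
    Result.erase(Result.begin() + static_cast<std::ptrdiff_t>(I));
    // Do not advance I: the next instruction has shifted into slot I.
  }
  return Result;
}
```

Feeding it the two instructions from the comment (`%0 = COPY $exec` and `%1 = S_AND_B64 %0, %2`) leaves a single `S_AND_B64` whose first operand is `$exec`. Because SIFoldOperands does not perform this step, the `COPY $exec_lo` remains in the updated test checks.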
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/10
…w pipeline. (#412) This patch introduces SIOptimizeExecMaskingPreRA after the AMDGPUWaveTransform pass, just before SGPR allocation, to reduce register pressure for the new pipeline. At the same time, it still acts as a pre-RA pass that optimizes EXEC-mask-related instructions for the legacy pipeline. It is a follow-up that depends on #369.