-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DeadBranchElimPass #673
Conversation
source/opt/dead_branch_elim_pass.cpp
Outdated
|
||
Pass::Status DeadBranchElimPass::ProcessImpl() { | ||
// Current functionality assumes structured control flow. | ||
// TODO(): Handle non-structured control-flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding your username or email to TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
source/opt/dead_branch_elim_pass.cpp
Outdated
|
||
bool modified = false; | ||
// Call Mem2Reg on all remaining functions. | ||
for (auto& e : module_->entry_points()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto&
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
source/opt/dead_branch_elim_pass.cpp
Outdated
static const int kSpvBranchCondTrueLabId = 1; | ||
static const int kSpvBranchCondFalseLabId = 2; | ||
static const int kSpvSelectionMergeMergeBlockId = 0; | ||
static const int kSpvPhiVal0Id = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider better naming and moving these constants in the body of the function where they are used.
Also consider using anonymous namespace instead of static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure of these names is:
kSpv[Name of Operation][NameOfOperand]
My goal was to avoid just have having magic numbers all over the code, and if at some future point an instruction changed we might have some hope of tracking its presence down.
Can you suggest a better structure? I am trying to keep their length somewhat manageable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spv is not really needed, since it's a local variable. I would add InIdx at the end, to signify that these are not ids, but 'in operand indices'. I would also move these into the functions where they are used.
const uint32_t kPhiVal0IdInIdx = 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I added an underscore to the names to separate the operator from the operand names, for example: kSpvBranchCond_TrueLabId.
I also make them non-static and anonymous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Removed Spv and added InIdx. But I would suggest keeping them at global scope allows them to be used in multiple functions, thus reducing redundancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I took out underscores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. And changed them to uint32_t
source/opt/dead_branch_elim_pass.cpp
Outdated
spvtools::CFA<ir::BasicBlock>::DepthFirstTraversal( | ||
&*func->begin(), StructuredSuccessorsFunction(), ignore_block, | ||
[&](cbb_ptr b) { order->push_front(const_cast<ir::BasicBlock*>(b)); }, ignore_edge); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be more readable this way:
auto ignore_block = {};
auto ignore_edge = [](cbb_ptr, cbb_ptr) {};
auto get_structured_successors = [this](const ir::BasicBlock* block) { return &(block2structured_succs_[block]); };
auto whatever = [&](cbb_ptr b) { order->push_front(const_castir::BasicBlock*(b)); }
spvtools::CFAir::BasicBlock::DepthFirstTraversal(
&*func->begin(), get_structured_successors, ignore_block, whatever, ignore_edge);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
source/opt/dead_branch_elim_pass.cpp
Outdated
uint32_t condId, bool* condVal, bool* condIsConst) { | ||
ir::Instruction* cInst = def_use_mgr_->GetDef(condId); | ||
switch (cInst->opcode()) { | ||
case SpvOpConstantFalse: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing indentation before case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
source/opt/dead_branch_elim_pass.cpp
Outdated
(void)GetConstCondition(cInst->GetSingleWordInOperand(0), | ||
condVal, condIsConst); | ||
if (*condIsConst) | ||
*condVal = !*condVal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider
bool negated_value;
if (!IsConstCondition(cInst->GetSingleWordInOperand(0), &negated_value))
return false;
*condVal = !negated_value
return true;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
source/opt/dead_branch_elim_pass.cpp
Outdated
continue; | ||
|
||
// Replace conditional branch with unconditional branch | ||
uint32_t trueLabId = br->GetSingleWordInOperand(kSpvBranchCondTrueLabId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding const qualifier to local variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
Hey @atgoo: Does the algorithm look correct to you? Is there enough test coverage to be confident that this algorithm isn't broken and won't break under reasonable changes elsewhere? Are any of your suggestions blocking? |
No, my suggestions are about readability. I'm not qualified to judge if the logic is correct on a large scale, and I didn't find any problems on a tactical level. |
Brainstorming cases after reading just the MR description (not a request for feedback or changes. Just personal notes!):
|
Interesting example. My intent with this optimization was to only process OpBranchConditionals that are preceded by by OpSelectionMerge. If the code does not reflect that, I will make that change immediately. Sorry for the confusion. |
The code does only process OpSelectionMerge. I will make that clearer in the comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm has a couple of problems. I've given a couple of examples.
const uint32_t kSelectionMergeMergeBlockIdInIdx = 0; | ||
const uint32_t kPhiVal0IdInIdx = 0; | ||
const uint32_t kPhiLab0IdInIdx = 1; | ||
const uint32_t kPhiVal1IdInIdx = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this patch in order, it's mysterious why phi input branch pairs 0 and 1 are the only ones mentioned... Hope there's a loop later. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because only OpSelectionMerge/OpBranchConditional is optimized (for the moment). This means that when patching phis in the merge block, we only have to deal with phis with 2 and only 2 pairs.
Status Process(ir::Module*) override; | ||
|
||
private: | ||
// Returns the id of the merge block declared by a merge instruction in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the next method raise the question in my mind about continue targets.
source/opt/dead_branch_elim_pass.h
Outdated
// this block, if any. If none, returns zero. | ||
uint32_t MergeBlockIdIfAny(const ir::BasicBlock& blk) const; | ||
|
||
// Compute structured successors for function |func|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not need to treat continue target specially as well? It's similar to a merge block in that we might branch out of a nested construct to a nesting continue target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the spec does mention the possibility of the continue target being unreachable, given the other requirements, I am having difficulty coming up with a flow graph where this is the case. Here is my thinking:
- A loop must have a back edge
- There must be path from the header to the backedge block (otherwise it can't be a backedge)
- The continue target must dominate the backedge block
Thus, there must be a path from the header to the continue target
Thus the continue target is only unreachable iff the header is unreachable.
So it seems to me that the continue target does not to be treated specially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few things going on here:
- Surprisingly, the definition of "back-edge" is still up for debate.
See Need to refine definition of a back edge used by structured control flow rules SPIRV-Headers#16 and the corresponding internal issue https://gitlab.khronos.org/spirv/SPIR-V/issues/78
I got involved because there are simple shaders that leave the continue target unreachable, and those exist in the Vulkan CTS suite. I had to adjust the validator to accommodate them. For example, this fragment shader:
#version 310 es
void main() {
bool c;
while(c) {
break;
}
}
will compile via Glslang into:
OpCapability Shader
%1 = OpExtInstImport "GLSL.std.450"
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %main "main"
OpExecutionMode %main OriginUpperLeft
OpSource ESSL 310
OpName %main "main"
OpName %c "c"
%void = OpTypeVoid
%3 = OpTypeFunction %void
%bool = OpTypeBool
%_ptr_Function_bool = OpTypePointer Function %bool
%main = OpFunction %void None %3
%5 = OpLabel
%c = OpVariable %_ptr_Function_bool Function
OpBranch %6
%6 = OpLabel
OpLoopMerge %8 %9 None
OpBranch %10
%10 = OpLabel
%14 = OpLoad %bool %c
OpBranchConditional %14 %7 %8
%7 = OpLabel
OpBranch %8
%9 = OpLabel
OpBranch %6
%8 = OpLabel
OpReturn
OpFunctionEnd
That has continue target block %9 unreachable. and yet we still want the branch from %9 to %6 to be a back edge.
-
There was a recent fix (I think in SPIR-V 1.0 Rev10) in structured control flow rules to say "the loop header must dominate the Continue Target, unless the Continue Target is unreachable in the CFG" (emphasis mine). This was to accommodate the case in 1.
-
Even given all that, the reason you're computing these "structured successors" is to list the blocks that you can safely delete once you delete a newly-found dead block. Those blocks are characterized by being the blocks dominated by the newly-dead block. The structured successors is a proxy for dominance. But a newly-dead block can escape out to an outer continue target. So my intuition thinks there's still something fishy here. I'll go try out some more cases through the new rev of your code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, here's a concrete example of my concern. (See point 3 in previous comment).
Example fragment shader:
#version 450
void main() {
bool c;
bool d;
while(c) {
if(d) {
continue;
}
if(false) { // setting this up to be killed. branches out to an outer continue target.
continue;
}
discard;
}
}
Compiles to:
OpCapability Shader
%1 = OpExtInstImport "GLSL.std.450"
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %main "main"
OpExecutionMode %main OriginUpperLeft
OpSource GLSL 450
OpName %main "main"
OpName %c "c"
OpName %d "d"
%void = OpTypeVoid
%3 = OpTypeFunction %void
%bool = OpTypeBool
%_ptr_Function_bool = OpTypePointer Function %bool
%false = OpConstantFalse %bool
%main = OpFunction %void None %3
%5 = OpLabel
%c = OpVariable %_ptr_Function_bool Function
%d = OpVariable %_ptr_Function_bool Function
OpBranch %6
%6 = OpLabel
OpLoopMerge %8 %9 None
OpBranch %10
%10 = OpLabel
%14 = OpLoad %bool %c
OpBranchConditional %14 %7 %8
%7 = OpLabel
%16 = OpLoad %bool %d
OpSelectionMerge %18 None
OpBranchConditional %16 %17 %18
%17 = OpLabel
OpBranch %9
%18 = OpLabel
OpSelectionMerge %22 None
OpBranchConditional %false %21 %22
%21 = OpLabel
OpBranch %9
%22 = OpLabel
OpKill
%9 = OpLabel
OpBranch %6
%8 = OpLabel
OpReturn
OpFunctionEnd
Then running through this optimization pass, I get:
OpCapability Shader
%1 = OpExtInstImport "GLSL.std.450"
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %main "main"
OpExecutionMode %main OriginUpperLeft
OpSource GLSL 450
OpName %main "main"
OpName %c "c"
OpName %d "d"
%void = OpTypeVoid
%3 = OpTypeFunction %void
%bool = OpTypeBool
%_ptr_Function_bool = OpTypePointer Function %bool
%false = OpConstantFalse %bool
%main = OpFunction %void None %3
%5 = OpLabel
%c = OpVariable %_ptr_Function_bool Function
%d = OpVariable %_ptr_Function_bool Function
OpBranch %6
%6 = OpLabel
OpLoopMerge %8 %9 None
OpBranch %10
%10 = OpLabel
%14 = OpLoad %bool %c
OpBranchConditional %14 %7 %8
%7 = OpLabel
%16 = OpLoad %bool %d
OpSelectionMerge %18 None
OpBranchConditional %16 %17 %18
%17 = OpLabel
OpBranch %9
%18 = OpLabel
OpBranch %22
%22 = OpLabel
OpKill
%8 = OpLabel
OpReturn
OpFunctionEnd
That's invalid because block %9 has been swept away but is mentioned as the continue target of the loop, and as the branch target of block %17.
The problem is that you determine that block %21 is dead (that's right) and all the blocks it dominates should be swept away. But your algorithm incorrectly determines that block %9 should be included in that set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make a good point that, whether valid or invalid, glslang can generate this sequence and it would be bad to break it.
While I think a more general pass to clean up and normalize such sequences might be the best long term solution, I think a solution (which I think you suggested at one point) might be best in the short term: have ComputeStructuredSuccessors add the continue target after the merge block and before all the "real" successors. This should guarantee that in the "structured order", the continue target will appear after any loop code not part of the continue sequence. Since the continue target dominates the backedge and the backedge post-dominates the continue target, this should remove any possibility of continue code being eliminated by this pass.
FWIW, despite the seeming questions around valid control flow, I am going to continue the policy of not generating any new unreachable blocks with this pass, even if it means reinserting a BranchConditional. This already has a TODO on it to revisit.
br->GetSingleWordInOperand(kBranchCondFalseLabIdInIdx); | ||
const uint32_t mergeLabId = | ||
mergeInst->GetSingleWordInOperand(kSelectionMergeMergeBlockIdInIdx); | ||
const uint32_t liveLabId = condVal == true ? trueLabId : falseLabId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ok, but generally I just use the absorption rule to collapse "condVal == true" to just "condVal". :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Its almost more of a readability thing here.
source/opt/dead_branch_elim_pass.h
Outdated
// If block |bp| contains constant conditional branch, return true | ||
// and return branch and merge instructions in |branchInst| and |mergeInst| | ||
// and the boolean constant in |condVal|. | ||
bool GetConstConditionalBranch(ir::BasicBlock* bp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we can have conditional branches that don't have merge instruction.
Examples include:
%entry= OpLabel
OpBranch %loop
%loop = OpLabel
OpLoopMerge %merge %body None
OpBranch %body
%body = OpLabel
%cond = .... %bool
OpBranch %cond %exit %loop ;; this back edge does not have a preceding merge instruction
%merge = OpLabel
Similarly, in an example generated from this:
for (...) {
if ( cond) {
break; // this breaks out to the merge block,, and it could have been compiled into an OpBranchConditional
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Already covered by request to clarify)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think we are good here as I clarified that this pass is only targeting OpBranchConditionals preceded by OpSelectionMerge.
source/opt/dead_branch_elim_pass.h
Outdated
void KillAllInsts(ir::BasicBlock* bp); | ||
|
||
// If block |bp| contains constant conditional branch, return true | ||
// and return branch and merge instructions in |branchInst| and |mergeInst| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If block |bp| contains constant conditional branch that is also preceded by a selection merge, return true and ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
source/opt/dead_branch_elim_pass.h
Outdated
// If block |bp| contains constant conditional branch, return true | ||
// and return branch and merge instructions in |branchInst| and |mergeInst| | ||
// and the boolean constant in |condVal|. | ||
bool GetConstConditionalBranch(ir::BasicBlock* bp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend renaming this to GetConstConditionalSelectionBranch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
*condVal = true; | ||
*condIsConst = true; | ||
} break; | ||
case SpvOpLogicalNot: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
const uint32_t deadLabId = condVal == true ? falseLabId : trueLabId; | ||
AddBranch(liveLabId, *bi); | ||
def_use_mgr_->KillInst(br); | ||
def_use_mgr_->KillInst(mergeInst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there's a nested structured selection that needs to be able to break out to this named merge block? I.e. there is a deeply nested break block that branches out to the merge block. But now it's no longer a named merge block.
Example:
%selection0 = OpLabel
OpSelectionMerge %merge0 None ;; Candidate for removal. Declares %merge0 as merge block
OpBranchConditional %constantfalse %selection1 %merge0 ;; Candidate for removal.
; The nested selection, a one -armed "if"
%selection1 = OpLabel
OpSelectionMerge %merge1 None
%cond = OpLoad %bool %myboolvar
OpBranchConditional %cond %merge1 %body1
%body1 = OpLabel ; break block ,i.e. branches out several levels of nesting.
OpBranch %merge0 ;; By structured control flow rules, needs %merge0 to be named as a merge block
%merge1 = OpLabel
OpBranch %merge0
%merge0 = OpLabel
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little surprised by this. The "impression" I got from the spec is that this is not legal, although I grant that one could interpret it as being legal.
How sure are you about this? Have there been discussions where this was specifically called out as legal?
If necessary, I would "fix" this by leaving the BranchConditional (and OpSelectionMerge) but having the "dead" branch go right to the merge block (if it doesn't already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're right. I was misremembering.
Reading the spec again, you can only go out to the next outer merge block. You can't jump out two like I suggested. In short, structured control flow rules can't implement named-break like Java has. (or named "last" in Perl).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not exactly. What you can't do is break out of two Selection constructs. You can break out of the middle of a Selection to the merge block of an enclosing loop. So that's a branch that skips over a merge block in between out to an outer merge.
Example is generated from:
void main() {
int x = 0;
bool c, d;
while (d) {
if (c) {
x = 1;
break; // goes out to the y=x statement, skipping over the merge for this "if".
x = 2;
} else {
x = 3;
break; // goes out to the y = x statement.
x = 4;
}
}
int y = x;
}
But this isn't the case you're handling. So your conclusion is fine.
source/opt/dead_branch_elim_pass.cpp
Outdated
|
||
// Iterate to merge block deleting dead blocks | ||
std::unordered_set<uint32_t> deadLabIds; | ||
deadLabIds.insert(deadLabId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a difficulty here because, I believe, the structured order of blocks could put a continue target we break out to before an inner merge block. (Example below)
So... When one of the candidates to kill is a "continue block", then it this could try to kill the corresponding continue target, and that would also orphan the reference in the corresponding OpLoopMerge.
Example is schematically similar to:
for (...) { // names a continue target
if ( ... ) {
break; // a branch to the continue target.
// but the merge block for this structured selection will appear *after* the continue target in the structured order.
// So we'll want to sweep away the continue target block when we shouldn't.
}
; continue target is conceptually here.
}
More explicitly here's a full example:
OpCapability Shader
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %foo "foo"
%bool = OpTypeBool
%void = OpTypeVoid
%void_fn = OpTypeFunction %void
%boolptr = OpTypePointer Function %bool
%false = OpConstantFalse %bool
%foo = OpFunction %void None %void_fn
%entry = OpLabel
%loopcondition_var = OpVariable %boolptr Function
%loopcondition = OpLoad %bool %loopcondition_var
OpBranch %loop
%loop = OpLabel
OpLoopMerge %mergeloop %continuetarget None
OpBranchConditional %loopcondition %mergeloop %body
%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continuetarget %mergeselection
%mergeselection = OpLabel
OpKill
%continuetarget = OpLabel
OpBranchConditional %loopcondition %loop %mergeloop
%mergeloop = OpLabel
OpReturn
OpFunctionEnd
The above example passes validation. Running it through your algorithm, I get this result:
OpCapability Shader
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %1 "foo"
%bool = OpTypeBool
%void = OpTypeVoid
%4 = OpTypeFunction %void
%_ptr_Function_bool = OpTypePointer Function %bool
%false = OpConstantFalse %bool
%1 = OpFunction %void None %4
%7 = OpLabel
%8 = OpVariable %_ptr_Function_bool Function
%9 = OpLoad %bool %8
OpBranch %10
%10 = OpLabel
OpLoopMerge %11 %12 None ;;;;;;;;;;;;;;;; Whoops this is a dead reference to %12, and is invalid.
OpBranchConditional %9 %11 %13
%13 = OpLabel
OpBranch %14
%14 = OpLabel
OpKill
%11 = OpLabel
OpReturn
OpFunctionEnd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all due respect for the validator, I would argue that the input program is invalid.
Because %continuetarget is the target of an OpSelectionMerge/OpBranchCondition (and it is not the merge block), that means that the %continuetarget block is inside of the Selection construct. By the validation rules, if that block branches out of the Selection construct, it can only branch to the Selection construct's merge block, or the continue or break of the innermost loop. But this block potentially branches to the loop's header, which is not one of the choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point taken that the validator is incomplete, and might have bugs. :-)
But if I follow your line of reasoning, there would be no way to escape an inner selection construct out to a continue target of an enclosing loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a key point that the continue target is named by the LoopMerge instruction of the enclosing loop. The loop encloses the selection construct because the loop header %loop dominates the selection header %body
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, a clearer way to see it is that I can have multiple selection constructs inside the loop, each of them branching out to the continue target. The continue target is then not dominated by any of those inner selections, but only by the loop header.
Concretely, the body of the loop would be a sequence of selections structured something like:
if (cond) { continue; }
if (cond2) { continue; }
if (cond3) { continue; }
Each of those "continue" statements is a branch the continue target. And so the continue target is not in any one of the selections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the block which I believe causes invalid code:
%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continuetarget %mergeselection
Here is how I would rewrite it so that it is valid (by my reading) and still does exactly what you wish:
%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continueblock %mergeselection
%continueblock = OpLabel
OpBranch %continuetarget
But even then I am still dropping the continue target. I will investigate further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this more, I believe even the code I have written is still not valid. The reason is that the %continuetarget block is still part of the %body select construct and this is invalid because it is not legal for the backedge to come out of a select construct. It is part of the select construct because it is dominated by the %body block and not dominated by the %selectionmerge block.
The big problem here is the OpKill in the %selectionmerge block, which causes the %selectionmerge block to NOT dominate ANY blocks in the loop.
I have tried to come up with a valid version of this code that causes a problem but have not been able to. Please let me know if you think you have one or disagree with my assessment on the validity of this code.
No worries on the confusion. That's the point of reviews. :-) But there are some algorithmic problems. See my examples posted. |
Looking at the issues you raised, I realized it is possible for the current algorithm to create invalid code for some cases. I have just added code which should address these cases. I added some logic near the end of the processing of a constant conditional branch. If the merge block associated with the constant branch ends up with no predecessors (dead), we go back and replace the replacement branch, going back to the original constant conditional branch (with preceeding SelectionMerge). One slight difference is that the "dead" label is now the merge label. This is to prevent creating a dead block from the merge block, which could cause the code to become invalid. There may be more elegant ways to do this, but this works for now while still allowing us to largely achieve our goals of reducing size by eliminating dead branch code. I also added a test using your code which exercises this new functionality. |
I still think the algorithm is incorrectly sweeping away blocks. If I were doing this, I'd consider simplifying the pass so it only simplifies conditional branches and leaves dead code elimination to a different pass. But coder's choice. |
Here are two reasons I thought it best if I attempted some form of dead code elimination in this pass:
I think it would be good to write the more general routine sometime soon. Perhaps such a routine would be a necessary component to generating valid code from glslang, similar to how JohnK is thinking about generating valid sequences in other areas. |
I agree that it would be nice to do targeted elimination of dead blocks, provided the code complexity is low enough. I think employing appropriately-formulated dominance would get us there. I'm not clear on what invalid state you can get into without the dead code elimination. Can you give an example? (Hmm. I see that OpSelectionMerge is not allowed to immediately precede OpBranch. I dislike that, but it's the rule.) For HLSL compilation, we're considering allowing an "expanded dialect" form of SPIR-V for Vulkan immediately after Glslang generates it. But I believe it's still fully valid by core SPIR-V rules; it's just that we can relax (violate) some of the Vulkan-specific rules. |
Regarding the fact that OpSelectionMerge can't immediately precede OpBranch. A workaround is like this:
Instead of changing that BranchConditional to an OpBranch (invalid by the rule I don't like), change it to:
It's silly, but it should work. I'd leave it to future work to detect if there are non-local breaks out of the nested constructs out to the merge block: if there are none, then you can eliminate the OpSelectionMerge altogether. |
Re invalid code caused by dead blocks, re-reading the spec, it seems I misremembered the structured control flow rules. I was worried about dead blocks branching into a structured construct. I was misremembering that no branches were allowed from outside the construct into the construct, but the rule is that the header must dominate its construct, so I guess such dead blocks do not effect that dominance. So one could break dead block elimination into a separate pass. |
Re OpSelectionMerge preceding OpBranch, I also thought of the double live operand trick, but was a little worried about surprising a driver's shader compiler that may have never seen such a thing. |
HLSL depends on dead-code elimination to remove some forms of invalid code. One example, if I recall correctly, the call tree of dead functions does not have to be complete, only the call tree root by the actual entry point. (HLSL can have multiple things that look like an entry point, but only one is chosen at a time to be the actual entry point.) Currently, glslang has its own algorithm to look at call trees and eliminate functions that cannot be reached from the entry point to avoid generating invalid code. I don't know how much this concept generalizes, but there is this idea in HLSL that dead code can be invalid to some degree. (Of course, and I think this is already understood, I just don't have enough context, using if-then-else to conditionally set something that must be statically known is invalid, so has to be eliminated, under the assumption such cases will always be statically determinable to be constant conditions.) PS. @greg-lunarg I'll notice this topic had a question for me faster with an @johnkslang than with a JohnK. |
@greg-lunarg : Let me know how you'll proceed. I'd like to update https://github.com/KhronosGroup/SPIRV-Tools/projects/1 with a more accurate breakdown of the work. |
By "eliminate dead blocks inside a function", I presume you mean blocks that are not reachable from the first block. Notice that this a slightly different beast than dead code elimination, which usually focuses on an instruction's uses. Standard DCE looks at an instruction and if its result is not used, deletes it. Aggressive DCE starts with live stores and then works backward, marking live everything it uses, finally deleting everything not marked live. Note that neither of these necessarily gets rid of all unreachable blocks. So I would propose a pass called --eliminate-unreachable-blocks which would do just that. I think I could get authorization to write such a pass, especially if it was thought that such a pass would be necessary to create valid SPIR-V for some shaders. @dneto Can you please clarify?: Are you suggesting that acceptance of this PR is contingent on such a pass? I would request that such a contingency not be put in place, but rather create a TODO for such a pass in the future, at which time we remove the dead block elimination in this pass. As for dead functions and types, I have authorization to put such transforms/passes into spirv-opt. While they are functionally fairly distinct and separate functions should be written for each, we may want to ultimately combine them into a single pass which removes all dead instructions at a module level, maybe --dce-module. Re #699, I have already written such a beast, which looks through an instruction's uses for debug and annotations and deletes them before deleting the instruction. I just need to put it into a shared location. I had been thinking that the proposed "optimization context" which enables passing of data from one pass to another (such as def_use_mgr_) would be a good place for these shared helper functions. @dneto0: Do you agree? |
Yes, "eliminate dead blocks in function" means remove blocks that are not reachable from the entry point, nor are the merge block or continue target for any reachable loop header or selection header. Acceptance of this PR is contingent on fixing the bugs. If you can keep the current strategy of doing some DCE in this pass and make it correct then that's perfectly fine by me. In the DCE discussion I just wanted clarity as to what bits are imminent vs. left for later. For example if @johnkslang wants to write the "eliminate dead blocks in a function" pass then he should be able to grab it independently. Yes, I later saw that #699 has almost everything I was asking for regarding more generic/robust/flexible removal of a value. Yes, the planned-for optimization context could be a good natural place for that functionality. Understand that I make lots of little TODOs in the form of tasks/cards/issues as a memory aid and organizing mechanism, without attached expectation of due date. |
I think I have addressed all bugs/concerns at this point. Let me know if I missed something. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly there.
Thanks for fixing the too-aggressive sweeping of continue blocks.
I found a couple of small things, though.
- See "If a dead block is a header block". Fix comment.
- By same logic, if a dead block is a loop header, then its continue target is also dead.
- I think the HasNonPhiRef will always be true for a merge block; so that fixup code to add back a branch can't ever execute. Should remove it. Also, unreachable merge blocks are ok to keep anyway.
If I'm wrong, let me know.
source/opt/dead_branch_elim_pass.cpp
Outdated
(*dbi)->ForEachSuccessorLabel([&deadSuccIds](uint32_t succ) { | ||
deadSuccIds.insert(succ); | ||
}); | ||
// If dead block is merge block, add its merge block to dead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If dead block is a header block, ...."
By the same logic, shouldn't we also add its continue target if it has one? If the header block of a construct is dead then all its nested constructs would be dead too.
Optional to fix, since it could be cleaned up with a separate "remove dead blocks" pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I will make the change.
const uint32_t deadLabId = condVal == true ? falseLabId : trueLabId; | ||
AddBranch(liveLabId, *bi); | ||
def_use_mgr_->KillInst(br); | ||
def_use_mgr_->KillInst(mergeInst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not exactly. What you can't do is break out of two Selection constructs. You can break out of the middle of a Selection to the merge block of an enclosing loop. So that's a branch that skips over a merge block in between out to an outer merge.
Example is generated from:
void main() {
int x = 0;
bool c, d;
while (d) {
if (c) {
x = 1;
break; // goes out to the y=x statement, skipping over the merge for this "if".
x = 2;
} else {
x = 3;
break; // goes out to the y = x statement.
x = 4;
}
}
int y = x;
}
But this isn't the case you're handling. So your conclusion is fine.
// Process phi instructions in merge block. | ||
// deadLabIds are now blocks which cannot precede merge block. | ||
// If eliminated branch is to merge label, add current block to dead blocks. | ||
if (deadLabId == mergeLabId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised that the merge block for a selection merge can only have two predecessors (!).
But I think you're right, due to the "I can't break out of two levels of selection constructs" rule.
It's more tricky if you're trying to handle OpSwitch, or a loop. But you're not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is not obvious.
// and the mergeblock as the false branch. This is done so the merge block | ||
// is not orphaned, which could cause invalid control flow in certain case. | ||
// TODO(greg-lunarg): Do this only in cases where invalid code is caused. | ||
if (!HasNonPhiRef(mergeLabId)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But a merge block is always referenced from the merge instruction that names it. So HasNonPhiRef(mergeLabId) is always true. And doesn't that mean this "if" block is dead code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's valid for a merge block to be unreachable. That's explicitly called out in the spec
each header block must strictly dominate its merge block, unless the merge block is unreachable in the CFG
It's because of cases like:
if (c) { return; } else { return; }
where_the_merge_block_goes(); // never reached
So I don't think you need this block of code at all. ??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the point that I test if the (selection) merge block has non-phi refs, I have already removed the selection merge instruction that pointed to it. So we can get an unreachable (former) merge block. I even added a test that hits this very logic.
The case I was worried about (and still am worried about) is when dead branch elimination severs a loop header from its continue:
for (...) {
...
if (true)
discard;
...
}
to
for (...) {
...
discard;
...
}
Your recent proposal on latch edges notwithstanding, until the spec is changed, I am uncomfortable generating code that seems to be invalid. I would be more comfortable generating such a sequence once we have an unreachable code removal pass that can fix such sequences and make them valid by removing the unreachable continue as as removing any affected loop merge instruction. We would also need to warn users in the documentation that this pass can create invalid code and that running the unreachable code deleter would likely be necessary to restore validity.
Are you ok with this more conservative path?
Re your latch edge proposal, the only concern I had with it was the requirement that all latch edges point to a loop header. Doesn't that outlaw unreachable blocks that don't point to a loop header?
What is the status of this proposal? Is there something stopping it?
Another possible approach is to keep the current language describing back edge, but use a modified DFS similar to what I use where the DFS, when visiting a loop header, visits the merge block and continue block first and then all its other successors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had missed this point:
At the point that I test if the (selection) merge block has non-phi refs, I have already removed the selection merge instruction that pointed to it. So we can get an unreachable (former) merge block. I even added a test that hits this very logic.
Looking again, I see the OrphanMerge, and experimentation proves that it triggers that case.
In reading the code, I got the fact that the loop finding dead blocks avoids adding the merge block, but then missed the significance of the "if (deadLabId == mergeLabId) deadSuccIds.insert(...) " code sequence.
Ok. Looks good to me now. You've even got the TODO there.
Fair enough about the latch edge issue. I'm more comfortable being aggressive there because I've already seen CTS test cases working even though they use the more aggressive rule. I only made the rule when I noticed the CTS test cases need something more general than the spec or validator was willing to give.
Thanks for the continuing very thorough and thoughtful reviews. I think I have addressed all of your latest concerns. Let me know if you would like any other changes. |
Looks good. Thanks! About trying to be careful: I try. I also am willing to be wrong. I figure if you're right 90% of the time and I'm right 90% of the time, but our wrongness doesn't overlap, then together we'll get to 99% right by correcting each other's mistakes. :-) |
Squashed, rebased, and pushed onto master as 52e247f |
…#675) This method needs to live a little longer to allow downstream users to migrate without causing build breakages. This is part of KhronosGroup#673
The downstream users that use this method have been updated. SetHLSLShaderModel is the replacement. Fixes KhronosGroup#673
This is another in a series of passes designed to reduce SPIR-V file size.
For each entry point function, this pass will look for BranchConditionals
with constant condition and convert to a branch. Any resulting dead blocks
are eliminated. For all phi functions in merge block, replace all uses with the
id corresponding to the living predecessor.
This pass requires that all control flow be structured to allow optimization.
This pass is most effective when preceeded by passes which eliminate
local loads and stores, effectively propagating constant values into BranchConditionals
where possible.
Note that is pass contains functions and data from previous passes in this series.
These will be coalesced under the planned optimization context class.