Add DeadBranchElimPass #673

greg-lunarg · 2017-06-07T16:20:48Z

This is another in a series of passes designed to reduce SPIR-V file size.

For each entry point function, this pass will look for BranchConditionals
with constant condition and convert to a branch. Any resulting dead blocks
are eliminated. For all phi functions in merge block, replace all uses with the
id corresponding to the living predecessor.

This pass requires that all control flow be structured to allow optimization.

This pass is most effective when preceeded by passes which eliminate
local loads and stores, effectively propagating constant values into BranchConditionals
where possible.

Note that is pass contains functions and data from previous passes in this series.
These will be coalesced under the planned optimization context class.

atgoo · 2017-06-20T15:13:23Z

source/opt/dead_branch_elim_pass.cpp

+
+Pass::Status DeadBranchElimPass::ProcessImpl() {
+  // Current functionality assumes structured control flow. 
+  // TODO(): Handle non-structured control-flow.


Consider adding your username or email to TODO.

atgoo · 2017-06-20T15:14:48Z

source/opt/dead_branch_elim_pass.cpp

+
+  bool modified = false;
+  // Call Mem2Reg on all remaining functions.
+  for (auto& e : module_->entry_points()) {


const auto&

atgoo · 2017-06-20T15:17:11Z

source/opt/dead_branch_elim_pass.cpp

+static const int kSpvBranchCondTrueLabId = 1;
+static const int kSpvBranchCondFalseLabId = 2;
+static const int kSpvSelectionMergeMergeBlockId = 0;
+static const int kSpvPhiVal0Id = 0;


Consider better naming and moving these constants in the body of the function where they are used.

Also consider using anonymous namespace instead of static.

The structure of these names is:

kSpv[Name of Operation][NameOfOperand]

My goal was to avoid just have having magic numbers all over the code, and if at some future point an instruction changed we might have some hope of tracking its presence down.

Can you suggest a better structure? I am trying to keep their length somewhat manageable.

Spv is not really needed, since it's a local variable. I would add InIdx at the end, to signify that these are not ids, but 'in operand indices'. I would also move these into the functions where they are used.
const uint32_t kPhiVal0IdInIdx = 0;

OK. I added an underscore to the names to separate the operator from the operand names, for example: kSpvBranchCond_TrueLabId.

I also make them non-static and anonymous.

Ok. Removed Spv and added InIdx. But I would suggest keeping them at global scope allows them to be used in multiple functions, thus reducing redundancy.

OK. I took out underscores.

OK. And changed them to uint32_t

atgoo · 2017-06-20T18:08:21Z

source/opt/dead_branch_elim_pass.cpp

+  spvtools::CFA<ir::BasicBlock>::DepthFirstTraversal(
+    &*func->begin(), StructuredSuccessorsFunction(), ignore_block,
+    [&](cbb_ptr b) { order->push_front(const_cast<ir::BasicBlock*>(b)); }, ignore_edge);
+}


This could be more readable this way:

auto ignore_block = {};
auto ignore_edge = [](cbb_ptr, cbb_ptr) {};
auto get_structured_successors = [this](const ir::BasicBlock* block) { return &(block2structured_succs_[block]); };
auto whatever = [&](cbb_ptr b) { order->push_front(const_castir::BasicBlock*(b)); }

spvtools::CFAir::BasicBlock::DepthFirstTraversal(
&*func->begin(), get_structured_successors, ignore_block, whatever, ignore_edge);

atgoo · 2017-06-20T18:23:28Z

source/opt/dead_branch_elim_pass.cpp

+    uint32_t condId, bool* condVal, bool* condIsConst) {
+  ir::Instruction* cInst = def_use_mgr_->GetDef(condId);
+  switch (cInst->opcode()) {
+  case SpvOpConstantFalse: {


Missing indentation before case.

atgoo · 2017-06-20T18:34:23Z

source/opt/dead_branch_elim_pass.cpp

+    (void)GetConstCondition(cInst->GetSingleWordInOperand(0),
+        condVal, condIsConst);
+    if (*condIsConst)
+      *condVal = !*condVal;


Consider
bool negated_value;
if (!IsConstCondition(cInst->GetSingleWordInOperand(0), &negated_value))
return false;

*condVal = !negated_value
return true;

atgoo · 2017-06-20T19:00:55Z

source/opt/dead_branch_elim_pass.cpp

+      continue;
+
+    // Replace conditional branch with unconditional branch
+    uint32_t trueLabId = br->GetSingleWordInOperand(kSpvBranchCondTrueLabId);


Consider adding const qualifier to local variables.

dneto0 · 2017-06-20T20:14:57Z

Hey @atgoo: Does the algorithm look correct to you? Is there enough test coverage to be confident that this algorithm isn't broken and won't break under reasonable changes elsewhere? Are any of your suggestions blocking?

atgoo · 2017-06-20T20:48:58Z

No, my suggestions are about readability. I'm not qualified to judge if the logic is correct on a large scale, and I didn't find any problems on a tactical level.

…t issue

dneto0 · 2017-06-30T18:10:18Z

Brainstorming cases after reading just the MR description (not a request for feedback or changes. Just personal notes!):

How do you determine a block is dead?
A dead block could itself be a merge block or a continue target. In that case the block has to survive in order to fulfill its role in structured control flow. Example is:

   %entry = OpLabel
   OpBranch %loop

  %loop = OpLabel
  OpLoopMerge %merge %loop None
  OpBranchConditional %false %merge %loop  ; tempted to kill %merge

  %merge = OpLabel  ;;  This has to stick around. Even if it's unreachable from the function entry block.
  OpReturn

greg-lunarg · 2017-06-30T19:44:06Z

Interesting example.

My intent with this optimization was to only process OpBranchConditionals that are preceded by by OpSelectionMerge. If the code does not reflect that, I will make that change immediately.

Sorry for the confusion.

greg-lunarg · 2017-06-30T19:49:31Z

The code does only process OpSelectionMerge. I will make that clearer in the comments.

…ocessed

dneto0

The algorithm has a couple of problems. I've given a couple of examples.

dneto0 · 2017-06-30T18:11:53Z

source/opt/dead_branch_elim_pass.cpp

+const uint32_t kSelectionMergeMergeBlockIdInIdx = 0;
+const uint32_t kPhiVal0IdInIdx = 0;
+const uint32_t kPhiLab0IdInIdx = 1;
+const uint32_t kPhiVal1IdInIdx = 2;


Reading this patch in order, it's mysterious why phi input branch pairs 0 and 1 are the only ones mentioned... Hope there's a loop later. :-)

It is because only OpSelectionMerge/OpBranchConditional is optimized (for the moment). This means that when patching phis in the merge block, we only have to deal with phis with 2 and only 2 pairs.

dneto0 · 2017-06-30T18:14:43Z

source/opt/dead_branch_elim_pass.h

+  Status Process(ir::Module*) override;
+
+ private:
+  // Returns the id of the merge block declared by a merge instruction in 


This and the next method raise the question in my mind about continue targets.

dneto0 · 2017-06-30T18:15:26Z

source/opt/dead_branch_elim_pass.h

+  // this block, if any.  If none, returns zero.
+  uint32_t MergeBlockIdIfAny(const ir::BasicBlock& blk) const;
+
+  // Compute structured successors for function |func|.


Do we not need to treat continue target specially as well? It's similar to a merge block in that we might branch out of a nested construct to a nesting continue target.

While the spec does mention the possibility of the continue target being unreachable, given the other requirements, I am having difficulty coming up with a flow graph where this is the case. Here is my thinking:

A loop must have a back edge

There must be path from the header to the backedge block (otherwise it can't be a backedge)

The continue target must dominate the backedge block
Thus, there must be a path from the header to the continue target
Thus the continue target is only unreachable iff the header is unreachable.

So it seems to me that the continue target does not to be treated specially.

There's a few things going on here:

Surprisingly, the definition of "back-edge" is still up for debate.
See Need to refine definition of a back edge used by structured control flow rules SPIRV-Headers#16 and the corresponding internal issue https://gitlab.khronos.org/spirv/SPIR-V/issues/78

I got involved because there are simple shaders that leave the continue target unreachable, and those exist in the Vulkan CTS suite. I had to adjust the validator to accommodate them. For example, this fragment shader:

#version 310 es void main() { bool c; while(c) { break; } }

will compile via Glslang into:

OpCapability Shader %1 = OpExtInstImport "GLSL.std.450" OpMemoryModel Logical GLSL450 OpEntryPoint Fragment %main "main" OpExecutionMode %main OriginUpperLeft OpSource ESSL 310 OpName %main "main" OpName %c "c" %void = OpTypeVoid %3 = OpTypeFunction %void %bool = OpTypeBool %_ptr_Function_bool = OpTypePointer Function %bool %main = OpFunction %void None %3 %5 = OpLabel %c = OpVariable %_ptr_Function_bool Function OpBranch %6 %6 = OpLabel OpLoopMerge %8 %9 None OpBranch %10 %10 = OpLabel %14 = OpLoad %bool %c OpBranchConditional %14 %7 %8 %7 = OpLabel OpBranch %8 %9 = OpLabel OpBranch %6 %8 = OpLabel OpReturn OpFunctionEnd

That has continue target block %9 unreachable. and yet we still want the branch from %9 to %6 to be a back edge.

There was a recent fix (I think in SPIR-V 1.0 Rev10) in structured control flow rules to say "the loop header must dominate the Continue Target, unless the Continue Target is unreachable in the CFG" (emphasis mine). This was to accommodate the case in 1.

Even given all that, the reason you're computing these "structured successors" is to list the blocks that you can safely delete once you delete a newly-found dead block. Those blocks are characterized by being the blocks dominated by the newly-dead block. The structured successors is a proxy for dominance. But a newly-dead block can escape out to an outer continue target. So my intuition thinks there's still something fishy here. I'll go try out some more cases through the new rev of your code.

Ok, here's a concrete example of my concern. (See point 3 in previous comment).

Example fragment shader:

#version 450 void main() { bool c; bool d; while(c) { if(d) { continue; } if(false) { // setting this up to be killed. branches out to an outer continue target. continue; } discard; } }

Compiles to:

OpCapability Shader %1 = OpExtInstImport "GLSL.std.450" OpMemoryModel Logical GLSL450 OpEntryPoint Fragment %main "main" OpExecutionMode %main OriginUpperLeft OpSource GLSL 450 OpName %main "main" OpName %c "c" OpName %d "d" %void = OpTypeVoid %3 = OpTypeFunction %void %bool = OpTypeBool %_ptr_Function_bool = OpTypePointer Function %bool %false = OpConstantFalse %bool %main = OpFunction %void None %3 %5 = OpLabel %c = OpVariable %_ptr_Function_bool Function %d = OpVariable %_ptr_Function_bool Function OpBranch %6 %6 = OpLabel OpLoopMerge %8 %9 None OpBranch %10 %10 = OpLabel %14 = OpLoad %bool %c OpBranchConditional %14 %7 %8 %7 = OpLabel %16 = OpLoad %bool %d OpSelectionMerge %18 None OpBranchConditional %16 %17 %18 %17 = OpLabel OpBranch %9 %18 = OpLabel OpSelectionMerge %22 None OpBranchConditional %false %21 %22 %21 = OpLabel OpBranch %9 %22 = OpLabel OpKill %9 = OpLabel OpBranch %6 %8 = OpLabel OpReturn OpFunctionEnd

Control flow graph is:

Then running through this optimization pass, I get:

OpCapability Shader %1 = OpExtInstImport "GLSL.std.450" OpMemoryModel Logical GLSL450 OpEntryPoint Fragment %main "main" OpExecutionMode %main OriginUpperLeft OpSource GLSL 450 OpName %main "main" OpName %c "c" OpName %d "d" %void = OpTypeVoid %3 = OpTypeFunction %void %bool = OpTypeBool %_ptr_Function_bool = OpTypePointer Function %bool %false = OpConstantFalse %bool %main = OpFunction %void None %3 %5 = OpLabel %c = OpVariable %_ptr_Function_bool Function %d = OpVariable %_ptr_Function_bool Function OpBranch %6 %6 = OpLabel OpLoopMerge %8 %9 None OpBranch %10 %10 = OpLabel %14 = OpLoad %bool %c OpBranchConditional %14 %7 %8 %7 = OpLabel %16 = OpLoad %bool %d OpSelectionMerge %18 None OpBranchConditional %16 %17 %18 %17 = OpLabel OpBranch %9 %18 = OpLabel OpBranch %22 %22 = OpLabel OpKill %8 = OpLabel OpReturn OpFunctionEnd

That's invalid because block %9 has been swept away but is mentioned as the continue target of the loop, and as the branch target of block %17.

The problem is that you determine that block %21 is dead (that's right) and all the blocks it dominates should be swept away. But your algorithm incorrectly determines that block %9 should be included in that set.

You make a good point that, whether valid or invalid, glslang can generate this sequence and it would be bad to break it.

While I think a more general pass to clean up and normalize such sequences might be the best long term solution, I think a solution (which I think you suggested at one point) might be best in the short term: have ComputeStructuredSuccessors add the continue target after the merge block and before all the "real" successors. This should guarantee that in the "structured order", the continue target will appear after any loop code not part of the continue sequence. Since the continue target dominates the backedge and the backedge post-dominates the continue target, this should remove any possibility of continue code being eliminated by this pass.

FWIW, despite the seeming questions around valid control flow, I am going to continue the policy of not generating any new unreachable blocks with this pass, even if it means reinserting a BranchConditional. This already has a TODO on it to revisit.

dneto0 · 2017-06-30T18:48:24Z

source/opt/dead_branch_elim_pass.cpp

+        br->GetSingleWordInOperand(kBranchCondFalseLabIdInIdx);
+    const uint32_t mergeLabId =
+        mergeInst->GetSingleWordInOperand(kSelectionMergeMergeBlockIdInIdx);
+    const uint32_t liveLabId = condVal == true ? trueLabId : falseLabId;


This is ok, but generally I just use the absorption rule to collapse "condVal == true" to just "condVal". :-)

Yeah. Its almost more of a readability thing here.

dneto0 · 2017-06-30T19:00:47Z

source/opt/dead_branch_elim_pass.h

+  // If block |bp| contains constant conditional branch, return true
+  // and return branch and merge instructions in |branchInst| and |mergeInst|
+  // and the boolean constant in |condVal|. 
+  bool GetConstConditionalBranch(ir::BasicBlock* bp,


But we can have conditional branches that don't have merge instruction.
Examples include:

%entry= OpLabel OpBranch %loop %loop = OpLabel OpLoopMerge %merge %body None OpBranch %body %body = OpLabel %cond = .... %bool OpBranch %cond %exit %loop ;; this back edge does not have a preceding merge instruction %merge = OpLabel

Similarly, in an example generated from this:

for (...) { if ( cond) { break; // this breaks out to the merge block,, and it could have been compiled into an OpBranchConditional }

(Already covered by request to clarify)

So I think we are good here as I clarified that this pass is only targeting OpBranchConditionals preceded by OpSelectionMerge.

dneto0 · 2017-06-30T19:05:06Z

source/opt/dead_branch_elim_pass.h

+  void KillAllInsts(ir::BasicBlock* bp);
+
+  // If block |bp| contains constant conditional branch, return true
+  // and return branch and merge instructions in |branchInst| and |mergeInst|


If block |bp| contains constant conditional branch that is also preceded by a selection merge, return true and ...

dneto0 · 2017-06-30T19:05:58Z

source/opt/dead_branch_elim_pass.h

+  // If block |bp| contains constant conditional branch, return true
+  // and return branch and merge instructions in |branchInst| and |mergeInst|
+  // and the boolean constant in |condVal|. 
+  bool GetConstConditionalBranch(ir::BasicBlock* bp,


Recommend renaming this to GetConstConditionalSelectionBranch

dneto0 · 2017-06-30T19:09:59Z

source/opt/dead_branch_elim_pass.cpp

+      *condVal = true;
+      *condIsConst = true;
+    } break;
+    case SpvOpLogicalNot: {


dneto0 · 2017-06-30T19:26:44Z

source/opt/dead_branch_elim_pass.cpp

+    const uint32_t deadLabId = condVal == true ? falseLabId : trueLabId;
+    AddBranch(liveLabId, *bi);
+    def_use_mgr_->KillInst(br);
+    def_use_mgr_->KillInst(mergeInst);


What if there's a nested structured selection that needs to be able to break out to this named merge block? I.e. there is a deeply nested break block that branches out to the merge block. But now it's no longer a named merge block.

Example:

%selection0 = OpLabel OpSelectionMerge %merge0 None ;; Candidate for removal. Declares %merge0 as merge block OpBranchConditional %constantfalse %selection1 %merge0 ;; Candidate for removal. ; The nested selection, a one -armed "if" %selection1 = OpLabel OpSelectionMerge %merge1 None %cond = OpLoad %bool %myboolvar OpBranchConditional %cond %merge1 %body1 %body1 = OpLabel ; break block ,i.e. branches out several levels of nesting. OpBranch %merge0 ;; By structured control flow rules, needs %merge0 to be named as a merge block %merge1 = OpLabel OpBranch %merge0 %merge0 = OpLabel ...

I am a little surprised by this. The "impression" I got from the spec is that this is not legal, although I grant that one could interpret it as being legal.

How sure are you about this? Have there been discussions where this was specifically called out as legal?

If necessary, I would "fix" this by leaving the BranchConditional (and OpSelectionMerge) but having the "dead" branch go right to the merge block (if it doesn't already).

Oh you're right. I was misremembering.
Reading the spec again, you can only go out to the next outer merge block. You can't jump out two like I suggested. In short, structured control flow rules can't implement named-break like Java has. (or named "last" in Perl).

Well, not exactly. What you can't do is break out of two Selection constructs. You can break out of the middle of a Selection to the merge block of an enclosing loop. So that's a branch that skips over a merge block in between out to an outer merge.
Example is generated from:

void main() { int x = 0; bool c, d; while (d) { if (c) { x = 1; break; // goes out to the y=x statement, skipping over the merge for this "if". x = 2; } else { x = 3; break; // goes out to the y = x statement. x = 4; } } int y = x; }

But this isn't the case you're handling. So your conclusion is fine.

dneto0 · 2017-06-30T20:06:00Z

source/opt/dead_branch_elim_pass.cpp

+
+    // Iterate to merge block deleting dead blocks
+    std::unordered_set<uint32_t> deadLabIds;
+    deadLabIds.insert(deadLabId);


There's a difficulty here because, I believe, the structured order of blocks could put a continue target we break out to before an inner merge block. (Example below)

So... When one of the candidates to kill is a "continue block", then it this could try to kill the corresponding continue target, and that would also orphan the reference in the corresponding OpLoopMerge.

Example is schematically similar to:

for (...) { // names a continue target if ( ... ) { break; // a branch to the continue target. // but the merge block for this structured selection will appear *after* the continue target in the structured order. // So we'll want to sweep away the continue target block when we shouldn't. } ; continue target is conceptually here. }

More explicitly here's a full example:

OpCapability Shader OpMemoryModel Logical GLSL450 OpEntryPoint Fragment %foo "foo" %bool = OpTypeBool %void = OpTypeVoid %void_fn = OpTypeFunction %void %boolptr = OpTypePointer Function %bool %false = OpConstantFalse %bool %foo = OpFunction %void None %void_fn %entry = OpLabel %loopcondition_var = OpVariable %boolptr Function %loopcondition = OpLoad %bool %loopcondition_var OpBranch %loop %loop = OpLabel OpLoopMerge %mergeloop %continuetarget None OpBranchConditional %loopcondition %mergeloop %body %body = OpLabel OpSelectionMerge %mergeselection None OpBranchConditional %false %continuetarget %mergeselection %mergeselection = OpLabel OpKill %continuetarget = OpLabel OpBranchConditional %loopcondition %loop %mergeloop %mergeloop = OpLabel OpReturn OpFunctionEnd

The above example passes validation. Running it through your algorithm, I get this result:

OpCapability Shader OpMemoryModel Logical GLSL450 OpEntryPoint Fragment %1 "foo" %bool = OpTypeBool %void = OpTypeVoid %4 = OpTypeFunction %void %_ptr_Function_bool = OpTypePointer Function %bool %false = OpConstantFalse %bool %1 = OpFunction %void None %4 %7 = OpLabel %8 = OpVariable %_ptr_Function_bool Function %9 = OpLoad %bool %8 OpBranch %10 %10 = OpLabel OpLoopMerge %11 %12 None ;;;;;;;;;;;;;;;; Whoops this is a dead reference to %12, and is invalid. OpBranchConditional %9 %11 %13 %13 = OpLabel OpBranch %14 %14 = OpLabel OpKill %11 = OpLabel OpReturn OpFunctionEnd

With all due respect for the validator, I would argue that the input program is invalid.

Because %continuetarget is the target of an OpSelectionMerge/OpBranchCondition (and it is not the merge block), that means that the %continuetarget block is inside of the Selection construct. By the validation rules, if that block branches out of the Selection construct, it can only branch to the Selection construct's merge block, or the continue or break of the innermost loop. But this block potentially branches to the loop's header, which is not one of the choices.

Point taken that the validator is incomplete, and might have bugs. :-)

But if I follow your line of reasoning, there would be no way to escape an inner selection construct out to a continue target of an enclosing loop.

It's a key point that the continue target is named by the LoopMerge instruction of the enclosing loop. The loop encloses the selection construct because the loop header %loop dominates the selection header %body

Actually, a clearer way to see it is that I can have multiple selection constructs inside the loop, each of them branching out to the continue target. The continue target is then not dominated by any of those inner selections, but only by the loop header.
Concretely, the body of the loop would be a sequence of selections structured something like:

if (cond) { continue; } if (cond2) { continue; } if (cond3) { continue; }

Each of those "continue" statements is a branch the continue target. And so the continue target is not in any one of the selections.

Here is the block which I believe causes invalid code:

%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continuetarget %mergeselection

Here is how I would rewrite it so that it is valid (by my reading) and still does exactly what you wish:

%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continueblock %mergeselection

%continueblock = OpLabel
OpBranch %continuetarget

But even then I am still dropping the continue target. I will investigate further.

Looking at this more, I believe even the code I have written is still not valid. The reason is that the %continuetarget block is still part of the %body select construct and this is invalid because it is not legal for the backedge to come out of a select construct. It is part of the select construct because it is dominated by the %body block and not dominated by the %selectionmerge block.

The big problem here is the OpKill in the %selectionmerge block, which causes the %selectionmerge block to NOT dominate ANY blocks in the loop.

I have tried to come up with a valid version of this code that causes a problem but have not been able to. Please let me know if you think you have one or disagree with my assessment on the validity of this code.

dneto0 · 2017-06-30T20:07:34Z

No worries on the confusion. That's the point of reviews. :-)

But there are some algorithmic problems. See my examples posted.

greg-lunarg · 2017-07-03T21:06:01Z

Looking at the issues you raised, I realized it is possible for the current algorithm to create invalid code for some cases. I have just added code which should address these cases.

I added some logic near the end of the processing of a constant conditional branch. If the merge block associated with the constant branch ends up with no predecessors (dead), we go back and replace the replacement branch, going back to the original constant conditional branch (with preceeding SelectionMerge). One slight difference is that the "dead" label is now the merge label. This is to prevent creating a dead block from the merge block, which could cause the code to become invalid. There may be more elegant ways to do this, but this works for now while still allowing us to largely achieve our goals of reducing size by eliminating dead branch code.

I also added a test using your code which exercises this new functionality.

…anches

dneto0 · 2017-07-04T18:07:38Z

I still think the algorithm is incorrectly sweeping away blocks.

If I were doing this, I'd consider simplifying the pass so it only simplifies conditional branches and leaves dead code elimination to a different pass. But coder's choice.

greg-lunarg · 2017-07-05T16:32:51Z

Here are two reasons I thought it best if I attempted some form of dead code elimination in this pass:

I was trying to make sure that the pass generated valid code. Maybe that isn't a hard requirement. Maybe we can document some passes generate potentially invalid code that requires a set of passes to clean things up to get back to valid. Thoughts? Anyway, certain code sequences were leaving what I considered to be invalid code, and it seemed that dead block elimination was the only way to deal with it.
Doing the elimination during the pass allowed me to do a more targeted elimination. This has two advantages: a) it is more efficient because it is dealing with a more limited problem domain, and b) it allowed me to accomplish the targeted size reduction without having to write a more complex, general elimination routine.

I think it would be good to write the more general routine sometime soon. Perhaps such a routine would be a necessary component to generating valid code from glslang, similar to how JohnK is thinking about generating valid sequences in other areas.

dneto0 · 2017-07-05T21:03:46Z

I agree that it would be nice to do targeted elimination of dead blocks, provided the code complexity is low enough. I think employing appropriately-formulated dominance would get us there.

I'm not clear on what invalid state you can get into without the dead code elimination. Can you give an example? (Hmm. I see that OpSelectionMerge is not allowed to immediately precede OpBranch. I dislike that, but it's the rule.)

For HLSL compilation, we're considering allowing an "expanded dialect" form of SPIR-V for Vulkan immediately after Glslang generates it. But I believe it's still fully valid by core SPIR-V rules; it's just that we can relax (violate) some of the Vulkan-specific rules.

dneto0 · 2017-07-05T21:11:33Z

Regarding the fact that OpSelectionMerge can't immediately precede OpBranch. A workaround is like this:
If you have

OpSelectionMerge...
OpBranchConditional %true %live %dead

Instead of changing that BranchConditional to an OpBranch (invalid by the rule I don't like), change it to:

OpSelectionMerge...
OpBranchConditional %true %live %live

It's silly, but it should work.

I'd leave it to future work to detect if there are non-local breaks out of the nested constructs out to the merge block: if there are none, then you can eliminate the OpSelectionMerge altogether.

greg-lunarg · 2017-07-05T21:16:07Z

Re invalid code caused by dead blocks, re-reading the spec, it seems I misremembered the structured control flow rules. I was worried about dead blocks branching into a structured construct. I was misremembering that no branches were allowed from outside the construct into the construct, but the rule is that the header must dominate its construct, so I guess such dead blocks do not effect that dominance. So one could break dead block elimination into a separate pass.

greg-lunarg · 2017-07-05T21:21:07Z

Re OpSelectionMerge preceding OpBranch, I also thought of the double live operand trick, but was a little worried about surprising a driver's shader compiler that may have never seen such a thing.

johnkslang · 2017-07-05T21:25:09Z

HLSL depends on dead-code elimination to remove some forms of invalid code. One example, if I recall correctly, the call tree of dead functions does not have to be complete, only the call tree root by the actual entry point. (HLSL can have multiple things that look like an entry point, but only one is chosen at a time to be the actual entry point.)

Currently, glslang has its own algorithm to look at call trees and eliminate functions that cannot be reached from the entry point to avoid generating invalid code.

I don't know how much this concept generalizes, but there is this idea in HLSL that dead code can be invalid to some degree.

(Of course, and I think this is already understood, I just don't have enough context, using if-then-else to conditionally set something that must be statically known is invalid, so has to be eliminated, under the assumption such cases will always be statically determinable to be constant conditions.)

PS. @greg-lunarg I'll notice this topic had a question for me faster with an @johnkslang than with a JohnK.

dneto0 · 2017-07-06T15:37:28Z

I stand corrected: without Linkage, calling a function that has no body is invalid SPIR-V. Some transforms will need to be able to process such modules, to cover @johnkslang 's HLSL scenarios.
(I wasn't caught up on the discussion in Transform generalized SPIR-V into SPIR-V for Vulkan, to support HLSL compilation #687)
Consequence: I shouldn't insist on an optimization test case being valid before the tansform runs. But probably still good practice to make cases that are valid after the transform runs.
About dominance: It took me a while to get used to the idea that dominance exists only for blocks reachable from the entry block.
For simplicity, I could see three DCE transforms:
- eliminate dead functions (unreachable in the static call tree of entry points)
- eliminate dead blocks inside a function
- eliminate unused types.
About surprising drivers with the double-live trick: Sounds like an opportunity to write a CTS test. :-)
The optimizer could use a common function to remove a dead value, handling cases where the value is a target of OpName or annotation instructions. It would let us generalize the existing transforms in a few places. I filed Add utility function to remove a value. #699

@greg-lunarg : Let me know how you'll proceed. I'd like to update https://github.com/KhronosGroup/SPIRV-Tools/projects/1 with a more accurate breakdown of the work.

greg-lunarg · 2017-07-06T17:19:34Z

By "eliminate dead blocks inside a function", I presume you mean blocks that are not reachable from the first block. Notice that this a slightly different beast than dead code elimination, which usually focuses on an instruction's uses. Standard DCE looks at an instruction and if its result is not used, deletes it. Aggressive DCE starts with live stores and then works backward, marking live everything it uses, finally deleting everything not marked live. Note that neither of these necessarily gets rid of all unreachable blocks. So I would propose a pass called --eliminate-unreachable-blocks which would do just that. I think I could get authorization to write such a pass, especially if it was thought that such a pass would be necessary to create valid SPIR-V for some shaders.

@dneto Can you please clarify?: Are you suggesting that acceptance of this PR is contingent on such a pass? I would request that such a contingency not be put in place, but rather create a TODO for such a pass in the future, at which time we remove the dead block elimination in this pass.

As for dead functions and types, I have authorization to put such transforms/passes into spirv-opt. While they are functionally fairly distinct and separate functions should be written for each, we may want to ultimately combine them into a single pass which removes all dead instructions at a module level, maybe --dce-module.

Re #699, I have already written such a beast, which looks through an instruction's uses for debug and annotations and deletes them before deleting the instruction. I just need to put it into a shared location. I had been thinking that the proposed "optimization context" which enables passing of data from one pass to another (such as def_use_mgr_) would be a good place for these shared helper functions. @dneto0: Do you agree?

dneto0 · 2017-07-06T22:31:20Z

Yes, "eliminate dead blocks in function" means remove blocks that are not reachable from the entry point, nor are the merge block or continue target for any reachable loop header or selection header.

Acceptance of this PR is contingent on fixing the bugs. If you can keep the current strategy of doing some DCE in this pass and make it correct then that's perfectly fine by me.

In the DCE discussion I just wanted clarity as to what bits are imminent vs. left for later. For example if @johnkslang wants to write the "eliminate dead blocks in a function" pass then he should be able to grab it independently.

Yes, I later saw that #699 has almost everything I was asking for regarding more generic/robust/flexible removal of a value. Yes, the planned-for optimization context could be a good natural place for that functionality. Understand that I make lots of little TODOs in the form of tasks/cards/issues as a memory aid and organizing mechanism, without attached expectation of due date.

greg-lunarg · 2017-07-06T23:00:32Z

I think I have addressed all bugs/concerns at this point. Let me know if I missed something.

dneto0

Nearly there.

Thanks for fixing the too-aggressive sweeping of continue blocks.

I found a couple of small things, though.

See "If a dead block is a header block". Fix comment.
By same logic, if a dead block is a loop header, then its continue target is also dead.
I think the HasNonPhiRef will always be true for a merge block; so that fixup code to add back a branch can't ever execute. Should remove it. Also, unreachable merge blocks are ok to keep anyway.

If I'm wrong, let me know.

dneto0 · 2017-07-07T03:20:18Z

source/opt/dead_branch_elim_pass.cpp

+        (*dbi)->ForEachSuccessorLabel([&deadSuccIds](uint32_t succ) {
+          deadSuccIds.insert(succ);
+        });
+        // If dead block is merge block, add its merge block to dead


"If dead block is a header block, ...."

By the same logic, shouldn't we also add its continue target if it has one? If the header block of a construct is dead then all its nested constructs would be dead too.

Optional to fix, since it could be cleaned up with a separate "remove dead blocks" pass.

Good points. I will make the change.

dneto0 · 2017-07-07T03:47:28Z

source/opt/dead_branch_elim_pass.cpp

+    const uint32_t deadLabId = condVal == true ? falseLabId : trueLabId;
+    AddBranch(liveLabId, *bi);
+    def_use_mgr_->KillInst(br);
+    def_use_mgr_->KillInst(mergeInst);


Well, not exactly. What you can't do is break out of two Selection constructs. You can break out of the middle of a Selection to the merge block of an enclosing loop. So that's a branch that skips over a merge block in between out to an outer merge.
Example is generated from:

void main() { int x = 0; bool c, d; while (d) { if (c) { x = 1; break; // goes out to the y=x statement, skipping over the merge for this "if". x = 2; } else { x = 3; break; // goes out to the y = x statement. x = 4; } } int y = x; }

But this isn't the case you're handling. So your conclusion is fine.

dneto0 · 2017-07-07T03:49:34Z

source/opt/dead_branch_elim_pass.cpp

+    // Process phi instructions in merge block.
+    // deadLabIds are now blocks which cannot precede merge block.
+    // If eliminated branch is to merge label, add current block to dead blocks.
+    if (deadLabId == mergeLabId)


I was surprised that the merge block for a selection merge can only have two predecessors (!).
But I think you're right, due to the "I can't break out of two levels of selection constructs" rule.

It's more tricky if you're trying to handle OpSwitch, or a loop. But you're not.

Yeah, it is not obvious.

dneto0 · 2017-07-07T03:54:39Z

source/opt/dead_branch_elim_pass.cpp

+    // and the mergeblock as the false branch. This is done so the merge block
+    // is not orphaned, which could cause invalid control flow in certain case.
+    // TODO(greg-lunarg): Do this only in cases where invalid code is caused.
+    if (!HasNonPhiRef(mergeLabId)) {


But a merge block is always referenced from the merge instruction that names it. So HasNonPhiRef(mergeLabId) is always true. And doesn't that mean this "if" block is dead code?

Also, it's valid for a merge block to be unreachable. That's explicitly called out in the spec

each header block must strictly dominate its merge block, unless the merge block is unreachable in the CFG

It's because of cases like:

if (c) { return; } else { return; } where_the_merge_block_goes(); // never reached

So I don't think you need this block of code at all. ??

At the point that I test if the (selection) merge block has non-phi refs, I have already removed the selection merge instruction that pointed to it. So we can get an unreachable (former) merge block. I even added a test that hits this very logic.

The case I was worried about (and still am worried about) is when dead branch elimination severs a loop header from its continue:

for (...) {
...
if (true)
discard;
...
}

to

for (...) {
...
discard;
...
}

Your recent proposal on latch edges notwithstanding, until the spec is changed, I am uncomfortable generating code that seems to be invalid. I would be more comfortable generating such a sequence once we have an unreachable code removal pass that can fix such sequences and make them valid by removing the unreachable continue as as removing any affected loop merge instruction. We would also need to warn users in the documentation that this pass can create invalid code and that running the unreachable code deleter would likely be necessary to restore validity.

Are you ok with this more conservative path?

Re your latch edge proposal, the only concern I had with it was the requirement that all latch edges point to a loop header. Doesn't that outlaw unreachable blocks that don't point to a loop header?

What is the status of this proposal? Is there something stopping it?

Another possible approach is to keep the current language describing back edge, but use a modified DFS similar to what I use where the DFS, when visiting a loop header, visits the merge block and continue block first and then all its other successors.

I had missed this point:

At the point that I test if the (selection) merge block has non-phi refs, I have already removed the selection merge instruction that pointed to it. So we can get an unreachable (former) merge block. I even added a test that hits this very logic.

Looking again, I see the OrphanMerge, and experimentation proves that it triggers that case.

In reading the code, I got the fact that the loop finding dead blocks avoids adding the merge block, but then missed the significance of the "if (deadLabId == mergeLabId) deadSuccIds.insert(...) " code sequence.

Ok. Looks good to me now. You've even got the TODO there.

Fair enough about the latch edge issue. I'm more comfortable being aggressive there because I've already seen CTS test cases working even though they use the more aggressive rule. I only made the rule when I noticed the CTS test cases need something more general than the spec or validator was willing to give.

…dead

greg-lunarg · 2017-07-07T17:53:41Z

Thanks for the continuing very thorough and thoughtful reviews. I think I have addressed all of your latest concerns. Let me know if you would like any other changes.

dneto0 · 2017-07-07T19:03:26Z

Looks good. Thanks!

About trying to be careful: I try. I also am willing to be wrong. I figure if you're right 90% of the time and I'm right 90% of the time, but our wrongness doesn't overlap, then together we'll get to 99% right by correcting each other's mistakes. :-)

dneto0 · 2017-07-07T19:19:41Z

Squashed, rebased, and pushed onto master as 52e247f

…#675) This method needs to live a little longer to allow downstream users to migrate without causing build breakages. This is part of KhronosGroup#673

The downstream users that use this method have been updated. SetHLSLShaderModel is the replacement. Fixes KhronosGroup#673

greg-lunarg and others added 3 commits June 7, 2017 10:09

DeadBranchElim: Add DeadBranchElimPass

53f5323

Merge branch 'master' into mem2reg17

df95557

DeadBranchElim: Fix pass creation comment.

1e118a8

atgoo reviewed Jun 20, 2017

View reviewed changes

greg-lunarg and others added 13 commits June 20, 2017 18:12

DeadBranchElim: Fix switch indent

64c678a

DeadBranchElim: Fixed negation case of GetConstCondition

a9888f9

DeadBranchElim: Add name to TODO

1b2926b

DeadBranchElim: Add const and fix multiple entry point / short-circui…

ee41ea7

…t issue

DeadBranchElim: Add const to one-time store locals.

93be240

DeadBranchElim: Make DepthFirstTraversal call more readable

cc4f962

DeadBranchElim: Add TODO to get rid of const_cast.

dd85801

DeadBranchElim: Improved kSpv* names

e357094

DeadBranchElim: Changed kSpv* to non-static and anonymous

d21de93

Merge branch 'master' into mem2reg17

96360d4

DeadBranchElim: Improved k* names further.

13f7d17

DeadBranchElim: Remove _s from k*

6d9e1ec

DeadBranchElim: Changed k* to uint32_t

df9dbd4

DeadBranchElim: Clarify comment only OpSelectionMerge branches are pr…

b78672e

…ocessed

dneto0 requested changes Jun 30, 2017

View reviewed changes

greg-lunarg added 4 commits June 30, 2017 17:15

DeadBranchElim: Fix name and comment

e62430e

DeadBranchElim: Don't assume Merge before BranchConditional

523fed0

DeadBranchElim: Rename to deadSuccIds and improve comments

fdfca7d

DeadBranchElim: Keep SelectionMerge/BranchCond if invalid code possible.

ea482c9

DeadBranchElim: Improve comments on remaining constant conditional br…

06915ed

…anches

DeadBranchElim: Fix disappearing continue target; added test

ee56907

dneto0 requested changes Jul 7, 2017

View reviewed changes

DeadBranchElim: Add continue target to dead blocks if loop header is …

03c8c03

…dead

dneto0 approved these changes Jul 7, 2017

View reviewed changes

dneto0 closed this Jul 7, 2017

This was referenced Jul 10, 2017

Eliminate unreachable blocks in a function: #702

Closed

Aggressive dead code elimination including empty control flow removal #541

Closed

dneto0 pushed a commit to dneto0/SPIRV-Tools that referenced this pull request Sep 14, 2024

Remove SetShaderModel from API (KhronosGroup#734)

16f6e6a

The downstream users that use this method have been updated. SetHLSLShaderModel is the replacement. Fixes KhronosGroup#673

Add DeadBranchElimPass #673

Add DeadBranchElimPass #673

Conversation

greg-lunarg commented Jun 7, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greg-lunarg Jun 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dneto0 commented Jun 20, 2017

atgoo commented Jun 20, 2017

dneto0 commented Jun 30, 2017

greg-lunarg commented Jun 30, 2017 • edited Loading

greg-lunarg commented Jun 30, 2017

dneto0 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greg-lunarg Jun 30, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greg-lunarg Jul 5, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dneto0 commented Jun 30, 2017

greg-lunarg commented Jul 3, 2017 • edited Loading

dneto0 commented Jul 4, 2017

greg-lunarg commented Jul 5, 2017 • edited Loading

dneto0 commented Jul 5, 2017

dneto0 commented Jul 5, 2017

greg-lunarg commented Jul 5, 2017 • edited Loading

greg-lunarg commented Jul 5, 2017

johnkslang commented Jul 5, 2017 • edited Loading

dneto0 commented Jul 6, 2017

greg-lunarg commented Jul 6, 2017 • edited Loading

dneto0 commented Jul 6, 2017

greg-lunarg commented Jul 6, 2017

dneto0 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greg-lunarg commented Jun 7, 2017 •

edited

Loading

greg-lunarg Jun 20, 2017 •

edited

Loading

greg-lunarg commented Jun 30, 2017 •

edited

Loading

greg-lunarg Jun 30, 2017 •

edited

Loading

greg-lunarg Jul 5, 2017 •

edited

Loading

greg-lunarg commented Jul 3, 2017 •

edited

Loading

greg-lunarg commented Jul 5, 2017 •

edited

Loading

greg-lunarg commented Jul 5, 2017 •

edited

Loading

johnkslang commented Jul 5, 2017 •

edited

Loading

greg-lunarg commented Jul 6, 2017 •

edited

Loading