Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DeadBranchElimPass #673

Closed
wants to merge 24 commits into from
Closed

Conversation

greg-lunarg
Copy link
Contributor

@greg-lunarg greg-lunarg commented Jun 7, 2017

This is another in a series of passes designed to reduce SPIR-V file size.

For each entry point function, this pass will look for BranchConditionals
with constant condition and convert to a branch. Any resulting dead blocks
are eliminated. For all phi functions in merge block, replace all uses with the
id corresponding to the living predecessor.

This pass requires that all control flow be structured to allow optimization.

This pass is most effective when preceeded by passes which eliminate
local loads and stores, effectively propagating constant values into BranchConditionals
where possible.

Note that is pass contains functions and data from previous passes in this series.
These will be coalesced under the planned optimization context class.


Pass::Status DeadBranchElimPass::ProcessImpl() {
// Current functionality assumes structured control flow.
// TODO(): Handle non-structured control-flow.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding your username or email to TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.


bool modified = false;
// Call Mem2Reg on all remaining functions.
for (auto& e : module_->entry_points()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

static const int kSpvBranchCondTrueLabId = 1;
static const int kSpvBranchCondFalseLabId = 2;
static const int kSpvSelectionMergeMergeBlockId = 0;
static const int kSpvPhiVal0Id = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider better naming and moving these constants in the body of the function where they are used.

Also consider using anonymous namespace instead of static.

Copy link
Contributor Author

@greg-lunarg greg-lunarg Jun 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of these names is:

kSpv[Name of Operation][NameOfOperand]

My goal was to avoid just have having magic numbers all over the code, and if at some future point an instruction changed we might have some hope of tracking its presence down.

Can you suggest a better structure? I am trying to keep their length somewhat manageable.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spv is not really needed, since it's a local variable. I would add InIdx at the end, to signify that these are not ids, but 'in operand indices'. I would also move these into the functions where they are used.
const uint32_t kPhiVal0IdInIdx = 0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I added an underscore to the names to separate the operator from the operand names, for example: kSpvBranchCond_TrueLabId.

I also make them non-static and anonymous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Removed Spv and added InIdx. But I would suggest keeping them at global scope allows them to be used in multiple functions, thus reducing redundancy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I took out underscores.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. And changed them to uint32_t

spvtools::CFA<ir::BasicBlock>::DepthFirstTraversal(
&*func->begin(), StructuredSuccessorsFunction(), ignore_block,
[&](cbb_ptr b) { order->push_front(const_cast<ir::BasicBlock*>(b)); }, ignore_edge);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be more readable this way:

auto ignore_block = {};
auto ignore_edge = [](cbb_ptr, cbb_ptr) {};
auto get_structured_successors = [this](const ir::BasicBlock* block) { return &(block2structured_succs_[block]); };
auto whatever = [&](cbb_ptr b) { order->push_front(const_castir::BasicBlock*(b)); }

spvtools::CFAir::BasicBlock::DepthFirstTraversal(
&*func->begin(), get_structured_successors, ignore_block, whatever, ignore_edge);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

uint32_t condId, bool* condVal, bool* condIsConst) {
ir::Instruction* cInst = def_use_mgr_->GetDef(condId);
switch (cInst->opcode()) {
case SpvOpConstantFalse: {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing indentation before case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

(void)GetConstCondition(cInst->GetSingleWordInOperand(0),
condVal, condIsConst);
if (*condIsConst)
*condVal = !*condVal;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider
bool negated_value;
if (!IsConstCondition(cInst->GetSingleWordInOperand(0), &negated_value))
return false;

*condVal = !negated_value
return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

continue;

// Replace conditional branch with unconditional branch
uint32_t trueLabId = br->GetSingleWordInOperand(kSpvBranchCondTrueLabId);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding const qualifier to local variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@dneto0
Copy link
Collaborator

dneto0 commented Jun 20, 2017

Hey @atgoo: Does the algorithm look correct to you? Is there enough test coverage to be confident that this algorithm isn't broken and won't break under reasonable changes elsewhere? Are any of your suggestions blocking?

@atgoo
Copy link

atgoo commented Jun 20, 2017

No, my suggestions are about readability. I'm not qualified to judge if the logic is correct on a large scale, and I didn't find any problems on a tactical level.

@dneto0
Copy link
Collaborator

dneto0 commented Jun 30, 2017

Brainstorming cases after reading just the MR description (not a request for feedback or changes. Just personal notes!):

  • How do you determine a block is dead?
  • A dead block could itself be a merge block or a continue target. In that case the block has to survive in order to fulfill its role in structured control flow. Example is:
   %entry = OpLabel
   OpBranch %loop

  %loop = OpLabel
  OpLoopMerge %merge %loop None
  OpBranchConditional %false %merge %loop  ; tempted to kill %merge

  %merge = OpLabel  ;;  This has to stick around. Even if it's unreachable from the function entry block.
  OpReturn

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Jun 30, 2017

Interesting example.

My intent with this optimization was to only process OpBranchConditionals that are preceded by by OpSelectionMerge. If the code does not reflect that, I will make that change immediately.

Sorry for the confusion.

@greg-lunarg
Copy link
Contributor Author

The code does only process OpSelectionMerge. I will make that clearer in the comments.

Copy link
Collaborator

@dneto0 dneto0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm has a couple of problems. I've given a couple of examples.

const uint32_t kSelectionMergeMergeBlockIdInIdx = 0;
const uint32_t kPhiVal0IdInIdx = 0;
const uint32_t kPhiLab0IdInIdx = 1;
const uint32_t kPhiVal1IdInIdx = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this patch in order, it's mysterious why phi input branch pairs 0 and 1 are the only ones mentioned... Hope there's a loop later. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because only OpSelectionMerge/OpBranchConditional is optimized (for the moment). This means that when patching phis in the merge block, we only have to deal with phis with 2 and only 2 pairs.

Status Process(ir::Module*) override;

private:
// Returns the id of the merge block declared by a merge instruction in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the next method raise the question in my mind about continue targets.

// this block, if any. If none, returns zero.
uint32_t MergeBlockIdIfAny(const ir::BasicBlock& blk) const;

// Compute structured successors for function |func|.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to treat continue target specially as well? It's similar to a merge block in that we might branch out of a nested construct to a nesting continue target.

Copy link
Contributor Author

@greg-lunarg greg-lunarg Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the spec does mention the possibility of the continue target being unreachable, given the other requirements, I am having difficulty coming up with a flow graph where this is the case. Here is my thinking:

  • A loop must have a back edge
  • There must be path from the header to the backedge block (otherwise it can't be a backedge)
  • The continue target must dominate the backedge block
    Thus, there must be a path from the header to the continue target
    Thus the continue target is only unreachable iff the header is unreachable.

So it seems to me that the continue target does not to be treated specially.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few things going on here:

  1. Surprisingly, the definition of "back-edge" is still up for debate.
    See Need to refine definition of a back edge used by structured control flow rules SPIRV-Headers#16 and the corresponding internal issue https://gitlab.khronos.org/spirv/SPIR-V/issues/78

I got involved because there are simple shaders that leave the continue target unreachable, and those exist in the Vulkan CTS suite. I had to adjust the validator to accommodate them. For example, this fragment shader:

#version 310 es

void main() {
  bool c;
  while(c) {
     break;
  }
}

will compile via Glslang into:

               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %main "main"
               OpExecutionMode %main OriginUpperLeft
               OpSource ESSL 310
               OpName %main "main"
               OpName %c "c"
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %bool = OpTypeBool
%_ptr_Function_bool = OpTypePointer Function %bool
       %main = OpFunction %void None %3
          %5 = OpLabel
          %c = OpVariable %_ptr_Function_bool Function
               OpBranch %6
          %6 = OpLabel
               OpLoopMerge %8 %9 None
               OpBranch %10
         %10 = OpLabel
         %14 = OpLoad %bool %c
               OpBranchConditional %14 %7 %8
          %7 = OpLabel
               OpBranch %8
          %9 = OpLabel
               OpBranch %6
          %8 = OpLabel
               OpReturn
               OpFunctionEnd

That has continue target block %9 unreachable. and yet we still want the branch from %9 to %6 to be a back edge.

  1. There was a recent fix (I think in SPIR-V 1.0 Rev10) in structured control flow rules to say "the loop header must dominate the Continue Target, unless the Continue Target is unreachable in the CFG" (emphasis mine). This was to accommodate the case in 1.

  2. Even given all that, the reason you're computing these "structured successors" is to list the blocks that you can safely delete once you delete a newly-found dead block. Those blocks are characterized by being the blocks dominated by the newly-dead block. The structured successors is a proxy for dominance. But a newly-dead block can escape out to an outer continue target. So my intuition thinks there's still something fishy here. I'll go try out some more cases through the new rev of your code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, here's a concrete example of my concern. (See point 3 in previous comment).

Example fragment shader:

#version 450
void main() {
  bool c;
  bool d;
  while(c) {
    if(d) {
     continue;
    }
    if(false) {  // setting this up to be killed.  branches out to an outer continue target.
     continue;
    }
    discard;
  }
}

Compiles to:

               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %main "main"
               OpExecutionMode %main OriginUpperLeft
               OpSource GLSL 450
               OpName %main "main"
               OpName %c "c"
               OpName %d "d"
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %bool = OpTypeBool
%_ptr_Function_bool = OpTypePointer Function %bool
      %false = OpConstantFalse %bool
       %main = OpFunction %void None %3
          %5 = OpLabel
          %c = OpVariable %_ptr_Function_bool Function
          %d = OpVariable %_ptr_Function_bool Function
               OpBranch %6
          %6 = OpLabel
               OpLoopMerge %8 %9 None
               OpBranch %10
         %10 = OpLabel
         %14 = OpLoad %bool %c
               OpBranchConditional %14 %7 %8
          %7 = OpLabel
         %16 = OpLoad %bool %d
               OpSelectionMerge %18 None
               OpBranchConditional %16 %17 %18
         %17 = OpLabel
               OpBranch %9
         %18 = OpLabel
               OpSelectionMerge %22 None
               OpBranchConditional %false %21 %22
         %21 = OpLabel
               OpBranch %9
         %22 = OpLabel
               OpKill
          %9 = OpLabel
               OpBranch %6
          %8 = OpLabel
               OpReturn
               OpFunctionEnd

Control flow graph is:
stray

Then running through this optimization pass, I get:

               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %main "main"
               OpExecutionMode %main OriginUpperLeft
               OpSource GLSL 450
               OpName %main "main"
               OpName %c "c"
               OpName %d "d"
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %bool = OpTypeBool
%_ptr_Function_bool = OpTypePointer Function %bool
      %false = OpConstantFalse %bool
       %main = OpFunction %void None %3
          %5 = OpLabel
          %c = OpVariable %_ptr_Function_bool Function
          %d = OpVariable %_ptr_Function_bool Function
               OpBranch %6
          %6 = OpLabel
               OpLoopMerge %8 %9 None
               OpBranch %10
         %10 = OpLabel
         %14 = OpLoad %bool %c
               OpBranchConditional %14 %7 %8
          %7 = OpLabel
         %16 = OpLoad %bool %d
               OpSelectionMerge %18 None
               OpBranchConditional %16 %17 %18
         %17 = OpLabel
               OpBranch %9
         %18 = OpLabel
               OpBranch %22
         %22 = OpLabel
               OpKill
          %8 = OpLabel
               OpReturn
               OpFunctionEnd

That's invalid because block %9 has been swept away but is mentioned as the continue target of the loop, and as the branch target of block %17.

The problem is that you determine that block %21 is dead (that's right) and all the blocks it dominates should be swept away. But your algorithm incorrectly determines that block %9 should be included in that set.

Copy link
Contributor Author

@greg-lunarg greg-lunarg Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make a good point that, whether valid or invalid, glslang can generate this sequence and it would be bad to break it.

While I think a more general pass to clean up and normalize such sequences might be the best long term solution, I think a solution (which I think you suggested at one point) might be best in the short term: have ComputeStructuredSuccessors add the continue target after the merge block and before all the "real" successors. This should guarantee that in the "structured order", the continue target will appear after any loop code not part of the continue sequence. Since the continue target dominates the backedge and the backedge post-dominates the continue target, this should remove any possibility of continue code being eliminated by this pass.

FWIW, despite the seeming questions around valid control flow, I am going to continue the policy of not generating any new unreachable blocks with this pass, even if it means reinserting a BranchConditional. This already has a TODO on it to revisit.

br->GetSingleWordInOperand(kBranchCondFalseLabIdInIdx);
const uint32_t mergeLabId =
mergeInst->GetSingleWordInOperand(kSelectionMergeMergeBlockIdInIdx);
const uint32_t liveLabId = condVal == true ? trueLabId : falseLabId;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ok, but generally I just use the absorption rule to collapse "condVal == true" to just "condVal". :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Its almost more of a readability thing here.

// If block |bp| contains constant conditional branch, return true
// and return branch and merge instructions in |branchInst| and |mergeInst|
// and the boolean constant in |condVal|.
bool GetConstConditionalBranch(ir::BasicBlock* bp,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can have conditional branches that don't have merge instruction.
Examples include:

 %entry= OpLabel
 OpBranch %loop
  
%loop = OpLabel
  OpLoopMerge %merge %body None 
  OpBranch %body

%body = OpLabel
  %cond = .... %bool
  OpBranch %cond %exit %loop  ;;  this back edge does not have a preceding  merge instruction

%merge = OpLabel
 

Similarly, in an example generated from this:

   for (...)  {
      if ( cond)  {
          break;   // this breaks out to the merge block,, and it could have been compiled into an OpBranchConditional
   }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Already covered by request to clarify)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we are good here as I clarified that this pass is only targeting OpBranchConditionals preceded by OpSelectionMerge.

void KillAllInsts(ir::BasicBlock* bp);

// If block |bp| contains constant conditional branch, return true
// and return branch and merge instructions in |branchInst| and |mergeInst|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If block |bp| contains constant conditional branch that is also preceded by a selection merge, return true and ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

// If block |bp| contains constant conditional branch, return true
// and return branch and merge instructions in |branchInst| and |mergeInst|
// and the boolean constant in |condVal|.
bool GetConstConditionalBranch(ir::BasicBlock* bp,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend renaming this to GetConstConditionalSelectionBranch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

*condVal = true;
*condIsConst = true;
} break;
case SpvOpLogicalNot: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

const uint32_t deadLabId = condVal == true ? falseLabId : trueLabId;
AddBranch(liveLabId, *bi);
def_use_mgr_->KillInst(br);
def_use_mgr_->KillInst(mergeInst);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there's a nested structured selection that needs to be able to break out to this named merge block? I.e. there is a deeply nested break block that branches out to the merge block. But now it's no longer a named merge block.

Example:

  %selection0 = OpLabel
  OpSelectionMerge %merge0 None                                          ;; Candidate for removal.  Declares %merge0 as merge block
  OpBranchConditional %constantfalse %selection1 %merge0  ;;  Candidate for removal.

  ;  The nested selection, a one -armed "if"
  %selection1  = OpLabel
  OpSelectionMerge %merge1 None
  %cond = OpLoad %bool %myboolvar
  OpBranchConditional %cond %merge1 %body1

  %body1 = OpLabel  ;  break block ,i.e. branches out several levels of nesting.
  OpBranch %merge0   ;;  By structured control flow rules, needs %merge0 to be named as a merge block

  %merge1 = OpLabel
  OpBranch %merge0

 %merge0 = OpLabel
 ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little surprised by this. The "impression" I got from the spec is that this is not legal, although I grant that one could interpret it as being legal.

How sure are you about this? Have there been discussions where this was specifically called out as legal?

If necessary, I would "fix" this by leaving the BranchConditional (and OpSelectionMerge) but having the "dead" branch go right to the merge block (if it doesn't already).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right. I was misremembering.
Reading the spec again, you can only go out to the next outer merge block. You can't jump out two like I suggested. In short, structured control flow rules can't implement named-break like Java has. (or named "last" in Perl).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not exactly. What you can't do is break out of two Selection constructs. You can break out of the middle of a Selection to the merge block of an enclosing loop. So that's a branch that skips over a merge block in between out to an outer merge.
Example is generated from:

void main() {
  int x = 0;
  bool c, d;
  while (d) {
    if (c) {
      x = 1;
      break; // goes out to the y=x statement, skipping over the merge for this "if".
      x = 2;
    } else {
      x = 3;
      break; // goes out to the y = x statement.
      x = 4;
    }
  }
  int y = x; 
}

But this isn't the case you're handling. So your conclusion is fine.


// Iterate to merge block deleting dead blocks
std::unordered_set<uint32_t> deadLabIds;
deadLabIds.insert(deadLabId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a difficulty here because, I believe, the structured order of blocks could put a continue target we break out to before an inner merge block. (Example below)

So... When one of the candidates to kill is a "continue block", then it this could try to kill the corresponding continue target, and that would also orphan the reference in the corresponding OpLoopMerge.

Example is schematically similar to:

  for (...) {  // names a continue target
      if ( ... ) {
        break; // a branch to the continue target.
         // but the merge block for this structured selection will appear *after* the continue target in the structured order.
         // So we'll want to sweep away the continue target block when we shouldn't.
      }
     ; continue target is conceptually here.
  }

More explicitly here's a full example:

        OpCapability Shader
        OpMemoryModel Logical GLSL450
        OpEntryPoint Fragment %foo "foo"
%bool = OpTypeBool
%void = OpTypeVoid
%void_fn = OpTypeFunction %void
%boolptr = OpTypePointer Function %bool
%false = OpConstantFalse %bool

%foo =  OpFunction %void None %void_fn

%entry = OpLabel
%loopcondition_var = OpVariable %boolptr Function
%loopcondition = OpLoad %bool %loopcondition_var
OpBranch %loop

%loop = OpLabel
OpLoopMerge %mergeloop %continuetarget None
OpBranchConditional %loopcondition %mergeloop %body

%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continuetarget %mergeselection

%mergeselection = OpLabel
OpKill

%continuetarget = OpLabel
OpBranchConditional %loopcondition %loop %mergeloop

%mergeloop = OpLabel
OpReturn
OpFunctionEnd
     

The above example passes validation. Running it through your algorithm, I get this result:

               OpCapability Shader
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %1 "foo"
       %bool = OpTypeBool
       %void = OpTypeVoid
          %4 = OpTypeFunction %void
%_ptr_Function_bool = OpTypePointer Function %bool
      %false = OpConstantFalse %bool
          %1 = OpFunction %void None %4
          %7 = OpLabel
          %8 = OpVariable %_ptr_Function_bool Function
          %9 = OpLoad %bool %8
               OpBranch %10
         %10 = OpLabel
               OpLoopMerge %11 %12 None ;;;;;;;;;;;;;;;; Whoops this is a dead reference to %12, and is invalid.
               OpBranchConditional %9 %11 %13
         %13 = OpLabel
               OpBranch %14
         %14 = OpLabel
               OpKill
         %11 = OpLabel
               OpReturn
               OpFunctionEnd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With all due respect for the validator, I would argue that the input program is invalid.

Because %continuetarget is the target of an OpSelectionMerge/OpBranchCondition (and it is not the merge block), that means that the %continuetarget block is inside of the Selection construct. By the validation rules, if that block branches out of the Selection construct, it can only branch to the Selection construct's merge block, or the continue or break of the innermost loop. But this block potentially branches to the loop's header, which is not one of the choices.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken that the validator is incomplete, and might have bugs. :-)

But if I follow your line of reasoning, there would be no way to escape an inner selection construct out to a continue target of an enclosing loop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a key point that the continue target is named by the LoopMerge instruction of the enclosing loop. The loop encloses the selection construct because the loop header %loop dominates the selection header %body

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a clearer way to see it is that I can have multiple selection constructs inside the loop, each of them branching out to the continue target. The continue target is then not dominated by any of those inner selections, but only by the loop header.
Concretely, the body of the loop would be a sequence of selections structured something like:

  if (cond) { continue; }
  if (cond2) { continue; }
  if (cond3) { continue; }

Each of those "continue" statements is a branch the continue target. And so the continue target is not in any one of the selections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the block which I believe causes invalid code:

%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continuetarget %mergeselection

Here is how I would rewrite it so that it is valid (by my reading) and still does exactly what you wish:

%body = OpLabel
OpSelectionMerge %mergeselection None
OpBranchConditional %false %continueblock %mergeselection

%continueblock = OpLabel
OpBranch %continuetarget

But even then I am still dropping the continue target. I will investigate further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this more, I believe even the code I have written is still not valid. The reason is that the %continuetarget block is still part of the %body select construct and this is invalid because it is not legal for the backedge to come out of a select construct. It is part of the select construct because it is dominated by the %body block and not dominated by the %selectionmerge block.

The big problem here is the OpKill in the %selectionmerge block, which causes the %selectionmerge block to NOT dominate ANY blocks in the loop.

I have tried to come up with a valid version of this code that causes a problem but have not been able to. Please let me know if you think you have one or disagree with my assessment on the validity of this code.

@dneto0
Copy link
Collaborator

dneto0 commented Jun 30, 2017

No worries on the confusion. That's the point of reviews. :-)

But there are some algorithmic problems. See my examples posted.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Jul 3, 2017

Looking at the issues you raised, I realized it is possible for the current algorithm to create invalid code for some cases. I have just added code which should address these cases.

I added some logic near the end of the processing of a constant conditional branch. If the merge block associated with the constant branch ends up with no predecessors (dead), we go back and replace the replacement branch, going back to the original constant conditional branch (with preceeding SelectionMerge). One slight difference is that the "dead" label is now the merge label. This is to prevent creating a dead block from the merge block, which could cause the code to become invalid. There may be more elegant ways to do this, but this works for now while still allowing us to largely achieve our goals of reducing size by eliminating dead branch code.

I also added a test using your code which exercises this new functionality.

@dneto0
Copy link
Collaborator

dneto0 commented Jul 4, 2017

I still think the algorithm is incorrectly sweeping away blocks.

If I were doing this, I'd consider simplifying the pass so it only simplifies conditional branches and leaves dead code elimination to a different pass. But coder's choice.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Jul 5, 2017

Here are two reasons I thought it best if I attempted some form of dead code elimination in this pass:

  1. I was trying to make sure that the pass generated valid code. Maybe that isn't a hard requirement. Maybe we can document some passes generate potentially invalid code that requires a set of passes to clean things up to get back to valid. Thoughts? Anyway, certain code sequences were leaving what I considered to be invalid code, and it seemed that dead block elimination was the only way to deal with it.

  2. Doing the elimination during the pass allowed me to do a more targeted elimination. This has two advantages: a) it is more efficient because it is dealing with a more limited problem domain, and b) it allowed me to accomplish the targeted size reduction without having to write a more complex, general elimination routine.

I think it would be good to write the more general routine sometime soon. Perhaps such a routine would be a necessary component to generating valid code from glslang, similar to how JohnK is thinking about generating valid sequences in other areas.

@dneto0
Copy link
Collaborator

dneto0 commented Jul 5, 2017

I agree that it would be nice to do targeted elimination of dead blocks, provided the code complexity is low enough. I think employing appropriately-formulated dominance would get us there.

I'm not clear on what invalid state you can get into without the dead code elimination. Can you give an example? (Hmm. I see that OpSelectionMerge is not allowed to immediately precede OpBranch. I dislike that, but it's the rule.)

For HLSL compilation, we're considering allowing an "expanded dialect" form of SPIR-V for Vulkan immediately after Glslang generates it. But I believe it's still fully valid by core SPIR-V rules; it's just that we can relax (violate) some of the Vulkan-specific rules.

@dneto0
Copy link
Collaborator

dneto0 commented Jul 5, 2017

Regarding the fact that OpSelectionMerge can't immediately precede OpBranch. A workaround is like this:
If you have

OpSelectionMerge...
OpBranchConditional %true %live %dead

Instead of changing that BranchConditional to an OpBranch (invalid by the rule I don't like), change it to:

OpSelectionMerge...
OpBranchConditional %true %live %live

It's silly, but it should work.

I'd leave it to future work to detect if there are non-local breaks out of the nested constructs out to the merge block: if there are none, then you can eliminate the OpSelectionMerge altogether.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Jul 5, 2017

Re invalid code caused by dead blocks, re-reading the spec, it seems I misremembered the structured control flow rules. I was worried about dead blocks branching into a structured construct. I was misremembering that no branches were allowed from outside the construct into the construct, but the rule is that the header must dominate its construct, so I guess such dead blocks do not effect that dominance. So one could break dead block elimination into a separate pass.

@greg-lunarg
Copy link
Contributor Author

Re OpSelectionMerge preceding OpBranch, I also thought of the double live operand trick, but was a little worried about surprising a driver's shader compiler that may have never seen such a thing.

@johnkslang
Copy link
Member

johnkslang commented Jul 5, 2017

HLSL depends on dead-code elimination to remove some forms of invalid code. One example, if I recall correctly, the call tree of dead functions does not have to be complete, only the call tree root by the actual entry point. (HLSL can have multiple things that look like an entry point, but only one is chosen at a time to be the actual entry point.)

Currently, glslang has its own algorithm to look at call trees and eliminate functions that cannot be reached from the entry point to avoid generating invalid code.

I don't know how much this concept generalizes, but there is this idea in HLSL that dead code can be invalid to some degree.

(Of course, and I think this is already understood, I just don't have enough context, using if-then-else to conditionally set something that must be statically known is invalid, so has to be eliminated, under the assumption such cases will always be statically determinable to be constant conditions.)

PS. @greg-lunarg I'll notice this topic had a question for me faster with an @johnkslang than with a JohnK.

@dneto0
Copy link
Collaborator

dneto0 commented Jul 6, 2017

  • I stand corrected: without Linkage, calling a function that has no body is invalid SPIR-V. Some transforms will need to be able to process such modules, to cover @johnkslang 's HLSL scenarios.
    (I wasn't caught up on the discussion in Transform generalized SPIR-V into SPIR-V for Vulkan, to support HLSL compilation #687)

  • Consequence: I shouldn't insist on an optimization test case being valid before the tansform runs. But probably still good practice to make cases that are valid after the transform runs.

  • About dominance: It took me a while to get used to the idea that dominance exists only for blocks reachable from the entry block.

  • For simplicity, I could see three DCE transforms:

    • eliminate dead functions (unreachable in the static call tree of entry points)
    • eliminate dead blocks inside a function
    • eliminate unused types.
  • About surprising drivers with the double-live trick: Sounds like an opportunity to write a CTS test. :-)

  • The optimizer could use a common function to remove a dead value, handling cases where the value is a target of OpName or annotation instructions. It would let us generalize the existing transforms in a few places. I filed Add utility function to remove a value. #699

@greg-lunarg : Let me know how you'll proceed. I'd like to update https://github.com/KhronosGroup/SPIRV-Tools/projects/1 with a more accurate breakdown of the work.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Jul 6, 2017

By "eliminate dead blocks inside a function", I presume you mean blocks that are not reachable from the first block. Notice that this a slightly different beast than dead code elimination, which usually focuses on an instruction's uses. Standard DCE looks at an instruction and if its result is not used, deletes it. Aggressive DCE starts with live stores and then works backward, marking live everything it uses, finally deleting everything not marked live. Note that neither of these necessarily gets rid of all unreachable blocks. So I would propose a pass called --eliminate-unreachable-blocks which would do just that. I think I could get authorization to write such a pass, especially if it was thought that such a pass would be necessary to create valid SPIR-V for some shaders.

@dneto Can you please clarify?: Are you suggesting that acceptance of this PR is contingent on such a pass? I would request that such a contingency not be put in place, but rather create a TODO for such a pass in the future, at which time we remove the dead block elimination in this pass.

As for dead functions and types, I have authorization to put such transforms/passes into spirv-opt. While they are functionally fairly distinct and separate functions should be written for each, we may want to ultimately combine them into a single pass which removes all dead instructions at a module level, maybe --dce-module.

Re #699, I have already written such a beast, which looks through an instruction's uses for debug and annotations and deletes them before deleting the instruction. I just need to put it into a shared location. I had been thinking that the proposed "optimization context" which enables passing of data from one pass to another (such as def_use_mgr_) would be a good place for these shared helper functions. @dneto0: Do you agree?

@dneto0
Copy link
Collaborator

dneto0 commented Jul 6, 2017

Yes, "eliminate dead blocks in function" means remove blocks that are not reachable from the entry point, nor are the merge block or continue target for any reachable loop header or selection header.

Acceptance of this PR is contingent on fixing the bugs. If you can keep the current strategy of doing some DCE in this pass and make it correct then that's perfectly fine by me.

In the DCE discussion I just wanted clarity as to what bits are imminent vs. left for later. For example if @johnkslang wants to write the "eliminate dead blocks in a function" pass then he should be able to grab it independently.

Yes, I later saw that #699 has almost everything I was asking for regarding more generic/robust/flexible removal of a value. Yes, the planned-for optimization context could be a good natural place for that functionality. Understand that I make lots of little TODOs in the form of tasks/cards/issues as a memory aid and organizing mechanism, without attached expectation of due date.

@greg-lunarg
Copy link
Contributor Author

I think I have addressed all bugs/concerns at this point. Let me know if I missed something.

Copy link
Collaborator

@dneto0 dneto0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there.

Thanks for fixing the too-aggressive sweeping of continue blocks.

I found a couple of small things, though.

  • See "If a dead block is a header block". Fix comment.
  • By same logic, if a dead block is a loop header, then its continue target is also dead.
  • I think the HasNonPhiRef will always be true for a merge block; so that fixup code to add back a branch can't ever execute. Should remove it. Also, unreachable merge blocks are ok to keep anyway.

If I'm wrong, let me know.

(*dbi)->ForEachSuccessorLabel([&deadSuccIds](uint32_t succ) {
deadSuccIds.insert(succ);
});
// If dead block is merge block, add its merge block to dead
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If dead block is a header block, ...."

By the same logic, shouldn't we also add its continue target if it has one? If the header block of a construct is dead then all its nested constructs would be dead too.

Optional to fix, since it could be cleaned up with a separate "remove dead blocks" pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. I will make the change.

const uint32_t deadLabId = condVal == true ? falseLabId : trueLabId;
AddBranch(liveLabId, *bi);
def_use_mgr_->KillInst(br);
def_use_mgr_->KillInst(mergeInst);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not exactly. What you can't do is break out of two Selection constructs. You can break out of the middle of a Selection to the merge block of an enclosing loop. So that's a branch that skips over a merge block in between out to an outer merge.
Example is generated from:

void main() {
  int x = 0;
  bool c, d;
  while (d) {
    if (c) {
      x = 1;
      break; // goes out to the y=x statement, skipping over the merge for this "if".
      x = 2;
    } else {
      x = 3;
      break; // goes out to the y = x statement.
      x = 4;
    }
  }
  int y = x; 
}

But this isn't the case you're handling. So your conclusion is fine.

// Process phi instructions in merge block.
// deadLabIds are now blocks which cannot precede merge block.
// If eliminated branch is to merge label, add current block to dead blocks.
if (deadLabId == mergeLabId)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised that the merge block for a selection merge can only have two predecessors (!).
But I think you're right, due to the "I can't break out of two levels of selection constructs" rule.

It's more tricky if you're trying to handle OpSwitch, or a loop. But you're not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is not obvious.

// and the mergeblock as the false branch. This is done so the merge block
// is not orphaned, which could cause invalid control flow in certain case.
// TODO(greg-lunarg): Do this only in cases where invalid code is caused.
if (!HasNonPhiRef(mergeLabId)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But a merge block is always referenced from the merge instruction that names it. So HasNonPhiRef(mergeLabId) is always true. And doesn't that mean this "if" block is dead code?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's valid for a merge block to be unreachable. That's explicitly called out in the spec

each header block must strictly dominate its merge block, unless the merge block is unreachable in the CFG

It's because of cases like:

if (c) { return; } else { return; }
where_the_merge_block_goes(); // never reached

So I don't think you need this block of code at all. ??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the point that I test if the (selection) merge block has non-phi refs, I have already removed the selection merge instruction that pointed to it. So we can get an unreachable (former) merge block. I even added a test that hits this very logic.

The case I was worried about (and still am worried about) is when dead branch elimination severs a loop header from its continue:

for (...) {
...
if (true)
discard;
...
}

to

for (...) {
...
discard;
...
}

Your recent proposal on latch edges notwithstanding, until the spec is changed, I am uncomfortable generating code that seems to be invalid. I would be more comfortable generating such a sequence once we have an unreachable code removal pass that can fix such sequences and make them valid by removing the unreachable continue as as removing any affected loop merge instruction. We would also need to warn users in the documentation that this pass can create invalid code and that running the unreachable code deleter would likely be necessary to restore validity.

Are you ok with this more conservative path?

Re your latch edge proposal, the only concern I had with it was the requirement that all latch edges point to a loop header. Doesn't that outlaw unreachable blocks that don't point to a loop header?

What is the status of this proposal? Is there something stopping it?

Another possible approach is to keep the current language describing back edge, but use a modified DFS similar to what I use where the DFS, when visiting a loop header, visits the merge block and continue block first and then all its other successors.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had missed this point:

At the point that I test if the (selection) merge block has non-phi refs, I have already removed the selection merge instruction that pointed to it. So we can get an unreachable (former) merge block. I even added a test that hits this very logic.

Looking again, I see the OrphanMerge, and experimentation proves that it triggers that case.

In reading the code, I got the fact that the loop finding dead blocks avoids adding the merge block, but then missed the significance of the "if (deadLabId == mergeLabId) deadSuccIds.insert(...) " code sequence.

Ok. Looks good to me now. You've even got the TODO there.

Fair enough about the latch edge issue. I'm more comfortable being aggressive there because I've already seen CTS test cases working even though they use the more aggressive rule. I only made the rule when I noticed the CTS test cases need something more general than the spec or validator was willing to give.

@greg-lunarg
Copy link
Contributor Author

Thanks for the continuing very thorough and thoughtful reviews. I think I have addressed all of your latest concerns. Let me know if you would like any other changes.

@dneto0
Copy link
Collaborator

dneto0 commented Jul 7, 2017

Looks good. Thanks!

About trying to be careful: I try. I also am willing to be wrong. I figure if you're right 90% of the time and I'm right 90% of the time, but our wrongness doesn't overlap, then together we'll get to 99% right by correcting each other's mistakes. :-)

@dneto0
Copy link
Collaborator

dneto0 commented Jul 7, 2017

Squashed, rebased, and pushed onto master as 52e247f

@dneto0 dneto0 closed this Jul 7, 2017
dneto0 pushed a commit to dneto0/SPIRV-Tools that referenced this pull request Sep 14, 2024
…#675)

This method needs to live a little longer to allow downstream users to
migrate without causing build breakages.

This is part of KhronosGroup#673
dneto0 pushed a commit to dneto0/SPIRV-Tools that referenced this pull request Sep 14, 2024
The downstream users that use this method have been updated.

SetHLSLShaderModel is the replacement.

Fixes KhronosGroup#673
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants