Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inliner: Generates invalid structured control flow when function call is in a single-block loop, and callee has control flow #787

Closed
dneto0 opened this issue Aug 30, 2017 · 4 comments
Assignees
Labels

Comments

@dneto0
Copy link
Collaborator

dneto0 commented Aug 30, 2017

Here's a somewhat reduced example:

Original OpenCL C source:

int bar(global int *A, int n) {
  if (n < 19) {
    A[n] += 1;
  }
  return A[n];
}

kernel void foo(global int* A, int n) {
  for (int i = 0; i < n; i++) {
    A[i] = bar(A, n);
  }
}

Compling with https://github.com/google/clspv we get:

              OpCapability Shader
               OpCapability VariablePointers
               OpExtension "SPV_KHR_storage_buffer_storage_class"
               OpExtension "SPV_KHR_variable_pointers"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %35 "foo"
               OpSource OpenCL_C 120
               OpDecorate %17 SpecId 0
               OpDecorate %18 SpecId 1
               OpDecorate %19 SpecId 2
               OpDecorate %_runtimearr_uint ArrayStride 4
               OpMemberDecorate %_struct_4 0 Offset 0
               OpDecorate %_struct_4 Block
               OpMemberDecorate %_struct_6 0 Offset 0
               OpDecorate %_struct_6 Block
               OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
               OpDecorate %22 DescriptorSet 0
               OpDecorate %22 Binding 0
               OpDecorate %23 DescriptorSet 0
               OpDecorate %23 Binding 1
               OpDecorate %_ptr_StorageBuffer_uint ArrayStride 4
       %uint = OpTypeInt 32 0
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
%_runtimearr_uint = OpTypeRuntimeArray %uint
  %_struct_4 = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4
  %_struct_6 = OpTypeStruct %uint
%_ptr_StorageBuffer__struct_6 = OpTypePointer StorageBuffer %_struct_6
       %void = OpTypeVoid
          %9 = OpTypeFunction %void
       %bool = OpTypeBool
         %11 = OpTypeFunction %uint %_ptr_StorageBuffer_uint %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Private_v3uint = OpTypePointer Private %v3uint
     %uint_0 = OpConstant %uint 0
     %uint_1 = OpConstant %uint 1
    %uint_19 = OpConstant %uint 19
         %17 = OpSpecConstant %uint 1
         %18 = OpSpecConstant %uint 1
         %19 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %17 %18 %19
         %21 = OpVariable %_ptr_Private_v3uint Private %gl_WorkGroupSize
         %22 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer
         %23 = OpVariable %_ptr_StorageBuffer__struct_6 StorageBuffer
         %24 = OpFunction %uint None %11
         %25 = OpFunctionParameter %_ptr_StorageBuffer_uint
         %26 = OpFunctionParameter %uint
         %27 = OpLabel
         %28 = OpSLessThan %bool %26 %uint_19
         %29 = OpPtrAccessChain %_ptr_StorageBuffer_uint %25 %26
         %30 = OpLoad %uint %29
               OpSelectionMerge %33 None
               OpBranchConditional %28 %31 %33
         %31 = OpLabel
         %32 = OpIAdd %uint %30 %uint_1
               OpStore %29 %32
               OpBranch %33
         %33 = OpLabel
         %34 = OpPhi %uint %30 %27 %32 %31
               OpReturnValue %34
               OpFunctionEnd
         %35 = OpFunction %void None %9
         %36 = OpLabel
         %37 = OpAccessChain %_ptr_StorageBuffer_uint %22 %uint_0 %uint_0
         %38 = OpAccessChain %_ptr_StorageBuffer_uint %23 %uint_0
         %39 = OpLoad %uint %38
         %40 = OpSGreaterThan %bool %39 %uint_0
               OpSelectionMerge %50 None
               OpBranchConditional %40 %41 %50
         %41 = OpLabel
               OpBranch %42
         %42 = OpLabel
         %43 = OpPhi %uint %46 %42 %uint_0 %41
         %44 = OpFunctionCall %uint %24 %37 %39
         %45 = OpAccessChain %_ptr_StorageBuffer_uint %22 %uint_0 %43
               OpStore %45 %44
         %46 = OpIAdd %uint %43 %uint_1
         %47 = OpSLessThan %bool %46 %39
         %48 = OpLogicalNot %bool %47
               OpLoopMerge %49 %42 None
               OpBranchConditional %48 %49 %42
         %49 = OpLabel
               OpBranch %50
         %50 = OpLabel
               OpBranch %51
         %51 = OpLabel
               OpReturn
               OpFunctionEnd

The control flow graph is:
a

The 'bar' function starts at block with id 27, and has a simple selection structure.

The interesting bit is the loop with a single block with id 42 in the entry point 'foo'.

After inlining, I get this result:
b

I've attached the SPIR-V:
b.spv.txt

The problem is that the old block 42 has been split with head 42 the callee's body has been inserted as the last part of 42 and with new block 58 and 57. Unfortunately, the OpLoopMerge originally in 42 is kept as the last instruction in block 57, when it really should have been kept as the second last instruction in 42.

Basically, the structured control flow properties are corrupted. The validator catches this, saying:

error: 102: Back-edges (57 -> 42) can only be formed between a block and a loop header.

It detects the back-edge from 57 to 42, and expects the OpLoopMerge to be in block 42. It should be there, but isn't, and so it complains that block 42 is not a loop header. It's right.

@dneto0 dneto0 added the bug label Aug 30, 2017
@dneto0
Copy link
Collaborator Author

dneto0 commented Aug 30, 2017

Cc @greg-lunarg . Here's a fun one. I can probably hand-code more compact example, but I think you can think about the solution. :-)

@dneto0
Copy link
Collaborator Author

dneto0 commented Aug 31, 2017

Here's a simplified test case:

               OpCapability Shader
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main"
               OpSource OpenCL_C 120

       %bool = OpTypeBool
       %true = OpConstantTrue %bool

       %void = OpTypeVoid
          %9 = OpTypeFunction %void

        %foo = OpFunction %void None %9
   %fooentry = OpLabel
               OpBranch %fooexit
    %fooexit = OpLabel
               OpReturn
               OpFunctionEnd

       %main = OpFunction %void None %9
      %entry = OpLabel
               OpBranch %loop

       %loop = OpLabel
        %nil = OpFunctionCall %void %foo
               OpLoopMerge %merge %loop None
               OpBranchConditional %true %loop %merge

      %merge = OpLabel
               OpReturn
               OpFunctionEnd

The CFG looks like this:
a

Inlining produces this

               OpCapability Shader
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %1 "main"
               OpSource OpenCL_C 120
       %bool = OpTypeBool
       %true = OpConstantTrue %bool
       %void = OpTypeVoid
          %5 = OpTypeFunction %void
          %6 = OpFunction %void None %5
          %7 = OpLabel
               OpBranch %8
          %8 = OpLabel
               OpReturn
               OpFunctionEnd
          %1 = OpFunction %void None %5
          %9 = OpLabel
               OpBranch %10
         %10 = OpLabel
               OpBranch %13
         %13 = OpLabel
               OpLoopMerge %12 %10 None
               OpBranchConditional %true %10 %12
         %12 = OpLabel
               OpReturn
               OpFunctionEnd

And this is the resulting CFG:
b

The Continue arc from 13 back to 10 is a symptom of the problem.

@dneto0 dneto0 self-assigned this Aug 31, 2017
@dneto0
Copy link
Collaborator Author

dneto0 commented Aug 31, 2017

I'm working on a fix.

@greg-lunarg
Copy link
Contributor

First impression: Before inlining, break all single block loops into "canonical" empty header form.

dneto0 added a commit to dneto0/SPIRV-Tools that referenced this issue Aug 31, 2017
If the caller block is a single-block loop and inlining will
replace the caller block by several blocks, then:
- The original OpLoopMerge instruction will end up in the *last*
  such block.  That's the wrong place to put it.
- Move it back to the end of the first block.
- Update its Continue Target ID to point to the last block

We also have to take care of cases where the inlined code
begins with a structured header block.  In this case
we need to ensure the restored OpLoopMerge does not appear
in the same block as the merge instruction from the callee's
first block.

Fixes KhronosGroup#787
dneto0 added a commit to dneto0/SPIRV-Tools that referenced this issue Sep 1, 2017
If the caller block is a single-block loop and inlining will
replace the caller block by several blocks, then:
- The original OpLoopMerge instruction will end up in the *last*
  such block.  That's the wrong place to put it.
- Move it back to the end of the first block.
- Update its Continue Target ID to point to the last block

We also have to take care of cases where the inlined code
begins with a structured header block.  In this case
we need to ensure the restored OpLoopMerge does not appear
in the same block as the merge instruction from the callee's
first block.

Fixes KhronosGroup#787
@dneto0 dneto0 closed this as completed in efff5fa Sep 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants