Question: why OpSelectionMerge and OpLoopMerge are separate instructions? #147

ghost · 2020-03-10T20:12:49Z

If the idea behind OpSelectionMerge and OpLoopMerge is that threads should know when they should diverge and converge, why have 2 separate instructions for conditional statements and loops? The fact that there are 2 instructions gives rise to many corner case questions, questions such as "should a OpLoopMerge block have OpSelectionMerge blocks within itself or not" (SPIR-V documentation example says no, while I sometimes see SPIR-V code in the wild that says yes). I see driver programmers from a big IHV hit these corner cases and complain about it on the Internet in the year of 2019, 5 years after the release of the spec, this is not normal, there are Android phones out there with broken SPIR-V compilers because of that, and many more will come if someone won't clarify all of this in detail.

The text was updated successfully, but these errors were encountered:

ghost · 2020-03-10T21:09:01Z

Here's my theory that so far fits all the SPIR-V code I saw from the Khronos Group and other vendors:

OpLoopMerge and OpSelectionMerge are functionally the same, they both are about thread divergence and convergence only, the only reason they were split to two is because OpSelectionMerge needs only one parameter, while OpLoopMerge needs two. (This rule aligns with the SPIR-V documentation example that doesn't specify OpSelectionMerge for OpBranchConditional %56 %50 %51 because it relies on OpLoopMerge's merge block, so if OpLoopMerge and OpSelectionMerge were about functionally different things then this wouldn't be legal. You could've make one instruction and ignore the second parameter for conditional statements then, but we can't go back in time and change the past, so whatever.)
You need to specify OpLoopMerge and OpSelectionMerge within other OpLoopMerge and OpSelectionMerge blocks even tho the threads are diverged already. (What? Diverge the already divergent threads? This rule aligns with what I saw in some SPIR-V code that had a OpSelectionMerge block within a parent OpLoopMerge block tho, so ok, whatever.)
You can omit OpSelectionMerge only if the conditional statement converges to already specified by the parent OpLoopMerge or OpSelectionMerge block's merge block. (This rule aligns with the SPIR-V documentation example and some other SPIR-V code I saw, so alright, fine.)

ghost · 2020-03-10T21:12:20Z

Correct me if I'm wrong.

johnkslang · 2020-03-11T14:41:09Z

Can you give an example of an implicit (undeclared) merge block?

We don't need merge instructions for branches that are reconverging the control flow, just for those that are introducing new nested divergence.

ghost · 2020-03-11T16:24:15Z

Can you give an example of an implicit (undeclared) merge block?

A merge block is always present, it's that

We don't need merge instructions for branches that are reconverging the control flow

makes OpSelectionMerge implicit (undeclared) sometimes. I hope only for reconverging the control flow.

just for those that are introducing new nested divergence.

Why divergence nests?

ghost · 2020-03-11T16:30:43Z

Specifically,

You need to specify OpLoopMerge and OpSelectionMerge within other OpLoopMerge and OpSelectionMerge blocks even tho the threads are diverged already.

This rule aligns with what I saw in some SPIR-V code that had a OpSelectionMerge block within a parent OpLoopMerge block

Why OpSelectionMerge needs to be issued in a OpLoopMerge block that is already divergent?

ghost · 2020-03-11T16:47:26Z

For example:

#version 450

layout(location = 0) in vec4 color1;
layout(location = 1) in vec4 multiplier;
layout(location = 2) noperspective in vec4 color2;

layout(location = 0) out vec4 color;

struct S {
    bool b;
    vec4 v[5];
    int i;
};

layout(binding = 0) uniform blockName {
    S s;
    bool cond;
};

void main()
{
    vec4 scale = vec4(1.0, 1.0, 2.0, 1.0);

    for (int i = 0; i < 4; ++i) {
        if (cond)
            color *= color1 + s.v[2];
        else
            color *= sqrt(color2) * scale;
        color *= multiplier;
    }
}

; SPIR-V
; Version: 1.0
; Generator: Khronos Glslang Reference Front End; 8
; Bound: 67
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %main "main" %color %color1 %color2 %multiplier
               OpExecutionMode %main OriginUpperLeft
               OpSource GLSL 450
               OpName %main "main"
               OpName %scale "scale"
               OpName %i "i"
               OpName %S "S"
               OpMemberName %S 0 "b"
               OpMemberName %S 1 "v"
               OpMemberName %S 2 "i"
               OpName %blockName "blockName"
               OpMemberName %blockName 0 "s"
               OpMemberName %blockName 1 "cond"
               OpName %_ ""
               OpName %color "color"
               OpName %color1 "color1"
               OpName %color2 "color2"
               OpName %multiplier "multiplier"
               OpDecorate %_arr_v4float_uint_5 ArrayStride 16
               OpMemberDecorate %S 0 Offset 0
               OpMemberDecorate %S 1 Offset 16
               OpMemberDecorate %S 2 Offset 96
               OpMemberDecorate %blockName 0 Offset 0
               OpMemberDecorate %blockName 1 Offset 112
               OpDecorate %blockName Block
               OpDecorate %_ DescriptorSet 0
               OpDecorate %_ Binding 0
               OpDecorate %color Location 0
               OpDecorate %color1 Location 0
               OpDecorate %color2 NoPerspective
               OpDecorate %color2 Location 2
               OpDecorate %multiplier Location 1
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_ptr_Function_v4float = OpTypePointer Function %v4float
    %float_1 = OpConstant %float 1
    %float_2 = OpConstant %float 2
         %12 = OpConstantComposite %v4float %float_1 %float_1 %float_2 %float_1
        %int = OpTypeInt 32 1
%_ptr_Function_int = OpTypePointer Function %int
      %int_0 = OpConstant %int 0
      %int_4 = OpConstant %int 4
       %bool = OpTypeBool
       %uint = OpTypeInt 32 0
     %uint_5 = OpConstant %uint 5
%_arr_v4float_uint_5 = OpTypeArray %v4float %uint_5
          %S = OpTypeStruct %uint %_arr_v4float_uint_5 %int
  %blockName = OpTypeStruct %S %uint
%_ptr_Uniform_blockName = OpTypePointer Uniform %blockName
          %_ = OpVariable %_ptr_Uniform_blockName Uniform
      %int_1 = OpConstant %int 1
%_ptr_Uniform_uint = OpTypePointer Uniform %uint
     %uint_0 = OpConstant %uint 0
%_ptr_Output_v4float = OpTypePointer Output %v4float
      %color = OpVariable %_ptr_Output_v4float Output
%_ptr_Input_v4float = OpTypePointer Input %v4float
     %color1 = OpVariable %_ptr_Input_v4float Input
      %int_2 = OpConstant %int 2
%_ptr_Uniform_v4float = OpTypePointer Uniform %v4float
     %color2 = OpVariable %_ptr_Input_v4float Input
 %multiplier = OpVariable %_ptr_Input_v4float Input
       %main = OpFunction %void None %3
          %5 = OpLabel
      %scale = OpVariable %_ptr_Function_v4float Function
          %i = OpVariable %_ptr_Function_int Function
               OpStore %scale %12
               OpStore %i %int_0
               OpBranch %17
         %17 = OpLabel
               OpLoopMerge %19 %20 None
               OpBranch %21
         %21 = OpLabel
         %22 = OpLoad %int %i
         %25 = OpSLessThan %bool %22 %int_4
               OpBranchConditional %25 %18 %19
         %18 = OpLabel
         %35 = OpAccessChain %_ptr_Uniform_uint %_ %int_1
         %36 = OpLoad %uint %35
         %38 = OpINotEqual %bool %36 %uint_0
               OpSelectionMerge %40 None
               OpBranchConditional %38 %39 %53
         %39 = OpLabel
         %45 = OpLoad %v4float %color1
         %48 = OpAccessChain %_ptr_Uniform_v4float %_ %int_0 %int_1 %int_2
         %49 = OpLoad %v4float %48
         %50 = OpFAdd %v4float %45 %49
         %51 = OpLoad %v4float %color
         %52 = OpFMul %v4float %51 %50
               OpStore %color %52
               OpBranch %40
         %53 = OpLabel
         %55 = OpLoad %v4float %color2
         %56 = OpExtInst %v4float %1 Sqrt %55
         %57 = OpLoad %v4float %scale
         %58 = OpFMul %v4float %56 %57
         %59 = OpLoad %v4float %color
         %60 = OpFMul %v4float %59 %58
               OpStore %color %60
               OpBranch %40
         %40 = OpLabel
         %62 = OpLoad %v4float %multiplier
         %63 = OpLoad %v4float %color
         %64 = OpFMul %v4float %63 %62
               OpStore %color %64
               OpBranch %20
         %20 = OpLabel
         %65 = OpLoad %int %i
         %66 = OpIAdd %int %65 %int_1
               OpStore %i %66
               OpBranch %17
         %19 = OpLabel
               OpReturn
               OpFunctionEnd

Why do you need to issue OpSelectionMerge %40 None if the whole loop is already in the divergent state by OpLoopMerge %19 %20 None?

johnkslang · 2020-03-11T18:47:22Z

The rules nest. Strip away the loop, and that's the convergence of the divergence that occurred. Not ignoring the loop, it's not a full convergence, but it's still a local convergence, rather than a further divergence.

ghost · 2020-03-11T20:28:51Z

Strip away the loop, and that's the convergence of the divergence that occurred.

I see, so basically SPIR-V is always about explicitly specified by the user divergence and convergence and you guys don't want to deal with cfg analysis and automatic structurization like in DXIL?

ghost · 2020-03-11T20:34:43Z

Because if SPIR-V code transform tools are allowed to insert new merge instructions, then you can strip that loop and insert OpSelectionMerge %40 None for the user.

johnkslang · 2020-03-12T06:44:31Z

SPIR-V for graphics explicitly declares its structured control flow. CFG analysis can be done to verify what was declared, but complex inference is not required.

It's a common theme in SPIR-V: Rather than infer (e.g., result types), declare and verify, because in its full generality, verification of a statement is simpler than inference of the statement.

ghost · 2020-03-12T09:29:05Z

I see. Thank you for answers, John!

The issue can be closed for now.

alan-baker transferred this issue from KhronosGroup/SPIRV-Tools Mar 11, 2020

ghost closed this as completed Mar 12, 2020

Jasper-Bekkers mentioned this issue Jul 27, 2020

Exploration EmbarkStudios/rust-gpu#1

Closed

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: why OpSelectionMerge and OpLoopMerge are separate instructions? #147

Question: why OpSelectionMerge and OpLoopMerge are separate instructions? #147

ghost commented Mar 10, 2020

ghost commented Mar 10, 2020 •

edited by ghost

Loading

ghost commented Mar 10, 2020

johnkslang commented Mar 11, 2020

ghost commented Mar 11, 2020

ghost commented Mar 11, 2020 •

edited by ghost

Loading

ghost commented Mar 11, 2020

johnkslang commented Mar 11, 2020

ghost commented Mar 11, 2020 •

edited by ghost

Loading

ghost commented Mar 11, 2020

johnkslang commented Mar 12, 2020

ghost commented Mar 12, 2020

Question: why OpSelectionMerge and OpLoopMerge are separate instructions? #147

Question: why OpSelectionMerge and OpLoopMerge are separate instructions? #147

Comments

ghost commented Mar 10, 2020

ghost commented Mar 10, 2020 • edited by ghost Loading

ghost commented Mar 10, 2020

johnkslang commented Mar 11, 2020

ghost commented Mar 11, 2020

ghost commented Mar 11, 2020 • edited by ghost Loading

ghost commented Mar 11, 2020

johnkslang commented Mar 11, 2020

ghost commented Mar 11, 2020 • edited by ghost Loading

ghost commented Mar 11, 2020

johnkslang commented Mar 12, 2020

ghost commented Mar 12, 2020

ghost commented Mar 10, 2020 •

edited by ghost

Loading

ghost commented Mar 11, 2020 •

edited by ghost

Loading

ghost commented Mar 11, 2020 •

edited by ghost

Loading