Explicit uniformity / convergence #81

Open
darksylinc opened this issue Jul 16, 2019 · 2 comments

@darksylinc

darksylinc commented Jul 16, 2019

This is a proposal to add new keyword modifiers for variables, enabling better compile-time diagnostics (warnings and errors) and more optimization opportunities.

Variables in GLSL can be of 4 different convergence levels:

  1. uniform
  2. threadgroup_uniform
  3. simd_uniform (aka dynamically uniform)
  4. dynamic

Each one is a downgrade from the previous one.

Variables like gl_WorkGroupID are threadgroup_uniform by nature.

Any variable coming out of anyInvocationARB and family is explicitly upgraded to simd_uniform.

Uniform variables mixed with simd_uniform variables get downgraded to simd_uniform.
simd_uniform variables mixed with dynamic variables get downgraded to dynamic.
"Mixing" can be addition, subtraction, multiplication, etc.

Any operation involving two or more variables with different attributes results in a variable that has the lowest attribute.

When a variable of lower convergence is used on a function that strictly requires a variable of higher convergence (e.g. using a dynamic variable on a function that requires a simd_uniform, threadgroup_uniform or uniform argument), it should throw an error.
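
The two rules above (mixing yields the lowest level, and a consumer can demand a minimum level) can be modelled in a few lines. This is only a sketch of how a compiler front end might represent them, not part of the proposal: the names `Convergence`, `mix`, `require` and `ConvergenceError` are hypothetical, and Python is used purely because the proposed keywords cannot be compiled today.

```python
from enum import IntEnum


class Convergence(IntEnum):
    # Higher value = more uniform, mirroring the ordering of the list above.
    DYNAMIC = 0
    SIMD_UNIFORM = 1
    THREADGROUP_UNIFORM = 2
    UNIFORM = 3


class ConvergenceError(Exception):
    """Stands in for the compile-time error the proposal asks for."""


def mix(*operands: Convergence) -> Convergence:
    """Level of an operation's result: the lowest level among its operands."""
    return min(operands)


def require(actual: Convergence, needed: Convergence, what: str) -> None:
    """Reject a value that is less uniform than its consumer demands."""
    if actual < needed:
        raise ConvergenceError(
            f"{what}: needs {needed.name.lower()}, got {actual.name.lower()}"
        )
```

With this model, `mix(Convergence.UNIFORM, Convergence.SIMD_UNIFORM)` gives `SIMD_UNIFORM`, and passing a `DYNAMIC` value where `SIMD_UNIFORM` is required raises the error.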

There are several cases that would benefit from this.

Example 1
Execution barriers result in undefined behavior (UB) if called from within a branch whose condition has a convergence lower than threadgroup_uniform.
This error could easily be detected at compile time.

dynamic float value = ...;

if( value >= threshold )
    memoryBarrier(); //Error: memoryBarrier called from a non-uniform branch at line 123

Solution:

dynamic float value = ...;

if( threadgroup_uniform_promise( value >= threshold ) )
    memoryBarrier(); //Allowed, but it's the user's responsibility to ensure the data always meets 'value >= threshold' uniformly

Please note that threadgroup_uniform_promise( value >= threshold ) is not the same as threadgroup_uniform float value = threadgroup_uniform_promise( ... ).

The latter authorizes the compiler to put 'value' in an SGPR register, which can have undesired side effects if 'value' is used for anything other than evaluating this branch.

Precedent: FXC compiler performs this type of diagnostic. See microsoft/DirectXShaderCompiler#1306

Example 2

The following statements can produce compile-time errors unless VK_EXT_descriptor_indexing is present (or any similar extension that interacts with this feature):

int i = input_pixelshader.uv.x;
albedo = texture( myTex[i], uv ); //Compiler error, i is dynamic.

simd_uniform int i = input_pixelshader.uv.x; //Compiler error, cannot implicitly cast
albedo = texture( myTex[i], uv ); //OK

simd_uniform int i = readFirstInvocationARB( input_pixelshader.uv.x ); //OK
albedo = texture( myTex[i], uv ); //OK, i is explicitly simd_uniform

uint i = gl_WorkGroupID.x;
albedo = texture( myTex[i], uv ); //OK, i is implicitly threadgroup_uniform.

uint i = gl_WorkGroupID.x + some_dynamic_variable;
albedo = texture( myTex[i], uv ); //Compiler error, i is dynamic.

simd_uniform uint i = gl_WorkGroupID.x + some_dynamic_variable; //Compiler error, cannot implicitly cast
albedo = texture( myTex[i], uv ); //OK

Example 3

The following statement can be optimized thanks to extra information:

simd_uniform int i = readFirstInvocationARB( idx ) % 2;
if( i == 0 )
    albedo = texture( myTex[0], uv );
else if( i == 1 )
    albedo = texture( myTex[1], uv );

// Can be optimized to:

simd_uniform int i = readFirstInvocationARB( idx ) % 2;
albedo = texture( myTex[i], uv );

Likewise, the following readFirstInvocationARB call can be converted to a no-op:

simd_uniform int foo = ...;
simd_uniform int bar = readFirstInvocationARB( foo ); //No-op.
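
As a rough illustration of the no-op case, an optimizer that knows the argument's level could drop the call whenever the argument is already at least simd_uniform. The sketch below is an assumption about how such a pass might look; `simplify_read_first` and the `Convergence` enum are hypothetical names, and expressions are represented as plain strings for brevity.

```python
from enum import IntEnum
from typing import Tuple


class Convergence(IntEnum):
    DYNAMIC = 0
    SIMD_UNIFORM = 1
    THREADGROUP_UNIFORM = 2
    UNIFORM = 3


def simplify_read_first(arg: str, arg_level: Convergence) -> Tuple[str, Convergence]:
    """Return the (expression, level) that readFirstInvocationARB(arg) reduces to."""
    if arg_level >= Convergence.SIMD_UNIFORM:
        # Every active invocation already holds the same value: the call is a no-op.
        return arg, arg_level
    # Otherwise the call must stay, and its result is simd_uniform.
    return f"readFirstInvocationARB({arg})", Convergence.SIMD_UNIFORM
```

For `foo` declared simd_uniform this returns `("foo", SIMD_UNIFORM)`, while a dynamic `idx` keeps the call.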

Downgrading:
Downgrading is simple. Shader inputs have a natural convergence type: gl_WorkGroupID is threadgroup_uniform, while any value coming out of a texture fetch is dynamic.

Simply mixing shader inputs of different convergence results in the lowest common denominator.

Variables can also be explicitly downgraded. Compilers could generate warnings when this is unnecessary. For example:

dynamic uint idx = gl_WorkGroupID.x; //Warning: Unnecessary Convergence degradation. Consider declaring this variable of type threadgroup_uniform

Upgrading:

Upgrading must always be explicit:

uniform int var = uniform_promise( foo );
threadgroup_uniform int var = threadgroup_uniform_promise( foo );
simd_uniform int var = simd_uniform_promise( foo );

There are some functions that can also be used for explicit upgrading, such as readFirstInvocationARB.

The difference between readFirstInvocationARB and simd_uniform_promise is that the former may, for example, emit instructions to move a value out of a VGPR register into an SGPR register, while the latter is a pure assumption.
For example, GPUs which support non-dynamically-uniform indexing of textures may choose to generate instructions that index the texture using the index from a VGPR directly, instead of moving the index to an SGPR register first.
On GCN cards, readFirstInvocationARB and simd_uniform_promise would basically do the same thing.

If the user breaks their promise, code using readFirstInvocationARB and code using simd_uniform_promise could behave differently. Additionally, readFirstInvocationARB always results in defined behavior (although results could be non-deterministic due to race conditions or the data being fetched), while breaking the contract of simd_uniform_promise is always UB.

Variables with explicit convergence keyword cannot be implicitly downgraded or upgraded

The following results in compiler error:

simd_uniform float4 variable = 0;
variable += texture( myTex, uv ).xyzw; //Compiler error

The following is correct, but the variable won't be implicitly upgraded.

simd_uniform int variable = 0; //Variable is simd_uniform but could be uniform
variable += 5; //OK, but variable is still simd_uniform instead of uniform. Compiler could raise a warning

Note that compilers may still optimize based on the knowledge that 'variable' is actually uniform. But they must behave as if 'variable' is of type simd_uniform, e.g. when it comes to raising compiler errors.
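
One way to read the rules for explicitly declared variables is that the compiler keeps two levels per variable: the declared one, which drives errors, and the inferred one, which may be higher and drives warnings and optimizations. The sketch below is a hypothetical model of that reading; `ExplicitVar` and its methods are illustrative names only.

```python
from enum import IntEnum


class Convergence(IntEnum):
    DYNAMIC = 0
    SIMD_UNIFORM = 1
    THREADGROUP_UNIFORM = 2
    UNIFORM = 3


class ConvergenceError(Exception):
    """Stands in for a compile-time error."""


class ExplicitVar:
    """Hypothetical model of a variable declared with a convergence keyword."""

    def __init__(self, name: str, declared: Convergence, init_level: Convergence):
        self.name = name
        self.declared = declared    # drives diagnostics, never changes
        self.inferred = init_level  # what can be proven, usable for optimization
        self.assign(init_level)

    def assign(self, value_level: Convergence) -> bool:
        """Plain assignment. Returns True if a 'could be more uniform' warning applies."""
        if value_level < self.declared:
            raise ConvergenceError(
                f"{self.name}: cannot implicitly downgrade "
                f"{self.declared.name.lower()} to {value_level.name.lower()}"
            )
        self.inferred = value_level
        return value_level > self.declared

    def compound_assign(self, rhs_level: Convergence) -> bool:
        """variable op= rhs: the previous value is also an operand."""
        return self.assign(min(self.inferred, rhs_level))
```

Replaying the two snippets above: initializing a simd_uniform variable with `0` and adding `5` succeeds with a warning candidate and leaves `declared` at simd_uniform, while accumulating a texture fetch (dynamic) raises the error.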

Variables without explicit convergence keyword

Variables declared without any convergence keyword have their convergence derived automatically from their inputs, and they can change to other levels even after their initial declaration. For example:

float4 value = 0; //value is now uniform
value += gl_WorkGroupID.x; //value is now threadgroup_uniform
value *= texture( myTex, uv ).xyzw; //value is now dynamic
value = 0; //value is now uniform

Compilers and other tools can help the programmer identify at which lines variables changed convergence.
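
A tool that reports where an implicit variable changes level could work roughly as follows. This is a minimal sketch under the assumption that literals count as uniform and that each statement is seen in program order; `ImplicitTracker` and its methods are hypothetical names.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Convergence(IntEnum):
    DYNAMIC = 0
    SIMD_UNIFORM = 1
    THREADGROUP_UNIFORM = 2
    UNIFORM = 3


class ImplicitTracker:
    """Tracks the inferred level of variables that carry no convergence keyword."""

    def __init__(self) -> None:
        self.levels: Dict[str, Convergence] = {}
        self.changes: List[Tuple[int, str, Convergence]] = []

    def assign(self, line: int, name: str, *operands: Convergence) -> None:
        """name = expr: the level is the lowest operand level (literals are uniform)."""
        new_level = min(operands, default=Convergence.UNIFORM)
        if self.levels.get(name) != new_level:
            self.changes.append((line, name, new_level))  # report for the programmer
        self.levels[name] = new_level

    def update(self, line: int, name: str, *operands: Convergence) -> None:
        """name op= expr: the previous value is an operand too."""
        self.assign(line, name, self.levels[name], *operands)


# Replaying the four statements of the float4 example above:
tracker = ImplicitTracker()
tracker.assign(1, "value")                                    # value = 0
tracker.update(2, "value", Convergence.THREADGROUP_UNIFORM)   # += gl_WorkGroupID.x
tracker.update(3, "value", Convergence.DYNAMIC)               # *= texture(...)
tracker.assign(4, "value")                                    # value = 0
```

After the replay, `tracker.changes` holds one entry per statement, since the level changes on every line of that example.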

This simple scheme makes it possible to diagnose common human mistakes, opens up potential new optimizations thanks to the extra available information, helps prevent accidental performance regressions caused by a variable becoming dynamic (e.g. when the code was originally written with the variable being simd_uniform), and enables new tools that help programmers find bugs or improve performance.

Something as silly as writing at any random location:
simd_uniform int foo = a; //Compiles OK
Gives the programmer a lot of information about 'a' because it didn't fail to compile, meaning that 'a' at this point so far can still live in SGPR registers.

Additionally, it helps dispel the mysticism around GPUs: the compiler telling inexperienced programmers what they're doing wrong lowers the entry barrier, and GPUs are already hard enough. Learning by trial and error is how self-taught programmers train.

It is possible that a scheme like this should also live in SPIR-V; however, I lack the knowledge to argue that.

Dynamic Control flow handling

There are cases where a uniform variable needs to be downgraded due to a dynamic break.

In that case, how it's handled depends on whether the variable was implicit or explicit. Consider the following example:

dynamic int threshold = ...; // Can be implicitly dynamic too. Just making it explicit to point out the problem

int i = 2; // Uniform so far
for( i = 0; i < 256; ++i ) // i = 0 is now dynamic (continue reading)
{
   if( i <= 0 )
        memoryBarrier(); // Compiler Error: i is dynamic (see next line)
   if( i < threshold )
        break; // The presence of this break turns i into dynamic
}

When it comes to optimizations, the compiler may perform advanced optimizations treating i as uniform up until the first if( i < threshold ) during the first iteration; since it's guaranteed that i is uniform while i = 0 up until the first break.

But when it comes to error generation it should reject this type of code.
The fix for this error would need a promise:

dynamic int threshold = ...;

int i;
for( i = 0; i < 256; ++i )
{
   if( i <= 0 )
        memoryBarrier(); // OK
   if( threadgroup_uniform_promise( i < threshold ) )
        break; // Not our problem if user breaks the promise
}

When it comes to an explicit declaration, the code should error more explicitly:

dynamic int threshold = ...;

threadgroup_uniform int i;
for( i = 0; i < 256; ++i )
{
   if( i < threshold )
        break; // Compiler Error: Break turns i into dynamic, but it is of explicit type threadgroup_uniform
}

This error informs the user the code they're generating is no longer threadgroup_uniform, but variable i cannot be downgraded because it's of an explicit type.
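
The three loop examples can be summarized by one rule: a variable modified in a loop can be no more uniform than the conditions that decide whether the loop exits early. The sketch below models only that rule and is a simplification (a real compiler would need a fixed-point analysis, since the exit condition can itself depend on the variable); `level_in_loop` and its parameters are hypothetical.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Convergence(IntEnum):
    DYNAMIC = 0
    SIMD_UNIFORM = 1
    THREADGROUP_UNIFORM = 2
    UNIFORM = 3


class ConvergenceError(Exception):
    """Stands in for a compile-time error."""


def level_in_loop(
    own_level: Convergence,
    exit_condition_levels: Iterable[Convergence],
    declared: Optional[Convergence] = None,
) -> Convergence:
    """Effective level of a loop variable, capped by every break/exit condition."""
    effective = min([own_level, *exit_condition_levels])
    if declared is not None and effective < declared:
        # Explicitly declared variables cannot be implicitly downgraded.
        raise ConvergenceError(
            f"break turns variable into {effective.name.lower()}, "
            f"but it is declared {declared.name.lower()}"
        )
    return effective
```

With a dynamic break condition an implicit `i` becomes dynamic; with the condition wrapped in threadgroup_uniform_promise it stays threadgroup_uniform; and with an explicit threadgroup_uniform declaration plus a dynamic break the function raises.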

Precedents and related discussion:
KhronosGroup/glslang#1809
microsoft/DirectXShaderCompiler#1306
https://reviews.llvm.org/D26348

@TravisGesslein

came here from the twitter link. I've used compute shaders since they were first introduced and somehow didn't know about the barrier-inside-divergent-control-flow = UB thing, even though it makes perfect sense obviously, probably so much that it's assumed the programmer should realize this on their own. Nonetheless I missed it somehow.

Something as basic as that should definitely be detected at compile time, and if the language lacks feature support to make that happen, it should be added. This specific suggestion aside... can't this be detected already?

@darksylinc

darksylinc commented Feb 21, 2022

This specific suggestion aside... can't this be detected already?

Some basic checks are easy to implement. For example:

uniform int threshold;
if( threshold < 2.0f )
{
   memoryBarrier();
}

would be easy because the compiler already knows all the information: memoryBarrier can be allowed here, while anything else where the conditions are not uniforms and/or literals is disallowed.

But for more advanced diagnostics, the compiler would have to track whether the conditions for the branch/loop are at least threadgroup_uniform. To implement that, the compiler would have to start tracking where all the involved variables in the condition come from.

Before you know it, you end up implementing what I'm proposing.
What I did was simply state the problem in a more formal way, to identify the tasks and risks correctly and avoid needless bugs (i.e. there are 4 types of control flow, and downgrades happen implicitly while upgrades are explicit).

For example what should happen here?

uniform int threshold;

void myFunc()
{
  memoryBarrier();
}

if( threshold < 2.0f )
{
   myFunc(); // OK
}

if( dynamic_variable < 2.0f )
{
   myFunc(); // Not OK
}

With what I'm proposing, it would be trivial for the compiler to recognize myFunc is a problem in the second branch because the presence of memoryBarrier(); inside myFunc means myFunc must be executed from at least a threadgroup_uniform control flow block.

Hence calling the function from any block marked below that uniformity (e.g. it's called from a simd_uniform or dynamic block) must be an error.
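
The function case can be sketched as a propagation over the call graph: a function requires the highest level required by anything it calls, and each call site is checked against the level of the enclosing block. This is a hypothetical model, not glslang code; it takes the requirement on memoryBarrier from this issue's premise, and `required_level` and `check_call` are illustrative names.

```python
from enum import IntEnum
from typing import Dict, FrozenSet, List


class Convergence(IntEnum):
    DYNAMIC = 0
    SIMD_UNIFORM = 1
    THREADGROUP_UNIFORM = 2
    UNIFORM = 3


class ConvergenceError(Exception):
    """Stands in for a compile-time error."""


def required_level(
    func: str,
    calls: Dict[str, List[str]],
    intrinsic: Dict[str, Convergence],
    seen: FrozenSet[str] = frozenset(),
) -> Convergence:
    """Minimum control-flow level a function must be called from."""
    if func in intrinsic:
        return intrinsic[func]
    if func in seen:  # guard against cycles in a malformed call graph
        return Convergence.DYNAMIC
    callees = calls.get(func, [])
    return max(
        (required_level(c, calls, intrinsic, seen | {func}) for c in callees),
        default=Convergence.DYNAMIC,  # no callees: no requirement
    )


def check_call(block_level, func, calls, intrinsic) -> None:
    """Reject a call made from a block less uniform than the callee requires."""
    needed = required_level(func, calls, intrinsic)
    if block_level < needed:
        raise ConvergenceError(
            f"{func} must be called from {needed.name.lower()} control flow"
        )


# Assumption taken from the issue text: memoryBarrier needs threadgroup_uniform flow.
INTRINSIC = {"memoryBarrier": Convergence.THREADGROUP_UNIFORM}
CALLS = {"myFunc": ["memoryBarrier"]}
```

Here `check_call(Convergence.UNIFORM, "myFunc", CALLS, INTRINSIC)` passes, matching the first branch, while the same call from a `DYNAMIC` block raises, matching the second.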
