DXC Cookbook: HLSL Coding Patterns for SPIR-V

Author: Steven Perron

Date: Oct 22, 2018

Introduction

This document provides a set of examples that demonstrate what will and will not be accepted by the DXC compiler when generating SPIR-V. What is acceptable cannot be specified by a grammar, because the entire program must be taken into consideration. The examples are intended as a practical guide to where that boundary lies.

We are interested in how global resources are used. For a SPIR-V shader to be valid, accesses to global resources like structured buffers and images must be done directly on the global resources. They cannot be copied or have their address returned from functions. However, in HLSL, it is possible to copy a global resource or to pass it by reference to a function. Since this can be arbitrarily complex, DXC can generate valid SPIR-V only if the compiler is able to remove all of these copies.

The transformations that are used to remove the copies will be the same for both structured buffers and images, so we have chosen to focus on structured buffers. The process of transforming the code in this way is called legalization.

Support evolves over time as the optimizations in SPIRV-Tools are improved. At GDC 2018, Greg Fischer from LunarG presented earlier results in this space. The DXC, Glslang, and SPIRV-Tools maintainers work together to handle new HLSL code patterns. This document represents the state of the DXC compiler in October 2018.

Glslang does legalization as well. However, what it is able to legalize is different from DXC because of the features it chooses to support and the optimizations from SPIRV-Tools it chooses to run. For example, Glslang does not support structured buffer aliasing yet, so many of these examples will not work with Glslang.

All of the examples are available in the DXC repository, at https://github.com/Microsoft/DirectXShaderCompiler/tree/master/tools/clang/test/CodeGenSPIRV/legal-examples . To open a link to Tim Jones' Shader Playground for an example, you can follow the url in the comments of each example.

Examples for structured buffers

Desired code

// 0-copy-sbuf-ok.hlsl
// http://shader-playground.timjones.io/e6af2bdce0c61ed07d3a826aa8a95d45

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

void main() {
  gRWSBuffer[i] = gSBuffer[i];
}

This example shows code that directly translates to valid SPIR-V. In this case, we have two structured buffers. When one of their elements is accessed, it is done by naming the resource from which to get the element.

Note that it is fine to copy an element of the structured buffer.

Single copy to a local

Cases that can be easily legalized are those where there is exactly one assignment to the local copy of the structured buffer. In this context, a local is either a global static or a function scope symbol, that is, something that can be accessed by only a single instance of the shader. When there is a single copy to a local, it is obvious which global is actually being used. This allows the compiler to replace a reference to the local symbol with the global resource.

Initialization of a static

// 1-copy-global-static-ok.hlsl
// http://shader-playground.timjones.io/815543dc91a4e6855a8d0c6a345d4a5a

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

static StructuredBuffer<S> sSBuffer = gSBuffer;

void main() {
  gRWSBuffer[i] = sSBuffer[i];
}

This example shows an implicitly addressed structured buffer gSBuffer assigned to a static sSBuffer. This copy is treated like a shallow copy. This is implemented by making sSBuffer a pointer to gSBuffer.

This example can be legalized because the compiler is able to see that sSBuffer points to gSBuffer, which does not move, so uses of sSBuffer can be replaced by gSBuffer.

// 2-write-global-static-ok.hlsl
// http://shader-playground.timjones.io/1c65c467e395383945d219a60edbe10c

struct S {
  float4 f;
};

int i;

RWStructuredBuffer<S> gRWSBuffer;

static RWStructuredBuffer<S> sRWSBuffer = gRWSBuffer;

void main() {
  sRWSBuffer[i].f = 0.0;
}

This example is similar to the previous example, except in this case the shallow copy becomes important. sRWSBuffer is treated like a pointer to gRWSBuffer. As before, the references to sRWSBuffer can be replaced by gRWSBuffer. This means that the write that occurs will be visible outside of the shader.

Copy to function scope

// 3-copy-local-struct-ok.hlsl
// http://shader-playground.timjones.io/77dd20774e4943044c2f1b630c539f07

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};


int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

void main() {
  CombinedBuffers cb;
  cb.SBuffer = gSBuffer;
  cb.RWSBuffer = gRWSBuffer;
  cb.RWSBuffer[i] = cb.SBuffer[i];
}

It is also possible to copy a structured buffer to a function scope symbol. This is similar to a copy to a static scope symbol. The local copy is really a pointer to the original. This example demonstrates that DXC can legalize the copy even if it is a copy to part of a structure. There are no specific restrictions on the structure. The structured buffers can be anywhere in the structure, and there can be any number of members. Structured buffers can be in nested structures of any depth. The following is a more complicated example.

// 4-copy-local-nested-struct-ok.hlsl
// http://shader-playground.timjones.io/14f59ff2a28c0a0180daf6ce4393cf6b

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};

struct S2 {
  CombinedBuffers cb;
};

struct S1 {
  S2 s2;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

void main() {
  S1 s1;
  s1.s2.cb.SBuffer = gSBuffer;
  s1.s2.cb.RWSBuffer = gRWSBuffer;
  s1.s2.cb.RWSBuffer[i] = s1.s2.cb.SBuffer[i];
}

Function parameters

// 5-func-param-sbuf-ok.hlsl
// http://shader-playground.timjones.io/aeb06f527c5390d82d63bdb4eafc9ae7

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};


int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

void foo(StructuredBuffer<S> pSBuffer) {
  gRWSBuffer[i] = pSBuffer[i];
}

void main() {
  foo(gSBuffer);
}

It is possible to pass a structured buffer as a parameter to a function. As with the copies in the previous section, it is a pointer to the structured buffer that is actually being passed to foo. This is the same way that arrays work in C/C++.

// 6-func-param-rwsbuf-ok.hlsl
// http://shader-playground.timjones.io/f4e0194ce78118c0a709d85080ccea93

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

void foo(RWStructuredBuffer<S> pRWSBuffer) {
  pRWSBuffer[i] = gSBuffer[i];
}

void main() {
  foo(gRWSBuffer);
}

The same is true for RW structured buffers. So in this case, the write to pRWSBuffer is changing gRWSBuffer. This means that the write to pRWSBuffer will be visible outside of the function, and outside of the shader.

Return values

The next two examples show that structured buffers can be a function's return value. As before, the return value of foo is really a pointer to the global resource.

// 7-func-ret-tmp-var-ok.hlsl
// http://shader-playground.timjones.io/d6b706423f02dad58fbb01841282c6a1

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

RWStructuredBuffer<S> foo() {
  return gRWSBuffer;
}

void main() {
  RWStructuredBuffer<S> lRWSBuffer = foo();
  lRWSBuffer[i] = gSBuffer[i];
}

In this case, the compiler will replace lRWSBuffer by gRWSBuffer.

// 8-func-ret-direct-ok.hlsl
// http://shader-playground.timjones.io/6edbbc1aa6c6b6533c5a728135f87fb9

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer;

StructuredBuffer<S> foo() {
  return gSBuffer;
}

void main() {
  gRWSBuffer[i] = foo()[i];
}

This example is similar to the previous, but shows that you do not have to use an explicit temporary value.

Conditional control flow

The examples so far do not have any conditional control flow. This makes it obvious which resources are being used. The introduction of conditional control flow makes the job of the compiler much harder, and in some cases impossible. Remember that the compiler is trying to determine at compile time which resource will be used at run time. In this section, we will look at how control flow affects the compiler's ability to do this. The bottom line is that the compiler has to be able to turn all of the conditional control flow that affects which resources are used into straight line code.

Inputs in if-statement

The first example is one where the compiler cannot determine which resource is actually being accessed.

// 9-if-stmt-select-fail.hlsl
// http://shader-playground.timjones.io/2896e95627fd8a6689ca96c81a5c7c68

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};


int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {

  StructuredBuffer<S> lSBuffer;
  if (constant > i) {          // Condition can't be computed at compile time.
    lSBuffer = gSBuffer1;      // Will produce invalid SPIR-V for Vulkan.
  } else {
    lSBuffer = gSBuffer2;
  }
  gRWSBuffer[i] = lSBuffer[i];
}

In this example, lSBuffer could be either gSBuffer1 or gSBuffer2. It depends on the value of i, which is a parameter to the shader and cannot be known at compile time. At this time, the compiler is not able to convert this code into something that drivers will accept.

If your code follows this pattern, we suggest rewriting it as follows:

// 10-if-stmt-select-ok.hlsl
// http://shader-playground.timjones.io/5063d8a0a7ad1f9d0839cd34a6d94dd2

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};


int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {

  StructuredBuffer<S> lSBuffer;
  if (constant > i) {
    lSBuffer = gSBuffer1;
    gRWSBuffer[i] = lSBuffer[i];
  } else {
    lSBuffer = gSBuffer2;
    gRWSBuffer[i] = lSBuffer[i];
  }
}

Notice that this involves replicating code. If the code that follows the if-statement is long, you could consider moving it to a function, and having two calls to that function.
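
The function-based alternative could look like the following sketch. It is not one of the examples in the DXC repository. The helper name copyElement is hypothetical, and the sketch assumes, based on the function parameter examples earlier in this document, that the compiler inlines each call so that every access names a global resource directly.

```hlsl
// Sketch only: not part of the legal-examples directory.

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

// Hypothetical helper that holds the code which would otherwise be
// duplicated in both branches of the if-statement.
void copyElement(StructuredBuffer<S> pSBuffer) {
  gRWSBuffer[i] = pSBuffer[i];
}

void main() {
  if (constant > i) {
    copyElement(gSBuffer1);   // Resource is known at this call site.
  } else {
    copyElement(gSBuffer2);   // Resource is known at this call site.
  }
}
```

The run-time condition now selects which call executes, not which resource a given access refers to.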

If-statements with constants

Not all control flow is a problem. There are situations where the compiler is able to determine that a condition is always true or always false. For example, in the following code, the compiler looks at "0>2", and knows that it is always false.

// 11-if-stmt-const-ok.hlsl
// http://shader-playground.timjones.io/7ef5b89b3ec3d56c22e1bca45b40516a

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {

  StructuredBuffer<S> lSBuffer;
  if (constant > 2) {
    lSBuffer = gSBuffer1;
  } else {
    lSBuffer = gSBuffer2;
  }
  gRWSBuffer[i] = lSBuffer[i];
}

The compiler will effectively turn this code into the following:

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {
  gRWSBuffer[i] = gSBuffer2[i];
}

The two previous examples show that handling control flow depends on what the compiler can do. This depends on the amount of optimization that is done, and which optimizations are done. In general, when you are writing code that will select a resource, keep the conditions as simple as possible to make it as easy as possible for the compiler to determine which path is taken.

Switch statements

Switch statements are similar to if-statements. If the selector is a constant, then the compiler will be able to propagate the copies.

// 12-switch-stmt-select-fail.hlsl
// http://shader-playground.timjones.io/b079f878daeba5d77842725b90a476ca

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};


int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {

  StructuredBuffer<S> lSBuffer;
  switch(i) {                   // Compiler can't determine which case will run.
    case 0:
      lSBuffer = gSBuffer1;     // Will produce invalid SPIR-V for Vulkan.
      break;
    default:
      lSBuffer = gSBuffer2;
      break;
  }
  gRWSBuffer[i] = lSBuffer[i];
}

The compiler is not able to remove the copies in this example because it does not know the value of i at compile time.

// 13-switch-stmt-const-ok.hlsl
// http://shader-playground.timjones.io/a46dd1f1a84eba38c047439741ec08ab

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};


int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

const static int constant = 0;

void main() {

  StructuredBuffer<S> lSBuffer;
  switch(constant) {
    case 0:
      lSBuffer = gSBuffer1;
      break;
    default:
      lSBuffer = gSBuffer2;
      break;
  }
  gRWSBuffer[i] = lSBuffer[i];
}

However, if the selector is turned into a constant, the compiler can replace uses of lSBuffer by gSBuffer1.
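
When the selector has to remain a run-time value, as in 12-switch-stmt-select-fail.hlsl, one option is to move the access into each case, by analogy with 10-if-stmt-select-ok.hlsl. The following is a sketch under that assumption and is not taken from the DXC repository.

```hlsl
// Sketch only: not part of the legal-examples directory.

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

void main() {
  switch (i) {                        // Selector is still a run-time value.
    case 0:
      gRWSBuffer[i] = gSBuffer1[i];   // Access names the global directly.
      break;
    default:
      gRWSBuffer[i] = gSBuffer2[i];   // Access names the global directly.
      break;
  }
}
```

Because no local copy of a resource is made, there is nothing left for the compiler to legalize.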

Loop Induction Variables in conditions

Besides inputs, another type of variable that hinders the compiler is the loop induction variable. These are variables that change value for each iteration of the loop. Consider this example.

// 14-loop-var-fail.hlsl
// http://shader-playground.timjones.io/8df364770e3f425e6321e71f817bcd1a

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {

  StructuredBuffer<S> lSBuffer;

  for( int j = 0; j < 2; j++ ) {
    if (constant > j) {         // Condition is different for different iterations
      lSBuffer = gSBuffer1;     // Will produce invalid SPIR-V for Vulkan.
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }
}

In this example, j is an induction variable. It takes on the values 0 and 1. The information is there to be able to determine which path is taken in each iteration, but the compiler does not figure this out by default.

If you want the compiler to be able to legalize this code, then you will have to direct the compiler to unroll this loop using the unroll attribute. The following example can be legalized by the compiler:

// 15-loop-var-unroll-ok.hlsl
// http://shader-playground.timjones.io/3d0f6f830fc4a5102714e19c748e81c7

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {

  StructuredBuffer<S> lSBuffer;

  [unroll]
  for( int j = 0; j < 2; j++ ) {
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }
}
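
Once the loop is fully unrolled, the condition can be evaluated separately for each value of j. The sketch below is hand-written HLSL that illustrates the expected effect on 15-loop-var-unroll-ok.hlsl. It is not actual compiler output, since the real transformation happens on the SPIR-V.

```hlsl
// Illustration only: what the unrolled loop is expected to be equivalent to.

struct S {
  float4 f;
};

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

void main() {
  // Iteration j == 0: (0 > 0) is false, so gSBuffer2 is selected.
  gRWSBuffer[0] = gSBuffer2[0];
  // Iteration j == 1: (0 > 1) is false, so gSBuffer2 is selected.
  gRWSBuffer[1] = gSBuffer2[1];
}
```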

Variable iteration counts

Adding the unroll attribute to loops does not guarantee that the compiler is able to legalize the code. The compiler has to be able to fully unroll the loop. That means the compiler will have to create a copy of the body of the loop for each iteration so that there is no loop anymore. That can only be done if the number of iterations can be known at compile time.

This means that the compiler must be able to determine the initial value, the final value, and the step for the induction variable, which is j in these examples. In the following example, none of foo1, foo2, or foo3 can be legalized because the number of iterations cannot be known at compile time.

// 16-loop-var-range-fail.hlsl
// http://shader-playground.timjones.io/376f5f985c3ceceea004ab58edb336f2

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

int i;

#define constant 0

void foo1() {
  StructuredBuffer<S> lSBuffer;

  [unroll]
  for( int j = i; j < 2; j++ ) {  // Compiler can't determine the initial value
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }
}

void foo2() {
  StructuredBuffer<S> lSBuffer;

  [unroll]
  for( int j = 0; j < i; j++ ) {  // Compiler can't determine the end value
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }
}

void foo3() {
  StructuredBuffer<S> lSBuffer;

  [unroll]
  for( int j = 0; j < 2; j += i ) { // Compiler can't determine the step
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }
}


void main() {
  foo1(); foo2(); foo3();
}

As before, the compiler will try to simplify expressions to determine their value at compile time, but it may not always be successful. We recommend that you keep the expressions for the loop bounds as simple as possible to increase the chances that the compiler can determine the iteration count.

Other restrictions on unrolling

Not being able to determine the iteration count at compile time is a fundamental problem. No matter how good the compiler is, it will never be able to fully unroll such a loop. However, there are other cases that cannot be handled because of internal details of the algorithms in the SPIRV-Tools optimizer. The most notable one is that the induction variable must be an integral type.

// 17-loop-var-float-fail.hlsl
// http://shader-playground.timjones.io/d5d2598699378688684a4a074553dddf

struct S {
  float4 f;
};

struct CombinedBuffers {
  StructuredBuffer<S> SBuffer;
  RWStructuredBuffer<S> RWSBuffer;
};

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer;

#define constant 0

void main() {

  StructuredBuffer<S> lSBuffer;

  [unroll]
  for( float j = 0; j < 2; j++ ) {  // Can't infer floating point induction values
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }
}

This example cannot be legalized because j is a float.

Other interesting cases

Multiple calls to a function

// 18-multi-func-call-ok.hlsl
// http://shader-playground.timjones.io/e7b3ac1262a291c92902fd3f1fd3343c

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer1;
RWStructuredBuffer<S> gRWSBuffer2;


void foo(RWStructuredBuffer<S> pRWSBuffer) {
  pRWSBuffer[i] = gSBuffer[i];
}

void main() {
  foo(gRWSBuffer1);
  foo(gRWSBuffer2);
}

In this example, we see the same function is called twice. Each call has a different parameter. This can look like a problem because pRWSBuffer could be either gRWSBuffer1 or gRWSBuffer2. However, the compiler is able to work around this by creating a separate copy of foo for each call site. In fact, these copies will be placed inline.
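
The effect of inlining both calls can be pictured with the hand-written sketch below. It is an illustration of the expected result for 18-multi-func-call-ok.hlsl, not actual compiler output.

```hlsl
// Illustration only: main after both calls to foo have been inlined.

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer1;
RWStructuredBuffer<S> gRWSBuffer2;

void main() {
  gRWSBuffer1[i] = gSBuffer[i];   // From foo(gRWSBuffer1).
  gRWSBuffer2[i] = gSBuffer[i];   // From foo(gRWSBuffer2).
}
```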

Multiple returns

As we have already seen, a return from a function is a copy. At this point, it would be fair to ask what happens if there are multiple returns.

// 19-multi-func-ret-fail.hlsl
// http://shader-playground.timjones.io/922facb688a5ba09b153d64cf1fc4557

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer;
RWStructuredBuffer<S> gRWSBuffer1;
RWStructuredBuffer<S> gRWSBuffer2;

RWStructuredBuffer<S> foo(int l) {
  if (l == 0) {       // Compiler does not know which branch will be taken:
                      // Branch taken depends on input i.
    return gRWSBuffer1;
  } else {
    return gRWSBuffer2;
  }
}

void main() {
  RWStructuredBuffer<S> lRWSBuffer = foo(i);
  lRWSBuffer[i] = gSBuffer[i];
}

The compiler is not able to legalize this example because it does not know which value will be returned. However, if the compiler is able to determine which path will be taken, then it can be legalized.

// 20-multi-func-ret-const-ok.hlsl
// http://shader-playground.timjones.io/84b093c7cf9e3932c5f0d9691533bafe

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer1;
RWStructuredBuffer<S> gRWSBuffer2;

StructuredBuffer<S> foo(int l) {
  if (l == 0) {
    return gSBuffer1;
  } else {
    return gSBuffer2;
  }
}

void main() {
  gRWSBuffer1[i] = foo(0)[i];
  gRWSBuffer2[i] = foo(1)[i];
}

For each call to foo, the compiler is able to determine which value will be returned. In this case, the code can be legalized.
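
As an illustration, once each call is inlined and its constant argument is propagated, main is expected to behave like the hand-written sketch below. This is not actual compiler output.

```hlsl
// Illustration only: expected effect of legalizing 20-multi-func-ret-const-ok.hlsl.

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer1;
RWStructuredBuffer<S> gRWSBuffer2;

void main() {
  gRWSBuffer1[i] = gSBuffer1[i];  // foo(0): (l == 0) is true.
  gRWSBuffer2[i] = gSBuffer2[i];  // foo(1): (l == 0) is false.
}
```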

Combining elements

Individually, these examples are simple; however, these elements can be combined in arbitrary ways. As one last example, consider this HLSL source code.

// 21-combined-ok.hlsl
// http://shader-playground.timjones.io/9f00d2d359da0731cdf8d0b68520e2c4

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer1;
RWStructuredBuffer<S> gRWSBuffer2;

#define constant 0

StructuredBuffer<S> bar() {
  if (constant > 2) {
    return gSBuffer1;
  } else {
    return gSBuffer2;
  }
}

void foo(RWStructuredBuffer<S> pRWSBuffer) {
  StructuredBuffer<S> lSBuffer = bar();
  pRWSBuffer[i] = lSBuffer[i];
}

void main() {
  foo(gRWSBuffer1);
  foo(gRWSBuffer2);
}

The compiler will do all of the transformations that were mentioned earlier to identify a single resource for each load and store from a resource.
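
For 21-combined-ok.hlsl, the combined effect can be sketched by hand as follows. The sketch assumes that bar is inlined into foo, that foo is inlined at both call sites, and that the constant condition is folded. It is an illustration, not actual compiler output.

```hlsl
// Illustration only: expected effect of legalizing 21-combined-ok.hlsl.

struct S {
  float4 f;
};

int i;

StructuredBuffer<S> gSBuffer1;
StructuredBuffer<S> gSBuffer2;
RWStructuredBuffer<S> gRWSBuffer1;
RWStructuredBuffer<S> gRWSBuffer2;

void main() {
  // bar(): (0 > 2) is false, so it always returns gSBuffer2.
  gRWSBuffer1[i] = gSBuffer2[i];  // From foo(gRWSBuffer1).
  gRWSBuffer2[i] = gSBuffer2[i];  // From foo(gRWSBuffer2).
}
```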

Conclusion

It is impossible to enumerate all of the possible code sequences that work or do not work, but hopefully this will give a guide as to what is possible or not. The general rule of thumb is that there must be a straightforward way to transform the code so that there are no copies of global resources.