Shader input copies are mean to compilers #514

@gfxstrand

DXVK really likes to generate this pattern:

layout(location=0) in vec3 a1;
layout(location=1) in vec2 a2;
vec4 shader_in[32];
void vs_main() {
    // Do stuff
}

void main() {
    shader_in[0].xyz = a1;
    shader_in[1].xy = a2;
    vs_main();
}

This works, but unfortunately it's kind of a pain for the compiler to chew through. It's not so bad for vertex shaders, but for tessellation shaders it's especially bad. I'm seeing this pattern in several places:

layout(location=0) in vec3[3] a1;
layout(location=1) in vec2[3] a2;
vec4 shader_in[3][32];
void tcs_main() {
    vec3 a = shader_in[gl_InvocationID][0].xyz;
    vec2 b = shader_in[gl_InvocationID][1].xy;
    // Do stuff
}

void main() {
    shader_in[0][0].xyz = a1[0];
    shader_in[1][0].xyz = a1[1];
    shader_in[2][0].xyz = a1[2];
    shader_in[0][1].xy = a2[0];
    shader_in[1][1].xy = a2[1];
    shader_in[2][1].xy = a2[2];
    tcs_main();
}

Right now, our compiler is doing a fairly literal translation, which is rather problematic when you have a tessellation shader with 8 inputs, each of which has 9 vertices; that's 4.5 KB of input data that gets loaded and then stuffed into a temporary array. That array then gets spilled out to scratch space because it's 4.5 KB, and the shader ends up both slow and a mess to read/debug. This happens even on really simple shaders that just copy their inputs into their outputs.
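As a rough check on the 4.5 KB figure: it matches the size of a vec4 temporary with 32 slots for each of 9 vertices, assuming 16 bytes per vec4 (four 32-bit components). The constants and helper names below are illustrative only and are not taken from any compiler.

```python
# Assumption: a vec4 is four 32-bit components, i.e. 16 bytes.
VEC4_BYTES = 4 * 4
# shader_in[...][32] in the snippets above has 32 vec4 slots per vertex.
SLOTS_PER_VERTEX = 32


def temp_array_bytes(vertices, slots=SLOTS_PER_VERTEX):
    """Size of a vec4 shader_in[vertices][slots] temporary, in bytes."""
    return vertices * slots * VEC4_BYTES


def live_input_bytes(vertices, inputs):
    """Upper bound on the data actually declared as inputs,
    with every input padded out to a full vec4."""
    return vertices * inputs * VEC4_BYTES


if __name__ == "__main__":
    print(temp_array_bytes(9))            # full temporary for 9 vertices
    print(temp_array_bytes(9) / 1024)     # same size in KiB
    print(live_input_bytes(9, 8))         # 8 declared inputs, 9 vertices
```

Under these assumptions the temporary for 9 vertices is 4608 bytes (4.5 KiB), while 8 inputs padded to vec4 would account for only 1152 bytes of it.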

What we'd like to have is vec3 a = a1[gl_InvocationID]. Unfortunately, turning what we get (which came from something like that) back into something sensible requires the compiler to figure out quite a bit of information:

  1. Only the x, y, and z components of shader_in[*][0] are ever read
  2. shader_in[*][0] is basically a vec3
  3. Because it's basically a vec3, the write-masks don't matter
  4. Since the write-masks don't matter, the assignment to shader_in[*][0] of a1 copies the entire array
  5. Since the assignment is a copy, we can treat a read from shader_in[x][0] as a read of a1[x]
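The chain of reasoning above can be sketched as a toy analysis. This is a minimal model, not how any real compiler implements it: the `Write` and `Read` records and the `forwardable_slots` function are hypothetical, and each `Write` is assumed to stand for the whole group of per-vertex assignments into one slot, already recognised as covering every vertex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    slot: int     # second index: shader_in[*][slot]
    mask: str     # components written, e.g. "xyz"
    source: str   # input array copied from, e.g. "a1"


@dataclass(frozen=True)
class Read:
    slot: int     # second index: shader_in[x][slot]
    swizzle: str  # components read, e.g. "xyz"


def forwardable_slots(writes, reads):
    """Return {slot: input name} for slots whose reads can be
    redirected to the input that was copied into them."""
    # Step 1: which components of each slot are ever read?
    read_comps = {}
    for r in reads:
        read_comps.setdefault(r.slot, set()).update(r.swizzle)

    writes_by_slot = {}
    for w in writes:
        writes_by_slot.setdefault(w.slot, []).append(w)

    result = {}
    for slot, comps in read_comps.items():
        slot_writes = writes_by_slot.get(slot, [])
        # Toy restriction: only handle a slot fed by exactly one input.
        if len(slot_writes) != 1:
            continue
        w = slot_writes[0]
        # Steps 2-4: if the write mask covers every component that is
        # read, the masked write behaves like a copy of the whole input.
        if comps <= set(w.mask):
            # Step 5: reads of shader_in[x][slot] become reads of source[x].
            result[slot] = w.source
    return result


if __name__ == "__main__":
    writes = [Write(0, "xyz", "a1"), Write(1, "xy", "a2")]
    reads = [Read(0, "xyz"), Read(1, "xy")]
    print(forwardable_slots(writes, reads))
```

A read that touches a component outside the write mask (for example `.xyzw` of a slot written with mask `xy`) is left alone in this sketch, since the copy no longer accounts for everything that is read.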

That train of thought is easy for me to write down and for you to read, but making our compiler figure it all out is turning out to be rather painful. :-/ It's especially frustrating because DXVK clearly has enough information to declare the original inputs with their proper size. Is there some way DXVK could generate slightly nicer code? I know DX lets you do some crazy indirecting on inputs, but maybe the copy could be made only when it's really needed?
