DXVK really likes to generate this pattern:
```glsl
layout(location=0) in vec3 a1;
layout(location=1) in vec2 a2;
vec4 shader_in[32];   // one vec4 slot per input location
void vs_main() {
    // Do stuff
}
void main() {
    shader_in[0].xyz = a1;   // location 0 -> slot 0
    shader_in[1].xy = a2;    // location 1 -> slot 1
    vs_main();
}
```
This works, but unfortunately it's kind of a pain for the compiler to chew through. It's not so bad for vertex shaders, but for tessellation shaders it's especially bad. I'm seeing this pattern in several places:
```glsl
layout(location=0) in vec3[3] a1;
layout(location=1) in vec2[3] a2;
vec4 shader_in[3][32];   // [vertex][slot]
void tcs_main() {
    vec3 a = shader_in[gl_InvocationID][0].xyz;
    vec2 b = shader_in[gl_InvocationID][1].xy;
    // Do stuff
}
void main() {
    // Copy every vertex of a1 into slot 0 and every vertex of a2 into slot 1.
    shader_in[0][0].xyz = a1[0];
    shader_in[1][0].xyz = a1[1];
    shader_in[2][0].xyz = a1[2];
    shader_in[0][1].xy = a2[0];
    shader_in[1][1].xy = a2[1];
    shader_in[2][1].xy = a2[2];
    tcs_main();
}
```
Right now, our compiler does a fairly literal translation, which is rather problematic when you have a tessellation shader with 8 inputs, each of which has 9 vertices: all of that input data gets loaded and then stuffed into a temporary array of 9 × 32 vec4s, which is 4.5 KB. Because it's 4.5 KB, that array then gets spilled out to scratch space, and the shader ends up both slow and a mess to read and debug. This happens even on really simple shaders that just copy their inputs to their outputs.
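For a rough sense of scale, here is the arithmetic behind those numbers as a small C sketch. It assumes each `shader_in` slot is a 16-byte vec4 of 32-bit floats and that the temporary array always has 32 slots per vertex, as in the examples above; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: every shader_in slot is a vec4 of 32-bit floats. */
#define VEC4_BYTES 16u

/* Size of the temporary shader_in[vertices][slots] array. */
size_t temp_array_bytes(unsigned vertices, unsigned slots)
{
    return (size_t)vertices * slots * VEC4_BYTES;
}

/* Upper bound on the input data actually declared: one vec4 per input per
   vertex (a vec3 or vec2 input fills only part of its 16 bytes). */
size_t live_input_bytes(unsigned vertices, unsigned inputs)
{
    return (size_t)vertices * inputs * VEC4_BYTES;
}

/* 9 vertices x 32 slots x 16 bytes = 4608 bytes = 4.5 KiB. */
static_assert(9u * 32u * VEC4_BYTES == 4608u, "temporary array is 4.5 KiB");
```

With 9 vertices, the temporary array is 4608 bytes while the 8 inputs themselves need at most 1152 bytes, so, assuming the inputs occupy 8 of the 32 slots, about three quarters of what gets spilled is slots that are never written.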
What we'd like to have is `vec3 a = a1[gl_InvocationID];`. Unfortunately, turning what we get (which came from something like that) back into something sensible requires the compiler to figure out quite a bit of information:
- Only the x, y, and z components of `shader_in[*][0]` are ever read
- `shader_in[*][0]` is basically a vec3
- Because it's basically a vec3, the write masks don't matter
- Since the write masks don't matter, the assignment of `a1` to `shader_in[*][0]` copies the entire array
- Since the assignment is a copy, we can treat a read from `shader_in[x][0]` as a read of `a1[x]`
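The chain of reasoning above can be sketched as a small analysis over a toy IR. This is only an illustration, not how any particular compiler implements it: `SlotWrite`, `SlotRead` and `forwardable_input` are hypothetical, writes are assumed to copy components with an identity swizzle (`a1.xyz` into `.xyz`), and all writes are assumed to run before any read, as in the `main()` prologue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Component mask bits for a vec4 slot: x = 1, y = 2, z = 4, w = 8. */
enum { MASK_X = 1u, MASK_Y = 2u, MASK_Z = 4u, MASK_W = 8u };

/* Hypothetical IR records. Assumes identity swizzles and that every
   write happens before any read. */
typedef struct {
    int vertex;       /* v in shader_in[v][slot] */
    int slot;         /* slot in shader_in[v][slot] */
    uint8_t mask;     /* components written */
    int src_input;    /* declared input copied in (a1 = 0, a2 = 1, ...) */
} SlotWrite;

typedef struct {
    int slot;         /* slot read at a dynamic vertex index */
    uint8_t mask;     /* components read */
} SlotRead;

/* Returns the input that reads of shader_in[*][slot] can be redirected
   to, or -1 if the slot cannot be proven to be a plain copy. */
int forwardable_input(int slot, int num_vertices,
                      const SlotWrite *w, int nw,
                      const SlotRead *r, int nr)
{
    assert(num_vertices > 0);

    /* Which components of the slot are ever read (e.g. only xyz). */
    uint8_t read_mask = 0;
    for (int i = 0; i < nr; i++)
        if (r[i].slot == slot)
            read_mask |= r[i].mask;
    if (read_mask == 0)
        return -1; /* never read: the slot is dead, nothing to forward */

    /* Components outside read_mask are dead, so a write only matters where
       it overlaps read_mask. Each vertex must get all read components from
       one write, and every vertex must copy from the same input. */
    int src = -1;
    for (int v = 0; v < num_vertices; v++) {
        bool covered = false;
        for (int i = 0; i < nw; i++) {
            if (w[i].slot != slot || w[i].vertex != v)
                continue;
            uint8_t live = w[i].mask & read_mask;
            if (live == 0)
                continue;                 /* touches only dead components */
            if (live != read_mask)
                return -1;                /* partial copy */
            if (src != -1 && w[i].src_input != src)
                return -1;                /* mixed sources */
            src = w[i].src_input;
            covered = true;
        }
        if (!covered)
            return -1;                    /* some vertex is never copied */
    }
    return src; /* a read of shader_in[x][slot] is a read of input src[x] */
}

/* Data modeled on the tessellation example: a1 copied to slot 0 (xyz) and
   a2 to slot 1 (xy), matching the reads in tcs_main(). */
static const SlotWrite tcs_writes[] = {
    {0, 0, MASK_X | MASK_Y | MASK_Z, 0}, {1, 0, MASK_X | MASK_Y | MASK_Z, 0},
    {2, 0, MASK_X | MASK_Y | MASK_Z, 0}, {0, 1, MASK_X | MASK_Y, 1},
    {1, 1, MASK_X | MASK_Y, 1},          {2, 1, MASK_X | MASK_Y, 1},
};
static const SlotRead tcs_reads[] = {
    {0, MASK_X | MASK_Y | MASK_Z}, /* vec3 a = shader_in[id][0].xyz */
    {1, MASK_X | MASK_Y},          /* b = shader_in[id][1].xy      */
};
```

For this data, slot 0 forwards to input 0 (`a1`) and slot 1 to input 1 (`a2`); reading `.w` from slot 0, or leaving some vertex unwritten, makes the function return -1. The conservative bail-outs are the point: every rule in the list has to hold at once before the copy can be dropped.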
That train of thought is easy for me to write down and for you to read, but making our compiler figure it all out is turning out to be rather painful. :-/ It's especially frustrating because DXVK clearly has enough information to declare the original inputs with their proper size. Is there some way DXVK could generate somewhat nicer code? I know DX lets you do some crazy indirect indexing on inputs, but maybe DXVK could make the copy only when it's really needed?
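One way to act on that suggestion, sketched in C under hypothetical assumptions: summarize every read of the input register file as either a constant register index or a dynamic index that may reach some range of registers, and emit the `shader_in` copy only for registers that a dynamic index can reach. `InputRead` and `registers_needing_copy` are invented names, not DXVK code, and the 32-register limit simply mirrors the `shader_in[32]` array above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one read from the 32-entry input register file.
   Not DXVK's real data structures; just enough to show the decision. */
typedef struct {
    bool dynamic;     /* register index is computed at run time */
    unsigned reg;     /* constant index, or first register the index may hit */
    unsigned count;   /* registers a dynamic index may reach (1 if constant) */
} InputRead;

/* Bit r set => register r may be indexed dynamically, so it still needs
   the shader_in[] copy. Every other register could be read straight from
   its declared, properly sized input (e.g. a1[gl_InvocationID]). */
uint32_t registers_needing_copy(const InputRead *reads, unsigned n)
{
    assert(reads != NULL || n == 0);
    uint32_t mask = 0;
    for (unsigned i = 0; i < n; i++) {
        if (!reads[i].dynamic)
            continue; /* constant index: read the input directly */
        for (unsigned r = reads[i].reg;
             r < reads[i].reg + reads[i].count && r < 32; r++)
            mask |= UINT32_C(1) << r;
    }
    return mask;
}
```

For a shader with no dynamic register indexing the mask is zero, so no copy is needed at all. A real implementation would need to know the exact range a dynamic index can reach; modeling it as an explicit `reg`/`count` pair here is an assumption.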