/
GLSL_NV_compute_shader_derivatives.txt
495 lines (360 loc) · 20.8 KB
/
GLSL_NV_compute_shader_derivatives.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
Name
NV_compute_shader_derivatives
Name Strings
GL_NV_compute_shader_derivatives
Contact
Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
Contributors
Ashwin Lele, NVIDIA
Jeff Bolz, NVIDIA
Michael Chock, NVIDIA
Status
Shipping
Version
Last Modified: September 4, 2019
Revision: 5
Dependencies
This extension can be applied to OpenGL GLSL versions 4.50
(#version 450) and higher.
This extension can be applied to OpenGL ES ESSL versions 3.20
(#version 320) and higher.
This extension is written against the OpenGL Shading Language
Specification, version 4.60, dated July 23, 2017.
This extension interacts with ARB_sparse_texture2.
This extension interacts with ARB_sparse_texture_clamp.
This extension interacts with EXT_sparse_texture2.
This extension interacts with KHR_shader_subgroup.
This extension interacts with ARB_compute_variable_group_size.
Overview
In unextended OpenGL, the GLSL shading language supports derivatives in
fragment shaders, which are computed on shader inputs or on other computed
values using fragments' (x,y) coordinates as the domain. Derivatives are
not supported in any other shader stage. However, compute shaders also
arrange shader invocations in a three-dimensional array with (x,y,z)
coordinates. This extension provides a mechanism where compute shaders
can similarly evaluate derivatives using the x and y coordinates of the
local invocation ID across a set of invocations with identical z
coordinates.
This extension, when enabled, allows applications to use derivatives in
compute shaders. It adds compute shader support for built-in derivative
functions like dFdx(), automatic level of detail computation in
texture lookup functions like texture(), use of the optional LOD bias
parameter to adjust the computed level of detail values in texture lookup
functions, and the texture level of detail query function
textureQueryLod().
The extension provides new layout qualifiers that support two different
arrangements of compute shader invocations for the purpose of derivative
computation. When specifying
layout(derivative_group_quadsNV) in;
compute shader invocations are grouped into 2x2x1 arrays whose four local
invocation ID values follow the pattern:
+-----------------+------------------+
| (2x+0, 2y+0, z) | (2x+1, 2y+0, z) |
+-----------------+------------------+
| (2x+0, 2y+1, z) | (2x+1, 2y+1, z) |
+-----------------+------------------+
where Y increases from bottom to top. When specifying
layout(derivative_group_linearNV) in;
compute shader invocations are grouped into 2x2x1 arrays whose four local
invocation index values follow the pattern:
+------+------+
| 4n+0 | 4n+1 |
+------+------+
| 4n+2 | 4n+3 |
+------+------+
If neither layout qualifier is specified, derivatives in compute shaders
return zero, which is consistent with the handling of built-in texture
functions like texture() in GLSL 4.50 compute shaders.
Unlike fragment shaders, compute shaders never have any "helper"
invocations that are only used for derivatives. As a result, the local
work group width and height must be a multiple of two when using the
"quads" layout, and the total number of invocations in a local work group
must be a multiple of four when using the "linear" layout.
Mapping to SPIR-V
-----------------
For informational purposes (non-normative), the following is an
expected way for an implementation to map GLSL constructs to SPIR-V
constructs:
derivative_group_quadsNV layout qualifier -> DerivativeGroupQuadsNV Execution Mode
derivative_group_linearNV layout qualifier -> DerivativeGroupLinearNV Execution Mode
Modifications to the OpenGL Shading Language Specification, Version 4.50
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_NV_compute_shader_derivatives : <behavior>
where <behavior> is as specified in section 3.3.
A new preprocessor #define is added to the OpenGL Shading Language:
#define GL_NV_compute_shader_derivatives 1
Modify Section 4.4, Layout Qualifiers, p. 56
(add to supported layout qualifier table, pp. 57-58)
Qualifier Individual Block Allowed
Layout Qualifier Only Variable Block Member Interfaces
------------------------- --------- ---------- ----- ------ ----------
derivative_group_quadsNV X - - - compute in
derivative_group_linearNV X - - - compute in
Modify Section 4.4.1.4, Compute Shader Inputs, p. 66
(add support for new layout qualifiers, p. 66)
layout-qualifier-id:
...
derivative_group_quadsNV
derivative_group_linearNV
...
(insert at the end of the section)
The "derivative_group_quadsNV" and "derivative_group_linearNV" qualifiers
are used to specify how compute shader invocations are grouped for the
purposes of evaluating derivatives for derivative functions and automatic
texture level of detail computation. It is a compile-time error if both
qualifiers are specified. If neither qualifier is specified, derivatives
evaluated for compute shaders will return zero.
The layout qualifier "derivative_group_quadsNV" specifies that derivatives
in compute shaders are evaluated over 2x2 groups of invocations whose
gl_LocalInvocationID values are of the form (2x+{0,1}, 2y+{0,1}, z). It is
a compile- or link-time error if this qualifier is used with a local group
size whose first or second dimension is not a multiple of two.
The layout qualifier "derivative_group_linearNV" specifies that
derivatives in compute shaders are evaluated over groups of four
invocations with consecutive gl_LocalInvocationIndex values of the form
4x+{0,1,2,3}. It is a compile- or link-time error if this qualifier is
used with a local group size whose total number of invocations is not a
multiple of four.
Modify Section 8.9, Texture Functions, p. 162
(modify first paragraph of the section, p. 162, to document that automatic
level of detail calculations are now supported in compute shaders and to
refer to the derivatives section for more information on how derivatives
are evaluated)
Texture lookup functions are available in all shading stages. However,
automatic level of detail computations are performed only for fragment and
compute shaders, using derivative computations described in Section
8.13.1. Other shaders...
(modify first paragraph, p. 163, to allow LOD bias in compute shaders)
In all functions below, the <bias> parameter is optional for fragment and
compute shaders. ... For a fragment or compute shader, if <bias> is
present, it is added...
(modify third paragraph, p. 163 - Use "automatic level of detail
calculation" as in prior language instead of "implicit derivatives".
Additionally, drop the "undefined in non-fragment" langauge, since we
already defined that case to use zero for the level of detail.)
Some texture functions (non-"Lod" and non-"Grad" versions) may require
automatic level of detail calculations. The computed level of detail is
undefined within non-uniform control flow.
Modify Section 8.9.1, Texture Query Functions, p. 163
(modify second paragraph of section, p. 163, to allow LOD queries in
compute shaders)
The textureQueryLod functions are available only in fragment and compute
shaders. ...
Modify Section 8.13, Fragment Processing Functions, p. 184
(If this is added to a full specification, sub-section 8.13.1, "Derivative
Functions" should be lifted into its own section, since they are also
available in compute shaders. For the purposes of this extension, we
simply modify the introductory text in this section.)
Other than derivative functions, fragment processing functions are only
available in fragment shaders. Derivative functions are available in both
fragment and compute shaders.
Modify Section 8.13.1, Derivative Functions, p. 184
(add introductory text to the beginning of the section)
Derivative functions are available in fragment and compute shaders, and
are used to evaluate derivatives of values computed in these shaders over
a two-dimensional domain. For fragment shaders, derivatives are taken
relative to the (x,y) coordinates of the associated fragments. For
compute shaders, derivatives are evaluated over groups of four invocations
arranged in a 2x2 grid.
(add discussion of how derivatives are evaluated in compute shaders,
before the next-to-last paragraph, "In some implementations", p. 185)
Unlike fragment shader invocations, compute shader invocations don't have
associated screen-space (x,y) coordinates to use as a domain for
derivatives. Derivatives in compute shaders will be evaluated using
differencing over groups of four shader invocations. When the
"derivative_group_quadsNV" layout qualifier is specified in a compute
shader, derivatives will be evaluated over groups of four invocations
whose gl_LocalInvocationID values are of the form (2x+m, 2y+n, z), where
<x>, <y>, and <z> are integers, and <m> and <n> must be zero or one. When
the "derivative_group_linearNV" layout qualifier is specified, derivatives
will be evaluated over groups of four invocations whose
gl_LocalInvocationIndex values are of the form 4x+2n+m, where <x> is an
integer and <m> and <n> must be zero or one. If neither
"derivative_group_quadsNV" nor "derivative_group_linearNV" is specified,
derivatives in a compute shader return zero.
Invocations in each group of four compute shader invocations are assigned
(x,y) coordinates of (m,n), where values of <m> and <n> are zero or one as
described above. If F(m,n) specifies the value of F for the invocation at
(m,n), fine derivatives with respect to <x> for an invocation at (m,n) are
evaluated as
F(1,n) - F(0,n)
while fine derivatives with respect to <y> are evaluated as
F(m,1) - F(m,0).
As in fragment shaders, coarse derivatives in compute shaders may evaluate
a single value for each group of four invocations.
Dependencies on ARB_sparse_texture2
If ARB_sparse_texture2 is supported, the texture lookup functions provided
by that extension will support automatic level of detail computation in
compute shaders, as defined in this extension. Supported functions
include:
sparseTextureARB (with or without LOD bias)
sparseTextureOffsetARB (with or without LOD bias)
The following functions using derivatives provided by the shader were
already supported in compute shaders prior to this extension:
sparseTextureGradARB
sparseTextureGradOffsetARB
Dependencies on ARB_sparse_texture_clamp
If ARB_sparse_texture_clamp is supported, the texture lookup functions
provided by that extension will support automatic level of detail
computation in compute shaders, as defined in this extension. Supported
functions include:
sparseTextureClampARB (with or without LOD bias)
textureClampARB (with or without LOD bias)
sparseTextureOffsetClampARB (with or without LOD bias)
textureOffsetClampARB (with or without LOD bias)
The following functions using derivatives provided by the shader were
already supported in compute shaders prior to this extension:
sparseTextureGradClampARB
textureGradClampARB
sparseTextureGradOffsetClampARB
textureGradOffsetClampARB
Dependencies on EXT_sparse_texture2
If EXT_sparse_texture2 is supported, the texture lookup functions provided
by that extension will support automatic level of detail computation in
compute shaders, as defined in this extension. Supported functions
include:
sparseTextureEXT (with or without LOD bias)
sparseTextureClampEXT (with or without LOD bias)
textureClampEXT (with or without LOD bias)
sparseTextureOffsetEXT (with or without LOD bias)
sparseTextureOffsetClampEXT (with or without LOD bias)
textureOffsetClampEXT (with or without LOD bias)
The following functions using derivatives provided by the shader were
already supported in compute shaders prior to this extension:
sparseTextureGradEXT
sparseTextureGradClampEXT
textureGradClampEXT
sparseTextureGradOffsetEXT
sparseTextureGradOffsetClampEXT
textureGradOffsetClampEXT
Dependencies on KHR_shader_subgroup
If the KHR_shader_subgroup_quads feature of KHR_shader_subgroup is
supported, add the following language describing the mapping of
invocations to subgroup "quads" in compuote shaders. This language should
be added immediately after similar language for fragment shaders that
begins with "In fragment shaders, this quad corresponds to 4 pixels
arranged in a 2x2 grid".
In compute shaders using the "derivative_group_quadsNV" or
"derivative_group_linearNV" layout qualifier, this quad corresponds to
4 invocations arranged in a 2x2 grid:
0 | 1
--|--
2 | 3
When using "derivative_group_quadsNV", the four invocations are arranged
such that:
- 0th index has a local invocation ID of the form (2x + 0, 2y + 0, z)
- 1st index has a local invocation ID of the form (2x + 1, 2y + 0, z)
- 2nd index has a local invocation ID of the form (2x + 0, 2y + 1, z)
- 3rd index has a local invocation ID of the form (2x + 1, 2y + 1, z)
for some integer values <x>, <y>, and <z>. When using
"derivative_group_linearNV", the four invocations are arranged such
that:
- 0th index has a local invocation index of the form 4n + 0
- 1st index has a local invocation index of the form 4n + 1
- 2nd index has a local invocation index of the form 4n + 2
- 3rd index has a local invocation index of the form 4n + 3
for some integer value <n>.
Dependencies on ARB_compute_variable_group_size
If ARB_compute_variable_group_size is supported, modify the language
specifying a compile- or link-time error for "bad" workgroup sizes to not
apply if the "local_size_variable" layout qualifier is specified.
The layout qualifier "derivative_group_quadsNV" ... It is a compile- or
link-time error if this qualifier is used in a compute shader that does
not use the "local_size_variable" layout qualifier and whose local group
size has a first or second dimension that is not a multiple of two.
The layout qualifier "derivative_group_linearNV" ... It is a compile- or
link-time error if this qualifier is used in a compute shader that does
not use the "local_size_variable" layout qualifier and whose local group
size has a total number of invocations that is not a multiple of four.
Issues
(1) Should we provide support for "helper" invocations like we have for
fragment shaders?
RESOLVED: No, we are not providing any support for automatic "helper"
compute shader invocations that would be spawned solely for the purpose
of computing derivatives. When using this feature, the local work group
must be fully decomposable into 2x2 "quads" to ensure well-defined
derivatives. Using odd local work group sizes in compute shaders
requiring derivatives is not allowed and will result in compile-time
errors. If the natural size of a work group is not a multiple of the
2x2 "quad" size, applications can introduce their own "helper"
invocations by padding the workgroup size to a multiple of two and
manually ensuring that the "extra" invocations don't have unwanted side
effects (e.g., undesired writes to memory).
(2) What mechanisms should we provide to control the packing of compute
shader invocations into "quads" for differencing purposes?
RESOLVED: We provide two different options, which can be specified via
layout qualifier.
The most natural way to support derivatives is to take the
already-existing (x,y,z) coordinates in gl_LocalInvocationID and use the
(x,y) coordinates as the domain. In this model, mapping to the
"derivative_group_quadsNV" layout qualifier, we simply arrange invocations
into 2x2 "quads" with coordinates of the form (2m+{0,1}, 2n+{0,1}, z).
The "quads" approach might not work for all use cases that require
derivative support. For example, a shader might want to just use a
one-dimensional array of invocations and provide its own interpretation
of how each invocation maps to an (x,y) coordinate in the domain. To
support this model, we provide the "derivative_group_linearNV" layout
qualifier, where invocations are arranged into groups of four with
one-dimensional gl_LocalInvocationIndex values of the form 4n+{0,1,2,3}.
Shaders using this layout qualifier need to ensure that (x,y)
coordinates used to evaluate the inputs of derivative functions for each
invocation in a group are chosen so that differencing-based derivatives
make sense.
(3) Which texture built-in functions automatically compute derivatives
with this extension?
RESOLVED: In the core profile, computed derivatives are supported for
all functions that compute derivatives in fragment shaders: texture(),
textureProj(), textureOffset(), textureProjOffset(). If the
compatibility profile is supported, the old legacy built-in functions
should also compute derivatives:
texture1D(), texture1DProj(), texture2D(), texture2DProj(),
texture3D(), texture3DProj(), textureCube(), shadow1D(), shadow2D(),
shadow1DProj(), and shadow2DProj().
This extension also produces defined derivatives in texture lookup
functions from ARB_sparse_texture2, ARB_sparse_texture_clamp, and
EXT_sparse_texture2.
(4) How are derivatives handled when no "derivative_group" layout
qualifier is specified? How are derivatives handled in divergent flow
control?
RESOLVED: When no layout qualifier is specified, derivatives evaluated
in compute shaders for derivative functions and automatic level of
detail computations will return zero. While derivative functions (e.g.,
dFdx) were not supported in compute shaders prior to this extension, the
same texture lookup functions that would use derivatives for automatic
level of detail computation when called in fragment shaders were
supported in compute shaders. In those functions, the base level of
detail (zero) is supposed to be used, which is consistent with the
derivatives of zero we specify here.
In divergent flow control, not all of the four invocations in a given
derivative group may be active. The values used for derivatives may not
have been computed for inactive threads, so differencing will produce
incorrect results. While this situation produces undefined results for
fragment shaders in the core GLSL specification, NVIDIA's hardware
implementation will recognize divergence and produce derivatives of
zero.
(5) How does this extension interact with ARB_compute_variable_group_size?
RESOLVED: When derivatives are used with a compute shader with a variable
local group size, it is not possible to enforce a compile- or link-time
error that complains about "bad" local group sizes. For OpenGL, we will
instead enforce this restriction at dispatch time by throwing an
INVALID_VALUE error. As of September 2019, variable local group sizes are
not supported in Vulkan or SPIR-V.
Revision History
Revision 5, 2019/09/04 (pbrown)
- Add interactions with ARB_compute_variable_group_size, where a compile-
or link-time error for bad local group sizes is not possible. Also, fix
the language to specify that such errors can be either compile-time or
link-time -- it's not possible to throw a compile-time error if the
derivative mode is specified in one compilation unit and the local group
size is specified in another.
Revision 4, 2018/11/09 (mchock)
- Add interactions with EXT_sparse_texture2
Revision 3, 2018/09/13 (pbrown)
- Add interactions with KHR_shader_subgroup (KHR_shader_subgroup_quads)
to explicitly document how invocations are arranged into subgroup quads.
We use the same grouping as for derivatives.
Revision 2, 2018/09/12 (pbrown)
- Prepare spec for publication.
Revision 1
- Internal revisions.