GLSL/extensions/nv/GLSL_NV_shading_rate_image.txt
Name | |
NV_shading_rate_image | |
Name Strings | |
GL_NV_shading_rate_image | |
Contact | |
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) | |
Contributors | |
Daniel Koch, NVIDIA | |
Mark Kilgard, NVIDIA | |
Status | |
Shipping | |
Version | |
Last Modified: September 11, 2018 | |
Revision: 2 | |
Dependencies | |
This extension can be applied to OpenGL GLSL versions 4.50 | |
(#version 450) and higher. | |
This extension can be applied to OpenGL ES ESSL versions 3.20 | |
(#version 320) and higher. | |
This extension is written against the OpenGL Shading Language | |
Specification, version 4.60, dated July 23, 2017. | |
This extension interacts with ARB_fragment_shader_interlock and | |
NV_fragment_shader_interlock. | |
Overview | |
This extension provides OpenGL Shading Language (GLSL) support for the API | |
extension "NV_shading_rate_image". In that extension, applications can use | |
a texture to control the number of fragment shader invocations that will | |
be spawned for a particular neighborhood of covered pixels. We refer to | |
the density of fragment shader invocations as the "shading rate". Our | |
implementation supports shading rates that run one invocation for multiple | |
pixels as well as rates that run multiple invocations for a single pixel. | |
The texture used to control the shading rate is referred to as a "shading | |
rate image", where each texel specifies the shading rate for a fixed-size | |
region of the framebuffer. | |
This extension provides GLSL built-in variables that allow the fragment | |
shader to determine the shading rate used for a particular invocation. | |
When using a shading rate where a single invocation covers multiple | |
pixels, the built-in gl_FragmentSizeNV indicates the size of the rectangle | |
of pixels used for that invocation. When using a shading rate where | |
multiple invocations can be spawned for each pixel, the built-in | |
gl_InvocationsPerPixelNV indicates the number of invocations that will be | |
spawned for a fully covered pixel. | |
Additionally, if the ARB_fragment_shader_interlock extension is supported, | |
this extension adds support for new input layout qualifiers that can | |
ensure mutual exclusion across all pixels covered by a fragment. | |
Mapping to SPIR-V | |
----------------- | |
For informational purposes (non-normative), the following is an | |
expected way for an implementation to map GLSL constructs to SPIR-V | |
constructs supported by the SPV_NV_shading_rate extension: | |
- gl_FragmentSizeNV -> FragmentSizeNV decorated variable | |
- gl_InvocationsPerPixelNV -> InvocationsPerPixelNV decorated variable | |
- shading_rate_interlock_ordered -> (not supported - no ARB_fragment_shader_interlock in current SPIR-V) | |
- shading_rate_interlock_unordered -> (not supported - no ARB_fragment_shader_interlock in current SPIR-V) | |
Modifications to the OpenGL Shading Language Specification, Version 4.60 | |
Including the following line in a shader can be used to control the | |
language features described in this extension: | |
#extension GL_NV_shading_rate_image : <behavior> | |
where <behavior> is as specified in section 3.3. | |
New preprocessor #defines are added to the OpenGL Shading Language: | |
#define GL_NV_shading_rate_image 1 | |
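As an illustration only (not part of the specification edits), a shader might guard its use of this extension with the preprocessor define above. This is a minimal sketch; the output name "color" and the fallback value are hypothetical.

```glsl
#version 450

// The macro is defined by compilers that support the extension, so the
// directive is only issued where it will be accepted.
#ifdef GL_NV_shading_rate_image
#extension GL_NV_shading_rate_image : enable
#endif

layout(location = 0) out vec4 color;   // hypothetical output name

void main()
{
#ifdef GL_NV_shading_rate_image
    // Built-ins from this extension are available on this path.
    ivec2 fragSize = gl_FragmentSizeNV;
#else
    // Fallback: without the extension, treat each fragment as one pixel.
    ivec2 fragSize = ivec2(1);
#endif
    color = vec4(vec2(fragSize) / 4.0, 0.0, 1.0);
}
```

A shader that cannot work without the extension could use the "require" behavior instead of "enable", so that compilation fails where the extension is unavailable.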
Modify Section 4.4.1.3, Fragment Shader Inputs (p. 65) | |
(add to the list of layout qualifiers containing "early_fragment_tests", | |
p. 66, as modified by ARB_fragment_shader_interlock, and modify the | |
surrounding language to reflect that multiple layout qualifiers are | |
supported for "in") | |
layout-qualifier-id | |
... | |
shading_rate_interlock_ordered | |
shading_rate_interlock_unordered | |
(modify the language added to the end of the section by | |
ARB_fragment_shader_interlock, p. 66, to reflect the new layout | |
qualifiers) | |
The identifiers "pixel_interlock_ordered", "pixel_interlock_unordered", | |
"sample_interlock_ordered", "sample_interlock_unordered", | |
"shading_rate_interlock_ordered", and "shading_rate_interlock_unordered" | |
control the ordering of the execution of shader invocations between calls | |
to the built-in functions beginInvocationInterlockARB() and | |
endInvocationInterlockARB(), as described in section 8.13.3. A compile or | |
link error will be generated if more than one of these layout qualifiers | |
is specified in shader code. If a program containing a fragment shader | |
includes none of these layout qualifiers, it is as though | |
"pixel_interlock_ordered" were specified. | |
Modify Section 7.1, Built-In Language Variables, p. 122 | |
(add to the list of fragment language variables, middle of p. 124) | |
in ivec2 gl_FragmentSizeNV; | |
in int gl_InvocationsPerPixelNV; | |
(add documentation of the new built-in variables, before | |
gl_HelperInvocation discussion at the bottom of p. 128) | |
The input variable gl_FragmentSizeNV represents the size of a rectangle of | |
pixels corresponding to this fragment shader invocation. The first | |
component is the width of the rectangle (in pixels); the second component | |
is the height (in pixels). When a shading rate image is not used, or | |
when running multiple fragment shader invocations per pixel | |
(multisampling), both components will be one. When using a shading rate | |
image, either or both components may be greater than one. When the | |
fragment size is greater than a single pixel, the outputs of the shader | |
invocation will be broadcast to all covered pixels/samples in the | |
rectangle. | |
The input variable gl_InvocationsPerPixelNV represents the maximum number | |
of fragment shader invocations executed for each pixel, as derived from | |
the effective shading rate for the fragment. If a primitive does not | |
fully cover a pixel, the number of fragment shader invocations for its | |
covered pixels may be less than the value of gl_InvocationsPerPixelNV. | |
When a shading rate image is not used, this value is a function of the | |
framebuffer sample counts and the sample shading fraction programmed in | |
the OpenGL API via MinSampleShading. When using the shading rate image, | |
this value will also be affected by the contents of the shading rate image | |
and may depend on the location of the pixel in the framebuffer. If | |
multisampling is disabled, or if the fragment shader invocation covers | |
multiple pixels, the value of this input will be one. | |
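The two built-ins described above could be used, for example, to tint fragments by their effective shading rate for debugging. The following is a sketch under stated assumptions: the output name "fragColor", the color scheme, and the normalization constant 16.0 are all hypothetical choices, not values defined by this extension.

```glsl
#version 450
#extension GL_NV_shading_rate_image : require

layout(location = 0) out vec4 fragColor;   // hypothetical output name

void main()
{
    // Number of pixels in the rectangle covered by this invocation;
    // 1 when the fragment covers a single pixel.
    int pixelsPerInvocation = gl_FragmentSizeNV.x * gl_FragmentSizeNV.y;

    vec3 tint;
    if (pixelsPerInvocation > 1) {
        // Coarse shading: red, brighter for larger fragments.
        // 16.0 is an assumed normalization constant.
        tint = vec3(clamp(float(pixelsPerInvocation) / 16.0, 0.0, 1.0), 0.0, 0.0);
    } else if (gl_InvocationsPerPixelNV > 1) {
        // Multiple invocations per pixel: blue, brighter for higher rates.
        tint = vec3(0.0, 0.0, clamp(float(gl_InvocationsPerPixelNV) / 16.0, 0.0, 1.0));
    } else {
        // One invocation per pixel: green.
        tint = vec3(0.0, 1.0, 0.0);
    }

    fragColor = vec4(tint, 1.0);
}
```

Because the outputs of a multi-pixel fragment are broadcast to all covered pixels in its rectangle, each coarse fragment should show up as a uniformly tinted block.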
Modify Section 8.13.1, Derivative Functions, p. 184 | |
(add a new paragraph before the last paragraph "It is typical to consider | |
a 2x2 square", p. 184) | |
When using a shading rate where each fragment covers multiple columns | |
and/or rows of pixels, the values of dx and/or dy in equations 1b and 2b | |
above will be greater than 1.0. However, we recommend that | |
implementations approximate derivatives in this case using dx = dy = 1.0. | |
Modify Section 8.13.3, Fragment Shader Execution Ordering Functions, as | |
added by ARB_fragment_shader_interlock | |
(rework the paragraph in ARB_fragment_shader_interlock describing the | |
difference in ordering guarantees between "pixel" and "sample" qualifiers | |
to cover the new "shading rate" qualifiers as well) | |
The paired functions beginInvocationInterlockARB() and | |
endInvocationInterlockARB() allow shaders to specify a critical section, | |
inside which stronger execution ordering is guaranteed. When ordering | |
guarantees apply, fragment shader invocations X and Y corresponding to | |
fragments A and B are guaranteed to not execute concurrently when the | |
coverage of A and B is considered to overlap. No ordering guarantees are | |
provided between non-overlapping fragments. | |
When using the "sample_interlock_ordered" or "sample_interlock_unordered" | |
qualifier, mutual exclusion is guaranteed for fragment shader invocations | |
X and Y if and only if at least one sample is covered by both fragments. | |
No mutual exclusion is guaranteed when fragments A and B both cover the | |
same pixel but have no covered samples in common, as is often the case | |
for pixels containing an edge separating two triangles. | |
When using the "pixel_interlock_ordered" or "pixel_interlock_unordered" | |
qualifier, mutual exclusion is guaranteed for fragment shader invocations | |
X and Y if and only if fragments A and B both cover at least one sample | |
from the same pixel. Mutual exclusion is guaranteed in this case even if | |
no sample in the pixel is covered by both fragments. When using "pixel" | |
interlock modes, no ordering guarantees are provided for pairs of fragment | |
shader invocations corresponding to a single fragment. This can occur | |
when using the "sample" auxiliary storage qualifier, OpenGL API commands | |
forcing multiple shader invocations per fragment, a shading rate with | |
greater than one invocation per pixel, or for other | |
implementation-dependent reasons. | |
When using the "shading_rate_interlock_ordered" or | |
"shading_rate_interlock_unordered" qualifier, mutual exclusion is | |
guaranteed for fragment shader invocations X and Y if and only if | |
fragments A and B have one or more associated samples in common. Mutual | |
exclusion is guaranteed in this case even if none of the common samples | |
are covered by both fragments. When the shading rate specifies one or | |
multiple fragment shader invocations per pixel, each invocation is | |
considered to be associated with all the samples of the single pixel | |
belonging to the fragment. When the shading rate specifies that each | |
fragment shader invocation corresponds to multiple pixels, each invocation | |
is considered to be associated with all samples of all pixels belonging to | |
the fragment. | |
(update the discussion of "unordered" and "ordered" to discuss the new | |
qualifiers) | |
When using the "pixel_interlock_unordered", "sample_interlock_unordered", | |
or "shading_rate_interlock_unordered" qualifier, the interlock will ensure | |
that the critical sections of fragment shader invocations X and Y with | |
overlapping coverage will never execute concurrently. ... | |
When using the "pixel_interlock_ordered", "sample_interlock_ordered", or | |
"shading_rate_interlock_ordered" layout qualifier, the critical sections of | |
invocations X and Y with overlapping coverage will be executed in a | |
specific order, based on the relative order assigned to their fragments A | |
and B. ... | |
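For illustration, a critical section using the new qualifiers might look like the sketch below. The image name "hitCount", its binding, and its format are hypothetical, and the sketch assumes that every fragment touching a given region uses the same shading rate, so that overlapping fragments compute the same texel coordinate.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require
#extension GL_NV_shading_rate_image : require

// Mutual exclusion (in primitive order) across all samples associated
// with the fragment, including every pixel of a multi-pixel fragment.
layout(shading_rate_interlock_ordered) in;

// Hypothetical per-fragment counter image.
layout(binding = 0, r32ui) coherent uniform uimage2D hitCount;

void main()
{
    ivec2 cell = ivec2(gl_FragCoord.xy);

    // The interlock functions are called directly in main(), outside any
    // flow control, as ARB_fragment_shader_interlock requires.
    beginInvocationInterlockARB();
    uint n = imageLoad(hitCount, cell).x;
    imageStore(hitCount, cell, uvec4(n + 1u));
    endInvocationInterlockARB();
}
```

Substituting "shading_rate_interlock_unordered" would keep the mutual exclusion but drop the ordering guarantee, which is sufficient for an order-independent update such as this counter.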
Dependencies on ARB_fragment_shader_interlock and NV_fragment_shader_interlock | |
If neither ARB_fragment_shader_interlock nor NV_fragment_shader_interlock | |
is supported, remove references to the "shading_rate_interlock_ordered" | |
and "shading_rate_interlock_unordered" layout qualifiers, and all of the | |
edits to Section 8.13.3 of the GLSL Specification. | |
Issues | |
(1) How should we name this extension? | |
RESOLVED: We are calling this extension "NV_shading_rate_image", based | |
on the name we chose for an API extension supporting this feature. We | |
use the term "shading rate" to indicate the variable number of fragment | |
shader invocations that will be spawned for a particular neighborhood of | |
covered pixels. The API extension can support shading rates running one | |
invocation for multiple pixels and/or multiple invocations for a single | |
pixel. We use "image" in the extension name because the API feature | |
allows applications to control the shading rate using an image, where | |
each pixel specifies a shading rate for a portion of the framebuffer. | |
This particular specification covers only the OpenGL Shading Language | |
(GLSL) portion of the feature, where there is no image involved. So it | |
could also be sensible to use an alternate extension name without | |
"image". When doing the SPIR-V extension covering this functionality, | |
we chose to drop "image" and use simply "NV_shading_rate". But we are | |
currently retaining "NV_shading_rate_image" for the GLSL extension in | |
order to match the API extension name and to avoid name changes to an | |
otherwise complete extension. | |
(2) How do derivatives work for dFdx and texture LOD calculations when a | |
single fragment shader invocation covers multiple pixels? | |
RESOLVED: In the NVIDIA implementation of this extension, derivatives | |
will be computed by differencing, where the "neighboring" fragment | |
shader invocation also covers the same number of pixels. Differencing | |
can be used to approximate derivatives at a given (x,y) by evaluating | |
equations like the following: | |
df/dx(x,y) = (f(x2,y) - f(x1,y)) / (x2 - x1) | |
where <x1> and <x2> are X coordinates of the two fragment shader | |
invocations used for differencing. <x> is typically either <x1> or | |
<x2>. For normal fragment shader invocations, where each fragment | |
typically covers a pixel, we assume that x2-x1 is always 1.0 and reduce | |
the approximation to: | |
df/dx(x,y) = f(x2,y) - f(x1,y) | |
We end up using this equation in this extension, even when fragments | |
cover multiple pixels and x2-x1 is greater than 1.0. If a given | |
fragment shader invocation covers a 2x2 region, this approach means that | |
dFdx and dFdy will return values approximately twice as large as | |
the real derivatives. | |
We could adjust computed derivatives to compensate, but choose not to | |
because using raw differences without adjustment is preferable for | |
texture LOD handling. Derivatives in LOD calculations are used to | |
approximate the set of texels covered by the pixel being processed. If | |
that pixel covers many texels in a full-resolution image, we use a | |
lower-resolution mipmap level to reduce noise. When a fragment shader | |
invocation covers multiple pixels, its footprint in texture space is | |
larger than it would be if it covered a single pixel. In this case, our | |
"mathematically too large" derivatives provide a better approximation of | |
the set of texels involved in the lookup. | |
If a shader using this feature does need mathematically accurate | |
derivatives, it can adjust by dividing through by the X and Y | |
components of gl_FragmentSizeNV. | |
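The adjustment described above can be sketched as follows. The input "uv" and the output "fragColor" are hypothetical names used only for this example.

```glsl
#version 450
#extension GL_NV_shading_rate_image : require

layout(location = 0) in vec2 uv;           // hypothetical interpolated input
layout(location = 0) out vec4 fragColor;   // hypothetical output name

void main()
{
    // Raw differences between neighboring invocations. With a 2x2
    // fragment these are roughly twice the per-pixel derivatives.
    vec2 rawDx = dFdx(uv);
    vec2 rawDy = dFdy(uv);

    // Divide by the fragment width/height to recover per-pixel derivatives.
    vec2 fragSize = vec2(gl_FragmentSizeNV);
    vec2 perPixelDx = rawDx / fragSize.x;
    vec2 perPixelDy = rawDy / fragSize.y;

    fragColor = vec4(abs(perPixelDx), abs(perPixelDy));
}
```

For texture lookups the raw values are usually the better choice, for the LOD reasons given above; the scaled values are only needed where the shader depends on the true per-pixel rate of change.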
(3) What sort of support (if any) should we provide for fragment shader | |
interlocks when using the shading rate image? | |
RESOLVED: We provide new layout qualifiers: | |
shading_rate_interlock_ordered | |
shading_rate_interlock_unordered | |
that extend the semantics of the "pixel" interlock to ensure mutual | |
exclusion over a set of samples whose size depends on the shading rate. | |
When the shading rate specifies one or multiple fragment shader | |
invocations per pixel, these interlock modes are equivalent to the | |
"pixel" modes. When the shading rate specifies one invocation | |
corresponding to multiple pixels, mutual exclusion is guaranteed across | |
the full set of pixels corresponding to the fragment shader invocations. | |
One example where this feature is useful is to support the "pixel" | |
interlock when emulating multisample render targets with more samples | |
than the underlying GPU supports. For example, an application can | |
emulate 16x multisample rendering by using a 4x multisample texture that | |
is twice as wide and tall (in pixels) as a nominal "16x" texture. In | |
this mode, the "pixel" interlock modes would only ensure mutual | |
exclusion for groups of four samples, allowing for up to four concurrent | |
invocations for a given "16x" pixel. When using this extension, a "2x2" | |
shading rate can be selected, which will ensure a single fragment shader | |
invocation for each group of 16 samples. When using the "2x2" shading | |
rate, the "shading_rate" interlock modes ensure mutual exclusion across | |
an entire group of 16 samples. | |
Note that if an application is using a variable shading rate, it must | |
account for the variable fragment size when setting up the shared data | |
structures protected by the interlock. For example, if some portions of | |
a scene are rendered with fragments representing a 2x2-pixel region, an | |
application can use a single data structure for each 2x2 region. | |
However, if other portions of the scene use 1x1-pixel fragments, those | |
areas can't safely share a single data structure for a 2x2 block of | |
pixels, since the "shading_rate" interlock will not prevent multiple 1x1 | |
fragments from different quadrants of a 2x2 block of pixels from | |
executing concurrently. | |
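One possible way to handle a variable shading rate, sketched below, is to keep the shared data at per-pixel granularity and have each invocation update every pixel in its rectangle inside the critical section. This sketch rests on several assumptions that this specification does not state: that gl_FragCoord lies inside the fragment's rectangle, that multi-pixel fragments are aligned to multiples of their own size, and that the shading rate never exceeds one invocation per pixel. The image name "perPixelCount" is hypothetical.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require
#extension GL_NV_shading_rate_image : require

layout(shading_rate_interlock_unordered) in;

// Hypothetical shared data, one texel per framebuffer pixel.
layout(binding = 0, r32ui) coherent uniform uimage2D perPixelCount;

void main()
{
    ivec2 size = gl_FragmentSizeNV;

    // ASSUMPTION: fragments are aligned to multiples of their size and
    // gl_FragCoord falls inside the fragment's rectangle, so rounding
    // down yields the rectangle's first pixel.
    ivec2 base = (ivec2(gl_FragCoord.xy) / size) * size;

    beginInvocationInterlockARB();
    // Update every pixel of the rectangle. Any other fragment sharing
    // one of these pixels is excluded by the "shading_rate" interlock,
    // whether it is 1x1 or larger.
    for (int y = 0; y < size.y; ++y) {
        for (int x = 0; x < size.x; ++x) {
            ivec2 p = base + ivec2(x, y);
            uint n = imageLoad(perPixelCount, p).x;
            imageStore(perPixelCount, p, uvec4(n + 1u));
        }
    }
    endInvocationInterlockARB();
}
```

Note that this sketch updates every pixel in the rectangle, including pixels the primitive does not actually cover; an application that needs exact coverage would have to account for that separately.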
(4) How should we name the built-in variable describing the number of | |
pixels covered by a given fragment? | |
UNRESOLVED: We are using "gl_FragmentSizeNV" and specifying it as a | |
two-component vector. However, it might have been more consistent with | |
existing naming conventions to have used "gl_FragSizeNV" instead. There | |
are a number of GLSL built-ins that use "Frag" as a prefix | |
(gl_FragCoord, gl_FragDepth, gl_FragColor, gl_FragData) and none that | |
use "Fragment". | |
Revision History | |
Revision 2, 2018/09/11 (pbrown) | |
- Minor edits preparing the spec for publication. | |
Revision 1 | |
- Internal revisions. |