Name
NV_shading_rate_image
Name Strings
GL_NV_shading_rate_image
Contact
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Contributors
Daniel Koch, NVIDIA
Mark Kilgard, NVIDIA
Status
Shipping
Version
Last Modified: September 11, 2018
Revision: 2
Dependencies
This extension can be applied to OpenGL GLSL versions 4.50
(#version 450) and higher.
This extension can be applied to OpenGL ES ESSL versions 3.20
(#version 320) and higher.
This extension is written against the OpenGL Shading Language
Specification, version 4.60, dated July 23, 2017.
This extension interacts with ARB_fragment_shader_interlock and
NV_fragment_shader_interlock.
Overview
This extension provides OpenGL Shading Language (GLSL) support for the API
extension "NV_shading_rate_image". In that extension, applications can use
a texture to control the number of fragment shader invocations that will
be spawned for a particular neighborhood of covered pixels. We refer to
the density of fragment shader invocations as the "shading rate". Our
implementation supports shading rates that run one invocation for multiple
pixels as well as rates that run multiple invocations for a single pixel.
The texture used to control the shading rate is referred to as a "shading
rate image", where each texel specifies the shading rate for a fixed-size
region of the framebuffer.
This extension provides GLSL built-in variables that allow the fragment
shader to determine the shading rate used for a particular invocation.
When using a shading rate where a single invocation covers multiple
pixels, the built-in gl_FragmentSizeNV indicates the size of the rectangle
of pixels used for that invocation. When using a shading rate where
multiple invocations can be spawned for each pixel, the built-in
gl_InvocationsPerPixelNV indicates the number of invocations that will be
spawned for a fully covered pixel.
Additionally, if the ARB_fragment_shader_interlock extension is supported,
this extension adds support for new input layout qualifiers that can
ensure mutual exclusion across all pixels covered by a fragment.
Mapping to SPIR-V
-----------------
For informational purposes (non-normative), the following is an
expected way for an implementation to map GLSL constructs to SPIR-V
constructs supported by the SPV_NV_shading_rate extension:
- gl_FragmentSizeNV -> FragmentSizeNV decorated variable
- gl_InvocationsPerPixelNV -> InvocationsPerPixelNV decorated variable
- shading_rate_interlock_ordered -> (not supported - no ARB_fragment_shader_interlock in current SPIR-V)
- shading_rate_interlock_unordered -> (not supported - no ARB_fragment_shader_interlock in current SPIR-V)
Modifications to the OpenGL Shading Language Specification, Version 4.60
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_NV_shading_rate_image : <behavior>
where <behavior> is as specified in section 3.3.
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_NV_shading_rate_image 1
Modify Section 4.4.1.3, Fragment Shader Inputs (p. 65)
(add to the list of layout qualifiers containing "early_fragment_tests",
p. 66, as modified by ARB_fragment_shader_interlock, and modify the
surrounding language to reflect that multiple layout qualifiers are
supported for "in")
layout-qualifier-id
...
shading_rate_interlock_ordered
shading_rate_interlock_unordered
(modify the language added to the end of the section by
ARB_fragment_shader_interlock, p. 66, to reflect the new layout
qualifiers)
The identifiers "pixel_interlock_ordered", "pixel_interlock_unordered",
"sample_interlock_ordered", "sample_interlock_unordered",
"shading_rate_interlock_ordered", and "shading_rate_interlock_unordered"
control the ordering of the execution of shader invocations between calls
to the built-in functions beginInvocationInterlockARB() and
endInvocationInterlockARB(), as described in section 8.13.3. A compile or
link error will be generated if more than one of these layout qualifiers
is specified in shader code. If a program containing a fragment shader
includes none of these layout qualifiers, it is as though
"pixel_interlock_ordered" were specified.
Modify Section 7.1, Built-In Language Variables, p. 122
(add to the list of fragment language variables, middle of p. 124)
in ivec2 gl_FragmentSizeNV;
in int gl_InvocationsPerPixelNV;
(add documentation of the new built-in variables, before
gl_HelperInvocation discussion at the bottom of p.128)
The input variable gl_FragmentSizeNV represents the size of a rectangle of
pixels corresponding to this fragment shader invocation. The first
component is the width of the rectangle (in pixels); the second component
is the height (in pixels). When a shading rate image is not used, or
when running multiple fragment shader invocations per pixel
(multisampling), both components will be one. When using a shading rate
image, either or both components may be greater than one. When the
fragment size is greater than a single pixel, the outputs of the shader
invocation will be broadcast to all covered pixels/samples in the
rectangle.
The input variable gl_InvocationsPerPixelNV represents the maximum number
of fragment shader invocations executed for each pixel, as derived from
the effective shading rate for the fragment. If a primitive does not
fully cover a pixel, the number of fragment shader invocations for its
covered pixels may be less than the value of gl_InvocationsPerPixelNV.
When a shading rate image is not used, this value is a function of the
framebuffer sample counts and the sample shading fraction programmed in
the OpenGL API via MinSampleShading. When using the shading rate image,
this value will also be affected by the contents of the shading rate image
and may depend on the location of the pixel in the framebuffer. If
multisampling is disabled, or if the fragment shader invocation covers
multiple pixels, the value of this input will be one.
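The relationship between the two built-ins described above can be sketched as a small host-side model in C. This is an illustration only and not part of the extension: the ShadingRate struct and both helper functions are hypothetical names, and the consistency rule simply restates the text above (a fragment covering multiple pixels reports one invocation per pixel, and a rate with multiple invocations per pixel reports a 1x1 fragment size).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side mirror of the two fragment shader inputs. */
typedef struct {
    int frag_w;                 /* models gl_FragmentSizeNV.x */
    int frag_h;                 /* models gl_FragmentSizeNV.y */
    int invocations_per_pixel;  /* models gl_InvocationsPerPixelNV */
} ShadingRate;

/* True if the values are consistent with the text above: a fragment that
 * covers multiple pixels reports one invocation per pixel, and a rate with
 * multiple invocations per pixel reports a 1x1 fragment size. */
static bool shading_rate_is_consistent(ShadingRate r) {
    if (r.frag_w < 1 || r.frag_h < 1 || r.invocations_per_pixel < 1)
        return false;
    bool multi_pixel = (r.frag_w > 1 || r.frag_h > 1);
    bool multi_invocation = (r.invocations_per_pixel > 1);
    return !(multi_pixel && multi_invocation);
}

/* Upper bound on the number of pixels that can receive the broadcast
 * outputs of one invocation (fewer if the rectangle is partly uncovered). */
static int max_pixels_per_invocation(ShadingRate r) {
    assert(shading_rate_is_consistent(r));
    return r.frag_w * r.frag_h;
}
```

For example, a 2x2 fragment size with one invocation per pixel gives at most four pixels per invocation, while a 1x1 fragment size with four invocations per pixel gives one.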
Modify Section 8.13.1, Derivative Functions, p. 184
(add a new paragraph before the last paragraph "It is typical to consider
a 2x2 square", p. 184)
When using a shading rate where each fragment covers multiple columns
and/or rows of pixels, the values of dx and/or dy in equations 1b and 2b
above will be greater than 1.0. However, we recommend that
implementations approximate derivatives in this case using dx = dy = 1.0.
Modify Section 8.13.3, Fragment Shader Execution Ordering Functions, as
added by ARB_fragment_shader_interlock
(rework the paragraph in ARB_fragment_shader_interlock describing the
difference in ordering guarantees between "pixel" and "sample" qualifiers
to cover the new "shading rate" qualifiers as well)
The paired functions beginInvocationInterlockARB() and
endInvocationInterlockARB() allow shaders to specify a critical section,
inside which stronger execution ordering is guaranteed. When ordering
guarantees apply, fragment shader invocations X and Y corresponding to
fragments A and B are guaranteed to not execute concurrently when the
coverage of A and B is considered to overlap. No ordering guarantees are
provided between non-overlapping fragments.
When using the "sample_interlock_ordered" or "sample_interlock_unordered"
qualifier, mutual exclusion is guaranteed for fragment shader invocations
X and Y if and only if at least one sample is covered by both fragments.
No mutual exclusion is guaranteed when fragments A and B both cover the
same pixel but have no covered samples in common, as is often the case
for pixels containing an edge separating two triangles.
When using the "pixel_interlock_ordered" or "pixel_interlock_unordered"
qualifier, mutual exclusion is guaranteed for fragment shader invocations
X and Y if and only if fragments A and B both cover at least one sample
from the same pixel. Mutual exclusion is guaranteed in this case even if
no sample in the pixel is covered by both fragments. When using "pixel"
interlock modes, no ordering guarantees are provided for pairs of fragment
shader invocations corresponding to a single fragment. This can occur
when using the "sample" auxiliary storage qualifier, OpenGL API commands
forcing multiple shader invocations per fragment, a shading rate with
greater than one invocation per pixel, or for other
implementation-dependent reasons.
When using the "shading_rate_interlock_ordered" or
"shading_rate_interlock_unordered" qualifier, mutual exclusion is
guaranteed for fragment shader invocations X and Y if and only if
fragments A and B have one or more associated samples in common. Mutual
exclusion is guaranteed in this case even if none of the common samples
are covered by both fragments. When the shading rate specifies one or
multiple fragment shader invocations per pixel, each invocation is
considered to be associated with all the samples of the single pixel
belonging to the fragment. When the shading rate specifies that each
fragment shader invocation corresponds to multiple pixels, each invocation
is considered to be associated with all samples of all pixels belonging to
the fragment.
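The three overlap rules above can be compared with a small host-side model in C. This is a sketch under stated assumptions, not an implementation of the interlock: it assumes a tiny 4x4-pixel framebuffer with 4 samples per pixel so that every sample fits in one 64-bit mask, and the Fragment struct and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed toy framebuffer: 4 * 4 pixels * 4 samples = 64 sample bits. */
enum { FB_W = 4, FB_H = 4, SAMPLES = 4 };

typedef struct {
    uint64_t covered;     /* samples actually covered by the primitive */
    uint64_t associated;  /* all samples of all pixels belonging to the fragment */
} Fragment;

/* Bit for sample s of pixel (x, y). */
static uint64_t sample_bit(int x, int y, int s) {
    assert(x >= 0 && x < FB_W && y >= 0 && y < FB_H && s >= 0 && s < SAMPLES);
    return (uint64_t)1 << (SAMPLES * (y * FB_W + x) + s);
}

/* All samples of all pixels in the w x h rectangle with origin (x, y). */
static uint64_t rect_bits(int x, int y, int w, int h) {
    uint64_t bits = 0;
    for (int j = y; j < y + h; ++j)
        for (int i = x; i < x + w; ++i)
            for (int s = 0; s < SAMPLES; ++s)
                bits |= sample_bit(i, j, s);
    return bits;
}

/* "sample" modes: at least one sample is covered by both fragments. */
static bool excl_sample(Fragment a, Fragment b) {
    return (a.covered & b.covered) != 0;
}

/* "pixel" modes: both fragments cover at least one sample of the same pixel. */
static bool excl_pixel(Fragment a, Fragment b) {
    for (int y = 0; y < FB_H; ++y)
        for (int x = 0; x < FB_W; ++x) {
            uint64_t px = rect_bits(x, y, 1, 1);
            if ((a.covered & px) != 0 && (b.covered & px) != 0)
                return true;
        }
    return false;
}

/* "shading_rate" modes: the fragments have associated samples in common,
 * whether or not those samples are covered. */
static bool excl_shading_rate(Fragment a, Fragment b) {
    return (a.associated & b.associated) != 0;
}
```

In this model, two 2x2 fragments over the same block that cover samples in different pixels are mutually exclusive only under the "shading_rate" rule, while two 1x1 fragments covering different samples of one pixel are mutually exclusive under both the "pixel" and "shading_rate" rules but not the "sample" rule.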
(update the discussion of "unordered" and "ordered" to discuss the new
qualifiers)
When using the "pixel_interlock_unordered", "sample_interlock_unordered",
or "shading_rate_interlock_unordered" qualifier, the interlock will ensure
that the critical sections of fragment shader invocations X and Y with
overlapping coverage will never execute concurrently. ...
When using the "pixel_interlock_ordered", "sample_interlock_ordered", or
"shading_rate_interlock_ordered" layout qualifier, the critical sections of
invocations X and Y with overlapping coverage will be executed in a
specific order, based on the relative order assigned to their fragments A
and B. ...
Dependencies on ARB_fragment_shader_interlock and NV_fragment_shader_interlock
If neither ARB_fragment_shader_interlock nor NV_fragment_shader_interlock
is supported, remove references to the "shading_rate_interlock_ordered"
and "shading_rate_interlock_unordered" layout qualifiers, and all of the
edits to Section 8.13.3 of the GLSL Specification.
Issues
(1) How should we name this extension?
RESOLVED: We are calling this extension "NV_shading_rate_image", based
on the name we chose for an API extension supporting this feature. We
use the term "shading rate" to indicate the variable number of fragment
shader invocations that will be spawned for a particular neighborhood of
covered pixels. The API extension can support shading rates running one
invocation for multiple pixels and/or multiple invocations for a single
pixel. We use "image" in the extension name because the API feature
allows applications to control the shading rate using an image, where
each pixel specifies a shading rate for a portion of the framebuffer.
This particular specification covers only the OpenGL Shading Language
(GLSL) portion of the feature, where there is no image involved. So it
could also be sensible to use an alternate extension name without
"image". When doing the SPIR-V extension covering this functionality,
we chose to drop "image" and use simply "NV_shading_rate". But we are
currently retaining "NV_shading_rate_image" for the GLSL extension in
order to match the API extension name and to avoid name changes to an
otherwise complete extension.
(2) How do derivatives work for dFdx and texture LOD calculations when a
single fragment shader invocation covers multiple pixels?
RESOLVED: In the NVIDIA implementation of this extension, derivatives
will be computed by differencing, where the "neighboring" fragment
shader invocation also covers the same number of pixels. Differencing
can be used to approximate derivatives at a given (x,y) by evaluating
equations like the following:
df/dx(x,y) = (f(x2,y) - f(x1,y)) / (x2 - x1)
where <x1> and <x2> are the X coordinates of the two fragment shader
invocations used for differencing. <x> is typically either <x1> or
<x2>. For normal fragment shader invocations, where each fragment
typically covers a pixel, we assume that x2-x1 is always 1.0 and reduce
the approximation to:
df/dx(x,y) = f(x2,y) - f(x1,y)
We end up using this equation in this extension, even when fragments
cover multiple pixels and x2-x1 is greater than 1.0. If a given
fragment shader invocation covers a 2x2 region, this approach means that
dFdx and dFdy will return values approximately twice as large as
the real derivatives.
We could adjust computed derivatives to compensate, but choose not to
because using raw differences without adjustment is preferable for
texture LOD handling. Derivatives in LOD calculations are used to
approximate the set of texels covered by the pixel being processed. If
that pixel covers many texels in a full-resolution image, we use a
lower-resolution mipmap level to reduce noise. When a fragment shader
invocation covers multiple pixels, its footprint in texture space is
larger than it would be if it covered a single pixel. In this case, our
"mathematically too large" derivatives provide a better approximation of
the set of texels involved in the lookup.
If a shader using this feature does need mathematically accurate
derivatives, it can adjust by dividing through by the X and Y
components of gl_FragmentSizeNV.
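The effect of differencing without adjustment can be checked numerically with a short C sketch. The function f and both helpers are hypothetical and exist only for this illustration; f is an arbitrary linear attribute whose true derivative is 0.25 per pixel, and the fragment width stands in for the X component of gl_FragmentSizeNV.

```c
#include <assert.h>

/* Example attribute that varies linearly in X: true df/dx = 0.25 per pixel. */
static double f(double x) {
    return 0.25 * x + 1.0;
}

/* Raw difference between two horizontally adjacent invocations whose
 * fragments are frag_w pixels wide, so x2 - x1 == frag_w. No division is
 * applied, which models treating x2 - x1 as 1.0. */
static double raw_dfdx(double x1, int frag_w) {
    assert(frag_w >= 1);
    double x2 = x1 + (double)frag_w;
    return f(x2) - f(x1);
}

/* Mathematically accurate derivative: divide the raw difference by the
 * fragment width, as a shader could do with gl_FragmentSizeNV.x. */
static double adjusted_dfdx(double x1, int frag_w) {
    return raw_dfdx(x1, frag_w) / (double)frag_w;
}
```

With a fragment width of 1 the raw difference equals the true derivative (0.25). With a width of 2 the raw difference doubles to 0.5, and dividing by the width recovers 0.25.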
(3) What sort of support (if any) should we provide for fragment shader
interlocks when using the shading rate image?
RESOLVED: We provide new layout qualifiers:
shading_rate_interlock_ordered
shading_rate_interlock_unordered
that extend the semantics of the "pixel" interlock to ensure mutual
exclusion over a set of samples whose size depends on the shading rate.
When the shading rate specifies one or multiple fragment shader
invocations per pixel, these interlock modes are equivalent to the
"pixel" modes. When the shading rate specifies one invocation
corresponding to multiple pixels, mutual exclusion is guaranteed across
the full set of pixels corresponding to the fragment shader invocation.
One example where this feature is useful is to support the "pixel"
interlock when emulating multisample render targets with more samples
than the underlying GPU supports. For example, an application can
emulate 16x multisample rendering by using a 4x multisample texture that
is twice as wide and tall (in pixels) as a nominal "16x" texture. In
this mode, the "pixel" interlock modes would only ensure mutual
exclusion for groups of four samples, allowing for up to four concurrent
invocations for a given "16x" pixel. When using this extension, a "2x2"
shading rate can be selected, which will ensure a single fragment shader
invocation for each group of 16 samples. When using the "2x2" shading
rate, the "shading_rate" interlock modes ensure mutual exclusion across
an entire group of 16 samples.
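The sample counts in this example follow from simple arithmetic, sketched here in C. The helper names are hypothetical and the code only restates the numbers above: a 4x multisample texture, twice as wide and tall as the nominal "16x" texture, rendered with a "2x2" shading rate.

```c
#include <assert.h>

/* Samples associated with one fragment: every sample of every pixel in a
 * frag_w x frag_h rectangle. */
static int samples_per_fragment(int samples_per_pixel, int frag_w, int frag_h) {
    assert(samples_per_pixel >= 1 && frag_w >= 1 && frag_h >= 1);
    return samples_per_pixel * frag_w * frag_h;
}

/* Physical pixels that make up one nominal pixel when the physical texture
 * is `scale` times as wide and tall as the nominal one. */
static int physical_pixels_per_nominal(int scale) {
    assert(scale >= 1);
    return scale * scale;
}
```

Here samples_per_fragment(4, 1, 1) is 4, the group protected by the "pixel" modes, and physical_pixels_per_nominal(2) is 4, the number of invocations that could run concurrently for one nominal pixel. With the "2x2" shading rate, samples_per_fragment(4, 2, 2) is 16, matching the group protected by the "shading_rate" modes.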
Note that if an application is using a variable shading rate, it must
account for the variable fragment size when setting up the shared data
structures protected by the interlock. For example, if some portions of
a scene are rendered with fragments representing a 2x2-pixel region, an
application can use a single data structure for each 2x2 region.
However, if other portions of the scene use 1x1-pixel fragments, those
areas can't safely share a single data structure for a 2x2 block of
pixels, since the "shading_rate" interlock will not prevent multiple 1x1
fragments from different quadrants of a 2x2 block of pixels from
executing concurrently.
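The hazard described above can be illustrated with a minimal C model. It assumes, for illustration only, that each fragment is an axis-aligned rectangle of pixels and that every sample of every pixel in the rectangle is associated with the fragment, so two fragments have associated samples in common exactly when their rectangles share a pixel. FragmentRect and shading_rate_exclusive are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Fragment footprint: origin and size in pixels. */
typedef struct {
    int x, y, w, h;
} FragmentRect;

/* Under the "shading_rate" modes in this model, mutual exclusion applies
 * when the two rectangles share at least one pixel. */
static bool shading_rate_exclusive(FragmentRect a, FragmentRect b) {
    assert(a.w >= 1 && a.h >= 1 && b.w >= 1 && b.h >= 1);
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}
```

Two 2x2 fragments over the same block are mutually exclusive in this model, as are a 2x2 fragment and a 1x1 fragment inside it. Two 1x1 fragments at (0,0) and (1,1) are not, so a data structure shared by the whole 2x2 block would be unprotected in regions rendered with 1x1 fragments.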
(4) How should we name the built-in variable describing the number of
pixels covered by a given fragment?
UNRESOLVED: We are using "gl_FragmentSizeNV" and specifying it as a
two-component vector. However, it might have been more consistent with
existing naming conventions to have used "gl_FragSizeNV" instead. There
are a number of GLSL built-ins that use "Frag" as a prefix
(gl_FragCoord, gl_FragDepth, gl_FragColor, gl_FragData) and none that
use "Fragment".
Revision History
Revision 2, 2018/09/11 (pbrown)
- Minor edits preparing the spec for publication.
Revision 1
- Internal revisions.