Name
NV_mesh_shader
Name String
GL_NV_mesh_shader
Contact
Christoph Kubisch, NVIDIA (ckubisch 'at' nvidia.com)
Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
Contributors
Yury Uralsky, NVIDIA
Daniel Koch, NVIDIA
Status
Shipping
Version
Last Modified Date: October 5, 2018
NVIDIA Revision: 5
Dependencies
This extension can be applied to OpenGL GLSL versions 4.50
(#version 450) and higher.
This extension can be applied to OpenGL ES ESSL versions 3.20
(#version 320) and higher.
This extension is written against the GLSL 4.50.6 Specification
(Compatibility Profile), dated April 14, 2016.
This extension interacts with GLSL 4.60 and KHR_vulkan_glsl.
This extension interacts with NV_viewport_array2.
This extension interacts with NV_stereo_view_rendering.
This extension interacts with NVX_multiview_per_view_attributes.
This extension interacts with ARB_shader_draw_parameters.
Overview
This extension provides a new mechanism allowing applications to use two
new programmable shader types -- the task and mesh shader -- to generate
collections of geometric primitives to be processed by fixed-function
primitive assembly and rasterization logic. When the task and mesh
shaders are dispatched, they replace the standard programmable vertex
processing pipeline, including vertex array attribute fetching, vertex
shader processing, tessellation, and geometry shader processing.
Both new shader types have execution environments similar to that of
compute shaders, where a collection of shader invocations form a work
group and cooperate to produce a set of outputs. Unlike traditional
vertex, tessellation, and geometry shaders that typically process a vertex
or primitive at a time, the mesh and task shaders process and generate a
batch of primitives at once. The optional task shader pre-processes
geometry and generates a variable number of mesh shader tasks. The mesh
shader evaluates the geometry corresponding to its task and emits a mesh
-- a collection of vertices arranged into point, line, or triangle
primitives. The primitives emitted by the mesh shader are then processed
by fixed-function primitive assembly and rasterization logic and generate
fragments that will be processed by the fragment shader.
Work is submitted to the mesh pipeline by an API launch that spawns a
one-dimensional array of tasks, much as an API compute dispatch spawns a
three-dimensional array of compute shader work groups. If a task shader is
present, each task generated by
this launch spawns a task shader work group. If no task shader is
present, each task generated by the launch spawns a mesh shader
work group.
When a task shader work group is executed, its invocations execute in
parallel and evaluate geometry associated with the task. The task shader
has no built-in or user-defined input variables other than the built-ins
identifying the work group and invocation being executed. The task shader
can use that information to read properties of the geometry associated
with the task from memory, using shader storage buffers, textures, or
other resources. The task shader determines the number of mesh shader
tasks that should be spawned for the task it is processing and writes the
task count to the built-in variable gl_TaskCountNV. Additionally, the
task shader can compute additional properties of the geometry it processes
and write them to task memory through user-defined output variables
qualified with "taskNV"; all of the mesh shaders that it spawns can read
these values as inputs. The task shader can be used to drive level-of-detail
calculations for procedurally generated geometry, to perform coarse-level
culling for batches of static or dynamic geometry, and for other forms of
work reduction or amplification.
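As a non-normative illustration, the coarse-level culling use case might be
sketched as the task shader below. The Visibility buffer, its binding, the
meshletCount uniform, and the Task block layout are hypothetical names
chosen for this sketch; only gl_TaskCountNV, the "taskNV" qualifier, and the
work group built-ins come from this extension.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;

// Hypothetical application data: one visibility flag per meshlet.
layout(std430, binding = 0) readonly buffer Visibility { uint visible[]; };
uniform uint meshletCount;  // hypothetical: total number of meshlets

// The single task memory output block, readable by every spawned mesh
// shader work group.
taskNV out Task {
    uint meshletIndices[32];
} OUT;

shared uint survivorCount;

void main()
{
    if (gl_LocalInvocationIndex == 0u)
        survivorCount = 0u;
    memoryBarrierShared();
    barrier();

    // Each invocation tests one meshlet and appends survivors to task memory.
    uint meshlet = gl_GlobalInvocationID.x;
    if (meshlet < meshletCount && visible[meshlet] != 0u) {
        uint slot = atomicAdd(survivorCount, 1u);
        OUT.meshletIndices[slot] = meshlet;
    }
    memoryBarrierShared();
    barrier();

    // One invocation publishes the number of mesh shader work groups to spawn.
    if (gl_LocalInvocationIndex == 0u)
        gl_TaskCountNV = survivorCount;
}
```

The order of entries in meshletIndices depends on the order in which the
atomic additions complete, so it should not be relied upon.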
When a mesh shader work group is executed, its invocations execute in
parallel to evaluate geometry corresponding to its task and emit a mesh
for further processing by subsequent pipeline stages. As with task
shaders, mesh shaders have no built-in inputs other than those identifying
the work group and invocation being executed, and must fetch their inputs
explicitly from memory. The mesh shader invocations collectively must
produce a mesh, which consists of:
* a primitive count, written to the built-in output gl_PrimitiveCountNV;
* a collection of vertex attributes, where each vertex in the mesh has a
set of built-in and user-defined per-vertex output variables and blocks;
* a collection of primitive attributes, where each of the
gl_PrimitiveCountNV primitives in the mesh has a set of built-in and
user-defined per-primitive output variables and blocks; and
* an array of vertex index values written to the built-in output array
gl_PrimitiveIndicesNV, where each output primitive has a set of one,
two, or three indices that identify the output vertices in the mesh used
to form the primitive.
The number of primitives and vertices emitted by the mesh shader can be
variable, but the mesh shader must specify maximum vertex and primitive
counts. There are implementation-dependent limits on the number of
vertices and primitives emitted by the mesh shader, and there are also
implementation-dependent limits on the total amount of memory consumed by
a mesh. In the initial implementation of this extension, implementation
limits are sufficiently low that complex geometry will need to be
decomposed into multiple tasks.
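As a non-normative illustration, the components of a mesh described above
can be sketched as a single-invocation mesh shader that emits one triangle.
The vertexColor output and the vertex positions are placeholders chosen for
this sketch; gl_MeshVerticesNV is the built-in per-vertex output block
referred to later in this extension.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;

// Primitive type plus the required maximum vertex and primitive counts.
layout(triangles, max_vertices = 3, max_primitives = 1) out;

// User-defined per-vertex output; implicitly sized by max_vertices.
out vec3 vertexColor[];

void main()
{
    // Per-vertex attributes.
    gl_MeshVerticesNV[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);
    vertexColor[0] = vec3(1.0, 0.0, 0.0);
    vertexColor[1] = vec3(0.0, 1.0, 0.0);
    vertexColor[2] = vec3(0.0, 0.0, 1.0);

    // Three vertex indices for the single triangle.
    gl_PrimitiveIndicesNV[0] = 0u;
    gl_PrimitiveIndicesNV[1] = 1u;
    gl_PrimitiveIndicesNV[2] = 2u;

    // Number of primitives actually emitted (at most max_primitives).
    gl_PrimitiveCountNV = 1u;
}
```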
A typical mesh shader used to render static triangle data might operate in
three phases. The first phase fetches vertex position data and local
index data of the primitives that the mesh represents. The index data
would have been prepared offline to leverage vertex re-use within the
mesh. In the second phase, triangles would be culled and output primitive
indices written. Finally, other vertex attributes of the surviving subset
of vertices would be loaded and computed. During this process, the
invocations would sometimes work on a per-vertex and sometimes on a
per-primitive level.
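The three phases might be sketched as follows. This is an illustrative
outline under stated assumptions, not a reference implementation: the
Meshlet structure, the four storage buffers and their bindings, the mvp
uniform, the packing of three 8-bit local indices into one uint, the
counter-clockwise front-face convention, and the 64-vertex / 126-primitive
maximums are all hypothetical choices made for this example. For brevity the
third phase loads attributes for every vertex rather than only the surviving
subset.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;
layout(triangles, max_vertices = 64, max_primitives = 126) out;

// Hypothetical meshlet layout prepared offline by the application.
struct Meshlet { uint vertexOffset; uint vertexCount; uint primOffset; uint primCount; };
layout(std430, binding = 0) readonly buffer Meshlets  { Meshlet meshlets[]; };
layout(std430, binding = 1) readonly buffer Positions { vec4 positions[]; };
layout(std430, binding = 2) readonly buffer Colors    { vec4 colors[]; };
layout(std430, binding = 3) readonly buffer Triangles { uint packedTris[]; };

uniform mat4 mvp;  // hypothetical model-view-projection matrix

out vec3 vertexColor[];

shared vec4 clipPos[64];
shared uint survivingPrims;

void main()
{
    Meshlet m = meshlets[gl_WorkGroupID.x];
    uint lane = gl_LocalInvocationIndex;
    uint step = gl_WorkGroupSize.x;

    if (lane == 0u)
        survivingPrims = 0u;

    // Phase 1: fetch and transform vertex positions (per-vertex work).
    for (uint v = lane; v < m.vertexCount; v += step) {
        vec4 p = mvp * vec4(positions[m.vertexOffset + v].xyz, 1.0);
        clipPos[v] = p;
        gl_MeshVerticesNV[v].gl_Position = p;
    }
    memoryBarrierShared();
    barrier();

    // Phase 2: cull back-facing triangles and write the indices of the
    // survivors (per-primitive work).
    for (uint t = lane; t < m.primCount; t += step) {
        uint tri = packedTris[m.primOffset + t];
        uint a = tri & 0xFFu;
        uint b = (tri >> 8) & 0xFFu;
        uint c = (tri >> 16) & 0xFFu;
        // Homogeneous facing test: positive for counter-clockwise triangles.
        float facing = determinant(mat3(clipPos[a].xyw, clipPos[b].xyw, clipPos[c].xyw));
        if (facing > 0.0) {
            uint slot = atomicAdd(survivingPrims, 1u);
            gl_PrimitiveIndicesNV[3u * slot + 0u] = a;
            gl_PrimitiveIndicesNV[3u * slot + 1u] = b;
            gl_PrimitiveIndicesNV[3u * slot + 2u] = c;
        }
    }
    memoryBarrierShared();
    barrier();

    // Phase 3: load the remaining vertex attributes (per-vertex work).
    for (uint v = lane; v < m.vertexCount; v += step)
        vertexColor[v] = colors[m.vertexOffset + v].rgb;

    if (lane == 0u)
        gl_PrimitiveCountNV = survivingPrims;
}
```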
Additionally, mesh shaders include infrastructure to allow a single mesh
shader work group to compute a mesh with multiple "views" (e.g., left and
right eye views for stereoscopic rendering), using a "view index" similar
to the view IDs used in the OVR_multiview (OpenGL and OpenGL ES) and
VK_KHR_multiview (Vulkan) extensions. Unlike those extensions, the
programming model here does not run separate shader invocations for each
view but instead allows shaders to designate individual outputs as
"per-view". When a mesh shader completes, its primitives will be
processed separately for each view with fragments directed at separate
layers of the framebuffer. For each view, outputs designated as per-view
(such as position) will take on values written for that view and all other
outputs will take on a single shared value written for all views.
 Conventional               From Application
    Vertex                          |
   Pipeline                         v
                            Launch Mesh Tasks
   (Fig 3.1)                        |
       |                        +---+-----+
       |                        |         |
       |                        |         |
       |                        |    Task Shader ---+
       |                        |         |         |
       |                        |         v         |
       |                        |  Task Generation  |     Image Load/Store
       |                        |         |         |     Atomic Counter
       |                        +---+-----+         |<--> Shader Storage
       |                            |               |     Texture Fetch
       |                            v               |     Uniform Block
       |                       Mesh Shader ---------+
       |                            |               |
       +--------------------------> +               |
                                    |               |
                                    v               |
                              Rasterization         |
                                    |               |
                                    v               |
                             Fragment Shader -------+
                                    |
                                    v
                         Per-Fragment Operations
                                    |
                                    v
                               Framebuffer

                         Mesh Processing Pipeline
Mapping to SPIR-V
-----------------
For informational purposes (non-normative), the following is an
expected way for an implementation to map GLSL constructs to SPIR-V
constructs:
task shader -> TaskNV Execution model
mesh shader -> MeshNV Execution model
shared qualifier -> Workgroup Storage Class (existing)
points layout qualifier -> OutputPoints Execution Mode (existing)
lines layout qualifier -> OutputLinesNV Execution Mode
triangles layout qualifier -> OutputTrianglesNV Execution Mode
max_vertices layout qualifier -> OutputVertices Execution Mode (existing)
max_primitives layout qualifier -> OutputPrimitivesNV Execution Mode
local_size_(xyz) layout qualifiers -> LocalSize Execution Mode (existing)
local_size_(xyz)_id layout qualifiers -> LocalSizeId Execution Mode (existing)
perprimitiveNV auxiliary storage qualifier -> PerPrimitiveNV Decoration
perviewNV auxiliary storage qualifier -> PerViewNV Decoration
taskNV auxiliary storage qualifier -> PerTaskNV Decoration
gl_WorkGroupSize -> WorkgroupSize decorated OpVariable (existing)
gl_WorkGroupID -> WorkgroupId decorated OpVariable (existing)
gl_LocalInvocationID -> LocalInvocationId decorated OpVariable (existing)
gl_GlobalInvocationID -> GlobalInvocationId decorated OpVariable (existing)
gl_LocalInvocationIndex -> LocalInvocationIndex decorated OpVariable (existing)
gl_TaskCountNV -> TaskCountNV decorated OpVariable
gl_PrimitiveCountNV -> PrimitiveCountNV decorated OpVariable
gl_PrimitiveIndicesNV -> PrimitiveIndicesNV decorated OpVariable
gl_Position -> Position decorated OpVariable (existing)
gl_PositionPerViewNV -> PositionPerViewNV decorated OpVariable (existing extension)
gl_PointSize -> PointSize decorated OpVariable (existing)
gl_ClipDistance -> ClipDistance decorated OpVariable (existing)
gl_ClipDistancePerViewNV -> ClipDistancePerViewNV decorated OpVariable
gl_CullDistance -> CullDistance decorated OpVariable (existing)
gl_CullDistancePerViewNV -> CullDistancePerViewNV decorated OpVariable
gl_PrimitiveID -> PrimitiveId decorated OpVariable (existing)
gl_Layer -> Layer decorated OpVariable (existing)
gl_LayerPerViewNV -> LayerPerViewNV decorated OpVariable
gl_ViewportIndex -> ViewportIndex decorated OpVariable (existing)
gl_ViewportMask -> ViewportMaskNV decorated OpVariable (existing extension)
gl_ViewportMaskPerViewNV -> ViewportMaskPerViewNV decorated OpVariable (existing extension)
gl_MeshViewCountNV -> MeshViewCountNV decorated OpVariable
gl_MeshViewIndicesNV -> MeshViewIndicesNV decorated OpVariable
gl_DrawID -> DrawIndex decorated OpVariable (existing 1.3, extension)
gl_MeshPerVertexNV -> block name, not needed
gl_MeshPerPrimitiveNV -> block name, not needed
writePackedPrimitiveIndices4x8NV -> OpWritePackedPrimitiveIndices4x8NV()
Modifications to the OpenGL Shading Language Specification, Version 4.50.6
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_NV_mesh_shader : <behavior>
where <behavior> is as specified in section 3.3.
A new preprocessor #define is added to the OpenGL Shading Language:
#define GL_NV_mesh_shader 1
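For illustration, a fragment shader might use the preprocessor define to
select between a per-primitive input and a conventional flat input. This is
a minimal sketch; the triangleNormal input, the fragColor output, and the
fallback path are hypothetical and are not defined by this extension.

```glsl
#version 450

#ifdef GL_NV_mesh_shader
#extension GL_NV_mesh_shader : require
// Written once per primitive by a mesh shader.
perprimitiveNV in vec3 triangleNormal;
#else
// Hypothetical fallback for a conventional vertex pipeline.
flat in vec3 triangleNormal;
#endif

layout(location = 0) out vec4 fragColor;

void main()
{
    // Visualize the normal as a color.
    fragColor = vec4(normalize(triangleNormal) * 0.5 + 0.5, 1.0);
}
```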
Modify the introduction to Chapter 2, Overview of OpenGL Shading (p. 7)
(modify first paragraph) ... Currently, these processors are the vertex,
tessellation control, tessellation evaluation, geometry, fragment,
compute, task, and mesh processors.
(modify second paragraph) ... The specific languages will be referred to
by the name of the processor they target: vertex, tessellation control,
tessellation evaluation, geometry, fragment, compute, task, or mesh.
Insert new sections at the end of Chapter 2 (p. 9)
Section 2.7, Task Processor
The task processor is a programmable unit that operates in conjunction
with the mesh processor to produce a collection of primitives that will be
processed by subsequent stages of the graphics pipeline. The task and
mesh processors form a primitive processing pipeline that can be used
instead of the conventional primitive processing pipeline that includes
the vertex, tessellation control, tessellation evaluation, and geometry
processors. Compilation units written in the OpenGL Shading Language to
run on this processor are called task shaders. When a set of task shaders
is successfully compiled and linked, they result in a task shader
executable that runs on the task processor.
A task shader has access to many of the same resources as fragment and
other shader processors, including textures, buffers, image variables, and
atomic counters. The task shader has no fixed-function inputs other than
variables identifying the specific work group and invocation; any vertex
attributes or other data required by the task shader must be fetched from
memory. The only fixed output of the task shader is a task count,
identifying the number of mesh shader work groups to spawn. The task
shader can write additional outputs to task memory, which can be read by
all of the mesh shader work groups it spawns.
A task shader operates on a group of work items called a work group. A
work group is a collection of shader invocations that execute the same
code, potentially in parallel. An invocation within a work group may share
data with other members of the same work group through shared variables
and issue memory and control barriers to synchronize with other members of
the same work group.
Section 2.8, Mesh Processor
The mesh processor is a programmable unit that operates in conjunction
with the task processor to produce a collection of primitives that will be
processed by subsequent stages of the graphics pipeline. The task and
mesh processors form a primitive processing pipeline that can be used
instead of the conventional primitive processing pipeline that includes
the vertex, tessellation control, tessellation evaluation, and geometry
processors. Compilation units written in the OpenGL Shading Language to
run on this processor are called mesh shaders. When a set of mesh shaders
is successfully compiled and linked, they result in a mesh shader
executable that runs on the mesh processor.
A mesh shader has access to many of the same resources as fragment and
other shader processors, including textures, buffers, image variables, and
atomic counters. The only inputs available to the mesh shader are
variables identifying the specific work group and invocation and any
outputs written to task memory by the task shader that spawned the mesh
shader's work group. Any vertex attributes or other data required by the
mesh shader must be fetched from memory. The invocations of the mesh
shader work group write an output mesh, comprising a set of primitives
with per-primitive attributes, a set of vertices with per-vertex
attributes, and an array of indices identifying the mesh vertices that
belong to each primitive. The primitives of this mesh are then processed
by subsequent graphics pipeline stages, where the outputs of the mesh
shader form an interface with the fragment shader.
A mesh shader operates on a group of work items called a work group. A
work group is a collection of shader invocations that execute the same
code, potentially in parallel. An invocation within a work group may share
data with other members of the same work group through shared variables
and issue memory and control barriers to synchronize with other members of
the same work group.
Modify Section 3.6, Keywords (p. 18)
(add to the end of the list of keywords, p. 19)
perprimitiveNV
perviewNV
taskNV
Modify Section 3.8.2, Dynamically Uniform Expressions and Uniform Control
Flow (p. 21)
(modify third paragraph of this section)
An invocation group is the complete set of invocations collectively
processing a particular compute, task, or mesh shader workgroup, or a
graphical operation, where the scope ...
Modify Section 4.3, Storage Qualifiers (p. 43)
(modify table of base storage qualifiers, p. 43)
Qualifier           Meaning
------------------  -----------------------------------------------
shared              variable storage for compute, task, and mesh shaders
                    shared across all work items in a local work group
(add to table of auxiliary storage qualifiers, p. 44)
Auxiliary Storage
Qualifier           Meaning
------------------  -----------------------------------------------
perprimitiveNV      mesh shader outputs with per-primitive instances
perviewNV           mesh shader outputs with per-view instances
taskNV              generic outputs for task shader work groups
Modify Section 4.3.4, Input Variables (p. 46)
(modify third paragraph, p. 47, to treat all mesh shader outputs as
"arrayed" interfaces)
Some inputs and outputs are arrayed ... Geometry shader inputs,
tessellation control shader inputs and outputs, tessellation evaluation
inputs, and mesh shader outputs all have an additional level of arrayness
relative to other shader inputs and outputs. Component limits for these
arrayed interfaces (e.g., gl_MaxTessControlInputComponents) are limits for
a single instance and not for the entire interface.
(insert before the last paragraph, p. 47, "Fragment shader inputs get")
Task shaders do not permit user-defined input variables and do not form a
formal interface with any previous shader stage. See section 7.1 "Built-In
Variables" for a description of built-in task shader input variables. All
other input to a task shader is retrieved explicitly through image loads,
texture fetches, loads from uniforms, uniform buffers, or shader storage
buffers, or other user supplied code. Redeclaration of built-in input
variables in task shaders is not permitted.
Mesh shaders form an interface with task shaders and support a collection
of input variables in task memory. All user-defined mesh shader inputs
must be declared as members of a single interface block qualified with the
"taskNV" qualifier. Mesh shaders do not support user-defined inputs
declared outside interface blocks or without "taskNV" and do not support
more than one input interface block. In addition to user-defined inputs,
mesh shaders support the built-in input variables described in section
7.1. User-defined mesh shader input variables are filled with the values
of matching user-defined output variables written by the task shader. As
with other input variables, mesh shader inputs in task memory must be
declared using the same type and qualification as task memory outputs from
the previous (task) shader stage. It is a compile-time error to use the
"taskNV" qualifier with inputs in any stage other than the mesh shader.
All other input to a mesh shader is retrieved explicitly through image
loads, texture fetches, loads from uniforms, uniform buffers, or shader
storage buffers, or other user supplied code. Redeclaration of built-in
input variables in mesh shaders is not permitted.
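As a non-normative sketch, a mesh shader might consume task memory as shown
below. The Task block contents, the Centers buffer, and its binding are
hypothetical; the block must be declared with the same type and
qualification as the "taskNV out" block of the task shader it is linked
with. The sketch also assumes that gl_WorkGroupID.x numbers the mesh shader
work groups spawned by a single task starting from zero.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(points, max_vertices = 1, max_primitives = 1) out;

// The single permitted input block; filled from the matching task shader
// output block.
taskNV in Task {
    uint meshletIndices[32];
} IN;

// Hypothetical application data, fetched explicitly from memory.
layout(std430, binding = 0) readonly buffer Centers { vec4 centers[]; };

void main()
{
    // Look up which piece of geometry this work group was spawned for.
    uint meshlet = IN.meshletIndices[gl_WorkGroupID.x];

    // Emit a single point at that geometry's center.
    gl_MeshVerticesNV[0].gl_Position  = vec4(centers[meshlet].xyz, 1.0);
    gl_MeshVerticesNV[0].gl_PointSize = 4.0;
    gl_PrimitiveIndicesNV[0] = 0u;
    gl_PrimitiveCountNV = 1u;
}
```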
(modify last paragraph, p. 47)
Fragment shader inputs get... The auxiliary storage qualifiers centroid,
sample, and perprimitiveNV can also be applied, as well as...
(modify first paragraph, p. 48)
Fragment shader inputs that are signed or unsigned integers, integer
vectors, or any double-precision floating-point type must be qualified
with the interpolation qualifier flat or with the auxiliary storage
qualifier perprimitiveNV.
(add a new example to the second paragraph, p. 48)
perprimitiveNV in vec3 triangleNormal;
(modify third paragraph, p. 48)
The fragment shader inputs form an interface with the mesh shader or last
active shader in the conventional vertex processing pipeline (e.g.,
vertex, tessellation evaluation, geometry). ... Also, interpolation
qualification (e.g., flat) and auxiliary qualification other than
"perprimitiveNV" (e.g. centroid) may differ. ...
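For illustration, a fragment shader consuming both per-vertex and
per-primitive mesh shader outputs might look like the sketch below. The
input names, the materialID input, and the shading itself are hypothetical;
they are assumed to match outputs of the same names and types declared in
the mesh shader.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

in vec3 vertexColor;                    // per-vertex, interpolated
perprimitiveNV in vec3 triangleNormal;  // per-primitive, not interpolated
perprimitiveNV in uint materialID;      // integer input; no "flat" required

layout(location = 0) out vec4 fragColor;

void main()
{
    // Simple diffuse term against a fixed light direction.
    float lighting = max(dot(normalize(triangleNormal), vec3(0.0, 0.0, 1.0)), 0.0);
    vec3 tint = (materialID == 0u) ? vec3(1.0) : vec3(1.0, 0.5, 0.5);
    fragColor = vec4(vertexColor * tint * lighting, 1.0);
}
```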
Modify Section 4.3.6, Output Variables (p. 49)
(modify last paragraph, p. 49 to add task and mesh shaders)
It is a compile-time error to declare a vertex, tessellation evaluation,
tessellation control, geometry, task, or mesh shader output that contains
any of the following: ...
(insert before the next-to-last paragraph "The order of execution", p. 50)
Task shader output variables may be used to write values in task memory
that can be read by the mesh shader invocations for the tasks that it
spawns. All user-defined task shader outputs must be declared as members
of a single interface block qualified with the "taskNV" qualifier. Task
shaders do not support user-defined outputs declared outside interface
blocks or without "taskNV" and do not support more than one output block. It is
a compile-time error to use the "taskNV" qualifier in output declarations
in any other shader stage.
Mesh shader output variables may be used to write per-vertex or
per-primitive data. Output variables qualified with "perprimitiveNV"
have separate instances for each primitive in the output mesh; all other
output variables have separate instances for each vertex in the output
mesh. It is a compile-time error to use the "perprimitiveNV" qualifier
in output declarations in any other shader stage. Both types of output
variables are arrayed (see "arrayed" under 4.3.4, Inputs) and each
per-vertex or per-primitive output variable (or output block, see
interface blocks below) needs to be declared as an array. For example,
out float vertexColor[]; // per-vertex color
perprimitiveNV out vec3 triangleNormal[]; // per-triangle normal
Each element of such an array corresponds to one vertex or primitive of
the output mesh. Each array can optionally have a size declared. The
array size will be set by (or if provided must be consistent with) the
output layout declaration(s) establishing the maximum number of vertices
and primitives in the output mesh. When checking a mesh shader against
implementation limits on the total number of output variable components,
the compiler adds the number of per-vertex outputs for a single vertex
instance and the number of per-primitive outputs for a single primitive
instance. Unlike tessellation control shaders, a mesh shader invocation
may write to outputs for any vertex or primitive.
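A minimal sketch of these rules, assuming a four-invocation work group that
cooperatively emits a two-triangle quad. The output names, corner positions,
and index table are placeholders chosen for this example.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 4) in;
layout(triangles, max_vertices = 4, max_primitives = 2) out;

// Per-vertex output: unsized, so it takes its size (4) from max_vertices.
out vec3 vertexColor[];

// Per-primitive output: an explicit size must match max_primitives.
perprimitiveNV out vec3 triangleNormal[2];

const vec2 corners[4] = vec2[](vec2(-1.0, -1.0), vec2(1.0, -1.0),
                               vec2(-1.0,  1.0), vec2(1.0,  1.0));
const uint quadIndices[6] = uint[](0u, 1u, 2u, 2u, 1u, 3u);

void main()
{
    uint i = gl_LocalInvocationIndex;

    // Every invocation writes one vertex.
    gl_MeshVerticesNV[i].gl_Position = vec4(0.5 * corners[i], 0.0, 1.0);
    vertexColor[i] = vec3(0.5 * corners[i] + 0.5, 0.0);

    // The first two invocations each write one primitive.
    if (i < 2u) {
        triangleNormal[i] = vec3(0.0, 0.0, 1.0);
        gl_PrimitiveIndicesNV[3u * i + 0u] = quadIndices[3u * i + 0u];
        gl_PrimitiveIndicesNV[3u * i + 1u] = quadIndices[3u * i + 1u];
        gl_PrimitiveIndicesNV[3u * i + 2u] = quadIndices[3u * i + 2u];
    }

    if (i == 0u)
        gl_PrimitiveCountNV = 2u;
}
```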
Mesh shader outputs qualified with "perviewNV" are considered to be
per-view and arrayed with a second additional level of arrayness. Each
non-block output variable must be declared as an array with at least
two dimensions. For output block members, one level of arrayness applies
to the block declaration and a second applies to the block member
declaration. For example,
perviewNV out float perViewVertexColor[][];
out PerVertexBlock {
perviewNV vec2 perViewTextureCoord[];
} v[];
For non-block output variables, each element in the outer (leftmost)
dimension of such an array corresponds to one vertex or primitive of the
output mesh, as described immediately above. Each element in the second
(next-to-leftmost) dimension corresponds to a single view of the output
primitive or vertex. The array dimension corresponding to the view number
can optionally have a size declared. The array size will be set to (or if
provided must be consistent with) the maximum number of views supported by
the implementation given by the constant gl_MaxMeshViewCountNV.
When using per-view outputs, all view instances of per-view outputs count
separately against implementation limits on the total number of output
components. Additionally, values for extra views will be stored in the
upper end of the set of available locations for mesh shader outputs. A
compile- or link-time error will be generated if extra storage required
for extra per-view outputs leaves the compiler unable to assign locations
for all outputs or includes a location already consumed by an active
output variable with an associated "location" layout qualifier.
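As a non-normative sketch, a mesh shader writing per-view outputs might look
like the following. The viewProj uniform array, the viewDepth output, and
the triangle corners are hypothetical. The sketch assumes that per-view
arrays are indexed by the view numbers listed in gl_MeshViewIndicesNV and
that the built-in per-vertex block exposes gl_PositionPerViewNV as a
per-view member.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 3) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

// Hypothetical: one view-projection matrix per supported view.
uniform mat4 viewProj[gl_MaxMeshViewCountNV];

out vec3 vertexColor[];             // one value shared by all views
perviewNV out float viewDepth[][];  // indexed as [vertex][view]

const vec2 corners[3] = vec2[](vec2(-0.5, -0.5), vec2(0.5, -0.5), vec2(0.0, 0.5));

void main()
{
    uint v = gl_LocalInvocationIndex;
    vec4 objectPos = vec4(corners[v], 0.0, 1.0);

    vertexColor[v] = vec3(1.0);

    // Write one instance of each per-view output for every active view.
    for (uint i = 0u; i < gl_MeshViewCountNV; ++i) {
        uint view = gl_MeshViewIndicesNV[i];
        vec4 p = viewProj[view] * objectPos;
        gl_MeshVerticesNV[v].gl_PositionPerViewNV[view] = p;
        viewDepth[v][view] = p.w;
    }

    gl_PrimitiveIndicesNV[v] = v;
    if (v == 0u)
        gl_PrimitiveCountNV = 1u;
}
```

Each per-view instance counts separately against the output component
limits, so per-view qualification is best reserved for values that
genuinely differ between views.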
(modify the next-to-last and last paragraph, p. 50)
The order of execution of tessellation control, task, and mesh shader
invocations relative to the other invocations for the same input patch or
local work group is undefined unless the built-in function barrier() is
used to provide some control over relative execution order. When a shader
invocation calls barrier(), ...
Because tessellation control, task, and mesh shader invocations execute in
undefined order between barriers, the values of output variables will
sometimes be undefined. ...
Modify Section 4.3.8, Shared Variables (p. 52)
(modify first paragraph of the section, p. 52)
The shared qualifier is used to declare variables that have storage shared
between all work items in a compute, task, or mesh shader local work
group. Variables declared as shared may only be used in compute, task, or
mesh shaders. ...
(modify last paragraph of the section, p. 52)
There is a limit to the total size of all variables declared as shared in
a single shader stage. This limit, expressed in units of basic machine
units, may be determined by using the OpenGL API to query the value of
MAX_COMPUTE_SHARED_MEMORY_SIZE (compute shaders),
MAX_TASK_SHARED_MEMORY_SIZE_NV (task shaders), or
MAX_MESH_SHARED_MEMORY_SIZE_NV (mesh shaders).
Modify Section 4.3.9, Interface Blocks, p. 52
(rework grammar rules, p. 53, to allow "taskNV", "perprimitiveNV", and
"perviewNV" to qualify blocks)
interface-qualifier:
in-block-qualifiers(_opt) "in"
out-block-qualifiers(_opt) "out"
uniform
buffer
// Note: Not shown for simplicity, but memory qualifiers may also be used
in-block-qualifiers:
patch
taskNV
perprimitiveNV
out-block-qualifiers:
out-block-qualifier
out-block-qualifier out-block-qualifiers
out-block-qualifier:
patch
taskNV
perprimitiveNV
perviewNV
Modify Section 4.4, Layout Qualifiers, p. 57
(modify the layout qualifier table, pp. 58-59)
Layout Qualifier   | Qualifier | Individual | Block | Block  | Allowed interfaces
                   |   only    |  variable  |       | Member |
-------------------+-----------+------------+-------+--------+--------------------
local_size_x =     |           |            |       |        | compute in
local_size_y =     |     X     |            |       |        | mesh in
local_size_z =     |           |            |       |        | task in
-------------------+-----------+------------+-------+--------+--------------------
max_vertices =     |     X     |            |       |        | geometry out
                   |           |            |       |        | mesh out
-------------------+-----------+------------+-------+--------+--------------------
max_primitives =   |     X     |            |       |        | mesh out
-------------------+-----------+------------+-------+--------+--------------------
[ points ]         |           |            |       |        |
[ lines ]          |     X     |            |       |        | mesh out
[ triangles ]      |           |            |       |        |
Add new Section 4.4.1.5, Task Shader Inputs, p. 67
(note: the content of this section is nearly identical to the content of
section 4.4.1.4, Compute Shader Inputs)
There are no layout location qualifiers for task shader inputs.
Layout qualifier identifiers for task shader inputs are the work group
size qualifiers:
layout-qualifier-id :
local_size_x = integer-constant-expression
local_size_y = integer-constant-expression
local_size_z = integer-constant-expression
These task shader input layout qualifiers behave identically to the
equivalent compute shader qualifiers and specify a fixed local group size
used for each task shader work group. If no size is specified in any of
the three dimensions, a default size of one will be used.
If the fixed local group size of the shader in any dimension is greater
than the maximum size supported by the implementation for that dimension,
a compile-time error results. Also, if such a layout qualifier is
declared more than once in the same shader, all those declarations must
set the same set of local workgroup sizes and set them to the same values;
otherwise a compile-time error results. If multiple task shaders attached
to a single program object declare a fixed local group size, the
declarations must be identical; otherwise a link-time error results.
Furthermore, if a program object contains any task shaders, at least one
must contain an input layout qualifier specifying a fixed local group size
for the program, or a link-time error will occur.
Note that task shaders do not currently support multi-dimensional work
groups; the maximum value for local_size_y and local_size_z will be one.
Add new Section 4.4.1.6, Mesh Shader Inputs, p. 67
(note: the content of this section is nearly identical to the content of
section 4.4.1.4, Compute Shader Inputs)
There are no layout location qualifiers for mesh shader inputs.
Layout qualifier identifiers for mesh shader inputs are the work group
size qualifiers:
layout-qualifier-id :
local_size_x = integer-constant-expression
local_size_y = integer-constant-expression
local_size_z = integer-constant-expression
These mesh shader input layout qualifiers behave identically to the
equivalent compute shader qualifiers and specify a fixed local group size
used for each mesh shader work group. If no size is specified in any of
the three dimensions, a default size of one will be used.
If the fixed local group size of the shader in any dimension is greater
than the maximum size supported by the implementation for that dimension,
a compile-time error results. Also, if such a layout qualifier is
declared more than once in the same shader, all those declarations must
set the same set of local workgroup sizes and set them to the same values;
otherwise a compile-time error results. If multiple mesh shaders attached
to a single program object declare a fixed local group size, the
declarations must be identical; otherwise a link-time error results.
Furthermore, if a program object contains any mesh shaders, at least one
must contain an input layout qualifier specifying a fixed local group size
for the program, or a link-time error will occur.
Note that mesh shaders do not currently support multi-dimensional work
groups; the maximum value for local_size_y and local_size_z will be one.
Modify section 4.4.2.1, Transform Feedback Layout Qualifiers, p. 69
(add a new paragraph at the end of the section, p. 71)
Transform feedback cannot be used to capture the outputs of task and
mesh shaders. Use of transform feedback layout qualifiers in these shader
types will result in a compile-time error.
Add new Section 4.4.2.5, Mesh Shader Outputs, p. 75
Mesh shaders can have three additional types of output layout identifiers:
an output primitive type, a maximum output vertex count, and a maximum
output primitive count. The primitive type, vertex count, and primitive
count identifiers are allowed only on the interface qualifier out, not on
an output block, block member, or variable declaration.
The layout qualifier identifiers for mesh shader outputs are
layout-qualifier-id :
points
lines
triangles
max_vertices = integer-constant-expression
max_primitives = integer-constant-expression
The primitive type identifiers "points", "lines", and "triangles" are used
to specify the type of output primitive produced by the mesh shader, and
only one of these is accepted. At least one mesh shader (compilation
unit) in a program must declare an output primitive type, and all mesh
shader output primitive type declarations in a program must declare the
same primitive type. It is not required that all mesh shaders in a
program declare an output primitive type.
The vertex count identifier "max_vertices" is used to specify the maximum
number of vertices the shader will ever emit for the invocation group. At
least one mesh shader (compilation unit) in a program must declare a
maximum output vertex count, and all mesh shader output vertex count
declarations in a program must declare the same count. It is not required
that all mesh shaders in a program declare a count.
The primitive count identifier "max_primitives" is used to specify the
maximum number of primitives the shader will ever emit for the invocation
group. At least one mesh shader (compilation unit) in a program must
declare a maximum output primitive count, and all mesh shader output
primitive count declarations in a program must declare the same count. It
is not required that all mesh shaders in a program declare a count.
The intrinsically declared output block gl_MeshVerticesNV[] and any user-defined
output variables or blocks not qualified with "perprimitiveNV" will be
sized by the "max_vertices" output declaration. The intrinsically
declared output block gl_MeshPrimitivesNV[] and any user-defined output
variables or blocks qualified with "perprimitiveNV" will be sized by the
"max_primitives" output declaration. The intrinsically declared array
gl_PrimitiveIndicesNV[] will be sized according to the primitive type and
"max_primitives" declarations, where the size is:
* the value of "max_primitives" if "points" is declared
* two times the value of "max_primitives" if "lines" is declared, or
* three times the value of "max_primitives" if "triangles" is declared.
For outputs declared without an array size, including intrinsically
declared outputs (e.g., gl_MeshVerticesNV), a layout must be declared before any use
of the method length() or other array use that requires its size to be
known. It is a compile-time error if an output array is declared with an
explicit size that does not match the array size derived from the layout
qualifier.
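The sizing rules above can be illustrated with a minimal sketch. The output names vNormal and pColor, the locations, and the specific counts are assumptions chosen for illustration only; they are not taken from this specification.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;

// Primitive type and maximum counts; these size the arrayed outputs below.
layout(triangles, max_vertices = 64, max_primitives = 126) out;

// Hypothetical per-vertex output with no explicit size:
// it is sized by max_vertices (64).
layout(location = 0) out vec3 vNormal[];

// Hypothetical per-primitive output with an explicit size:
// the size must equal max_primitives (126) or compilation fails.
layout(location = 1) perprimitiveNV out vec4 pColor[126];

void main()
{
    // With "triangles" declared, gl_PrimitiveIndicesNV holds
    // 3 * 126 = 378 elements.
    // This sketch emits no geometry; it only illustrates the declarations.
    gl_PrimitiveCountNV = 0u;
}
```

Because the layout declaration precedes the output declarations, both arrays have a known size at the point where they are declared.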
Modify Section 4.5, Interpolation Qualifiers, p. 83
(modify first paragraph of the section, p. 83)
The presence of and type of interpolation is controlled by the above
interpolation qualifiers as well as the auxiliary storage qualifiers
centroid and sample. The auxiliary storage qualifiers "patch", "taskNV",
and "perprimitiveNV" are not used for interpolation; it is a compile-time
error to use interpolation qualifiers with those auxiliary storage
qualifiers. The auxiliary storage qualifier "perviewNV" may not be used
when declaring fragment shader inputs, but can be used with interpolation
qualifiers in the declaration of mesh shader outputs.
(add a new paragraph at the end of the section, p. 84)
A variable qualified with the auxiliary storage qualifier
"perprimitiveNV" will also not be interpolated. Instead, it will use
the same per-primitive value for all fragments generated by each
primitive. Such a variable can also be qualified with an interpolation
qualifier, centroid, or sample, but those qualifications will mean the
same thing as qualifying only with "perprimitiveNV".
Modify Section 7.1, Built-In Language Variables (p. 120)
(insert after the first paragraph and variable list, p. 123)
In the task language, built-in variables are intrinsically declared as:
const uvec3 gl_WorkGroupSize;
in uvec3 gl_WorkGroupID;
in uvec3 gl_LocalInvocationID;
in uvec3 gl_GlobalInvocationID;
in uint gl_LocalInvocationIndex;
in uint gl_MeshViewCountNV;
in uint gl_MeshViewIndicesNV[];
out uint gl_TaskCountNV;
In the mesh language, built-in variables are intrinsically declared as:
const uvec3 gl_WorkGroupSize;
in uvec3 gl_WorkGroupID;
in uvec3 gl_LocalInvocationID;
in uvec3 gl_GlobalInvocationID;
in uint gl_LocalInvocationIndex;
in uint gl_MeshViewCountNV;
in uint gl_MeshViewIndicesNV[];
out uint gl_PrimitiveCountNV;
out uint gl_PrimitiveIndicesNV[];
out gl_MeshPerVertexNV {
    vec4 gl_Position;
    perviewNV vec4 gl_PositionPerViewNV[]; // NVX_multiview_per_view_attributes
    float gl_PointSize;
    float gl_ClipDistance[];
    perviewNV float gl_ClipDistancePerViewNV[][];
    float gl_CullDistance[];
    perviewNV float gl_CullDistancePerViewNV[][];
} gl_MeshVerticesNV[];
perprimitiveNV out gl_MeshPerPrimitiveNV {
    int gl_PrimitiveID;
    int gl_Layer;
    perviewNV int gl_LayerPerViewNV[];
    int gl_ViewportIndex;
    int gl_ViewportMask[]; // NV_viewport_array2
    perviewNV int gl_ViewportMaskPerViewNV[][];
} gl_MeshPrimitivesNV[];
(modify the discussion of the built-in variables shared with compute
shaders, which starts on p. 123)
The built-in constant gl_WorkGroupSize is a compute, task, or mesh shader
constant containing the local work-group size of the shader. The size ...
The built-in variable gl_WorkGroupID is a compute, task, or mesh shader
input variable containing the three-dimensional index of the global work
group that the current invocation is executing in. ...
The built-in variable gl_LocalInvocationID is a compute, task, or mesh
shader input variable containing the three-dimensional index of the local
work group within the global work group that the current invocation is
executing in. ...
The built-in variable gl_GlobalInvocationID is a compute, task, or mesh
shader input variable containing the global index of the current work
item. This value uniquely identifies this invocation from all other
invocations across all local and global work groups initiated by the
current DispatchCompute or DrawMeshTasksNV call or by a previously
executed task shader. ...
The built-in variable gl_LocalInvocationIndex is a compute, task, or mesh
shader input variable that contains the one-dimensional representation of
the gl_LocalInvocationID.
(modify discussion of gl_PrimitiveID, gl_Layer, and gl_ViewportIndex to
allow as a mesh output, pp. 125-127)
The output variable gl_PrimitiveID is available only in the geometry and
mesh languages and provides a single integer that serves as a primitive
identifier. This is then available to fragment shaders as the fragment
input gl_PrimitiveID, which will select the written primitive ID from the
provoking vertex in the primitive being shaded when using a geometry
shader or from the appropriate per-primitive output value when using a
mesh shader. If a fragment shader using gl_PrimitiveID is active and a
geometry or mesh shader is also active, the geometry or mesh shader must
write to gl_PrimitiveID or the fragment shader input gl_PrimitiveID is
undefined. ...
The variable gl_Layer is available as an output variable in the geometry
and mesh languages and an input variable in the fragment language. In the
geometry and mesh languages, it is used to select a specific layer (or
face and layer of a cube map) of a multi-layer framebuffer attachment.
When using a geometry shader, the actual layer used will come from one of
the vertices in the primitive being shaded. Which vertex the layer comes
from is discussed in section 11.3.4.6 "Layer and Viewport Selection" of
the OpenGL Specification. It might be undefined, so it is best to write
the same layer value for all vertices of a primitive. When using a mesh
shader, the actual layer will come from the appropriate per-primitive
output value written by the mesh shader. ...
The input variable gl_Layer in the fragment language will have the same
value that was written to the output variable gl_Layer in the geometry or
mesh language. If the geometry or mesh stage does not dynamically assign
... If the geometry or mesh stage makes no static assignment to gl_Layer,
the input value... Otherwise, the fragment stage will read the same value
written by the geometry or mesh stage, even if...
The variable gl_ViewportIndex is available as an output variable in the
geometry and mesh languages and an input variable in the fragment
language. In the geometry and mesh language, it provides the ...
Primitives generated by the geometry or mesh shader will undergo viewport
transformation and scissor testing using the viewport transformation and
scissor rectangle selected by the value of gl_ViewportIndex. When using a
geometry shader, the viewport index used will come from one of the
vertices in the primitive being shaded. However, which vertex the
viewport index comes from is implementation-dependent, so it is best to
use the same viewport index for all vertices of the primitive. When using
a mesh shader, the viewport index used will come from the appropriate
per-primitive output value written by the mesh shader. If a geometry or
mesh shader does not assign a value to gl_ViewportIndex, ... If a
geometry or mesh shader statically assigns a value to gl_ViewportIndex...
The input variable gl_ViewportIndex in the fragment stage will have the
same value that was written to the output variable gl_ViewportIndex in the
geometry or mesh stage. If the geometry or mesh stage does not dynamically
assign... If the geometry or mesh stage makes no static assignment...
Otherwise, the fragment stage will read the same value written by the
geometry or mesh stage, even if...
(insert new paragraphs before the seventh paragraph, starting with
"Fragment shaders output values", p. 127, describing new task and mesh
built-in variables)
The input variable gl_MeshViewCountNV is only available in the mesh and
task languages and defines the number of views processed by the current
mesh and task shader invocations. When using the multi-view API feature,
the primitives emitted by the mesh shader will be processed separately for
each enabled view and sent to a different layer of a layered render
target. Mesh shader outputs qualified with "perviewNV" are declared as
arrays with separate values for each view. To ensure defined results,
mesh shaders must write values for array elements zero through
gl_MeshViewCountNV-1 for each such per-view output.
The input variable gl_MeshViewIndicesNV is only available in the mesh and
task languages. This variable is an array where each element holds the
view number of one of the views being processed by the current mesh and
task shader invocations. The array elements with indices greater than or
equal to the value of gl_MeshViewCountNV are undefined. If the value of
gl_MeshViewIndicesNV[i] is <j>, then any outputs qualified with
"perviewNV" will take on the value of array element <i> when processing
primitives for view index <j>.
The output variable gl_TaskCountNV is only available in the task language
and defines the number of subsequent mesh shader work groups to generate
upon completion of the task shader.
The output variable gl_PrimitiveCountNV is only available in the mesh
language and defines the number of primitives in the output mesh produced
by the mesh shader that should be processed by subsequent pipeline stages.
The output array variable gl_PrimitiveIndicesNV[] is only available in the
mesh language. Depending on the output primitive type declared using a
layout qualifier, each group of one (points), two (lines), or three
(triangles) array elements specifies the indices of the vertices making up
one primitive.
All index values must be in the range [0, N-1], where N is the value of
the "max_vertices" layout qualifier. Out-of-bounds index values will
result in undefined behavior.
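As a minimal sketch of how these outputs fit together, the following mesh shader emits a single triangle. The vertex positions and the work group size of one are arbitrary choices for illustration.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

void main()
{
    // Three vertices; their indices are 0, 1, and 2.
    gl_MeshVerticesNV[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);

    // One group of three indices describes the triangle.
    // Each index is in the range [0, max_vertices - 1].
    gl_PrimitiveIndicesNV[0] = 0u;
    gl_PrimitiveIndicesNV[1] = 1u;
    gl_PrimitiveIndicesNV[2] = 2u;

    // Number of primitives passed on to subsequent pipeline stages.
    gl_PrimitiveCountNV = 1u;
}
```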
The mesh shader output block members gl_PositionPerViewNV[],
gl_ClipDistancePerViewNV[][], gl_CullDistancePerViewNV[][],
gl_LayerPerViewNV[], and gl_ViewportMaskPerViewNV[][] are per-view versions
of the single-view variables with equivalent names that lack the
"PerViewNV" suffix:
Per-View Variable Single-View Variable
---------------------------- --------------------
gl_PositionPerViewNV[] gl_Position
gl_ClipDistancePerViewNV[][] gl_ClipDistance[]
gl_CullDistancePerViewNV[][] gl_CullDistance[]
gl_LayerPerViewNV[] gl_Layer
gl_ViewportMaskPerViewNV[][] gl_ViewportMask[]
All of these outputs are considered arrayed, with separate values for each
view. The view number is used to index in the first dimension of these
arrays. For all of these variables, if a shader statically assigns a
value to any element of a per-view array, it may not statically assign a
value to the equivalent single-view variable in any mesh shader
compilation unit.
As with the gl_ClipDistance[] and gl_CullDistance[] arrays, the second
dimension of gl_ClipDistancePerViewNV[][] and gl_CullDistancePerViewNV[][] is
predeclared as unsized and must be sized by the shader, either by redeclaring
it with a size or by indexing it only with integral constant expressions. The
size determines the number and set of enabled clip or cull distances and
can be at most gl_MaxClipDistances or gl_MaxCullDistances, respectively.
The number of varying components consumed by these arrays will match the
size of the array, and shaders writing to either array must write all
enabled distances, or clipping/culling results will be undefined.
(modify the fifth paragraph, p. 129)
The gl_PerVertex, gl_MeshPerVertexNV, and gl_MeshPerPrimitiveNV blocks can
be redeclared in a shader to explicitly indicate what subset of the fixed
pipeline interface will be used. ...
(modify the sixth paragraph, p. 129)
This establishes the output interface the shader will use with the
subsequent pipeline stage. It must be a subset of the built-in members of
gl_PerVertex, gl_MeshPerVertexNV, or gl_MeshPerPrimitiveNV. ...
Modify Section 7.3, Built-In Constants (p. 136)
Add to the end of the long list of constants that makes up this section:
const int gl_MaxMeshViewCountNV = 4;
Add new Section 8.xx, Mesh Shader Functions, after section 8.15, p. 187
These functions are only available in mesh shaders.
Insert a syntax/description table similar to the previous section.
Syntax:
void writePackedPrimitiveIndices4x8NV(uint indexOffset,
uint packedIndices)
Description:
Interprets <packedIndices> as four 8-bit unsigned integer values and
stores them into the gl_PrimitiveIndicesNV array starting from the
provided <indexOffset>, which must be a multiple of four.
Lower bytes are stored at lower addresses in the array.
The write operations must not exceed the size of the
gl_PrimitiveIndicesNV array.
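The following sketch shows one way to use this function to write the indices of a quad made of two triangles. The geometry and the choice of max_primitives = 4 are assumptions for illustration; the larger primitive count is used only so that both four-element writes stay within the 12-element index array.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;

// max_primitives = 4 gives gl_PrimitiveIndicesNV 3 * 4 = 12 elements,
// so the two four-index writes below (elements 0..7) stay in bounds.
layout(triangles, max_vertices = 4, max_primitives = 4) out;

void main()
{
    gl_MeshVerticesNV[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[2].gl_Position = vec4(-0.5,  0.5, 0.0, 1.0);
    gl_MeshVerticesNV[3].gl_Position = vec4( 0.5,  0.5, 0.0, 1.0);

    // Triangle 0 uses vertices (0, 1, 2); triangle 1 uses (2, 1, 3).
    // Lower bytes go to lower array elements:
    // elements 0..3 receive 0, 1, 2, 2.
    writePackedPrimitiveIndices4x8NV(0u, 0x02020100u);
    // Elements 4..7 receive 1, 3, 0, 0; the last two are padding and
    // are never read because only two primitives are emitted.
    writePackedPrimitiveIndices4x8NV(4u, 0x00000301u);

    gl_PrimitiveCountNV = 2u;
}
```

Both offsets (0 and 4) are multiples of four, as the function requires.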
Modify Section 8.16, Shader Invocation Control Functions, p. 186
(modify first paragraph of the section, p. 186)
The shader invocation control function is available only in tessellation
control, compute, task, and mesh shaders. It is used
to control the relative execution order of multiple shader invocations
used to process a patch (in the case of tessellation control shaders) or a
local work group (in the case of compute, task, and mesh shaders), which
are otherwise executed with an undefined relative order.
(modify the last paragraph, p. 186)
For compute, task, and mesh shaders, the barrier() function may be placed
within flow control, but that flow control must be uniform flow control.
...
Modify Section 8.17, Shader Memory Control Functions, p. 187
(modify table of functions, p. 187)
void memoryBarrierShared()
Control the ordering of memory transactions to shared variables issued
within a single shader invocation.
Only available in compute, task, and mesh shaders.
void groupMemoryBarrier()
Control the ordering of all memory transactions issued within a single
shader invocation, as viewed by other invocations in the same work
group.
Only available in compute, task, and mesh shaders.
(modify last paragraph, p. 187)
... all of the above variable types. The functions memoryBarrierShared()
and groupMemoryBarrier() are available only in compute, task, and mesh
shaders; the other functions are available in all shader types.
(modify last paragraph, p. 188)
... When using the function groupMemoryBarrier(), this ordering guarantee
applies only to other shader invocations in the same compute, task, or
mesh shader work group; all other memory barrier functions provide the
guarantee to all other shader invocations. ...
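A minimal sketch of barrier() and memoryBarrierShared() in a mesh shader follows. The shared counter keptCount and the rule for which invocations emit a point are assumptions made up for this example.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;
layout(points, max_vertices = 32, max_primitives = 32) out;

// Hypothetical running count of points kept by this work group.
shared uint keptCount;

void main()
{
    uint lane = gl_LocalInvocationIndex;

    if (lane == 0u) {
        keptCount = 0u;
    }
    // Every invocation reaches these calls, so flow control is uniform.
    memoryBarrierShared();
    barrier();

    // Placeholder test (an assumption): keep even-numbered invocations.
    bool keep = (lane & 1u) == 0u;
    if (keep) {
        // Reserve a compact output slot for this point.
        uint slot = atomicAdd(keptCount, 1u);
        gl_MeshVerticesNV[slot].gl_Position =
            vec4(float(lane) / 16.0 - 1.0, 0.0, 0.0, 1.0);
        gl_MeshVerticesNV[slot].gl_PointSize = 1.0;
        gl_PrimitiveIndicesNV[slot] = slot;
    }

    // Wait until all invocations have finished updating the counter.
    memoryBarrierShared();
    barrier();

    if (lane == 0u) {
        gl_PrimitiveCountNV = keptCount;
    }
}
```

Both barrier() calls sit outside the conditional blocks, which keeps them in uniform flow control.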
Interactions with GLSL 4.60 and KHR_vulkan_glsl
If GLSL 4.60 or KHR_vulkan_glsl is supported, the layout qualifiers
"local_size_x_id", "local_size_y_id", and "local_size_z_id" are supported
in mesh and task shaders, as in compute shaders.
In the big layout qualifier table in section 4.4, add:
Layout Qualifier   | Qualifier | Individual | Block | Block  | Allowed interfaces
                   | only      | variable   |       | Member |
-------------------+-----------+------------+-------+--------+--------------------
local_size_x_id =  |           |            |       |        | compute in
local_size_y_id =  |     X     |            |       |        | mesh in
local_size_z_id =  |           |            |       |        | task in
                   |           |            |       |        | (SPIR-V generation
                   |           |            |       |        | only)
No changes are required to the spec language describing these layout
qualifiers, since the language doesn't specifically reference compute
shaders and the mesh/task support should be identical.
Interactions with NV_viewport_array2
If NV_viewport_array2 is not supported, remove gl_ViewportMask[] from the
gl_MeshPerPrimitiveNV block declaration.
Interactions with NV_stereo_view_rendering
Mesh shaders support a fully generic set of per-view positions and
viewport masks, so we include no support for the more limited
gl_SecondaryPositionNV and gl_SecondaryViewportMaskNV[] built-ins from
NV_stereo_view_rendering.
Interactions with NVX_multiview_per_view_attributes
If NVX_multiview_per_view_attributes is not supported, remove
gl_PositionPerViewNV[] from the gl_MeshPerVertexNV block declaration and remove
gl_ViewportMaskPerViewNV[][] from the gl_MeshPerPrimitiveNV block declaration.
If NVX_multiview_per_view_attributes is supported, it is a compile-time
error for a mesh shader to make a static assignment to
gl_PositionPerViewNV as well as to either of gl_Position or
gl_SecondaryPositionNV.
If NVX_multiview_per_view_attributes is supported, it is a compile-time
error for a mesh shader to make a static assignment to
gl_ViewportMaskPerViewNV[][] as well as to either of gl_ViewportMask[] or
gl_SecondaryViewportMaskNV[].
Interactions with ARB_shader_draw_parameters
If ARB_shader_draw_parameters is supported, the task and mesh shaders
will also have the following built-in inputs:
in int gl_DrawIDARB;
The variable <gl_DrawIDARB> is a vertex, task and mesh language input
variable that holds the integer index of the drawing command to which the
current vertex belongs (see "Shader Inputs" in section 11.1.3.9 of the
OpenGL Graphics System Specification), or for the latter the current
task or mesh workgroup. If the vertex or workgroup is not invoked by a
Multi* form of a draw command, then the value of gl_DrawIDARB is zero.
Issues
(1) What are the matching requirements between mesh outputs declared
with "perprimitiveNV" and fragment shader inputs? What should we do
with interpolation and other auxiliary storage qualifiers on
per-primitive values?
RESOLVED: In the initial implementation of this extension, reading
per-primitive mesh shader outputs in a fragment shader would return
incorrect/undefined values if the fragment shader input has no special
qualification. As a result, we require that mesh shader outputs
qualified with "perprimitiveNV" be matched with fragment shader inputs
qualified with "perprimitiveNV" and vice versa.
We currently allow any of the interpolation and related auxiliary
storage qualifiers (e.g., flat, centroid) on fragment shader inputs
qualified with "perprimitiveNV". These qualifiers have no effect. This
resolution is consistent with the core GLSL specification language that
allows (and ignores) auxiliary storage qualifiers such as "sample" or
"centroid" to be used on inputs qualified by "flat", despite the fact
that the storage qualifiers are meaningless for flat-shaded attributes.
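A minimal sketch of the fragment shader side of this matching requirement follows. The names pColor and fragColor and the location values are assumptions for illustration.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

// Matches a hypothetical mesh shader output declared as:
//   layout(location = 0) perprimitiveNV out vec4 pColor[];
// The fragment input is not arrayed; it holds the value for the
// primitive that generated this fragment.
layout(location = 0) perprimitiveNV in vec4 pColor;

layout(location = 0) out vec4 fragColor;

void main()
{
    // The same per-primitive value is read by every fragment of the
    // primitive; no interpolation takes place.
    fragColor = pColor;
}
```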
(2) How do "arrayed" outputs and blocks work for mesh shaders? Do you
have to declare an array dimension? If you do declare an array
dimension, how is it checked?
RESOLVED: The rules for mesh shader outputs are the same as for arrayed
inputs and outputs in tessellation control, tessellation evaluation, and
geometry shaders. When declaring an "arrayed" block, the size is
optional. If omitted, the size is taken from the maximum vertex or
primitive counts declared using layout qualifiers ("max_vertices" and
"max_primitives"). If a size is provided, it must match the limits
specified by the layout qualifiers.
(3) How are location layout qualifiers handled in mesh and task shaders?
Do we support some sort of layout or offset qualifier for task memory?
RESOLVED: For mesh shader outputs, the "location" layout qualifier is
supported and is used for interface matching with the fragment shader.
Locations assigned to mesh shader outputs have the same semantics as
locations assigned to vertex, tessellation control, tessellation
evaluation, and geometry shader outputs. As with tessellation control
shaders, mesh shader outputs are "arrayed" with separate instances of
each variable or block for each output vertex or primitive. These
multiple instances do not consume separate locations for each
vertex/primitive.
For task shader outputs (used as mesh shader inputs), we've chosen not
to support any location or offset layout qualifiers. Instead, we limit
task and mesh shaders to use at most one block qualified by "taskNV" and
do not allow non-block variables to use "taskNV". With a single block
where member declarations need to match between stages, any internal
offsets/locations can be assigned by the compiler without any external
annotation.
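As a sketch of the single "taskNV" block described above, the following task shader fills such a block and launches mesh work groups. The block name Task, its members, and the fixed count of 32 are assumptions for illustration.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;

// The one block qualified with "taskNV". A mesh shader reading it would
// declare the same members with "taskNV in", for example:
//   taskNV in Task { uint baseID; uint subIDs[32]; } IN;
// No location or offset qualifiers are used.
taskNV out Task {
    uint baseID;      // hypothetical: first item handled by this group
    uint subIDs[32];  // hypothetical: one entry per launched mesh group
} OUT;

void main()
{
    uint lane = gl_LocalInvocationID.x;

    // Each invocation fills in one entry of the shared task output.
    OUT.subIDs[lane] = lane;

    if (lane == 0u) {
        OUT.baseID = gl_WorkGroupID.x * 32u;
        // Launch one mesh shader work group per entry.
        gl_TaskCountNV = 32u;
    }
}
```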
(4) For mesh shaders supporting multiple views, how do applications
specify the set of views that should be produced?
RESOLVED: Ignoring mesh shaders, there are significant differences in
how multiple views are handled in OpenGL and Vulkan. OVR_multiview
(OpenGL ES) specifies the view count using the "num_views" layout
qualifier, where shaders will implicitly use views 0 through
num_views-1. VK_KHR_multiview (Vulkan) provides no view information in
the shader, other than references to a view index. Instead, the Vulkan
render pass specifies a bitfield identifying the set of views to
produce. In the Vulkan algorithm, there is no explicit notion of a view
count in the shader, and the view mask is not known at shader compile
time.
For mesh shaders in OpenGL, we use the same OVR_multiview "num_views"
layout qualifier to specify the view count. Unlike multiview vertex
shaders, multiview mesh shaders are not run separately for each view.
The "num_views" layout qualifier is used only to determine array sizes
for outputs qualified with "perviewNV". For mesh shaders in Vulkan, the
view mask of the render pass is used to determine the storage
requirements of per-view attributes and controls the values of the
gl_MeshViewCountNV and gl_MeshViewIndicesNV built-ins.
(5) For outputs declared with "perviewNV", which are arrays with separate
elements for each view, what are the rules for array sizing and
indexing? Do you have to declare an array dimension? If you do
declare an array dimension, how is it checked?
RESOLVED: The rules for per-view mesh shader outputs are the same as
for arrayed inputs and outputs in tessellation control, tessellation
evaluation, and geometry shaders, as well as the per-vertex and
per-primitive mesh shader output arrays. When declaring an output
qualified with "perviewNV", an extra array dimension needs to be used
for indexing across views. The array size in that dimension is
optional. If omitted, the size is taken from the implementation
dependent maximum view count. If provided, the size must match the
maximum view count.
Given that the view count on Vulkan is inferred at *run time* from the
view mask in the render pass, we can't use that derived view count for
SPIR-V code generation and compile-time error checking. Because of
this, we have chosen to use the *maximum* view count for sizing per-view
arrays, which is known at compile time.
(6) What built-ins should be provided for multi-view mesh shaders?
RESOLVED: We provide per-view versions of gl_Position,
gl_ClipDistance[], and gl_CullDistance[] in the built-in block
gl_MeshPerVertexNV:
perviewNV vec4 gl_PositionPerViewNV[];
perviewNV float gl_ClipDistancePerViewNV[][];
perviewNV float gl_CullDistancePerViewNV[][];
Because these per-view built-ins refer to the same attributes as the
equivalent standard built-ins, we prohibit the static use of a per-view
built-in and its standard equivalent in a single shader.
We considered instead allowing shaders to redeclare output blocks to add
"perviewNV" qualification to existing built-ins, such as:
out gl_PerVertex {
perviewNV vec4 gl_Position[];
} v[];
This approach was rejected because modifying the basic types of built-in
variables could result in new declarations that conflict with the basic
definitions built into the compiler.
(7) For multi-view, how do we broadcast mesh shader outputs to multiple
layers or viewports, where at least some outputs have per-view values?
RESOLVED: In the OpenGL and Vulkan multi-view extensions, the
programming model has logically separate shader invocations for each
view. These extensions have a view ID/index built-in that can be used
to determine which view is being processed by a given invocation. If a
hardware platform is capable of compiling a multi-view shader to
correctly process multiple views in a single shader invocation, the
implementation is free to perform such an optimization.
For mesh shaders, a transparent optimization that combines invocations
for N different views is significantly more problematic. Separate
invocations could produce structurally different output (e.g., different
primitive counts or different topology), which would be more difficult
to "broadcast". To simplify matters, we instead use a programming model
where there is a single work group that processes all views at once.
For per-view attributes, the mesh shader is responsible for computing
separate output values for each view.
(8) Should the gl_NumWorkGroups built-in be supported in task or mesh
shaders, as with compute shaders?
RESOLVED: No, this isn't worth the trouble. If required, an
application can pass a workgroup count manually via a uniform.
If we were to support such a thing, it would be necessary to figure out
how this built-in would interact with gl_WorkGroupID. For compute
shaders, if you dispatched five workgroups with DispatchCompute, they
would always be numbered 0..4 and have values less than
gl_NumWorkGroups. If you called glDrawMeshTasksNV with <first> set to 3
and <count> set to 5, the work groups would be numbered 3..7 and it
would be necessary to decide if gl_NumWorkGroups should be 5 or 8.
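The uniform-based workaround mentioned in the resolution could be sketched as follows. The uniform names firstWorkGroup and workGroupCount are hypothetical, and the sketch assumes that no task shader is active, so that the mesh shader work groups are numbered directly from the draw call's <first> and <count>.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(points, max_vertices = 1, max_primitives = 1) out;

// Hypothetical uniforms: the application sets these to the <first> and
// <count> arguments it passes to the draw call.
uniform uint firstWorkGroup;
uniform uint workGroupCount;

void main()
{
    // Work groups are numbered <first> .. <first> + <count> - 1, so
    // subtract <first> to get a zero-based index.
    uint relativeGroup = gl_WorkGroupID.x - firstWorkGroup;

    // Spread one point per work group across clip space in x.
    float x = (float(relativeGroup) + 0.5) / float(workGroupCount) * 2.0 - 1.0;

    gl_MeshVerticesNV[0].gl_Position = vec4(x, 0.0, 0.0, 1.0);
    gl_MeshVerticesNV[0].gl_PointSize = 1.0;
    gl_PrimitiveIndicesNV[0] = 0u;
    gl_PrimitiveCountNV = 1u;
}
```

Passing both values explicitly lets the application decide which of the two interpretations discussed above it wants.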
Revision History
Version 5, October 5, 2018 (pbrown)
- Add an interaction with GLSL 4.60 and GL_KHR_vulkan_glsl to allow the
use of "local_size_[xyz]_id" where applicable.
Version 4, October 4, 2018 (pbrown)
- Fix incorrect layout qualifier table entries. "local_size_[xyz]" is
legal in task shaders.
Version 3, September 18, 2018 (pbrown)
- Additional edits preparing for publication.
Version 2, September 11, 2018 (pbrown)
- Miscellaneous edits preparing for publication.
Version 1 (ckubisch, pbrown)
- NVIDIA internal revisions.