Name

    NV_mesh_shader

Name String

    GL_NV_mesh_shader

Contact

    Christoph Kubisch, NVIDIA (ckubisch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)

Contributors

    Yury Uralsky, NVIDIA
    Daniel Koch, NVIDIA
    Sahil Parmar, NVIDIA

Status

    Shipping

Version

    Last Modified Date: March 6, 2019
    NVIDIA Revision: 7

Dependencies

    This extension can be applied to OpenGL GLSL versions 4.50
    (#version 450) and higher.

    This extension can be applied to OpenGL ES ESSL versions 3.20
    (#version 320) and higher.

    This extension is written against the GLSL 4.50.6 Specification
    (Compatibility Profile), dated April 14, 2016.

    This extension interacts with GLSL 4.60 and KHR_vulkan_glsl.

    This extension interacts with NV_viewport_array2.

    This extension interacts with NV_stereo_view_rendering.

    This extension interacts with NVX_multiview_per_view_attributes.

    This extension interacts with ARB_shader_draw_parameters.

    This extension interacts with EXT_clip_cull_distance.

Overview | |
This extension provides a new mechanism allowing applications to use two | |
new programmable shader types -- the task and mesh shader -- to generate | |
collections of geometric primitives to be processed by fixed-function | |
primitive assembly and rasterization logic. When the task and mesh | |
shaders are dispatched, they replace the standard programmable vertex | |
processing pipeline, including vertex array attribute fetching, vertex | |
shader processing, tessellation, and the geometry shader processing. | |
Both new shader types have execution environments similar to that of | |
compute shaders, where a collection of shader invocations form a work | |
group and cooperate to produce a set of outputs. Unlike traditional | |
vertex, tessellation, and geometry shaders that typically process a vertex | |
or primitive at a time, the mesh and task shaders process and generate a | |
batch of primitives at once. The optional task shader pre-processes | |
geometry and generates a variable number of mesh shader tasks. The mesh | |
shader evaluates the geometry corresponding to its task and emits a mesh | |
-- a collection of vertices arranged into point, line, or triangle | |
primitives. The primitives emitted by the mesh shader are then processed | |
by fixed-function primitive assembly and rasterization logic and generate | |
fragments that will be processed by the fragment shader. | |
Work is submitted to the mesh pipeline by launching the work from the API,
which spawns a one-dimensional array of tasks, much as an API dispatch for
compute spawns a three-dimensional array of compute shader work groups. If
a task shader is present, each task generated by this launch spawns a task
shader work group. If no task shader is present, each task generated by
the launch spawns a mesh shader work group.
When a task shader work group is executed, its invocations execute in | |
parallel and evaluate geometry associated with the task. The task shader | |
has no built-in or user-defined input variables other than the built-ins | |
identifying the work group and invocation being executed. The task shader | |
can use that information to read properties of the geometry associated | |
with the task from memory, using shader storage buffers, textures, or | |
other resources. The task shader determines the number of mesh shader | |
tasks that should be spawned for the task it is processing and writes the | |
task count to the built-in variable gl_TaskCountNV. Additionally, the | |
task shader can compute and write additional properties of the geometry it | |
processes to user-defined output variables qualified with "taskNV" to | |
task memory, which can be read as inputs by all of the mesh shaders that | |
it spawns. The task shader can be used to drive level-of-detail | |
calculations for procedurally generated geometry, to perform coarse-level | |
culling for batches of static or dynamic geometry, and for other forms of | |
work reduction or amplification. | |
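
As a rough illustration of the task shader's role described above, the
following sketch (not part of this specification) shows a task shader in
which each work group covers a fixed batch of meshlets. The uniform block
DrawInfo, its member meshletCount, the constant MESHLETS_PER_TASK, and the
task block member baseMeshlet are hypothetical names chosen for this
example only.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

// Compiled as a task shader; one invocation per work group is enough here.
layout(local_size_x = 1) in;

// Hypothetical application data: total number of meshlets in the draw.
layout(std140, binding = 0) uniform DrawInfo {
    uint meshletCount;
};

// Hypothetical batch size: meshlets covered by one task shader work group.
const uint MESHLETS_PER_TASK = 32u;

// Task memory, readable by every mesh shader work group spawned below.
taskNV out Task {
    uint baseMeshlet;   // first meshlet of this batch
} OUT;

void main() {
    uint base      = gl_WorkGroupID.x * MESHLETS_PER_TASK;
    uint remaining = (base < meshletCount) ? (meshletCount - base) : 0u;

    OUT.baseMeshlet = base;
    // Number of mesh shader work groups to spawn for this task.
    gl_TaskCountNV  = min(remaining, MESHLETS_PER_TASK);
}
```

Writing a task count of zero spawns no mesh shader work groups, which is
how a task shader of this shape would discard a batch entirely.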
When a mesh shader work group is executed, its invocations execute in | |
parallel to evaluate geometry corresponding to its task and emit a mesh | |
for further processing by subsequent pipeline stages. As with task | |
shaders, mesh shaders have no built-in inputs other than those identifying | |
the work group and invocation being executed, and must fetch their inputs | |
explicitly from memory. The mesh shader invocations collectively must | |
produce a mesh, which consists of: | |
* a primitive count, written to the built-in output gl_PrimitiveCountNV; | |
* a collection of vertex attributes, where each vertex in the mesh has a | |
set of built-in and user-defined per-vertex output variables and blocks; | |
* a collection of primitive attributes, where each of the | |
gl_PrimitiveCountNV primitives in the mesh has a set of built-in and | |
user-defined per-primitive output variables and blocks; and | |
* an array of vertex index values written to the built-in output array | |
gl_PrimitiveIndicesNV, where each output primitive has a set of one, | |
two, or three indices that identify the output vertices in the mesh used | |
to form the primitive. | |
The number of primitives and vertices emitted by the mesh shader can be
variable, but the mesh shader must specify maximum vertex and primitive
counts. There are implementation-dependent limits on the number of
vertices and primitives emitted by the mesh shader, and there are also
implementation-dependent limits on the total amount of memory consumed by
a mesh. In the initial implementation of this extension, implementation
limits are sufficiently low that complex geometry will need to be
decomposed into multiple tasks.
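
The four parts of a mesh listed above can be seen together in a minimal
sketch of a mesh shader that emits a single triangle. This is an
illustration only, not normative text; the output names vertexColor and
triangleNormal and the constant tables are assumptions made for the
example.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

// Compiled as a mesh shader.
layout(local_size_x = 1) in;
// Maximum vertex and primitive counts for the mesh.
layout(triangles, max_vertices = 3, max_primitives = 1) out;

// Per-vertex user output: one element per mesh vertex.
layout(location = 0) out vec3 vertexColor[];
// Per-primitive user output: one element per mesh primitive.
layout(location = 1) perprimitiveNV out vec3 triangleNormal[];

const vec2 positions[3] = vec2[3](vec2(-0.5, -0.5),
                                  vec2( 0.5, -0.5),
                                  vec2( 0.0,  0.5));
const vec3 colors[3]    = vec3[3](vec3(1, 0, 0),
                                  vec3(0, 1, 0),
                                  vec3(0, 0, 1));

void main() {
    for (int i = 0; i < 3; ++i) {
        // Built-in and user-defined per-vertex attributes.
        gl_MeshVerticesNV[i].gl_Position = vec4(positions[i], 0.0, 1.0);
        vertexColor[i] = colors[i];
        // Three indices per triangle, referring to mesh vertices.
        gl_PrimitiveIndicesNV[i] = uint(i);
    }
    // Per-primitive attribute for primitive 0.
    triangleNormal[0] = vec3(0.0, 0.0, 1.0);
    // Actual primitive count; must not exceed max_primitives.
    gl_PrimitiveCountNV = 1u;
}
```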
A typical mesh shader used to render static triangle data might operate in | |
three phases. The first phase fetches vertex position data and local | |
index data of the primitives that the mesh represents. The index data | |
would have been prepared offline to leverage vertex re-use within the | |
mesh. In the second phase, triangles would be culled and output primitive | |
indices written. Finally, other vertex attributes of the surviving subset | |
of vertices would be loaded and computed. During this process, the | |
invocations would sometimes work on a per-vertex and sometimes on a | |
per-primitive level. | |
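
The three phases described above might look roughly like the following
sketch. It is an illustration under stated assumptions, not the method
used by any particular implementation: the Meshlet record, all buffer and
uniform block names, the bindings, and the back-face test (which assumes
counter-clockwise front faces) are hypothetical. Local indices are assumed
to be smaller than the meshlet's vertex count.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;
layout(triangles, max_vertices = 64, max_primitives = 126) out;

// Hypothetical meshlet record prepared offline by the application.
struct Meshlet {
    uint vertexOffset;  // first entry in vertexRefs for this meshlet
    uint vertexCount;   // at most 64
    uint indexOffset;   // first entry in localIndices (3 per triangle)
    uint primCount;     // at most 126
};

layout(std430, binding = 0) readonly buffer Meshlets     { Meshlet meshlets[]; };
layout(std430, binding = 1) readonly buffer VertexRefs   { uint vertexRefs[]; };
layout(std430, binding = 2) readonly buffer LocalIndices { uint localIndices[]; };
layout(std430, binding = 3) readonly buffer Positions    { vec4 positions[]; };
layout(std430, binding = 4) readonly buffer Normals      { vec4 normals[]; };
layout(std140, binding = 5) uniform Camera               { mat4 viewProj; };

layout(location = 0) out vec3 vertexNormal[];

shared vec4 clipPos[64];     // phase 1 results, read in phase 2
shared uint vertexUsed[64];  // nonzero if a surviving triangle uses the vertex
shared uint keptPrims;       // number of triangles that survived culling

void main() {
    Meshlet m = meshlets[gl_WorkGroupID.x];
    uint lane = gl_LocalInvocationID.x;

    // Phase 1 (per vertex): fetch and transform positions.
    if (lane == 0u) keptPrims = 0u;
    for (uint v = lane; v < m.vertexCount; v += gl_WorkGroupSize.x) {
        clipPos[v] = viewProj * positions[vertexRefs[m.vertexOffset + v]];
        vertexUsed[v] = 0u;
        gl_MeshVerticesNV[v].gl_Position = clipPos[v];
    }
    memoryBarrierShared();
    barrier();

    // Phase 2 (per primitive): cull, then compact the surviving indices.
    for (uint p = lane; p < m.primCount; p += gl_WorkGroupSize.x) {
        uint base = m.indexOffset + 3u * p;
        uvec3 tri = uvec3(localIndices[base],
                          localIndices[base + 1u],
                          localIndices[base + 2u]);
        vec4 a = clipPos[tri.x], b = clipPos[tri.y], c = clipPos[tri.z];
        // Conservatively keep triangles touching w <= 0; otherwise keep
        // only those with positive signed area after the divide by w.
        bool nearW = min(a.w, min(b.w, c.w)) <= 0.0;
        vec2 ab = b.xy / b.w - a.xy / a.w;
        vec2 ac = c.xy / c.w - a.xy / a.w;
        if (nearW || ab.x * ac.y - ab.y * ac.x > 0.0) {
            uint slot = atomicAdd(keptPrims, 1u);
            gl_PrimitiveIndicesNV[3u * slot + 0u] = tri.x;
            gl_PrimitiveIndicesNV[3u * slot + 1u] = tri.y;
            gl_PrimitiveIndicesNV[3u * slot + 2u] = tri.z;
            vertexUsed[tri.x] = 1u;
            vertexUsed[tri.y] = 1u;
            vertexUsed[tri.z] = 1u;
        }
    }
    memoryBarrierShared();
    barrier();

    // Phase 3 (per vertex): load remaining attributes of used vertices.
    for (uint v = lane; v < m.vertexCount; v += gl_WorkGroupSize.x) {
        if (vertexUsed[v] != 0u) {
            vertexNormal[v] = normals[vertexRefs[m.vertexOffset + v]].xyz;
        }
    }
    if (lane == 0u) gl_PrimitiveCountNV = keptPrims;
}
```

The strided loops are what let the same 32 invocations alternate between
per-vertex and per-primitive work, since neither count needs to match the
work group size.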
Additionally, mesh shaders include infrastructure to allow a single mesh | |
shader work group to compute a mesh with multiple "views" (e.g., left and | |
right eye views for stereoscopic rendering), using a "view index" similar | |
to the view IDs used in the OVR_multiview (OpenGL and OpenGL ES) and | |
VK_KHR_multiview (Vulkan) extensions. Unlike those extensions, the | |
programming model here does not run separate shader invocations for each | |
view but instead allows shaders to designate individual outputs as | |
"per-view". When a mesh shader completes, its primitives will be | |
processed separately for each view with fragments directed at separate | |
layers of the framebuffer. For each view, outputs designated as per-view | |
(such as position) will take on values written for that view and all other | |
outputs will take on a single shared value written for all views. | |
  Conventional  From Application
     Vertex             |
    Pipeline            v
                Launch Mesh Tasks
    (Fig 3.1)           |
        |           +---+-----+
        |           |         |
        |           |         |
        |           |    Task Shader ----+
        |           |         |          |
        |           |         v          |
        |           |  Task Generation   |     Image Load/Store
        |           |         |          |     Atomic Counter
        |           +---+-----+          |<--> Shader Storage
        |               |                |     Texture Fetch
        |               v                |     Uniform Block
        |          Mesh Shader ----------+
        |               |                |
        +-------------> +                |
                        |                |
                        v                |
                  Rasterization          |
                        |                |
                        v                |
                 Fragment Shader --------+
                        |
                        v
             Per-Fragment Operations
                        |
                        v
                   Framebuffer

            Mesh Processing Pipeline
Mapping to SPIR-V | |
----------------- | |
For informational purposes (non-normative), the following is an | |
expected way for an implementation to map GLSL constructs to SPIR-V | |
constructs: | |
task shader -> TaskNV Execution model | |
mesh shader -> MeshNV Execution model | |
shared qualifier -> Workgroup Storage Class (existing) | |
points layout qualifier -> OutputPoints Execution Mode (existing) | |
lines layout qualifier -> OutputLinesNV Execution Mode | |
triangles layout qualifier -> OutputTrianglesNV Execution Mode | |
max_vertices layout qualifier -> OutputVertices Execution Mode (existing) | |
max_primitives layout qualifier -> OutputPrimitivesNV Execution Mode | |
local_size_(xyz) layout qualifiers -> LocalSize Execution Mode (existing) | |
local_size_(xyz)_id layout qualifiers -> LocalSizeId Execution Mode (existing) | |
perprimitiveNV auxiliary storage qualifier -> PerPrimitiveNV Decoration | |
perviewNV auxiliary storage qualifier -> PerViewNV Decoration | |
taskNV auxiliary storage qualifier -> PerTaskNV Decoration | |
gl_WorkGroupSize -> WorkgroupSize decorated OpVariable (existing) | |
gl_WorkGroupID -> WorkgroupId decorated OpVariable (existing) | |
gl_LocalInvocationID -> LocalInvocationId decorated OpVariable (existing) | |
gl_GlobalInvocationID -> GlobalInvocationId decorated OpVariable (existing) | |
gl_LocalInvocationIndex -> LocalInvocationIndex decorated OpVariable (existing) | |
gl_TaskCountNV -> TaskCountNV decorated OpVariable | |
gl_PrimitiveCountNV -> PrimitiveCountNV decorated OpVariable | |
gl_PrimitiveIndicesNV -> PrimitiveIndicesNV decorated OpVariable | |
gl_Position -> Position decorated OpVariable (existing) | |
gl_PositionPerViewNV -> PositionPerViewNV decorated OpVariable (existing extension) | |
gl_PointSize -> PointSize decorated OpVariable (existing) | |
gl_ClipDistance -> ClipDistance decorated OpVariable (existing) | |
gl_ClipDistancePerViewNV -> ClipDistancePerViewNV decorated OpVariable | |
gl_CullDistance -> CullDistance decorated OpVariable (existing) | |
gl_CullDistancePerViewNV -> CullDistancePerViewNV decorated OpVariable | |
gl_PrimitiveID -> PrimitiveId decorated OpVariable (existing) | |
gl_Layer -> Layer decorated OpVariable (existing) | |
gl_LayerPerViewNV -> LayerPerViewNV decorated OpVariable | |
gl_ViewportIndex -> ViewportIndex decorated OpVariable (existing) | |
gl_ViewportMask -> ViewportMaskNV decorated OpVariable (existing extension) | |
gl_ViewportMaskPerViewNV -> ViewportMaskPerViewNV decorated OpVariable (existing extension) | |
gl_MeshViewCountNV -> MeshViewCountNV decorated OpVariable | |
gl_MeshViewIndicesNV -> MeshViewIndicesNV decorated OpVariable | |
gl_DrawID -> DrawIndex decorated OpVariable (existing 1.3, extension) | |
gl_MeshPerVertexNV -> block name, not needed | |
gl_MeshPerPrimitiveNV -> block name, not needed | |
writePackedPrimitiveIndices4x8NV -> OpWritePackedPrimitiveIndices4x8NV() | |
Modifications to the OpenGL Shading Language Specification, Version 4.50.6 | |
Including the following line in a shader can be used to control the | |
language features described in this extension: | |
#extension GL_NV_mesh_shader : <behavior> | |
where <behavior> is as specified in section 3.3. | |
A new preprocessor #define is added to the OpenGL Shading Language: | |
#define GL_NV_mesh_shader 1 | |
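
A shader might guard its use of the extension as in the sketch below,
which relies on the preprocessor define above. This is illustrative only:
the #error text is arbitrary, and the trivial task shader body is included
just to make the example a complete compilation unit.

```glsl
#version 450

#ifdef GL_NV_mesh_shader
#extension GL_NV_mesh_shader : require
#else
#error GL_NV_mesh_shader is not available in this compiler
#endif

// Compiled as a task shader.
layout(local_size_x = 1) in;

void main() {
    // Spawn no mesh shader work groups.
    gl_TaskCountNV = 0u;
}
```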
Modify the introduction to Chapter 2, Overview of OpenGL Shading (p. 7) | |
(modify first paragraph) ... Currently, these processors are the vertex, | |
tessellation control, tessellation evaluation, geometry, fragment, | |
compute, task, and mesh processors. | |
(modify second paragraph) ... The specific languages will be referred to | |
by the name of the processor they target: vertex, tessellation control, | |
tessellation evaluation, geometry, fragment, compute, task, or mesh. | |
Insert new sections at the end of Chapter 2 (p. 9) | |
Section 2.7, Task Processor | |
The task processor is a programmable unit that operates in conjunction | |
with the mesh processor to produce a collection of primitives that will be | |
processed by subsequent stages of the graphics pipeline. The task and | |
mesh processors form a primitive processing pipeline that can be used | |
instead of the conventional primitive processing pipeline that includes | |
the vertex, tessellation control, tessellation evaluation, and geometry | |
processors. Compilation units written in the OpenGL Shading Language to | |
run on this processor are called task shaders. When a set of task shaders | |
is successfully compiled and linked, they result in a task shader | |
executable that runs on the task processor. | |
A task shader has access to many of the same resources as fragment and | |
other shader processors, including textures, buffers, image variables, and | |
atomic counters. The task shader has no fixed-function inputs other than | |
variables identifying the specific work group and invocation; any vertex | |
attributes or other data required by the task shader must be fetched from | |
memory. The only fixed output of the task shader is a task count, | |
identifying the number of mesh shader work groups to spawn. The task | |
shader can write additional outputs to task memory, which can be read by | |
all of the mesh shader work groups it spawns. | |
A task shader operates on a group of work items called a work group. A | |
work group is a collection of shader invocations that execute the same | |
code, potentially in parallel. An invocation within a work group may share | |
data with other members of the same work group through shared variables | |
and issue memory and control barriers to synchronize with other members of | |
the same work group. | |
Section 2.8, Mesh Processor | |
The mesh processor is a programmable unit that operates in conjunction | |
with the task processor to produce a collection of primitives that will be | |
processed by subsequent stages of the graphics pipeline. The task and | |
mesh processors form a primitive processing pipeline that can be used | |
instead of the conventional primitive processing pipeline that includes | |
the vertex, tessellation control, tessellation evaluation, and geometry | |
processors. Compilation units written in the OpenGL Shading Language to | |
run on this processor are called mesh shaders. When a set of mesh shaders | |
is successfully compiled and linked, they result in a mesh shader | |
executable that runs on the mesh processor. | |
A mesh shader has access to many of the same resources as fragment and | |
other shader processors, including textures, buffers, image variables, and | |
atomic counters. The only inputs available to the mesh shader are | |
variables identifying the specific work group and invocation and any | |
outputs written to task memory by the task shader that spawned the mesh | |
shader's work group. Any vertex attributes or other data required by the | |
mesh shader must be fetched from memory. The invocations of the mesh | |
shader work group write an output mesh, comprising a set of primitives | |
with per-primitive attributes, a set of vertices with per-vertex | |
attributes, and an array of indices identifying the mesh vertices that | |
belong to each primitive. The primitives of this mesh are then processed | |
by subsequent graphics pipeline stages, where the outputs of the mesh | |
shader form an interface with the fragment shader. | |
A mesh shader operates on a group of work items called a work group. A | |
work group is a collection of shader invocations that execute the same | |
code, potentially in parallel. An invocation within a work group may share | |
data with other members of the same work group through shared variables | |
and issue memory and control barriers to synchronize with other members of | |
the same work group. | |
Modify Section 3.6, Keywords (p. 18) | |
(add to the end of the list of keywords, p. 19) | |
perprimitiveNV | |
perviewNV | |
taskNV | |
Modify Section 3.8.2, Dynamically Uniform Expressions and Uniform Control | |
Flow (p. 21) | |
(modify third paragraph of this section) | |
An invocation group is the complete set of invocations collectively | |
processing a particular compute, task, or mesh shader workgroup, or a | |
graphical operation, where the scope ... | |
Modify Section 4.3, Storage Qualifiers (p. 43) | |
(modify table of base storage qualifiers, p. 43) | |
Qualifier           Meaning
------------------  -----------------------------------------------
shared              variable storage for compute, task, and mesh shaders
                    shared across all work items in a local work group

(add to table of auxiliary storage qualifiers, p. 44)

Auxiliary Storage
Qualifier           Meaning
------------------  -----------------------------------------------
perprimitiveNV      mesh shader outputs with per-primitive instances
perviewNV           mesh shader outputs with per-view instances
taskNV              generic outputs for task shader work groups
Modify Section 4.3.4, Input Variables (p. 46) | |
(modify third paragraph, p. 47, to treat all mesh shader outputs as | |
"arrayed" interfaces) | |
Some inputs and outputs are arrayed ... Geometry shader inputs, | |
tessellation control shader inputs and outputs, tessellation evaluation | |
inputs, and mesh shader outputs all have an additional level of arrayness | |
relative to other shader inputs and outputs. Component limits for these | |
arrayed interfaces (e.g., gl_MaxTessControlInputComponents) are limits for | |
a single instance and not for the entire interface. | |
(insert before the last paragraph, p. 47, "Fragment shader inputs get") | |
Task shaders do not permit user-defined input variables and do not form a | |
formal interface with any previous shader stage. See section 7.1 "Built-In | |
Variables" for a description of built-in task shader input variables. All | |
other input to a task shader is retrieved explicitly through image loads, | |
texture fetches, loads from uniforms, uniform buffers, or shader storage | |
buffers, or other user supplied code. Redeclaration of built-in input | |
variables in task shaders is not permitted. | |
Mesh shaders form an interface with task shaders and support a collection
of input variables in task memory. All user-defined mesh shader inputs
must be declared as members of a single interface block qualified with
the "taskNV" qualifier. Mesh shaders do not support user-defined inputs
declared outside interface blocks or without "taskNV" and do not support
more than one input interface block. In addition to user-defined inputs,
mesh shaders support the built-in input variables described in section
7.1. User-defined mesh shader input variables are filled with the values
of matching user-defined output variables written by the task shader. As
with other input variables, mesh shader inputs in task memory must be
declared using the same type and qualification as task memory outputs from
the previous (task) shader stage. It is a compile-time error to use the
"taskNV" qualifier with inputs in any stage other than the mesh shader.
All other input to a mesh shader is retrieved explicitly through image
loads, texture fetches, loads from uniforms, uniform buffers, or shader
storage buffers, or other user supplied code. Redeclaration of built-in
input variables in mesh shaders is not permitted.
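
For illustration, a mesh shader reading task memory might look like the
sketch below. It assumes a hypothetical task shader that declares the
matching block "taskNV out Task { uint baseMeshlet; } OUT;". The block
name Task, the member baseMeshlet, and the buffer Centers are invented for
this example.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

// Compiled as a mesh shader.
layout(local_size_x = 1) in;
layout(points, max_vertices = 1, max_primitives = 1) out;

// The single permitted input block; its type and qualification must match
// the task shader's "taskNV out" block.
taskNV in Task {
    uint baseMeshlet;
} IN;

// Hypothetical application data: one clip-space point per meshlet.
layout(std430, binding = 0) readonly buffer Centers {
    vec4 meshletCenter[];
};

void main() {
    // Built-in inputs identify the work group; everything else is
    // fetched explicitly from task memory or other resources.
    uint meshlet = IN.baseMeshlet + gl_WorkGroupID.x;

    gl_MeshVerticesNV[0].gl_Position  = meshletCenter[meshlet];
    gl_MeshVerticesNV[0].gl_PointSize = 4.0;
    gl_PrimitiveIndicesNV[0] = 0u;   // one index per point primitive
    gl_PrimitiveCountNV = 1u;
}
```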
(modify last paragraph, p. 47) | |
Fragment shader inputs get... The auxiliary storage qualifiers centroid, | |
sample, and perprimitiveNV can also be applied, as well as... | |
(modify first paragraph, p. 48) | |
Fragment shader inputs that are signed or unsigned integers, integer
vectors, or any double-precision floating-point type must be qualified
with the interpolation qualifier flat or with the auxiliary storage
qualifier perprimitiveNV.
(add a new example to the second paragraph, p. 48) | |
perprimitiveNV in vec3 triangleNormal; | |
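
In context, a fragment shader consuming such an input could be sketched as
follows. The names vertexColor and fragColor, the location assignments,
and the fixed light direction are assumptions for this example and would
need to match whatever the mesh shader actually writes.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

// Compiled as a fragment shader.
layout(location = 0) in vec3 vertexColor;                    // interpolated
layout(location = 1) perprimitiveNV in vec3 triangleNormal;  // one value per primitive

layout(location = 0) out vec4 fragColor;

void main() {
    // Hypothetical fixed light direction.
    vec3 lightDir = normalize(vec3(0.3, 0.5, 1.0));
    float diffuse = max(dot(normalize(triangleNormal), lightDir), 0.0);
    fragColor = vec4(vertexColor * diffuse, 1.0);
}
```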
(modify third paragraph, p. 48) | |
The fragment shader inputs form an interface with the mesh shader or last | |
active shader in the conventional vertex processing pipeline (e.g., | |
vertex, tessellation evaluation, geometry). ... Also, interpolation | |
qualification (e.g., flat) and auxiliary qualification other than | |
"perprimitiveNV" (e.g. centroid) may differ. ... | |
Modify Section 4.3.6, Output Variables (p. 49) | |
(modify last paragraph, p. 49 to add task and mesh shaders) | |
It is a compile-time error to declare a vertex, tessellation evaluation, | |
tessellation control, geometry, task, or mesh shader output that contains | |
any of the following: ... | |
(insert before the next-to-last paragraph "The order of execution", p. 50) | |
Task shader output variables may be used to write values in task memory
that can be read by the mesh shader invocations for the tasks that it
spawns. All user-defined task shader outputs must be declared as members
of a single interface block qualified with the "taskNV" qualifier. Task
shaders do not support user-defined outputs declared outside interface
blocks or without "taskNV" and do not support more than one output
interface block. It is a compile-time error to use the "taskNV" qualifier
in output declarations in any other shader stage.
Mesh shader output variables may be used to write per-vertex or | |
per-primitive data. Output variables qualified with "perprimitiveNV" | |
have separate instances for each primitive in the output mesh; all other | |
output variables have separate instances for each vertex in the output | |
mesh. It is a compile-time error to use the "perprimitiveNV" qualifier | |
in output declarations in any other shader stage. Both types of output | |
variables are arrayed (see "arrayed" under 4.3.4, Inputs) and each | |
per-vertex or per-primitive output variable (or output block, see | |
interface blocks below) needs to be declared as an array. For example, | |
out float vertexColor[]; // per-vertex color | |
perprimitiveNV out vec3 triangleNormal[]; // per-triangle normal | |
Each element of such an array corresponds to one vertex or primitive of | |
the output mesh. Each array can optionally have a size declared. The | |
array size will be set by (or if provided must be consistent with) the | |
output layout declaration(s) establishing the maximum number of vertices | |
and primitives in the output mesh. When checking a mesh shader against | |
implementation limits on the total number of output variable components, | |
the compiler adds the number of per-vertex outputs for a single vertex | |
instance and the number of per-primitive outputs for a single primitive | |
instance. Unlike tessellation control shaders, a mesh shader invocation | |
may write to outputs for any vertex or primitive. | |
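
A small sketch of these rules, using explicitly sized output arrays, is
given below. It is illustrative only; the quad geometry and the location
assignments are assumptions. Each invocation writes its own vertex, while
invocation 0 writes all per-primitive data, which is permitted because any
invocation may write any vertex or primitive.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 4) in;
layout(triangles, max_vertices = 4, max_primitives = 2) out;

// Explicit sizes must be consistent with the layout declaration above.
layout(location = 0) out float vertexColor[4];                   // max_vertices
layout(location = 1) perprimitiveNV out vec3 triangleNormal[2];  // max_primitives

void main() {
    uint i = gl_LocalInvocationID.x;             // 0..3, one vertex each
    vec2 corner = vec2(i & 1u, i >> 1u);         // (0,0) (1,0) (0,1) (1,1)
    gl_MeshVerticesNV[i].gl_Position = vec4(corner * 2.0 - 1.0, 0.0, 1.0);
    vertexColor[i] = float(i) / 3.0;

    if (i == 0u) {
        // Two counter-clockwise triangles covering the quad.
        const uint indices[6] = uint[6](0u, 1u, 2u, 2u, 1u, 3u);
        for (int k = 0; k < 6; ++k) {
            gl_PrimitiveIndicesNV[k] = indices[k];
        }
        triangleNormal[0] = vec3(0.0, 0.0, 1.0);
        triangleNormal[1] = vec3(0.0, 0.0, 1.0);
        gl_PrimitiveCountNV = 2u;
    }
}
```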
Mesh shader outputs qualified with "perviewNV" are considered to be
per-view and arrayed with a second additional level of arrayness. Each
non-block output variable must be declared as an array with at least
two dimensions. For output block members, one level of arrayness applies
to the block declaration and a second applies to the block member
declaration. For example,
perviewNV out float perViewVertexColor[][]; | |
out PerVertexBlock { | |
perviewNV vec2 perViewTextureCoord[]; | |
} v[]; | |
For non-block output variables, each element in the outer (leftmost) | |
dimension of such an array corresponds to one vertex or primitive of the | |
output mesh, as described immediately above. Each element in the second | |
(next-to-leftmost) dimension corresponds to a single view of the output | |
primitive or vertex. The array dimension corresponding to the view number | |
can optionally have a size declared. The array size will be set to (or if | |
provided must be consistent with) the maximum number of views supported by | |
the implementation given by the constant gl_MaxMeshViewCountNV. | |
When using per-view outputs, all view instances of per-view outputs count | |
separately against implementation limits on the total number of output | |
components. Additionally, values for extra views will be stored in the | |
upper end of the set of available locations for mesh shader outputs. A | |
compile- or link-time error will be generated if extra storage required | |
for extra per-view outputs leaves the compiler unable to assign locations | |
for all outputs or includes a location already consumed by an active | |
output variable with an associated "location" layout qualifier. | |
(modify the next-to-last and last paragraph, p. 50) | |
The order of execution of tessellation control, task, and mesh shader | |
invocations relative to the other invocations for the same input patch or | |
local work group is undefined unless the built-in function barrier() is | |
used to provide some control over relative execution order. When a shader | |
invocation calls barrier(), ... | |
Because tessellation control, task, and mesh shader invocations execute in | |
undefined order between barriers, the values of output variables will | |
sometimes be undefined. ... | |
Modify Section 4.3.8, Shared Variables (p. 52) | |
(modify first paragraph of the section, p. 52) | |
The shared qualifier is used to declare variables that have storage shared | |
between all work items in a compute, task, or mesh shader local work | |
group. Variables declared as shared may only be used in compute, task, or | |
mesh shaders. ... | |
(modify last paragraph of the section, p. 52) | |
There is a limit to the total size of all variables declared as shared in
a single shader stage. This limit, expressed in units of basic machine
units, may be determined by using the OpenGL API to query the value of
MAX_COMPUTE_SHARED_MEMORY_SIZE (compute shaders),
MAX_TASK_SHARED_MEMORY_SIZE_NV (task shaders), or
MAX_MESH_SHARED_MEMORY_SIZE_NV (mesh shaders).
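
As an illustration of shared variables and barrier() outside compute
shaders, the sketch below shows a task shader whose invocations cooperate
through a shared counter to compact a list of visible meshlets. It is a
sketch under stated assumptions: the buffer Bounds, the uniform block
Cull, the task block layout, and the sphere-versus-plane test are all
hypothetical.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

// Compiled as a task shader.
layout(local_size_x = 32) in;

// Hypothetical application data: one bounding sphere per meshlet
// (xyz = center, w = radius) and six inward-facing frustum planes.
layout(std430, binding = 0) readonly buffer Bounds {
    vec4 boundingSphere[];
};
layout(std140, binding = 1) uniform Cull {
    vec4 frustumPlane[6];
    uint meshletCount;
};

taskNV out Task {
    uint meshletID[32];   // compacted list of surviving meshlets
} OUT;

shared uint survivors;    // one instance per work group

bool visible(vec4 sphere) {
    for (int i = 0; i < 6; ++i) {
        // Reject spheres lying entirely behind a plane.
        if (dot(frustumPlane[i].xyz, sphere.xyz) + frustumPlane[i].w < -sphere.w) {
            return false;
        }
    }
    return true;
}

void main() {
    if (gl_LocalInvocationIndex == 0u) survivors = 0u;
    memoryBarrierShared();
    barrier();   // the counter is initialized before anyone increments it

    uint meshlet = gl_GlobalInvocationID.x;
    if (meshlet < meshletCount && visible(boundingSphere[meshlet])) {
        OUT.meshletID[atomicAdd(survivors, 1u)] = meshlet;
    }
    memoryBarrierShared();
    barrier();   // all increments are complete before the count is read

    if (gl_LocalInvocationIndex == 0u) gl_TaskCountNV = survivors;
}
```

Both barrier() calls sit in uniform control flow; only the work between
them is conditional. The order of entries in meshletID is not
deterministic, since it depends on which invocation increments first.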
Modify Section 4.3.9, Interface Blocks, p. 52 | |
(rework grammar rules, p. 53, to allow "taskNV", "perprimitiveNV", and | |
"perviewNV" to qualify blocks) | |
interface-qualifier: | |
in-block-qualifiers(_opt) "in" | |
out-block-qualifiers(_opt) "out" | |
uniform | |
buffer | |
// Note: Not shown for simplicity, but memory qualifiers may also be used | |
in-block-qualifiers: | |
patch | |
taskNV | |
perprimitiveNV | |
out-block-qualifiers: | |
out-block-qualifier | |
out-block-qualifier out-block-qualifiers | |
out-block-qualifier: | |
patch | |
taskNV | |
perprimitiveNV | |
perviewNV | |
Modify Section 4.4, Layout Qualifiers, p. 57 | |
(modify the layout qualifier table, pp. 58-59) | |
Layout Qualifier   | Qualifier | Individual | Block | Block  | Allowed interfaces
                   | only      | variable   |       | Member |
-------------------+-----------+------------+-------+--------+--------------------
local_size_x =     |           |            |       |        | compute in
local_size_y =     |     X     |            |       |        | mesh in
local_size_z =     |           |            |       |        | task in
-------------------+-----------+------------+-------+--------+--------------------
max_vertices =     |     X     |            |       |        | geometry out
                   |           |            |       |        | mesh out
-------------------+-----------+------------+-------+--------+--------------------
max_primitives =   |     X     |            |       |        | mesh out
-------------------+-----------+------------+-------+--------+--------------------
[ points ]         |           |            |       |        |
[ lines ]          |     X     |            |       |        | mesh out
[ triangles ]      |           |            |       |        |
Add new Section 4.4.1.5, Task Shader Inputs, p. 67 | |
(note: the content of this section is nearly identical to the content of | |
section 4.4.1.4, Compute Shader Inputs) | |
There are no layout location qualifiers for task shader inputs. | |
Layout qualifier identifiers for task shader inputs are the work group | |
size qualifiers: | |
layout-qualifier-id : | |
local_size_x = integer-constant-expression | |
local_size_y = integer-constant-expression | |
local_size_z = integer-constant-expression | |
These task shader input layout qualifiers behave identically to the
equivalent compute shader qualifiers and specify a fixed local group size
used for each task shader work group. If no size is specified in any of
the three dimensions, a default size of one will be used.
If the fixed local group size of the shader in any dimension is greater | |
than the maximum size supported by the implementation for that dimension, | |
a compile-time error results. Also, if such a layout qualifier is | |
declared more than once in the same shader, all those declarations must | |
set the same set of local workgroup sizes and set them to the same values; | |
otherwise a compile-time error results. If multiple task shaders attached | |
to a single program object declare a fixed local group size, the | |
declarations must be identical; otherwise a link-time error results. | |
Furthermore, if a program object contains any task shaders, at least one | |
must contain an input layout qualifier specifying a fixed local group size | |
for the program, or a link-time error will occur. | |
Note that task shaders do not currently support multi-dimensional work | |
groups; the maximum value for local_size_y and local_size_z will be one. | |
Add new Section 4.4.1.6, Mesh Shader Inputs, p. 67 | |
(note: the content of this section is nearly identical to the content of | |
section 4.4.1.4, Compute Shader Inputs) | |
There are no layout location qualifiers for mesh shader inputs. | |
Layout qualifier identifiers for mesh shader inputs are the work group | |
size qualifiers: | |
layout-qualifier-id : | |
local_size_x = integer-constant-expression | |
local_size_y = integer-constant-expression | |
local_size_z = integer-constant-expression | |
These mesh shader input layout qualifiers behave identically to the
equivalent compute shader qualifiers and specify a fixed local group size
used for each mesh shader work group. If no size is specified in any of
the three dimensions, a default size of one will be used.
If the fixed local group size of the shader in any dimension is greater | |
than the maximum size supported by the implementation for that dimension, | |
a compile-time error results. Also, if such a layout qualifier is | |
declared more than once in the same shader, all those declarations must | |
set the same set of local workgroup sizes and set them to the same values; | |
otherwise a compile-time error results. If multiple mesh shaders attached | |
to a single program object declare a fixed local group size, the | |
declarations must be identical; otherwise a link-time error results. | |
Furthermore, if a program object contains any mesh shaders, at least one | |
must contain an input layout qualifier specifying a fixed local group size | |
for the program, or a link-time error will occur. | |
Note that mesh shaders do not currently support multi-dimensional work | |
groups; the maximum value for local_size_y and local_size_z will be one. | |
Modify section 4.4.2.1, Transform Feedback Layout Qualifiers, p. 69 | |
(add a new paragraph at the end of the section, p. 71) | |
Transform feedback cannot be used to capture the outputs of task and mesh
shaders. Use of transform feedback layout qualifiers in these shader
types will result in a compile-time error.
Add new Section 4.4.2.5, Mesh Shader Outputs, p. 75 | |
Mesh shaders can have three additional types of output layout identifiers: | |
an output primitive type, a maximum output vertex count, and a maximum | |
output primitive count. The primitive type, vertex and primitive count | |
identifiers are allowed only on the interface qualifier out, not on an | |
output block, block member, or variable declaration. | |
The layout qualifier identifiers for mesh shader outputs are | |
layout-qualifier-id : | |
points | |
lines | |
triangles | |
max_vertices = integer-constant-expression | |
max_primitives = integer-constant-expression | |
The primitive type identifiers "points", "lines", and "triangles" are used | |
to specify the type of output primitive produced by the mesh shader, and | |
only one of these is accepted. At least one mesh shader (compilation | |
unit) in a program must declare an output primitive type, and all mesh | |
shader output primitive type declarations in a program must declare the | |
same primitive type. It is not required that all mesh shaders in a | |
program declare an output primitive type. | |
The vertex count identifier "max_vertices" is used to specify the maximum | |
number of vertices the shader will ever emit for the invocation group. At | |
least one mesh shader (compilation unit) in a program must declare a | |
maximum output vertex count, and all mesh shader output vertex count | |
declarations in a program must declare the same count. It is not required | |
that all mesh shaders in a program declare a count. | |
The primitive count identifier "max_primitives" is used to specify the | |
maximum number of primitives the shader will ever emit for the invocation | |
group. At least one mesh shader (compilation unit) in a program must | |
declare a maximum output primitive count, and all mesh shader output | |
primitive count declarations in a program must declare the same count. It | |
is not required that all mesh shaders in a program declare a count. | |
The intrinsically declared output block gl_MeshVerticesNV[] and any user-defined | |
output variables or blocks not qualified with "perprimitiveNV" will be | |
sized by the "max_vertices" output declaration. The intrinsically | |
declared output block gl_MeshPrimitivesNV[] and any user-defined output | |
variables or blocks qualified with "perprimitiveNV" will be sized by the | |
"max_primitives" output declaration. The intrinsically declared array | |
gl_PrimitiveIndicesNV[] will be sized according to the primitive type and | |
"max_primitives" declarations, where the size is: | |
* the value of "max_primitives" if "points" is declared | |
* two times the value of "max_primitives" if "lines" is declared, or | |
* three times the value of "max_primitives" if "triangles" is declared. | |
For outputs declared without an array size, including intrinsically | |
declared outputs (e.g., gl_MeshVerticesNV), a layout must be declared before any use | |
of the method length() or other array use that requires its size to be | |
known. It is a compile-time error if an output array is declared with an | |
explicit size that does not match the array size derived from the layout | |
qualifier. | |
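The sizing rules above can be sketched as follows. The counts (64 vertices, 126 primitives) and the names vNormal and pColor are hypothetical and chosen only for illustration; the sketch emits no primitives and exists to show the declarations.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;
layout(triangles, max_vertices = 64, max_primitives = 126) out;

// Per-vertex user output, declared unsized: sized by max_vertices (64).
out vec3 vNormal[];

// Per-primitive user output, declared unsized: sized by max_primitives (126).
perprimitiveNV out vec4 pColor[];

// An explicit size must match the size derived from the layout, so a
// declaration such as "out vec3 vNormal[32];" would be a compile-time error.

void main()
{
    // With the layout declared above, the implied sizes are:
    //   gl_MeshVerticesNV[]     : 64
    //   gl_MeshPrimitivesNV[]   : 126
    //   gl_PrimitiveIndicesNV[] : 3 * 126 = 378 (triangles)
    if (gl_LocalInvocationIndex == 0u) {
        gl_PrimitiveCountNV = 0u;
    }
}
```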
Modify Section 4.5, Interpolation Qualifiers, p. 83 | |
(modify first paragraph of the section, p. 83) | |
The presence of and type of interpolation is controlled by the above | |
interpolation qualifiers as well as the auxiliary storage qualifiers | |
centroid and sample. The auxiliary storage qualifiers "patch", "taskNV", | |
and "perprimitiveNV" are not used for interpolation; it is a compile-time | |
error to use interpolation qualifiers with those auxiliary storage | |
qualifiers. The auxiliary storage qualifier "perviewNV" may not be used | |
when declaring fragment shader inputs, but can be used with interpolation | |
qualifiers in the declaration of mesh shader outputs. | |
(add a new paragraph at the end of the section, p. 84) | |
A variable qualified with the auxiliary storage qualifier | |
"perprimitiveNV" will also not be interpolated. Instead, it will use | |
the same per-primitive value for all fragments generated by each | |
primitive. Such a variable can also be qualified with an interpolation | |
qualifier or with centroid or sample, but those qualifications will mean the | |
same thing as only qualifying with "perprimitiveNV". | |
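A minimal fragment shader sketch of a per-primitive input follows. The names vNormal, pColor, and fragColor are hypothetical; the sketch assumes a mesh shader that writes a matching "perprimitiveNV" output named pColor and a regular per-vertex output named vNormal.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

// Regular input: interpolated across the primitive.
in vec3 vNormal;

// Per-primitive input: every fragment of a primitive reads the same
// value, and no interpolation takes place.
perprimitiveNV in vec4 pColor;

layout(location = 0) out vec4 fragColor;

void main()
{
    // Simple fixed-direction diffuse term, for illustration only.
    float light = max(dot(normalize(vNormal), vec3(0.0, 0.0, 1.0)), 0.0);
    fragColor = pColor * light;
}
```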
Modify Section 7.1, Built-In Language Variables (p. 120) | |
(insert after the first paragraph and variable list, p. 123) | |
In the task language, built-in variables are intrinsically declared as: | |
const uvec3 gl_WorkGroupSize; | |
in uvec3 gl_WorkGroupID; | |
in uvec3 gl_LocalInvocationID; | |
in uvec3 gl_GlobalInvocationID; | |
in uint gl_LocalInvocationIndex; | |
in uint gl_MeshViewCountNV; | |
in uint gl_MeshViewIndicesNV[]; | |
out uint gl_TaskCountNV; | |
In the mesh language, built-in variables are intrinsically declared as: | |
const uvec3 gl_WorkGroupSize; | |
in uvec3 gl_WorkGroupID; | |
in uvec3 gl_LocalInvocationID; | |
in uvec3 gl_GlobalInvocationID; | |
in uint gl_LocalInvocationIndex; | |
in uint gl_MeshViewCountNV; | |
in uint gl_MeshViewIndicesNV[]; | |
out uint gl_PrimitiveCountNV; | |
out uint gl_PrimitiveIndicesNV[]; | |
out gl_MeshPerVertexNV { | |
vec4 gl_Position; | |
perviewNV vec4 gl_PositionPerViewNV[]; // NVX_multiview_per_view_attributes | |
float gl_PointSize; | |
float gl_ClipDistance[]; | |
perviewNV float gl_ClipDistancePerViewNV[][]; | |
float gl_CullDistance[]; | |
perviewNV float gl_CullDistancePerViewNV[][]; | |
} gl_MeshVerticesNV[]; | |
perprimitiveNV out gl_MeshPerPrimitiveNV { | |
int gl_PrimitiveID; | |
int gl_Layer; | |
perviewNV int gl_LayerPerViewNV[]; | |
int gl_ViewportIndex; | |
int gl_ViewportMask[]; // NV_viewport_array2 | |
perviewNV int gl_ViewportMaskPerViewNV[][]; | |
} gl_MeshPrimitivesNV[]; | |
(modify the discussion of the built-in variables shared with compute | |
shaders, which starts on p. 123) | |
The built-in constant gl_WorkGroupSize is a compute, task, or mesh shader | |
constant containing the local work-group size of the shader. The size ... | |
The built-in variable gl_WorkGroupID is a compute, task, or mesh shader | |
input variable containing the three-dimensional index of the global work | |
group that the current invocation is executing in. ... | |
The built-in variable gl_LocalInvocationID is a compute, task, or mesh | |
shader input variable containing the three-dimensional index of the local | |
work group within the global work group that the current invocation is | |
executing in. ... | |
The built-in variable gl_GlobalInvocationID is a compute, task, or mesh | |
shader input variable containing the global index of the current work | |
item. This value uniquely identifies this invocation from all other | |
invocations across all local and global work groups initiated by the | |
current DispatchCompute or DrawMeshTasksNV call or by a previously | |
executed task shader. ... | |
The built-in variable gl_LocalInvocationIndex is a compute, task, or mesh | |
shader input variable that contains the one-dimensional representation of | |
the gl_LocalInvocationID. | |
(modify discussion of gl_PrimitiveID, gl_Layer, and gl_ViewportIndex to | |
allow as a mesh output, pp. 125-127) | |
The output variable gl_PrimitiveID is available only in the geometry and | |
mesh languages and provides a single integer that serves as a primitive | |
identifier. This is then available to fragment shaders as the fragment | |
input gl_PrimitiveID, which will select the written primitive ID from the | |
provoking vertex in the primitive being shaded when using a geometry | |
shader or from the appropriate per-primitive output value when using a | |
mesh shader. If a fragment shader using gl_PrimitiveID is active and a | |
geometry or mesh shader is also active, the geometry or mesh shader must | |
write to gl_PrimitiveID or the fragment shader input gl_PrimitiveID is | |
undefined. ... | |
The variable gl_Layer is available as an output variable in the geometry | |
and mesh languages and an input variable in the fragment language. In the | |
geometry and mesh languages, it is used to select a specific layer (or | |
face and layer of a cube map) of a multi-layer framebuffer attachment. | |
When using a geometry shader, the actual layer used will come from one of | |
the vertices in the primitive being shaded. Which vertex the layer comes | |
from is discussed in section 11.3.4.6 "Layer and Viewport Selection" of | |
the OpenGL Specification. It might be undefined, so it is best to write | |
the same layer value for all vertices of a primitive. When using a mesh | |
shader, the actual layer will come from the appropriate per-primitive | |
output value written by the mesh shader. ... | |
The input variable gl_Layer in the fragment language will have the same | |
value that was written to the output variable gl_Layer in the geometry or | |
mesh language. If the geometry or mesh stage does not dynamically assign | |
... If the geometry or mesh stage makes no static assignment to gl_Layer, | |
the input value... Otherwise, the fragment stage will read the same value | |
written by the geometry or mesh stage, even if... | |
The variable gl_ViewportIndex is available as an output variable in the | |
geometry and mesh languages and an input variable in the fragment | |
language. In the geometry and mesh language, it provides the ... | |
Primitives generated by the geometry or mesh shader will undergo viewport | |
transformation and scissor testing using the viewport transformation and | |
scissor rectangle selected by the value of gl_ViewportIndex. When using a | |
geometry shader, the viewport index used will come from one of the | |
vertices in the primitive being shaded. However, which vertex the | |
viewport index comes from is implementation-dependent, so it is best to | |
use the same viewport index for all vertices of the primitive. When using | |
a mesh shader, the viewport index used will come from the appropriate | |
per-primitive output value written by the mesh shader. If a geometry or | |
mesh shader does not assign a value to gl_ViewportIndex, ... If a | |
geometry or mesh shader statically assigns a value to gl_ViewportIndex... | |
The input variable gl_ViewportIndex in the fragment stage will have the | |
same value that was written to the output variable gl_ViewportIndex in the | |
geometry or mesh stage. If the geometry or mesh stage does not dynamically | |
assign... If the geometry or mesh stage makes no static assignment... | |
Otherwise, the fragment stage will read the same value written by the | |
geometry or mesh stage, even if... | |
(insert new paragraphs before the seventh paragraph, starting with | |
"Fragment shaders output values", p. 127, describing new task and mesh | |
built-in variables) | |
The input variable gl_MeshViewCountNV is only available in the mesh and | |
task languages and defines the number of views processed by the current | |
mesh and task shader invocations. When using the multi-view API feature, | |
the primitives emitted by the mesh shader will be processed separately for | |
each enabled view and sent to a different layer of a layered render | |
target. Mesh shader outputs qualified with "perviewNV" are declared as | |
arrays with separate values for each view. To ensure defined results, | |
mesh shaders must write values for array elements zero through | |
gl_MeshViewCountNV-1 for each such per-view output. | |
The input variable gl_MeshViewIndicesNV is only available in the mesh and | |
task languages. This variable is an array where each element holds the | |
view number of one of the views being processed by the current mesh and | |
task shader invocations. The array elements with indices greater than or | |
equal to the value of gl_MeshViewCountNV are undefined. If the value of | |
gl_MeshViewIndicesNV[i] is <j>, then any outputs qualified with | |
"perviewNV" will take on the value of array element <i> when processing | |
primitives for view index <j>. | |
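The relationship between gl_MeshViewCountNV, gl_MeshViewIndicesNV[], and per-view outputs can be sketched as below. This assumes that NVX_multiview_per_view_attributes is supported (so that gl_PositionPerViewNV[] is present) and that multi-view rendering has been configured through the API; the uniform array viewProj and the constant array corners are hypothetical.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

// Hypothetical application-provided matrices, indexed by view number.
uniform mat4 viewProj[gl_MaxMeshViewCountNV];

const vec4 corners[3] = vec4[3](
    vec4(-0.5, -0.5, 0.0, 1.0),
    vec4( 0.5, -0.5, 0.0, 1.0),
    vec4( 0.0,  0.5, 0.0, 1.0));

void main()
{
    // Write array elements 0 .. gl_MeshViewCountNV-1 of each per-view
    // output. Element i is used when processing the view whose number
    // is gl_MeshViewIndicesNV[i].
    for (uint i = 0u; i < gl_MeshViewCountNV; ++i) {
        uint view = gl_MeshViewIndicesNV[i];
        for (int v = 0; v < 3; ++v) {
            gl_MeshVerticesNV[v].gl_PositionPerViewNV[i] =
                viewProj[view] * corners[v];
        }
    }

    gl_PrimitiveIndicesNV[0] = 0;
    gl_PrimitiveIndicesNV[1] = 1;
    gl_PrimitiveIndicesNV[2] = 2;
    gl_PrimitiveCountNV = 1;
}
```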
The output variable gl_TaskCountNV is only available in the task language | |
and defines the number of subsequent mesh shader work groups to generate | |
upon completion of the task shader. | |
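A minimal task shader sketch that writes gl_TaskCountNV is shown below. The uniform meshletCount and the block member baseID are hypothetical names; the "taskNV" block is the single task memory block passed on to the mesh shader work groups.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;

// Hypothetical uniform: number of mesh work groups to launch per task.
uniform uint meshletCount;

// Task memory block consumed by the mesh shader work groups.
taskNV out Task {
    uint baseID;
} OUT;

void main()
{
    // First meshlet handled by the mesh work groups this task spawns.
    OUT.baseID = gl_WorkGroupID.x * meshletCount;

    // Number of mesh shader work groups generated when this task
    // shader work group completes.
    gl_TaskCountNV = meshletCount;
}
```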
The output variable gl_PrimitiveCountNV is only available in the mesh | |
language and defines the number of primitives in the output mesh produced | |
by the mesh shader that should be processed by subsequent pipeline stages. | |
The output array variable gl_PrimitiveIndicesNV[] is only available in the | |
mesh language. Depending on the output primitive type declared using a | |
layout qualifier, each group of one (points), two (lines), or three | |
(triangles) elements specifies the indices of the vertices making up a | |
primitive. | |
All index values must be in the range [0, N-1], where N is the value of | |
the "max_vertices" layout qualifier. Out-of-bounds index values will | |
result in undefined behavior. | |
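As a minimal sketch of these outputs, the following mesh shader emits a single triangle. The vertex positions are arbitrary illustrative values.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

void main()
{
    gl_MeshVerticesNV[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);

    // One group of three indices per triangle, each in [0, max_vertices-1].
    gl_PrimitiveIndicesNV[0] = 0;
    gl_PrimitiveIndicesNV[1] = 1;
    gl_PrimitiveIndicesNV[2] = 2;

    // Number of primitives passed on to subsequent pipeline stages.
    gl_PrimitiveCountNV = 1;
}
```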
The mesh shader output block members gl_PositionPerViewNV[], | |
gl_ClipDistancePerViewNV[][], gl_CullDistancePerViewNV[][], | |
gl_LayerPerViewNV[], and gl_ViewportMaskPerViewNV[][] are per-view versions | |
of the single-view variables with equivalent names that lack the | |
"PerViewNV" suffix: | |
Per-View Variable              Single-View Variable | |
----------------------------   -------------------- | |
gl_PositionPerViewNV[]         gl_Position | |
gl_ClipDistancePerViewNV[][]   gl_ClipDistance[] | |
gl_CullDistancePerViewNV[][]   gl_CullDistance[] | |
gl_LayerPerViewNV[]            gl_Layer | |
gl_ViewportMaskPerViewNV[][]   gl_ViewportMask[] | |
All of these outputs are considered arrayed, with separate values for each | |
view. The view number is used to index in the first dimension of these | |
arrays. For all of these variables, if a shader statically assigns a | |
value to any element of a per-view array, it may not statically assign a | |
value to the equivalent single-view variable in any mesh shader | |
compilation unit. | |
As with the gl_ClipDistance[] and gl_CullDistance[] arrays, the second | |
dimension of gl_ClipDistancePerViewNV[][] and gl_CullDistancePerViewNV[][] is | |
predeclared as unsized and must be sized by the shader either redeclaring | |
it with a size or indexing it only with integral constant expressions. The | |
size determines the number and set of enabled clip or cull distances and | |
can be at most gl_MaxClipDistances or gl_MaxCullDistances, respectively. | |
The number of varying components consumed by these arrays will match the | |
size of the array, and shaders writing to either array must write all | |
enabled distances, or clipping/culling results will be undefined. | |
(modify the fifth paragraph, p. 129) | |
The gl_PerVertex, gl_MeshPerVertexNV, and gl_MeshPerPrimitiveNV blocks can | |
be redeclared in a shader to explicitly indicate what subset of the fixed | |
pipeline interface will be used. ... | |
(modify the sixth paragraph, p. 129) | |
This establishes the output interface the shader will use with the | |
subsequent pipeline stage. It must be a subset of the built-in members of | |
gl_PerVertex, gl_MeshPerVertexNV, or gl_MeshPerPrimitiveNV. ... | |
Modify Section 7.3, Built-In Constants (p. 136) | |
Add to the end of the long list of constants that makes up this section: | |
const int gl_MaxMeshViewCountNV = 4; | |
Add new Section 8.xx, Mesh Shader Functions, after section 8.15, p. 187 | |
These functions are only available in mesh shaders. | |
Insert a syntax/description table similar to the previous section. | |
Syntax: | |
void writePackedPrimitiveIndices4x8NV(uint indexOffset, | |
uint packedIndices) | |
Description: | |
Interprets <packedIndices> as four 8-bit unsigned integer values and | |
stores them into the gl_PrimitiveIndicesNV array starting from the | |
provided <indexOffset>, which must be a multiple of four. | |
Lower bytes are stored at lower addresses in the array. | |
The write operations must not exceed the size of the | |
gl_PrimitiveIndicesNV array. | |
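The packed write can be sketched as follows for a strip of four triangles over six vertices. The helper pack4() is a hypothetical function introduced here, not part of this extension; it places its first argument in the lowest byte so that it lands at the lowest array address. With "max_primitives" set to 4 and triangles declared, gl_PrimitiveIndicesNV[] has 12 elements, so three packed writes at offsets 0, 4, and 8 fill it exactly.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 6, max_primitives = 4) out;

// Hypothetical helper: packs four 8-bit indices, lowest byte first.
uint pack4(uint a, uint b, uint c, uint d)
{
    return a | (b << 8) | (c << 16) | (d << 24);
}

void main()
{
    // Six vertices laid out as a strip (illustrative positions).
    for (uint i = 0u; i < 6u; ++i) {
        float x = float(i / 2u) * 0.5 - 0.5;
        float y = (i % 2u == 0u) ? -0.5 : 0.5;
        gl_MeshVerticesNV[i].gl_Position = vec4(x, y, 0.0, 1.0);
    }

    // Triangles (0,1,2) (2,1,3) (2,3,4) (4,3,5) give the index list
    // 0,1,2,2,1,3,2,3,4,4,3,5, written four indices at a time.
    writePackedPrimitiveIndices4x8NV(0u, pack4(0u, 1u, 2u, 2u));
    writePackedPrimitiveIndices4x8NV(4u, pack4(1u, 3u, 2u, 3u));
    writePackedPrimitiveIndices4x8NV(8u, pack4(4u, 4u, 3u, 5u));

    gl_PrimitiveCountNV = 4u;
}
```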
Modify Section 8.16, Shader Invocation Control Functions, p. 186 | |
(modify first paragraph of the section, p. 186) | |
The shader invocation control function is available only in tessellation | |
control, compute, task, and mesh shaders. It is used | |
to control the relative execution order of multiple shader invocations | |
used to process a patch (in the case of tessellation control shaders) or a | |
local work group (in the case of compute, task, and mesh shaders), which | |
are otherwise executed with an undefined relative order. | |
(modify the last paragraph, p. 186) | |
For compute, task, and mesh shaders, the barrier() function may be placed | |
within flow control, but that flow control must be uniform flow control. | |
... | |
Modify Section 8.17, Shader Memory Control Functions, p. 187 | |
(modify table of functions, p. 187) | |
void memoryBarrierShared() | |
Control the ordering of memory transactions to shared variables issued | |
within a single shader invocation. | |
Only available in compute, task, and mesh shaders. | |
void groupMemoryBarrier() | |
Control the ordering of all memory transactions issued within a single | |
shader invocation, as viewed by other invocations in the same work | |
group. | |
Only available in compute, task, and mesh shaders. | |
(modify last paragraph, p. 187) | |
... all of the above variable types. The functions memoryBarrierShared() | |
and groupMemoryBarrier() are available only in compute, task, and mesh | |
shaders; the other functions are available in all shader types. | |
(modify last paragraph, p. 188) | |
... When using the function groupMemoryBarrier(), this ordering guarantee | |
applies only to other shader invocations in the same compute, task, or | |
mesh shader work group; all other memory barrier functions provide the | |
guarantee to all other shader invocations. ... | |
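The use of barrier() and memoryBarrierShared() in a mesh shader can be sketched as below, assuming shared variables behave as they do in compute shaders. The shared counter visibleCount and the "keep even invocations" test are hypothetical stand-ins for a real culling test. Both barrier() calls sit outside any conditional, so they are in uniform flow control.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;
layout(points, max_vertices = 32, max_primitives = 32) out;

// Hypothetical shared counter of points that survive the test.
shared uint visibleCount;

void main()
{
    uint i = gl_LocalInvocationIndex;

    if (i == 0u) {
        visibleCount = 0u;
    }
    memoryBarrierShared();
    barrier();  // all invocations see the initialized counter

    // Hypothetical visibility test: keep even-numbered invocations.
    bool keep = (i & 1u) == 0u;
    if (keep) {
        uint slot = atomicAdd(visibleCount, 1u);
        gl_MeshVerticesNV[slot].gl_Position =
            vec4(float(i) / 16.0 - 1.0, 0.0, 0.0, 1.0);
        gl_MeshVerticesNV[slot].gl_PointSize = 1.0;
        gl_PrimitiveIndicesNV[slot] = slot;
    }
    memoryBarrierShared();
    barrier();  // all atomic increments have completed

    if (i == 0u) {
        gl_PrimitiveCountNV = visibleCount;
    }
}
```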
Interactions with GLSL 4.60 and KHR_vulkan_glsl | |
If GLSL 4.60 or KHR_vulkan_glsl is supported, the layout qualifiers | |
"local_size_x_id", "local_size_y_id", and "local_size_z_id" are supported | |
in mesh and task shaders, as in compute shaders. | |
In the big layout qualifier table in section 4.4, add: | |
Layout Qualifier   | Qualifier | Individual | Block | Block  | Allowed interfaces | |
                   | only      | variable   |       | Member |                    | |
-------------------+-----------+------------+-------+--------+------------------- | |
local_size_x_id =  |           |            |       |        | compute in         | |
local_size_y_id =  |     X     |            |       |        | mesh in            | |
local_size_z_id =  |           |            |       |        | task in            | |
                   |           |            |       |        | (SPIR-V generation | |
                   |           |            |       |        | only)              | |
No changes are required to the spec language describing these layout | |
qualifiers, since the language doesn't specifically reference compute | |
shaders and the mesh/task support should be identical. | |
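A minimal sketch of a task shader using a specialization-constant work group size follows. It assumes SPIR-V generation with KHR_vulkan_glsl; the constant ID 0 is an arbitrary illustrative choice.

```glsl
#version 460
#extension GL_NV_mesh_shader : require

// The x dimension of the local work group size is taken from the
// specialization constant with ID 0 (SPIR-V generation only).
layout(local_size_x_id = 0) in;

void main()
{
    if (gl_LocalInvocationIndex == 0u) {
        gl_TaskCountNV = 1u;
    }
}
```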
Interactions with NV_viewport_array2 | |
If NV_viewport_array2 is not supported, remove gl_ViewportMask[] from the | |
gl_MeshPerPrimitiveNV block declaration. | |
Interactions with NV_stereo_view_rendering | |
Mesh shaders support a fully generic set of per-view positions and | |
viewport masks, so we include no support for the more limited | |
gl_SecondaryPositionNV and gl_SecondaryViewportMaskNV[] built-ins from | |
NV_stereo_view_rendering. | |
Interactions with NVX_multiview_per_view_attributes | |
If NVX_multiview_per_view_attributes is not supported, remove | |
gl_PositionPerViewNV[] from the gl_MeshPerVertexNV block declaration and | |
remove gl_ViewportMaskPerViewNV[][] from the gl_MeshPerPrimitiveNV block | |
declaration. | |
If NVX_multiview_per_view_attributes is supported, it is a compile-time | |
error for a mesh shader to make a static assignment to | |
gl_PositionPerViewNV as well as to either of gl_Position or | |
gl_SecondaryPositionNV. | |
If NVX_multiview_per_view_attributes is supported, it is a compile-time | |
error for a mesh shader to make a static assignment to | |
gl_ViewportMaskPerViewNV[] as well as to either of gl_ViewportMask[] or | |
gl_SecondaryViewportMaskNV[]. | |
Interactions with ARB_shader_draw_parameters | |
If ARB_shader_draw_parameters is supported, the task and mesh shaders | |
will also have the following built-in inputs: | |
in int gl_DrawIDARB; | |
The variable <gl_DrawIDARB> is a vertex, task, and mesh language input | |
variable that holds the integer index of the drawing command to which the | |
current vertex, or the current task or mesh workgroup, belongs (see | |
"Shader Inputs" in section 11.1.3.9 of the OpenGL Graphics System | |
Specification). If the vertex or workgroup is not invoked by a Multi* | |
form of a draw command, then the value of gl_DrawIDARB is zero. | |
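A minimal sketch of a task shader forwarding gl_DrawIDARB to its mesh shader work groups follows, assuming ARB_shader_draw_parameters is supported. The block member drawID is a hypothetical name.

```glsl
#version 450
#extension GL_NV_mesh_shader : require
#extension GL_ARB_shader_draw_parameters : require

layout(local_size_x = 1) in;

taskNV out Task {
    uint drawID;
} OUT;

void main()
{
    // Zero unless the work group was launched by a Multi* draw command.
    OUT.drawID = uint(gl_DrawIDARB);
    gl_TaskCountNV = 1u;
}
```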
Interactions with EXT_clip_cull_distance | |
If implemented with OpenGL ES ESSL and EXT_clip_cull_distance is not | |
supported, remove references to gl_ClipDistance, gl_CullDistance, | |
gl_ClipDistancePerViewNV and gl_CullDistancePerViewNV. | |
Issues | |
(1) What are the matching requirements between mesh outputs declared | |
with "perprimitiveNV" and fragment shader inputs? What should we do | |
with interpolation and other auxiliary storage qualifiers on | |
per-primitive values? | |
RESOLVED: In the initial implementation of this extension, reading | |
per-primitive mesh shader outputs in a fragment shader would return | |
incorrect/undefined values if the fragment shader input has no special | |
qualification. As a result, we require that mesh shader outputs | |
qualified with "perprimitiveNV" be matched with fragment shader inputs | |
qualified with "perprimitiveNV" and vice versa. | |
We currently allow any of the interpolation and related auxiliary | |
storage qualifiers (e.g., flat, centroid) on fragment shader inputs | |
qualified with "perprimitiveNV". These qualifiers have no effect. This | |
resolution is consistent with the core GLSL specification language that | |
allows (and ignores) auxiliary storage qualifiers such as "sample" or | |
"centroid" to be used on inputs qualified by "flat", despite the fact | |
that the storage qualifiers are meaningless for flat-shaded attributes. | |
(2) How do "arrayed" outputs and blocks work for mesh shaders? Do you | |
have to declare an array dimension? If you do declare an array | |
dimension, how is it checked? | |
RESOLVED: The rules for mesh shader outputs are the same as for arrayed | |
inputs and outputs in tessellation control, tessellation evaluation, and | |
geometry shaders. When declaring an "arrayed" block, the size is | |
optional. If omitted, the size is taken from the maximum vertex or | |
primitive counts declared using layout qualifiers ("max_vertices" and | |
"max_primitives"). If a size is provided, it must match the limits | |
specified by the layout qualifiers. | |
(3) How are location layout qualifiers handled in mesh and task shaders? | |
Do we support some sort of layout or offset qualifier for task memory? | |
RESOLVED: For mesh shader outputs, the "location" layout qualifier is | |
supported and is used for interface matching with the fragment shader. | |
Locations assigned to mesh shader outputs have the same semantics as | |
locations assigned to vertex, tessellation control, tessellation | |
evaluation, and geometry shader outputs. As with tessellation control | |
shaders, mesh shader outputs are "arrayed" with separate instances of | |
each variable or block for each output vertex or primitive. These | |
multiple instances do not consume separate locations for each | |
vertex/primitive. | |
For task shader outputs (used as mesh shader inputs), we've chosen not | |
to support any location or offset layout qualifiers. Instead, we limit | |
task and mesh shaders to use at most one block qualified by "taskNV" and | |
do not allow non-block variables to use "taskNV". With a single block | |
where member declarations need to match between stages, any internal | |
offsets/locations can be assigned by the compiler without any external | |
annotation. | |
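The resolution above can be sketched with a mesh shader that consumes a task memory block and assigns locations to its outputs. The block member baseID and the output names vColor and pTint are hypothetical; the "taskNV in" block is assumed to match a "taskNV out" block declared in the task shader.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

// Single task memory block; no location or offset qualifiers are used.
taskNV in Task {
    uint baseID;
} IN;

// Each output consumes one location regardless of how many vertex or
// primitive instances the array holds.
layout(location = 0) out vec3 vColor[];
layout(location = 1) perprimitiveNV out vec4 pTint[];

void main()
{
    uint meshletID = IN.baseID + gl_WorkGroupID.x;

    gl_MeshVerticesNV[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);

    vColor[0] = vec3(1.0, 0.0, 0.0);
    vColor[1] = vec3(0.0, 1.0, 0.0);
    vColor[2] = vec3(0.0, 0.0, 1.0);

    // Illustrative per-primitive value derived from the meshlet number.
    pTint[0] = vec4(vec3(float(meshletID & 1u)), 1.0);

    gl_PrimitiveIndicesNV[0] = 0;
    gl_PrimitiveIndicesNV[1] = 1;
    gl_PrimitiveIndicesNV[2] = 2;
    gl_PrimitiveCountNV = 1;
}
```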
(4) For mesh shaders supporting multiple views, how do applications | |
specify the set of views that should be produced? | |
RESOLVED: Ignoring mesh shaders, there are significant differences in | |
how multiple views are handled in OpenGL and Vulkan. OVR_multiview | |
(OpenGL ES) specifies the view count using the "num_views" layout | |
qualifier, where shaders will implicitly use views 0 through | |
num_views-1. VK_KHR_multiview (Vulkan) provides no view information in | |
the shader, other than references to a view index. Instead, the Vulkan | |
render pass specifies a bitfield identifying the set of views to | |
produce. In the Vulkan algorithm, there is no explicit notion of a view | |
count in the shader, and the view mask is not known at shader compile | |
time. | |
For mesh shaders in OpenGL, we use the same OVR_multiview "num_views" | |
layout qualifier to specify the view count. Unlike multiview vertex | |
shaders, multiview mesh shaders are not run separately for each view. | |
The "num_views" layout qualifier is used only to determine array sizes | |
for outputs qualified with "perviewNV". For mesh shaders in Vulkan, the | |
view mask of the render pass is used to determine the storage | |
requirements of per-view attributes and controls the values of the | |
gl_MeshViewCountNV and gl_MeshViewIndicesNV built-ins. | |
(5) For outputs declared with "perviewNV", which are arrays with separate | |
elements for each view, what are the rules for array sizing and | |
indexing? Do you have to declare an array dimension? If you do | |
declare an array dimension, how is it checked? | |
RESOLVED: The rules for per-view mesh shader outputs are the same as | |
for arrayed inputs and outputs in tessellation control, tessellation | |
evaluation, and geometry shaders, as well as the per-vertex and | |
per-primitive mesh shader output arrays. When declaring an output | |
qualified with "perviewNV", an extra array dimension needs to be used | |
for indexing across views. The array size in that dimension is | |
optional. If omitted, the size is taken from the implementation | |
dependent maximum view count. If provided, the size must match the | |
maximum view count. | |
Given that the view count on Vulkan is inferred at *run time* from the | |
view mask in the render pass, we can't use that derived view count for | |
SPIR-V code generation and compile-time error checking. Because of | |
this, we have chosen to use the *maximum* view count for sizing per-view | |
arrays, which is known at compile time. | |
(6) What built-ins should be provided for multi-view mesh shaders? | |
RESOLVED: We provide per-view versions of gl_Position, | |
gl_ClipDistance[], and gl_CullDistance[] in the built-in block | |
gl_MeshPerVertexNV: | |
perviewNV vec4 gl_PositionPerViewNV[]; | |
perviewNV float gl_ClipDistancePerViewNV[][]; | |
perviewNV float gl_CullDistancePerViewNV[][]; | |
Because these per-view built-ins refer to the same attributes as the | |
equivalent standard built-ins, we prohibit the static use of a per-view | |
built-in and its standard equivalent in a single shader. | |
We considered instead allowing shaders to redeclare output blocks to add | |
"perviewNV" qualification to existing built-ins, such as: | |
out gl_PerVertex { | |
perviewNV vec4 gl_Position[]; | |
} v[]; | |
This approach was rejected because modifying the basic types of built-in | |
variables could result in new declarations that conflict with the basic | |
definitions built into the compiler. | |
(7) For multi-view, how do we broadcast mesh shader outputs to multiple | |
layers or viewports, where at least some outputs have per-view values? | |
RESOLVED: In the OpenGL and Vulkan multi-view extensions, the | |
programming model has logically separate shader invocations for each | |
view. These extensions have a view ID/index built-in that can be used | |
to determine which view is being processed by a given invocation. If a | |
hardware platform is capable of compiling a multi-view shader to | |
correctly process multiple views in a single shader invocation, the | |
implementation is free to perform such an optimization. | |
For mesh shaders, a transparent optimization that combines invocations | |
for N different views is significantly more problematic. Separate | |
invocations could produce structurally different output (e.g., different | |
primitive counts or different topology), which would be more difficult | |
to "broadcast". To simplify matters, we instead use a programming model | |
where there is a single work group that processes all views at once. | |
For per-view attributes, the mesh shader is responsible for computing | |
separate output values for each view. | |
(8) Should the gl_NumWorkGroups built-in be supported in task or mesh | |
shaders, as with compute shaders? | |
RESOLVED: No, this isn't worth the trouble. If required, an | |
application can pass a workgroup count manually via a uniform. | |
If we were to support such a thing, it would be necessary to figure out | |
how a non-zero <first> in the draw call would interact with | |
gl_NumWorkGroups. For compute | |
shaders, if you dispatched five workgroups with DispatchCompute, they | |
would always be numbered 0..4 and have values less than | |
gl_NumWorkGroups. If you called glDrawMeshTasksNV with <first> set to 3 | |
and <count> set to 5, the work groups would be numbered 3..7 and it | |
would be necessary to decide if gl_NumWorkGroups should be 5 or 8. | |
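The workaround of passing the work group count manually can be sketched as below. The uniforms firstTask and taskCount are hypothetical; the application is assumed to set them to the <first> and <count> values of the draw call.

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;

// Hypothetical uniforms mirroring the <first> and <count> arguments.
uniform uint firstTask;
uniform uint taskCount;

taskNV out Task {
    uint relativeID;  // 0 .. taskCount-1
    uint total;       // stands in for gl_NumWorkGroups.x
} OUT;

void main()
{
    // With <first> = 3 and <count> = 5, gl_WorkGroupID.x runs 3..7,
    // so relativeID runs 0..4.
    OUT.relativeID = gl_WorkGroupID.x - firstTask;
    OUT.total      = taskCount;
    gl_TaskCountNV = 1u;
}
```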
Revision History | |
Version 7, March 6, 2019 (pknowles) | |
- Added EXT_clip_cull_distance interactions. | |
Version 6, October 22, 2018 (sparmar) | |
- Fix typo for per-primitive fragment shader input example | |
Version 5, October 5, 2018 (pbrown) | |
- Add an interaction with GLSL 4.60 and GL_KHR_vulkan_glsl to allow the | |
use of "local_size_[xyz]_id" where applicable. | |
Version 4, October 4, 2018 (pbrown) | |
- Fix incorrect layout qualifier table entries. "local_size_[xyz]" is | |
legal in task shaders. | |
Version 3, September 18, 2018 (pbrown) | |
- Additional edits preparing for publication. | |
Version 2, September 11, 2018 (pbrown) | |
- Miscellaneous edits preparing for publication. | |
Version 1 (ckubisch, pbrown) | |
- NVIDIA internal revisions. |