Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call for feedback on Provisional Vulkan Ray Tracing extensions #1206

Closed
dgkoch opened this issue Mar 17, 2020 · 38 comments
Closed

Call for feedback on Provisional Vulkan Ray Tracing extensions #1206

dgkoch opened this issue Mar 17, 2020 · 38 comments
Assignees

Comments

@dgkoch
Copy link

@dgkoch dgkoch commented Mar 17, 2020

In #1205 we announced the launch of Vulkan provisional ray tracing extensions.

Khronos welcomes feedback on the Vulkan Ray Tracing set of provisional specifications from the developer and content creation communities through the Khronos Developer Slack and in this issue. Developers are also encouraged to share comments with their preferred hardware vendors. A provisional release enables us to ship beta drivers and enable application prototyping to catalyze developer feedback. It also enables us to work on various open-source ecosystem artifacts in public, such as high-level compilers, validation layers, and debuggers, before spec finalization.

Although we do not have a specific timeframe for specification finalization to announce at this time, we want to move forward as quickly as we can, while ensuring the developer community is happy and we have a completed set of conformance tests and at least two implementations that can pass those tests.

Your feedback is critical to enable us to finalize the first version of Vulkan Ray Tracing and make it genuinely meet your needs!

To provide feedback: either leave a comment below (if minor), or create a new issue and include a link in the comments below.

@Jasper-Bekkers
Copy link

@Jasper-Bekkers Jasper-Bekkers commented Mar 17, 2020

I finished up a very rough port of the NV extension in my code-base. So far I think the only real surprise was that VkStridedBufferRegionKHR. While most other places have switched to DeviceAddress this still takes a Buffer + offset.

Another oddity was that now the raygen parameter for vkCmdTraceRays will also have a slightly confusing stride field. And on top it seemed a bit arbitrary what to put in the size fields there overall.

Once I get the Nvidia driver up and running I'll provide some more feedback wrt the rest of the toolchain if I have any.

@starbucksDave
Copy link

@starbucksDave starbucksDave commented Mar 19, 2020

Is there a reason why VkAccelerationStructureBuildOffsetInfoKHR doesn't have a structure type member and next pointer? Might be wise to add this to future proof the extension so we don't get more vkGetPhysicalDeviceProperties2 shenanigans.

@mbehm
Copy link

@mbehm mbehm commented Mar 20, 2020

Thank you so much, I've been experimenting with the NV extension waiting for this to happen 😍.

I'm currently using a huge (pixels * 1KB) scratch buffer* from which I reserve some space per pixel for per ray scratch information. The current 1KB per ray will probably be enough in most cases but not all, and even then it's a huge waste of memory to reserve it per pixel when it's only needed per ray (eg. ~1GB vs >1MB).

I've tried using gl_SubgroupInvocationID etc. to limit the memory allocation to "active subgroup" but seems without subgroup memory barriers they end up trampling over each other (get nice artifacts that show workgroups and subgroups if you squint).

Am I correct in assuming the new shadercallcoherent and GL_KHR_shader_subgroup interactions should help me with that? Ie. I can mark a buffer shadercallcoherent and/or use subgroupMemoryBarrierBuffer etc. to manually issue barriers** and hopefully it'll make my memory issues go away?

PS. I've also tried having the scratch data as part of the ray payload, but that gets slow fast (>20x difference, I assume all of it's copied by value between the stages) and exceedingly clunky since you can't get a reference to it in any way (I'm using GL_EXT_buffer_reference2 to replicate pointers).

* I'm actually currently using a hacky suffle between N buffers that hopefully have time to settle down before being bulldozed by another workgroup 😅.

** I know barriers are bad for performance 😉. If somebody has a better idea how to build and pass around complex structures (eg. for photon/beam mapping etc.) in the Vulkan RT shaders I'm all ears.

TL;DR; I have memory issues, I probably need some barriers.

@krOoze
Copy link
Contributor

@krOoze krOoze commented Mar 21, 2020

There are lot of broken VUIDs in the spec there. Lot of visible "VUID-{refpage}".

@oddhack
Copy link
Contributor

@oddhack oddhack commented Mar 21, 2020

@krOoze they will be fixed in the next spec update.

@SaschaWillems
Copy link

@SaschaWillems SaschaWillems commented Mar 22, 2020

Is it possible to expand the spec on details as to when the actual geometry for the acceleration structures is consumed and can be discarded? There is some wording on consuming e.g. triangles, but to me it's not clear what this actually means and when/if geometry is copied over to the AS no longer requiring the initial buffers, or if this is implementation dependant.

@krOoze
Copy link
Contributor

@krOoze krOoze commented Mar 22, 2020

The spirenv chapter says SPV_KHR_ray_tracing can be used with VK_NV_ray_tracing. Is that really true?

@expipiplus1
Copy link
Contributor

@expipiplus1 expipiplus1 commented Mar 28, 2020

Minor nit: despite it having a length, VkAccelerationStructureBuildGeometryInfoKHR::ppGeometries does not have a length attribute in the XML spec. I'm not sure what this attribute should be though, given the dependent type of this member.

(Why is this member not just a pointer to an array of VkAccelerationStructureGeometryKHR anyway?)

@krOoze
Copy link
Contributor

@krOoze krOoze commented Mar 28, 2020

In VkAccelerationStructureVersionKHR the Valid usage statement only says "TBD". As well as empty VU for VkAccelerationStructureGeometryTrianglesDataKHR and VkAccelerationStructureInstanceKHR.

PS: VkAccelerationStructureVersionKHR::version is (assumably) pointer to an array of two arrays of bytes (which are UUIDs, where the first one is device UUID, and the second is accelleration structure compatibility magic number UUID). Should it just be array instead of pointer? Should it include the Device UUID, which can already be obtained another way? Should the compatibility UUID instead be unique in of itself without need for device UUID?

@expipiplus1 Yea, that is not really expressible by len. It seems to be (correctly) marked as noautovalidity, but the explicit VUs seem to be missing.
And it seems it would be better if the two options were separate parameters, which would obviate the need for the VkBool32 ( could just use nullptr). For that matter it could just be a pointer to pointers to arrays, which would be best of both worlds (though also require array of counts).

@Lin20
Copy link

@Lin20 Lin20 commented Mar 30, 2020

I submitted a suggestion regarding adding level of detail tracing, or "cone" tracing, here: #1221

@dgkoch
Copy link
Author

@dgkoch dgkoch commented Apr 1, 2020

The spirenv chapter says SPV_KHR_ray_tracing can be used with VK_NV_ray_tracing. Is that really true?

It was originally, but probably not anymore since we changed the capability for provisional. Will remove any illusions of that in an upcoming spec update.

@ewerness-nv
Copy link

@ewerness-nv ewerness-nv commented Apr 3, 2020

TL;DR; I have memory issues, I probably need some barriers.

@mbehm, if you're going to be using subgroup operations be aware of the "invocation repack instruction" description in the shader in the "Shader Call Instructions" section of the spec.

Overall, I suspect that you probably either do need some more barriers, but it's hard to say with just the information given.

@mbehm
Copy link

@mbehm mbehm commented Apr 4, 2020

@ewerness-nv Thanks I'll try to keep that in mind. Yeah the description was a bit vague, a detailed version would've been way longer and reproducing would probably require actual code.

Basically if I understood correctly (and based on the memory issue/artifacts I'm having) the RT kernels are scheduled as workgroups/batches of X. So if I can barrier memory per group would probably solve my problems.

Currently without barriers to control memory access I need to allocate the scratch space per pixel which is few gigabytes more than I'd want to.

Another thing that'd solve at least my problem (and probably quite a few) easily... If one could declare a "local" buffer reference eg, buffer_reference data { uint v[256]; } as part of the per ray payload, which could then be used like other buffer references.

@expipiplus1
Copy link
Contributor

@expipiplus1 expipiplus1 commented Apr 10, 2020

@krOoze

Should it [VkAccelerationStructureVersionKHR::versionData] just be array instead of pointer?

Elsewhere in the spec anything sized to VK_.UID_SIZE is an array.

Why not split this into two arrays if there is an important distinction between device ID and acceleration structure compatability ID?

@expipiplus1 Yea, that is not really expressible by len. It seems to be (correctly) marked as noautovalidity, but the explicit VUs seem to be missing.
And it seems it would be better if the two options were separate parameters, which would obviate the need for the VkBool32 ( could just use nullptr). For that matter it could just be a pointer to pointers to arrays, which would be best of both worlds (though also require array of counts).

AFAIK array of counts isn't used anywhere so far in the spec, it might be weird to have to create an array of [1,1,1,...] to represent the array of pointers case there is at the moment.

Another solution would be to have a pointer to an array of struct VkAccelerationStructureGeometryGroup { uint32_t geometryCount; const VkAccelerationStructureGeometryKHR* geometries; }

That would avoid having to have an array of counts and the lengths of all the arrays are clearly specified.

@W4RH4WK
Copy link

@W4RH4WK W4RH4WK commented Apr 12, 2020

One thing that threw me off slightly is the firstVertex field of vkAccelerationStructureBuildOffsetInfoKHR. While the primitiveOffset is specified in bytes, the firstVertex is not. I think having a vertexOffset field taking the offset in bytes would be more consistent.

@qq553563608
Copy link

@qq553563608 qq553563608 commented Apr 14, 2020

my question is about AS host build. I found I can assign a VkDeviceOrHostAddressKHR to scratchData in VkAccelerationStructureBuildGeometryInfoKHR in host AS build, but for binding memory to AS,there is not a VkDeviceOrHostAddressKHR, instead only a VkDeviceMemory in the VkBindAccelerationStructureMemoryInfoKHR, does that mean we don't need to bind memory when use host AS build, or we bind device memory when using host AS build?

@Tobski
Copy link
Contributor

@Tobski Tobski commented Apr 23, 2020

my question is about AS host build. I found I can assign a VkDeviceOrHostAddressKHR to scratchData in VkAccelerationStructureBuildGeometryInfoKHR in host AS build, but for binding memory to AS,there is not a VkDeviceOrHostAddressKHR, instead only a VkDeviceMemory in the VkBindAccelerationStructureMemoryInfoKHR, does that mean we don't need to bind memory when use host AS build, or we bind device memory when using host AS build?

Whilst the host builds accept host pointers, the spec requires the use of device-mapped memory resources so they can be used (for example) with device copy commands. So you do need to bind a VkDeviceMemory to these AS builds, but it has to be from a type that includes host-visibility (and ideally host coherence). If you want to use a pre-existing host pointer for this purpose, you can import a host pointer using VK_EXT_external_memory_host.

@expipiplus1
Copy link
Contributor

@expipiplus1 expipiplus1 commented Apr 27, 2020

@krOoze were you able to consider my comment here: #1206 (comment), thanks.

@krOoze
Copy link
Contributor

@krOoze krOoze commented Apr 27, 2020

@expipiplus1 I have considered it, but I had not much to add. In the end I am not the one that needs to consider it.

The two UUIDs stuck together as dynamic array seems weird. They could just be uint8_t deviceUuid[UUID_SIZE]; uint8_t asUuid[UUID_SIZE];

VkAccelerationStructureBuildGeometryInfoKHR::ppGeometries seems weird, and feels like it could be structured better. The obvious cleaner way would be to instead have pStructGeometries and pPointersGeometries. Which would obviate the need for geometryArrayOfPointers, which would instead be expressed by whether given pointer is nullptr or not.

I was getting bit beside the point there, the pointer taking version should be able to handle more complicated cases too (e.g. if it is array, then one could spam pointers to individual elements). Though pointer is potentially 64 bits, count could be less, amortized over the array length, but the [1,1,1] case would indeed be weird. Depends what they were going for with this parameter. It feels like they were trying to optimize something there; it needs to be identified what.

@expipiplus1
Copy link
Contributor

@expipiplus1 expipiplus1 commented Apr 27, 2020

@krOoze I see, thanks. I think it would be most consistent with the spec to have an array of VkAccelerationStructureGeometryGroup, each containing a length and pointer (no array of lengths, still just as flexible).

While we're here. Shouldn't versionData be called pVersionData if things are going to stay as the weird dynamic array UUIDs?

@dgkoch
Copy link
Author

@dgkoch dgkoch commented Apr 29, 2020

Is there a reason why VkAccelerationStructureBuildOffsetInfoKHR doesn't have a structure type member and next pointer? Might be wise to add this to future proof the extension so we don't get more vkGetPhysicalDeviceProperties2 shenanigans.

Yes - this structure is also used on the GPU for indirect builds so it's not something that can easily change, and we don't want to be chasing pNext pointers on the GPU!
Any extensibility to the vk*BuildAccelerationStructureKHR commands should be done via the VkAccelerationStructureBuildGeometryInfoKHR structure instead.

@dgkoch
Copy link
Author

@dgkoch dgkoch commented May 1, 2020

@W4RH4WK wrote

One thing that threw me off slightly is the firstVertex field of vkAccelerationStructureBuildOffsetInfoKHR. While the primitiveOffset is specified in bytes, the firstVertex is not. I think having a vertexOffset field taking the offset in bytes would be more consistent.

firstVertex was chosen for consistency with drawing commands like VkCmdDrawIndexed (the AS build is analogous to drawing), and the documentation clearly states what it is:

firstVertex is the index of the first vertex to build from for triangle geometry.

so we don't plan to make a change for this.

@dgkoch
Copy link
Author

@dgkoch dgkoch commented May 7, 2020

@SaschaWillems wrote:

Is it possible to expand the spec on details as to when the actual geometry for the acceleration structures is consumed and can be discarded? There is some wording on consuming e.g. triangles, but to me it's not clear what this actually means and when/if geometry is copied over to the AS no longer requiring the initial buffers, or if this is implementation dependant.

This was addressed in the 1.2.140 spec update by adding the following language:

The input buffers passed to acceleration structure build commands will be referenced by the implementation for the duration of the command. After the command completes, the acceleration structure may hold a reference to any acceleration structure specified by an active instance contained therein. Apart from this referencing, acceleration structures must be fully self-contained. The application may re-use or free any memory which was used by the command as an input or as scratch without affecting the results of ray traversal.

Hopefully this helps to clarify.

@Karlovsky120
Copy link

@Karlovsky120 Karlovsky120 commented May 7, 2020

The values returned by vkGetAccelerationStructureMemoryRequirementsKHR which are to be ignored could be set to better default values. Certain values would make this special cases not actually special, making the specification easier to follow.

More info on that here: #1272

@expipiplus1
Copy link
Contributor

@expipiplus1 expipiplus1 commented May 8, 2020

@expipiplus1 I have considered it, but I had not much to add. In the end I am not the one that needs to consider it.

Thanks @krOoze, who would be the best person to ask? If there's a good reason for this geometry representation then it's all cool, but I'd like to know why.

@dgkoch
Copy link
Author

@dgkoch dgkoch commented May 8, 2020

@Jasper-Bekkers wrote:

I finished up a very rough port of the NV extension in my code-base. So far I think the only real surprise was that VkStridedBufferRegionKHR. While most other places have switched to DeviceAddress this still takes a Buffer + offset.

Yes. An oversight. We'll fix in a future API update.

Another oddity was that now the raygen parameter for vkCmdTraceRays will also have a slightly confusing stride field. And on top it seemed a bit arbitrary what to put in the size fields there overall.

We've improved the documentation for these (already released). Hopefully it's clearer now.

@dgkoch
Copy link
Author

@dgkoch dgkoch commented May 8, 2020

@krOoze wrote:

PS: VkAccelerationStructureVersionKHR::version is (assumably) pointer to an array of two arrays of bytes (which are UUIDs, where the first one is device UUID, and the second is accelleration structure compatibility magic number UUID). Should it just be array instead of pointer? Should it include the Device UUID, which can already be obtained another way? Should the compatibility UUID instead be unique in of itself without need for device UUID?

@expipiplus1 wrote:

Elsewhere in the spec anything sized to VK_UUID_SIZE is an array.
Why not split this into two arrays if there is an important distinction between device ID and acceleration structure compatability ID?

@krOoze wrote:

The two UUIDs stuck together as dynamic array seems weird. They could just be uint8_t deviceUuid[UUID_SIZE]; uint8_t asUuid[UUID_SIZE];

@expipiplus1 wrote:

While we're here. Shouldn't versionData be called pVersionData if things are going to stay as the weird dynamic array UUIDs?

We are making some changes to the signature of vkGetDeviceAccelerationStructureCompatibilityKHR and we have renamed versionData to pVersionData amongst other changes here.

Initially I agreed that the pointer to 2xUUID was weird, but after staring at it for a while (and improving the documentation), I realized why it is the way it is:

pVersionData is a pointer to an array of 2*VK_UUID_SIZE uint8_t values instead of two VK_UUID_SIZE arrays as the expected use case for this member is to be pointed at the header of an previously serialized acceleration structure (via vkCmdCopyAccelerationStructureToMemoryKHR or vkCopyAccelerationStructureToMemoryKHR) that is loaded in memory. Using arrays would necessitate extra memory copies of the UUIDs.

@expipiplus1
Copy link
Contributor

@expipiplus1 expipiplus1 commented May 8, 2020

Thanks for explaining, @dgkoch. Would it be possible to know the reasoning behind the unusual layout of ppGeometries too? I think krOoze and I spoke about it a little earlier too.

@dgkoch
Copy link
Author

@dgkoch dgkoch commented May 8, 2020

Would it be possible to know the reasoning behind the unusual layout of ppGeometries too? I think krOoze and I spoke about it a little earlier too.

It was trying to replicate the functionality of D3D12_ELEMENTS_LAYOUT_ARRAY / D3D12_ELEMENTS_LAYOUT_ARRAY_OF_POINTERS in D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS without using unions, and it has obviously failed to do that well (we've had a lot of confusion and questions about it, and it makes validation annoying). We are planning to change this for the final version.

@tomilov
Copy link

@tomilov tomilov commented May 8, 2020

Combining of TLAS and BLAS into the same structures is a mess. Just because they are both acceleration structures does not mean that it is necessary to combine everything together. I think it would be better, if there will be two separate set of structures and functions for TLAS and BLAS without that "unions" (triangles/aabbs/instances), until BLAS can be node of another BLAS and/or TLAS can be node of another TLAS. Currently it is even hard to write "hello world" using present API (I succeeded, but it was quite hard to rewrite one even from VK_NV to VK_KHR).

If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR and compactedSize is 0,
maxGeometryCount must be 1

Above condition make it even more meaningless.

@tomilov
Copy link

@tomilov tomilov commented May 9, 2020

It would be good if semantics of rayPayloadInEXT and rayPayloadEXT will be exchanged. Currently it is counterintuitive. Or rayPayloadInEXT can be renamed to rayPayloadOutEXT

@dgkoch
Copy link
Author

@dgkoch dgkoch commented May 13, 2020

It would be good if semantics of rayPayloadInEXT and rayPayloadEXT will be exchanged. Currently it is counterintuitive. Or rayPayloadInEXT can be renamed to rayPayloadOutEXT

It seems intuitive to me as is. The "upstream" shader writes sends data to the "child" shader via the rayPayloadEXT variable (so it is out from that shaders' perspective). The child shader receives this as input via it's rayPayloadInEXT variable (which yes, can be modified and sent back to the upstream shader), but the child shader can also result in more child shaders which would be passed the ray payload via it's own rayPayloadEXT that gets provided it's downstream shaders. IMO reversing the naming would make it even more confusing.

@dgkoch
Copy link
Author

@dgkoch dgkoch commented May 13, 2020

Combining of TLAS and BLAS into the same structures is a mess. Just because they are both acceleration structures does not mean that it is necessary to combine everything together. I think it would be better, if there will be two separate set of structures and functions for TLAS and BLAS without that "unions" (triangles/aabbs/instances), until BLAS can be node of another BLAS and/or TLAS can be node of another TLAS.

If we separate the TLAS and BLAS now, it'll be harder to add multi-level AS support at some point in the future.

@multisample
Copy link

@multisample multisample commented May 15, 2020

Can we have some clarification on what buffers VK_BUFFER_USAGE_RAY_TRACING_BIT_KHR is needed on exactly and why ? Is it all buffer types in descriptor sets referenced during a trace, or just SBT + tlas/blas source. If it is the former, why is this needed for things like texel buffers ?

@krOoze
Copy link
Contributor

@krOoze krOoze commented Jun 3, 2020

8 bits in VkAccelerationStructureInstanceKHR for VkGeometryInstanceFlagBitsKHR seems pretty tight considering there are already 4 bits reserved. It might become a problem for extensions.

Generally, the use of C bit fields in VkAccelerationStructureInstanceKHR is bit unorthodox. I think those are not usually used except when working with actual HW registers of a CPU. It results in quite a copout hack in the spec itself:

If a compiler produces code that diverges from that pattern, applications must employ another method to set values according to the correct bit pattern.

@dgkoch
Copy link
Author

@dgkoch dgkoch commented Jun 3, 2020

8 bits in VkAccelerationStructureInstanceKHR for VkGeometryInstanceFlagBitsKHR seems pretty tight considering there are already 4 bits reserved. It might become a problem for extensions.

Generally, the use of C bit fields in VkAccelerationStructureInstanceKHR is bit unorthodox. I think those are not usually used except when working with actual HW registers of a CPU. It results in quite a copout hack in the spec itself:

If a compiler produces code that diverges from that pattern, applications must employ another method to set values according to the correct bit pattern.

This structure is consumed on the GPU so we want keep the layout the same as the corresponding structure in DXR. The NV extension originally took the approach of the not defining the structure but just defining the layout in the spec. The KHR extension added the struct with bitfields for ease-of-use, but it's really non-normative (as noted in the xml file: The bitfields in this structure are non-normative since bitfield ordering is implementation-defined in C. The specification defines the normative layout.)
We've already discussed this to death, and we don't plan to change it.

@expipiplus1
Copy link
Contributor

@expipiplus1 expipiplus1 commented Nov 21, 2020

Minor nit: In all of the acceleration structure building commands the len attribute for ppOffsetInfos is incomplete. It describes the first level length correctly, but each pointer in there has length equal to the geometryCount member of the corresponding pInfos element. I don't think that the spec XML schema is sophisticated enough to describe this, but it would be good to have something in the spec here to make sure that consumers who use the len information don't blindly assume it has a length of one.

Edit,162: I think this is still the case for ppBuildRangeInfos in those commands (I've only read the changes for the XML though, so could be wrong)


Regarding ppGeometries:

> We are planning to change this for the final version.

Instead of how it currently is as a SOA layout, an AOS makes more sense here, something like a list of geometryCount, geometries tuples.

Looks like this was fixed in 1.2.162! Thanks


pVersionData is a pointer to an array of 2*VK_UUID_SIZE uint8_t values instead of two VK_UUID_SIZE arrays as the expected use case for this member is to be pointed at the header of an previously serialized acceleration structure

Makes sense!

@dgkoch
Copy link
Author

@dgkoch dgkoch commented Nov 23, 2020

Final extensions have been released now. #1402

@dgkoch dgkoch closed this Nov 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet