Call for feedback on Provisional Vulkan Ray Tracing extensions #1206

Open
dgkoch opened this issue Mar 17, 2020 · 13 comments

@dgkoch commented Mar 17, 2020

In #1205 we announced the launch of Vulkan provisional ray tracing extensions.

Khronos welcomes feedback on the Vulkan Ray Tracing set of provisional specifications from the developer and content creation communities through the Khronos Developer Slack and in this issue. Developers are also encouraged to share comments with their preferred hardware vendors. A provisional release enables us to ship beta drivers and enable application prototyping to catalyze developer feedback. It also enables us to work on various open-source ecosystem artifacts in public, such as high-level compilers, validation layers, and debuggers, before spec finalization.

Although we do not have a specific timeframe for specification finalization to announce at this time, we want to move forward as quickly as we can, while ensuring the developer community is happy and we have a completed set of conformance tests and at least two implementations that can pass those tests.

Your feedback is critical to enable us to finalize the first version of Vulkan Ray Tracing and make it genuinely meet your needs!

To provide feedback: either leave a comment below (if minor), or create a new issue and include a link in the comments below.

@Jasper-Bekkers commented Mar 17, 2020

I finished up a very rough port of the NV extension in my code base. So far, the only real surprise was VkStridedBufferRegionKHR: while most other places have switched to VkDeviceAddress, it still takes a VkBuffer + offset.

Another oddity is that the raygen parameter of vkCmdTraceRays now also has a slightly confusing stride field. On top of that, what to put in the size fields seemed somewhat arbitrary overall.
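To make the stride and size questions concrete, here is a minimal C sketch of how a strided shader binding table region can be indexed. SbtRegion, hit_record_index, and record_address are hypothetical names, and the buffer is reduced to an assumed base address rather than a real VkBuffer. The hit group index follows the instance offset + ray offset + geometry index × ray stride rule used for hit shaders; the bounds check against size is only one plausible reading of what the size field is for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a strided SBT region. The buffer is reduced
   to an assumed base address; this is not the real Vulkan struct. */
typedef struct {
    uint64_t base;   /* assumed address of the start of the buffer */
    uint64_t offset; /* byte offset of the first record in the buffer */
    uint64_t stride; /* byte distance between consecutive records */
    uint64_t size;   /* total bytes covered by the region */
} SbtRegion;

/* Hit group record index: instance offset + ray offset
   + geometry index * ray stride. */
static uint64_t hit_record_index(uint32_t instance_sbt_offset,
                                 uint32_t sbt_record_offset,
                                 uint32_t sbt_record_stride,
                                 uint32_t geometry_index) {
    return (uint64_t)instance_sbt_offset + sbt_record_offset +
           (uint64_t)geometry_index * sbt_record_stride;
}

/* Computes the address of record `index`. Returns false if the record
   would extend past `size` bytes (an assumed use of the size field). */
static bool record_address(const SbtRegion *r, uint64_t index,
                           uint64_t record_size, uint64_t *out) {
    assert(r != NULL && out != NULL);
    uint64_t start = index * r->stride;
    if (start + record_size > r->size)
        return false;
    *out = r->base + r->offset + start;
    return true;
}
```

For the raygen region the record index is always 0, so the stride never enters the address computation, which is presumably why that field feels arbitrary there.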

Once I get the NVIDIA driver up and running, I'll provide more feedback on the rest of the toolchain, if I have any.

@starbucksDave commented Mar 19, 2020

Is there a reason why VkAccelerationStructureBuildOffsetInfoKHR doesn't have sType and pNext members? It might be wise to add them to future-proof the extension, so we don't end up with more vkGetPhysicalDeviceProperties2-style shenanigans.
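The usual Vulkan reason for an sType/pNext header is that it lets later extensions chain extra structs onto a base struct without changing its layout. The C sketch below illustrates that pattern with hypothetical stand-in types (BaseIn, BuildOffsetInfo, FutureExtensionInfo) and made-up sType values; none of these are the real Vulkan definitions, and BuildOffsetInfo is cut down to a single member for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up structure type tags, for illustration only. */
typedef enum {
    STYPE_BUILD_OFFSET_INFO = 1,
    STYPE_FUTURE_EXTENSION = 2 /* a hypothetical later extension */
} StructureType;

/* Common header shared by every chainable struct (same idea as
   VkBaseInStructure). */
typedef struct BaseIn {
    StructureType sType;
    const struct BaseIn *pNext;
} BaseIn;

/* Hypothetical, simplified base struct with an extensible header. */
typedef struct {
    StructureType sType;
    const void *pNext;
    unsigned primitiveCount;
} BuildOffsetInfo;

/* Hypothetical struct that a later extension could chain on. */
typedef struct {
    StructureType sType;
    const void *pNext;
    unsigned extraFlags;
} FutureExtensionInfo;

/* Walks the pNext chain and returns the first struct of the given type. */
static const void *find_in_chain(const void *head, StructureType type) {
    for (const BaseIn *s = head; s != NULL; s = s->pNext)
        if (s->sType == type)
            return s;
    return NULL;
}

/* Reads the extension data if present, else falls back to a default. */
static unsigned extra_flags_or_default(const BuildOffsetInfo *info) {
    assert(info != NULL && info->sType == STYPE_BUILD_OFFSET_INFO);
    const FutureExtensionInfo *ext =
        find_in_chain(info->pNext, STYPE_FUTURE_EXTENSION);
    return ext != NULL ? ext->extraFlags : 0u;
}
```

A consumer that knows about the extension finds it by walking the chain, while older code never looks at pNext at all, which is the compatibility property the comment is asking for.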

@mbehm commented Mar 20, 2020

Thank you so much, I've been experimenting with the NV extension waiting for this to happen 😍.

I'm currently using a huge (pixels * 1KB) scratch buffer* from which I reserve some space per pixel for per-ray scratch information. The current 1KB per ray will probably be enough in most cases, but not all, and even then it's a huge waste of memory to reserve it per pixel when it's only needed per ray (e.g. ~1GB vs. >1MB).
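As a rough illustration of the memory gap described above, this C sketch compares the per-pixel layout with a pool sized only for rays that are live at the same time. The 1KB per ray comes from the comment; the resolution and the number of live rays used below are assumptions chosen purely for the arithmetic, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The 1KB of scratch space per ray mentioned in the comment. */
#define SCRATCH_BYTES_PER_RAY 1024u

/* Per-pixel scheme: every pixel owns a fixed slice of the buffer. */
static uint64_t per_pixel_bytes(uint32_t width, uint32_t height) {
    return (uint64_t)width * height * SCRATCH_BYTES_PER_RAY;
}

/* Byte offset of pixel (x, y)'s slice in the per-pixel scheme. */
static uint64_t pixel_slice_offset(uint32_t x, uint32_t y, uint32_t width) {
    assert(x < width);
    return ((uint64_t)y * width + x) * SCRATCH_BYTES_PER_RAY;
}

/* Pool scheme: only rays that are live at the same time need a slice. */
static uint64_t pool_bytes(uint32_t live_rays) {
    return (uint64_t)live_rays * SCRATCH_BYTES_PER_RAY;
}
```

At an assumed 1920×1080, per_pixel_bytes gives 1920 × 1080 × 1024 = 2,123,366,400 bytes (about 2 GiB), while pool_bytes(2048) is 2 MiB. The pool only works if slots are handed out and reclaimed safely across invocations, which is where the synchronization questions in this comment come from.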

I've tried using gl_SubgroupInvocationID etc. to limit the memory allocation to the "active subgroup", but it seems that without subgroup memory barriers the invocations end up trampling over each other (I get nice artifacts that show workgroups and subgroups if you squint).

Am I correct in assuming the new shadercallcoherent qualifier and its interactions with GL_KHR_shader_subgroup should help me with that? I.e., can I mark a buffer shadercallcoherent and/or use subgroupMemoryBarrierBuffer etc. to manually issue barriers** and hopefully make my memory issues go away?

PS. I've also tried keeping the scratch data in the ray payload, but that gets slow quickly (a >20x difference; I assume all of it is copied by value between the stages) and is exceedingly clunky, since you can't get a reference to it in any way (I'm using GL_EXT_buffer_reference2 to replicate pointers).

* I'm actually using a hacky shuffle between N buffers, which hopefully have time to settle down before being bulldozed by another workgroup 😅.

** I know barriers are bad for performance 😉. If anybody has a better idea of how to build and pass around complex structures (e.g. for photon/beam mapping) in Vulkan RT shaders, I'm all ears.

TL;DR: I have memory issues; I probably need some barriers.

@krOoze (Contributor) commented Mar 21, 2020

There are a lot of broken VUIDs in the spec: many visible "VUID-{refpage}" placeholders.

@oddhack (Contributor) commented Mar 21, 2020

@krOoze they will be fixed in the next spec update.

@SaschaWillems commented Mar 22, 2020

Is it possible to expand the spec with details on when the actual geometry for the acceleration structures is consumed and can be discarded? There is some wording on consuming e.g. triangles, but it's not clear to me what this actually means: whether the geometry is copied into the AS so that the initial buffers are no longer required, or whether this is implementation-dependent.

@krOoze (Contributor) commented Mar 22, 2020

The spirvenv chapter says SPV_KHR_ray_tracing can be used with VK_NV_ray_tracing. Is that really true?

@expipiplus1 (Contributor) commented Mar 28, 2020

Minor nit: despite having a length, VkAccelerationStructureBuildGeometryInfoKHR::ppGeometries does not have a length attribute in the XML spec. I'm not sure what this attribute should be, though, given the dependent type of this member.

(Why is this member not just a pointer to an array of VkAccelerationStructureGeometryKHR anyway?)

@krOoze (Contributor) commented Mar 28, 2020

In VkAccelerationStructureVersionKHR, the valid usage statement only says "TBD". The valid usage sections for VkAccelerationStructureGeometryTrianglesDataKHR and VkAccelerationStructureInstanceKHR are empty as well.

PS: VkAccelerationStructureVersionKHR::version is (presumably) a pointer to an array of two byte arrays (which are UUIDs: the first is the device UUID, and the second is an acceleration structure compatibility magic-number UUID). Should it just be an array instead of a pointer? Should it include the device UUID, which can already be obtained another way? Should the compatibility UUID instead be unique in and of itself, without the need for the device UUID?
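For reference, the layout described above (device UUID followed by an acceleration structure compatibility UUID) implies a check along the lines of this C sketch. The 32-byte layout is the commenter's presumed reading rather than confirmed spec text, UUID_SIZE simply mirrors the value of VK_UUID_SIZE (16), and version_is_compatible is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same value as VK_UUID_SIZE. */
#define UUID_SIZE 16

/* Assumed layout of the version data:
   bytes [0, 16)  = device UUID,
   bytes [16, 32) = acceleration structure compatibility UUID. */
static bool version_is_compatible(const uint8_t *version_data,
                                  const uint8_t device_uuid[UUID_SIZE],
                                  const uint8_t compat_uuid[UUID_SIZE]) {
    assert(version_data != NULL);
    return memcmp(version_data, device_uuid, UUID_SIZE) == 0 &&
           memcmp(version_data + UUID_SIZE, compat_uuid, UUID_SIZE) == 0;
}
```

If the compatibility UUID were unique by itself, as the last question suggests, the first comparison would become redundant and the blob could shrink to a single UUID.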

@expipiplus1 Yeah, that is not really expressible with len. It seems to be (correctly) marked as noautovalidity, but the explicit VUs seem to be missing.
It also seems it would be better if the two options were separate parameters, which would obviate the need for the VkBool32 (the unused one could just be nullptr). For that matter, it could just be a pointer to pointers to arrays, which would give the best of both worlds (though it would also require an array of counts).
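The two interpretations selected by the VkBool32 can be sketched in C as follows. Geometry and BuildGeometryInfo are simplified, hypothetical stand-ins; the geometryArrayOfPointers and ppGeometries member names follow the provisional header as we understand it, and get_geometry is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the geometry struct. */
typedef struct {
    uint32_t primitiveCount;
} Geometry;

/* Only the members relevant to the ppGeometries question. */
typedef struct {
    bool geometryArrayOfPointers;
    uint32_t geometryCount;
    const Geometry *const *ppGeometries;
} BuildGeometryInfo;

/* Resolves geometry i under either interpretation of ppGeometries. */
static const Geometry *get_geometry(const BuildGeometryInfo *info,
                                    uint32_t i) {
    assert(info != NULL && i < info->geometryCount);
    if (info->geometryArrayOfPointers)
        return info->ppGeometries[i]; /* array of geometryCount pointers */
    return &(*info->ppGeometries)[i]; /* pointer to one contiguous array */
}
```

With separate parameters, as suggested above, each branch would become its own member and the unused one could be NULL, so no flag would be needed to tell them apart.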

@Lin20 commented Mar 30, 2020

I submitted a suggestion regarding adding level of detail tracing, or "cone" tracing, here: #1221

@dgkoch (Author) commented Apr 1, 2020

> The spirvenv chapter says SPV_KHR_ray_tracing can be used with VK_NV_ray_tracing. Is that really true?

It was originally, but probably not anymore, since we changed the capability for the provisional release. We will remove any suggestion of that in an upcoming spec update.

@ewerness-nv commented Apr 3, 2020

> TL;DR: I have memory issues; I probably need some barriers.

@mbehm, if you're going to use subgroup operations, be aware of the "invocation repack instruction" description in the "Shader Call Instructions" section of the spec.

Overall, I suspect you probably do need some more barriers, but it's hard to say with just the information given.

@mbehm commented Apr 4, 2020

@ewerness-nv Thanks, I'll try to keep that in mind. Yeah, the description was a bit vague; a detailed version would have been much longer, and reproducing the issue would probably require actual code.

Basically, if I understood correctly (and based on the memory issues/artifacts I'm seeing), the RT kernels are scheduled as workgroups/batches of X, so being able to place memory barriers per group would probably solve my problems.

Currently, without barriers to control memory access, I need to allocate the scratch space per pixel, which is a few gigabytes more than I'd like.

Another thing that would easily solve at least my problem (and probably quite a few others): being able to declare a "local" buffer reference, e.g. buffer_reference data { uint v[256]; }, as part of the per-ray payload, which could then be used like any other buffer reference.

10 participants