feat(ludus-renderer): add Vulkan mesh-shader backend (LudusTimestampedContext) #135
Draft
wlewNV wants to merge 2 commits into
…dContext)

Adds a parallel rendering backend that mirrors the public API of the existing CUDA software rasterizer. The new path uses VK_EXT_mesh_shader with CUDA-Vulkan external-memory interop so render uploads stay on the GPU, and is selected at construction via LudusTimestampedContext while LudusCudaTimestampedContext remains the default everywhere.

New: Vulkan context (vkutil), pipelines for polyline/polygon/obstacle mesh+task+fragment shaders, NV->EXT GLSL converter and SPIR-V embed scripts, JIT plugin, Python context wrapper, and a CUDA-vs-Vulkan example/parity test.

Multi-pool task shaders use a force_zero_tasks flag to keep over-dispatched workgroups' SSBO reads in-bounds, which is what prevents the giant cross-pool garbage triangles seen with the naive early-EmitMeshTasksEXT(0) pattern.
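The clamped-index pattern can be illustrated with a small CPU-side model. This is a sketch in Python rather than the PR's GLSL, and the names (`task_count_for_workgroup`, `pool_sizes`, `max_per_pool`) and the one-task-per-item rule are assumptions made for illustration, not the shader's actual bindings or logic.

```python
from typing import List, Tuple


def task_count_for_workgroup(
    workgroup_id: int,
    pool_sizes: List[int],
    max_per_pool: int,
) -> Tuple[int, int, int]:
    """Model one task-shader workgroup in an over-dispatched grid.

    The grid has len(pool_sizes) * max_per_pool workgroups, but pool p only
    holds pool_sizes[p] real items (assumed >= 1 in this sketch).
    Returns (pool, clamped_item, task_count).
    """
    pool = workgroup_id // max_per_pool
    item = workgroup_id % max_per_pool

    # Over-dispatched workgroup: flag it instead of returning early.
    force_zero_tasks = item >= pool_sizes[pool]

    # Clamp so every later buffer read stays inside this pool's range.
    item = min(item, pool_sizes[pool] - 1)

    # Stand-in for the normal code path: one task per real item.
    task_count = 1

    # The flag only changes the final emitted count.
    if force_zero_tasks:
        task_count = 0
    return pool, item, task_count


# Two pools with 2 and 3 items, dispatched as 2 * 4 = 8 workgroups.
results = [task_count_for_workgroup(w, [2, 3], 4) for w in range(8)]
print(sum(count for _, _, count in results))  # total emitted tasks
```

Every workgroup computes an in-range index, and only the final count distinguishes real work from padding.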
Summary
Adds a parallel `LudusTimestampedContext` rendering backend that mirrors the public API of `LudusCudaTimestampedContext`. The new path uses `VK_EXT_mesh_shader` with CUDA-Vulkan external-memory interop so uploads stay on the GPU. The CUDA software rasterizer remains the default and is unchanged.

Per-frame parity vs CUDA on `example_data/test_hdmap` (10 frames sampled across the 201-frame sequence, 640×480, front:wide:120fov):

What's in the change
- `vkutil` (instance/device/queue/external buffer/image with CUDA import), pipelines and dispatch for polyline / polygon / obstacle task+mesh+fragment shaders, JIT plugin, pybind wrapper (`LudusTimestampedVkStateWrapper`).
- Shaders are written in `GL_NV_mesh_shader` style and translated to `GL_EXT_mesh_shader` via `shaders/nv_to_ext.py`. Compiled SPIR-V is embedded in `_cpp/render/shaders_spv.h` so the shipped wheel does not need `glslangValidator`.
- `LudusTimestampedContext` mirrors the CUDA context. It is importable even when Vulkan isn't installed; `ImportError` only surfaces on construction.
- `examples/compare_vulkan_vs_cuda.py` for CUDA-vs-Vulkan parity rendering and `tests/test_vulkan_backend.py` for smoke / pixel-count tests.
- Over-dispatched task-shader workgroups (`numQueries * maxPools * MAX_VARRAYS_PER_POOL` workgroups are dispatched) used to `EmitMeshTasksEXT(0, …); return;` early, which on EXT mesh shaders did not prevent later SSBO reads from leaking into the next pool's prefix sums / vertices / translations. That leak produced huge polygon / polyline / "ghost cube" artifacts whenever more than one pool of the same family was uploaded. Fixed by routing those workgroups through the normal code path with a clamped index and a `force_zero_tasks` flag that sets the final `_task_count = 0`.
- `SetMeshOutputsEXT()` now uses the actual emitted vertex count instead of the layout max (e.g. `SetMeshOutputsEXT(num_verts, num_tris)` for the small-polygon path), which avoids undefined mesh output behavior on drivers that treat unused declared slots as live.

Try it

Defaults for `examples/compare_vulkan_vs_cuda.py` are: `--scene example_data/test_hdmap`, `--camera camera:front:wide:120fov`, `--width 640 --height 480`. Output (`cuda.png`, `vulkan.png`, `diff_10x.png`, `side_by_side.png`) is written to `./_vk_compare/`.

You can also drop the Vulkan backend into existing CUDA-backed code:
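A minimal sketch of that swap, with a fallback to CUDA when Vulkan is unavailable. The two classes below are stand-ins so the block runs without the real package; in real code they would come from `ludus_renderer`. Constructor arguments are not shown in this description, so none are assumed, and `make_context` and `vulkan_available` are hypothetical names.

```python
class LudusCudaTimestampedContext:
    """Stand-in for the CUDA backend (the default)."""

    backend = "cuda"


class LudusTimestampedContext:
    """Stand-in for the Vulkan backend; fails on construction without Vulkan."""

    backend = "vulkan"
    vulkan_available = False  # hypothetical switch, for this sketch only

    def __init__(self):
        if not self.vulkan_available:
            raise ImportError("Vulkan backend is not available")


def make_context(prefer_vulkan: bool = True):
    """Return a Vulkan context if it can be constructed, else the CUDA one."""
    if prefer_vulkan:
        try:
            return LudusTimestampedContext()
        except ImportError:
            # Per the PR, the error surfaces at construction, not at import.
            pass
    return LudusCudaTimestampedContext()


ctx = make_context()
print(ctx.backend)  # "cuda" here, because the stand-in reports no Vulkan
```

Because both contexts expose the same public API, the rest of the calling code should not need to know which backend was selected.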
Test plan
- `pytest tests/test_vulkan_backend.py` passes (3/3) on a host with `VK_EXT_mesh_shader`.
- `examples/compare_vulkan_vs_cuda.py --frame {0,12,25,75,150,199}` shows lit-pixel ratio in `[0.999, 1.001]` and mean RGB diff `< 0.10 / 255`.
- `import ludus_renderer` works on a host without Vulkan (the Vulkan import is lazy; `LudusTimestampedContext()` is the call that fails).

Known v1 caveats
- `PRIM_DOT_*` primitives are not yet plumbed through the Vulkan task/mesh pipeline (the CUDA backend still handles them).
- `LUDUS_VK_DEBUG=1` enables `[Vulkan] …` traces; `LUDUS_VK_CLEAR_RED=1` clears the framebuffer to opaque red.
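
The parity thresholds in the test plan can be checked with a few lines of Python. This is a sketch, not the comparison script's code: it assumes images are flat lists of `(r, g, b)` tuples in the 0-255 range, that a pixel counts as "lit" when any channel is non-zero, that the ratio is Vulkan lit pixels over CUDA lit pixels, and that `< 0.10 / 255` means a mean difference below 0.10 on the 0-255 scale. All function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def lit_pixel_ratio(vulkan: List[Pixel], cuda: List[Pixel]) -> float:
    """Ratio of lit-pixel counts (assumed: lit = any non-zero channel)."""
    lit_vk = sum(1 for p in vulkan if any(p))
    lit_cuda = sum(1 for p in cuda if any(p))
    if lit_cuda == 0:
        raise ValueError("CUDA reference image has no lit pixels")
    return lit_vk / lit_cuda


def mean_rgb_diff(vulkan: List[Pixel], cuda: List[Pixel]) -> float:
    """Mean absolute per-channel difference, in 0-255 units."""
    if not cuda or len(vulkan) != len(cuda):
        raise ValueError("images must be non-empty and the same size")
    total = sum(
        abs(a - b) for pv, pc in zip(vulkan, cuda) for a, b in zip(pv, pc)
    )
    return total / (3 * len(cuda))


def passes_parity(vulkan: List[Pixel], cuda: List[Pixel]) -> bool:
    """Apply both thresholds from the test plan."""
    ratio = lit_pixel_ratio(vulkan, cuda)
    return 0.999 <= ratio <= 1.001 and mean_rgb_diff(vulkan, cuda) < 0.10


# Tiny synthetic example: identical images trivially pass.
reference = [(0, 0, 0), (255, 255, 255)] * 50
print(passes_parity(list(reference), reference))
```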