feat(ludus-renderer): add Vulkan mesh-shader backend (LudusTimestampedContext) by wlewNV · Pull Request #135 · NVIDIA/flashdreams

wlewNV · 2026-05-21T04:36:56Z

Summary

Adds a parallel LudusTimestampedContext rendering backend that mirrors the public API of LudusCudaTimestampedContext. The new path uses VK_EXT_mesh_shader with CUDA-Vulkan external-memory interop so uploads stay on the GPU. The CUDA software rasterizer remains the default and is unchanged.
Per-frame parity vs CUDA on example_data/test_hdmap (10 frames sampled across the 201-frame sequence, 640×480, front:wide:120fov):

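| frame | CUDA lit pixels | Vulkan lit pixels | ratio (Vulkan/CUDA) | diff % | mean RGB diff |
|---|---|---|---|---|---|
| 0 | 35441 | 35448 | 1.000 | 2.43 | 0.057 / 255 |
| 12 | 37743 | 37740 | 1.000 | 2.38 | 0.069 / 255 |
| 25 | 45407 | 45396 | 1.000 | 3.15 | 0.073 / 255 |
| 75 | 76793 | 76795 | 1.000 | 1.85 | 0.066 / 255 |
| 100 | 92389 | 92383 | 1.000 | 1.95 | 0.065 / 255 |
| 150 | 93008 | 93008 | 1.000 | 2.64 | 0.075 / 255 |
| 199 | 23081 | 23093 | 1.001 | 1.18 | 0.048 / 255 |
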
Remaining diff is sub-pixel rasterization-edge noise; no geometry blow-up across the sequence.
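
For readers who want to reproduce numbers of this kind on their own renders, here is a minimal sketch of how such parity metrics could be computed from two images. It is not taken from examples/compare_vulkan_vs_cuda.py: the function name `parity_metrics`, the definition of a "lit" pixel (any non-zero channel), and the definition of diff % (share of pixels where any channel differs) are all assumptions made for illustration.

```python
import numpy as np


def parity_metrics(cuda_img: np.ndarray, vk_img: np.ndarray) -> dict:
    """Compare two HxWx3 uint8 renders of the same frame (hypothetical helper)."""
    if cuda_img.shape != vk_img.shape:
        raise ValueError("images must have identical shapes")

    # Assumption: a pixel counts as "lit" when any channel is non-zero.
    cuda_lit = int(np.count_nonzero(cuda_img.any(axis=-1)))
    vk_lit = int(np.count_nonzero(vk_img.any(axis=-1)))

    # Widen before subtracting so uint8 arithmetic cannot wrap around.
    abs_diff = np.abs(cuda_img.astype(np.int16) - vk_img.astype(np.int16))

    # Assumption: diff % is the share of pixels where any channel differs.
    diff_pct = 100.0 * float(abs_diff.any(axis=-1).mean())

    return {
        "cuda_lit": cuda_lit,
        "vk_lit": vk_lit,
        "ratio": vk_lit / cuda_lit if cuda_lit else float("nan"),
        "diff_pct": diff_pct,
        # Mean absolute difference over all channels, on the 0..255 scale.
        "mean_rgb_diff": float(abs_diff.mean()),
    }
```

Under these definitions, a small edge-noise difference shows up as a non-trivial diff % while the lit-pixel ratio and the mean RGB diff stay close to 1 and 0 respectively, which is the pattern in the table.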

What's in the change

C++ Vulkan backend: vkutil (instance/device/queue/external buffer/image with CUDA import), pipelines and dispatch for polyline / polygon / obstacle task+mesh+fragment shaders, JIT plugin, pybind wrapper (LudusTimestampedVkStateWrapper).
Shaders: full timestamped task/mesh/fragment set for the three primitive families. Authored as GL_NV_mesh_shader-style and translated to GL_EXT_mesh_shader via shaders/nv_to_ext.py. Compiled SPIR-V is embedded in _cpp/render/shaders_spv.h so the shipped wheel does not need glslangValidator.
Python: LudusTimestampedContext mirrors the CUDA context. Importable even when Vulkan isn't installed; ImportError only surfaces on construction.
Tooling: examples/compare_vulkan_vs_cuda.py for CUDA-vs-Vulkan parity rendering and tests/test_vulkan_backend.py for smoke / pixel-count tests.
Multi-pool task-shader guard: invalid over-dispatched workgroups (e.g. those for smaller pools when numQueries * maxPools * MAX_VARRAYS_PER_POOL workgroups are dispatched) used to call EmitMeshTasksEXT(0, …) and return early. On EXT mesh shaders, that did not prevent later SSBO reads from leaking into the next pool's prefix sums / vertices / translations. The leak produced huge polygon / polyline / "ghost cube" artifacts whenever more than one pool of the same family was uploaded. The fix routes those workgroups through the normal code path with a clamped index and a force_zero_tasks flag that sets the final _task_count = 0.
EXT mesh output count: SetMeshOutputsEXT() now uses the actual emitted vertex count instead of the layout max (e.g. SetMeshOutputsEXT(num_verts, num_tris) for the small-polygon path), which avoids undefined mesh output behavior on drivers that treat unused declared slots as live.
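
The multi-pool guard described above can be modeled on the CPU. The following is a sketch only, not the shader code from this PR: it assumes each pool exposes an exclusive prefix-sum array of per-varray primitive counts, and the names `task_count` and `prefix_sums` as well as the value chosen for `MAX_VARRAYS_PER_POOL` are hypothetical. What it illustrates is the pattern itself: clamp the index so every read stays inside the current pool, then zero the emitted task count for workgroups that were over-dispatched.

```python
MAX_VARRAYS_PER_POOL = 6  # hypothetical value, for illustration only


def task_count(varray_idx: int, prefix_sums: list) -> int:
    """CPU model of one task-shader workgroup for a single pool.

    prefix_sums has one more entry than the pool has varrays, so the
    count for varray i is prefix_sums[i + 1] - prefix_sums[i].
    """
    num_varrays = len(prefix_sums) - 1
    if num_varrays <= 0:
        return 0  # empty pool: nothing to read, nothing to emit

    # Over-dispatched workgroups are flagged instead of returning early.
    force_zero_tasks = varray_idx >= num_varrays

    # Clamp so every later read stays inside this pool's data.
    safe_idx = min(varray_idx, num_varrays - 1)
    count = prefix_sums[safe_idx + 1] - prefix_sums[safe_idx]

    # The flag only affects the final emitted count.
    return 0 if force_zero_tasks else count


# A pool with 3 varrays dispatched with MAX_VARRAYS_PER_POOL workgroups.
pool = [0, 3, 5, 9]
counts = [task_count(i, pool) for i in range(MAX_VARRAYS_PER_POOL)]
```

In this model `counts` is `[3, 2, 4, 0, 0, 0]`: the three valid workgroups emit their real counts, and the three over-dispatched ones read in-bounds data but emit nothing.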

Try it

```bash
cd integrations/alpadreams/ludus-renderer
# Build/JIT once (takes ~10s the first time)
python examples/compare_vulkan_vs_cuda.py --frame 12
# Sweep a few frames to confirm parity:
for f in 0 25 75 100 150 199; do
    python examples/compare_vulkan_vs_cuda.py --frame "$f"
done
```

Defaults are: --scene example_data/test_hdmap, --camera camera:front:wide:120fov, --width 640 --height 480. Output (cuda.png, vulkan.png, diff_10x.png, side_by_side.png) is written to ./_vk_compare/.
You can also drop the Vulkan backend into existing CUDA-backed code:

```python
# Existing CUDA path is unchanged:
from ludus_renderer import LudusCudaTimestampedContext
ctx = LudusCudaTimestampedContext(device="cuda")
# Vulkan path, same public API:
from ludus_renderer import LudusTimestampedContext
ctx = LudusTimestampedContext(device="cuda")
```
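
Because the two contexts share a public API and the Vulkan one raises ImportError at construction when Vulkan is unavailable, calling code could pick a backend at runtime. The helper below is a sketch of that idea; `first_available_context` is a hypothetical name and is not part of ludus_renderer. It takes zero-argument factories so it can be exercised without the package installed.

```python
from typing import Any, Callable, Sequence


def first_available_context(factories: Sequence[Callable[[], Any]]) -> Any:
    """Return the first context whose constructor does not raise ImportError."""
    errors = []
    for factory in factories:
        try:
            return factory()
        except ImportError as exc:
            # Backend not usable on this host; remember why and try the next.
            errors.append(exc)
    raise ImportError(f"no rendering backend could be constructed: {errors}")


# Intended use (requires ludus_renderer, so it is left as a comment here):
#
#   from ludus_renderer import LudusCudaTimestampedContext, LudusTimestampedContext
#   ctx = first_available_context([
#       lambda: LudusTimestampedContext(device="cuda"),      # try Vulkan first
#       lambda: LudusCudaTimestampedContext(device="cuda"),  # fall back to CUDA
#   ])
```

Only ImportError is caught, so any other construction failure (for example a driver error) still propagates to the caller rather than silently switching backends.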

Test plan

pytest tests/test_vulkan_backend.py passes (3/3) on a host with VK_EXT_mesh_shader.
examples/compare_vulkan_vs_cuda.py --frame {0,12,25,75,150,199} shows lit-pixel ratio in [0.999, 1.001] and mean RGB diff < 0.10 / 255.
CUDA backend behavior is unchanged: existing CUDA-only tests still pass.
import ludus_renderer works on a host without Vulkan (the Vulkan import is lazy; LudusTimestampedContext() is the call that fails).
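
The last item describes a lazy-import pattern: the module imports cleanly, and the failure is deferred to construction. A minimal sketch of how such a pattern is commonly written follows. It is not the actual ludus_renderer code; the class `LazyBackendContext` and the module name `hypothetical_ludus_vk_plugin` are invented for illustration.

```python
import importlib


class LazyBackendContext:
    """Hypothetical stand-in for a context whose native plugin loads lazily."""

    # Hypothetical module name; it is not the real plugin.
    _PLUGIN_MODULE = "hypothetical_ludus_vk_plugin"

    def __init__(self, device: str = "cuda"):
        # Import inside __init__ so that importing this file never fails,
        # even on hosts where the plugin cannot be loaded.
        try:
            self._plugin = importlib.import_module(self._PLUGIN_MODULE)
        except ImportError as exc:
            raise ImportError(
                f"Vulkan backend unavailable: could not load {self._PLUGIN_MODULE!r}"
            ) from exc
        self.device = device
```

Defining or importing the class has no side effects; only `LazyBackendContext()` attempts the plugin import and raises if it is missing.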

Known v1 caveats

Dot primitives (PRIM_DOT_*) are not yet plumbed through the Vulkan task/mesh pipeline (CUDA backend still handles them).
CUDA→Vulkan interop uses opaque-FD external memory, which is Linux-only; the Vulkan plugin currently refuses to build on Windows. The CUDA backend remains cross-platform.
Diagnostics: LUDUS_VK_DEBUG=1 enables [Vulkan] … traces; LUDUS_VK_CLEAR_RED=1 clears the framebuffer to opaque red.
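
The diagnostic variables can be set per run from Python rather than exported in the shell. The sketch below builds an environment for a child process; `vk_diagnostic_env` is a hypothetical helper, and it assumes only what is stated above, namely that the value `1` enables each switch.

```python
import os


def vk_diagnostic_env(debug: bool = False, clear_red: bool = False, base=None) -> dict:
    """Return a copy of the environment with the Vulkan diagnostics enabled."""
    env = dict(os.environ if base is None else base)
    if debug:
        env["LUDUS_VK_DEBUG"] = "1"      # enables [Vulkan] traces
    if clear_red:
        env["LUDUS_VK_CLEAR_RED"] = "1"  # clears the framebuffer to opaque red
    return env


# Intended use (run from integrations/alpadreams/ludus-renderer):
#
#   import subprocess, sys
#   subprocess.run(
#       [sys.executable, "examples/compare_vulkan_vs_cuda.py", "--frame", "12"],
#       env=vk_diagnostic_env(debug=True),
#       check=True,
#   )
```

Passing the variables through a child-process environment avoids assuming when the backend reads them, since they are in place before the interpreter starts.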

…dContext)

Adds a parallel rendering backend that mirrors the public API of the existing CUDA software rasterizer. The new path uses VK_EXT_mesh_shader with CUDA-Vulkan external-memory interop so render uploads stay on the GPU, and is selected at construction via LudusTimestampedContext while LudusCudaTimestampedContext remains the default everywhere.

New: Vulkan context (vkutil), pipelines for polyline/polygon/obstacle mesh+task+fragment shaders, NV->EXT GLSL converter and SPIR-V embed scripts, JIT plugin, Python context wrapper, and a CUDA-vs-Vulkan example/parity test.

Multi-pool task shaders use a force_zero_tasks flag to keep over-dispatched workgroups' SSBO reads in-bounds, which is what prevents the giant cross-pool garbage triangles seen with the naive early-EmitMeshTasksEXT(0) pattern.

copy-pr-bot · 2026-05-21T04:36:59Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Apply Mukund's suggestions

67d3b69

wlewNV wants to merge 2 commits into main from dev/wlew/ludus_vk
