feat(ludus-renderer): add Vulkan mesh-shader backend (LudusTimestampedContext)#135

Draft: wlewNV wants to merge 2 commits into main from dev/wlew/ludus_vk

wlewNV commented May 21, 2026

Summary

Adds a parallel LudusTimestampedContext rendering backend that mirrors the public API of LudusCudaTimestampedContext. The new path uses VK_EXT_mesh_shader with CUDA-Vulkan external-memory interop so uploads stay on the GPU. The CUDA software rasterizer remains the default and is unchanged.
Per-frame parity vs CUDA on example_data/test_hdmap (10 frames sampled across the 201-frame sequence, 640×480, front:wide:120fov):

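| frame | CUDA lit pixels | Vulkan lit pixels | ratio (Vulkan / CUDA) | diff % | mean RGB diff |
|---|---|---|---|---|---|
| 0 | 35441 | 35448 | 1.000 | 2.43 | 0.057 / 255 |
| 12 | 37743 | 37740 | 1.000 | 2.38 | 0.069 / 255 |
| 25 | 45407 | 45396 | 1.000 | 3.15 | 0.073 / 255 |
| 75 | 76793 | 76795 | 1.000 | 1.85 | 0.066 / 255 |
| 100 | 92389 | 92383 | 1.000 | 1.95 | 0.065 / 255 |
| 150 | 93008 | 93008 | 1.000 | 2.64 | 0.075 / 255 |
| 199 | 23081 | 23093 | 1.001 | 1.18 | 0.048 / 255 |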
The remaining difference is sub-pixel rasterization-edge noise; there is no geometry blow-up anywhere in the sequence.
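The table columns can be approximated from a pair of renders roughly as follows. This is a minimal NumPy sketch, not the logic of examples/compare_vulkan_vs_cuda.py: it assumes a pixel is "lit" when any RGB channel is non-zero, that "diff %" is the percentage of pixels differing in any channel, and that "mean RGB diff" is the mean absolute per-channel difference on a 0..255 scale. The thresholds in the hypothetical within_parity helper are the ones from the test plan.

```python
import numpy as np


def parity_metrics(cuda_rgb: np.ndarray, vk_rgb: np.ndarray) -> dict:
    """Compare two HxWx3 uint8 renders (metric definitions are assumptions)."""
    if cuda_rgb.shape != vk_rgb.shape:
        raise ValueError("renders must have the same shape")
    # Widen to a signed type so the subtraction below cannot wrap around.
    a = cuda_rgb.astype(np.int16)
    b = vk_rgb.astype(np.int16)
    cuda_lit = int(np.count_nonzero(a.any(axis=-1)))  # pixels with any non-zero channel
    vk_lit = int(np.count_nonzero(b.any(axis=-1)))
    absdiff = np.abs(a - b)
    differing = absdiff.any(axis=-1)  # per-pixel: does any channel differ?
    return {
        "cuda_lit": cuda_lit,
        "vk_lit": vk_lit,
        "ratio": vk_lit / cuda_lit if cuda_lit else float("nan"),
        "diff_pct": 100.0 * float(differing.mean()),
        "mean_rgb_diff": float(absdiff.mean()),  # in 0..255 units
    }


def within_parity(m: dict) -> bool:
    # Test-plan thresholds: lit ratio in [0.999, 1.001], mean RGB diff < 0.10 / 255.
    return 0.999 <= m["ratio"] <= 1.001 and m["mean_rgb_diff"] < 0.10
```

A NaN ratio (nothing lit in the CUDA render) fails both comparisons, so an empty reference frame is never reported as "in parity".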

What's in the change

  • C++ Vulkan backend: vkutil (instance/device/queue/external buffer/image with CUDA import), pipelines and dispatch for polyline / polygon / obstacle task+mesh+fragment shaders, JIT plugin, pybind wrapper (LudusTimestampedVkStateWrapper).
  • Shaders: full timestamped task/mesh/fragment set for the three primitive families. Authored in GL_NV_mesh_shader style and translated to GL_EXT_mesh_shader via shaders/nv_to_ext.py. Compiled SPIR-V is embedded in _cpp/render/shaders_spv.h so the shipped wheel does not need glslangValidator.
  • Python: LudusTimestampedContext mirrors the CUDA context. Importable even when Vulkan isn't installed; ImportError only surfaces on construction.
  • Tooling: examples/compare_vulkan_vs_cuda.py for CUDA-vs-Vulkan parity rendering and tests/test_vulkan_backend.py for smoke / pixel-count tests.
  • Multi-pool task-shader guard: when numQueries * maxPools * MAX_VARRAYS_PER_POOL workgroups are dispatched, some of them are invalid (e.g. those past the end of a smaller pool). Those over-dispatched workgroups used to call EmitMeshTasksEXT(0, …); return; early, which on EXT mesh shaders did not prevent later SSBO reads from leaking into the next pool's prefix sums, vertices and translations. The leak produced huge polygon, polyline and "ghost cube" artifacts whenever more than one pool of the same family was uploaded. Fixed by routing those workgroups through the normal code path with a clamped index and a force_zero_tasks flag that sets the final _task_count = 0.
  • EXT mesh output count: SetMeshOutputsEXT() now uses the actual emitted vertex and primitive counts instead of the layout maxima (e.g. SetMeshOutputsEXT(num_verts, num_tris) for the small-polygon path), which avoids undefined mesh-output behavior on drivers that treat unused declared slots as live.
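The guard is easiest to see as dispatch arithmetic. The CPU-side Python model below is not the shader code: it assumes workgroup ids are laid out query-major, then pool, then varray, uses a made-up MAX_VARRAYS_PER_POOL value, and assumes every pool holds at least one varray. Over-dispatched workgroups still go through the normal read path, but with a clamped index, and only the final task count is forced to zero.

```python
MAX_VARRAYS_PER_POOL = 4  # hypothetical value; the real constant lives in the shaders


def task_count(wg_id, pool_sizes, tasks, max_varrays=MAX_VARRAYS_PER_POOL):
    """Model one task-shader workgroup under a force_zero_tasks-style guard.

    Assumed layout: wg_id = (query * num_pools + pool) * max_varrays + varray.
    pool_sizes[p] is the number of valid varrays in pool p (must be >= 1).
    tasks[p][v] is the number of mesh tasks varray v of pool p would emit.
    Returns (pool, index_read, task_count).
    """
    num_pools = len(pool_sizes)
    varray = wg_id % max_varrays
    pool = (wg_id // max_varrays) % num_pools
    if pool_sizes[pool] < 1:
        raise ValueError("this model assumes every pool has at least one varray")
    # No early return: invalid workgroups are only flagged ...
    force_zero_tasks = varray >= pool_sizes[pool]
    # ... and read through a clamped index, so the read stays inside this pool.
    index = min(varray, pool_sizes[pool] - 1)
    n = tasks[pool][index]
    # Only the final count is zeroed for the over-dispatched workgroups.
    return pool, index, 0 if force_zero_tasks else n
```

For pool sizes [3, 1] and two queries, all 16 workgroups read an in-range index, and the summed task count equals the real work (each valid varray counted once per query), so nothing spills into the next pool's data.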

Try it

cd integrations/alpadreams/ludus-renderer
# The first run JIT-builds the Vulkan plugin (~10 s), then renders frame 12
python examples/compare_vulkan_vs_cuda.py --frame 12
# Sweep a few frames to confirm parity:
for f in 0 25 75 100 150 199; do
    python examples/compare_vulkan_vs_cuda.py --frame $f
done

Defaults are: --scene example_data/test_hdmap, --camera camera:front:wide:120fov, --width 640 --height 480. Output (cuda.png, vulkan.png, diff_10x.png, side_by_side.png) is written to ./_vk_compare/.
You can also drop the Vulkan backend into existing CUDA-backed code:

# Existing CUDA path is unchanged:
from ludus_renderer import LudusCudaTimestampedContext
ctx = LudusCudaTimestampedContext(device="cuda")
# Vulkan path (same public API):
from ludus_renderer import LudusTimestampedContext
ctx = LudusTimestampedContext(device="cuda")
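Because construction (not import) is where a missing Vulkan stack shows up, callers can try the Vulkan context and fall back to CUDA. The helper below is hypothetical and written against plain factory callables so it does not depend on ludus_renderer; only ImportError triggers the fallback, so any other construction error still propagates.

```python
from typing import Callable, Tuple, TypeVar

Ctx = TypeVar("Ctx")


def make_context(
    prefer_vulkan: bool,
    vk_factory: Callable[[], Ctx],
    cuda_factory: Callable[[], Ctx],
) -> Tuple[str, Ctx]:
    """Hypothetical helper: try the Vulkan backend, fall back to CUDA.

    Relies on the behaviour described in this PR: importing the package
    always works and ImportError is raised only when the Vulkan context
    is constructed.
    """
    if prefer_vulkan:
        try:
            return "vulkan", vk_factory()
        except ImportError:
            pass  # Vulkan backend unavailable on this host; use the default path
    return "cuda", cuda_factory()
```

With ludus_renderer, the factories would be lambda: LudusTimestampedContext(device="cuda") and lambda: LudusCudaTimestampedContext(device="cuda").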

Test plan

  • pytest tests/test_vulkan_backend.py passes (3/3) on a host with VK_EXT_mesh_shader.
  • examples/compare_vulkan_vs_cuda.py --frame {0,12,25,75,150,199} shows lit-pixel ratio in [0.999, 1.001] and mean RGB diff < 0.10 / 255.
  • CUDA backend behavior is unchanged: existing CUDA-only tests still pass.
  • import ludus_renderer works on a host without Vulkan (the Vulkan import is lazy; LudusTimestampedContext() is the call that fails).
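One common way to get this behaviour, where the package import succeeds but constructing the Vulkan context fails, is to defer the native import to __init__. The sketch below is hypothetical; the module name ludus_renderer_vk_native is invented for illustration and is not the PR's plugin name.

```python
import importlib


class LazyVkContext:
    """Hypothetical lazily-imported backend class.

    Defining this class (and importing the module that contains it) never
    touches the native extension; the import runs inside __init__, so a
    missing Vulkan build surfaces as ImportError only on construction.
    """

    _NATIVE_MODULE = "ludus_renderer_vk_native"  # hypothetical module name

    def __init__(self, device: str = "cuda"):
        try:
            self._native = importlib.import_module(self._NATIVE_MODULE)
        except ImportError as exc:
            # Re-raise with a clearer message, keeping the original cause.
            raise ImportError(
                f"Vulkan backend unavailable: {self._NATIVE_MODULE!r} could not be imported"
            ) from exc
        self.device = device
```

A smoke test for this property only needs to import the package and then assert that constructing the context raises ImportError on a Vulkan-less host.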

Known v1 caveats

  • Dot primitives (PRIM_DOT_*) are not yet plumbed through the Vulkan task/mesh pipeline (CUDA backend still handles them).
  • CUDA→Vulkan interop uses opaque-FD external memory, which is Linux-only; the Vulkan plugin currently refuses to build on Windows. The CUDA backend remains cross-platform.
  • Diagnostics: LUDUS_VK_DEBUG=1 enables [Vulkan] … traces; LUDUS_VK_CLEAR_RED=1 clears the framebuffer to opaque red.
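Clearing to a saturated colour is a quick coverage check: any pixel that is still opaque red after a frame was typically never written by the fragment stage. A minimal sketch of reading the two variables, assuming only the exact string "1" enables each one (the backend's real parsing may differ, and the returned dict layout is hypothetical):

```python
import os
from collections.abc import Mapping
from typing import Optional


def vk_diagnostics(environ: Optional[Mapping] = None) -> dict:
    """Hypothetical parser for LUDUS_VK_DEBUG and LUDUS_VK_CLEAR_RED."""
    env = os.environ if environ is None else environ
    clear_red = env.get("LUDUS_VK_CLEAR_RED") == "1"
    return {
        "debug_traces": env.get("LUDUS_VK_DEBUG") == "1",  # enables "[Vulkan] ..." traces
        "clear_red": clear_red,
        # Opaque red RGBA when requested; None means "keep the backend's default clear".
        "clear_color": (1.0, 0.0, 0.0, 1.0) if clear_red else None,
    }
```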


Adds a parallel rendering backend that mirrors the public API of the
existing CUDA software rasterizer. The new path uses VK_EXT_mesh_shader
with CUDA-Vulkan external-memory interop so render uploads stay on the
GPU, and is selected at construction via LudusTimestampedContext while
LudusCudaTimestampedContext remains the default everywhere.

New: Vulkan context (vkutil), pipelines for polyline/polygon/obstacle
mesh+task+fragment shaders, NV->EXT GLSL converter and SPIR-V embed
scripts, JIT plugin, Python context wrapper, and a CUDA-vs-Vulkan
example/parity test. Multi-pool task shaders use a force_zero_tasks
flag to keep over-dispatched workgroups' SSBO reads in-bounds, which
is what prevents the giant cross-pool garbage triangles seen with the
naive early-EmitMeshTasksEXT(0) pattern.
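Embedding SPIR-V in a header, as _cpp/render/shaders_spv.h does, comes down to re-emitting the compiled 32-bit words as a C++ array. The sketch below is hypothetical (it is not the PR's embed script, and the symbol naming is made up); it assumes little-endian .spv input and checks the SPIR-V magic number 0x07230203 before emitting anything.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def spv_to_header(name: str, spv: bytes, words_per_line: int = 8) -> str:
    """Render a little-endian SPIR-V blob as a C++ array definition."""
    if len(spv) % 4:
        raise ValueError("SPIR-V size must be a multiple of 4 bytes")
    words = struct.unpack(f"<{len(spv) // 4}I", spv)
    if not words or words[0] != SPIRV_MAGIC:
        raise ValueError("missing little-endian SPIR-V magic number")
    # Group the words into fixed-width lines to keep the header readable.
    body = [
        "    " + ", ".join(f"0x{w:08x}u" for w in words[i:i + words_per_line]) + ","
        for i in range(0, len(words), words_per_line)
    ]
    header = [
        "#include <cstddef>",
        "#include <cstdint>",
        "",
        f"static const std::uint32_t {name}[] = {{",
        *body,
        "};",
        f"static const std::size_t {name}_size_bytes = sizeof({name});",
    ]
    return "\n".join(header) + "\n"
```

Shipping the words this way is what lets the wheel create shader modules at runtime without invoking glslangValidator.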

copy-pr-bot (bot) commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

