
[Batch 3] Integrate GPU frustum culling into WorldRenderer #379

@MichaelFisher1997

Description


Summary

Replace the CPU-side frustum culling loop in WorldRenderer.render() with the GPU compute shader from #375. The visible chunk list will be determined on the GPU, and draws will be executed from the GPU-generated indirect buffer.

Depends on: #375 (GPU frustum compute shader + AABB buffer)

Current Behavior

WorldRenderer.render() (line 131):

  1. Lock chunks_mutex shared
  2. Iterate all chunk positions within render distance (nested x/z loop, lines 159-173)
  3. For each: check if chunk exists, check if renderable, call frustum.intersectsChunkRelative()
  4. Append visible chunks to visible_chunks list
  5. For each visible chunk: set model matrix, call drawOffset() per pass

At a render distance of 128 chunks, step 2 touches 65K+ positions; at 256 chunks, 260K+. The count grows with the square of the render distance, so this CPU loop becomes the bottleneck.
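For reference, the figures above can be reproduced with a few lines of Zig. This is only a sketch: `positionsVisited` is a hypothetical helper, and it assumes the nested loop spans [-r, r] on both axes, which the issue does not state explicitly.

```zig
const std = @import("std");

/// Number of (x, z) positions the nested loop would visit, assuming it
/// spans [-r, r] on both axes (an assumption; the real bounds are in render()).
pub fn positionsVisited(render_distance: u64) u64 {
    const side = 2 * render_distance + 1;
    return side * side;
}
```

Under that assumption, a render distance of 128 gives 66,049 positions and 256 gives 263,169, matching the 65K+ and 260K+ figures.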

Target Behavior

  1. Each frame: upload dirty chunk AABBs to GPU storage buffer
  2. Dispatch compute shader with current frustum planes
  3. Compute shader writes DrawIndirectCommand buffer for visible chunks
  4. Read back the visible count, or avoid the readback with vkCmdDrawIndirectCount (core in Vulkan 1.2, otherwise VK_KHR_draw_indirect_count) where the device supports it
  5. Execute draw commands from GPU-generated buffer
  6. No CPU iteration over chunk positions at all
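The per-chunk test the compute shader performs in step 3 can be sketched on the CPU as a reference implementation. This is not the shader from #375: `Plane`, `Aabb`, `aabbVisible` and `cullReference` are hypothetical names, the layouts are assumptions, and the code assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Plane in the form dot(normal, p) + d >= 0 for points on the inside.
pub const Plane = struct { nx: f32, ny: f32, nz: f32, d: f32 };

/// Axis-aligned box; hypothetical layout, not necessarily the one from #375.
pub const Aabb = struct { min: [3]f32, max: [3]f32 };

/// True unless the box lies entirely on the outside of some plane.
pub fn aabbVisible(planes: []const Plane, box: Aabb) bool {
    for (planes) |p| {
        // Pick the corner furthest along the plane normal (the "positive vertex").
        const x = if (p.nx >= 0) box.max[0] else box.min[0];
        const y = if (p.ny >= 0) box.max[1] else box.min[1];
        const z = if (p.nz >= 0) box.max[2] else box.min[2];
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0) return false;
    }
    return true;
}

/// Writes the indices of visible boxes into `out` and returns how many
/// were written. `out` must be at least as long as `boxes`.
pub fn cullReference(planes: []const Plane, boxes: []const Aabb, out: []u32) usize {
    var count: usize = 0;
    for (boxes, 0..) |box, i| {
        if (aabbVisible(planes, box)) {
            out[count] = @intCast(i);
            count += 1;
        }
    }
    return count;
}
```

The positive-vertex test is conservative: it can keep a box that sits just outside a frustum corner, but it never rejects a box that is actually visible. A GPU version would typically replace the sequential `count` with an atomic counter, so the output order may differ from this reference.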

Implementation Plan

Step 1: AABB buffer management
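The issue gives no details for this step. One possible shape for the "upload dirty chunk AABBs" part of the target behavior is a dirty-range tracker, sketched below. `DirtyRange` and `ByteSpan` are hypothetical, and the sketch assumes each chunk owns a fixed slot in the AABB buffer from #375.

```zig
const std = @import("std");

/// Byte offset and length of a region to upload.
pub const ByteSpan = struct { offset: u64, len: u64 };

/// Tracks the smallest contiguous slot range that needs re-uploading.
/// Hypothetical helper; slots are assumed to index the AABB buffer.
pub const DirtyRange = struct {
    first: u32 = std.math.maxInt(u32),
    end: u32 = 0, // one past the last dirty slot

    pub fn mark(self: *DirtyRange, slot: u32) void {
        self.first = @min(self.first, slot);
        self.end = @max(self.end, slot + 1);
    }

    pub fn isEmpty(self: DirtyRange) bool {
        return self.first >= self.end;
    }

    /// Span to copy, given the size in bytes of one AABB record.
    pub fn byteSpan(self: DirtyRange, stride: u64) ByteSpan {
        if (self.isEmpty()) return .{ .offset = 0, .len = 0 };
        return .{
            .offset = self.first * stride,
            .len = (self.end - self.first) * stride,
        };
    }

    pub fn clear(self: *DirtyRange) void {
        self.* = .{};
    }
};
```

A single range keeps the upload to one copy per frame at the cost of re-sending clean slots that lie between dirty ones. Whether that trade-off suits the chunk streaming pattern here is an open question.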

Step 2: Replace render() frustum loop

pub fn render(self: *WorldRenderer, view_proj: Mat4, camera_pos: Vec3, ...) void {
    // Old: iterate chunks, CPU cull, append to visible_chunks
    // New: upload frustum planes, dispatch compute, sync barrier
    self.culling_system.dispatch(view_proj, chunk_count);
    
    // Execute draw commands from GPU buffer
    // Need per-chunk model matrix somehow — either:
    //   a) Instance data buffer (from MDI #371)
    //   b) Push model matrix per draw in indirect loop
}
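The "upload frustum planes" part of the new path needs the six planes derived from view_proj. A minimal sketch using the well-known Gribb-Hartmann extraction follows. The `Mat4` here is a local stand-in, and the row-major, column-vector convention (clip = m * v) is an assumption that may not match the engine's Mat4; with the opposite convention the columns would be used instead of the rows.

```zig
const std = @import("std");

/// Row-major 4x4 matrix applied to column vectors: clip = m * v.
/// This convention is an assumption; the engine's Mat4 may differ.
pub const Mat4 = [4][4]f32;
pub const Plane = [4]f32; // a*x + b*y + c*z + d >= 0 means "inside"

fn combine(a: [4]f32, b: [4]f32, sign: f32) Plane {
    return Plane{
        a[0] + sign * b[0],
        a[1] + sign * b[1],
        a[2] + sign * b[2],
        a[3] + sign * b[3],
    };
}

/// Plane extraction for a clip volume with -w <= x,y <= w and 0 <= z <= w
/// (the Vulkan depth range). Planes are left unnormalized.
pub fn extractPlanes(m: Mat4) [6]Plane {
    return [6]Plane{
        combine(m[3], m[0], 1), // left:  w + x >= 0
        combine(m[3], m[0], -1), // right: w - x >= 0
        combine(m[3], m[1], 1), // y lower bound: w + y >= 0
        combine(m[3], m[1], -1), // y upper bound: w - y >= 0
        m[2], // near: z >= 0
        combine(m[3], m[2], -1), // far:  w - z >= 0
    };
}

pub fn pointInside(planes: [6]Plane, p: [3]f32) bool {
    for (planes) |pl| {
        if (pl[0] * p[0] + pl[1] * p[1] + pl[2] * p[2] + pl[3] < 0) return false;
    }
    return true;
}
```

Unnormalized planes are enough for an inside/outside test. They would need normalizing only if the shader compares signed distances against a radius.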

Step 3: Model matrix handling

  • With MDI (#371, "[Batch 1] Wire up Multi-Draw Indirect in WorldRenderer"), model matrices are already in the instance buffer
  • GPU culling shader can write the correct firstInstance index to map each visible chunk to its instance data
  • Or: compute shader writes model-relative offsets into a per-draw storage buffer (push constants are recorded on the CPU, so the GPU cannot fill them)
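The firstInstance option can be sketched as follows. `DrawIndirectCommand` mirrors the field order of Vulkan's VkDrawIndirectCommand; `ChunkMesh` and `commandFor` are hypothetical, and the sketch assumes non-indexed draws with one instance-buffer entry per chunk slot.

```zig
const std = @import("std");

/// Mirrors VkDrawIndirectCommand: four consecutive 32-bit values.
pub const DrawIndirectCommand = extern struct {
    vertex_count: u32,
    instance_count: u32,
    first_vertex: u32,
    first_instance: u32,
};

/// Hypothetical per-chunk mesh record; the real fields live in the engine.
pub const ChunkMesh = struct { first_vertex: u32, vertex_count: u32 };

/// Builds the command for one visible chunk. `slot` is the chunk's index in
/// the instance buffer, so the vertex shader can find its model matrix
/// through the instance index.
pub fn commandFor(mesh: ChunkMesh, slot: u32) DrawIndirectCommand {
    return .{
        .vertex_count = mesh.vertex_count,
        .instance_count = 1,
        .first_vertex = mesh.first_vertex,
        .first_instance = slot,
    };
}
```

Note that a non-zero firstInstance in an indirect draw requires the drawIndirectFirstInstance device feature, so the availability check for the GPU path would probably need to include it.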

Step 4: Shadow pass

  • Same GPU culling applied to shadow pass
  • Different frustum (light-space), different render distance
  • May need separate dispatch or shader parameterization
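If shader parameterization is chosen, both passes could share one shader and differ only in a small parameter block. The sketch below is an assumption throughout: `CullParams` and `withinRange` are hypothetical, and the circular horizontal range check is a stand-in for whatever distance rule the renderer actually uses.

```zig
const std = @import("std");

pub const Plane = [4]f32;

/// Per-dispatch parameters; hypothetical, the real layout comes from #375.
pub const CullParams = struct {
    planes: [6]Plane, // camera frustum or light-space frustum
    origin: [3]f32, // centre of the range check
    max_distance: f32, // in world units
};

/// Horizontal range check shared by the main and shadow passes.
pub fn withinRange(params: CullParams, chunk_center: [3]f32) bool {
    const dx = chunk_center[0] - params.origin[0];
    const dz = chunk_center[2] - params.origin[2];
    return dx * dx + dz * dz <= params.max_distance * params.max_distance;
}
```

With this shape, the shadow pass would be a second dispatch with a different `CullParams` value and its own output buffer, rather than a second shader.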

Step 5: Fallback path

  • Keep CPU culling as fallback for devices without compute shaders
  • Runtime detection: if (culling_system.available) gpu_cull else cpu_cull
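The runtime detection could look roughly like this. The `CullingSystem` below is a hypothetical stand-in with only the `available` flag, and treating a missing system as "use the CPU path" is an assumption.

```zig
const std = @import("std");

pub const CullPath = enum { gpu, cpu };

/// Hypothetical stand-in for the real CullingSystem; only the flag matters here.
pub const CullingSystem = struct { available: bool };

pub fn selectPath(culling_system: ?*const CullingSystem) CullPath {
    // Fall back when the system was never created or reports itself unavailable.
    if (culling_system) |cs| {
        if (cs.available) return .gpu;
    }
    return .cpu;
}
```

Selecting the path once per frame in render() keeps both code paths exercised by the same call site, which should make the parity tests below easier to write.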

Files to Modify

  • src/world/world_renderer.zig — main render loop, integrate CullingSystem
  • src/engine/graphics/vulkan/culling_system.zig — wire into frame lifecycle
  • src/world/chunk_storage.zig — expose iteration for AABB updates

Testing

  • Visual parity: same chunks rendered as CPU culling
  • Shadow pass produces same results
  • Draw call count unchanged (should match CPU approach)
  • No validation errors
  • Performance improvement measurable in timing overlay
  • Fallback works on devices without compute
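For the visual parity check, the two visible lists may need to be compared as sets, since a GPU compaction pass does not necessarily preserve the CPU iteration order. A minimal sketch, assuming chunks are identified by slot index and that Zig 0.11 or later is used; `sameVisibleSet` is a hypothetical helper:

```zig
const std = @import("std");

/// Compares two visible-chunk lists as sets, ignoring order.
/// `scratch` needs one entry per chunk slot; every slot value must be
/// smaller than scratch.len.
pub fn sameVisibleSet(cpu: []const u32, gpu: []const u32, scratch: []bool) bool {
    if (cpu.len != gpu.len) return false;
    @memset(scratch, false);
    for (cpu) |slot| scratch[slot] = true;
    for (gpu) |slot| {
        if (!scratch[slot]) return false; // not in the CPU set, or a duplicate
        scratch[slot] = false;
    }
    return true;
}
```

Because a conservative GPU test may keep a few chunks that an exact CPU test rejects, a real parity test might have to relax this to "the GPU set contains the CPU set". Which variant applies depends on how closely the shader from #375 mirrors frustum.intersectsChunkRelative().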

Roadmap: docs/PERFORMANCE_ROADMAP.md — Batch 3, Issue 2A-2

Metadata


Labels: batch-3, bug, documentation, enhancement, perf/rendering, question
