
perf ticket 008: visibility buffer (Nanite-style) replaces 4-MRT G-buffer #27

@proggeramlug

Deferred perf ticket — see docs/perf/008-visibility-buffer.md.

Summary

Replace Bloom's current 4-MRT G-buffer (18 bytes/pixel written per fragment) with a Nanite-style visibility buffer: store only (triangle_id, u, v, mesh_id) at ~8 bytes/pixel, defer full PBR shading to a second pass that fetches vertex data from shared storage buffers. Expected gain: ≥ 50% fragment-bandwidth reduction, plus "every visible pixel shades exactly once" when combined with depth prepass.
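To make the ~8 bytes/pixel figure concrete, here is a minimal sketch of one way the four fields could fit into two 32-bit words. The bit split (20 bits for the triangle id, 12 bits for the mesh id, 16-bit unorm barycentrics) and the function names are assumptions for illustration only; the ticket does not specify a layout.

```python
import struct

# Hypothetical bit budget, not taken from the ticket.
TRI_BITS = 20          # up to 1,048,576 triangles per mesh
MESH_BITS = 12         # up to 4,096 meshes
UNORM16_MAX = 0xFFFF   # barycentrics quantized to 16-bit unorm


def pack_visibility(triangle_id, mesh_id, u, v):
    """Pack one visibility sample into two 32-bit words (8 bytes total)."""
    if not 0 <= triangle_id < (1 << TRI_BITS):
        raise ValueError("triangle_id does not fit in TRI_BITS")
    if not 0 <= mesh_id < (1 << MESH_BITS):
        raise ValueError("mesh_id does not fit in MESH_BITS")
    quantized_u = round(min(max(u, 0.0), 1.0) * UNORM16_MAX)
    quantized_v = round(min(max(v, 0.0), 1.0) * UNORM16_MAX)
    word0 = (mesh_id << TRI_BITS) | triangle_id
    word1 = (quantized_v << 16) | quantized_u
    return word0, word1


def unpack_visibility(word0, word1):
    """Inverse of pack_visibility; u and v come back with quantization error."""
    triangle_id = word0 & ((1 << TRI_BITS) - 1)
    mesh_id = word0 >> TRI_BITS
    u = (word1 & 0xFFFF) / UNORM16_MAX
    v = (word1 >> 16) / UNORM16_MAX
    return triangle_id, mesh_id, u, v


if __name__ == "__main__":
    words = pack_visibility(triangle_id=12345, mesh_id=7, u=0.25, v=0.5)
    print(len(struct.pack("<II", *words)), "bytes per pixel")
    print(unpack_visibility(*words))
```

A wider layout (for example full 32-bit ids) trades per-pixel bandwidth for headroom in mesh and triangle counts.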

Why deferred

The GPU bandwidth win is real (~14 MB/frame saved at 1600×900, multiplied by the overdraw factor, on a benchmark that currently writes 26 MB/pass), but it is invisible behind the vsync cap on Sponza. The main perf target (60 fps at full visual quality) is already met, so any further bandwidth reduction only adds headroom we can't measure on the current benchmark machine.
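The figures above follow directly from the per-pixel sizes in the Summary. A short check, assuming one write per pixel (overdraw factor 1) and decimal megabytes; the function name is illustrative:

```python
WIDTH, HEIGHT = 1600, 900
GBUFFER_BYTES_PER_PIXEL = 18   # current 4-MRT G-buffer (from the Summary)
VISBUF_BYTES_PER_PIXEL = 8     # proposed visibility buffer (approximate)


def bytes_per_pass(bytes_per_pixel, overdraw=1):
    """Bytes written by one full-screen pass; overdraw scales it linearly."""
    return WIDTH * HEIGHT * bytes_per_pixel * overdraw


gbuffer = bytes_per_pass(GBUFFER_BYTES_PER_PIXEL)
visbuf = bytes_per_pass(VISBUF_BYTES_PER_PIXEL)
saved = gbuffer - visbuf
reduction = 1 - VISBUF_BYTES_PER_PIXEL / GBUFFER_BYTES_PER_PIXEL

if __name__ == "__main__":
    print(f"G-buffer:   {gbuffer / 1e6:.2f} MB/pass")   # 25.92
    print(f"Visibility: {visbuf / 1e6:.2f} MB/pass")    # 11.52
    print(f"Saved:      {saved / 1e6:.2f} MB/pass")     # 14.40
    print(f"Reduction:  {reduction:.1%}")               # 55.6%
```

With an overdraw factor of k, both the written bytes and the saving scale by k, which is why overdraw-heavy scenes benefit most.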

Reopen criteria

  • A target scene pushes past the 16.7 ms vsync ceiling on the benchmark machine. This ticket is then the remaining GPU-side lever for bandwidth-bound scenes.
  • Integrated / mobile GPUs become a priority. Bandwidth matters disproportionately on tile-based and integrated hardware; this ticket is the single biggest available reduction.
  • Overdraw-heavy scenes (foliage, hair, transparent-dense particles) become the target.

Prerequisites

  • Ticket 009 (unified vertex + index buffers + per-mesh descriptor buffer) is a hard prerequisite — the shading pass needs a single bindless-style fetch across all meshes.
  • Ticket 005 (depth prepass) becomes useful again at that point; land alongside.
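The dependency on Ticket 009 can be sketched as follows: with one shared vertex buffer, one shared index buffer, and a per-mesh descriptor, the shading pass can resolve any (mesh_id, triangle_id) pair with the same fetch. The descriptor fields (`first_index`, `base_vertex`) and the barycentric convention (u and v weight the second and third vertex) are assumptions for illustration, not the layout from Ticket 009.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshDescriptor:
    first_index: int   # hypothetical: mesh's first entry in the shared index buffer
    base_vertex: int   # hypothetical: offset added to each of the mesh's indices


def fetch_triangle(vertices, indices, descriptors, mesh_id, triangle_id):
    """Return the three vertices of one triangle from the shared buffers."""
    desc = descriptors[mesh_id]
    start = desc.first_index + 3 * triangle_id
    return tuple(vertices[desc.base_vertex + indices[start + k]] for k in range(3))


def interpolate(triangle, u, v):
    """Barycentric interpolation; assumes weights (1 - u - v, u, v)."""
    w = 1.0 - u - v
    a, b, c = triangle
    return tuple(w * ai + u * bi + v * ci for ai, bi, ci in zip(a, b, c))


# Two meshes concatenated into shared buffers.
VERTICES = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),                    # mesh 0
    (0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (0.0, 2.0, 1.0), (2.0, 2.0, 1.0),   # mesh 1
]
INDICES = [0, 1, 2,   0, 1, 2,  2, 1, 3]   # mesh 0: 1 triangle, mesh 1: 2 triangles
DESCRIPTORS = [MeshDescriptor(first_index=0, base_vertex=0),
               MeshDescriptor(first_index=3, base_vertex=3)]

if __name__ == "__main__":
    tri = fetch_triangle(VERTICES, INDICES, DESCRIPTORS, mesh_id=1, triangle_id=1)
    print(tri)
    print(interpolate(tri, u=0.5, v=0.25))
```

In a real shading pass the same lookup would run per pixel against storage buffers, and the interpolated attributes would include normals and UVs rather than only positions.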

Effort

2+ weeks for the baseline redesign: the main_hdr_pass output becomes Rgba32Uint (tri_id, u, v, mesh_id) only; a new shading pass evaluates PBR from storage-buffer vertex fetches; and the downstream MRT consumers (SSR / SSGI / SSAO / post-FX) are rewired to read from the rebuilt material channels.
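The choice of target format determines how much of the saving survives, so it may be worth checking texel sizes before committing. The sketch below compares a few unsigned-integer formats against the 18 bytes/pixel baseline; the format names follow WebGPU-style naming, and the list of candidates is an assumption, not something the ticket enumerates. A four-channel 32-bit format is 16 bytes per texel, while the ~8 bytes/pixel figure in the Summary corresponds to a two-word or 16-bit-per-channel target.

```python
BASELINE_BYTES_PER_PIXEL = 18  # current 4-MRT G-buffer (from the Summary)

# (channels, bits per channel) for a few candidate formats.
CANDIDATE_FORMATS = {
    "Rg32Uint": (2, 32),
    "Rgba16Uint": (4, 16),
    "Rgba32Uint": (4, 32),
}


def bytes_per_texel(channels, bits_per_channel):
    """Size of one texel in bytes."""
    return channels * bits_per_channel // 8


def reduction_vs_baseline(texel_bytes):
    """Fraction of per-pixel write bandwidth saved relative to the G-buffer."""
    return 1 - texel_bytes / BASELINE_BYTES_PER_PIXEL


if __name__ == "__main__":
    for name, (channels, bits) in CANDIDATE_FORMATS.items():
        size = bytes_per_texel(channels, bits)
        print(f"{name}: {size} bytes/texel, {reduction_vs_baseline(size):.0%} saved")
```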

Quick-win intermediate (still deferred, ~2 days)

The ticket also documents a simpler intermediate step: drop unused MRTs when the dependent post-FX is disabled (velocity_rt is only needed with TAA / motion blur; albedo_rt only with SSGI / SSR; material_rt only with SSR). That's a 30–50% MRT bandwidth cut specifically for low-quality modes on integrated hardware, worth doing when targeting those adapters.
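The dependency rules above amount to a small lookup. A minimal sketch, assuming effect names are plain strings and modeling only the three optional targets named here (any always-on targets are left out); the function and table names are hypothetical:

```python
# Which post-FX needs which optional render target, per the rules above.
MRT_DEPENDENCIES = {
    "velocity_rt": {"taa", "motion_blur"},
    "albedo_rt": {"ssgi", "ssr"},
    "material_rt": {"ssr"},
}


def required_optional_mrts(enabled_effects):
    """Return the optional MRTs that at least one enabled effect reads."""
    enabled = set(enabled_effects)
    return [name for name, users in MRT_DEPENDENCIES.items() if users & enabled]


if __name__ == "__main__":
    print(required_optional_mrts([]))               # low-quality mode
    print(required_optional_mrts(["taa"]))
    print(required_optional_mrts(["ssr", "taa"]))
```

The render pass would then be built with only the returned attachments, so a mode with all of these effects disabled writes none of the three optional targets.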
