perf ticket 008: visibility buffer (Nanite-style) replaces 4-MRT G-buffer

Deferred perf ticket — see [docs/perf/008-visibility-buffer.md](../blob/main/docs/perf/008-visibility-buffer.md).

## Summary

Replace Bloom's current 4-MRT G-buffer (18 bytes/pixel written per fragment) with a Nanite-style visibility buffer: store only `(triangle_id, u, v, mesh_id)` at ~8 bytes/pixel, and defer full PBR shading to a second pass that fetches vertex data from shared storage buffers. Expected gain: **≥ 50% fragment-bandwidth reduction**, plus "every visible pixel shades exactly once" when combined with a depth prepass.
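One hypothetical way to fit the four fields into ~8 bytes is two 32-bit words: `mesh_id` and `triangle_id` share the first word, and `u`/`v` are quantized to 16-bit unorm in the second. The 20/12 bit split (about 1M triangles per mesh, 4096 meshes) and all names below are assumptions for illustration, not Bloom's actual layout.

```python
import struct

# Hypothetical bit budget for word0; not taken from Bloom.
TRI_BITS = 20   # up to 2^20 triangles per mesh
MESH_BITS = 12  # up to 2^12 meshes


def _unorm16(x: float) -> int:
    """Quantize a [0, 1] float to a 16-bit unsigned normalized integer."""
    x = min(max(x, 0.0), 1.0)
    return int(round(x * 0xFFFF))


def pack_vis(triangle_id: int, u: float, v: float, mesh_id: int) -> bytes:
    """Pack one visibility-buffer texel into 8 bytes (two little-endian u32s)."""
    if not 0 <= triangle_id < (1 << TRI_BITS):
        raise ValueError("triangle_id out of range")
    if not 0 <= mesh_id < (1 << MESH_BITS):
        raise ValueError("mesh_id out of range")
    word0 = (mesh_id << TRI_BITS) | triangle_id
    word1 = (_unorm16(v) << 16) | _unorm16(u)
    return struct.pack("<II", word0, word1)


def unpack_vis(texel: bytes):
    """Inverse of pack_vis; u and v come back with ~1/65535 quantization error."""
    word0, word1 = struct.unpack("<II", texel)
    triangle_id = word0 & ((1 << TRI_BITS) - 1)
    mesh_id = word0 >> TRI_BITS
    u = (word1 & 0xFFFF) / 0xFFFF
    v = (word1 >> 16) / 0xFFFF
    return triangle_id, u, v, mesh_id
```

The trade-off of this kind of packing is hard caps on mesh and triangle counts; a wider format buys headroom at the cost of bandwidth.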

## Why deferred

This is a real GPU bandwidth win (~14 MB/frame saved at 1600×900, multiplied by the overdraw factor, on a benchmark that currently writes 26 MB/pass), but it is **invisible behind the vsync cap on Sponza**. The main perf target (60 fps at full visual quality) is already met, so any further bandwidth reduction only buys headroom we can't measure on the current benchmark machine.
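The figures above follow from simple per-pixel arithmetic. This sketch assumes decimal megabytes and that bytes written scale linearly with overdraw; the helper name is hypothetical.

```python
MB = 1_000_000  # assumption: decimal megabytes


def pass_write_bytes(width, height, bytes_per_pixel, overdraw=1.0):
    """Bytes written by one full-screen pass; scales linearly with overdraw."""
    return width * height * bytes_per_pixel * overdraw


gbuffer = pass_write_bytes(1600, 900, 18)  # current 4-MRT G-buffer
visbuf = pass_write_bytes(1600, 900, 8)    # visibility buffer
saved = gbuffer - visbuf

print(f"G-buffer:  {gbuffer / MB:.1f} MB/pass")              # 25.9
print(f"saved:     {saved / MB:.1f} MB/frame at 1x overdraw")  # 14.4
print(f"reduction: {saved / gbuffer:.0%}")                    # 56%
```

Passing `overdraw=2.0` (or any measured factor) scales both the write volume and the savings by the same amount, so the percentage reduction stays the same.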

## Reopen criteria

- **A target scene pushes past the 16.7 ms vsync ceiling** on the benchmark machine. This ticket is the remaining GPU-side lever for bandwidth-bound scenes.
- **Integrated / mobile GPUs become a priority.** Bandwidth matters disproportionately on tile-based and integrated hardware; this ticket is the single biggest available reduction.
- **Overdraw-heavy scenes** (foliage, hair, transparent-dense particles) become the target.

## Prerequisites

- Ticket 009 (unified vertex + index buffers + per-mesh descriptor buffer) is a **hard prerequisite** — the shading pass needs a single bindless-style fetch across all meshes.
- Ticket 005 (depth prepass) becomes useful again at that point; land the two together.
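To see why ticket 009 is a hard prerequisite, consider what the shading pass has to do with a `(mesh_id, triangle_id)` pair: look up the mesh's offsets in a descriptor buffer, then read three indices and three vertices from the unified buffers. The sketch below assumes a hypothetical descriptor with `first_index` and `base_vertex` fields; the real layout is whatever ticket 009 settles on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshDescriptor:
    # Hypothetical per-mesh record in the descriptor buffer.
    first_index: int  # offset of this mesh's first index in the unified index buffer
    base_vertex: int  # added to each index to address the unified vertex buffer


def fetch_triangle(vertices, indices, descriptors, mesh_id, triangle_id):
    """Return the three vertices of one triangle from the unified buffers."""
    d = descriptors[mesh_id]
    i = d.first_index + 3 * triangle_id  # three indices per triangle
    return tuple(vertices[d.base_vertex + indices[i + k]] for k in range(3))
```

Because every mesh lives in the same two buffers, one shading dispatch can resolve any pixel without rebinding per-mesh vertex buffers.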

## Effort

**~2+ weeks** for the baseline redesign: main_hdr_pass output becomes `Rgba32Uint (tri_id, u, v, mesh_id)` only, new shading pass evaluates PBR from storage-buffer vertex fetches, downstream MRT consumers (SSR / SSGI / SSAO / post-FX) rewired to read from the rebuilt material channels.
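Once the shading pass has a triangle's three vertices, it rebuilds per-pixel attributes (normal, UV, tangent) from the stored `(u, v)`. This sketch assumes the common barycentric convention `attr = (1 - u - v)·a0 + u·a1 + v·a2`; which vertex `u` and `v` weight is an assumption and must match whatever the visibility pass writes.

```python
def interpolate(a0, a1, a2, u, v):
    """Barycentric interpolation of a vector attribute across one triangle."""
    w = 1.0 - u - v  # weight of the first vertex
    return tuple(w * x0 + u * x1 + v * x2 for x0, x1, x2 in zip(a0, a1, a2))


# e.g. interpolating a 2-component UV attribute:
# interpolate((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), 0.25, 0.5) -> (0.25, 0.5)
```

The same function applies to every vertex attribute the PBR evaluation needs, which is why the rewired SSR / SSGI / SSAO consumers can be fed from rebuilt channels rather than stored MRTs.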

## Quick-win intermediate (still deferred, ~2 days)

The ticket also documents a simpler intermediate step: drop unused MRTs when the dependent post-FX is disabled (velocity_rt is only needed with TAA / motion blur; albedo_rt only with SSGI / SSR; material_rt only for SSR). That's a 30-50% MRT bandwidth cut specifically for low-quality modes on integrated hardware, worth doing when targeting those adapters.
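The dependency rules above map directly to a lookup from each optional MRT to the effects that read it. The effect keys below are hypothetical names; only the MRT-to-effect pairings come from the ticket.

```python
# Which post-FX read each optional MRT (pairings from the ticket; keys hypothetical).
MRT_CONSUMERS = {
    "velocity_rt": {"taa", "motion_blur"},
    "albedo_rt": {"ssgi", "ssr"},
    "material_rt": {"ssr"},
}


def required_optional_mrts(enabled_effects: set[str]) -> set[str]:
    """Optional MRTs that at least one enabled effect still reads."""
    return {rt for rt, users in MRT_CONSUMERS.items() if users & enabled_effects}
```

For example, a TAA-only low-quality preset keeps just `velocity_rt`, while enabling SSR keeps both `albedo_rt` and `material_rt`.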
