
perf ticket 010: async compute for post-FX (blocked on wgpu multi-queue) #29

@proggeramlug

Description

Deferred perf ticket — see docs/perf/010-async-compute.md.

Summary

Run post-FX passes (SSAO / SSR / SSGI / bloom) on a dedicated compute queue in parallel with the next frame's graphics work (shadow + main HDR). UE5 uses this pattern to hide ~20% of post-FX latency. Expected gain on Sponza: ~1.3 ms of the 16.7 ms vsync budget.
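The intended schedule, sketched as hypothetical Rust. Note that `get_compute_queue()` does not exist in wgpu (that absence is exactly the blocker); this is pseudocode for the desired shape only:

```
// HYPOTHETICAL API - wgpu exposes no second queue; this sketches the
// desired schedule, it does not compile against any wgpu release.
//
// Frame N:   [shadow + main HDR] ------- graphics queue
// Frame N:                    [SSAO/SSR/SSGI/bloom] -- compute queue
// Frame N+1: [shadow + main HDR] overlaps frame N's post-FX.

let (device, gfx_queue) = adapter.request_device(&desc).await?;
let post_fx_queue = device.get_compute_queue()?; // hypothetical

// Graphics work for frame N+1 is submitted while frame N's post-FX runs.
gfx_queue.submit([shadow_and_hdr_encoder.finish()]);
post_fx_queue.submit([post_fx_encoder.finish()]); // concurrent on hardware
// A fence/semaphore would be needed where post-FX reads frame N's HDR target.
```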

Why deferred — upstream blocker

Audited wgpu 29 (our current pin): Adapter::request_device returns exactly one (Device, Queue) pair. There is no Device::get_compute_queue(), Instance::request_multiple_queues, or queue-family API. The only concrete paths to implementing this today:

  1. Drop to wgpu-hal directly for the second queue. wgpu-hal has per-backend Queue abstractions (Metal / DX12 / Vulkan each support multiple queues at the hal level), but mixing wgpu-core and wgpu-hal in the same renderer is fragile — lifetime and submission-ordering guarantees differ, and we'd lose the safe wgpu-core API for every resource the compute queue touches. Effectively rewrites the post-FX layer on a different abstraction. ~2-3 weeks.
  2. Native per-platform: metal-rs on macOS, windows-rs / DX12 on Windows, ash / Vulkan on Linux + Android. Three separate implementations, each with its own sync primitives. ~3+ weeks.
  3. Wait for wgpu upstream. Multi-queue support has been discussed but is not on a near-term roadmap as of wgpu 29.
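For concreteness, the single-queue shape of today's API (a sketch; the exact `request_device` signature has shifted across recent wgpu releases, so treat the call as illustrative):

```
// Sketch of the current wgpu surface: one Device, one Queue, no way to
// request a second queue or pick a queue family.
let (device, queue) = adapter
    .request_device(&wgpu::DeviceDescriptor::default())
    .await
    .expect("request_device");

// `queue` is the only handle; graphics and compute submissions all
// serialize through it.
queue.submit(std::iter::once(encoder.finish()));
```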

The ticket's own sub-suggestion — "prototype serial-equivalent ordering first" (split encoders, same queue) — doesn't help: on a single queue, one big encoder + one submit generally outperforms multiple smaller submits because every submit introduces driver overhead without enabling parallelism.
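The submit-overhead point can be made concrete with a toy model (all numbers illustrative, not measured): on a single queue, splitting into k submits adds k times the per-submit driver cost and overlaps nothing.

```rust
// Toy model: on a single queue, total frame time is the sum of the passes
// plus per-submit driver overhead. More submits only ever add cost.
// All numbers below are illustrative, not profiled.
fn single_queue_frame_ms(pass_ms: &[f32], submits: u32, submit_overhead_ms: f32) -> f32 {
    pass_ms.iter().sum::<f32>() + submits as f32 * submit_overhead_ms
}

fn main() {
    let passes = [0.4_f32, 0.3, 0.35, 0.25]; // SSAO, SSR, SSGI, bloom (made up)
    let one_big = single_queue_frame_ms(&passes, 1, 0.05);
    let split = single_queue_frame_ms(&passes, 4, 0.05);
    println!("one submit: {one_big:.2} ms, four submits: {split:.2} ms");
    assert!(split > one_big); // splitting never wins without a second queue
}
```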

Reopen criteria

  • A target scene pushes past the 16.7 ms vsync ceiling and post-FX is the bottleneck.
  • wgpu lands a stable multi-queue API upstream (track wgpu releases).

Why not worth a multi-week redesign today

Estimated gain is ~1.3 ms against a 16.7 ms budget, on a scene that already holds 60 fps under the vsync cap; the saved time is invisible until a frame actually misses vsync. Not worth the cross-platform correctness risk until we have a scene that pushes past the ceiling.
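The vsync arithmetic, spelled out with the ticket's own numbers (the 14.0 ms and 17.5 ms frame times are hypothetical examples):

```rust
// Vsync-cap arithmetic: presented frame time is quantized up to the next
// vsync interval, so saving ~1.3 ms only shows up once the raw frame time
// exceeds the 16.7 ms budget.
fn visible_speedup_ms(frame_ms: f32, saved_ms: f32, vsync_ms: f32) -> f32 {
    let presented = |t: f32| (t / vsync_ms).ceil() * vsync_ms;
    presented(frame_ms) - presented(frame_ms - saved_ms)
}

fn main() {
    let vsync = 16.7_f32;
    // Sponza today (hypothetical 14.0 ms): under budget, the 1.3 ms is invisible.
    assert_eq!(visible_speedup_ms(14.0, 1.3, vsync), 0.0);
    // A hypothetical 17.5 ms scene: dropping to 16.2 ms recovers a whole
    // vsync interval. This is the reopen condition.
    assert!(visible_speedup_ms(17.5, 1.3, vsync) > 0.0);
}
```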
