Make hierarchical Z buffer generation properly conservative. #22603
Conversation
The single-pass downsampling (SPD) shader is properly conservative only for depth buffers with side lengths that are powers of two. This is because it assumes that, for any texel in mip level N+1, all texels in mip level N that contribute to that texel are contained within at most a 2×2 square, which is only true for textures with power-of-two side lengths. (For textures whose side lengths aren't powers of two, proper conservative downsampling may require sampling up to a 3×3 square.)

This PR solves the problem in a conservative way, by conceptually rounding the side lengths of the depth buffer up to the *next* power of two and scaling the depth buffer appropriately before performing downsampling. This ensures that the SPD shader only sees textures with power-of-two side lengths at every step of the operation. Note "conceptually": in reality this patch doesn't actually generate such an intermediate scaled texture. Instead, it changes the `load_mip_0` function in the shader to return the value that *would* have been produced by sampling such a scaled depth buffer. This is obviously more efficient than actually performing the scaling operation.

The sampling operations in the mesh preprocessing occlusion culling code required no changes, as they simply use `textureDimensions` on the hierarchical Z buffer to determine its size. I did, however, have to change the meshlet code to use `textureDimensions` like the mesh preprocessing code does. The meshlet culling indeed seems less broken now (albeit still broken); the rabbits on the right side don't flicker anymore in my testing.

Note that this approach, while popular (e.g. in zeux's [Niagara]), is more conservative than a single-pass downsampler that properly handled 3×3 texel blocks would be. However, such a downsampler would be complex, and I figured it was better to make our occlusion culling correct, simple, and fast rather than possibly-complex and slow.

This fix allows us to move occlusion culling out of experimental status. I opted not to do that in this PR in order to make it easier to review, but a follow-up PR should do that.

[Niagara]: zeux/niagara#15 (comment)
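To make the "conceptual" scaling concrete, here is a minimal WGSL sketch of the remapping idea. It is not the shader in this PR: the binding name `mip_0`, the helper `round_up_to_power_of_two`, and the exact rounding are illustrative assumptions. The point is that `load_mip_0` pretends the depth buffer has power-of-two side lengths and maps each texel of that conceptual padded texture back onto a real texel, so every real texel is covered by at least one padded texel and therefore contributes to the reduction.

```wgsl
// Hypothetical sketch, not the actual Bevy shader. Assumes `mip_0` is the
// full-resolution depth texture being downsampled.
@group(0) @binding(0) var mip_0: texture_2d<f32>;

// Round x up to the next power of two (x must be >= 1).
fn round_up_to_power_of_two(x: u32) -> u32 {
    return 1u << (32u - countLeadingZeros(x - 1u));
}

// Load a texel as if the depth buffer had been scaled up to power-of-two
// side lengths. `p` is a texel coordinate in that conceptual padded texture.
fn load_mip_0(p: vec2u) -> f32 {
    let actual_size = textureDimensions(mip_0);
    let padded_size = vec2u(
        round_up_to_power_of_two(actual_size.x),
        round_up_to_power_of_two(actual_size.y),
    );
    // Nearest-neighbor remap from padded coordinates back to real coordinates.
    // Because padded_size >= actual_size, consecutive padded texels map to real
    // texels that differ by at most 1, so every real texel is hit at least once.
    let src = min((p * actual_size) / padded_size, actual_size - vec2u(1u));
    return textureLoad(mip_0, src, 0).x;
}
```

With a remap along these lines in place, the rest of the SPD pass can assume power-of-two dimensions and keep reducing plain 2×2 blocks at every level.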
// note: add 1 before max because the unsigned overflow behavior is intentional
// it wraps around firstLeadingBit(0) = ~0 to 0
// TODO: we actually sample a 4x4 block, so ideally this would be `max(..., 3u) - 3u`.
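For readers without the surrounding diff, here is a hedged sketch of the pattern those comments describe; the clamp constant `2u` and the function and parameter names are assumptions, not the code under review.

```wgsl
// Illustrative only: names and the clamp constant are guesses, not the real
// shader. Shows why adding 1u *before* the max is intentional.
fn select_hzb_mip(aabb_size_in_texels: u32) -> u32 {
    // In WGSL, firstLeadingBit(0u) returns ~0u (0xFFFFFFFFu). Adding 1u first
    // deliberately wraps that sentinel back to 0u, so the max/subtract below
    // clamps a zero-sized AABB to mip 0 instead of yielding a huge mip index.
    return max(firstLeadingBit(aabb_size_in_texels) + 1u, 2u) - 2u;
}
```

The quoted TODO suggests that, because a 4×4 block of the chosen mip is actually sampled, the clamp could use `3u` instead, selecting one mip level finer where possible.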
@atlv24 should we change this to 3u now?
I'll look over and debug the meshlets stuff once this merges, but yeah, probably. I'd leave it as is for now though.
tychedelia left a comment
Really elegant solution. Excited to move occlusion culling out of experimental.
atlv24 left a comment
looks good :)
With #22603 landed, all known issues that could cause Bevy to cull meshes that shouldn't have been culled are fixed, so there now seems to be consensus that we can remove occlusion culling from the `experimental` namespace. This patch does that (and in fact removes the `experimental` module from `bevy_render` entirely, as it's now empty).