
Conversation

@pcwalton
Contributor

The single-pass downsampling (SPD) shader is properly conservative only
for depth buffers with side lengths that are powers of two. This is
because it assumes that, for any texel in mip level N+1, all texels in
mip level N that contribute to that texel are contained within at most a
2×2 square, which is only true for textures with side lengths that are
powers of two. (For textures with side lengths that aren't powers of
two, proper conservative downsampling may require sampling up to a 3×3
square.)
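To make the 3×3 case concrete, here is a small illustrative sketch (not code from this PR) that computes, for a 1-D mip chain, which parent texels overlap a given child texel. With a power-of-two width every child texel covers exactly 2 parent texels; with a non-power-of-two width the span can grow to 3:

```rust
// Illustrative: map texel `i` of the child mip (width `child_w`) back onto
// the parent mip (width `parent_w`) and return the half-open range of
// parent texels whose footprints overlap it.
fn contributing_range(i: u32, parent_w: u32, child_w: u32) -> (u32, u32) {
    // Child texel i spans [i/child_w, (i+1)/child_w) in normalized
    // coordinates; project that interval onto the parent grid.
    let lo = (i * parent_w) / child_w;
    let hi = ((i + 1) * parent_w + child_w - 1) / child_w; // exclusive, rounded up
    (lo, hi)
}

fn main() {
    // Power-of-two width 8 -> 4: child texel 1 covers parent texels 2..4 (span 2).
    assert_eq!(contributing_range(1, 8, 4), (2, 4));
    // Non-power-of-two width 5 -> 3: child texel 1 covers parent texels 1..4
    // (span 3), so a 2×2 gather is not conservative here.
    assert_eq!(contributing_range(1, 5, 3), (1, 4));
    println!("ok");
}
```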

This PR solves the problem in a conservative way, by conceptually
rounding up the side lengths of the depth buffer to the *next* power of
two and scaling the depth buffer appropriately before performing
downsampling. This ensures that the SPD shader only sees textures with
side lengths that are powers of two at every step of the operation. Note
"conceptually"; in reality this patch doesn't actually generate such an
intermediate scaled texture. Instead, it changes the `load_mip_0`
function in the shader to return the value that *would* have been
produced by sampling such a scaled depth buffer. This is obviously more
efficient than actually performing such a scaling operation.
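The remapping idea can be sketched on the CPU as follows (names here are illustrative, not the shader's actual code): treat the depth buffer as if it had power-of-two side lengths, and have the mip-0 load map each virtual texel back onto the real texture, rather than materializing a scaled intermediate texture:

```rust
// Hypothetical sketch of the "conceptual" scaling, assuming a nearest-texel
// remap from the virtual power-of-two texture to the real one.
fn round_up_to_pot(x: u32) -> u32 {
    x.next_power_of_two()
}

// Map a texel coordinate in the virtual power-of-two texture back to the
// real texture; the scale factor real_size / pot_size is <= 1, so every
// real texel is covered (some more than once), which keeps the result
// conservative.
fn virtual_to_real(virtual_coord: u32, real_size: u32) -> u32 {
    let pot_size = round_up_to_pot(real_size);
    (virtual_coord * real_size) / pot_size
}

fn main() {
    // A 5-texel-wide depth buffer behaves like an 8-texel-wide one:
    let samples: Vec<u32> = (0..8).map(|v| virtual_to_real(v, 5)).collect();
    assert_eq!(samples, vec![0, 0, 1, 1, 2, 3, 3, 4]);
    println!("ok");
}
```

Because the remap only redirects reads, the downsampler itself never has to know the texture wasn't a power of two.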

The sampling operations in the mesh preprocessing occlusion culling code
required no changes, as they simply use `textureDimensions` on the
hierarchical Z buffer to determine its size. I did, however, have to
change the meshlet code to use `textureDimensions` like the mesh
preprocessing code does. The meshlet culling indeed seems less broken
now (albeit still broken); the rabbits on the right side don't flicker
anymore in my testing.

Note that this approach, while popular (e.g. in zeux's [Niagara]), is
more conservative than a single-pass downsampler that properly handles
3×3 texel blocks would be. However, such a downsampler would be complex,
and I figured it was better to make our occlusion culling correct,
simple, and fast rather than possibly-complex and slow.

This fix allows us to move occlusion culling out of experimental status.
I opted not to do that in this PR in order to make it easier to review,
but a follow-up PR should do that.

[Niagara]: zeux/niagara#15 (comment)
@pcwalton pcwalton added the A-Rendering Drawing game state to the screen label Jan 20, 2026
@pcwalton pcwalton added the C-Bug An unexpected or incorrect behavior label Jan 20, 2026

```wgsl
// note: add 1 before max because the unsigned overflow behavior is intentional
// it wraps around firstLeadingBit(0) = ~0 to 0
// TODO: we actually sample a 4x4 block, so ideally this would be `max(..., 3u) - 3u`.
```
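For readers unfamiliar with the WGSL trick being discussed, here is a Rust model of it (illustrative only; the shader's actual expression and clamp constant may differ, as the TODO notes): WGSL's `firstLeadingBit(0u)` returns `~0u`, so adding 1 with wrapping arithmetic turns it into 0 before the `max` clamps it:

```rust
// Rust equivalent of WGSL's firstLeadingBit for u32: index of the highest
// set bit, or all-ones for zero.
fn first_leading_bit(x: u32) -> u32 {
    if x == 0 { u32::MAX } else { 31 - x.leading_zeros() }
}

// Hypothetical mip selection in the shape the comment describes: add 1
// *before* the max so that firstLeadingBit(0) = ~0 wraps to 0 and gets
// clamped, instead of producing a huge value.
fn mip_of(x: u32) -> u32 {
    first_leading_bit(x).wrapping_add(1).max(2) - 2
}

fn main() {
    assert_eq!(mip_of(0), 0);  // ~0 wraps to 0, clamped by max(.., 2)
    assert_eq!(mip_of(16), 3); // firstLeadingBit = 4 -> 5 -> 5 - 2 = 3
    println!("ok");
}
```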

@atlv24 should we change this to 3u now?


ill look over and debug meshlets stuff once this merges, but yeah probably. id leave it as is for now though

Member

@tychedelia tychedelia left a comment


Really elegant solution. Excited to move occlusion culling out of experimental.

Contributor

@atlv24 atlv24 left a comment


looks good :)

@atlv24 atlv24 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Jan 20, 2026
@superdump superdump added this pull request to the merge queue Jan 20, 2026
Merged via the queue into bevyengine:main with commit 2d4bf0c Jan 20, 2026
45 checks passed
@pcwalton pcwalton deleted the hi-z-pot branch January 21, 2026 08:05
pcwalton added a commit to pcwalton/bevy that referenced this pull request Jan 21, 2026
With bevyengine#22603 landed, all known issues that could cause Bevy to cull
meshes that shouldn't have been culled are fixed, so there now seems to
be consensus that we can remove occlusion culling from the
`experimental` namespace. This patch does that (and in fact removes the
`experimental` module from `bevy_render` entirely, as it's now empty).
github-merge-queue bot pushed a commit that referenced this pull request Jan 21, 2026