Make hierarchical Z buffer generation properly conservative. #22603
Conversation
The single-pass downsampling (SPD) shader is properly conservative only for depth buffers with side lengths that are powers of two. This is because it assumes that, for any texel in mip level N+1, all texels in mip level N that contribute to that texel are contained within at most a 2×2 square, which is only true for textures with power-of-two side lengths. (For textures whose side lengths aren't powers of two, proper conservative downsampling may require sampling up to a 3×3 square.)

This PR solves the problem in a conservative way, by conceptually rounding the side lengths of the depth buffer up to the *next* power of two and scaling the depth buffer appropriately before performing downsampling. This ensures that the SPD shader only sees textures with power-of-two side lengths at every step of the operation. Note "conceptually": in reality this patch doesn't actually generate such an intermediate scaled texture. Instead, it changes the `load_mip_0` function in the shader to return the value that *would* have been produced by sampling such a scaled depth buffer. This is obviously more efficient than actually performing the scaling operation.

The sampling operations in the mesh preprocessing occlusion culling code required no changes, as they simply use `textureDimensions` on the hierarchical Z buffer to determine its size. I did, however, have to change the meshlet code to use `textureDimensions` like the mesh preprocessing code does. The meshlet culling indeed seems less broken now (albeit still broken); the rabbits on the right side don't flicker anymore in my testing.

Note that this approach, while popular (e.g. in zeux's [Niagara]), is more conservative than a single-pass downsampler that properly handled 3×3 texel blocks would be. However, such a downsampler would be complex, and I figured it was better to make our occlusion culling correct, simple, and fast rather than possibly-complex and slow.

This fix allows us to move occlusion culling out of experimental status. I opted not to do that in this PR in order to make it easier to review, but a follow-up PR should do that.

[Niagara]: zeux/niagara#15 (comment)
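To make the "conceptual" scaling concrete, here is a minimal WGSL sketch of the remapping idea. It is not the shader in this PR: the binding name `mip_0`, the helper `round_up_to_power_of_two`, and the exact rounding are illustrative assumptions. The point is that `load_mip_0` pretends the depth buffer has power-of-two side lengths and maps each texel of that conceptual padded texture back onto a real texel, so every real texel is covered by at least one padded texel and therefore contributes to the reduction.

```wgsl
// Hypothetical sketch, not the actual Bevy shader. Assumes `mip_0` is the
// full-resolution depth texture being downsampled.
@group(0) @binding(0) var mip_0: texture_2d<f32>;

// Round x up to the next power of two (x must be >= 1).
fn round_up_to_power_of_two(x: u32) -> u32 {
    return 1u << (32u - countLeadingZeros(x - 1u));
}

// Load a texel as if the depth buffer had been scaled up to power-of-two
// side lengths. `p` is a texel coordinate in that conceptual padded texture.
fn load_mip_0(p: vec2u) -> f32 {
    let actual_size = textureDimensions(mip_0);
    let padded_size = vec2u(
        round_up_to_power_of_two(actual_size.x),
        round_up_to_power_of_two(actual_size.y),
    );
    // Nearest-neighbor remap from padded coordinates back to real coordinates.
    // Because padded_size >= actual_size, consecutive padded texels map to real
    // texels that differ by at most 1, so every real texel is hit at least once.
    let src = min((p * actual_size) / padded_size, actual_size - vec2u(1u));
    return textureLoad(mip_0, src, 0).x;
}
```

With a remap along these lines in place, the rest of the SPD pass can assume power-of-two dimensions and keep reducing plain 2×2 blocks at every level.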
// note: add 1 before max because the unsigned overflow behavior is intentional
// it wraps around firstLeadingBit(0) = ~0 to 0
// TODO: we actually sample a 4x4 block, so ideally this would be `max(..., 3u) - 3u`.
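For readers without the surrounding diff, here is a hedged sketch of the pattern those comments describe; the clamp constant `2u` and the function and parameter names are assumptions, not the code under review.

```wgsl
// Illustrative only: names and the clamp constant are guesses, not the real
// shader. Shows why adding 1u *before* the max is intentional.
fn select_hzb_mip(aabb_size_in_texels: u32) -> u32 {
    // In WGSL, firstLeadingBit(0u) returns ~0u (0xFFFFFFFFu). Adding 1u first
    // deliberately wraps that sentinel back to 0u, so the max/subtract below
    // clamps a zero-sized AABB to mip 0 instead of yielding a huge mip index.
    return max(firstLeadingBit(aabb_size_in_texels) + 1u, 2u) - 2u;
}
```

The quoted TODO suggests that, because a 4×4 block of the chosen mip is actually sampled, the clamp could use `3u` instead, selecting one mip level finer where possible.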
@atlv24 should we change this to 3u now?
I'll look over and debug the meshlets stuff once this merges, but yeah, probably. I'd leave it as is for now though.
tychedelia left a comment
Really elegant solution. Excited to move occlusion culling out of experimental.
atlv24 left a comment
looks good :)
With #22603 landed, all known issues that could cause Bevy to cull meshes that shouldn't have been culled are fixed, so there now seems to be consensus that we can remove occlusion culling from the `experimental` namespace. This patch does that (and in fact removes the `experimental` module from `bevy_render` entirely, as it's now empty).