Volumetric optimizations #2272
With @TomasKiniulis, we found issues related to the light list pruning in the Amelienborg Palace project. Looking into it.
Looks good now!
- Validated that no warnings appear in console when project is opened
- Checked different editor layouts with split/not split game and scene view, different game view resolutions
- Checked local fog Volumes with QA FTP project
- Checked Amelienborg project
No issues found anymore, and no difference in the fog comparison either.
These are good changes, but I don't recommend spending any time on the code that touches the light lists since it will be rewritten in the future.
We already have a shader that generates a depth pyramid. It just generates min-Z rather than max-Z. We should generate both in a single pass (it is more efficient and avoids code duplication).
The problem with that is that it is built to generate the full pyramid: each pass does a single 2x2 downsample and outputs to a bigger texture, so the access pattern is different. Here there is no need to store the intermediate max Z levels (the 2x2 and 4x4). I definitely plan to piggyback on one of the other passes, though (see the PR description). The first downsample pass is done by the low-res translucency depth downsample, not the depth pyramid, and I think I will put this there. But I want to do that in a second PR, as I did not want to add complexity to the testing of this one.
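To make the reviewer's suggestion concrete, here is a minimal CPU sketch in Python (standing in for the compute shader) of one 2x2 reduction step that emits both a min-Z and a max-Z level from a single read of each footprint. The function name `downsample_min_max` and the list-of-rows depth layout are hypothetical, and even dimensions are assumed.

```python
def downsample_min_max(depth):
    """One 2x2 reduction step that produces a min-Z and a max-Z level together.

    depth: list of rows of floats; width and height are assumed to be even.
    Returns (min_level, max_level), each half the size in both dimensions.
    """
    height, width = len(depth), len(depth[0])
    min_level, max_level = [], []
    for y in range(0, height, 2):
        min_row, max_row = [], []
        for x in range(0, width, 2):
            # Read the 2x2 footprint once and reduce it both ways.
            quad = (depth[y][x], depth[y][x + 1],
                    depth[y + 1][x], depth[y + 1][x + 1])
            min_row.append(min(quad))
            max_row.append(max(quad))
        min_level.append(min_row)
        max_level.append(max_row)
    return min_level, max_level


depth = [
    [0.1, 0.2, 0.5, 0.6],
    [0.3, 0.4, 0.7, 0.9],
]
lo, hi = downsample_min_max(depth)
print(lo, hi)  # [[0.1, 0.5]] [[0.4, 0.9]]
```

The trade-off the author describes still applies: the existing pyramid keeps every level, whereas the volumetrics only need the final max Z at VBuffer resolution.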
Conflicts:
- com.unity.render-pipelines.high-definition/CHANGELOG.md
- com.unity.render-pipelines.high-definition/Runtime/Lighting/AtmosphericScattering/Fog.cs
- com.unity.render-pipelines.high-definition/Runtime/Lighting/VolumetricLighting/VolumetricLighting.compute
- com.unity.render-pipelines.high-definition/Runtime/Lighting/VolumetricLighting/VolumetricLighting.cs
- com.unity.render-pipelines.high-definition/Runtime/Lighting/VolumetricLighting/VolumetricLightingFiltering.compute
* Fixed a NaN issue in the volumetric lighting filtering. Fixed the second pass in the denoising being done on the wrong axis. Removing the reprojection frame setting. Making the volumetric scene view reprojection work. Reduce the cost of the volumetric lighting gaussian denoising. Adding the option to have only the directional light contributing to the volumetrics. Adding a new configuration mode for screen resolution and volume slices. Changing the maximal values for the volumetric resolution and slices. Adding a more intuitive way of defining the denoising process of the volumetrics. Changing what is in the advanced mode for volumetrics. Renaming user-facing parameter names. Displaying an info message when the user triggers anisotropy. Restoring the right values for the tests
* Fixing compilation after bad merge
* super hacky proof of concept
* Kinda work, yay
* Finalize max Z
* Optional gradient
* Optimize filtering of vbuffer (2x boost)
* Starting to move to RG (committing to switch branch)
* Finish port to render graph
* Small cleanup and dilation width variable.
* Simpler PCF for directional in volumetrics
* Local light list
* Do dither transition on ultra low
* Added comments
* Separate define for the light list trim
* Remove unused old PCF test
* Tentative fix for rthandle issue.
* Tentative fix for issue with tiling artifacts.
* Fix shader warnings
* Fix issue when fixing warning.
* Fix for warning on vulkan
* Tentative fix for vr issues?
* Try revert asset to what is in master
* Push update of resource asset
* Filtering filtering in XR
* Fix issue on metal.
* Update references

Co-authored-by: Anis <anis@unity3d.com>
Co-authored-by: Sebastien Lagarde <sebastien@unity3d.com>
TL;DR: I did various optimizations to the volumetric lighting pass. In the template scene this makes the volumetric lighting pass close to 4x faster, a 6 ms win on PS4.
Note: Anis' PR (#1806) needs to be merged before this one; this PR targets that branch to keep the diff clean.
With this change, the most expensive area of the template scene went from 8.3 ms down to 2.3 ms, close to a 4x speedup (about 3.6x). Measurements were taken on PS4 with async compute disabled, to avoid overlap with work unrelated to volumetrics.
This is the scene mentioned above:
Visually, the result is pretty much identical before and after (I can post before/after captures if desired).
Description
There are a few optimizations here, and I have the feeling a bit more can be done, but I wanted to keep the main algorithm untouched for this round. Even the optimizations done here can be improved further; this is a first pass.
The following is happening in the PR:
Rewrote the filtering pass: the filtering kernel is unchanged, but each thread now processes one slice (better scheduling of waves), and LDS is used to do the filtering in one pass with fewer samples. Before, we had two passes with 3 samples each; now there is a single pass with 2 samples per thread. This halved the cost, from 0.72 ms to 0.36 ms.
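The general LDS idea can be sketched on the CPU as follows. This is not the PR's kernel: it assumes a 3-tap kernel with weights (0.25, 0.5, 0.25) along one row, and the name `blur_row_with_lds` is hypothetical. Each simulated thread group reads its texels plus a one-texel halo into a shared array once, and every tap is then served from that array instead of a fresh memory fetch.

```python
def blur_row_with_lds(row, group_size, weights=(0.25, 0.5, 0.25)):
    """3-tap blur of one row, emulating a thread group that stages texels in LDS.

    Returns (filtered_row, fetch_count), where fetch_count is the number of
    reads from the source row (the "memory" fetches a shader would issue).
    """
    n = len(row)
    out = [0.0] * n
    fetches = 0
    for start in range(0, n, group_size):
        end = min(start + group_size, n)
        # Load phase: the group's texels plus a 1-texel halo on each side,
        # clamped at the row edges, are read into "shared memory" exactly once.
        shared = []
        for i in range(start - 1, end + 1):
            shared.append(row[min(max(i, 0), n - 1)])
            fetches += 1
        # On the GPU a group barrier would separate the load and filter phases.
        # Filter phase: every tap is read from the shared array.
        for i in range(start, end):
            s = i - start + 1  # position of texel i inside `shared`
            out[i] = (weights[0] * shared[s - 1]
                      + weights[1] * shared[s]
                      + weights[2] * shared[s + 1])
    return out, fetches


filtered, fetches = blur_row_with_lds([0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0], group_size=4)
print(filtered)  # [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(fetches)   # 12
```

For 8 texels in groups of 4, this performs 12 fetches instead of the 24 a naive 3-tap filter would issue.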
Max Z culling: this PR generates a max Z texture whose resolution depends on both the screen resolution and the VBuffer resolution. In all cases it is generated to be slightly conservative in size. The max Z is then dilated a bit further to avoid issues at edges and to be even more conservative (this can be made tighter in the future).
Once this is generated, we can safely skip lighting computations for all voxels that lie fully behind the max Z: since the test is very conservative, those voxels cannot be visible.
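A minimal sketch of the max Z culling steps, assuming linear depth where larger means farther and a square VBuffer tile of `tile` pixels (the PR derives the real footprint from the screen and VBuffer resolutions). `build_max_z`, `dilate_max` and `voxel_is_skippable` are hypothetical names.

```python
def build_max_z(depth, tile):
    """Max linear depth over each tile x tile block of the full-resolution depth.

    Uses ceil division so partial tiles at the right/bottom edges are covered.
    """
    height, width = len(depth), len(depth[0])
    tiles_y, tiles_x = -(-height // tile), -(-width // tile)
    max_z = [[float("-inf")] * tiles_x for _ in range(tiles_y)]
    for y in range(height):
        for x in range(width):
            row = max_z[y // tile]
            row[x // tile] = max(row[x // tile], depth[y][x])
    return max_z


def dilate_max(max_z, radius=1):
    """Replace each tile by the max over its (2*radius+1)^2 neighbourhood."""
    tiles_y, tiles_x = len(max_z), len(max_z[0])
    return [[max(max_z[yy][xx]
                 for yy in range(max(0, y - radius), min(tiles_y, y + radius + 1))
                 for xx in range(max(0, x - radius), min(tiles_x, x + radius + 1)))
             for x in range(tiles_x)]
            for y in range(tiles_y)]


def voxel_is_skippable(voxel_near_depth, tile_max_z):
    # The voxel starts behind the farthest visible surface of its (dilated)
    # tile, so it is occluded in every pixel and its lighting can be skipped.
    return voxel_near_depth > tile_max_z


depth = [
    [1.0, 1.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0],
    [1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 2.0, 2.0],
]
max_z = dilate_max(build_max_z(depth, tile=2))
print(max_z)                                 # [[4.0, 4.0, 4.0, 2.0]]
print(voxel_is_skippable(3.0, max_z[0][3]))  # True
print(voxel_is_skippable(3.0, max_z[0][2]))  # False: kept thanks to the dilation
```

Without the dilation, the voxel in tile 2 would have been skipped, since that tile's own max Z is 2.0.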
The PR also includes something I was experimenting with: a gradient mask that detects edges and is then further dilated. It was meant to be even more conservative by disabling the skipping where strong edges are present. However, I never observed a situation in which it was actually needed. I left the code in because I want to experiment with it if we try to be more aggressive with the max Z culling.
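The experimental edge mask can be sketched as follows, assuming the "gradient" is the max Z difference between 4-neighbour tiles and that a single fixed `threshold` marks an edge; the PR's actual gradient measure is not described here. `edge_mask`, `dilate_mask` and `can_skip_voxel` are hypothetical names.

```python
def edge_mask(max_z, threshold):
    """True for tiles whose max Z differs from a 4-neighbour by more than threshold."""
    tiles_y, tiles_x = len(max_z), len(max_z[0])
    mask = [[False] * tiles_x for _ in range(tiles_y)]
    for y in range(tiles_y):
        for x in range(tiles_x):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < tiles_y and 0 <= nx < tiles_x
                        and abs(max_z[ny][nx] - max_z[y][x]) > threshold):
                    mask[y][x] = True
    return mask


def dilate_mask(mask, radius=1):
    """Grow the edge mask so tiles near an edge are also flagged."""
    tiles_y, tiles_x = len(mask), len(mask[0])
    return [[any(mask[yy][xx]
                 for yy in range(max(0, y - radius), min(tiles_y, y + radius + 1))
                 for xx in range(max(0, x - radius), min(tiles_x, x + radius + 1)))
             for x in range(tiles_x)]
            for y in range(tiles_y)]


def can_skip_voxel(voxel_near_depth, tile_max_z, near_edge):
    # Never skip close to a strong depth edge; elsewhere apply the usual max Z test.
    return not near_edge and voxel_near_depth > tile_max_z


max_z = [[1.0, 1.0, 1.0, 8.0, 8.0, 8.0]]
near_edge = dilate_mask(edge_mask(max_z, threshold=2.0))
print(near_edge)  # [[False, True, True, True, True, False]]
print(can_skip_voxel(3.0, max_z[0][0], near_edge[0][0]))  # True
print(can_skip_voxel(3.0, max_z[0][1], near_edge[0][1]))  # False
```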
The max Z generation can be optimized further and possibly shared with other passes (for example, if we move the low-res downsample of the depth buffer to compute). I plan to do this in the future.
Cheaper PCF for the directional light: the filtering we were using was already fairly low quality, and since we operate at low resolution and very often blur the result anyway, I changed it to still do PCF, but on the results of the samples from a single gather. This saves quite a bit of ALU and bandwidth while looking pretty much identical in all the use cases I found. I am sure a case can be found in which a slight difference is visible, but I couldn't find one :-)
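A minimal sketch of PCF on top of a single gather: the four shadow-map depths of the 2x2 footprint are each compared with the receiver depth, and the binary results are bilinearly weighted by the sample's fractional position. It assumes a conventional depth comparison (lit when the receiver is not farther than the stored depth) and a (top-left, top-right, bottom-left, bottom-right) tuple order chosen for readability; the hardware gather returns its four components in its own fixed order. `pcf_from_gather` is a hypothetical name.

```python
def pcf_from_gather(gathered, frac_x, frac_y, receiver_depth, bias=0.0):
    """Bilinear PCF computed from the four depths returned by a single gather.

    gathered: shadow-map depths ordered (top_left, top_right, bottom_left,
    bottom_right) for readability; hardware gather uses its own fixed order.
    frac_x, frac_y: sub-texel position of the lookup inside that 2x2 footprint.
    Returns the lit fraction in [0, 1].
    """
    # Depth test per texel: 1.0 = lit, 0.0 = shadowed (smaller depth = closer).
    tl, tr, bl, br = (1.0 if receiver_depth - bias <= d else 0.0 for d in gathered)
    # Bilinearly weight the four binary results.
    top = tl + (tr - tl) * frac_x
    bottom = bl + (br - bl) * frac_x
    return top + (bottom - top) * frac_y


# Occluder covers the top row of the footprint only.
print(pcf_from_gather((0.5, 0.5, 0.9, 0.9), 0.5, 0.25, receiver_depth=0.7))  # 0.25
print(pcf_from_gather((0.9, 0.9, 0.9, 0.9), 0.3, 0.8, receiver_depth=0.7))   # 1.0
```

Compared with issuing several separate comparison taps, this needs a single texture fetch per lookup, which is consistent with the ALU and bandwidth savings described above.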
"Pre-filter" light list: this is the rawest and most improvable optimization in this PR, but since light list generation is being reworked I did not want to spend too much time on it. I generate a list in LDS of the lights that actually affect the volumetrics, so that the data of irrelevant lights is loaded only once per thread. This is a very big win in scenes like the template, where a lot of punctual lights are set not to affect volumetrics. The current downside is that, to keep occupancy reasonable, at most 48 lights are considered per voxel. One could argue that with more than 48 lights you would be running really slowly anyway, but the limitation is easily removed by improving the pass, which I plan to do as soon as the new light culling is in.
As I said, this can be done better: we should probably generate the filtered list in a separate pass, and we could filter even more using the volumetric density together with the light intensity (a low-intensity light in low-density fog is essentially invisible). Removing the maximum light list size is also easy if we do it that way.
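A CPU sketch of the light list compaction described above, assuming a per-tile input list. On the GPU the compacted list lives in LDS and is built by the thread group; here it is a plain Python list. The cap of 48 comes from the PR; the `Light` record, `prefilter_lights`, and the optional intensity times density threshold (the follow-up idea) are hypothetical.

```python
from dataclasses import dataclass

MAX_VOLUMETRIC_LIGHTS = 48  # per-voxel limit quoted in the PR


@dataclass
class Light:  # hypothetical stand-in for the engine's light data
    index: int
    affects_volumetrics: bool
    intensity: float


def prefilter_lights(tile_lights, fog_density, min_contribution=None,
                     cap=MAX_VOLUMETRIC_LIGHTS):
    """Compact a tile's light list to the lights that can contribute to the fog.

    min_contribution (hypothetical) culls lights whose intensity * density is
    below the threshold; None disables that extra test.
    """
    kept = []
    for light in tile_lights:
        if not light.affects_volumetrics:
            continue
        if (min_contribution is not None
                and light.intensity * fog_density < min_contribution):
            continue
        if len(kept) == cap:
            break  # lights past the cap are dropped; this sketch keeps the first 48
        kept.append(light.index)
    return kept


lights = [Light(i, affects_volumetrics=(i % 3 != 0), intensity=float(i))
          for i in range(100)]
capped = prefilter_lights(lights, fog_density=1.0)
print(len(capped), capped[0], capped[-1])  # 48 1 71
bright = prefilter_lights(lights, fog_density=1.0, min_contribution=50.0)
print(len(bright), bright[0])              # 33 50
```

Once the list is compacted, the per-voxel loop only touches lights that can matter, which is where the win in scenes with many non-volumetric punctual lights comes from.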
Testing: the local fog tests pass (I tested before the latest merge of the base PR and will check again soon), but I'd like a thorough check from QA to verify there are no regressions in other kinds of scenes.