Divide large constant buffer into subsets and implement push constants for performance #855
…s for Vulkan and DX12 (cherry picked from commit 5320c57)
…ename helper arrays for clarity (cherry picked from commit 54f5ffb)
(cherry picked from commit 509b196)
… constants (cherry picked from commit 5a76607)
…default on) (cherry picked from commit 2063c72)
…fers / push constants (cherry picked from commit c1f712a)
…ts not enabled (cherry picked from commit ea6d698)
(cherry picked from commit ee3b6f9)
…h DX12 and avoid merge conflict
Now updated to your latest master with retro rendering modes and CRT post-processing. The merge was large given the number of changes to master, but fairly straightforward with respect to the new shaders. Fortunately they all fell into the

Here is the updated mapping spreadsheet: Binding to Shader Mapping v5.xlsx

At first I was not sure about the new modes, but now I really like the PSX + Newpixie CRT setting. Feels very 90s!
Yeah, the retro modes are something I have had in mind for a long time, because there are enough people who look into this engine for indie dev, and I personally grew up as a kid with the C64, CPC 6128 and Amiga 600.
Thanks for the heads up re your PSX branch and a new renderparm. Is there any way I could have a look at that to understand the implications for this branch and future subset handling?
Updated to be compatible with nvrhi + ShaderMake rebase.
Also set ShaderMake
@RobertBeckebans I have updated this branch/PR to be compatible with your new PSX vertex jitter and affine rendering modes. It was a bit more work than last time since the renderparms had changed for many shaders. I had to do a bit of refactoring to make as much as possible fit into Vulkan push constant limits: 128 bytes for AMD on Windows, 256 bytes for Nvidia on Windows/Linux plus AMD on Linux. macOS was not an issue since the push constant limit there is 4096 bytes, but smaller is still faster and helps with performance on Apple.

I have attached the new renderparm to shader mapping spreadsheet here for anyone who cares to see how it works: Binding to Shader Mapping v6.xlsx

I am wondering what to do with this branch going forward. I know you will likely not merge this into main, but I am putting it out there for other macOS, Linux, and Windows users who want a bit more performance out of Vulkan. Do you have the concept of optional components or branches (e.g. VR) that you may want to distribute in the future? Perhaps this is something that could fit into that kind of optionality framework, at least for a fixed release version.

In any case, great work on the PSX rendermode stuff! I find the PSX retro look combined with the New Pixie CRT filter to be pretty cool. Should be excellent as an engine for developing retro games with the original Tomb Raider look/style.

Here is a screen grab of PSX mode running on macOS Ventura x86 with an AMD 6600XT card at full speed 120 fps with this PR. You can see the affine warping very well on the floor tiles:
Ah well, I made a backup of your new shader table in my GDrive, and it is interesting what kind of an impact it has to just reference a few more renderparms besides the new rpPSXDistortions. I'm glad that you could track the necessary changes. I would tend to keep your changes in a macOS branch.

As you saw, I also fixed a math problem with the SSAO shader. More precisely, the reconstruction of a world position vector from any non-linear depth value, which is usually necessary for all kinds of screen space related raytracing / marching effects. Therefore I would like to give #498 a try before freezing RBDoom 1.6, and that feature will not only add a new shader but will also require changes in the gbuffer shader and the indirect lighting / lightgrid shaders. However, I will give that feature a week, and if that does not work out then it will be postponed to 1.7. I also would like to update NVRHI & ShaderMake again to have the latest bugfixes available.
This is mainly due to the heavy restriction on Vulkan push constant sizes in Windows and linux: 128 bytes for some implementations and 256 for others. That's not a lot of data when you are doing innovative shader calculations that require many float parameters, matrices, etc. It's a struggle to fit the parameters into these small limits.
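To make the size pressure concrete, here is a minimal sketch of a budget check against the limits quoted in this thread. The struct names and field layouts are hypothetical and are not taken from this PR; in a real Vulkan renderer the device limit would be read from `VkPhysicalDeviceLimits::maxPushConstantsSize` rather than hard-coded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration only; not the PR's actual renderparm sets.
struct Float4 { float x, y, z, w; };          // 16 bytes

struct ExampleSmallSubset                     // 96 bytes total
{
    Float4 mvpMatrix[4];                      // 64 bytes
    Float4 colorScale;                        // 16 bytes
    Float4 texCoordOffset;                    // 16 bytes
};

struct ExampleLargeSubset                     // 192 bytes total
{
    Float4 mvpMatrix[4];
    Float4 modelMatrix[4];
    Float4 textureMatrix[4];
};

// Push constant limits quoted in the discussion, in bytes.
constexpr std::uint32_t kLimitAmdWindows       = 128;
constexpr std::uint32_t kLimitNvidiaOrAmdLinux = 256;
constexpr std::uint32_t kLimitMacOS            = 4096;

// True when a subset of the given size fits the device's push constant budget.
constexpr bool FitsInPushConstants( std::size_t subsetBytes, std::uint32_t deviceLimit )
{
    return subsetBytes <= deviceLimit;
}

// Bytes still free after the subset; only meaningful when the subset fits.
inline std::uint32_t RemainingBytes( std::size_t subsetBytes, std::uint32_t deviceLimit )
{
    assert( FitsInPushConstants( subsetBytes, deviceLimit ) );
    return deviceLimit - static_cast<std::uint32_t>( subsetBytes );
}

// Catch accidental padding at compile time.
static_assert( sizeof( ExampleSmallSubset ) == 96, "unexpected padding" );
static_assert( sizeof( ExampleLargeSubset ) == 192, "unexpected padding" );
```

With these illustrative layouts, three 4x4 matrices already exceed the 128-byte limit, while a single matrix plus two float4 parameters leaves only 32 bytes of headroom there.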
Interesting thought. Up to now I have been thinking about this problem as a general performance solution for Vulkan across all platforms. If I were to look at a solution for macOS only, I might approach it differently since push constant limits are not a concern on that platform, and juggling to fit things into 128 or 256 bytes would not be necessary. Size optimization is still useful there, but eliminating these low threshold limits would relax the need to worry about every single shader parm addition when doing new work. In addition, if I were solving this for macOS only, the number of render parm sets could be reduced or eliminated and the general-purpose code for dynamic activation of push constant sets at runtime could be simplified. If I were to come up with a new macOS-only design for this, would you be interested?
This PR replaces the performance part of #818, which will be closed and not merged.
It has one dependency on nvrhi changes: RobertBeckebans/nvrhi#6 for relaxing nvrhi push constant limits to permit platform-specific runtime checks. (UPDATE: dependency now merged into nvrhi)

This fixes the performance part of #763.
Details are as follows:
BINDING_LAYOUT_GBUFFER, BINDING_LAYOUT_GBUFFER_SKINNED, BINDING_LAYOUT_TEXTURE, BINDING_LAYOUT_TEXTURE_SKINNED, BINDING_LAYOUT_WOBBLESKY, BINDING_LAYOUT_SSGI, BINDING_LAYOUT_SSGI_SKINNED, BINDING_LAYOUT_POST_PROCESS
).
r_useDX12PushConstants cvar which is turned off by default. This can optionally be turned on using autoexec.cfg for experimentation.
r_useVulkanPushConstants (default on) which is useful for performance comparisons.

Tested on Windows 10 (AMD and Nvidia), Linux Manjaro, and macOS Ventura 13.5
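As a rough illustration of how a cvar toggle and a device limit could combine at runtime, the sketch below picks an upload path per renderparm subset. This is an assumption about the general shape of such logic, not the code in this PR: `PushConstantSettings`, `ParmUploadPath`, and `ChooseUploadPath` are hypothetical names, and `enabledByCvar` merely stands in for a cvar such as r_useVulkanPushConstants or r_useDX12PushConstants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: the two ways a renderparm subset could reach the shader.
enum class ParmUploadPath
{
    PushConstants,
    ConstantBuffer
};

// Hypothetical: snapshot of the toggle and the device limit for one backend.
struct PushConstantSettings
{
    bool          enabledByCvar;     // stand-in for a cvar value
    std::uint32_t deviceLimitBytes;  // e.g. 128, 256 or 4096 as quoted in this thread
};

// Use push constants only when they are enabled and the subset fits;
// otherwise fall back to the (subsetted) constant buffer.
inline ParmUploadPath ChooseUploadPath( const PushConstantSettings& settings, std::size_t subsetBytes )
{
    assert( subsetBytes > 0 );
    if( settings.enabledByCvar && subsetBytes <= settings.deviceLimitBytes )
    {
        return ParmUploadPath::PushConstants;
    }
    return ParmUploadPath::ConstantBuffer;
}
```

The fallback keeps every shader working when the toggle is off or a subset is too large for the device, which is also what makes before/after comparisons with the cvar possible.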
Performance timings for this PR vs. current master, generated using a simple home-made timedemo:
Windows Nvidia System (1070 Ti)
DX12: 263 fps before, 360 fps after (with r_useDX12PushConstants = 0) --> significant improvement
Vulkan: 218 fps before, 333 fps after --> significant improvement
Windows AMD System (6600 XT)
DX12: 295 fps before, 305 fps after (with r_useDX12PushConstants = 0) --> neutral/positive improvement
Vulkan: 150 fps before, 160 fps after --> neutral/positive improvement
Linux AMD System (6600 XT)
Vulkan: 150 fps before, 270 fps after --> large improvement
macOS AMD System (6600 XT)
Vulkan: 77 fps before, 245 fps after --> very large improvement
macOS Apple Silicon System (M1 Air)
Vulkan: 6 fps before, 85 fps after --> massive improvement
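For anyone who wants to compare the timings above on a common scale, here is a small sketch that converts before/after frame rates into a speedup factor and a frame time. The helper names are hypothetical; the inputs are the fps values listed above.

```cpp
#include <cassert>

// Ratio of frame rates; 1.0 means no change, 2.0 means twice as fast.
inline double SpeedupFactor( double fpsBefore, double fpsAfter )
{
    assert( fpsBefore > 0.0 );
    return fpsAfter / fpsBefore;
}

// Average time per frame in milliseconds for a given frame rate.
inline double FrameTimeMs( double fps )
{
    assert( fps > 0.0 );
    return 1000.0 / fps;
}
```

For example, `SpeedupFactor( 6.0, 85.0 )` is roughly 14.2 for the M1 Air, `SpeedupFactor( 77.0, 245.0 )` is roughly 3.2 for macOS on the 6600 XT, and `SpeedupFactor( 150.0, 160.0 )` is roughly 1.07 for Windows Vulkan on the 6600 XT.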
See on-screen HUD statistics (FPS, GPU Memory, CPU/GPU Relative Usage % for com_fixedTic = 1) in the following screenshots showing the independent impact of: a) uniforms buffer subsetting, and b) push constants.

Notes re test setup:
macOS Vulkan: Baseline using current master + PR #854 but without this PR:
macOS Vulkan: Impact of uniforms buffer subsetting with push constants disabled:
macOS Vulkan: Impact of uniforms buffer subsetting with push constants enabled:
Linux Vulkan: Baseline using current master + PR #854 but without this PR:
Linux Vulkan: Impact of uniforms buffer subsetting with push constants disabled:
Linux Vulkan: Impact of uniforms buffer subsetting with push constants enabled:
Windows Vulkan: Baseline using current master + PR #854 but without this PR:
Windows Vulkan: Impact of uniforms buffer subsetting with push constants disabled:
Windows Vulkan: Impact of uniforms buffer subsetting with push constants enabled:
Windows DX12: Baseline using current master + PR #854 but without this PR:
Windows DX12: Impact of uniforms buffer subsetting with push constants disabled:
Windows DX12: Impact of uniforms buffer subsetting with push constants enabled: