Bloom causes massive performance hit (>50% framerate reduction) #13109
Comments
RenderDoc is not really a great GPU profiling tool. You'll want to use NSight/RGP/Xcode/GPA/etc (Nvidia, AMD, Apple, Intel) depending on your GPU manufacturer. CPU metrics are also important for rendering - maybe recording so many render passes is expensive. Tracy will show you that. I can take a look later this week and figure out why it's expensive. |
Thanks for the tip. Will definitely check them out and even give Tracy a go next time I do some profiling. The performance hit happens without profiling and can be seen in the first two bloom on/off pictures. Those pictures were taken without profiling; there I am just using Bevy's frame diagnostics to track the frames per second. Sorry for not clarifying that when using those pictures to explain the issue. In this case, all the profiling does is confirm that the >50% reduction is in fact coming from the bloom pass. The bloom pass taking up just over 50% of the render time in the profiler lines up 1:1 with the framerate reduction seen when not profiling. Currently I'm looking into the first two downsampling and last two upsampling passes in the code to see if I can get more information or optimize anything. Please also ignore me opening and closing the issue. Misclick! |
I reduced bloom.wgsl to its simplest form and only noticed a negligible increase in performance with bevy frame diagnostics and the profiler (somewhere between 5-10 fps improvement?). Here is the reduced code:
|
After moving on to the render code, my only clue so far has been this: when I divide the mip dimensions in bloom/mod.rs to reduce them, I get a good portion of the frames back. Obviously this isn't a solution or anything. Just sharing what I found before throwing my hands in the air for the day. |
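To illustrate why shrinking the mip dimensions would win frames back, here is a rough sketch of how much work a mip chain represents. This is not Bevy's code from bloom/mod.rs: the function names `mip_chain` and `total_texels` are made up for illustration, and it assumes a chain where each level halves both dimensions. Under that assumption, halving the top mip size cuts the total texel count (and roughly the fill cost of every downsample and upsample pass) by about 4x.

```rust
/// Sizes of each level in a mip chain that halves both dimensions per level.
/// Hypothetical helper for illustration; not taken from Bevy.
fn mip_chain(width: u32, height: u32, levels: u32) -> Vec<(u32, u32)> {
    (0..levels)
        .map(|i| {
            // checked_shr avoids a shift overflow for very large level counts;
            // every level is clamped to at least 1x1.
            let w = width.checked_shr(i).unwrap_or(0).max(1);
            let h = height.checked_shr(i).unwrap_or(0).max(1);
            (w, h)
        })
        .collect()
}

/// Total number of texels across all levels of a chain.
fn total_texels(chain: &[(u32, u32)]) -> u64 {
    chain.iter().map(|&(w, h)| u64::from(w) * u64::from(h)).sum()
}
```

For example, comparing `total_texels(&mip_chain(512, 256, 3))` with `total_texels(&mip_chain(256, 128, 3))` gives 172032 versus 43008 texels, a factor of four. The sizes here are arbitrary, not the ones Bevy picks.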
On my system, at 1080p, it takes 355 microseconds (0.355 ms) on the CPU (according to Tracy) to encode rendering commands for bloom, and 0.20 ms of GPU time (according to NSight) to execute those commands.
|
I'm probably a bit newer to graphics programming than some of you @JMS55. Can you go into detail about what that means for you? Are you not seeing a huge hit to your framerate using that Nvidia card with bloom enabled? Here are my AMD profiler results (1200p 16:10) Bloom/hdr/tonemapping on (250fps): Here is off (700fps): If you aren't seeing the same issue, perhaps it's just another AMD "feature" and can be marked as a driver bug for now? |
Yes I don't see much of a slowdown between bloom on/off. Just to check are you only toggling bloom? You're leaving tonemapping and Camera::hdr the same between runs? |
I'm turning all three on, yes, but tonemapping itself can be on/off without a difference. I'll provide the code just to fully clarify. 250fps (hdr on w/ bloom on, tonemapping optional):
700-750+fps (nothing on):
500fps (hdr on w/ bloom off):
And this happens in every case I've tried running it: different versions, fresh local Bevy deps, Bevy's own examples, etc. |
Could you use frame times instead of frame rates? 700 fps: 1.43ms |
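For anyone following along, the conversion is just 1000 / fps. A minimal sketch (the helper names `frame_time_ms` and `added_cost_ms` are made up for this example) that also shows why frame times are easier to reason about: costs add up in milliseconds, but not in fps.

```rust
/// Convert a frame rate (frames per second) to a frame time in milliseconds.
fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Extra frame time in milliseconds attributable to a feature, given the
/// frame rate measured without it and with it.
fn added_cost_ms(fps_without: f64, fps_with: f64) -> f64 {
    frame_time_ms(fps_with) - frame_time_ms(fps_without)
}
```

Applied to the numbers reported in this thread: 700 fps is about 1.43 ms, 500 fps is 2 ms, and 250 fps is 4 ms. So `added_cost_ms(700.0, 500.0)` puts HDR alone at roughly 0.57 ms per frame, and `added_cost_ms(500.0, 250.0)` puts bloom at 2 ms per frame on that machine.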
@superdump sure np i'll make sure to convert to ms whenever I can moving forward. Sorry about that. |
It might also help to note that this occurs even without any actual bloom in the scene. Blank screens get the same hit when changing the camera settings. |
Just to confirm what @JMS55 was saying, I tested it on an older Nvidia machine with a different display (1050 ti, 1080p display) also using Vulkan and the performance loss was a tad bit less, but for me it was still around 25-30% on average. I'd be interested in seeing benchmarks on different resolutions. For now I'm just going to take the node out, as I don't really need it in my graphics pipeline atm anyways. If there's anything else I can share that will help, just let me know. |
I also get this 50% performance hit on bloom_2d.
|
I've also noticed this with simple scenes. It should be possible to rewrite bloom to use a compute shader for down- and upscaling like SPD. |
OS = "Win11"
Bevy version = "0.14.0-dev" && "0.13.2"
AdapterInfo { name: "AMD Radeon RX 7700S", vendor: 4098, device: 29824, device_type: DiscreteGpu, driver: "AMD proprietary driver", driver_info: "24.3.1 (LLPC)", backend: Vulkan }
[profile.dev.package."*"]
opt-level = 3
[profile.dev]
opt-level = 3
[profile.release]
lto = true
opt-level = 3
codegen-units = 1
incremental = false
debug = false
What you did
Enabled bloom
What went wrong
Enabling bloom gives a massive hit to performance in both debug and release. In these examples I'll use debug (as nothing changed between them in RenderDoc except overall framerate).
In other game engines I've never personally experienced a hit this large from bloom. This was tested with both a personal example and the bloom_2d example in 0.14, as well as a simple bloom setup in 0.13.2. I tested bloom in Unity with a similar scene for a quick comparison and the performance impact in Unity is around 10% give or take - which iirc from using other engines is about the standard hit.
I did a quick search of the Bevy issues for anything related to bloom and didn't see any mention of a performance profile of the bloom pass. With that in mind I figured opening an issue would be best. If I missed an open issue relating to this, please point me in the right direction, as I plan to explore the bloom pass code more in the future.
Bloom off:
Bloom on:
RenderDoc info
Full time:
Bloom pass:
Conclusion:
Based on this information from RenderDoc - it would seem that the bloom pass is heavier on rendering than everything else combined. Bloom takes up more than 50% of the render time when enabled.