GS/Vulkan: Use attachment clear for ONE stencil #10872
66% faster in Persona 3 in DATE-heavy scenes.
We manually clear the drawn region when it's needed; in all other cases it's pre-filled by the setup. Therefore, the two load actions should be preserve and don't care.
Some benchmarks on the big changes at 8x internal resolution: AFL 2007 went from 61 fps to 158 fps.
Tested Scarface at native resolution and at 8x. At 8x, I couldn't tell any difference without benchmarking software, as both builds fell within the 75 to 82 FPS range. The PR build started off faster on average, but it fell a bit, and at this point I'm attributing that to noise. At native resolution, however, the main build ran at about 950 VFPS while the PR build dipped to about 650 VFPS, and I don't know why. Both consistently hovered around those values, so the PR is roughly 300 VFPS slower there.
Scarface - The World is Yours_SLUS-21111_20240301055150.zip
This is the dump I made for testing.
Update: I forgot that I'd already tested an AppImage previously, so my graphics settings were non-default for the PR build. VFPS for Scarface now seems to be most concentrated around 83–85 on the PR, versus around 79 on main. Basically, the range for main seems to be 75 to 81, whereas here it seems to be 79 to 86. There may or may not be a positive delta at native resolution, but it's certainly not a drop. Sorry about that. TL;DR: Small but noticeable improvement for Scarface at 8x.
Hi, is there any possibility of speed improvements in Mortal Kombat: Shaolin Monks, Onimusha: Dawn of Dreams, Nano Breaker and Shadow of the Colossus?
Please do not hijack issues with requests. If we find something to make things faster, we'll do it; otherwise, have a go yourself.
Hello, I'm not hijacking this thread. I only asked because I saw Jordan's post about speed improvements and got curious whether it could improve other games. Also, I don't know how to compile a build. I'm just a common user.
Okay, I mean if you want to check those games, then I suggest you grab the PR build and give it a go. We can only test the games we have, but it's unlikely to help those games. That said, we have GS dumps of most of those games (I think all except Nano Breaker), so if they weren't listed above, it's unlikely the change made much difference.
All those games run fine and, AFAIK, don't have excessive statistics, so there's no reason to investigate them, unlike the original Persona 3 issue that prompted this change in the first place.
Description of Changes
While investigating a Persona 3 dump, I noticed that we were fully clearing the buffer when initializing the stencil for first-write-wins (ONE). This took about 20 µs of GPU time at 8x upscaling on my GPU.
Instead, since we only care about a tiny region of the framebuffer, we can use an attachment clear (which I'm guessing will get lowered to a draw that sets stencil in the driver), and only write the region that we actually need to load. This reduces the cost of the draw by approximately 85%.
Overall, this increases performance by approximately 66% in the Persona 3 dump at 16x upscaling.
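To make the region-limited clear concrete, here is a minimal sketch of how the clear rectangle could be derived from a draw area. It is not the PR's code: `Rect2D`, `ScaledClearRect` and `PixelCount` are hypothetical names, and `Rect2D` is a stand-in for `VkRect2D` so the example compiles without the Vulkan headers. In a real Vulkan backend, the resulting rectangle would presumably be placed in a `VkClearRect` and passed to `vkCmdClearAttachments` together with a `VkClearAttachment` whose `aspectMask` is `VK_IMAGE_ASPECT_STENCIL_BIT`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for VkRect2D (offset + extent); not a PCSX2 type.
struct Rect2D
{
    int32_t x, y;
    uint32_t width, height;
};

// Scales a native-resolution draw area by the upscale multiplier and clamps
// it to the render target, giving the only region the stencil clear needs
// to touch. All parameter names are assumptions made for this sketch.
inline Rect2D ScaledClearRect(const Rect2D& draw_area, uint32_t scale,
                              uint32_t target_width, uint32_t target_height)
{
    assert(scale > 0);

    // Clamp a scaled coordinate into [0, limit] using 64-bit math to avoid overflow.
    const auto clamp_to = [](int64_t v, uint32_t limit) {
        return std::max<int64_t>(0, std::min<int64_t>(v, limit));
    };

    const int64_t x0 = clamp_to(int64_t(draw_area.x) * scale, target_width);
    const int64_t y0 = clamp_to(int64_t(draw_area.y) * scale, target_height);
    const int64_t x1 = clamp_to((int64_t(draw_area.x) + draw_area.width) * scale, target_width);
    const int64_t y1 = clamp_to((int64_t(draw_area.y) + draw_area.height) * scale, target_height);

    return Rect2D{int32_t(x0), int32_t(y0), uint32_t(x1 - x0), uint32_t(y1 - y0)};
}

// Number of pixels covered by a rectangle, for comparing clear sizes.
inline uint64_t PixelCount(const Rect2D& r)
{
    return uint64_t(r.width) * r.height;
}
```

As an illustration with made-up numbers: a 64x32 native draw area at 8x scaling becomes a 512x256 clear, which covers well under 1% of a 5120x3584 target, whereas a full clear touches every pixel of it.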
Rationale behind Changes
Lots of render pass reductions. The main ones:
Suggested Testing Steps
Runner says it's okay. But @JordanTheToaster pls do some performance measurements.