GS/Vulkan: Use attachment clear for ONE stencil #10872

Merged: 2 commits, Mar 2, 2024

stenzek
Member

@stenzek stenzek commented Mar 1, 2024

Description of Changes

While investigating a Persona 3 dump, I noticed that we were fully clearing the buffer when initializing the stencil for first-write-wins (ONE). This was about 20us of GPU time at 8x upscaling on my GPU, as seen below:
[image: GPU timing capture before the change]

Instead, since we only care about a tiny region of the framebuffer, we can use an attachment clear (which I'm guessing will get lowered to a draw that sets stencil in the driver), and only write the region that we actually need to load. This reduces the cost of the draw by approximately 85%.
[image: GPU timing capture after the change]
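As a rough illustration of why clearing only the needed region is cheaper, here is a minimal sketch of computing the upscaled clear rectangle. The `Rect` type and the `ScaledClearRect`, `Clamp` and `PixelCount` helpers are hypothetical names for this example, not code from the PR. With the real API, the rectangle would presumably go into a `VkClearRect` and be passed to `vkCmdClearAttachments` together with a `VkClearAttachment` whose `aspectMask` is `VK_IMAGE_ASPECT_STENCIL_BIT`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for VkRect2D so the sketch builds without the Vulkan headers.
struct Rect
{
    int32_t x, y;
    uint32_t width, height;
};

// Clamps v to the inclusive range [lo, hi].
static int64_t Clamp(int64_t v, int64_t lo, int64_t hi)
{
    return std::max(lo, std::min(v, hi));
}

// Scales a native-resolution draw rectangle by the upscale factor and clamps it
// to the render target, giving the only region the stencil clear has to touch.
Rect ScaledClearRect(const Rect& draw, uint32_t scale, uint32_t target_w, uint32_t target_h)
{
    assert(scale > 0);
    const int64_t x0 = Clamp(int64_t(draw.x) * scale, 0, target_w);
    const int64_t y0 = Clamp(int64_t(draw.y) * scale, 0, target_h);
    const int64_t x1 = Clamp((int64_t(draw.x) + draw.width) * scale, 0, target_w);
    const int64_t y1 = Clamp((int64_t(draw.y) + draw.height) * scale, 0, target_h);
    return Rect{int32_t(x0), int32_t(y0), uint32_t(x1 - x0), uint32_t(y1 - y0)};
}

// Number of pixels a clear of this rectangle would write.
uint64_t PixelCount(const Rect& r)
{
    return uint64_t(r.width) * r.height;
}
```

For a hypothetical 64x32 native region at 8x, this gives a 512x256 rectangle, which is a small fraction of a 5120x3584 target. The sizes here are made up for illustration; the PR does not state the actual region dimensions.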

Overall, this increases performance by approximately 66% in the Persona 3 dump at 16x upscaling.

Rationale behind Changes

Lots of render pass reductions. The main ones:

AFL Premiership 2007_SCES-54639 ['Render Passes: -809 [829=>20]']
Dragon_Ball_Z_Budokai_Tenkaichi_3_SLES-54945-splitscreen ['Render Passes: -12 [58=>46]']
EA_Sports_Rugby_2004_SLES-51732_20220817213743 ['Render Passes: -869 [876=>7]']
Gran Turismo 4_SCUS-97328_20221122171141 ['Render Passes: -28 [49=>21]']
gs_20220409061650_Alpine Racer 3 _PAL-M5__SCES-50887 ['Render Passes: -56 [67=>11]']
Persona_3_FES_fps_drop ['Render Passes: -95 [514=>419]']
Sega Superstars Tennis_SLES-54946_20230113102008 ['Render Passes: -217 [235=>18]']
Stuntman_SLUS-20250_20230326220301 ['Render Passes: -1748 [1766=>18]']

Suggested Testing Steps

Runner says it's okay. But @JordanTheToaster pls do some performance measurements.

66% faster in Persona 3 in DATE-heavy scenes.
We manually clear the drawn region when it's needed, in all other cases
it's pre-filled with the setup.

Therefore, the two load actions should be preserve and don't care.
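One way to read the commit message above, sketched in code: the stencil attachment never needs a clear-on-load, because it is either already filled by the setup draw (so it must be preserved) or about to be cleared manually in the drawn region (so its previous contents don't matter). The `LoadAction` enum and `StencilLoadAction` function are hypothetical names for this sketch. The mapping of each case to a load action is an interpretation of the message, not something the PR spells out. In Vulkan terms the two actions would correspond to `VK_ATTACHMENT_LOAD_OP_LOAD` and `VK_ATTACHMENT_LOAD_OP_DONT_CARE`.

```cpp
#include <cstdint>

// Hypothetical mirror of the two load actions named in the commit message.
// Assumed Vulkan equivalents: VK_ATTACHMENT_LOAD_OP_LOAD and
// VK_ATTACHMENT_LOAD_OP_DONT_CARE. A clear-on-load action is deliberately absent.
enum class LoadAction : uint8_t
{
    Preserve,
    DontCare,
};

// Picks the stencil load action for a render pass (assumed reading of the commit):
// - stencil already written by the setup draw: keep it.
// - otherwise the drawn region is cleared manually, so old contents are irrelevant.
LoadAction StencilLoadAction(bool stencil_prefilled_by_setup)
{
    return stencil_prefilled_by_setup ? LoadAction::Preserve : LoadAction::DontCare;
}
```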
@JordanTheToaster
Contributor

Some benchmarks for the games with the biggest changes, at 8x internal resolution.

AFL 2007: 61 fps to 158 fps
EA Sports Rugby 2004: 34 fps to 103 fps
DBZ BT3: 110 fps to 110 fps (Readback limited)
GT4: 160 fps to 177 fps
Sega Superstars Tennis: 128 fps to 271 fps
Stuntman: 34 fps to 105 fps
Persona 3 FES: 287 fps to 370 fps

@TheTechnician27
Contributor

Tested Scarface on native and on 8x. For 8x, I couldn't tell any sort of difference without some sort of benchmarking software, as they both fell within the 75 to 82 FPS range. The PR started off faster on average, but it fell a bit, and at this point I'm attributing that to noise.

However, for the native testing, I started with 950 VFPS on the main build, while the PR build dipped the VFPS to 650, and I don't know why. Both consistently hovered around that 950/650 range, so there's about a -300 FPS delta.

Scarface - The World is Yours_SLUS-21111_20240301055150.zip

This is the dump I made for testing.

@TheTechnician27
Contributor

TheTechnician27 commented Mar 1, 2024

Update: I forgot that I'd already tested an AppImage previously and thus I had graphics settings non-default for the PR. VFPS now seems to be most concentrated around 83–85 for Scarface on PR, while it's more around 79 on main. Basically, the range for main seems to be 75 to 81, whereas here it seems to be 79 to 86. May or may not have some positive delta on native resolution, but certainly it's not a drop. Sorry about that.

TL;DR: Small but noticeable improvement for Scarface at 8x.

@Uzarkis

Uzarkis commented Mar 1, 2024

Hi, is there any possibility of speed improvements in Mortal Kombat: Shaolin Monks, Onimusha: Dawn of Dreams, Nano Breaker and Shadow of the Colossus?

@refractionpcsx2
Member

Hi, is there any possibility of speed improvements in Mortal Kombat: Shaolin Monks, Onimusha: Dawn of Dreams, Nano Breaker and Shadow of the Colossus?

Please do not hijack issues with requests. If we find something to make things faster, we'll do it, otherwise have a go yourself.

@Uzarkis

Uzarkis commented Mar 1, 2024

Please do not hijack issues with requests. If we find something to make things faster, we'll do it, otherwise have a go yourself.

Hello, I'm not hijacking this thread. I only asked because I saw Jordan's post about speed improvements and got curious whether it could improve other games. Also, I don't know how to compile a build. I'm just a common user.

@refractionpcsx2
Member

refractionpcsx2 commented Mar 1, 2024

Okay, I mean if you want to check those games, then I suggest you grab the PR build and give it a go. We can only test the games we have, but it's unlikely it's going to help those games.

That said we have GS dumps of most of those games (I think all except nano breaker), so if they weren't listed above, it's unlikely they made much difference.

@stenzek
Member Author

stenzek commented Mar 2, 2024

All those games run fine, and AFAIK don't have excessive statistics, so no reason to investigate them. Unlike the original Persona 3 issue that prompted this change in the first place.

@stenzek stenzek merged commit 875fdc4 into PCSX2:master Mar 2, 2024
12 checks passed
@stenzek stenzek deleted the vk branch March 2, 2024 02:42