rsx/vk: Optimizations #10938

Merged: kd-11 merged 15 commits into RPCS3:master from final-descriptors-rebased on Sep 28, 2021
@kd-11 kd-11 commented Sep 27, 2021

Applies some optimizations to RSX (shared) and Vulkan.

  • Refactors some large RSX headers to make them easier to work with. They became a problem while implementing this task, so I took the opportunity to handle it here.
  • Avoids processing unnecessary vertex streams for draw batches. Some titles call this method several tens of thousands of times per frame, so even small savings add up.
  • Avoids calling get_system_time() when the actual time or time difference is not needed. rsx::get_shared_tag() already provides a monotonic counter for determining whether events occurred earlier or later relative to each other, and it is much cheaper to use.
  • Avoids using std::this_thread::get_id(), which has an unexpectedly high cost on some platforms. thread_ctrl provides most of what we need, using TLS to determine thread identity.
  • Batches descriptor allocation requests instead of making one allocation request per draw call. Up to 64 descriptors are now allocated in one go; allocating all 64 costs virtually the same as allocating one.
  • Enables use of the VK_EXT_descriptor_indexing extension to update bound descriptors in a deferred manner. Updating thousands of descriptors in one go is orders of magnitude faster than updating each descriptor individually. Gains here are limited by hardware and drivers: batch sizes are determined by descriptor usage and by the update-after-bind support of the hardware/driver combination.
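
For readers unfamiliar with the tag-instead-of-timestamp idea from the list above, here is a minimal sketch of how a monotonic counter can order events without reading a clock. It is not the RPCS3 implementation: the `sketch` namespace, `next_tag()` and the `event` struct are hypothetical names chosen for illustration, and the only thing taken from the description is that a shared monotonic counter is enough to tell which event came first.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a shared monotonic tag source (illustration only).
namespace sketch
{
	// Returns a new tag that is larger than every tag handed out before it.
	// This is a single atomic increment; no system clock is queried.
	inline std::uint64_t next_tag()
	{
		static std::atomic<std::uint64_t> counter{0};
		return counter.fetch_add(1, std::memory_order_relaxed) + 1;
	}

	// An event only needs relative ordering, so it stores a tag, not a timestamp.
	struct event
	{
		std::uint64_t tag = 0;

		// Record that the event happened "now" in tag order.
		void touch() { tag = next_tag(); }

		// True if this event was touched before the other one.
		bool happened_before(const event& other) const { return tag < other.tag; }
	};
}
```

The trade-off is that a tag cannot tell how much time passed between two events, only their order, which matches the case described above where the actual time or time difference is not needed.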
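
The descriptor batching described in the list above can be sketched roughly as follows. This is a simplified model and not the code from this PR: `descriptor_handle` stands in for `VkDescriptorSet`, and the `backend_fn` callback stands in for a single `vkAllocateDescriptorSets` call whose `descriptorSetCount` covers the whole batch. The class name and members are assumptions made for illustration; only the batch size of 64 comes from the description.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

namespace sketch
{
	// Stand-in for VkDescriptorSet (assumption for illustration).
	using descriptor_handle = std::uint64_t;

	// Batch size taken from the PR description.
	constexpr std::size_t batch_size = 64;

	class batched_descriptor_allocator
	{
	public:
		// Fills `count` handles in one request. With real Vulkan this role
		// would be played by one vkAllocateDescriptorSets call.
		using backend_fn = std::function<void(descriptor_handle* out, std::size_t count)>;

		explicit batched_descriptor_allocator(backend_fn backend)
			: m_backend(std::move(backend))
		{
		}

		// Hands out one descriptor per draw call, refilling the local cache
		// with a single backend request only when it runs dry.
		descriptor_handle allocate()
		{
			if (m_next == m_available)
			{
				m_backend(m_cache.data(), batch_size);
				m_available = batch_size;
				m_next = 0;
				++m_backend_calls;
			}

			return m_cache[m_next++];
		}

		// Number of (expensive) backend requests made so far.
		std::size_t backend_calls() const { return m_backend_calls; }

	private:
		backend_fn m_backend;
		std::array<descriptor_handle, batch_size> m_cache{};
		std::size_t m_next = 0;
		std::size_t m_available = 0;
		std::size_t m_backend_calls = 0;
	};
}
```

With this shape, 64 consecutive draw calls cost one backend request instead of 64. A real implementation would also need to handle pool exhaustion and descriptor lifetime, which this sketch leaves out.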

With this set of changes, up to 30% more performance can be observed in purely RSX-constrained situations. The optimizations speed up command recording, not command execution. As long as command execution takes longer than command recording (high-end CPU paired with a low-end GPU), only a minor speedup will be observed. I still have more optimizations in the works, but I decided to release them in smaller batches for ease of integration.
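
To make the recording-versus-execution point concrete, here is a toy model. It assumes, as a simplification that is not stated in the PR, that CPU command recording and GPU command execution overlap fully, so the slower of the two sets the frame time. The function names are hypothetical, and any numbers fed into it are illustrative rather than measurements.

```cpp
#include <algorithm>

namespace sketch
{
	// Assumption: recording and execution are fully pipelined, so the
	// frame time is set by whichever stage is slower.
	inline double frame_time_ms(double record_ms, double execute_ms)
	{
		return std::max(record_ms, execute_ms);
	}

	// Speedup factor from making recording faster while execution is unchanged.
	inline double speedup(double record_before_ms, double record_after_ms, double execute_ms)
	{
		return frame_time_ms(record_before_ms, execute_ms) /
		       frame_time_ms(record_after_ms, execute_ms);
	}
}
```

Under this model, cutting recording from 20 ms to 15 ms gives a clear gain when execution takes 10 ms, and no gain at all when execution takes 25 ms, because the GPU remains the bottleneck in the second case.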

NOTE: The structure of the shader cache itself has not changed, but some fields have been removed internally, which will trigger a shader cache rebuild. More drastic changes are expected on that front, so I refrained from bumping the cache version for now.

@kd-11 kd-11 added the RSX, Render: Vulkan, and Optimization (Optimizes existing code) labels Sep 27, 2021
@@ -1991,6 +1991,21 @@
<Filter>Emu</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<None Include="Emu\RSX\Common\Interpreter\FragmentInterpreter.glsl">
<ClInclude Include="Emu\Audio\audio_device_listener.h">
Contributor

I think this is already present in the filters

Contributor Author

Indeed, this is likely an issue with msvc editing the files + rebasing the branch. Will fix.

@DefaltBR

Improvements indeed! Uncharted 2 was the perfect test for me, because it was, somehow, GPU intensive when I used any resolution scale above 100% (5600x & RTX 2060 Super)... And I saw improvements right from the main menu! TLoU also gained 1~2 fps...

PR
UC2_PR

Master
UC2_Master

PR.log.gz
Master.log.gz

@MsDarkLow
Contributor

MsDarkLow commented Sep 27, 2021

8700k @ 4.8GHz
EDIT: Since I had the debug overlay enabled, the fps values are slightly lower! I'll update the values without the debug overlay when I have time.

Yakuza Kenzan - Test: 36.2 fps -> 44.8 fps
Yakuza Dead Souls - Test 1: 73.5 fps -> 77.5 fps
Yakuza Dead Souls - Test 2: 88.7 fps -> 91.0 fps
Yakuza 3 - Test: 17.3 fps -> 22.8 fps (Over 10k drawcalls!) | Without debug overlay: 18.9 fps -> 24.4 fps
Yakuza Ishin spots are harder to judge since there is a lot of variance per boot, but they may still be worth noting.
Yakuza Ishin - Test 1: 159.4 fps -> 168.6 fps
Yakuza Ishin - Test 2: 31.0 fps -> 32.9 fps

Yakuza Kenzan

Master
KenzanMaster

PR
KenzanPR

Yakuza Dead Souls

Master
DS2Master
DS1Master

PR
DS2PR
DS1PR

Yakuza Ishin

Master
Ishin1Master
Ishin2Master

PR
Ishin1PR
Ishin2PR

Yakuza 3

Master
Yakuza3Master

PR
Yakuza3PR

@cipherxof
Contributor

cipherxof commented Sep 27, 2021

CPU: 3700x

Master:
master

PR:
pr

@Jonathan44062

The Ratchet & Clank Collection shows some improvements with this PR
CPU: i7 8700

Ratchet & Clank 1
Master
R C1 Master

PR
R C1 PR

Ratchet & Clank: Going Commando
Master
R C2 Master

PR
R C2 PR

Ratchet & Clank: Up Your Arsenal
Master
R C3 Master

PR
R C3 PR

@AphelionWasTaken

I should have used a more detailed performance overlay for these, but this PR improves performance quite a bit in Jak & Daxter.

Using the Combuster in ToD also murders performance a bit less with this PR.

Master:
image

PR:
image

Master:
image

PR:
image

@kd-11 kd-11 force-pushed the final-descriptors-rebased branch from f0b107a to 30b3e4a Compare September 28, 2021 14:13
@kd-11 kd-11 changed the title from "[TESTERS NEEDED] rsx/vk: Optimizations" to "rsx/vk: Optimizations" Sep 28, 2021
@kd-11 kd-11 merged commit 3d49976 into RPCS3:master Sep 28, 2021
7 participants