SPU reservations: Multithreaded concurrent reservation op completion #8462
Would it be possible to have an option to enable/disable it? It might be like RSX Multithread: good in some games, bad for performance in others.
e22ccaa to 778cfb8
Maybe, but first we'll have to see whether there's any performance degradation at all on CPUs with a low thread count.
The good news is that there is no regression in BLUS31437, but nothing has improved either: FPS is at 43-44 for both this PR and #8358. Might I have missed a setting that needs activating? Tested from default settings with Vulkan and FMA disabled. CPU is an EP-2670 v1, non-TSX.
I noticed something interesting when testing this PR on Resistance: Fall of Man. Whenever I had TSX "ON", the game would boot normally with the same frame rate as the master version. But every time I tested with TSX "OFF", after entering the main menu and loading the game, the screen would get stuck on a black screen, and trying to close the game would give an error that closes the entire emulator. Also, in comparison to #8358, the frame rate shows no improvements or losses. It's the same as the master version with TSX "ON". I cannot compare with TSX "OFF" as I am not able to enter gameplay due to the black screen.
Yeah, something like that happens to me. Only for me the black screen goes after this: With the latest 10524 build it's fine.
GT5 refuses to boot on this PR. It just shows a white screen.
Fixed. I expect SPU performance to improve, as this bug indicated an optimization flaw in master.
Btw, if your game already hits the maximum fps it can get (30/60 fps), you may want to try doubling Vblank Rate in case your game supports it (this doubles the fps cap in some games).
TLoU and GoW 3 have (more) stability issues on this PR. TLoU freezes before the menu at random points, as if Accurate RSX Reservations wasn't being used. On master, the game is pretty stable for me with it; I haven't had a freeze in a while. GoW 3 freezes after the menu, when loading into the game. Sometimes it freezes in the menu, though. Master runs fine. FWIW, the framerate in GoW 3's menu seemed better than on master.
Try retesting with the SPU interpreter.
Both games seem to work fine with SPU Fast, at slideshow speeds of course, but they go ingame just fine.
Well, great! On my i7-7700 and RTX 2060 Super, this PR gives a huge speedup (more than 20%) in Uncharted 3 (in the Francis Drake museum, at the very beginning of Chapter 2, without touching or moving the controls): master (10524): 16 fps. The Last of Us has similar results (a bit less). I haven't tested for stability, but on master it crashes every 10-15 minutes on my system, so I don't think it could get worse... or could it? ;-)
Also a reminder: test TSX as well.
MASSIVE 140% performance increase vs master in BCES00510! Well done! But I get frequent deadlocks; here is the log: RPCS3.zip. Non-TSX CPU.
Those are not stable games even on master. In theory it happens because SPU performance has increased in comparison to PPU/RSX, which means it needs #8464.
Fixed the bug. I don't expect performance regressions, but it's worth testing anyway.
TSX regressed with the latest PR build... Edit: as reported by @psennermann, it may not have regressed, or maybe it only regressed on my hardware... I will retest with a completely fresh PR build/folder when I get back home, in an hour or so, just to make sure... So ignore my "regression" for now! (I'll also test with TSX disabled.) i7-4790 (with TSX enabled) - RTX 2060 Super
On my system (i7-7700 and RTX 2060 Super) with the new PR version I can confirm the previous testing (done with TSX on): The Last of Us is, as before, still faster than master when using TSX, but with the TSX option set to off the frame rate is almost halved (in Uncharted 3, by contrast, non-TSX is 0.5 fps faster than TSX).
Yeah, there was something wrong in my previous test; here are the new test results: Results are too close (for BCUS98174 at least)... Has it been merged into master already? If so, then that's why, because I updated master before the new tests...
1f622d4 to 4d22220
On my system (i7-7700 and RTX 2060 Super), Naughty Dog games with this (updated) PR and TSX enabled are still remarkably faster (around 10-20%) than master and #8358.
When could this get merged?
3c1a42d to 5582784
bc747ab to f63824e
8221bbb to 60393da
Worked on in #12598
This feature makes it possible for more than one SPU thread to execute the SPU MFC PUTLLC command at a time. The benefits include a massively reduced total time spent on the reservation store, because there is less waiting on SPU PUTLLC, and an efficient "multi-pause" on PPU: if one SPU thread has paused the PPUs, the next SPU thread does not need to wait for the PPUs to be paused, as they are already paused.
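The "multi-pause" idea can be illustrated with a small reference-counted gate. This is only a sketch under assumptions, not the code in this PR: the class name `multi_pause` and the `pause_ppus` / `resume_ppus` callbacks are hypothetical, and a plain `std::mutex` stands in for whatever synchronization the emulator actually uses.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical sketch, NOT RPCS3 code: only the first SPU thread that needs
// the PPUs stopped pays for the pause; later ones reuse the existing pause.
class multi_pause
{
    std::mutex m_mutex;                 // guards the holder count only
    unsigned m_holders = 0;             // SPU threads currently relying on the pause
    std::function<void()> m_pause;      // assumed: blocks until all PPUs are parked
    std::function<void()> m_resume;     // assumed: lets the PPUs run again

public:
    multi_pause(std::function<void()> pause_ppus, std::function<void()> resume_ppus)
        : m_pause(std::move(pause_ppus)), m_resume(std::move(resume_ppus))
    {
    }

    // Returns true if this caller had to perform the actual pause.
    bool acquire()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_holders++ == 0)
        {
            m_pause();   // first holder: really stop the PPUs
            return true;
        }
        return false;    // PPUs are already paused, nothing to wait for
    }

    // Returns true if this caller was the last holder and resumed the PPUs.
    bool release()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (--m_holders == 0)
        {
            m_resume();
            return true;
        }
        return false;
    }
};
```

Between `acquire()` and `release()` each SPU thread would carry out its reservation store. The mutex here serializes only the bookkeeping, so several holders can be inside that window at the same time.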
In theory, all non-TSX processors should benefit from this feature, but processors with 6+ threads should benefit the most.
When I first implemented this feature, it caused a massive performance drop, which made me scratch my head for a while, until I found out that the shared_mutex::lock_unlock() method was detecting lock release inefficiently, especially when readers were acquiring the lock, as they do in my PR instead of the writer lock that was used previously. After optimizing it, the benefits appeared in Yakuza Ishin, where I got a few more fps and less severe frame rate drops on my CPU (i7-6700k).
Because the TSX path also uses shared_mutex::lock_unlock() as one of its core features, it is important to test it as well.
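For readers unfamiliar with the pattern, a lock_unlock() operation can be thought of as "wait until everyone currently holding the lock has let go, then move on without keeping it". The sketch below shows that behavior using the standard `std::shared_mutex` as a stand-in. It is an assumption-laden illustration: RPCS3 has its own shared_mutex class, `std::shared_mutex` has no such member, and the free functions `lock_unlock` and `read_under_shared_lock` are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <shared_mutex>

// Hypothetical stand-in for a lock_unlock() operation: acquire the lock
// exclusively and release it right away. The call returns only after every
// reader (shared holder) and writer that held the lock has released it.
inline void lock_unlock(std::shared_mutex& mtx)
{
    mtx.lock();    // blocks while any shared or exclusive holder remains
    mtx.unlock();  // nothing is kept; the call acts purely as a barrier
}

// Reader side: threads that may run concurrently take the lock in shared mode.
template <typename F>
inline void read_under_shared_lock(std::shared_mutex& mtx, F&& func)
{
    std::shared_lock<std::shared_mutex> lock(mtx);
    func();
}
```

A naive version like this always takes the exclusive lock, even when nobody holds the mutex. An optimized implementation would presumably try to detect the "already free" case cheaply, which is the kind of cost the description above is concerned with.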
Note for testers: remember to compare against #8358 as it includes it.