SPU reservations: Multithreaded concurrent reservation op completion #8462

Closed
wants to merge 3 commits into from


elad335
Contributor

@elad335 elad335 commented Jun 19, 2020

  • This feature makes it possible for more than one SPU thread to execute the SPU MFC PUTLLC command at a time. The benefits include a massively reduced total time for the reservation store, because there is less waiting on SPU PUTLLC, and an efficient "multi-pause" on PPU: if one SPU thread has already paused the PPUs, the next SPU thread does not need to wait for them to be paused, as they already are.
    In theory, all non-TSX processors should benefit from this feature, but processors with 6+ threads should benefit the most.
    When I first implemented this feature it caused a massive performance drop, which made me scratch my head for a while, until I found out that the shared_mutex::lock_unlock() method was using inefficient optimizations to detect lock release, especially when readers were acquiring the lock, as they do in my PR, instead of the writer lock that was used previously. After optimizing it, the benefits appeared in Yakuza Ishin, where I got a few more fps and less severe frame rate drops on my CPU (i7-6700k).

  • Because the TSX path also uses shared_mutex::lock_unlock() as one of its core features, it is important to test it as well.

Note for testers: remember to compare against #8358, as this PR includes it.
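
As a rough illustration of the locking pattern described above, here is a minimal C++ sketch. It is not RPCS3's code: it uses std::shared_mutex in place of the project's own shared_mutex class, and the names Reservation, g_res_mutex, putllc_sketch, lock_unlock_sketch and run_workers are all hypothetical. The assumption is that each conditional store takes the lock in shared (reader) mode, so several stores can complete at once, while code that needs every in-flight store to be finished acquires and immediately releases the lock in exclusive mode.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one reservation: a data word plus a timestamp
// that is bumped on every successful conditional store.
struct Reservation {
    std::atomic<std::uint64_t> rtime{0};
    std::atomic<std::uint64_t> data{0};
};

// Hypothetical global lock (an assumption, not the emulator's real object).
static std::shared_mutex g_res_mutex;

// Conditional store: succeeds only if `data` still holds `expected`.
// The lock is taken in shared mode, so many threads may be in here at once.
bool putllc_sketch(Reservation& r, std::uint64_t expected, std::uint64_t desired) {
    std::shared_lock<std::shared_mutex> lock(g_res_mutex);
    if (!r.data.compare_exchange_strong(expected, desired)) {
        return false; // someone else stored first; the caller must retry
    }
    r.rtime.fetch_add(1);
    return true;
}

// Wait until every store that is currently in flight has completed:
// acquire the lock exclusively, then release it right away.
void lock_unlock_sketch() {
    g_res_mutex.lock();
    g_res_mutex.unlock();
}

// Spawn `threads` workers that each perform `iters` successful increments
// through the conditional store, then return the final data value.
std::uint64_t run_workers(Reservation& r, int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&r, iters] {
            for (int i = 0; i < iters; ++i) {
                std::uint64_t seen;
                do {
                    seen = r.data.load();
                } while (!putllc_sketch(r, seen, seen + 1));
            }
        });
    }
    for (auto& th : pool) th.join();
    lock_unlock_sketch(); // no store can still be in flight after this
    return r.data.load();
}
```

With the shared lock, a failed store only means another thread won the race on the data itself, not that it had to queue behind an exclusive lock. How much this helps in practice depends on how cheaply the exclusive side can detect that all readers have left, which is the part the description says needed optimizing.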

@jobs-git

Would it be possible to have an option to enable/disable it? It might be like RSX Multithread: good in some games, but bad for performance in others.

@elad335 elad335 force-pushed the soon-tm branch 2 times, most recently from e22ccaa to 778cfb8 Compare June 19, 2020 07:35
@elad335
Contributor Author

elad335 commented Jun 19, 2020

Maybe, but first we'll have to see if there's a performance degradation at all on CPUs with a low thread count.

@jobs-git

jobs-git commented Jun 19, 2020

The good news is that there is no regression in BLUS31437, but nothing has improved either. FPS is 43-44 for both, comparing this PR against #8358.

Might I have missed a setting that needs to be activated? I tested with default Vulkan settings and FMA disabled.

CPU is EP-2670 v1 non-tsx.

@web1018

web1018 commented Jun 19, 2020

I noticed something interesting when testing this PR on Resistance: Fall of Man. Whenever I had TSX "ON", the game would boot normally with the same frame rate as the master version. But every time I tested with TSX "OFF", after entering the main menu and loading the game, the screen would get stuck on a black screen, and trying to close the game would give an error that closes the entire emulator.
My Specs: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz | 8 Threads | 15.90 GiB RAM | TSC: 2.592GHz | AVX+ | FMA3 | TSX-FA

TSX "ON"

TSX "OFF"

Also, compared to #8358, the frame rate shows no improvements or losses. It's the same as the master version with TSX "ON". I cannot compare with TSX "OFF", as I am not able to reach gameplay due to the black screen.

@Emogop

Emogop commented Jun 19, 2020

Yeah, something like that happens to me. Only for me, the black screen comes after this:
Screenshot_585
PR 8358: the game's working fine
Screenshot_586

With the latest 10524 build it's fine
RPCS3.zip

@xddxd
Contributor

xddxd commented Jun 19, 2020

GT5 refuses to boot on this PR. Just shows a white screen.
RPCS3.log

@elad335
Contributor Author

elad335 commented Jun 19, 2020

Fixed. I expect SPU performance to improve, as this bug pointed to an optimization flaw in master.

@elad335
Contributor Author

elad335 commented Jun 19, 2020

Btw, if your game already hits the maximum fps it can reach (30/60 fps), you may want to try doubling the Vblank Rate if your game supports it (this doubles the fps cap in some games).

@RainbowCookie32
Contributor

TLoU and GoW 3 have (more) stability issues on this PR.

TLoU freezes before the menu at random points, as if Accurate RSX Reservations wasn't being used. On master, the game is pretty stable for me with it; I haven't had a freeze in a while.

GoW 3 freezes after the menu, when loading into the game. Sometimes it freezes in the menu, though. Master runs fine.

Tlou_PR.zip
GoW3_PR.zip

FWIW, framerate on GoW 3's menu seemed better than master.

@elad335
Contributor Author

elad335 commented Jun 19, 2020

Try retesting with SPU interpreter.

@RainbowCookie32
Contributor

Both games seem to work fine with SPU Fast. They run at slideshow speeds, of course, but they get ingame just fine.

@psennermann

Well, great! On my i7-7700 and RTX 2060 Super, this PR gives a huge speedup (more than 20%) in Uncharted 3 (in the Francis Drake museum, at the very beginning of Chapter 2, without touching the controls):

master (10524): 16 fps
PR #8358: 16,5 fps
PR #8462: 19,5 fps

The Last of Us has similar results (a bit less). I haven't tested for stability, but on master it crashes every 10-15 minutes on my system, so I don't think it could get worse... or could it? ;-)

@elad335
Contributor Author

elad335 commented Jun 20, 2020

Also a reminder: test TSX as well.

@jobs-git

jobs-git commented Jun 20, 2020

MASSIVE 140% performance increase vs master in BCES00510! Well done!

But I get frequent deadlocks. Here is the log: RPCS3.zip

Tested on a non-TSX CPU.

@elad335
Contributor Author

elad335 commented Jun 20, 2020

Those games are not stable even on master. In theory, it happens because SPU performance has increased relative to PPU/RSX, which means it needs #8464.

@elad335 elad335 changed the title [Preview] SPU reservations: Multithreaded concurrent reservation op compilition [Preview] SPU reservations: Multithreaded concurrent reservation op completion Jun 20, 2020
@AkagiShiroe

AkagiShiroe commented Jun 20, 2020

There's a slight improvement, a performance boost of 5-35%, and in some rare cases 40-47 fps became 60 fps (or more). I tested it on my intended testbed, a Ryzen 1200 (4C/4T). Even though in theory this should make things slower, it actually makes games more stable, especially if the game incorporates something multi-threaded. I don't know what to call it, but I mean something that runs on several SPUs, like CRIWare audio, which no longer crackles or breaks when the CPU is busy handling other things. The downside is that in some cases it can result in much lower fps (this only happens because of a CPU bottleneck). Overall, I support an option to enable/disable this, just like RSX Multithreaded. It helps in most of the games I have right now, but some titles are better without it.

Master (10524) (average 50-60fps)
master

this PR: (average 55-60)
PR

Both screenshots were taken after standing still for 10-15 seconds, and both runs had their SPU/shader caches cleared beforehand.
Edit: wrong picture xD

Also, more tests:
Master (10524) (average 56-60fps)
Base Profile Screenshot 2020 06 20 - 15 05 24 92

this PR: (45-60 fps, with drops as low as 45)
Base Profile Screenshot 2020 06 20 - 15 07 46 82

Overall, it depends on the scenario. Enabling this by default might be a drawback for people who use a mobile CPU or something old and low-end like this Ryzen 1200.

@elad335
Contributor Author

elad335 commented Jun 20, 2020

Fixed the bug. I don't expect performance regressions, but it's worth testing anyway.

@DefaltBR

DefaltBR commented Jun 20, 2020

TSX regressed with latest PR build...

Edit: as reported by @psennermann, it may not have regressed, or maybe it only regressed on my hardware... I will retest with a completely fresh PR build/folder when I get back home, in an hour or so, just to make sure. So ignore my "regression" for now! (I'll also test with TSX disabled.)

PR:
image

Master:
image

i7-4790 (with TSX enabled) - RTX 2060 Super

@psennermann

On my system (i7-7700 and RTX 2060 Super) with the new PR version, I can confirm my previous testing (done with TSX on): The Last of Us is, as before, still faster than master when using TSX, but with the TSX option set to off the frame rate is almost halved (whereas in Uncharted 3, non-TSX is 0,5 fps faster than TSX).

@DefaltBR

DefaltBR commented Jun 20, 2020

Yeah, there was something wrong in my previous test, here are the new test results:

PR_TSX-On:
PR_TSX

Master_TSX-On:
Master_TSX


PR_TSX-Off:
PR_Non-TSX

Master_TSX-Off:
Master_non-TSX


PR_8358_TSX-Off:
PR_8358

Results are too close (for BCUS98174 at least)... Has it been merged into master already? If so, that's why, because I updated master before the new tests...

@elad335 elad335 changed the title [Preview] SPU reservations: Multithreaded concurrent reservation op completion [Preview][TESTERS NEEDED] SPU reservations: Multithreaded concurrent reservation op completion Jun 21, 2020
@elad335 elad335 force-pushed the soon-tm branch 2 times, most recently from 1f622d4 to 4d22220 Compare June 27, 2020 07:04
@psennermann

On my system (i7-7700 and RTX 2060 Super), Naughty Dog games with this (updated) PR and TSX enabled are still remarkably faster (around 10-20%) than master and #8358.

@jobs-git

When could this get merged?

@elad335
Contributor Author

elad335 commented Sep 21, 2022

Worked on in #12598

@elad335 elad335 closed this Sep 21, 2022
@elad335 elad335 deleted the soon-tm branch September 21, 2022 04:33

10 participants