rsx/spu: Performance optimizations and other improvements #3026

kd-11 · 2017-07-17T15:36:20Z

Highlights

RSX

Avoid aggressive resource create/delete cycles when using Vulkan. Moderate speedup in some games and fixes flickering in some cases.
Fix a Vulkan crash about undeclared fog_c. Also fixes default parameter initialization, which should fix some games that have broken graphics with Vulkan but work fine with OpenGL.
Fix an RSX crash when a null address is provided for the fragment shader location.
Improved multithreaded vertex processing so that the penalty is much lower if there are no resources available. It's also tunable now, allowing the threshold to be set.

SPU

AsmJit: Avoid aggressively locking the asmjit db and make the compilation step multithreaded. It takes much less time to compile a function than it does to wait on a lock, especially with non-spurs-type kernels.
AsmJit: Cache compiled functions and avoid calling the database analyse function when not needed.
Add loop condition detection by triggering an OS scheduler update on RdDec - this function can only be sensibly used as part of a loop. [Disabled by default for now]
Add concurrent execution analysis. This only affects spurs-type kernels where multiple threads are executing the exact same code at the exact same time. Introduces a small delay to racing threads so that they are effectively desynchronized whenever they enter a sensitive function. [Disabled by default]
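The AsmJit locking and caching items above can be illustrated with a minimal sketch. This is not the code from this PR: `compiled_fn`, `fn_cache`, and the `compiles` counter are hypothetical names, and the "compilation" is a stand-in. The sketch assumes C++17 (`std::shared_mutex`) and only shows the general pattern: look up under a shared (reader) lock, compile with no lock held, and take the exclusive lock only to publish the result.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled function.
struct compiled_fn
{
    std::uint32_t addr = 0;
};

// Hypothetical cache: readers share the lock, writers hold it only briefly.
class fn_cache
{
    std::shared_mutex m_mutex;
    std::unordered_map<std::uint32_t, std::shared_ptr<compiled_fn>> m_map;

public:
    std::atomic<int> compiles{0}; // counts how often the slow path ran

    std::shared_ptr<compiled_fn> get(std::uint32_t addr)
    {
        {
            // Fast path: shared lock, many threads can look up at once.
            std::shared_lock<std::shared_mutex> reader(m_mutex);
            const auto found = m_map.find(addr);
            if (found != m_map.end())
                return found->second;
        }

        // Slow path: "compile" without holding any lock.
        auto fresh = std::make_shared<compiled_fn>();
        fresh->addr = addr;
        compiles.fetch_add(1);

        // Publish under the exclusive lock. If another thread won the race,
        // try_emplace keeps the existing entry and we return that one.
        std::lock_guard<std::shared_mutex> writer(m_mutex);
        return m_map.try_emplace(addr, std::move(fresh)).first->second;
    }
};
```

With this layout two threads may occasionally compile the same function twice, which is the trade-off the description hints at: a redundant compile is assumed to be cheaper than waiting on the lock.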

Other

CMake fix to not double-compile qrc_resources.cpp, as it is a temporary generated file. Also adds it to the list of files to clean.

TODO before merge:

Reduce watchdog memory footprint by using the segment address as a base + 256K
Investigate Tales of Vesperia regression

zminhquanz · 2017-07-17T16:03:10Z

You still need to fix the error where, at random times, the lower-left corner of the note is not fully drawn and shows a black texture, and some 2D materials cannot be read in Project Diva F.

AniLeo · 2017-07-17T16:12:59Z

Is that related to this PR at all? If so, what song?

Lemiru · 2017-07-17T19:29:21Z

For Deception IV: The Nightmare Princess this PR fixed this crash on Vulkan:

E {rsx::thread} RSX: ERROR: 0:65: 'tc9' : undeclared identifier 
ERROR: 0:65: '' : compilation terminated 
ERROR: 2 compilation errors.  No code generated.


E {rsx::thread} RSX: 
F {rsx::thread} class std::runtime_error thrown: Failed to compile vertex shader
(in file C:\rpcs3\rpcs3\Emu\RSX\VK\VKVertexProgram.cpp:401)

Xcedf · 2017-07-17T20:08:22Z

Assassin's Creed and Prince of Persia now go ingame

but Resident Evil Revelations now has flickering issues under OGL

Xcedf · 2017-07-18T18:18:33Z

The Resident Evil Revelations issue I reported is now fixed on the latest revision.
Thanks

Xcedf · 2017-07-18T22:13:34Z

After the last commit I noticed that the performance gains from SPU threads were gone. Reverting it changed nothing, but strangely, reverting d7a9643 restores the speed. The problem didn't even exist before 86bd6b6.
Here's the difference:
Heavy Rain 4 SPU Threads last commit

and with d7a9643 reverted

Same for RDR 5 SPU Threads
The last

and with d7a9643 reverted

kd-11 · 2017-07-19T00:52:22Z

I know it's faster without the reader lock, but I can't guarantee that non-MSVC compilers like GCC and Clang will not crash, so I added it back for now. A better solution will be more stable. If you are building for yourself on Windows, you can comment out the reader lock.

kd-11 · 2017-07-19T00:54:33Z

Also, SPU threads do not work the way you think. You either get better performance with it set to 1 or 2, or it works better disabled. If you're using 4 threads you will likely only benefit from loop detection.

Xcedf · 2017-07-19T07:29:03Z

Thanks for the info

danilaml · 2017-07-19T13:28:45Z

rpcs3/Emu/Cell/SPUAnalyser.cpp

+			const auto limit = std::min(max_size, func->size) >> 2;
+
+			bool failed = false;
+			for (u32 dword = 0; dword < limit; dword++)


Are you sure this is better than a simple memcmp? AFAIK it's usually vectorized as well (certainly on GCC/Clang).

kd-11 · 2017-07-19

Nope. I'm undoing most of this commit actually. The compiled blocks are so small that it's not worth it.
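For readers following the review question, here is a small sketch of the two approaches being compared. It is not the code from SPUAnalyser.cpp: the function names and the raw-pointer interface are assumptions made for illustration. Both versions compare only whole 32-bit words, mirroring the `>> 2` in the reviewed loop.

```cpp
#include <cstdint>
#include <cstring>

// Word-by-word comparison, in the spirit of the loop under review.
// Trailing bytes beyond a multiple of 4 are ignored (size_bytes >> 2).
bool equal_by_dwords(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t size_bytes)
{
    const std::uint32_t limit = size_bytes >> 2;
    for (std::uint32_t dword = 0; dword < limit; dword++)
    {
        if (a[dword] != b[dword])
            return false;
    }
    return true;
}

// The memcmp alternative from the review comment, rounded down to whole
// words so that it covers exactly the same bytes as the loop above.
bool equal_by_memcmp(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t size_bytes)
{
    return std::memcmp(a, b, size_bytes & ~3u) == 0;
}
```

The two functions return the same result for any pair of buffers. Which one is faster depends on the compiler and on block size, which is the point being debated here.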

kd-11 · 2017-07-19T16:45:42Z

@Xcedf Performance issues due to the locks are resolved now in a much cleaner way. You may retest.

Xcedf · 2017-07-19T17:12:10Z

@kd-11 Confirm. Performance problems resolved, things even slightly faster now

Xcedf · 2017-07-19T17:23:11Z

not tested T6 for long time
it now hits fullspeed sometimes
avg fps 54-58 on this branch on many areas and characters
game is pretty much Playable now

- Delays threads by a predetermined amount to 'desync' spurs kernels. Largely reduces lock contention issues as well as making spurs kernels play nice with reservations
- Also reduces number of lost notifications (SPU_EVENT_LR)
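The idea of delaying racing threads can be sketched as follows. This is only an illustration of the concept, not the mechanism implemented in this PR: `desync_gate`, its `enter`/`leave` interface, and the choice of `sleep_for` are all assumptions. A thread that enters while another thread is already inside gets a fixed delay, so the two stop executing in lockstep.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical gate placed around a contention-sensitive function.
class desync_gate
{
    std::atomic<int> m_inside{0};

public:
    // Returns true if the caller found another thread inside and was delayed.
    bool enter(std::chrono::microseconds delay)
    {
        const bool contended = m_inside.fetch_add(1) > 0;
        if (contended)
            std::this_thread::sleep_for(delay); // predetermined desync delay
        return contended;
    }

    void leave()
    {
        m_inside.fetch_sub(1);
    }
};
```

A caller would wrap the sensitive function in `enter(...)` and `leave()`. The first thread in passes straight through, while later arrivals are pushed back by the chosen delay.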

- Improvements to framebuffer usage; Avoid creating new resources every frame
- Handle null fragment program properly
- Collect vertex upload statistics
- vk: Pre-initialize 'unused' varying registers in the vertex shader in case it gets matched with a fs that consumes it
  -- Fixes a crash about fog_c not being declared

gl/dx12/vk: Handle null fragment program

- cleanup
- use yield semantic instead of sleep(0) as yield is more cross-platform
  -- sleep(0) is a windows specific scheduler hint

…n code
- spus run a tight gpu-style kernel with no multitasking on the cores themselves
  -- this does not map well to PC processor cores because they never sleep even when doing nothing
  -- the poll detection hack tries to find a good place to insert a scheduler yield
  -- RdDec is a good spot as it signifies the spu kernel is waiting on a timer
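A rough sketch of the poll-detection idea described above: when guest code spins on a timer read, the read handler is a natural place to yield the host thread. Everything here is hypothetical (`decrementer`, `read`, `spin_until_expired`, and the millisecond resolution) and does not reflect the actual RdDec implementation. It only shows a polling loop that gives up its time slice on each read via the portable `std::this_thread::yield()`.

```cpp
#include <chrono>
#include <thread>

// Hypothetical model of a countdown timer that guest code polls in a loop.
struct decrementer
{
    std::chrono::steady_clock::time_point deadline;
    bool yield_on_read; // models the optional loop-detection behaviour

    // Returns the remaining time in milliseconds, or 0 once expired.
    long long read() const
    {
        if (yield_on_read)
            std::this_thread::yield(); // let other host threads run while we wait

        const auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - std::chrono::steady_clock::now()).count();
        return left > 0 ? left : 0;
    }
};

// A guest-style busy loop: keep reading the timer until it expires.
// Returns how many reads were needed.
int spin_until_expired(const decrementer& dec)
{
    int reads = 1;
    while (dec.read() > 0)
        ++reads;
    return reads;
}
```

With `yield_on_read` set to false the loop still terminates, but it occupies a host core for the whole wait, which is the behaviour the commit message describes as mapping poorly to PC processor cores.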

zminhquanz · 2017-07-20T02:37:48Z

It improves performance on Project Diva F too

kd-11 force-pushed the master branch from 03e8d42 to 6f9b5c6 on July 17, 2017 15:48

kd-11 force-pushed the master branch from f79887e to 1b5e729 on July 17, 2017 22:16

GeniusMage mentioned this pull request Jul 18, 2017

GL/VK: Tales of Vesperia - Rendering issues #2707

Closed

9 tasks

kd-11 changed the title ~~[WIP/Testing Needed] rsx/spu: Performance optimizations and other improvements~~ rsx/spu: Performance optimizations and other improvements Jul 18, 2017

danilaml reviewed Jul 19, 2017

AniLeo added Bugfix CPU Enhancement Miscellaneous RSX Render: Vulkan labels Jul 19, 2017

kd-11 force-pushed the master branch from 86bd6b6 to 9f6e60d on July 19, 2017 16:39

kd-11 mentioned this pull request Jul 19, 2017

rsx: Bug fixes #2861

Merged

2 tasks

kd-11 added 5 commits July 19, 2017 23:00

rsx/vk: Optimize framebuffer lifetime management

8140729

- Significant gains due to avoiding aggressive create-delete cycles every frame

asmjit: Minimal locking when reading, also only lock when actually wr…

7ca51d4

…iting to the db

kd-11 added 4 commits July 19, 2017 23:00

fix build; restore asmjit reader_lock for now

3e28ebf

spu: Simplify watchdog design (PC is purely HLE and occupies SPU code…

736e5fb

… kernel space only, max 256K)

rsx: Surface cache bug fixes

560b9e9

- Properly handle data 'transfer' when recycling frame buffer images - Clear 'recycled' surfaces before use

spu: Clean up asmjit - avoid touching the shared db whenever possible

36014aa

- Gets around the locking issues when fetching from the shared db

kd-11 force-pushed the master branch from 9f6e60d to 36014aa on July 19, 2017 20:00

kd-11 merged commit 99828a8 into RPCS3:master Jul 19, 2017

Xcedf mentioned this pull request Jul 19, 2017

spu_interpreter::STOPD-Unimplemented instruction #2420

Closed
