rsx/spu: Performance optimizations and other improvements #3026
You still need to fix the error where, at random times, the lower-left corner of a note is not filled with the black texture, and where some 2D materials cannot be read in Project Diva F.
Is that related to this PR at all? If so, what song?
For Deception IV: The Nightmare Princess, this PR fixed a crash on Vulkan.
The Resident Evil Revelations issue I reported is now fixed on the latest rev.
After the last commit I noticed that the performance gains from SPU threads were gone, but reverting it changed nothing. Strangely, reverting d7a9643 brings the speed back; the problem didn't even exist before 86bd6b6.
I know it's faster without the reader lock, but I can't guarantee that non-MSVC compilers like GCC and Clang will not crash, so I added it back for now. A better solution will be more stable. If you are building for yourself on Windows, you can comment out the reader lock.
Also, SPU threads do not work the way you think. Performance is either better with the setting at 1 or 2, or better with it disabled. If you're using 4 threads, you will likely only benefit from loop detection.
Thanks for the info
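For readers unfamiliar with the term, the reader lock discussed above can be pictured roughly as follows. This is only a minimal sketch assuming a C++17 `std::shared_mutex`; the `function_db` class, its members, and the address-to-size map are hypothetical stand-ins, not RPCS3's actual types or locking code.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for a database shared between SPU threads.
class function_db
{
    mutable std::shared_mutex m_mutex;
    std::unordered_map<std::uint32_t, std::uint32_t> m_entries; // address -> size in bytes

public:
    // Readers take a shared lock, so several threads can look up at once.
    std::optional<std::uint32_t> find(std::uint32_t addr) const
    {
        std::shared_lock<std::shared_mutex> lock(m_mutex);
        const auto it = m_entries.find(addr);
        if (it == m_entries.end()) return std::nullopt;
        return it->second;
    }

    // Writers take an exclusive lock, which blocks readers while the map changes.
    void insert(std::uint32_t addr, std::uint32_t size)
    {
        assert(size > 0); // assumption of this sketch: empty entries are never stored
        std::unique_lock<std::shared_mutex> lock(m_mutex);
        m_entries[addr] = size;
    }
};
```

Dropping the shared lock in `find` makes lookups cheaper, but a concurrent `insert` can then rehash the map under a reader, which is the kind of crash the comment above is guarding against.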
rpcs3/Emu/Cell/SPUAnalyser.cpp
const auto limit = std::min(max_size, func->size) >> 2;

bool failed = false;
for (u32 dword = 0; dword < limit; dword++)
Are you sure this is better than a simple memcmp? AFAIK it's usually vectorized as well (certainly on GCC/Clang).
Nope. I'm actually undoing most of this commit. The compiled blocks are so small that it's not worth it.
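To make the two options in this review thread concrete, here is a minimal sketch of a word-by-word comparison next to the `memcmp` alternative. The function names and signatures are hypothetical and only mirror the shape of the loop in the diff; they are not the code in SPUAnalyser.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Word-by-word comparison, similar in shape to the loop in the diff.
inline bool equal_dwords_loop(const std::uint32_t* a, const std::uint32_t* b, std::size_t size_bytes)
{
    assert(size_bytes % 4 == 0); // this sketch assumes whole 32-bit words
    const std::size_t limit = size_bytes >> 2;
    for (std::size_t dword = 0; dword < limit; dword++)
    {
        if (a[dword] != b[dword]) return false;
    }
    return true;
}

// The alternative raised in review: let the library routine compare the bytes.
inline bool equal_dwords_memcmp(const std::uint32_t* a, const std::uint32_t* b, std::size_t size_bytes)
{
    return std::memcmp(a, b, size_bytes) == 0;
}
```

Both functions give the same equality result; which one is faster depends on the compiler, the library, and the block size, so any choice between them should be measured rather than assumed.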
@Xcedf Performance issues due to the locks are resolved now in a much cleaner way. You may retest.
@kd-11 Confirmed. The performance problems are resolved, and things are even slightly faster now.
- Significant gains due to avoiding aggressive create-delete cycles every frame
- Delays threads by a predetermined amount to 'desync' spurs kernels. Largely reduces lock contention issues as well as making spurs kernels play nice with reservations
- Also reduces number of lost notifications (SPU_EVENT_LR)
- Improvements to framebuffer usage; Avoid creating new resources every frame
- Handle null fragment program properly
- Collect vertex upload statistics
- vk: Pre-initialize 'unused' varying registers in the vertex shader in case it gets matched with a fs that consumes it
-- Fixes a crash about fog_c not being declared
gl/dx12/vk: Handle null fragment program
- cleanup
- use yield semantic instead of sleep(0) as yield is more cross-platform
-- sleep(0) is a windows specific scheduler hint
…n code
- spus run a tight gpu-style kernel with no multitasking on the cores themselves
-- this does not map well to PC processor cores because they never sleep even when doing nothing
-- the poll detection hack tries to find a good place to insert a scheduler yield
-- RdDec is a good spot as it signifies the spu kernel is waiting on a timer
… kernel space only, max 256K)
- Properly handle data 'transfer' when recycling frame buffer images
- Clear 'recycled' surfaces before use
- Gets around the locking issues when fetching from the shared db
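The commit notes above about yield semantics and poll detection can be illustrated with a small polling loop. This is a minimal sketch assuming a plain atomic flag; `wait_with_yield`, its parameters, and the flag are hypothetical and do not reflect how RPCS3 detects polling or where it inserts the yield.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Polls `ready`, yielding to the OS scheduler up to `max_spins` times
// between checks. Returns true if the flag was observed set.
inline bool wait_with_yield(const std::atomic<bool>& ready, int max_spins)
{
    assert(max_spins >= 0);
    for (int spin = 0; spin < max_spins; ++spin)
    {
        if (ready.load(std::memory_order_acquire)) return true;
        // Portable scheduler hint; the commit note describes sleep(0)
        // as the Windows-specific way of expressing the same thing.
        std::this_thread::yield();
    }
    return ready.load(std::memory_order_acquire);
}
```

Without the yield, a loop like this occupies a host core at full speed while it waits, which is the mismatch between SPU kernels and PC cores that the notes describe.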
It improves performance on Project Diva F too.
Highlights
TODO before merge: