Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] SPU performance optimizations #4108

Closed
wants to merge 22 commits into from
Closed

Conversation

Farseer2
Copy link
Contributor

WIP for stability reasons

This PR includes some optimizations, mostly to the SPU, being the current bottleneck of the emulator:

  • vm::waiters are removed lazily if a lock can't be achieved at the moment of destruction to prevent the threads from locking up too much
  • Rewrote some heavily used stub so the compiled assembly would be faster (Best example would be try_pop)
  • Added fast path to try_lock
  • Code invalidation is only checked if a SYNC instruction was executed
  • SPU function DB is only searched once (instead of twice). It almost never caught a new function, yet it added quite the long search
  • Aggressive inlining of some function. While considered bad practice, the compiler didn't do it by itself, and, at worse, the epilogue and prologue of the function took 0.9% runtime while the function itself took 0.1%
  • Releasing locks earlier when processing mfc commands should reduce the time SPU threads are waiting on locks.
  • DMA's are now done by a REP MOVSQ, which, at least on Ivy Bridge processors, is faster (Maybe because of the instruction cache having an easier time, idk)
  • Probably some other stuff I forgot

So, now for the bad parts:

  1. Performance gains may vary - while some reported almost double the FPS, some reported no change at all. Can't think of a reason for that yet.
  2. Some parts were written before kd11 explained some of the SPU stuff to me, so take them with a grain of salt, might affect stability, so if you can, please test and report crashes and the likes.
  3. As TSX has a different flow in some of the edited areas, I couldn't really test it

Thanks to whymsical, haico1992, digitaldude555 and Ani for helping me test this.

@tabnk
Copy link

tabnk commented Jan 25, 2018

Nice :)

@Illynir
Copy link

Illynir commented Jan 25, 2018

It's promising, but it's very unstable for me. Most games freeze without error message or with "F {MFC Thread} MEM: Access violation reading location 0x0" randomly.

That being said I spotted big gains on several games, especially Ni no kuni which becomes fullspeed with OpenGL (before of freeze totally like the others).

@Xcedf
Copy link

Xcedf commented Jan 25, 2018

Same situation most game are stoped working, but some are still working
GoW 1/2 HD, Persona 5, House of The Dead Overkill have nice speed boost now

@tabnk
Copy link

tabnk commented Jan 25, 2018

Ni no kuni stuck at loading with Vulkan

@greentop
Copy link

The gcc build failed on Ubuntu 16.04. (Line 503)
PR 4108 buid.log.gz

@M220
Copy link

M220 commented Jan 25, 2018

P5 now have locked 30FPS in sometimes!!!
but RDR now stop in the intro with vulkan and disabling ( or enabling ) thread schedular not helped too ( in openGL it show Rockstar logo but can't go in menu ) ( it's going ingame in master ) and here is the log :
RPCS3_SPU.log.gz
sorry for my bad English 😄

@@ -121,13 +121,13 @@ void spu_recompiler::compile(spu_function_t& f)
compiler.alloc(vec_vars[5], asmjit::x86::xmm5);

// Initialize labels
std::vector<Label> pos_labels{ 0x10000 };
this->labels = pos_labels.data();
this->labels = std::unique_ptr<Label[]>(reinterpret_cast<Label*>(new char[0x10000 * sizeof(Label)]()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use u8 instead of char

@@ -5,16 +5,18 @@

const spu_decoder<spu_itype> s_spu_itype;

spu_function_t* SPUDatabase::find(const be_t<u32>* data, u64 key, u32 max_size)
std::shared_ptr<spu_function_contents_t> SPUDatabase::find(const be_t<u32>* data, u64 key, u32 max_size, void * ignore)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void* ignore contains extra spacing

@@ -33,16 +35,22 @@ SPUDatabase::~SPUDatabase()
// TODO: serialize database
}

spu_function_t* SPUDatabase::analyse(const be_t<u32>* ls, u32 entry, u32 max_limit)
std::shared_ptr<spu_function_contents_t> SPUDatabase::analyse(const be_t<u32>* ls, u32 entry, void * ignore /*=nullptr*/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@spyropt
Copy link

spyropt commented Jan 25, 2018

On my end all games just crash

cpu x5675 (westmere)
RPCS3.log.gz

@@ -352,7 +366,7 @@ spu_function_t* SPUDatabase::analyse(const be_t<u32>* ls, u32 entry, u32 max_lim
m_db.emplace(key, func);
}

LOG_NOTICE(SPU, "Function detected [0x%05x-0x%05x] (size=0x%x)", func->addr, func->addr + func->size, func->size);
LOG_FATAL(SPU, "Function detected [0x%05x-0x%05x] (size=0x%x)", func->addr, func->addr + func->size, func->size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure this should be fatal? also LOG_* macros are deprecated, see Log.h


// For internal use
spu_function_t* find(const be_t<u32>* data, u64 key, u32 max_size);
std::shared_ptr<spu_function_contents_t> find(const be_t<u32>* data, u64 key, u32 max_size, void * ignore=nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing, void* ignore = nullptr

waiter.stamp = rtime;
waiter.data = rdata.data();
waiter.init();
vm::waiter * waiter = new vm::waiter();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing

@@ -1189,28 +1141,33 @@ bool SPUThread::get_ch_value(u32 ch, u32& out)
return true;
}

vm::waiter waiter;

vm::waiter * waiter = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing

@@ -173,7 +173,7 @@ void shared_mutex::imp_lock(s64 _old)

for (int i = 0; i < 10; i++)
{
busy_wait();
if (i != 0) { busy_wait(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

braces should be on newlines

@@ -14,7 +14,7 @@ void shared_mutex::imp_lock_shared(s64 _old)

for (int i = 0; i < 10; i++)
{
busy_wait();
if (i != 0) busy_wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add braces

@@ -237,6 +237,7 @@ void shared_mutex::imp_lock_degrade()
bool shared_mutex::try_lock_shared()
{
// Conditional decrement
if (m_value < c_min) return false; // Fast path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

braces

@Farseer2
Copy link
Contributor Author

Added CR fixes.
The F {MFC Thread} MEM: Access violation reading location 0x0 repeats in a lot of games - no fix yet, but I'm aware of it, thanks everyone

@Farseer2
Copy link
Contributor Author

Found the cause for the crashes, will push a fix later today

@greentop
Copy link

greentop commented Jan 25, 2018

The gcc build did complete for me on Ubuntu as of 6b677b9.

@VelocityRa
Copy link
Member

@Kravickas This seems related to the known bug, I'd say wait for the fix before posting logs.

@raven02
Copy link
Contributor

raven02 commented Jan 25, 2018

P5 boosts quite obvious and nice PR . 1942 Joint Strike also no more lag in-game .

@tge-was-taken
Copy link
Contributor

Can confirm, 5-10+ FPS boost in P5 depending on the situation. Haven't been able to test other games yet due to it being very unstable.

@GeniusMage
Copy link
Contributor

Eternal Sonata: Transitions are slightly faster, new SPU Halt errors here and there.

Tales of Vesperia: Doesn't start on SPU Recompiler. Has MEM Access Violations on the PPU thread with Interpreter, before reaching the main menu.

Tales of Graces f: Similar issues to Vesperia, it doesn't reach the menu on Recompiler, goes ingame on Interpreter but tends to freeze.

Tales of Xillia: tends to throw a Fatal Error from the start on Recompiler, with no actual message (just a blank window), or just stops before the main menu. Goes Ingame with Interpreter but audio is very choppy.

Tales of Xillia 2: Hangs upon reaching the main menu on Recompiler. Gets stuck before the intro on Interpreter (though it seems like it's still running, it just doesn't do anything).

One thing I've noticed, things go wrong when "Branch-to-next with $LR" happen.

@Illynir
Copy link

Illynir commented Jan 25, 2018

Drakengard 3 ==> Freeze on loading
Ni no kuni ==> + 10 FPS boost on OGL with random freeze, not tested on Vulkan because freeze on loading
Tales of vesperia ==> Freeze sur main menu
Journey ==> +5 FPS, freeze random
Nier/Nier Replicant ==> No difference but this game is special on Windows
Atelier Escha and Logy ==> Freeze on loading.
Tales of graces F ==> Freeze
Persona 5 ==> +5/10 FPS

@Xperie
Copy link

Xperie commented Jan 25, 2018

This has worked really well for P5 I get almost double my fps in some situations and a solid 30 in most areas now. I have one problem thats happened twice where the audio crashes during a load screen and then a few minutes later the game will lock up. I attached the log of what happens.
crashlog.txt

@tge-was-taken
Copy link
Contributor

You should probably wait before you start testing games, there's a bad breaking bug that's causing hangs. Some games don't even boot with this.

@Lemiru
Copy link

Lemiru commented Jan 25, 2018

Deception IV: The Nightmare Princess - on recompiler freezes after starting new game or continuing and music is missing
The Witch and the Hundred Knight - Freezes on loading screen before menu

@Farseer2
Copy link
Contributor Author

Closing until I upload the fix, many of these reports are the same problem over and over- thanks everyone

@Farseer2 Farseer2 closed this Jan 25, 2018
@Farseer2
Copy link
Contributor Author

So, didn't get to fix it today, so I implemented a quick solution that will degrade performance, just to know if there are other crashes in there.
I'll try to implement the faster solution over the weekend

@Farseer2 Farseer2 reopened this Jan 25, 2018
@Illynir
Copy link

Illynir commented Jan 25, 2018

Quick test with the temporary crash fix:

Drakengard 3: Freeze on loading
Ni no kuni: freeze on loading
Tales of graces F: Freeze on loading

Log show:
E {SPU[0x2000000] Thread (SpursTLGraphicsCellSpursKernel0)} SPU: [0x0504c] Branch-to-next with $LR
E {SPU[0x2000001] Thread (SpursTLGraphicsCellSpursKernel1)} SPU: [0x031a4] Branch-to-next with $LR
E {SPU[0x2000003] Thread (SpursTLGraphicsCellSpursKernel3)} SPU: [0x08844] Branch-to-next with $LR
E {SPU[0x2000004] Thread (SpursTLGraphicsCellSpursKernel4)} SPU: [0x05994] Branch-to-next with $LR
E {SPU[0x2000002] Thread (SpursTLGraphicsCellSpursKernel2)} SPU: [0x04394] Branch-to-next with $LR
E {SPU[0x2000004] Thread (SpursTLGraphicsCellSpursKernel4)} SPU: [0x067d4] Branch-to-next with $LR
E {SPU[0x2000000] Thread (SpursTLGraphicsCellSpursKernel0)} SPU: [0x0494c] Branch-to-next with $LR
E {SPU[0x2000000] Thread (SpursTLGraphicsCellSpursKernel0)} SPU: [0x03344] Branch-to-next with $LR

@VelocityRa
Copy link
Member

"Branch-to-next with $LR" errors are completely irrelevant.
They're not even "errors" really, just a registers being used in a way that's not standardized. It's not even that unusual, many games do it.
I've suggested making them notices or something, but it didn't happen and I'm certainly not opening a PR just for that.

@Illynir
Copy link

Illynir commented Jan 25, 2018

Well, I don't know, but these messages appear on this PR and not on the Master for me, so I indicate it in case.

If it is useless then the games freeze without displaying any noticeable error message in this case.

@Farseer2
Copy link
Contributor Author

PR will be closed soon for some maintenance

@Farseer2 Farseer2 closed this Jan 31, 2018
@Luffykun007
Copy link

@Farseer2 reopen please!

@Asinin3
Copy link
Contributor

Asinin3 commented Feb 11, 2018

Why? just because the PR is closed doesn't mean you cant use the builds. (see screenshot below) also I'm guessing he is busy trying to fix it up. It's not ready for a merge yet obviously and closing the PR was probably to help stop spam comments of issues that are known.

image

@mateus-santiago
Copy link

How can I actually use this build? Sorry, I'm stupid and new to RPCS3.

@tanpro260196
Copy link

tanpro260196 commented Aug 6, 2018

Don't use this build. Master already surpass it in term of performance. Just enable debug mode and disable zcull.
And next time please use discord for these type of questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet