Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VideoCommon/Fifo: Move SConfig::GetInstance() outside the GPU loop. #9837

Merged

Conversation

AdmiralCurtiss
Copy link
Contributor

@AdmiralCurtiss AdmiralCurtiss commented Jun 23, 2021

This fixes https://bugs.dolphin-emu.org/issues/12558 for reasons that I don't fully understand myself.

Well okay, that's not true. I understand why this fix works. I don't understand why this issue happens in the first place.

Basically we're moving a mov rdi,qword ptr [SConfig::m_Instance] out of the GPU loop and instead pass the value as a parameter to the lambda. This avoids a memory load of SConfig::m_Instance each loop which is the cause of the slowdown... well, sort of.

See, this issue only happens on a current MSVC and only in specific circumstances. I'm not fully confident in this, but basically, there appears to be some false sharing or something like that between the SConfig::m_Instance and another thing. The second thing that shows as hot in the profiler is a load from the stack pointer in PowerPC::JitCache_TranslateAddress(), which gets called very often in JitBaseBlockCache::InvalidateICache() in games that do that... but I'll be honest, I don't understand how this interplay really works. My working theory is that the hash function or whatever that decides where in the cache to stuff which area of memory happens to result in the same cacheline for these two instances, but I don't really know if and how I can prove that. To be fair, this might be red herring too, because that function is already pretty hot in a build without the performance issue.

As a fun sidenote though, this bug magically goes away if you build in a directory that has a much longer path in the filesystem than what our buildbot uses. I suspect that is because that changes the length of some debugging strings embedded in the executable, which in turn changes the memory locations of some code and globals.

I dunno. This stuff is weird. Modern CPUs are complicated.

Copy link
Member

@leoetlino leoetlino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes a severe perf regression and sounds better than the alternatives we have:

  • adding a dummy Core::RunAsCPUThread
  • praying that things line up in a way that prevents this issue from showing up
  • making users compile from a different folder with a longer path
  • making users switch to Linux

so this looks good to me

@leoetlino leoetlino merged commit b66e88e into dolphin-emu:master Jun 23, 2021
11 checks passed
@AdmiralCurtiss AdmiralCurtiss deleted the mysterious-voodoo-performance-fix branch June 23, 2021 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants