VideoCommon/Fifo: Move SConfig::GetInstance() outside the GPU loop. #9837
This fixes https://bugs.dolphin-emu.org/issues/12558 for reasons that I don't fully understand myself.
Well okay, that's not true. I understand why this fix works. I don't understand why this issue happens in the first place.
Basically, we're moving a `mov rdi,qword ptr [SConfig::m_Instance]` out of the GPU loop and instead passing the value as a parameter to the lambda. This avoids a memory load of `SConfig::m_Instance` on every loop iteration, which is the cause of the slowdown... well, sort of.

See, this issue only happens with current MSVC, and only in specific circumstances. I'm not fully confident in this, but basically, there appears to be some false sharing or something like that between the
`SConfig::m_Instance` and another thing. The other thing that shows up as hot in the profiler is a load from the stack pointer in `PowerPC::JitCache_TranslateAddress()`, which gets called very often from `JitBaseBlockCache::InvalidateICache()` in games that invalidate the instruction cache frequently... but I'll be honest, I don't understand how this interplay really works. My working theory is that the hash function, or whatever decides where in the cache each area of memory goes, happens to map these two locations to the same cacheline, but I don't really know whether or how I could prove that. To be fair, this might be a red herring too, because that function is already pretty hot in a build without the performance issue.

As a fun sidenote, though, this bug magically goes away if you build in a directory with a much longer filesystem path than the one our buildbot uses. I suspect that's because the longer path changes the length of some debugging strings embedded in the executable, which in turn shifts the memory locations of some code and globals.
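To make the cacheline theory a little more concrete, here is a minimal sketch of how an address maps to a cache line and a cache set. It assumes a simple modulo-indexed cache with 64-byte lines and 64 sets (a common L1 data cache geometry, e.g. 32 KiB, 8-way). The geometry, the helper names (`LineIndex`, `SetIndex`, `SameLine`, `SameSet`) and the example addresses are all hypothetical; nothing here was measured on the affected build, and real CPUs may index some cache levels differently (for example with hashed indexing).

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED geometry: 64-byte lines, 64 sets (32 KiB / (64 B * 8 ways)).
// Real hardware varies; this is only an illustration.
constexpr std::uintptr_t kLineSize = 64;
constexpr std::uintptr_t kNumSets = 64;

// Which cache line an address falls into.
constexpr std::uintptr_t LineIndex(std::uintptr_t addr) {
  return addr / kLineSize;
}

// Which set that line maps to, assuming plain modulo indexing.
constexpr std::uintptr_t SetIndex(std::uintptr_t addr) {
  return LineIndex(addr) % kNumSets;
}

// Two addresses on the same line: the classic false-sharing situation.
constexpr bool SameLine(std::uintptr_t a, std::uintptr_t b) {
  return LineIndex(a) == LineIndex(b);
}

// Two different lines that compete for the same set.
constexpr bool SameSet(std::uintptr_t a, std::uintptr_t b) {
  return SetIndex(a) == SetIndex(b);
}
```

Under this assumed geometry, addresses exactly 4 KiB apart (say `0x1000` and `0x2000`) land in the same set even though they are on different lines, while nudging one of them by a single line (`0x1040`) moves it to a different set. That is the kind of relationship a global could gain or lose when unrelated changes, like longer embedded strings, shift the executable's layout.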
I dunno. This stuff is weird. Modern CPUs are complicated.
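For anyone unfamiliar with the pattern, here is a minimal, self-contained sketch of the kind of change described at the top: fetching the config once before the loop and passing it into the loop body as a parameter, instead of calling the getter inside the loop. `Config`, `GetConfig()`, `g_config_instance` and `RunLoop()` are hypothetical stand-ins for illustration, not Dolphin's actual `SConfig` or Fifo code.

```cpp
#include <cassert>

// Hypothetical stand-in for a settings object like SConfig.
struct Config {
  int work_per_iteration = 3;
};

// Hypothetical global singleton: the pointer lives in memory (like
// SConfig::m_Instance), so reading it is a memory load.
Config g_config_storage;
Config* g_config_instance = &g_config_storage;

Config& GetConfig() { return *g_config_instance; }

// Hypothetical loop driver: calls body(args...) `iterations` times.
template <typename Body, typename... Args>
void RunLoop(int iterations, Body body, const Args&... args) {
  for (int i = 0; i < iterations; ++i)
    body(args...);
}

// Before: the getter runs inside the loop body, so each iteration may
// re-read g_config_instance from memory.
int RunReloading(int iterations) {
  int total = 0;
  RunLoop(iterations, [&] {
    const Config& config = GetConfig();
    total += config.work_per_iteration;
  });
  return total;
}

// After: the getter runs once, before the loop, and the result is
// passed to the loop body as a parameter.
int RunHoisted(int iterations) {
  int total = 0;
  const Config& config = GetConfig();
  RunLoop(
      iterations,
      [&total](const Config& cfg) { total += cfg.work_per_iteration; },
      config);
  return total;
}
```

Both functions compute the same result; the only difference is where the load of the global pointer happens. In this toy version an optimizer can see everything and may hoist the load itself, but in a real loop body that calls functions the compiler can't see into, it generally has to assume the global may have changed and reload it on each iteration.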