Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jit64 codegen space reuse. #8765

Merged
merged 3 commits into from Sep 9, 2020
Merged

Conversation

AdmiralCurtiss
Copy link
Contributor

@AdmiralCurtiss AdmiralCurtiss commented Apr 26, 2020

This makes some minor alterations to the Jit64 codegen so it keeps track of free memory in the near and far code cache and reuses it when possible. This is to avoid full cache clears, which are costly due to having to recompile everything after the clear.

To do this, I've done the following things:

  • The x64Emitter now knows the end of the memory region it's allowed to write into, and checks this. If a write is requested past the end, it does not actually write and instead sets a failure flag.
  • Jit64::Jit() checks this flag before finalizing a Jit block. If it is set, it invokes a cache clear and retries. This replaces the 'early' cache clear when we're almost out of space, since we now detect when we actually are.
  • Every JIT block now knows its own code area in the near and far code. When a JIT block is invalidated, these areas are marked as free-to-use for the code generation again.
  • Before every code generation, we find the largest free near and far code blocks, and generate at that place.

And that's about it, fairly simple really. I'm somewhat surprised this worked out as well as it did, to be honest. That said,

Known issues:

  • Performance in the code emitter is probably slightly worse due to the bounds checking. We should test if this is actually noticeable in practice. If so, we can try to write a variant of this without the bounds checking, which I think is possible but definitely feels sketchier. Although no large-scale testing has been done, this seems to not actually be a problem in practice according to JMC47's and my testing.
  • It's entirely possible, and probably even likely, that this runs into severe memory fragmentation over time. I don't think this is fixable without making code relocatable. As such, this will still eventually invoke a cache clear due to lack of a singular large block of code space.

@JMC47
Copy link
Contributor

JMC47 commented Apr 28, 2020

So, after some more exclusive testing in a few games, this does improve the situation a bit. It isn't a complete fix, but it does make code flushing a lot rarer in games like True Crime: New York and N64 VC games.

However, I haven't done a great deal of general performance testing (this is probably slower?) and I think it'd be worth getting more people to do tests than just me, so we can see how it impacts more games.

@AdmiralCurtiss
Copy link
Contributor Author

AdmiralCurtiss commented Apr 28, 2020

Fully agreed. I'd recommend waiting on performance tests though, since at the very least the current rangeset needs to be changed, which could and probably will impact that.

edit: Rangeset has been replaced, feel free to test.

Copy link
Member

@lioncash lioncash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a very cursory review

Source/Core/Core/PowerPC/JitCommon/JitBase.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/Jit64/rangeset.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/Jit64/rangeset.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/JitCommon/JitBase.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/JitCommon/JitBase.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/JitCommon/JitBase.h Outdated Show resolved Hide resolved
@AdmiralCurtiss AdmiralCurtiss force-pushed the jit-reuse-memory branch 3 times, most recently from 68506a3 to b14d13c Compare May 1, 2020 23:34
@AdmiralCurtiss
Copy link
Contributor Author

AdmiralCurtiss commented May 1, 2020

Alright, so I've replaced the rangeset with a custom one that keeps track of the sizes of the ranges to minimize the time spent searching for the largest free block -- see https://github.com/AdmiralCurtiss/rangeset/blob/master/rangesizeset.h

This still needs a bit of refactoring: Should probably put the rangeset into externals, need to make sure the ARM Jit still builds, also I'm not really happy with m_free_ranges_near etc being in the JitBase when only the 64bit one uses it. But in general I think this is now about where I want it to be.

If someone wants to do some performance testing, they can do so now.

@AdmiralCurtiss AdmiralCurtiss changed the title [DO NOT MERGE / WIP / RFC] Jit64 codegen space reuse. [RFC] Jit64 codegen space reuse. May 1, 2020
@autofire372
Copy link
Contributor

Tested a few N64 VC titles, all NTSC-U.

Super Mario 64 - 120 FPS ingame in master/no change in PR

Kirby 64 - 137 FPS ingame in master/no change in PR

Super Smash Bros. - crashes Dolphin with this PR.

Copy link
Member

@lioncash lioncash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What necessitates the x64 stuff leaking into the base class of the JIT, out of curiosity?

I also left some review comments on the range set code, though it may be moved over to the externals later.

Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
Source/Core/Common/RangeSizeSet.h Outdated Show resolved Hide resolved
@AdmiralCurtiss
Copy link
Contributor Author

re JitBase leak: It's been a week but I believe it was the additions in JitBaseBlockCache::DestroyBlock(). I guess we could make that virtual and override it in the Jit64's JitBlockCache instead?

@AdmiralCurtiss AdmiralCurtiss force-pushed the jit-reuse-memory branch 3 times, most recently from 08c0137 to bca3db0 Compare May 5, 2020 09:53
@AdmiralCurtiss
Copy link
Contributor Author

Alright, I've moved the rangeset to externals (along with fixes, appreciate the comments!) and this now builds on Android/ARM, although I have no good way to actually test if it still runs there. I've also moved the code around a bit to remove the dependency in the JitBaseBlockCache. I think I'll open this for general review.

@AdmiralCurtiss AdmiralCurtiss marked this pull request as ready for review May 5, 2020 10:02
@JMC47
Copy link
Contributor

JMC47 commented May 6, 2020

Super Smash Bros VC. works fine here, it may have the texture cache crash (at higher IRs, happens in master)

I tested:
Rogue Squadron II: no flushes, but crashed due to fifo issues
True Crime New York: Less Flushes, but still runs like shit.
Metal Gear Solid: Twin Snakes: No Flushes but MMU still can't be re-enabled due to ISI exceptions causing slowdowns that aren't there when MMU is off.
Mario Kart 64: No flushes
The Legend of Zelda: Ocarina of Time (VC): Less Flushes
The Legend of Zelda: Ocarina of Time MQ: Less Flushes
Metroid Prime: No Flushes, but I don't know how long I'm supposed to play. I played for a half hour and got bored of just going through doors.

Performance wise, I did not notice any significant drop in overall performance.

@@ -13,6 +15,11 @@ class JitBlockCache : public JitBaseBlockCache
public:
explicit JitBlockCache(JitBase& jit);

void DestroyBlock(JitBlock& block) override;

std::vector<std::pair<u8*, u8*>> m_ranges_to_free_on_next_codegen_near;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be private and instead have a class interface built around these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I change this, I'm looking at this again and I'm not 100% sure when to exactly clear those vectors. I would have just put the clear into JitBlockCache::Clear() but it looks like that can be called without the Jit64's Clear() being called, which could desync what the Jit64 thinks is free versus what actually is. Am I missing something here? Does it ever make sense to clear the block cache without clearing the entire Jit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I think I'm thinking about this in the wrong direction. A block cache clear should actually add all of the destroyed blocks to the ranges-to-free vectors, and then the next Jit() will pick that up correctly. I think that works, let's see...

Source/Core/Core/PowerPC/Jit64/Jit.h Outdated Show resolved Hide resolved
Source/Core/Common/x64Emitter.h Outdated Show resolved Hide resolved
Source/Core/Common/x64Emitter.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/Jit64/Jit.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/Jit64Common/BlockCache.cpp Outdated Show resolved Hide resolved
@AdmiralCurtiss AdmiralCurtiss force-pushed the jit-reuse-memory branch 2 times, most recently from db641dc to b62e130 Compare May 8, 2020 01:30
@AdmiralCurtiss AdmiralCurtiss changed the title [RFC] Jit64 codegen space reuse. Jit64 codegen space reuse. Jun 21, 2020
@Rumi-Larry
Copy link

The comments in the code lack capitalization and some have confusing grammar.

@AdmiralCurtiss
Copy link
Contributor Author

While I'm not sure I agree about the comments being problematic as-is, I suppose it wouldn't hurt to do another pass on them, and maybe add a few more for details.

Copy link

@Rumi-Larry Rumi-Larry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just submitted two suggestions, so it's clearer what I mean.

Externals/rangeset/include/rangeset/rangeset.h Outdated Show resolved Hide resolved
Externals/rangeset/include/rangeset/rangeset.h Outdated Show resolved Hide resolved
@delroth
Copy link
Member

delroth commented Jun 26, 2020

Please update Externals/licenses.md when adding a new external library.

Copy link
Member

@BhaaLseN BhaaLseN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems good for the most part, and the changes make sense to me. Untested though.

@JMC47 JMC47 merged commit a31c204 into dolphin-emu:master Sep 9, 2020
10 checks passed
@AdmiralCurtiss AdmiralCurtiss deleted the jit-reuse-memory branch September 9, 2020 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants