Jit64 codegen space reuse. #8765
7e2ce7f to 5f14f00
So, after some more exclusive testing in a few games, this does improve the situation a bit. It isn't a complete fix, but it does make code flushing a lot rarer in games like True Crime: New York and N64 VC games. However, I haven't done a great deal of general performance testing (this is probably slower?) and I think it'd be worth getting more people to do tests than just me, so we can see how it impacts more games.
Fully agreed. I'd recommend waiting on performance tests though, since at the very least the current edit: Rangeset has been replaced, feel free to test.
Left a very cursory review
68506a3 to b14d13c
Alright, so I've replaced the rangeset with a custom one that keeps track of the sizes of the ranges to minimize the time spent searching for the largest free block -- see https://github.com/AdmiralCurtiss/rangeset/blob/master/rangesizeset.h This still needs a bit of refactoring: Should probably put the rangeset into externals, need to make sure the ARM Jit still builds, also I'm not really happy with If someone wants to do some performance testing, they can do so now.
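To illustrate the idea of a range set that also indexes ranges by size, here is a minimal sketch. It is not the API of the linked rangesizeset.h: the class name `FreeRangeSet` and the methods `Insert`, `Erase`, `Largest` and `Count` are hypothetical, and offsets stand in for code-cache pointers. It assumes that keeping a second container ordered by size is what makes the largest-block lookup cheap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>
#include <set>
#include <utility>

// Hypothetical sketch: free ranges [start, end) indexed by start and by size.
class FreeRangeSet
{
public:
  // Marks [from, to) as free, merging with touching or overlapping ranges.
  void Insert(std::size_t from, std::size_t to)
  {
    assert(from < to);
    auto next = m_by_start.lower_bound(from);
    if (next != m_by_start.begin())
    {
      auto prev = std::prev(next);
      if (prev->second >= from)
      {
        from = prev->first;
        to = std::max(to, prev->second);
        RemoveEntry(prev);  // erasing prev leaves `next` valid
      }
    }
    while (next != m_by_start.end() && next->first <= to)
    {
      to = std::max(to, next->second);
      next = RemoveEntry(next);
    }
    AddEntry(from, to);
  }

  // Marks [from, to) as used. Assumes it lies inside a single free range.
  void Erase(std::size_t from, std::size_t to)
  {
    assert(from < to);
    auto it = m_by_start.upper_bound(from);
    assert(it != m_by_start.begin());
    --it;
    const std::size_t start = it->first;
    const std::size_t end = it->second;
    assert(start <= from && to <= end);
    RemoveEntry(it);
    if (start < from)
      AddEntry(start, from);
    if (to < end)
      AddEntry(to, end);
  }

  // Returns the largest free range via the size index, without a scan.
  std::optional<std::pair<std::size_t, std::size_t>> Largest() const
  {
    if (m_by_size.empty())
      return std::nullopt;
    const auto& [size, start] = *m_by_size.rbegin();
    return std::make_pair(start, start + size);
  }

  std::size_t Count() const { return m_by_start.size(); }

private:
  using StartMap = std::map<std::size_t, std::size_t>;

  void AddEntry(std::size_t from, std::size_t to)
  {
    m_by_start.emplace(from, to);
    m_by_size.emplace(to - from, from);
  }

  StartMap::iterator RemoveEntry(StartMap::iterator it)
  {
    m_by_size.erase(std::make_pair(it->second - it->first, it->first));
    return m_by_start.erase(it);
  }

  StartMap m_by_start;                                      // start -> end
  std::set<std::pair<std::size_t, std::size_t>> m_by_size;  // (size, start)
};
```

In this sketch every insert and erase updates both containers, so the cost of finding the largest block moves from lookup time to update time.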
Tested a few N64 VC titles, all NTSC-U.
- Super Mario 64 - 120 FPS ingame in master/no change in PR
- Kirby 64 - 137 FPS ingame in master/no change in PR
- Super Smash Bros. - crashes Dolphin with this PR.
What necessitates the x64 stuff leaking into the base class of the JIT, out of curiosity?
I also left some review comments on the range set code, though it may be moved over to the externals later.
re JitBase leak: It's been a week but I believe it was the additions in
08c0137 to bca3db0
Alright, I've moved the rangeset to externals (along with fixes, appreciate the comments!) and this now builds on Android/ARM, although I have no good way to actually test if it still runs there. I've also moved the code around a bit to remove the dependency in the JitBaseBlockCache. I think I'll open this for general review.
bca3db0 to f34d980
Super Smash Bros. VC works fine here; it may have the texture cache crash (at higher IRs, happens in master). I tested: Performance-wise, I did not notice any significant drop in overall performance.
@@ -13,6 +15,11 @@ class JitBlockCache : public JitBaseBlockCache
public:
  explicit JitBlockCache(JitBase& jit);

  void DestroyBlock(JitBlock& block) override;

  std::vector<std::pair<u8*, u8*>> m_ranges_to_free_on_next_codegen_near;
These should be private and instead have a class interface built around these.
Before I change this, I'm looking at this again and I'm not 100% sure when to exactly clear those vectors. I would have just put the clear into JitBlockCache::Clear() but it looks like that can be called without the Jit64's Clear() being called, which could desync what the Jit64 thinks is free versus what actually is. Am I missing something here? Does it ever make sense to clear the block cache without clearing the entire Jit?
Wait, I think I'm thinking about this in the wrong direction. A block cache clear should actually add all of the destroyed blocks to the ranges-to-free vectors, and then the next Jit() will pick that up correctly. I think that works, let's see...
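As a rough illustration of the two points above (keeping the vectors private behind a class interface, and having destroyed blocks queue their ranges for the next codegen to pick up), here is a minimal sketch. The class `PendingFreeList` and its methods `Add`, `Drain` and `Empty` are hypothetical names, not what the PR ended up with; the assumption is that `DestroyBlock()` would call something like `Add`, and the next `Jit()` would call something like `Drain` to hand the ranges to the free-space tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using u8 = std::uint8_t;

// Hypothetical sketch: ranges queued by block destruction, consumed by the
// next codegen. The vector itself stays private.
class PendingFreeList
{
public:
  // Called when a block is destroyed, including during a block cache clear.
  void Add(u8* begin, u8* end)
  {
    assert(begin <= end);
    if (begin != end)
      m_ranges.emplace_back(begin, end);
  }

  // Called at the start of the next codegen: hands every queued range to
  // `release` (for example, a free-space tracker) and empties the queue.
  template <typename Release>
  void Drain(Release&& release)
  {
    for (const auto& [begin, end] : m_ranges)
      release(begin, end);
    m_ranges.clear();
  }

  bool Empty() const { return m_ranges.empty(); }

private:
  std::vector<std::pair<u8*, u8*>> m_ranges;
};
```

With this shape, a block cache clear that destroys every block simply queues every range, so the block cache and the JIT's view of free space cannot drift apart between the clear and the next compile.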
db641dc to b62e130
The comments in the code lack capitalization and some have confusing grammar.
While I'm not sure I agree about the comments being problematic as-is, I suppose it wouldn't hurt to do another pass on them, and maybe add a few more for details.
b62e130 to 91cd762
I just submitted two suggestions, so it's clearer what I mean.
Please update Externals/licenses.md when adding a new external library.
91cd762 to 0bba137
0bba137 to 2e028a6
Code seems good for the most part, and the changes make sense to me. Untested though.
2e028a6 to 306a5e6
This makes some minor alterations to the Jit64 codegen so it keeps track of free memory in the near and far code cache and reuses it when possible. This is to avoid full cache clears, which are costly due to having to recompile everything after the clear.
To do this, I've done the following things:
Jit64::Jit() checks this flag before finalizing a Jit block. If it is set, it invokes a cache clear and retries. This replaces the 'early' cache clear when we're almost out of space, since we now detect when we actually are.
And that's about it, fairly simple really. I'm somewhat surprised this worked out as well as it did, to be honest. That said,
Known issues:
Performance in the code emitter is probably slightly worse due to the bounds checking. We should test if this is actually noticeable in practice. If so, we can try to write a variant of this without the bounds checking, which I think is possible but definitely feels sketchier.
Although no large-scale testing has been done, this seems to not actually be a problem in practice according to JMC47's and my testing.
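For readers who want a picture of the bounds-checked emitter and the check-then-retry flow described for Jit64::Jit(), here is a minimal sketch. Everything in it is hypothetical: `BoundedEmitter`, `Write8`, `HasWriteFailed`, `ClearAll` and `EmitBlockWithRetry` are illustrative names and are not Dolphin's actual emitter or Jit64 code. It assumes a write past the end of the region sets a flag instead of writing, and that the caller clears the cache and retries once when the flag is set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of an emitter that refuses out-of-bounds writes and
// records the failure in a flag.
class BoundedEmitter
{
public:
  explicit BoundedEmitter(std::size_t capacity) : m_buffer(capacity) {}

  void Write8(std::uint8_t value)
  {
    if (m_pos >= m_buffer.size())
    {
      m_write_failed = true;  // out of space: remember it, write nothing
      return;
    }
    m_buffer[m_pos++] = value;
  }

  bool HasWriteFailed() const { return m_write_failed; }
  std::size_t Position() const { return m_pos; }
  int ClearCount() const { return m_clear_count; }

  // Stands in for a full code cache clear.
  void ClearAll()
  {
    m_pos = 0;
    m_write_failed = false;
    ++m_clear_count;
  }

private:
  std::vector<std::uint8_t> m_buffer;
  std::size_t m_pos = 0;
  bool m_write_failed = false;
  int m_clear_count = 0;
};

// Emits `code`; if the emitter ran out of space, clears and retries once.
// Returns false if the block does not fit even into an empty cache.
bool EmitBlockWithRetry(BoundedEmitter& emitter, const std::vector<std::uint8_t>& code)
{
  for (int attempt = 0; attempt < 2; ++attempt)
  {
    for (const std::uint8_t byte : code)
      emitter.Write8(byte);
    if (!emitter.HasWriteFailed())
      return true;  // safe to finalize the block
    emitter.ClearAll();
  }
  return false;
}
```

The per-write branch in `Write8` is the kind of overhead the known-issues note refers to; the sketch makes no claim about how large that cost is in practice.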