Implement PowerPC data cache #11183
|
I can test the DCBZ low games, will give it a try. Edit: Can confirm this allows Disney Infinity and Cars 2 to run without the DCBZ low hack, though performance is a bit problematic (not that these games ever run well). Honestly, enabling it doesn't make them run much worse than the hack does, which is rather surprising. Overall, it's between 10 and 30% slower than the hack, which isn't nearly as catastrophic as I expected. Great job! |
|
Good work! I haven't reviewed this in detail yet, but I want to point out the existence of a few related PRs on my end: #10818 (which fixes an accuracy bug with I also see that you removed the comments relating to L2 cache emulation, but I'm not sure if you've actually implemented the L2 cache, or only the L1 data cache. It looks like you did implement the locked L1 data cache at least (though I haven't looked over that in detail). |
I just tested removing the lookup table to see if the slight performance boost still applies to the data cache and it looks like it doesn't, at least on my end. Just testing with Mario Kart Wii, where one of the most CPU intensive things on Dolphin is the THP video processing. The main menu went from a consistent 45% to 35% speed.
I didn't implement the L2 cache. I don't see a situation where it would really be necessary once the L1 cache is emulated, but I could be wrong.
I don't think I touched anything related to that actually, that's just something that already existed in Dolphin |
I think some of the other refactoring in my PR made up for the difference, but I'm not 100% sure. I'll need to experiment at some point.
I don't think there's any situation where it matters apart from trying to accurately handle timing for cache hits/misses, which probably isn't worth it. It still would be useful to have TODO comments though.
You have the Dolphin does have a hack where it maps extra memory that games can use for the locked cache since they tend to have it at a specific fake address ( |
Just a few minor things
|
Because 3/6 known dcache sensitive titles only use the dcache at startup, is it possible we could make it so this could be enabled/disabled during emulation, or perhaps through using savestates? |
|
Disabling through save states works, it flushes the cache on load if the setting is disabled |
|
Interesting, I tried that and performance was still really bad. |
|
Okay, so performance does improve when I disable dcache, it's just that the mod performs way worse than the vanilla game even in the menus for some odd reason. It might be worth investigation as I'm not entirely sure what's different. |
While I can't exactly review the code itself, I tested the 6 games I know are dcache games and they all work without patches with this so far. Performance is pretty bad, but hey, I thought it was going to be way worse.
I don't think most casual users will appreciate the feature nearly as much and we probably don't need to rush to add this to the Android GUI where CPU power is a bit more limited.
|
Tested a few more games with "unknown" issues to see if dcache was the problem; unfortunately it wasn't. The good news is that dcache support works and I get more testing on it. The two games I tested were Active Life: Magical Carnival and Summoner: A Goddess Reborn. Neither game benefited from dcache emulation and their long-standing crashes remain. Just noting that to document it was tested. |
|
Still needs a rebase due to state.cpp conflicts. |
|
I GOT ONE! Ten Pin Alley 2 is our first newly discovered dcache game! It crashes with tons of problems, but with write back cache enabled, it runs fine (albeit slowly.) Unfortunately, the trick for turning off the dcache to speed things up won't work here - it'll crash as soon as you get to the next menu transition. |
|
I'm not sure exactly how to proceed, but I do think some thought should be given to the games that have dcache audio issues and currently use patches (Dead to Rights and, I think, Resident Evil). Obviously the patches are faster, but if we have an option to not patch the game then I'm in favor of that, if not as the default then at least documented somewhere. (Maybe in the .ini but commented out?) |
|
We have proper emulation and patches. This isn't an either or situation. We can use the patches for performance/user reasons, but have accurate emulation for games that can't be patched or for if they want to run things accurately. |
|
With Write-Back Cache enabled, Resident Evil 3 runs at 40~70% speed in-game and at 20% speed in FMVs on my MacBook Pro 16" (2019). This is a direct port of a PS1 game and I can't even run it at full speed! Given the rather large performance loss, I'm highly against having dcache on by default instead of the patches. Perhaps the wiki would be a good place to document this for curious users. |
|
Yeah, dcache is way too slow to enable by default, unfortunately. It's not that much slower in the Disney games, but the other games are hit really hard. |
|
I would prefer it if 562aa89 were reverted, assuming that properly emulating For context, Super Mario Sunshine uses |
|
I ran tests on hardware and I couldn't find a situation where the dcbt/dcbtst instructions actually do anything, so I think it's more accurate to always ignore the instruction |
|
Regarding the instructions doing nothing, do you have any HW tests to show its behaviors? We can always add simple ones to the repo in order to make sure things stay accurate in the future. |
|
I'd like to see this finished and merged in, seems like it would be a very useful addition to the emulator. |
|
It would be, but there are currently some open questions around the dcbt/dcbtst instructions. From what I saw in IRC, and please understand I don't understand anything about this, it needs to be prefetched or something to work. |
|
The idea afaik is that fetching into the data cache when emulating dcbt and dcbtst would be more accurate to hardware, as they are instructions that update it. However, TheLordScruffy found through testing that it doesn't always make a difference on hardware, and thus it would be more accurate to simply ignore the instructions (as this is what happens the majority of the time?). I should also state I don't know much about what's going on here, but I think it'd be nice to verify these findings quickly. |
|
I would prefer that we leave it as is for this, and maybe split it off into a separate pull request. That way, we can discuss that change alone there. Right now, there are hardware experts who say that ignoring it is wrong. Perhaps we need to do more hardware testing on it in the future. Dolphin does have a HW testing repo - if HW tests proved the instructions didn't do anything then I think it'd be fair to disable them. But finding out which situations they do or don't work in would be a lot of work. I hope this makes sense; I am only explaining this as a bit of a fly on the wall - I am not a hardware expert or someone who really understands low-level CPU stuff. |
|
I personally think that is a great plan. This PR as it is already shows undeniable improvements to the emulation accuracy, with or without the dcb(s)t changes. Having the controversial changes made into its own PR would also allow a more proper focus on the issue. If we could do that I think it will be much easier to resolve. |
|
Has anything more happened with this? |
|
I looked over it and, although I'm not particularly familiar with the intricacies of the PPC data cache, it looks fine to me. The commit history is a bit messy and there's a merge conflict (my bad), but other than that, I think we should merge this after the next beta. Any objections? @TheLordScruffy can you rebase this on master and fix the conflicts? |
Why not before the next beta? Does it just need more user testing? |
|
Beta builds are the ones associated with progress reports (supposedly monthly, in practice less frequent than that) and are generally supposed to be stable, ideally with all of the big changes having been in them for a little while so that users on the dev builds can find the bugs ahead of time. I still want to look over this in more detail, but I'm not sure when I'll have time. |
a76719c to e97d380
|
Should be okay now |
|
What's the status here? Seems like there has been no reaction to the comments from ~2 weeks ago. |
|
I'll get to it soon |
|
@Pokechu22 Can you re-check this? I'd like to get this merged sometime... |
Mostly looks good. I have some thoughts on the way you handled tags vs addrs, but they're fairly minor.
I'm fine with leaving modified as a field in icache even if it's unused; I don't think there's any clean way to avoid it and it's not that much data.
I still would like feedback from @delroth on the removed icache analytics (#11183 (comment)).
|
Delroth has no objections to removing the icache-matters game quirk. So, for #11183 (comment) you should just need to remove the first entry here: dolphin/Source/Core/Core/DolphinAnalytics.h Lines 24 to 26 in dcded04
Mark dolphin/Source/Core/Core/DolphinAnalytics.cpp Lines 137 to 139 in dcded04
and set the size to 27. I'm pretty sure the numeric values are only used inside of Dolphin and aren't used by the actual analytics system. |
The previous code only updated the PLRU on cache misses, which meant that the least recently inserted cache block was evicted instead of the least recently used/hit one. This regressed in 9d39647 (part of dolphin-emu#11183, but it was fine in e97d380), although beforehand it was only implemented for the instruction cache, and the instruction cache is hit extremely infrequently when the JIT or cached interpreter is in use, which generally keeps it from behaving correctly (the pure interpreter behaves correctly with it). I'm not aware of any games that are affected by this, though I did not do extensive testing.
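To illustrate the difference between updating the PLRU only on insertion and also updating it on hits, here is a minimal sketch of a tree-based pseudo-LRU for a single 8-way set. The `PlruSet` type, its bit layout and the `Touch`/`Victim` names are assumptions made for this example; it is not Dolphin's actual implementation or its lookup-table encoding.

```cpp
#include <array>
#include <cassert>

// Hypothetical tree-PLRU state for one 8-way cache set (illustration only).
// The 7 nodes form a binary tree stored heap-style: node i has children
// 2*i+1 and 2*i+2. false = next victim is in the left subtree, true = right.
struct PlruSet
{
  std::array<bool, 7> node{};

  // Mark `way` as most recently used: every node on the path to it is made
  // to point at the opposite subtree.
  void Touch(unsigned way)
  {
    assert(way < 8);
    unsigned idx = 0;
    for (int level = 2; level >= 0; --level)
    {
      const bool went_right = ((way >> level) & 1u) != 0;
      node[idx] = !went_right;
      idx = 2 * idx + 1 + (went_right ? 1u : 0u);
    }
  }

  // Follow the node bits from the root to find the way to evict next.
  unsigned Victim() const
  {
    unsigned idx = 0;
    unsigned way = 0;
    for (int level = 0; level < 3; ++level)
    {
      const bool go_right = node[idx];
      way = (way << 1) | (go_right ? 1u : 0u);
      idx = 2 * idx + 1 + (go_right ? 1u : 0u);
    }
    return way;
  }
};
```

In this sketch, filling ways 0 through 7 in order and calling `Touch` only on each insertion leaves way 0 as the victim, even if way 0 is then read repeatedly. If the hit on way 0 also calls `Touch(0)`, the victim moves to way 4, which is the behaviour the fix restores.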
PR dolphin-emu#11183 regressed the lookup table reconstruction and, for some reason, added an else clause that clobbered the dCache whenever dCache emulation is turned on.
This adapts the instruction cache implementation used by the interpreter into a general-purpose PPC cache, making it possible to emulate the data cache. Due to the very negative impact on performance, I made it an option in Config > Advanced that defaults to off.
Enabling it makes Dolphin compatible with CTGP Revolution, and also makes older versions of CTGP (which normally work on Dolphin) more stable by properly emulating a mistake they rely on, where an instruction patch does not immediately apply because the data cache has not been flushed. In addition, fixes to the IABR and UPMC registers were made to allow CTGP to function properly.
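The "patch does not immediately apply" behaviour can be pictured with a toy write-back model, sketched below. Everything here (`ToyWriteBackCache`, `FetchInstruction`, word-granular dirty tracking) is hypothetical and greatly simplified compared to the real cache or this PR's code; it only shows why a store that stays in the data cache is invisible to instruction fetch until it is flushed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy model: a write-back data cache in front of RAM, tracking
// dirty data per word instead of per 32-byte line (illustration only).
class ToyWriteBackCache
{
public:
  explicit ToyWriteBackCache(std::vector<std::uint32_t>& ram) : m_ram(ram) {}

  // Data-side store: the value stays in the cache and RAM keeps the old word.
  void Store(std::size_t index, std::uint32_t value)
  {
    assert(index < m_ram.size());
    m_dirty[index] = value;
  }

  // Data-side load: sees the cached value if there is one.
  std::uint32_t Load(std::size_t index) const
  {
    const auto it = m_dirty.find(index);
    return it != m_dirty.end() ? it->second : m_ram[index];
  }

  // Write every dirty word back to RAM, which is roughly what a game is
  // expected to do (for example with dcbst or dcbf) after patching code.
  void Flush()
  {
    for (const auto& entry : m_dirty)
      m_ram[entry.first] = entry.second;
    m_dirty.clear();
  }

private:
  std::vector<std::uint32_t>& m_ram;
  std::unordered_map<std::size_t, std::uint32_t> m_dirty;
};

// Assumption for this sketch: instruction fetch reads RAM directly and never
// looks inside the data cache.
inline std::uint32_t FetchInstruction(const std::vector<std::uint32_t>& ram, std::size_t index)
{
  return ram[index];
}
```

With RAM holding a `nop` (0x60000000), storing a `blr` (0x4E800020) over it through the cache changes what `Load` returns, but `FetchInstruction` keeps returning the `nop` until `Flush` runs. Without data cache emulation the store would reach RAM immediately, so the patch would take effect earlier than the game expects.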
This should also make the dcbz hack preventing regions (0x80000000 - 0x80008000) unnecessary for the few games that do this, so enabling the data cache disables that hack automatically. Someone with a copy of one of these games will need to test this, as I could only simulate it. I also have not yet tested the ARM JIT to see whether it works at all with the setting on.