Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PowerPC data cache #11183

Merged
merged 5 commits into from Jan 9, 2023

Conversation

mkwcat
Copy link
Contributor

@mkwcat mkwcat commented Oct 20, 2022

This adapts the instruction cache implementation used by the interpreter into a general purpose PPC cache, enabling the ability to emulate the data cache. Due to the very negative impact on performance, I made it an option in Config > Advanced that defaults to off.

Enabling will make Dolphin compatible with CTGP Revolution, as well as make older versions of CTGP (that normally work on Dolphin) more stable by properly emulating a mistake it relies on, where an instruction patch does not immediately apply due to the data cache not being flushed. In addition, fixes to the IABR and UPMC registers were made to allow CTGP to function properly.

This should also make the dcbz hack preventing regions (0x80000000 - 0x80008000) unnecessary for the few games that do this, so it disables it automatically, but someone with a copy of one of these games will need to test this as I could only simulate it. I also have not yet tested JIT ARM to see if it works at all with the setting on.

@JMC47
Copy link
Contributor

JMC47 commented Oct 20, 2022

I can test the DCBZ low games, will give it a try.

Edit: Can confirm this allows Disney Infinity and Cars 2 to run without the DCBZ low hack, however performance is a bit problematic. Not to say the games run well ever. Honestly enabling it doesn't cause them to run much worse than the hack, which is rather surprising.

Overall, versus the hack it's between 10 and 30%... which isn't nearly as catastrophic as I expected. Great job!

@Pokechu22
Copy link
Contributor

Good work! I haven't reviewed this in detail yet, but I want to point out the existence of a few related PRs on my end: #10818 (which fixes an accuracy bug with icbi), and #10826 (which does some other cleanup in PPCCache.cpp, but which I didn't complete due to uncertainty of how to avoid creating new challenges for dcache).

I also see that you removed the comments relating to L2 cache emulation, but I'm not sure if you've actually implemented the L2 cache, or only the L1 data cache. It looks like you did implement the locked L1 data cache at least (though I haven't looked over that in detail).

@mkwcat
Copy link
Contributor Author

mkwcat commented Oct 20, 2022

Good work! I haven't reviewed this in detail yet, but I want to point out the existence of a few related PRs on my end: #10818 (which fixes an accuracy bug with icbi), and #10826 (which does some other cleanup in PPCCache.cpp, but which I didn't complete due to uncertainty of how to avoid creating new challenges for dcache).

I just tested removing the lookup table to see if the slight performance boost still applies to the data cache and it looks like it doesn't, at least on my end. Just testing with Mario Kart Wii, where one of the most CPU intensive things on Dolphin is the THP video processing. The main menu went from a consistent 45% to 35% speed.

I also see that you removed the comments relating to L2 cache emulation, but I'm not sure if you've actually implemented the L2 cache, or only the L1 data cache.

I didn't implement the L2 cache, but I don't see a situation where that really would be necessary with L1 cache emulation, but I could be wrong

It looks like you did implement the locked L1 data cache at least (though I haven't looked over that in detail).

I don't think I touched anything related to that actually, that's just something that already existed in Dolphin

@Pokechu22
Copy link
Contributor

I just tested removing the lookup table to see if the slight performance boost still applies to the data cache and it looks like it doesn't, at least on my end. Just testing with Mario Kart Wii, where one of the most CPU intensive things on Dolphin is the THP video processing. The main menu went from a consistent 45% to 35% speed.

I think some of the other refactoring in my PR made up for the difference, but I'm not 100% sure. I'll need to experiment at some point.

I didn't implement the L2 cache, but I don't see a situation where that really would be necessary with L1 cache emulation, but I could be wrong

I don't think there's any situation where it matters apart from trying to accurately handle timing for cache hits/misses, which probably isn't worth it. It still would be useful to have TODO comments though.

I don't think I touched anything related to that actually, that's just something that already existed in Dolphin

You have the bool locked parameter on GetCache, though I'm not sure if you implemented it correctly; it looks like you treat locked as "no modifications allowed". On further investigation, this is HID0[DLOCK], described in section 3.4.1.3 of the ppc 750cl manual. What I was thinking of is HID2[LCE], which splits the data cache into 2 partitions of 4 ways, described in section 9.2 of the manual. Both of these seem to be referred to as "locked", which is confusing.

Dolphin does have a hack where it maps extra memory that games can use for the locked cache since they tend to have it at a specific fake address (0xE0000000). We probably don't need to replace that hack, but it might be nice to have a correct implementation anyways.

Copy link
Member

@lioncash lioncash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor things

Source/Core/Core/PowerPC/MMU.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/MMU.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/MMU.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Outdated Show resolved Hide resolved
@JMC47
Copy link
Contributor

JMC47 commented Oct 20, 2022

Because 3/6 known dcache sensitive titles only use the dcache at startup, is it possible we could make it so this could be enabled/disabled during emulation, or perhaps through using savestates?

@mkwcat
Copy link
Contributor Author

mkwcat commented Oct 20, 2022

Disabling through save states works, it flushes the cache on load if the setting is disabled

@JMC47
Copy link
Contributor

JMC47 commented Oct 20, 2022

interesting, I tried that and performance was still really bad.

@JMC47
Copy link
Contributor

JMC47 commented Oct 21, 2022

Okay, so performance does improve when I disable dcache, it's just that the mod performs way worse than the vanilla game even in the menus for some odd reason. It might be worth investigation as I'm not entirely sure what's different.

Copy link
Contributor

@JMC47 JMC47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I can't exactly review the code itself, I tested the 6 games I know are dcache games and they all work without patches with this so far. Performance is pretty bad, but hey, I thought it was going to be way worse.

I don't think most casual users will appreciate the feature nearly as much and we probably don't need to rush to add this to the Android GUI where CPU power is a bit more limited.

@JMC47
Copy link
Contributor

JMC47 commented Oct 22, 2022

Tested a few more games with "unknown" issues to see if dcache was the problem, unfortunately it wasn't. The good news is that dcache support works and I get more testing on it. The two games I tested was Active Life: Magical Carnival and Summoner: A Goddess Reborn. Neither game benefited from dcache emulation and their long standing crashes remain. Just noting that to document it was tested.

@JMC47
Copy link
Contributor

JMC47 commented Oct 23, 2022

Still needs a rebase due to state.cpp conflicts.

@JMC47
Copy link
Contributor

JMC47 commented Oct 24, 2022

I GOT ONE!

Ten Pin Alley 2 is our first newly discovered dcache game! It crashes with tons of problems, but with write back cache enabled, it runs fine (albeit slowly.) Unfortunately, the trick for turning off the dcache to speed things up won't work here - it'll crash as soon as you get to the next menu transition.

@Zopolis4
Copy link
Contributor

I'm not sure exactly how to proceed, but I do think some thought should be given to the games that have dcache audio issues and currently use patches (Dead to rights and I think Resident Evil). Obviously the patches are faster, but if we have an option to not patch the game then I'm in favor of that, if not as the default, but certainly documented somewhere. (Maybe in the .ini but commented out?)

@JMC47
Copy link
Contributor

JMC47 commented Oct 24, 2022

We have proper emulation and patches. This isn't an either or situation. We can use the patches for performance/user reasons, but have accurate emulation for games that can't be patched or for if they want to run things accurately.

@OatmealDome
Copy link
Member

With Write-Back Cache enabled, Resident Evil 3 runs at 40~70% speed in-game and at 20% speed in FMVs on my MacBook Pro 16" (2019). This is a direct port of a PS1 game and I can't even run it at full speed!

Given the rather large performance loss, I'm highly against having dcache on by default instead of the patches.

Perhaps the wiki would be a good place to document this for curious users.

@JMC47
Copy link
Contributor

JMC47 commented Oct 24, 2022

yeah, dcache is way too slow to enable unfortunately. It's not that much slower in Disney games, but the other games are hit really hard.

dvessel added a commit to dvessel/dolphin that referenced this pull request Oct 24, 2022
@Pokechu22
Copy link
Contributor

I would prefer it if 562aa89 were reverted, assuming that properly emulating dcbt/dcbtst doesn't have a significant performance cost (it looks like you disabled them only because there was no gain from supporting them). If there is a significant cost, perhaps keep them in the interpreter but comment out the FALLBACK_IF(m_accurate_cpu_cache_enabled); in the JITs.

For context, Super Mario Sunshine uses dcbt in __LCEnable (which enables the locked L1 cache), as part of a sequence of dcbt and dcbst instructions from address 80000000 to 80008000, and it calls __LCEnable from LCEnable from THPPlayer::THPPlayerInit (i.e. it's used in FMVs). Kirby's Return To Dream Land similarly uses __LCEnable in G3dInit__Q24nw4r3g3dFb (I think), and also in memcpy a bit before it reads (dcbst is not used in memcpy). So the instruction does appear in some places, at least.

@mkwcat
Copy link
Contributor Author

mkwcat commented Oct 24, 2022

I ran tests on hardware and I couldn't find a situation where the dcbt/dcbtst instructions actually do anything, so I think it's more accurate to always ignore the instruction

dvessel added a commit to dvessel/dolphin that referenced this pull request Oct 25, 2022
@JMC47
Copy link
Contributor

JMC47 commented Oct 27, 2022

Regarding the instructions doing nothing, do you have any HW tests to show its behaviors? We can always add simple ones to the repo in order to make sure things stay accurate in the future.

@JoshuaMKW
Copy link
Contributor

I'd like to see this finished and merged in, seems like it would be a very useful addition to the emulator.

@JMC47
Copy link
Contributor

JMC47 commented Nov 2, 2022

It would be, but there's currently a bit of questions around the dcbt/dcbtst instructions. From what I saw in IRC, and please understand I don't understand anything about this, it needs to be prefetched or something to work.

@JoshuaMKW
Copy link
Contributor

JoshuaMKW commented Nov 2, 2022

The idea afaik is that fetching the data cache when emulating dcbt and dcbst would be more accurate to hardware as they are instructions that update the it. However TheLordScruffy found through testing that it doesn't always make a difference on hardware and thus would be more accurate to simply ignore the instructions (as this is what happens the majority of the time?). I should also state I don't know much about what's going on here but I think it'd be nice to verify these findings quickly.

@JMC47
Copy link
Contributor

JMC47 commented Nov 2, 2022

I would prefer that we leave it as be for this, and maybe split it off into a separate pull request then? That way, we can discuss that change alone there. Right now, there are hardware experts that say that ignoring it is wrong.

Perhaps we need to do more hardware testing on it in the future. Dolphin does have a HW testing repo - if HW tests proved the instructions didn't do anything then I think it'd be fair to disable them. But to find what situations they did/didn't work in would be a lot of work.

I hope this makes sense, I am only explaining this as a bit of a fly on the wall - I am not a hardware expert or someone who really understands low level CPU stuff.

@JoshuaMKW
Copy link
Contributor

I personally think that is a great plan. This PR as it is already shows undeniable improvements to the emulation accuracy, with or without the dcb(s)t changes. Having the controversial changes made into its own PR would also allow a more proper focus on the issue. If we could do that I think it will be much easier to resolve.

@Hibyehello
Copy link
Contributor

Has anything more happened with this?

@AdmiralCurtiss
Copy link
Contributor

I looked over it and, although I'm not particularly familiar with the intricacies of the PPC data cache, it looks fine to me. The commit history is a bit messy and there's a merge conflict (my bad), but other than that, I think we should merge this after the next beta.

Any objections? @TheLordScruffy can you rebase this on master and fix the conflicts?

@cobalt2727
Copy link

after the next beta

why not before the next beta? just needs more user testing?

@Pokechu22
Copy link
Contributor

Beta builds are the ones associated with progress reports (supposedly monthly, in practice less frequent than that) and are generally supposed to be stable, ideally with all of the big changes having been in them for a little while so that users on the dev builds can find the bugs ahead of time.

I still want to look over this in more detail, but I'm not sure when I'll have time.

@mkwcat
Copy link
Contributor Author

mkwcat commented Dec 8, 2022

Should be okay now

Source/Core/Core/Boot/Boot_BS2Emu.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/MMU.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.h Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PowerPC.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Show resolved Hide resolved
Source/Core/Core/PowerPC/PowerPC.cpp Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Outdated Show resolved Hide resolved
@AdmiralCurtiss
Copy link
Contributor

AdmiralCurtiss commented Dec 27, 2022

What's the status here? Seems like there has been no reaction to the comments from ~2 weeks ago.

@mkwcat
Copy link
Contributor Author

mkwcat commented Dec 28, 2022

I'll get to it soon

@AdmiralCurtiss
Copy link
Contributor

@Pokechu22 Can you re-check this? I'd like to get this merged sometime...

Copy link
Contributor

@Pokechu22 Pokechu22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. I have some thoughts on the way you handled tags vs addrs, but they're fairly minor.

I'm fine with leaving modified as a field in icache even if it's unused; I don't think there's any clean way to avoid it and it's not that much data.

I still would like feedback from @delroth on the removed icache analytics (#11183 (comment)).

Source/Core/Core/PowerPC/PPCCache.h Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCCache.cpp Outdated Show resolved Hide resolved
@Pokechu22
Copy link
Contributor

Delroth has no objections to removing the icache-matters game quirk. So, for #11183 (comment) you should just need to remove the first entry here:

// Sometimes code run from ICache is different from its mirror in RAM.
ICACHE_MATTERS = 0,

Mark DIRECTLY_READS_WIIMOTE_INPUT = 0 instead; then also remove the first entry here:

// Keep in sync with enum class GameQuirk definition.
constexpr std::array<const char*, 28> GAME_QUIRKS_NAMES{
"icache-matters",

and set the size to 27. I'm pretty sure the numeric values are only used inside of Dolphin and aren't used by the actual analytics system.

@AdmiralCurtiss AdmiralCurtiss merged commit eeeab3c into dolphin-emu:master Jan 9, 2023
11 checks passed
Pokechu22 added a commit to Pokechu22/dolphin that referenced this pull request Feb 13, 2023
The previous code only updated the PLRU on cache misses, which made it so that the least recently inserted cache block was evicted, instead of the least recently used/hit one.

This regressed in 9d39647 (part of dolphin-emu#11183, but it was fine in e97d380), although beforehand it was only implemented for the instruction cache, and the instruction cache hit extremely infrequently when the JIT or cached interpreter is in use, which generally keeps it from behaving correctly (the pure interpreter behaves correctly with it).

I'm not aware of any games that are affected by this, though I did not do extensive testing.
Minty-Meeo added a commit to Minty-Meeo/dolphin that referenced this pull request Apr 17, 2023
PR dolphin-emu#11183 regressed the lookup table reconstruction and, for some reason, added an else clause that clobbered the dCache whenever dCache emulation is turned on.
Minty-Meeo added a commit to Minty-Meeo/dolphin that referenced this pull request Apr 17, 2023
PR dolphin-emu#11183 regressed the lookup table reconstruction and, for some reason, added an else clause that clobbered the dCache whenever dCache emulation is turned on.
Minty-Meeo added a commit to Minty-Meeo/dolphin that referenced this pull request Apr 17, 2023
PR dolphin-emu#11183 regressed the lookup table reconstruction and, for some reason, added an else clause that clobbered the dCache whenever dCache emulation is turned on.
Minty-Meeo added a commit to Minty-Meeo/dolphin that referenced this pull request Apr 17, 2023
PR dolphin-emu#11183 regressed the lookup table reconstruction and, for some reason, added an else clause that clobbered the dCache whenever dCache emulation is turned on.
SirMangler pushed a commit to SirMangler/dolphin that referenced this pull request Apr 22, 2023
The previous code only updated the PLRU on cache misses, which made it so that the least recently inserted cache block was evicted, instead of the least recently used/hit one.

This regressed in 9d39647 (part of dolphin-emu#11183, but it was fine in e97d380), although beforehand it was only implemented for the instruction cache, and the instruction cache hit extremely infrequently when the JIT or cached interpreter is in use, which generally keeps it from behaving correctly (the pure interpreter behaves correctly with it).

I'm not aware of any games that are affected by this, though I did not do extensive testing.
Minty-Meeo added a commit to Minty-Meeo/dolphin that referenced this pull request Apr 23, 2023
PR dolphin-emu#11183 regressed the lookup table reconstruction and, for some reason, added an else clause that clobbered the dCache whenever dCache emulation is turned on.
Hibyehello pushed a commit to Hibyehello/dolphin that referenced this pull request Apr 27, 2023
The previous code only updated the PLRU on cache misses, which made it so that the least recently inserted cache block was evicted, instead of the least recently used/hit one.

This regressed in 9d39647 (part of dolphin-emu#11183, but it was fine in e97d380), although beforehand it was only implemented for the instruction cache, and the instruction cache hit extremely infrequently when the JIT or cached interpreter is in use, which generally keeps it from behaving correctly (the pure interpreter behaves correctly with it).

I'm not aware of any games that are affected by this, though I did not do extensive testing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet