Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Dynamic bat #1882

Closed
wants to merge 1 commit into from
Closed

Conversation

magumagu
Copy link
Contributor

@magumagu magumagu commented Jan 13, 2015

This is close to being ready.

Known to fix Clone Wars, gc-linux (to some extent), issue 8064. Cars 2 still not working yet, but this is probably a step in the right direction. Also some fringe benefits: this makes the MMU code easier to understand, and it should be a bit harder to crash Dolphin.

TODO:

  • Make this work on Andriod
  • Correctly handle the interaction between tlbie and the JIT cache
  • Optimized dcbz implementation
  • Misc performance optimization

This change is Reviewable

@JMC47
Copy link
Contributor

JMC47 commented Jan 13, 2015

Cars 2: "Unable to resolve write address 809c5958 PC 8000"

@Xcedf
Copy link

Xcedf commented Jan 13, 2015

Fixes Issue 1132, succesfully goes ingame, Thanks

case SPR_DBAT6U:
case SPR_DBAT7L:
case SPR_DBAT7U:
PowerPC::DBATUpdated();

This comment was marked as off-topic.

@PatrickFerry
Copy link
Contributor

Without MMU On:
True Crime: New York City: "Unable to resolve write address 3e007c4c PC 80034a0c"

(Then it keeps trying different addresses and so on, of course.)
This is good to see :) (This PR; that is)

@magumagu
Copy link
Contributor Author

Latest should fix fake-vmem... still not sure what's going on with Toy Story 3.

@magumagu
Copy link
Contributor Author

b939ca7 needs to be optimized, but it should fix Toy Story 3.

@magumagu
Copy link
Contributor Author

Current progress: most of the remaining work is performance optimization. The most critical areas are a fixed-up version of block linking and faster non-fastmem load/store. (Non-fastmem load/store matters for paired ops load/store.)

@magumagu magumagu force-pushed the dynamic-bat branch 4 times, most recently from 934da6a to f471ffc Compare January 21, 2015 01:45
@magumagu
Copy link
Contributor Author

More progress: the jit cache stuff is done except it's still missing correct tlbie handling. I came up with a hacky solution for paired load/store, so that should be working much faster. Still missing an optimized version of dcbz, but besides that there aren't any major missing optimizations. Probably worth benchmarking to see where this is relative to master (assuming I didn't break anything).

@JMC47
Copy link
Contributor

JMC47 commented Jan 21, 2015

https://dl.dropboxusercontent.com/u/484730/GLRE64-8.png

Also, non-MMU games are getting a pretty high performance hit.

@@ -26,7 +26,7 @@ TEST(UniqueID, UniqueEnough)
TEST(IsMMIOAddress, SpecialAddresses)
{
// WG Pipe address, should not be handled by MMIO.
EXPECT_FALSE(MMIO::IsMMIOAddress(0xCC008000));
EXPECT_FALSE(MMIO::IsMMIOAddress(0x0C008000));

// Memory zone used by games using the "MMU Speedhack".

This comment was marked as off-topic.

@JMC47
Copy link
Contributor

JMC47 commented Jul 7, 2016

I'm going to gather performance numbers in this post. I tried to even things out over several areas to get a fair assessment in demanding areas.

Rayman Arena: +8%
Star Wars Rogue Squadron 2: -23%
Star Wars Rogue Squadron 3: -29%
F-Zero GX: -1%
Super Mario Galaxy: +-1%
Super Swing Golf: +14%
Mario Kart Wii: +9%

blocks[block_num].msrBits != (MSR & JitBlock::JIT_CACHE_MSR_MASK))
{
MoveBlockIntoFastCache(addr);
addr = PC;

This comment was marked as off-topic.

@degasus
Copy link
Member

degasus commented Jul 10, 2016

I've rebased + squashed those commits a bit: https://github.com/degasus/dolphin/commits/dynamic-bat
There are two new fixup commits in it. The C++ dispatcher is working fine, but "Support for dynamic BAT modification (dynamic-bat)." still breaks some ARM load/store instructions. @magumagu can you give me some hints how this commit affects load/store instructions? May I also get a better description for commit degasus@2750b57 ?

@magumagu
Copy link
Contributor Author

degasus@2750b57 is basically just a fixup for some code I rebased incorrectly; the check for slowmem and the call to MemoryExceptionCheck already exist on master.

It breaks some ARM load/store instructions, hmmm. From the JIT's perspective, the big change is that the fastmem arena reflects the BATs correctly. That means 0x0 isn't usually mapped anymore when address translation is turned on, so if you try to use the wrong memory base, things will explode.

It's probably worth checking that the memory base is set correctly; I tried to add some code to JitAsm to do that (the code that messes with MEM_REG), but I'm not sure I got it right. If I got that wrong, basically every load/store instruction would be broken.

Beyond that, it's probably important to make sure ARM code isn't making bad assumptions about address translation and address values... it looks like paired load/store is doing some suspicious stuff, but the other instructions look okay? Not sure what else could be happening.

@degasus
Copy link
Member

degasus commented Jul 11, 2016

Seems like also the linking is also broken on ARM. Are you fine with me implementing an "unlinking" of blocks instead of "destroying" them? I've also already written the patch to clear the BLR stack on MSR changes :D

UpdateFakeMMUBat(dbat_table, 0x70000000);
}
Memory::UpdateLogicalMemory(dbat_table);
JitInterface::ClearSafe();

This comment was marked as off-topic.

This comment was marked as off-topic.

@magumagu
Copy link
Contributor Author

"Unlinking" is fine, I guess... although I don't really see how it would fix anything.

@JMC47
Copy link
Contributor

JMC47 commented Jul 11, 2016

Also, the icache clears probably fix various bugs in romhacks at least. It allows us to run the Wiimm's Mario Kart Hacks.

Fundamentally, all this does is enforce the invariant that we always
translate effective addresses based on the current BAT registers and
page table before we do anything else with them.

This change can be logically divided into three parts.  The first part is
creating a table to represent the current BAT state, and keeping it up to
date (PowerPC::IBATUpdated, PowerPC::DBATUpdated, etc.).  This does
nothing by itself, but it's necessary for the other parts.

The second part (mostly in MMU.cpp) is simply removing all the hardcoded
checks for specific untranslated addresses, and consistently translating
addresses using the current BAT configuration. Very straightforward, but a
lot of code changes because we hardcoded assumptions all over the place.

The third part (mostly in Memmap.cpp) is making the fastmem arena reflect
the current BAT configuration.  We do this by redoing the mapping (calling
memmap()) based on the BAT table whenever it changes.

One additional minor change is that translation can fail in two ways:
either the segment is a direct store segment, or page table lookup failed.
The difference doesn't usually matter, but the difference affects cache
instructions, like dcbz.

Untested on ARM64, so the changes there are probably broken.
@magumagu
Copy link
Contributor Author

Pushed rebased version. I squashed everything to make it easier to rebase; if anyone wants to look at what it looked like before this rebase, see magumagu@b862065 .

This doesn't include magumagu@66493f3 ; thinking about it a bit more, I'm not sure it's really the right approach, it needs some changes to avoid a substantial performance penalty in MMU games, and it's not really related to the rest of the changes, so I left it out. It'll probably need to be revived in some form to get Wii Linux fully working (so we can model a context switch correctly).

I'm assuming ARM64 JIT is still broken, but it's hard for me to say for sure without understanding exactly what's happening.

if ((address & 0xF0000000) != 0xC0000000)
return 0;
// Translate address
u32 bat_result = dbat_table[address >> BAT_INDEX_SHIFT];

This comment was marked as off-topic.

@JMC47
Copy link
Contributor

JMC47 commented Aug 11, 2016

Was doing some performance testing with games. Decided to mess with MMU emulation always on because I somehow misremembered that being a part of this when it's a separate pull request.

Performance is weird.

Melee on FoD starts at ~90 fps and then slowly increases to above the speed you get with MMU off. (128 fps without, 132 fps with on.)

Seeing this, I decided to check some other games.

Super Mario Galaxy - No Different in performance.
Super Smash Bros. Brawl - 190 fps without MMU, 201 fps with MMU on.

I can test more as well, I just don't want to rush it if this is pointless or flawed because I'm trying to be more careful with my performance testing.

@JMC47
Copy link
Contributor

JMC47 commented Aug 12, 2016

I can't get The Legend of Zelda: Four Swords Adventures to run in this pull request.

Edit: Nevermind. Turns out the different build name caused my firewall to capture it, which froze GBA <-> GCN stuff that I was messing with.

@degasus
Copy link
Member

degasus commented Aug 12, 2016

Reviewed 7 of 27 files at r3, 1 of 6 files at r6, 2 of 5 files at r8, 1 of 2 files at r14, 1 of 4 files at r17, 11 of 19 files at r25, 1 of 1 files at r26.
Review status: all files reviewed at latest revision, 16 unresolved discussions.


Source/Core/Core/HW/Memmap.cpp, line 213 [r26] (raw file):

    }

    mem_size += region.size;

mem_size is not read any more


Source/Core/Core/HW/Memmap.cpp, line 259 [r26] (raw file):

          // practice, that doesn't happen.
          u32 position = physical_region.shm_position;
          if (intersection_start > mapping_address)

no need to make this more complicated. intersection_start is always bigger or equal to mapping_address. So we can make this clause here unconditional.


Source/Core/Core/HW/Memmap.cpp, line 262 [r26] (raw file):

            position += intersection_start - mapping_address;
          u8* base = logical_base + logical_address;
          if (intersection_start > translated_address)

same here


Source/Core/Core/HW/Memmap.cpp, line 273 [r26] (raw file):

          }
          logical_mapped_entries.push_back({mapped_pointer, mapped_size});
          break;

There is a big comment about why this usually doesn't overlap. But if we just drop this "break" here, we'd also support this case. I agree, this won'T matter right now, but it might matter if we support merging adjacent mappings. So I'm for dropping this break; here.


Source/Core/Core/PowerPC/MMU.cpp, line 150 [r26] (raw file):

__forceinline static T ReadFromHardware(const u32 em_address)
{
  int segment = em_address >> 28;

Why have you dropped the variable segment? This could be a good cleanup.


Source/Core/Core/PowerPC/MMU.cpp, line 164 [r26] (raw file):

template <XCheckTLBFlag flag, typename T, bool never_translate = false>
__forceinline static T ReadFromHardware(u32 em_address)

ReadFromHardware and WriteToHardware are crying for merging. The flag is already there, just the return argument here needs to be set by an argument; T* value. Maybe with the restriction to not overwrite it on DSI.

But I think this should go in the next PR....


Source/Core/Core/PowerPC/MMU.cpp, line 182 [r26] (raw file):

      // Note that "word" means 32-bit, so paired singles or doubles might still be 32-bit aligned!
      u32 em_address_next_page = (em_address + sizeof(T) - 1) & ~(HW_PAGE_SIZE - 1);
      auto tlb_addr_next_page = TranslateAddress<flag>(em_address_next_page);

"tlb_addr_next_page" sounds missleading for BAT vs TLB. "addr_next_page"?


Source/Core/Core/PowerPC/MMU.cpp, line 190 [r26] (raw file):

      }
      T var = 0;
      u32 tlb_addr = translated_addr.address;

same "tlb"


Source/Core/Core/PowerPC/MMU.cpp, line 212 [r26] (raw file):

  // physical memory for BAT translation to work; we currently use
  // [0x7E000000, 0x80000000).
  if (Memory::bFakeVMEM && ((em_address & 0xFE000000) == 0x7E000000))

is 0xFE000000 == ~FAKEVMEM_MASK ? No idea if this was on purpose


Source/Core/Core/PowerPC/MMU.cpp, line 214 [r26] (raw file):

  if (Memory::bFakeVMEM && ((em_address & 0xFE000000) == 0x7E000000))
  {
    return bswap(*(T*)&Memory::m_pFakeVMEM[em_address & Memory::RAM_MASK]);

FAKEVMEM_MASK?


Source/Core/Core/PowerPC/Jit64/Jit_LoadStore.cpp, line 372 [r26] (raw file):

    ADD(32, R(RSCRATCH), gpr.R(a));

  if (!UReg_MSR(MSR).DR)

May you reorder this instruction to only emit the slowpath once?
AND(32, R(RSCRATCH), Imm32(~31));
if (DR)
fastpath_if_possible();
slowpath();
if (DR)
SwitchBackToNearCode();


Comments from Reviewable

@Ebola16
Copy link
Member

Ebola16 commented Aug 19, 2016

I just downloaded a build of this PR from https://dl.dolphin-emu.org/prs/pr-1882-dolphin-latest-x64.7z
to see if it has any effect on https://bugs.dolphin-emu.org/issues/8864 (SSBB Alloc Buffer failed).

When attempting to load Gecko OS's .elf Dolphin will crash (attached below). This problem does not occur for me on other builds of Dolphin (tested up to 5.0-443).

Win 10 Pro x64
Intel Core i7-4702MQ CPU @2.20GHz
GeForce GT 750M
2x 8GB Hynix 1600MHz
GeForce Game Ready Driver 372.54
capture
1SSBB Gecko.zip

@JMC47
Copy link
Contributor

JMC47 commented Aug 20, 2016

Probably a bug in the elf loader or something. Didn't that happen last time we merged MMU related stuff?

@PatrickFerry
Copy link
Contributor

PatrickFerry commented Aug 20, 2016

I reckon that bug is because this PR hasn't been rebased.

This commit probably fixed it: link

@JMC47
Copy link
Contributor

JMC47 commented Aug 20, 2016

I'm not 100% sure on that. I can try to make a rebased build to try it, but it doesn't sound exactly like the crash I was having in other games. We'll see! That'd make things easier.

Oh right: Gecko OS was broken for a while because that PR. That makes sense.

@Ebola16
Copy link
Member

Ebola16 commented Aug 21, 2016

I'm currently solving visual studio issues on my computer and I don't know how to check when the last rebase of this PR occurred. Also, https://bugs.dolphin-emu.org/issues/8267 was another source of .elf crashes but I don't know if the PR was merged before the latest rebase.

I'll be happy to test this PR when it's rebased again.

@degasus degasus mentioned this pull request Aug 23, 2016
@degasus
Copy link
Member

degasus commented Sep 6, 2016

Merged in #4146

@degasus degasus closed this Sep 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP / do not merge Work in progress (do not merge)