EERec: Implement fastmem #5821

stenzek · 2022-04-04T15:36:38Z

Description of Changes

Now that 32-bit is gone, we can do fastmem! What is fastmem? It's basically the idea of abusing the host's MMU to do virtual address translation for the guest.

This is still a work in progress; I still have to fix up the unaligned load/store instructions, and fix some edge cases with page remapping on Windows. But it works well enough for some basic testing (use Qt, I didn't even add the options in wx).

Quoting myself because it's late and I don't feel like rewriting an explanation today:

fastmem just takes that idea of RAM/MMIO pages a step further, and reserves a 4GB region (reserves, so only the page tables are allocated, no actual memory), and then aliases the EE RAM throughout it based on the TLB state
so, MMIO pages have no mapping, and will page fault when you try to access them
so, when you want to access it in the EE JIT, you just yolo access [base + address], if you fault, you backpatch, jump out to a handler which saves regs, calls the C handler, restores and jumps back
basically, optimizing for 99% of cases, and handling the other 1% when they actually happen 😉
current VTLB is still reasonably efficient though; it tests the MSB of the pointer in the LUT to check if it's a handler/MMIO page. but it's still an extra memory load (for the pointer) and a compare/branch, whereas with fastmem, we abuse the host's address translation
basically, on x86, it does something like (assuming the address you want to load is in rcx, and data goes into rsi)

mov rdx, rcx ; copy the guest address
shr rdx, 12 ; page index (4 KiB pages)
mov rax, oops_we_hit_a_hw_address ; where a handler should jump back to
mov rbx, qword [vtlb_vmap + rdx * 8] ; lookup page in vtlb
test rbx, rbx ; check if it's a handler page (MSB set)
js C_dispatcher ; jump out to C if it's a handler page
and rcx, 4095 ; get page offset
mov esi, dword [rbx + rcx] ; load RAM
oops_we_hit_a_hw_address: ; jump back location for handlers
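
For anyone who would rather not read assembly, the lookup above can be sketched in C++ roughly like this. It is only an illustration of the idea: `Vtlb`, `vtlb_read32`, `kHandlerBit` and the `slow_read32` callback are hypothetical names, not PCSX2's actual vtlb code, and it assumes host pointers never have bit 63 set (true for user-space pointers on x86-64).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical constants mirroring the assembly: 4 KiB pages, MSB marks a handler page.
constexpr uint32_t kPageShift = 12;
constexpr uint32_t kPageMask = (1u << kPageShift) - 1; // 4095
constexpr uint64_t kHandlerBit = 1ull << 63;

// Hypothetical stand-in for the VTLB lookup table.
struct Vtlb {
    std::vector<uint64_t> vmap;             // one entry per guest page: host pointer or handler marker
    uint32_t (*slow_read32)(uint32_t addr); // C dispatcher for handler/MMIO pages
};

uint32_t vtlb_read32(const Vtlb& tlb, uint32_t addr) {
    const uint32_t page = addr >> kPageShift;
    assert(page < tlb.vmap.size());

    // Extra memory load: fetch the page entry from the table.
    const uint64_t entry = tlb.vmap[page];

    // Extra compare/branch: handler pages go out to C.
    if (entry & kHandlerBit)
        return tlb.slow_read32(addr);

    // Plain RAM page: host pointer plus page offset.
    const uint8_t* host_page = reinterpret_cast<const uint8_t*>(static_cast<uintptr_t>(entry));
    uint32_t value;
    std::memcpy(&value, host_page + (addr & kPageMask), sizeof(value));
    return value;
}
```

Every guest load pays for the table fetch and the branch, even when the page is ordinary RAM, which is the overhead fastmem is trying to remove.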

in contrast, that same load with fastmem would be (assuming address in rcx, data in rsi, fastmem base pointer in rbp)

mov esi, dword [rbp + rcx]

with the added benefit that you don't need to flush the regcache prior, since with the current behavior, you don't know if it's going to call to C or not (and with fastmem, you save it in the "slow" thunk)
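
The reserve-and-alias part of this can be sketched with plain POSIX calls. This is a minimal sketch under stated assumptions, not the code in this PR: all names (`FastmemArea`, `fastmem_create`, `fastmem_map_ram`, and so on) are made up for illustration, a temporary file stands in for EE RAM, guest addresses and offsets are assumed to be multiples of the host page size, and the page-fault handler and backpatching are left out entirely, so touching an unmapped page here would simply crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical container for the reserved guest window and its RAM backing.
struct FastmemArea {
    uint8_t* base = nullptr;  // start of the reserved address window
    size_t window = 0;        // size of the reservation in bytes
    FILE* ram_file = nullptr; // shared backing object standing in for EE RAM
};

// Reserve `window` bytes of address space with no access rights and create the RAM backing.
bool fastmem_create(FastmemArea& area, size_t window, size_t ram_size) {
    FILE* f = tmpfile();
    if (!f)
        return false;
    if (ftruncate(fileno(f), static_cast<off_t>(ram_size)) != 0) {
        fclose(f);
        return false;
    }
    // PROT_NONE: every access faults until a page is explicitly mapped.
    void* p = mmap(nullptr, window, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        fclose(f);
        return false;
    }
    area.base = static_cast<uint8_t*>(p);
    area.window = window;
    area.ram_file = f;
    return true;
}

// Alias `length` bytes of RAM (starting at `ram_offset`) at `guest_addr` inside the window.
bool fastmem_map_ram(FastmemArea& area, uint32_t guest_addr, size_t ram_offset, size_t length) {
    if (guest_addr + length > area.window)
        return false;
    void* p = mmap(area.base + guest_addr, length, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fileno(area.ram_file),
                   static_cast<off_t>(ram_offset));
    return p != MAP_FAILED;
}

// The "fast path": a single access at [base + address].
uint32_t fastmem_read32(const FastmemArea& area, uint32_t guest_addr) {
    assert(guest_addr + sizeof(uint32_t) <= area.window);
    uint32_t value;
    std::memcpy(&value, area.base + guest_addr, sizeof(value));
    return value;
}

void fastmem_write32(FastmemArea& area, uint32_t guest_addr, uint32_t value) {
    assert(guest_addr + sizeof(uint32_t) <= area.window);
    std::memcpy(area.base + guest_addr, &value, sizeof(value));
}

void fastmem_destroy(FastmemArea& area) {
    munmap(area.base, area.window); // also drops the aliases mapped inside the window
    fclose(area.ram_file);
    area = FastmemArea{};
}
```

A full-size window would be created with something like `fastmem_create(area, size_t{1} << 32, ram_size)`; mapping the same RAM offset at two guest addresses then gives two views of the same bytes, and anything left unmapped keeps faulting, which is where a real implementation would catch the fault and fall back to the slow path.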

The performance boost won't be huge yet: it can be 5-10% depending on the game, but is often <5%. But it enables further optimization with register caching in the future, and it all adds up.

Rationale behind Changes

Little bit of brr.

Suggested Testing Steps

I'll update when it's actually ready for testing/use.

JordanTheToaster · 2022-04-04T21:39:27Z

Burnout 1, Burnout 3, Burnout Revenge, and Burnout Dominator all fail to load with fastmem enabled.

github-actions bot added GUI/Qt Vector Units labels Apr 4, 2022

stenzek added 3 commits April 5, 2022 23:20

x86emitter: Fix missing W REX bit for movq reg, xmm

b6b9b06

x86Emitter: Fix incorrect displacement for some 64-bit values

fb4d138

EERec: Implement fastmem

5eafd9c

stenzek marked this pull request as draft April 6, 2022 12:47

lightningterror added the Needs Rebase label May 2, 2022

stenzek closed this Jul 8, 2022

stenzek deleted the fastmem branch July 8, 2022 02:26
