EERec: Implement fastmem #5821

Closed
wants to merge 3 commits into from
Member

@stenzek stenzek commented Apr 4, 2022

Description of Changes

Now that 32-bit is gone, we can do fastmem! What is fastmem? It's basically the idea of abusing the host's MMU to do virtual address translation for the guest.

This is still a work in progress; I still have to fix up the unaligned load/store instructions, and fix some edge cases with page remapping on Windows. But it works well enough for some basic testing (use Qt; I didn't even add the options in wx).

Quoting myself because it's late and I don't feel like rewriting an explanation today:

fastmem just takes that idea of RAM/MMIO pages a step further, and reserves a 4GB region (reserves, so only the page tables are allocated, no actual memory), and then aliases the EE RAM throughout it based on the TLB state
so, MMIO pages have no mapping, and will page fault when you try to access them
so, when you want to access it in the EE JIT, you just yolo access [base + address], if you fault, you backpatch, jump out to a handler which saves regs, calls the C handler, restores and jumps back
basically, optimizing for 99% of cases, and handling the other 1% when they actually happen 😉
current VTLB is still reasonably efficient though; it tests the MSB of the pointer in the LUT to check if it's a handler/MMIO page. but it's still an extra memory load (for the pointer), and a compare/branch, whereas with fastmem, we abuse the host's address translation
basically, on x86, it does something like (assuming the address you want to load is in rcx, and data goes into rsi)

mov rdx, rcx ; copy the guest address
shr rdx, 12 ; page number (4KB pages)
mov rax, oops_we_hit_a_hw_address ; jump back location, in case we go to a handler
mov rbx, qword [vtlb_vmap + rdx * 8]  ; lookup page in vtlb
test rbx, rbx ; check if it's a handler page (MSB set)
js C_dispatcher ; jump out to C if it's a handler page
and rcx, 4095 ; get page offset
mov esi, dword [rbx + rcx] ; load RAM
oops_we_hit_a_hw_address: ; jump back location for handlers
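
For readers who find C++ easier to follow than assembly, the lookup above can be sketched roughly as below. This is only an illustration of the technique described in the quote: `SoftTlb`, `kHandlerBit`, `read32` and `mmio_read` are hypothetical names made up for this sketch, not PCSX2's actual vtlb code, and it assumes a 64-bit host where valid user-space pointers have the top bit clear.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical names throughout; a sketch of the idea, not PCSX2's vtlb.
constexpr uint64_t kHandlerBit = 1ull << 63; // MSB set => handler/MMIO page
constexpr uint32_t kPageShift = 12;          // 4KB pages
constexpr uint32_t kPageMask = 4095;

struct SoftTlb {
    std::vector<uint64_t> vmap;           // one entry per guest page
    uint32_t (*mmio_read)(uint32_t addr); // slow path for handler pages
};

inline uint32_t read32(const SoftTlb& tlb, uint32_t addr) {
    const uint64_t entry = tlb.vmap[addr >> kPageShift]; // the extra memory load
    if (entry & kHandlerBit)                             // the compare/branch
        return tlb.mmio_read(addr);                      // jump out to C
    // RAM page: the entry is the host pointer to the start of the page.
    const uint8_t* page =
        reinterpret_cast<const uint8_t*>(static_cast<uintptr_t>(entry));
    uint32_t value;
    std::memcpy(&value, page + (addr & kPageMask), sizeof(value));
    return value;
}
```

Every access pays for the table load and the branch, even when the page turns out to be plain RAM, which is the cost fastmem is meant to remove.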

in contrast, that same load with fastmem would be (assuming address in rcx, data in rsi, fastmem base pointer in rbp)

mov esi, dword [rbp + rcx]

with the added benefit that you don't need to flush the regcache prior, since with the current behavior, you don't know if it's going to call to C or not (and with fastmem, you save it in the "slow" thunk)
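
The "reserve a region, then alias RAM into it" part of the explanation above can be sketched on Linux roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FastmemArea`, `fastmem_reserve`, `fastmem_map` and `fastmem_release` are hypothetical names, the backing object is created with `memfd_create` (Linux-specific), and the fault handler/backpatching side is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helpers; a Linux-only sketch, not the implementation in this PR.
struct FastmemArea {
    uint8_t* base = nullptr; // start of the reserved guest address space
    size_t size = 0;         // size of the reservation
    int ram_fd = -1;         // shared memory object backing guest RAM
};

// Reserve address space only. PROT_NONE means nothing is accessible yet,
// so any load/store into an unmapped page faults.
inline bool fastmem_reserve(FastmemArea& a, size_t reserve_size, size_t ram_size) {
    a.ram_fd = memfd_create("guest_ram", 0);
    if (a.ram_fd < 0 || ftruncate(a.ram_fd, static_cast<off_t>(ram_size)) != 0)
        return false;
    void* p = mmap(nullptr, reserve_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return false;
    a.base = static_cast<uint8_t*>(p);
    a.size = reserve_size;
    return true;
}

// Alias len bytes of guest RAM (starting at ram_offset) at guest address vaddr.
// vaddr, ram_offset and len must be multiples of the host page size.
inline bool fastmem_map(FastmemArea& a, uint32_t vaddr, size_t ram_offset, size_t len) {
    void* p = mmap(a.base + vaddr, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, a.ram_fd, static_cast<off_t>(ram_offset));
    return p != MAP_FAILED;
}

inline void fastmem_release(FastmemArea& a) {
    if (a.base) munmap(a.base, a.size);
    if (a.ram_fd >= 0) close(a.ram_fd);
    a = FastmemArea{};
}
```

With a 4GB reservation (`size_t{1} << 32`), a guest load then becomes a plain access to `base + address`; pages that were never passed to `fastmem_map` stay `PROT_NONE` and fault, which is where the backpatching described above would take over.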

The performance boost won't be huge yet: it can be 5-10% depending on the game, but is often <5%. But it enables further optimization with register caching in the future, and it all adds up.

Rationale behind Changes

Little bit of brr.

Suggested Testing Steps

I'll update when it's actually ready for testing/use.

Contributor

JordanTheToaster commented Apr 4, 2022

Burnout 1, Burnout 3, Burnout Revenge, and Burnout Dominator all fail to load with fastmem enabled.

@stenzek stenzek marked this pull request as draft April 6, 2022 12:47
@stenzek stenzek closed this Jul 8, 2022
@stenzek stenzek deleted the fastmem branch July 8, 2022 02:26