New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Allocate 2 GiB of guard pages below fastmem area #11440
Conversation
Source/Core/Core/HW/Memmap.cpp
Outdated
| // If the resulting host address is backed by real memory, the memory access will simply work. | ||
| // If not, a segfault handler will backpatch the JIT code to instead call functions in MMU.cpp. | ||
| // This way most memory accesses will be super fast, and for the small number of memory accesses | ||
| // that need something special (like MMIO), we pay a performance penalty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be something like "we only pay a performance penalty for memory accesses that need something special"? (And also is this a one-time performance penalty? In particular, it's not clear if the penalty is the more complicated access, or the backpatching process.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the wording.
As for what "penalty" refers to, it's both really. If you interpret it as penalty compared to not having fastmem at all, the backpatching (or rather the segfaulting) is what costs time, but if you don't interpret it that way, I think having to call MMU.cpp is the part that ends up costing more time in the long run. But that's kind of a guess from my side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New wording looks fine to me.
Related... when I was looking at RS2, it seemed like it compiled a store to the constant address 0x7fdf8118 into a call to PowerPC::Write_U32 instead of using fastmem (while I think it used fastmem for the stores for the addresses after it, which were no longer considered constant). Is that call to PowerPC::Write_U32 faster than fastmem, or is this bad behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT will directly call Write_U32 in that situation if the address is known at compile-time and it's known to be something like MMIO, i.e. something that would always need backpatching if we were to try to use fastmem. Calling Write_U32 is much slower than successfully using fastmem, so if the JIT is unsure if the address might work without backpatching (or if it knows it will work without backpatching), it will try to use fastmem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is backpatching needed for writes that would trigger the game's exception handler (on MMU games), or just for writes to MMIOs (and possibly the write gather pipe)? I feel like it shouldn't be needed for the exception handler as even if it triggers the exception handler one time, it won't necessarily trigger it next time, so it wouldn't be useful to change behavior, but I don't really know how it's implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling an MMU.cpp function (like Write_U32) is required if the access would trigger the game's exception handler, and currently, the only way we have of doing that is to backpatch. It would theoretically be possible to call the function directly from the segfault handler instead of backpatching, but then we would need some kind of heuristic for how many times we want to keep segfaulting before we actually backpatch, since we don't want to go without backpatching if it's a memory access the game does often. Personally I'm not sure if it's worth it, but if it's something you feel like looking into, you could.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, that makes the existing behavior a bit more sensible. I think it might be better to not default to calling Write_U32 in this specific case (since it won't always need to trigger an exception) and then allow it to be backpatched after an exception occurs, so that behavior is the same for constant addresses and non-constant addresses (as long as the address doesn't look like an MMIO). Though I'm not sure whether backpatching would maintain knowledge of it being a constant address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, seems like I was mistaken earlier. When JitArm64 is unsure if a constant address will need backpatching or not, it will try fastmem, but Jit64 will actually go straight for calling MMU.cpp, which seems a bit suboptimal. See EmuCodeBlock::WriteToConstAddress.
See the comment added by this commit. We were previously guarding against overshooting in address calculations, but not against undershooting. Perhaps someone assumed that the displacement of an x86 loadstore was treated as unsigned? Note: While the comment says we can undershoot by up to 2 GiB, in practice Jit64 as it currently behaves won't actually undershoot by more than 0x8000 if my analysis is correct. But address space is cheap, so let's guard the full 2 GiB.
21107c9
to
4fa9fa9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
See the comment added by this pull request. We were previously guarding against overshooting in address calculations, but not against undershooting. Perhaps someone assumed that the displacement of an x86 loadstore was treated as unsigned?
Note: While the comment says we can undershoot by up to 2 GiB, in practice Jit64 as it currently behaves won't actually undershoot by more than 0x8000 if my analysis is correct. But address space is cheap, so let's guard the full 2 GiB.