New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jit: Perform BAT lookup in dcbf/dcbi/dcbst #9957
Conversation
This reverts commit 66b992c. A new (additional) correctness issue was revealed in the old AArch64 code when applying it on top of modern JitArm64: LSR was being used when LSRV was intended. This commit uses LSRV.
When 66b992c fixed https://bugs.dolphin-emu.org/issues/12133, it did so by removing the broken address calculation entirely and always using the slow path. This caused a performance regression, https://bugs.dolphin-emu.org/issues/12477. This commit instead replaces the broken address calculation with a BAT lookup. If the BAT lookup succeeds, we can use the old fast path. Otherwise we use the slow path. Intends to improve https://bugs.dolphin-emu.org/issues/12477.
|
Tested the afflicted games and they're fixed. Happy Feet still works. |
| MOV(64, R(tmp), ImmPtr(GetBlockCache()->GetBlockBitSet())); | ||
| MOV(32, R(value), MComplex(tmp, value, SCALE_4, 0)); | ||
| SHR(32, R(addr), Imm8(5)); | ||
| BT(32, R(value), R(addr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thing I'm wondering here if this needs an extra AND(32, R(addr), 0x1F)? I suspect not as references online claim that bt does an implicit mod32 on the 2nd parameter, which seems logical, but curiously MSVC does emit this and in the bitset code, even in Release mode. Oh well, I think this is fine as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, x64 clears the upper bits of the register when you do a 32bit operation.
This is fine.
|
The x64 implementation looks fine to me, can't say much about the ARM one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JitArm64 changes look fine to me
|
This will improve android performance? |
| { | ||
| tmp = EncodeRegTo64(tmp); | ||
|
|
||
| MOVP2R(tmp, PowerPC::dbat_table.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is worth to place the BAT table somewhere within the PPC_State?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case we would need to have an addition instruction in here instead of the MOVP2R (since we wouldn't be putting the BAT table at the very beginning of PPCState). So we'd save maybe one instruction (if MOVP2R needs two instructions), which isn't all that much.
| MOVI2R(ARM64Reg::X2, 0); | ||
| MOVP2R(ARM64Reg::X3, &JitInterface::InvalidateICache); | ||
| BLR(ARM64Reg::X3); | ||
| MOVP2R(ARM64Reg::X8, &JitInterface::InvalidateICache); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback now skips the AND(addr, addr, LogicalImm(~31, 32)); // mask sizeof cacheline call. Was this on purpose?
In the fast path, this is fine. But in the slow path, this might matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the called C++ function, and it seems like it ignores the last 5 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, Thanks a lot.
|
Very happy to see this go in! Thank you for taking care of the fallout of my earlier change. |
When PR #9314 fixed https://bugs.dolphin-emu.org/issues/12133, it did so by removing the broken address calculation entirely and always using the slow path. This caused a performance regression, https://bugs.dolphin-emu.org/issues/12477.
This pull request instead replaces the broken address calculation with a BAT lookup. If the BAT lookup succeeds, we can use the old fast path. Otherwise we use the slow path.
Intends to improve https://bugs.dolphin-emu.org/issues/12477.
@JMC47 Please test Happy Feet and some games with known performance problems.
Everyone else: I'm not entirely confident that I made no mistakes with the assembly, so don't let JMC merge this before it's been properly reviewed :)