Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jit: Perform BAT lookup in dcbf/dcbi/dcbst #9957

Merged
merged 3 commits into from Jul 31, 2021

Conversation

JosJuice
Copy link
Member

When PR #9314 fixed https://bugs.dolphin-emu.org/issues/12133, it did so by removing the broken address calculation entirely and always using the slow path. This caused a performance regression, https://bugs.dolphin-emu.org/issues/12477.

This pull request instead replaces the broken address calculation with a BAT lookup. If the BAT lookup succeeds, we can use the old fast path. Otherwise we use the slow path.

Intends to improve https://bugs.dolphin-emu.org/issues/12477.

@JMC47 Please test Happy Feet and some games with known performance problems.

Everyone else: I'm not entirely confident that I made no mistakes with the assembly, so don't let JMC merge this before it's been properly reviewed :)

This reverts commit 66b992c.

A new (additional) correctness issue was revealed in the old
AArch64 code when applying it on top of modern JitArm64:
LSR was being used when LSRV was intended. This commit uses LSRV.
When 66b992c fixed https://bugs.dolphin-emu.org/issues/12133,
it did so by removing the broken address calculation entirely and
always using the slow path. This caused a performance regression,
https://bugs.dolphin-emu.org/issues/12477.

This commit instead replaces the broken address calculation with
a BAT lookup. If the BAT lookup succeeds, we can use the old fast
path. Otherwise we use the slow path.

Intends to improve https://bugs.dolphin-emu.org/issues/12477.
@JMC47
Copy link
Contributor

JMC47 commented Jul 26, 2021

Tested the afflicted games and they're fixed. Happy Feet still works.

MOV(64, R(tmp), ImmPtr(GetBlockCache()->GetBlockBitSet()));
MOV(32, R(value), MComplex(tmp, value, SCALE_4, 0));
SHR(32, R(addr), Imm8(5));
BT(32, R(value), R(addr));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing I'm wondering here if this needs an extra AND(32, R(addr), 0x1F)? I suspect not as references online claim that bt does an implicit mod32 on the 2nd parameter, which seems logical, but curiously MSVC does emit this and in the bitset code, even in Release mode. Oh well, I think this is fine as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, x64 clears the upper bits of the register when you do a 32bit operation.

This is fine.

@AdmiralCurtiss
Copy link
Contributor

The x64 implementation looks fine to me, can't say much about the ARM one.

Copy link
Member

@leoetlino leoetlino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JitArm64 changes look fine to me

@leoetlino leoetlino merged commit a208ff5 into dolphin-emu:master Jul 31, 2021
11 checks passed
@ghost
Copy link

ghost commented Jul 31, 2021

This will improve android performance?

@JosJuice JosJuice deleted the dcbx-faster branch July 31, 2021 08:32
{
tmp = EncodeRegTo64(tmp);

MOVP2R(tmp, PowerPC::dbat_table.data());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is worth to place the BAT table somewhere within the PPC_State?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we would need to have an addition instruction in here instead of the MOVP2R (since we wouldn't be putting the BAT table at the very beginning of PPCState). So we'd save maybe one instruction (if MOVP2R needs two instructions), which isn't all that much.

MOVI2R(ARM64Reg::X2, 0);
MOVP2R(ARM64Reg::X3, &JitInterface::InvalidateICache);
BLR(ARM64Reg::X3);
MOVP2R(ARM64Reg::X8, &JitInterface::InvalidateICache);
Copy link
Member

@degasus degasus Jul 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback now skips the AND(addr, addr, LogicalImm(~31, 32)); // mask sizeof cacheline call. Was this on purpose?
In the fast path, this is fine. But in the slow path, this might matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the called C++ function, and it seems like it ignores the last 5 bits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, Thanks a lot.

@smurf3tte
Copy link
Contributor

Very happy to see this go in! Thank you for taking care of the fallout of my earlier change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
8 participants