JitArm64: Use LogicalImm in boolX #12060

Merged (1 commit, Aug 13, 2023)

@Sintendo (Member) commented Jul 21, 2023

ARM64 has a special logical immediate encoding scheme that can be used with AND, ORR, and EOR. By taking advantage of it, we no longer need to materialize the immediate value in a register, which saves instructions and/or reduces register pressure.
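
For reference, a 32-bit value is encodable as a logical immediate when it is a repeating pattern of 2-, 4-, 8-, 16- or 32-bit elements, each element being a (possibly rotated) contiguous run of ones; all-zeros and all-ones have no encoding. The following is a standalone sketch of that check, written only for illustration. The function name `IsLogicalImm32` is hypothetical, and this is not the implementation of the `LogicalImm` helper in Dolphin's emitter.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the low `size` bits (size in 2..32).
static uint32_t LowMask(unsigned size) {
  return size == 32 ? 0xFFFFFFFFu : ((1u << size) - 1);
}

// Rotate the low `size` bits of x right by one position.
static uint32_t RotateRight1(uint32_t x, unsigned size) {
  x &= LowMask(size);
  return ((x >> 1) | (x << (size - 1))) & LowMask(size);
}

static int PopCount(uint32_t x) {
  int count = 0;
  for (; x != 0; x &= x - 1) ++count;
  return count;
}

// Hypothetical helper: can `value` be used directly as the immediate of a
// 32-bit AND/ORR/EOR?
bool IsLogicalImm32(uint32_t value) {
  // All-zeros and all-ones have no encoding.
  if (value == 0 || value == 0xFFFFFFFFu) return false;

  // Find the smallest element size (32, 16, 8, 4 or 2 bits) the value repeats
  // with. Once the value is known to repeat every `size` bits, two identical
  // halves of one element mean it also repeats every `size / 2` bits.
  unsigned size = 32;
  while (size > 2) {
    const unsigned half = size / 2;
    if ((value & LowMask(half)) != ((value >> half) & LowMask(half))) break;
    size = half;
  }

  // The element must be one rotated run of ones. Viewed cyclically, a pattern
  // with both 0s and 1s has exactly two 0/1 boundaries iff it has one run.
  const uint32_t element = value & LowMask(size);
  return PopCount(element ^ RotateRight1(element, size)) == 2;
}
```

Every immediate in the "After:" listings (0x80000000, 0x1, 0xffffdfff, 0x1e, 0xfffffffb, 0xffffffe0) passes this check, while a value such as 0x12345678 does not and still has to be materialized in a register.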


andx

Before:

mov    w26, #-0x80000000
and    w27, w27, w26
sxtw   x24, w27

After:

and    w27, w27, #0x80000000
sxtw   x26, w27

orx

Before:

mov    w23, #0x1
orr    w23, w25, w23

After:

orr    w23, w25, #0x1

norx

Before:

mov    w25, #-0x2001
orr    w23, w23, w25
mvn    w23, w23

After:

orr    w23, w23, #0xffffdfff
mvn    w23, w23

xorx

Before:

mov    w23, #0x1e
eor    w23, w27, w23

After:

eor    w23, w27, #0x1e

eqvx

Before:

mov    w23, #0x4
eon    w26, w23, w22

After:

eor    w26, w22, #0xfffffffb
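
The eqvx rewrite works because PowerPC eqv computes ~(rS ^ rB), ARM64 EON Wd, Wn, Wm computes Wn ^ ~Wm, and ~(imm ^ x) == x ^ ~imm. The complement of the immediate can therefore be computed once at JIT time and folded into a single EOR, provided ~imm is encodable. A minimal sketch that models both sequences as plain arithmetic; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// PowerPC eqv: rA = ~(rS ^ rB).
uint32_t EqvReference(uint32_t s, uint32_t b) { return ~(s ^ b); }

// Model of ARM64 EON Wd, Wn, Wm: Wn ^ ~Wm.
uint32_t Eon(uint32_t n, uint32_t m) { return n ^ ~m; }

// Before: mov w23, #imm ; eon w26, w23, w22
uint32_t EqvBefore(uint32_t x, uint32_t imm) {
  const uint32_t imm_reg = imm;  // materialized immediate
  return Eon(imm_reg, x);
}

// After: eor w26, w22, #~imm (only valid if ~imm is a logical immediate).
uint32_t EqvAfter(uint32_t x, uint32_t imm) {
  const uint32_t inverted_imm = ~imm;  // computed at JIT time, no runtime cost
  return x ^ inverted_imm;
}
```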

andcx (this optimization has since been removed from the PR)

Before:

mov    w24, #-0x20
bic    w27, w24, w26

After:

mvn    w27, w26
and    w27, w27, #0xffffffe0

@JosJuice (Member)

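Overall this is a good idea, but your example output for andcx is worse than before. While the instruction count is the same, the critical path becomes one cycle longer: in the old sequence the MOV doesn't depend on w26 and can execute early, so only the BIC waits on w26, whereas in the new sequence the MVN and the AND both wait on w26, one after the other.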
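
The critical-path argument can be checked with a toy dependency model. This sketch assumes a latency of one cycle for every instruction, unlimited issue width, and that instructions not depending on the input (such as a MOV of a constant) are issued ahead of time; real cores differ, and the `Insn`/`DepthFromInput` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Insn {
  std::string dst;
  std::vector<std::string> srcs;  // register operands only; immediates omitted
};

// Cycles between `input` becoming ready and the last dependent result, under
// the unit-latency assumptions described above.
int DepthFromInput(const std::vector<Insn>& seq, const std::string& input) {
  std::map<std::string, int> depth{{input, 0}};
  int result = 0;
  for (const Insn& insn : seq) {
    int d = -1;  // -1: this instruction does not depend on `input`
    for (const std::string& src : insn.srcs) {
      auto it = depth.find(src);
      if (it != depth.end()) d = std::max(d, it->second);
    }
    if (d >= 0) {
      depth[insn.dst] = d + 1;
      result = std::max(result, d + 1);
    } else {
      depth.erase(insn.dst);  // overwritten with an input-independent value
    }
  }
  return result;
}

// andcx, old: mov w24, #-0x20 ; bic w27, w24, w26
const std::vector<Insn> kAndcxBefore = {{"w24", {}}, {"w27", {"w24", "w26"}}};
// andcx, proposed: mvn w27, w26 ; and w27, w27, #0xffffffe0
const std::vector<Insn> kAndcxAfter = {{"w27", {"w26"}}, {"w27", {"w27"}}};
```

Under these assumptions the old sequence finishes one cycle after w26 is ready and the proposed one two cycles after, matching the one-cycle difference described above.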

@JosJuice (Member)

Now that I think about it, even if both the cycle counts and the critical-path latency were the same, the old sequence that materializes the immediate value would be preferable. If the register holding that value later has to be written back to the ppcState struct, the value has to be materialized sooner or later anyway (unless it's 0). So: if you can save one instruction by not materializing the immediate, please do so (it helps in cases where the register doesn't have to be written to ppcState); otherwise, keep materializing the immediate.
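
The rule above boils down to a strict comparison of instruction counts, with ties going to materialization. A minimal sketch; `ImmStrategy` and `ChooseImmStrategy` are hypothetical names, not Dolphin code.

```cpp
#include <cassert>

enum class ImmStrategy { kUseLogicalImm, kMaterialize };

// Use the LogicalImm form only if it saves at least one instruction. On a tie,
// materialize: the constant may have to be written back to ppcState later
// anyway, and keeping it in a register lets later instructions reuse it.
ImmStrategy ChooseImmStrategy(int insns_with_logical_imm, int insns_materializing) {
  return insns_with_logical_imm < insns_materializing ? ImmStrategy::kUseLogicalImm
                                                      : ImmStrategy::kMaterialize;
}
```

With the counts from the listings, orx (1 vs. 2 instructions) uses the logical immediate, while andcx (2 vs. 2) keeps materializing.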

ARM64 has a special logical immediate encoding scheme, that can be used
with AND, ORR, and EOR. By taking advantage of this, we no longer need
to materialize the immediate value in a register, saving instructions
and/or reducing register pressure.

- orx

Before:
mov    w23, #0x1
orr    w23, w25, w23

After:
orr    w23, w25, #0x1

- andx

Before:
mov    w26, #-0x80000000
and    w27, w27, w26
sxtw   x24, w27

After:
and    w27, w27, #0x80000000
sxtw   x26, w27

- eqvx

Before:
mov    w23, #0x4
eon    w26, w23, w22

After:
eor    w26, w22, #0xfffffffb

- xorx

Before:
mov    w23, #0x1e
eor    w23, w27, w23

After:
eor    w23, w27, #0x1e

- norx

Before:
mov    w25, #-0x2001
orr    w23, w23, w25
mvn    w23, w23

After:
orr    w23, w23, #0xffffdfff
mvn    w23, w23

@Sintendo (Member, Author)

That's an excellent point. Indeed, when both instruction sequences are equally long, materializing the immediate in a register can let subsequent instructions reuse it. Effectively, this means we shouldn't do the optimization for andcx and orcx (except when operand 2 is the immediate, so we can precompute its complement). I have updated the PR accordingly.
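
Why operand order matters for andcx/orcx: PowerPC andc computes rS & ~rB and orc computes rS | ~rB. If rB is the constant, ~rB can be folded at JIT time into a single AND/ORR with a logical immediate; if rS is the constant, ~rB still has to be computed at run time. A sketch under those assumptions, with hypothetical function names.

```cpp
#include <cassert>
#include <cstdint>

// PowerPC andc: rA = rS & ~rB;  orc: rA = rS | ~rB.
uint32_t AndcReference(uint32_t s, uint32_t b) { return s & ~b; }
uint32_t OrcReference(uint32_t s, uint32_t b) { return s | ~b; }

// Operand 2 (rB) is the constant: complement it while compiling, then emit a
// single AND/ORR (only if ~imm_b is an encodable logical immediate).
uint32_t AndcImmB(uint32_t s, uint32_t imm_b) {
  const uint32_t log_imm = ~imm_b;  // folded at JIT time
  return s & log_imm;               // and wA, wS, #log_imm
}
uint32_t OrcImmB(uint32_t s, uint32_t imm_b) {
  const uint32_t log_imm = ~imm_b;  // folded at JIT time
  return s | log_imm;               // orr wA, wS, #log_imm
}

// Operand 1 (rS) is the constant: ~rB must be computed at run time, so the
// LogicalImm form needs MVN + AND, no shorter than MOV + BIC.
uint32_t AndcImmS(uint32_t imm_s, uint32_t b) {
  const uint32_t not_b = ~b;  // mvn wA, wB
  return not_b & imm_s;       // and wA, wA, #imm_s
}
```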

Comment on lines +416 to +418
AND(gpr.R(a), gpr.R(j), log_imm);
if (final_not)
  MVN(gpr.R(a), gpr.R(a));

@JosJuice (Member):

In the final_not case, we could make use of De Morgan's laws and turn ~(s & b) into ~s | ~b. Since inverting the immediate has no runtime cost, this would let us replace the AND+MVN with ORN. But inverting the immediate after we already have log_imm seems like effort... So I'll leave it up to you whether to implement this in this PR or not.
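
A sketch of the suggested identity for nand with an immediate. It models ARM64 ORN Wd, Wn, Wm as Wn | ~Wm; since ORN has only a register form, ~imm has to sit in a register. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models of the ARM64 register-form instructions involved.
uint32_t Mvn(uint32_t m) { return ~m; }
uint32_t Orn(uint32_t n, uint32_t m) { return n | ~m; }  // ORN Wd, Wn, Wm

// PowerPC nand: rA = ~(rS & rB).
uint32_t NandReference(uint32_t s, uint32_t b) { return ~(s & b); }

// Sequence in this PR when final_not is set: AND with log_imm, then MVN.
uint32_t NandAndMvn(uint32_t s, uint32_t imm) { return Mvn(s & imm); }

// De Morgan: ~(s & imm) == ~imm | ~s, i.e. ORN with ~imm in a register.
uint32_t NandOrn(uint32_t s, uint32_t imm) {
  const uint32_t inverted_imm_reg = ~imm;  // would need to be materialized
  return Orn(inverted_imm_reg, s);
}
```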

@Sintendo (Member, Author):

Sorry, forgot about this for a while.

Not a bad idea. You can even use this approach for any immediate, not just those that can be expressed as LogicalImm. But since ORN only takes register operands, you still need to materialize the inverted immediate somehow, and that might take more than one MOV instruction, in which case using LogicalImm might still be preferable...
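
How many MOVs a 32-bit constant needs can be estimated with a simplified model: one MOVZ if either halfword is zero, one MOVN if either halfword is all ones, otherwise MOVZ + MOVK. This sketch deliberately ignores other options (such as ORR of a logical immediate with the zero register) and is not Dolphin's constant-materialization logic; `MovCount32` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Simplified instruction count for materializing a 32-bit constant using only
// MOVZ/MOVN/MOVK.
int MovCount32(uint32_t value) {
  const uint32_t lo = value & 0xFFFFu;
  const uint32_t hi = value >> 16;
  if (hi == 0 || lo == 0) return 1;            // single MOVZ (optionally LSL #16)
  if (hi == 0xFFFFu || lo == 0xFFFFu) return 1;  // single MOVN
  return 2;                                    // MOVZ + MOVK
}
```

For example, the inverted eqvx immediate 0xfffffffb takes one MOVN, whereas an arbitrary constant such as 0x12345678 takes two instructions.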

I should also note that I haven't seen a single game use nand with immediates. And the only game that I've seen use nor is Zelda Master Quest.

So given the complexity and how uncommon these instruction patterns are, I think it would be better to leave this for a follow-up PR for now, if that's alright with you.

gpr.BindToRegister(a, a == j);
ORR(gpr.R(a), gpr.R(j), log_imm);
if (final_not)
  MVN(gpr.R(a), gpr.R(a));

@JosJuice (Member):

Same here regarding De Morgan's laws.
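
For the ORR+MVN (nor) path, the corresponding identity is ~(s | imm) == ~imm & ~s, which maps to BIC Wd, Wn, Wm (Wn & ~Wm) with ~imm in a register, since the register form is what takes ~s. Whether this is shorter depends on how many instructions ~imm takes to materialize. A sketch with hypothetical function names, using the norx immediate from the listings.

```cpp
#include <cassert>
#include <cstdint>

// Models of the ARM64 register-form instructions involved.
uint32_t Mvn(uint32_t m) { return ~m; }
uint32_t Bic(uint32_t n, uint32_t m) { return n & ~m; }  // BIC Wd, Wn, Wm

// PowerPC nor: rA = ~(rS | rB).
uint32_t NorReference(uint32_t s, uint32_t b) { return ~(s | b); }

// Sequence in this PR when final_not is set: ORR with log_imm, then MVN.
uint32_t NorOrrMvn(uint32_t s, uint32_t imm) { return Mvn(s | imm); }

// De Morgan: ~(s | imm) == ~imm & ~s, i.e. BIC with ~imm in a register.
uint32_t NorBic(uint32_t s, uint32_t imm) {
  const uint32_t inverted_imm_reg = ~imm;  // would need to be materialized
  return Bic(inverted_imm_reg, s);
}
```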

@JosJuice (Member) left a comment:

I'm planning to merge this after the beta is out.

@JosJuice JosJuice merged commit d50494b into dolphin-emu:master Aug 13, 2023
11 checks passed
@Sintendo Sintendo deleted the arm64-bool-logimm branch August 14, 2023 05:23
2 participants