JitArm64: Use LogicalImm in boolX #12060

Sintendo · 2023-07-21T13:26:36Z

ARM64 has a special logical immediate encoding scheme that can be used with AND, ORR, and EOR. By taking advantage of this, we no longer need to materialize the immediate value in a register, saving instructions and/or reducing register pressure.
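As a rough illustration of which constants qualify: a 32-bit logical immediate is an element of 2, 4, 8, 16 or 32 bits, replicated across the register, where the element is a rotated run of contiguous ones (all-zeros and all-ones are excluded). The sketch below checks that rule for W-register values. `IsLogicalImm32` and `IsContiguousRun` are hypothetical helpers for illustration, not Dolphin's `LogicalImm` class.

```cpp
#include <cstdint>

// True if x, after its trailing zeros are shifted out, is a single
// contiguous run of ones such as 0b0111 (x must be non-zero).
constexpr bool IsContiguousRun(std::uint32_t x)
{
  if (x == 0)
    return false;
  while ((x & 1) == 0)
    x >>= 1;
  // 0b0..01..1 plus one shares no set bits with itself.
  return (x & (x + 1)) == 0;
}

// Hypothetical helper: can `value` be encoded as a 32-bit ARM64 logical
// immediate (usable by AND/ORR/EOR on W registers)?
constexpr bool IsLogicalImm32(std::uint32_t value)
{
  // Find the smallest element size (32, 16, 8, 4 or 2 bits) such that
  // `value` is that element repeated across all 32 bits.
  unsigned size = 32;
  while (size > 2)
  {
    const unsigned half = size / 2;
    const std::uint32_t half_mask = (1u << half) - 1;
    if ((value & half_mask) != ((value >> half) & half_mask))
      break;
    size = half;
  }

  const std::uint32_t mask = size == 32 ? 0xFFFFFFFFu : (1u << size) - 1;
  const std::uint32_t elem = value & mask;
  if (elem == 0 || elem == mask)
    return false;  // all-zeros and all-ones are not encodable

  // A rotated run of ones either does not wrap around (elem itself is a run)
  // or does, in which case its complement within the element is a run.
  return IsContiguousRun(elem) || IsContiguousRun(~elem & mask);
}

// The constants used in the examples of this PR all qualify.
static_assert(IsLogicalImm32(0x80000000u), "andx");
static_assert(IsLogicalImm32(0xffffdfffu), "norx");
static_assert(IsLogicalImm32(0xfffffffbu), "eqvx");
static_assert(!IsLogicalImm32(0x12345678u), "arbitrary constant");
```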

andx

Before:

mov    w26, #-0x80000000
and    w27, w27, w26
sxtw   x24, w27

After:

and    w27, w27, #0x80000000
sxtw   x26, w27

orx

Before:

mov    w23, #0x1
orr    w23, w25, w23

After:

orr    w23, w25, #0x1

norx

Before:

mov    w25, #-0x2001
orr    w23, w23, w25
mvn    w23, w23

After:

orr    w23, w23, #0xffffdfff
mvn    w23, w23

xorx

Before:

mov    w23, #0x1e
eor    w23, w27, w23

After:

eor    w23, w27, #0x1e

eqvx

Before:

mov    w23, #0x4
eon    w26, w23, w22

After:

eor    w26, w22, #0xfffffffb
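The eqvx rewrite relies on XNOR with a constant being the same as XOR with the complemented constant: ~(s ^ imm) == s ^ ~imm, and 0xfffffffb is ~0x4. A small model of the Before and After sequences (the function names are hypothetical; A64 EON computes Rn ^ ~Rm):

```cpp
#include <cstdint>

// Before: mov w23, #0x4 ; eon w26, w23, w22   (EON: Rn ^ ~Rm)
constexpr std::uint32_t EonBefore(std::uint32_t w22)
{
  const std::uint32_t w23 = 0x4;
  return w23 ^ ~w22;
}

// After: eor w26, w22, #0xfffffffb   (~0x4 folded at JIT time)
constexpr std::uint32_t EorAfter(std::uint32_t w22)
{
  return w22 ^ 0xfffffffbu;
}

// Both compute PowerPC eqv: rA = ~(rS ^ rB) with rB == 0x4.
static_assert(EonBefore(0x12345678u) == EorAfter(0x12345678u), "");
static_assert(EorAfter(0x4u) == 0xffffffffu, "");
```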

The following case was later removed from the PR:

~~andcx~~

Before:

mov    w24, #-0x20
bic    w27, w24, w26

After:

mvn    w27, w26
and    w27, w27, #0xffffffe0

JosJuice · 2023-07-21T15:46:06Z

Overall this is a good idea, but your example output for andcx is worse than before. While the instruction count is the same as before, the critical path becomes one cycle longer.
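To make the latency argument concrete, here is a toy dependency model. It assumes every instruction has a 1-cycle latency, issue width is unlimited, and w26 is the input that arrives last (at cycle 10). The `Insn` struct and `ResultReadyCycle` function are illustrative only.

```cpp
#include <cstddef>

// One instruction: destination and up to two source registers (-1 = none).
struct Insn
{
  int dst;
  int src1;
  int src2;
};

// Toy model: 1-cycle latency, unlimited issue width, so an instruction
// starts as soon as its sources are ready. Returns the cycle at which the
// last instruction's result is ready.
template <std::size_t N>
constexpr int ResultReadyCycle(const Insn (&code)[N], int late_reg, int late_cycle)
{
  int ready[32] = {};            // all registers available at cycle 0...
  ready[late_reg] = late_cycle;  // ...except the late-arriving input
  for (std::size_t i = 0; i < N; ++i)
  {
    int start = 0;
    if (code[i].src1 >= 0 && ready[code[i].src1] > start)
      start = ready[code[i].src1];
    if (code[i].src2 >= 0 && ready[code[i].src2] > start)
      start = ready[code[i].src2];
    ready[code[i].dst] = start + 1;
  }
  return ready[code[N - 1].dst];
}

// andcx Before: mov w24, #-0x20 ; bic w27, w24, w26
constexpr Insn kBefore[] = {{24, -1, -1}, {27, 24, 26}};
// andcx After:  mvn w27, w26 ; and w27, w27, #0xffffffe0
constexpr Insn kAfter[] = {{27, 26, -1}, {27, 27, -1}};

// If w26 arrives at cycle 10, the result is ready at 11 before vs. 12 after.
static_assert(ResultReadyCycle(kBefore, 26, 10) == 11, "");
static_assert(ResultReadyCycle(kAfter, 26, 10) == 12, "");
```

In this model, with all inputs ready at cycle 0 both sequences finish at cycle 2; the extra cycle only appears when w26 is on the critical path, because the MOV in the Before sequence does not depend on it and can execute while waiting.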

JosJuice · 2023-07-21T18:12:17Z

Now that I think about it, even if both the cycle counts and the critical path latency were the same, the old one that materializes the immediate value would be preferable. This is because if the register later has to be written back to the ppcState struct, it has to be materialized sooner or later anyway (unless it's 0). So: If you can save one instruction by not materializing the immediate, please do so (it helps in the cases where the register doesn't have to be written to ppcState), but otherwise you should keep materializing the immediate.
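The rule above boils down to a strict tie-breaker. A sketch, where `PreferLogicalImm` is a hypothetical helper and the instruction counts come from the andx and andcx examples earlier in this thread:

```cpp
// Hypothetical helper: should the JIT use the logical-immediate form
// instead of materializing the constant in a register first?
constexpr bool PreferLogicalImm(int insns_with_logical_imm, int insns_with_materialized_imm)
{
  // Only a strict saving counts. On a tie, keep materializing: the register
  // may be needed anyway, e.g. when the value is written back to ppcState.
  return insns_with_logical_imm < insns_with_materialized_imm;
}

// andx:  AND #imm (1)        vs.  MOV + AND (2)  -> use LogicalImm
static_assert(PreferLogicalImm(1, 2), "");
// andcx: MVN + AND #imm (2)  vs.  MOV + BIC (2)  -> keep materializing
static_assert(!PreferLogicalImm(2, 2), "");
```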

Sintendo · 2023-07-23T16:48:53Z

That's an excellent point. Indeed, in cases where the instruction sequence is equal in length, materializing the immediate in a register could allow subsequent uses to leech off of it. Effectively, this means we shouldn't do the optimization for andcx and orcx (except in cases where operand 2 is the immediate and we can precompute its complement). I have updated the PR accordingly.
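When the constant is operand 2 (rB), its complement can be computed at JIT time, so andc and orc collapse into a single AND or ORR with ~imm. The set of logical immediates is closed under bitwise NOT, so ~imm is encodable exactly when imm is. A sketch of the identities, with hypothetical function names:

```cpp
#include <cstdint>

// PowerPC semantics: andc rA = rS & ~rB, orc rA = rS | ~rB.
constexpr std::uint32_t Andc(std::uint32_t s, std::uint32_t b) { return s & ~b; }
constexpr std::uint32_t Orc(std::uint32_t s, std::uint32_t b) { return s | ~b; }

// rB known to hold `imm`: fold the NOT at JIT time, then emit
// AND wA, wS, #~imm  or  ORR wA, wS, #~imm.
constexpr std::uint32_t AndcFolded(std::uint32_t s, std::uint32_t imm)
{
  const std::uint32_t folded = ~imm;
  return s & folded;
}

constexpr std::uint32_t OrcFolded(std::uint32_t s, std::uint32_t imm)
{
  const std::uint32_t folded = ~imm;
  return s | folded;
}

static_assert(AndcFolded(0xdeadbeefu, 0x1fu) == Andc(0xdeadbeefu, 0x1fu), "");
static_assert(OrcFolded(0xdeadbeefu, 0x1fu) == Orc(0xdeadbeefu, 0x1fu), "");
```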

JosJuice · 2023-07-23T17:39:56Z

Source/Core/Core/PowerPC/JitArm64/JitArm64_Integer.cpp

+          AND(gpr.R(a), gpr.R(j), log_imm);
+          if (final_not)
+            MVN(gpr.R(a), gpr.R(a));


In the final_not case, we could make use of de Morgan's laws and turn ~(s & b) into ~s | ~b. Since inverting the immediate has no runtime cost, this would let us replace the AND+MVN with ORN. But inverting the immediate after we already have log_imm seems like effort... So I'll leave it up to you if you want to try implementing this in this PR or not.
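The nand suggestion can be checked with a small model of both sequences. A64 ORN computes Rn | ~Rm and, unlike AND/ORR/EOR, has no logical-immediate encoding, so in the De Morgan form ~imm has to sit in a register while s is the inverted operand. `Orn`, `NandAndMvn` and `NandOrn` are hypothetical names for illustration.

```cpp
#include <cstdint>

// A64 ORN Rd, Rn, Rm computes Rn | ~Rm.
constexpr std::uint32_t Orn(std::uint32_t n, std::uint32_t m) { return n | ~m; }

// Current codegen: AND wA, wS, #imm ; MVN wA, wA
constexpr std::uint32_t NandAndMvn(std::uint32_t s, std::uint32_t imm)
{
  const std::uint32_t a = s & imm;
  return ~a;
}

// De Morgan: ~(s & imm) == ~imm | ~s
//   MOV wT, #~imm ; ORN wA, wT, wS
constexpr std::uint32_t NandOrn(std::uint32_t s, std::uint32_t imm)
{
  const std::uint32_t t = ~imm;  // folded at JIT time, then materialized
  return Orn(t, s);
}

static_assert(NandAndMvn(0x12345678u, 0x80000000u) == NandOrn(0x12345678u, 0x80000000u), "");
```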

Sintendo · 2023-08-05

Sorry, forgot about this for a while.

Not a bad idea. You can even use this approach for any immediate, not just those that can be expressed as LogicalImm. But you still need to materialize the immediate somehow and that might take more than one MOV instruction, in which case using LogicalImm might still be preferable...
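For reference, a 32-bit constant needs at most two MOV-family instructions: one MOVZ or MOVN when at most one 16-bit halfword differs from all-zeros or all-ones, otherwise MOVZ followed by MOVK. The sketch below counts that with a hypothetical `MovCost32` helper. It ignores that ORR wd, wzr, #imm can also load any logical immediate in one instruction, and the emitter's own constant-loading logic may differ.

```cpp
#include <cstdint>

// Hypothetical helper: number of MOVZ/MOVN/MOVK instructions needed to
// materialize a 32-bit constant in a W register.
constexpr int MovCost32(std::uint32_t v)
{
  const std::uint32_t lo = v & 0xFFFFu;
  const std::uint32_t hi = v >> 16;
  if (lo == 0 || hi == 0)
    return 1;  // MOVZ of the one non-zero halfword (or of zero)
  if (lo == 0xFFFFu || hi == 0xFFFFu)
    return 1;  // MOVN: the inverted value has one non-zero halfword
  return 2;    // MOVZ for one halfword, MOVK for the other
}

// Hypothetical: cost of nand-with-immediate via ORN, which needs ~imm in a register.
constexpr int NandOrnCost(std::uint32_t imm) { return MovCost32(~imm) + 1; }
```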

I should also note that I haven't seen a single game use nand with immediates. And the only game that I've seen use nor is Zelda Master Quest.

So given the complexity and how uncommon these instruction patterns are, I think it would be better to leave this for a follow-up PR for now, if that's alright with you.

JosJuice · 2023-07-23T17:46:10Z

Source/Core/Core/PowerPC/JitArm64/JitArm64_Integer.cpp

+          gpr.BindToRegister(a, a == j);
+          ORR(gpr.R(a), gpr.R(j), log_imm);
+          if (final_not)
+            MVN(gpr.R(a), gpr.R(a));


Same here regarding de Morgan's laws.
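For nor, the analogous identity is ~(s | imm) == ~imm & ~s. One reading of the suggestion is that this maps to A64 BIC (Rn & ~Rm) with ~imm materialized in Rn, in place of ORR+MVN. For the norx example, ~0xffffdfff is 0x2000, which a single MOVZ can load. `Bic`, `NorOrrMvn` and `NorBic` are hypothetical names:

```cpp
#include <cstdint>

// A64 BIC Rd, Rn, Rm computes Rn & ~Rm.
constexpr std::uint32_t Bic(std::uint32_t n, std::uint32_t m) { return n & ~m; }

// Current codegen: ORR wA, wS, #imm ; MVN wA, wA
constexpr std::uint32_t NorOrrMvn(std::uint32_t s, std::uint32_t imm) { return ~(s | imm); }

// De Morgan: MOV wT, #~imm ; BIC wA, wT, wS
constexpr std::uint32_t NorBic(std::uint32_t s, std::uint32_t imm) { return Bic(~imm, s); }

static_assert(NorOrrMvn(0x12345678u, 0xffffdfffu) == NorBic(0x12345678u, 0xffffdfffu), "");
```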

JosJuice

I'm planning to merge this after the beta is out.

Sintendo force-pushed the arm64-bool-logimm branch from 3c574d8 to a871b10 on July 21, 2023 13:28

Sintendo force-pushed the arm64-bool-logimm branch from a871b10 to a486168 on July 23, 2023 16:41

JosJuice reviewed Jul 23, 2023

JosJuice approved these changes Aug 5, 2023

JosJuice merged commit d50494b into dolphin-emu:master Aug 13, 2023
11 checks passed

Sintendo deleted the arm64-bool-logimm branch August 14, 2023 05:23
