JitArm64: Optimize cmp #11242

Merged
merged 6 commits into from Nov 4, 2022

Conversation

Sintendo
Member

@Sintendo Sintendo commented Nov 1, 2022

Optimize generated code for cmp/cmpl, mostly by introducing handling for specific constant values.


cmpl

a == 0

Before:

0x52800019   mov    w25, #0x0
0xb94087b6   ldr    w22, [x29, #0x84]
0xcb16033b   sub    x27, x25, x22

After:

0xb94087b9   ldr    w25, [x29, #0x84]
0xcb1903fb   neg    x27, x25
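
Reading the listings above, the general cmpl path appears to subtract the two zero-extended 32-bit operands in 64 bits, so a == 0 reduces to a single 64-bit negate. A minimal C++ sketch of that equivalence follows; the function names are hypothetical and model only the arithmetic, not Dolphin's emitter code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the general cmpl path: both operands are
// zero-extended to 64 bits and subtracted (mov/ldr, then sub x, x, x).
constexpr uint64_t cmpl_general(uint32_t a, uint32_t b)
{
  return static_cast<uint64_t>(a) - static_cast<uint64_t>(b);
}

// Hypothetical model of the a == 0 path: a single 64-bit negate of the
// zero-extended operand (ldr, then neg x, x).
constexpr uint64_t cmpl_a_zero(uint32_t b)
{
  return static_cast<uint64_t>(-static_cast<int64_t>(b));
}

// Spot-check that both paths agree on a few boundary values.
inline void check_cmpl_a_zero()
{
  const uint32_t samples[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t b : samples)
    assert(cmpl_general(0u, b) == cmpl_a_zero(b));
}
```

Calling check_cmpl_a_zero() exercises the boundary values; since b is zero-extended, the negate never overflows.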

cmp

a == 0

Before:

0x52800016   mov    w22, #0x0
0xb94093b5   ldr    w21, [x29, #0x90]
0x93407ed7   sxtw   x23, w22
0x93407eb9   sxtw   x25, w21
0xcb1902f9   sub    x25, x23, x25

After:

0xb94093b7   ldr    w23, [x29, #0x90]
0x4b1703f9   neg    w25, w23
0x93407f39   sxtw   x25, w25

a == -1

Before:

0x12800015   mov    w21, #-0x1
0x93407eb9   sxtw   x25, w21
0x93407ef8   sxtw   x24, w23
0xcb180338   sub    x24, x25, x24

After:

0x2a3703f8   mvn    w24, w23
0x93407f18   sxtw   x24, w24
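
The a == -1 case relies on -1 - x == ~x, which follows from -x == ~x + 1. The sketch below checks that a 32-bit bitwise NOT followed by sign extension matches the sign-extend-then-subtract sequence from the "Before" listing. The function names are hypothetical, and the code assumes two's-complement integers, as on ARM64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the general cmp path: sign-extend both operands
// to 64 bits, then subtract (sxtw, sxtw, sub).
constexpr int64_t cmp_general(int32_t a, int32_t b)
{
  return static_cast<int64_t>(a) - static_cast<int64_t>(b);
}

// Hypothetical model of the a == -1 path: 32-bit bitwise NOT, then
// sign-extend the result (mvn w, w followed by sxtw x, w).
constexpr int64_t cmp_a_minus_one(int32_t b)
{
  return static_cast<int64_t>(~b);
}

// Spot-check that both paths agree, including at the 32-bit extremes.
inline void check_cmp_a_minus_one()
{
  const int32_t samples[] = {0, 1, -1, 12345, INT32_MAX, INT32_MIN};
  for (int32_t b : samples)
    assert(cmp_general(-1, b) == cmp_a_minus_one(b));
}
```

The identity holds for every 32-bit b because -1 - b never leaves the 32-bit signed range, so the NOT can safely be done before the sign extension.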

general case

Before:

0x93407f59   sxtw   x25, w26
0x93407ebb   sxtw   x27, w21
0xcb1b033b   sub    x27, x25, x27

After:

0x93407f5b   sxtw   x27, w26
0xcb35c37b   sub    x27, x27, w21, sxtw

By explicitly handling a == 0 in cmpl, we can avoid materializing zero
in a register.

Before:
0x52800019   mov    w25, #0x0
0xb94087b6   ldr    w22, [x29, #0x84]
0xcb16033b   sub    x27, x25, x22

After:
0xb94087b9   ldr    w25, [x29, #0x84]
0xcb1903fb   neg    x27, x25

By explicitly handling a == 0 in cmp, we can avoid materializing zero in
a register and generate more efficient code altogether.

Before:
0x52800016   mov    w22, #0x0
0xb94093b5   ldr    w21, [x29, #0x90]
0x93407ed7   sxtw   x23, w22
0x93407eb9   sxtw   x25, w21
0xcb1902f9   sub    x25, x23, x25

After:
0xb94093b7   ldr    w23, [x29, #0x90]
0x4b1703f9   neg    w25, w23
0x93407f39   sxtw   x25, w25

By explicitly handling a == -1 in cmp, we can avoid materializing -1 in
a register and generate more efficient code by taking advantage of
-x == ~x + 1, which gives -1 - x == ~x.

Before:
0x12800015   mov    w21, #-0x1
0x93407eb9   sxtw   x25, w21
0x93407ef8   sxtw   x24, w23
0xcb180338   sub    x24, x25, x24

After:
0x2a3703f8   mvn    w24, w23
0x93407f18   sxtw   x24, w24

ARM64 can perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.

This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
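
To illustrate what an extend specifier selects at the encoding level, here is a small sketch of the A64 SUB (extended register) encoding. The enum name mirrors the one mentioned above and its values are the architectural 3-bit option field, but EncodeSubExtended64 is a hypothetical helper written for this example, not the Arm64Emitter API.

```cpp
#include <cassert>
#include <cstdint>

// Values of the 3-bit "option" field used by A64 extended-register
// instructions.
enum class ExtendSpecifier : uint32_t
{
  UXTB = 0, UXTH = 1, UXTW = 2, UXTX = 3,
  SXTB = 4, SXTH = 5, SXTW = 6, SXTX = 7,
};

// Hypothetical encoder for the 64-bit form of
// "SUB Xd, Xn, Rm, <extend> {#shift}" (extended register).
// Field layout: Rm at bit 16, option at bit 13, shift amount at bit 10,
// Rn at bit 5, Rd at bit 0.
constexpr uint32_t EncodeSubExtended64(uint32_t rd, uint32_t rn, uint32_t rm,
                                       ExtendSpecifier ext, uint32_t shift = 0)
{
  return 0xCB200000u | ((rm & 31u) << 16) |
         (static_cast<uint32_t>(ext) << 13) | ((shift & 7u) << 10) |
         ((rn & 31u) << 5) | (rd & 31u);
}

// "sub x27, x27, w21, sxtw" encodes to 0xcb35c37b.
static_assert(EncodeSubExtended64(27, 27, 21, ExtendSpecifier::SXTW) == 0xCB35C37Bu,
              "unexpected encoding");
```

The static_assert reproduces the instruction word 0xcb35c37b shown for the general case, where the SXTW is folded into the SUB.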

We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.

Before:
0x93407f59   sxtw   x25, w26
0x93407ebb   sxtw   x27, w21
0xcb1b033b   sub    x27, x25, x27

After:
0x93407f5b   sxtw   x27, w26
0xcb35c37b   sub    x27, x27, w21, sxtw

@AdmiralCurtiss AdmiralCurtiss merged commit 8b4e315 into dolphin-emu:master Nov 4, 2022
11 checks passed
@Sintendo Sintendo deleted the arm64cmp branch November 5, 2022 07:05