JitArm64: Optimize multiplication #11243

Merged
merged 11 commits into from Nov 15, 2022

Member

@Sintendo Sintendo commented Nov 1, 2022

Optimize multiplication by various constants. This introduces a MultiplyImmediate function, containing logic much like what already exists for x86, and reuses it in both mulli and mullwx.

Also a minor register allocation improvement for mulli.
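
As a rough illustration of the kind of dispatch such a function performs, the Python sketch below sorts a signed 32-bit constant into the special cases discussed in this PR. It is not Dolphin's actual C++ code: `classify_multiplier`, the strategy names, and the order of the checks are assumptions made for illustration.

```python
def classify_multiplier(imm: int) -> str:
    """Return a (hypothetical) strategy name for a signed 32-bit constant."""

    def is_pow2(v: int) -> bool:
        # True for 1, 2, 4, 8, ...; exactly one bit set.
        return v > 0 and v & (v - 1) == 0

    if imm == 0:
        return "zero"         # result is always 0
    if imm == 1:
        return "copy"         # a MOV, or nothing if dest == src
    if is_pow2(imm):
        return "lsl"          # x << n
    if is_pow2(imm - 1):
        return "add_shifted"  # x + (x << n)
    if is_pow2(-imm):
        return "neg_shifted"  # -(x << n); n = 0 covers imm == -1
    if is_pow2(1 - imm):
        return "sub_shifted"  # x - (x << n)
    return "mul"              # fall back to MOV + MUL
```

Under these assumptions, the constants used in the examples (0x40, 0x41, -0x10, -0x3) map to `lsl`, `add_shifted`, `neg_shifted`, and `sub_shifted`, while 0x32 falls through to the generic `mul` path.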


Multiplication by 0

Before:

0x52800019   mov    w25, #0x0
0x1b197f5b   mul    w27, w26, w25

After:

Nothing!

Multiplication by 1 (example 1)

Before:

0x52800038   mov    w24, #0x1
0x1b1a7f1b   mul    w27, w24, w26

After:

0x2a1a03fb   mov    w27, w26

Multiplication by 1 (example 2)

Before:

0x52800039   mov    w25, #0x1
0x1b1a7f3a   mul    w26, w25, w26

After:

Nothing!

Multiplication by -1

Before:

0x12800015   mov    w21, #-0x1
0x1b157f7b   mul    w27, w27, w21

After:

0x4b1b03fb   neg    w27, w27

Multiplication by 2^n

Before:

0x52800817   mov    w23, #0x40
0x1b167ef6   mul    w22, w23, w22

After:

0x531a66d6   lsl    w22, w22, #6

Multiplication by 2^n + 1

Before:

0x52800838   mov    w24, #0x41
0x1b187f7b   mul    w27, w27, w24

After:

0x0b1b1b7b   add    w27, w27, w27, lsl #6

mulli register allocation

Before:

0x52800659   mov    w25, #0x32
0x1b197f5b   mul    w27, w26, w25

After:

0x5280065b   mov    w27, #0x32
0x1b1b7f5b   mul    w27, w26, w27

Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.

Multiplication by zero always gives zero.

Before:
0x52800019   mov    w25, #0x0
0x1b197f5b   mul    w27, w26, w25

After:
Nothing!

Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.

Before:
0x52800038   mov    w24, #0x1
0x1b1a7f1b   mul    w27, w24, w26

After:
0x2a1a03fb   mov    w27, w26

Before:
0x52800039   mov    w25, #0x1
0x1b1a7f3a   mul    w26, w25, w26

After:
Nothing!

Turn multiplications by a power of two into bitshifts.

Before:
0x52800817   mov    w23, #0x40
0x1b167ef6   mul    w22, w23, w22

After:
0x531a66d6   lsl    w22, w22, #6
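
A minimal sketch of the check behind this transformation, modeling 32-bit wraparound with a mask. The helper names are hypothetical and not taken from Dolphin.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with wraparound


def is_power_of_two(imm: int) -> bool:
    # A positive power of two has exactly one bit set.
    return imm > 0 and imm & (imm - 1) == 0


def shift_amount(imm: int) -> int:
    # For a power of two, bit_length() - 1 is the index of its single set bit.
    return imm.bit_length() - 1


def mul32(x: int, imm: int) -> int:
    # Reference result: low 32 bits of the product.
    return (x * imm) & MASK32


def lsl32(x: int, n: int) -> int:
    # Logical shift left on a 32-bit register.
    return (x << n) & MASK32
```

For the constant in the listing above, `shift_amount(0x40)` evaluates to 6, which matches the `#6` in the LSL.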

By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.

Before:
0x52800838   mov    w24, #0x41
0x1b187f7b   mul    w27, w27, w24

After:
0x0b1b1b7b   add    w27, w27, w27, lsl #6
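
The identity used here is x * (2^n + 1) = x + (x << n), taken modulo 2^32. The Python model below is only a sanity check of that arithmetic under the assumption of 32-bit wraparound; `add_shifted` is a made-up name standing in for the ADD with a shifted operand.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with wraparound


def add_shifted(x: int, n: int) -> int:
    # Models: add wd, wn, wn, lsl #n
    return (x + (x << n)) & MASK32


def mul32(x: int, imm: int) -> int:
    # Reference result: low 32 bits of the product.
    return (x * imm) & MASK32


# 0x41 == 2**6 + 1, so the shift amount is 6.
samples = (0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF)
matches = all(add_shifted(x, 6) == mul32(x, 0x41) for x in samples)
```

The check holds even when the shifted value overflows, because both sides are reduced modulo 2^32.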

If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.

Before:

0x52800659   mov    w25, #0x32
0x1b197f5b   mul    w27, w26, w25

After:
0x5280065b   mov    w27, #0x32
0x1b1b7f5b   mul    w27, w26, w27
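
The register choice described here can be sketched as follows. This is an illustrative model only: `pick_immediate_register` and the `allocate_temp` callback are hypothetical and do not reflect Dolphin's register cache API.

```python
from typing import Callable


def pick_immediate_register(dest: str, src: str,
                            allocate_temp: Callable[[], str]) -> str:
    """Choose the register that will hold the constant operand."""
    if dest != src:
        # dest is about to be overwritten with the product anyway,
        # so it can hold the immediate without clobbering the input.
        return dest
    # dest still holds the input value, so a scratch register is needed.
    return allocate_temp()
```

With `dest = "w27"` and `src = "w26"`, as in the listing above, the function returns `"w27"` and no scratch register is requested.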

ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.

Before:
0x128001f7   mov    w23, #-0x10
0x1b1a7efa   mul    w26, w23, w26
0x93407f58   sxtw   x24, w26

After:
0x4b1a13fa   neg    w26, w26, lsl #4
0x93407f58   sxtw   x24, w26

Let's take advantage of ARM64's input register shifting one last time,
shall we?

Before:
0x1280005b   mov    w27, #-0x3
0x1b1b7f18   mul    w24, w24, w27

After:
0x4b180b18   sub    w24, w24, w24, lsl #2
@Sintendo
Member Author

Sintendo commented Nov 2, 2022

Added two more interesting cases. The -(2^n) case also covers -1, so explicit handling of that has been dropped.


Multiplication by -(2^n)

Before:

0x128001f7   mov    w23, #-0x10
0x1b1a7efa   mul    w26, w23, w26
0x93407f58   sxtw   x24, w26

After:

0x4b1a13fa   neg    w26, w26, lsl #4
0x93407f58   sxtw   x24, w26

Multiplication by -(2^n) + 1

Before:

0x1280005b   mov    w27, #-0x3
0x1b1b7f18   mul    w24, w24, w27

After:

0x4b180b18   sub    w24, w24, w24, lsl #2
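
Both cases rest on simple identities modulo 2^32: x * -(2^n) = -(x << n) and x * (1 - 2^n) = x - (x << n). The Python model below checks them under the assumption of 32-bit wraparound. The function names are invented for illustration and merely stand in for NEG and SUB with a shifted operand.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with wraparound


def neg_shifted(x: int, n: int) -> int:
    # Models: neg wd, wn, lsl #n
    return (-(x << n)) & MASK32


def sub_shifted(x: int, n: int) -> int:
    # Models: sub wd, wn, wn, lsl #n
    return (x - (x << n)) & MASK32


def mul32(x: int, imm: int) -> int:
    # Reference result: low 32 bits of the product (imm may be negative).
    return (x * imm) & MASK32


samples = (0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF)
neg_ok = all(neg_shifted(x, 4) == mul32(x, -0x10) for x in samples)
sub_ok = all(sub_shifted(x, 2) == mul32(x, -0x3) for x in samples)
# With n = 0 the shifted NEG is a plain negation, i.e. multiplication by -1.
minus_one_ok = all(neg_shifted(x, 0) == mul32(x, -1) for x in samples)
```

The last check illustrates why a separate -1 case is unnecessary: -1 is -(2^0), so it is the n = 0 instance of the -(2^n) case.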

@Rumi-Larry

How about the multiplication by -1 case?

@Sintendo
Member Author

Sintendo commented Nov 4, 2022

It's handled by the -(2^n) case now, no need to handle it separately.

@AdmiralCurtiss
Contributor

@JosJuice Can you re-check this?

@JosJuice
Member

JosJuice commented Nov 6, 2022

Still LGTM. But I would prefer to wait with merging this kind of change since we're so close to a beta.

@AdmiralCurtiss
Contributor

Well, since we pushed the beta...

@AdmiralCurtiss AdmiralCurtiss merged commit d7593dd into dolphin-emu:master Nov 15, 2022
11 checks passed
@Sintendo Sintendo deleted the arm64mul branch November 24, 2022 21:25
4 participants