JitArm64: Optimize multiplication #11243
Add a new function that will handle all the special cases regarding multiplication. It does nothing for now, but will be expanded in follow-up commits.
Multiplication by zero always gives zero.

Before:
    0x52800019 mov w25, #0x0
    0x1b197f5b mul w27, w26, w25
After:
    Nothing!
Multiplication by one is also trivial. Depending on the registers involved, either a single MOV or no instructions will be generated.

Before:
    0x52800038 mov w24, #0x1
    0x1b1a7f1b mul w27, w24, w26
After:
    0x2a1a03fb mov w27, w26

Before:
    0x52800039 mov w25, #0x1
    0x1b1a7f3a mul w26, w25, w26
After:
    Nothing!
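As a rough model of the two trivial cases (not Dolphin code), the hypothetical helper below counts the host instructions needed for rd = rn * c when c is 0 or 1. Registers are plain indices (w0..w30), and it assumes, as the "Nothing!" results suggest, that a zero result can be recorded as a known constant instead of being materialized.

```cpp
#include <cstdint>

// Hypothetical model: returns how many host instructions "rd = rn * c" needs
// for the trivial constants, or -1 if c is not one of them.
constexpr int TrivialMulInstructionCount(int rd, int rn, int32_t c) {
  if (c == 0) {
    // Assumption: the result is tracked as the known constant 0, so nothing is emitted.
    return 0;
  }
  if (c == 1) {
    // rd = rn: nothing if rd already is rn, otherwise a single register MOV.
    return rd == rn ? 0 : 1;
  }
  return -1;  // not a trivial constant
}
```

With the registers from the examples above, (27, 26, 1) needs one MOV while (26, 26, 1) needs nothing.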
Turn multiplications by a power of two into bitshifts.

Before:
    0x52800817 mov w23, #0x40
    0x1b167ef6 mul w22, w23, w22
After:
    0x531a66d6 lsl w22, w22, #6
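The power-of-two test and the shift amount are cheap to compute. This sketch uses hypothetical helpers (IsPow2, Log2, Lsl) and models w registers as uint32_t. That is safe because the low 32 bits of a product are the same whether the operands are treated as signed or unsigned, and the low 32 bits are all that mullw/mulli keep.

```cpp
#include <cstdint>

// True when v has exactly one bit set, i.e. v == 2^n for some n in [0, 31].
constexpr bool IsPow2(uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Index of the highest set bit; for a power of two this is the shift amount n.
constexpr int Log2(uint32_t v) {
  int n = 0;
  while (v > 1) {
    v >>= 1;
    ++n;
  }
  return n;
}

// What "lsl wd, wn, #n" computes on a 32-bit register: bits shifted out are
// dropped, matching the low 32 bits of x * 2^n.
constexpr uint32_t Lsl(uint32_t x, int n) {
  return x << n;
}
```

For the constant 0x40 above, Log2 yields 6, the shift amount in the emitted LSL.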
By taking advantage of ARM64's ability to shift an input register by any amount, we can multiply by a number that is one more than a power of two with a single ADD instruction.

Before:
    0x52800838 mov w24, #0x41
    0x1b187f7b mul w27, w27, w24
After:
    0x0b1b1b7b add w27, w27, w27, lsl #6
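The identity behind this is x * (2^n + 1) = x + (x << n), which also holds modulo 2^32. A minimal sketch with hypothetical helper names that detects the pattern and models what the shifted-operand ADD computes:

```cpp
#include <cstdint>

constexpr bool IsPow2(uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// A constant c has the form 2^n + 1 exactly when c - 1 is a power of two.
constexpr bool IsPow2PlusOne(uint32_t c) {
  return IsPow2(c - 1);
}

// Models "add wd, wn, wn, lsl #n": x + (x << n) == x * (2^n + 1) modulo 2^32.
constexpr uint32_t AddShifted(uint32_t x, int n) {
  return x + (x << n);
}
```

Note that 2 also passes IsPow2PlusOne (2 - 1 == 2^0), but as a plain power of two it is better served by a single LSL, so a dispatcher would check powers of two first.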
If the destination register doesn't equal the input register, using it to temporarily hold the immediate value is fair game, as it'll be overwritten with the result of the multiplication anyway. This can slightly reduce register pressure.

Before:
    0x52800659 mov w25, #0x32
    0x1b197f5b mul w27, w26, w25
After:
    0x5280065b mov w27, #0x32
    0x1b1b7f5b mul w27, w26, w27
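The register choice boils down to an aliasing check. The sketch below is a hypothetical model (ImmediatePlan and ChooseImmediateRegister are illustrative names, registers are plain indices). Loading the constant into rd first is only safe when rd is not also the input; an instruction like mul w27, w26, w27 that reads and writes the same register is fine.

```cpp
// Hypothetical description of where the constant for "rd = rn * imm" goes.
struct ImmediatePlan {
  int reg;            // register that will hold the constant
  bool needsScratch;  // true if an extra temporary had to be allocated
};

// rd's old value is dead (the product overwrites it), so rd can hold the
// constant unless it aliases the input rn, whose value would then be lost.
constexpr ImmediatePlan ChooseImmediateRegister(int rd, int rn, int scratch) {
  return rd != rn ? ImmediatePlan{rd, false} : ImmediatePlan{scratch, true};
}
```

With the registers from the example (rd = w27, rn = w26), the constant lands in w27 and no scratch register is needed.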
ARM64's flexible shifting of input registers also allows us to multiply by a negative power of two in one instruction: shift the input of a NEG instruction.

Before:
    0x128001f7 mov w23, #-0x10
    0x1b1a7efa mul w26, w23, w26
    0x93407f58 sxtw x24, w26
After:
    0x4b1a13fa neg w26, w26, lsl #4
    0x93407f58 sxtw x24, w26
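Here the identity is x * -(2^n) = 0 - (x << n). A sketch with hypothetical helper names; the constant is negated in unsigned arithmetic so that INT32_MIN does not cause signed overflow. With n = 0 the same check accepts -1, where the shifted NEG degenerates to a plain NEG.

```cpp
#include <cstdint>

constexpr bool IsPow2(uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// c has the form -(2^n) when c is negative and -c is a power of two.
// Negating as uint32_t keeps c == INT32_MIN well defined.
constexpr bool IsNegPow2(int32_t c) {
  return c < 0 && IsPow2(0u - static_cast<uint32_t>(c));
}

// Models "neg wd, wn, lsl #n": 0 - (x << n) == x * -(2^n) modulo 2^32.
constexpr uint32_t NegShifted(uint32_t x, int n) {
  return 0u - (x << n);
}
```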
Let's take advantage of ARM64's input register shifting one last time, shall we? Multiplying by one minus a power of two, i.e. -(2^n) + 1, takes a single SUB.

Before:
    0x1280005b mov w27, #-0x3
    0x1b1b7f18 mul w24, w24, w27
After:
    0x4b180b18 sub w24, w24, w24, lsl #2
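The SUB form relies on x * (1 - 2^n) = x - (x << n); with n = 2 that is the multiplication by -3 shown above. A minimal sketch with hypothetical helper names:

```cpp
#include <cstdint>

constexpr bool IsPow2(uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// c has the form 1 - 2^n when 1 - c is a power of two (computed unsigned,
// so no signed overflow is possible).
constexpr bool IsOneMinusPow2(int32_t c) {
  return IsPow2(1u - static_cast<uint32_t>(c));
}

// Models "sub wd, wn, wn, lsl #n": x - (x << n) == x * (1 - 2^n) modulo 2^32.
constexpr uint32_t SubShifted(uint32_t x, int n) {
  return x - (x << n);
}
```

This predicate also accepts constants that other cases handle more cheaply (0 and -1, for instance), so a dispatcher would need to test those cases first.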
Added two more interesting cases. The -(2^n) case also covers -1, so explicit handling of that has been dropped.

Multiplication by -(2^n)
Before:
After:

Multiplication by -(2^n) + 1
Before:
After:
How about the multiplication by -1 case?
It's handled by the -(2^n) case now, no need to handle it separately.
@JosJuice Can you re-check this?
Still LGTM. But I'd prefer to wait before merging this kind of change, since we're so close to a beta.
Well, since we pushed the beta... |
Optimize multiplication for various constants. We introduce a MultiplyImmediate function, which contains logic much like the logic that exists for x86, and reuse it in both mulli and mullwx. Also a minor register allocation improvement for mulli.

Multiplication by 0
Before:
After:
Multiplication by 1 (example 1)
Before:
After:
Multiplication by 1 (example 2)
Before:
After:
Multiplication by -1
Before:
After:
Multiplication by 2^n
Before:
After:
Multiplication by 2^n + 1
Before:
After:
mulli register allocation
Before:
After:
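Putting the cases together, a MultiplyImmediate-style dispatcher would first classify the constant. This is only a sketch, not the code from this PR: MulStrategy and Classify are hypothetical names, and the order of the checks is an assumption. The order matters because some constants match several patterns (2 is both 2^1 and 2^0 + 1; -1 is both -(2^0) and 1 - 2^1).

```cpp
#include <cstdint>

// Hypothetical strategy names mirroring the cases listed in this PR.
enum class MulStrategy {
  Zero,        // result is 0
  One,         // a MOV or nothing
  Lsl,         // c == 2^n
  AddShifted,  // c == 2^n + 1
  NegShifted,  // c == -(2^n), which includes -1
  SubShifted,  // c == -(2^n) + 1
  Generic,     // fall back to MOV immediate + MUL
};

constexpr bool IsPow2(uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Works on the 32-bit pattern of c; all arithmetic is unsigned (wrapping).
constexpr MulStrategy Classify(int32_t c) {
  const uint32_t u = static_cast<uint32_t>(c);
  if (u == 0) return MulStrategy::Zero;
  if (u == 1) return MulStrategy::One;
  if (IsPow2(u)) return MulStrategy::Lsl;
  if (IsPow2(u - 1)) return MulStrategy::AddShifted;
  if (IsPow2(0u - u)) return MulStrategy::NegShifted;
  if (IsPow2(1u - u)) return MulStrategy::SubShifted;
  return MulStrategy::Generic;
}
```

With this ordering, -1 falls into the -(2^n) bucket, consistent with the review discussion above, while 0x32 from the register allocation example still takes the generic MOV + MUL path.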