Jit64: subfx optimizations #9425

Merged
merged 5 commits into from Jan 15, 2021

@Sintendo Sintendo commented Jan 5, 2021

Improved code generation for subfx in various cases.


d == a and a is constant

Example 1

Before:

BF 1E 00 00 00       mov         edi,1Eh
8B C7                mov         eax,edi
8B FE                mov         edi,esi
2B F8                sub         edi,eax

After:

8D 7E E2             lea         edi,[rsi-1Eh]

Example 2

Before:

BE 00 AC 3F 80       mov         esi,803FAC00h
8B C6                mov         eax,esi
8B 75 EC             mov         esi,dword ptr [rbp-14h]
2B F0                sub         esi,eax

After:

8B 75 EC             mov         esi,dword ptr [rbp-14h]
81 EE 00 AC 3F 80    sub         esi,803FAC00h

a == 0

Example

Before:

41 83 EE 00          sub         r14d,0

After:
Nothing!


b == 0

Example 1 (d == a)

Before:

41 8B C7             mov         eax,r15d
41 BF 00 00 00 00    mov         r15d,0
44 2B F8             sub         r15d,eax

After:

41 F7 DF             neg         r15d

Example 2 (d != a)

Before:

BF 00 00 00 00       mov         edi,0
41 2B FD             sub         edi,r13d

After:

41 8B FD             mov         edi,r13d
F7 DF                neg         edi

a == b

Example

Before:

2B F6                sub         esi,esi

After:
Nothing, destination register is set to constant zero.
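
The special cases above follow from the arithmetic of subf, which computes d = b - a modulo 2^32. The following is a minimal sketch of those identities, not code from the PR; `Subf` and `Neg` are hypothetical helper names introduced only for illustration. The b == -1 line corresponds to the possible follow-up mentioned in the commit notes.

```cpp
#include <cstdint>

// Hypothetical helper: PowerPC "subtract from", d = b - a (mod 2^32).
constexpr uint32_t Subf(uint32_t a, uint32_t b) { return b - a; }

// Hypothetical helper: two's complement negation, as x86 `neg` computes it.
constexpr uint32_t Neg(uint32_t x) { return 0u - x; }

// a == 0: the result is just b, so nothing is emitted when d == b.
static_assert(Subf(0u, 0x1234u) == 0x1234u, "a == 0");

// b == 0: the result is -a, so a single `neg` suffices.
static_assert(Subf(5u, 0u) == Neg(5u), "b == 0");

// a == b: the result is always zero, whatever the register holds.
static_assert(Subf(7u, 7u) == 0u, "a == b");

// Constant a: subtracting a equals adding -a, which fits an `lea` displacement.
static_assert(Subf(0x1Eu, 100u) == 100u + Neg(0x1Eu), "constant a");

// b == -1: the result is the bitwise NOT of a.
static_assert(Subf(5u, 0xFFFFFFFFu) == ~uint32_t{5}, "b == -1");
```

Because every check is a `static_assert`, the file compiling at all confirms the identities.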

Consider the case where d and a refer to the same PowerPC register,
which is known to hold an immediate value by the RegCache. We place a
ReadWrite constraint on this register and bind it to an x86 register.
The RegCache then allocates a new register, initializes it with the
immediate, and returns a RCX64Reg for both d and a.

At this point information about the immediate value becomes unreachable.
In the case of subfx, this generates suboptimal code:

Before 1:
BF 1E 00 00 00       mov         edi,1Eh       <- done by RegCache
8B C7                mov         eax,edi
8B FE                mov         edi,esi
2B F8                sub         edi,eax

Before 2:
BE 00 AC 3F 80       mov         esi,803FAC00h <- done by RegCache
8B C6                mov         eax,esi
8B 75 EC             mov         esi,dword ptr [rbp-14h]
2B F0                sub         esi,eax

The solution is to explicitly handle the constant a case before having
the RegCache allocate registers for us.

After 1:
8D 7E E2             lea         edi,[rsi-1Eh]

After 2:
8B 75 EC             mov         esi,dword ptr [rbp-14h]
81 EE 00 AC 3F 80    sub         esi,803FAC00h
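
The two "After" sequences differ because b lives in a host register in the first case and in memory in the second. The sketch below models that choice under stated assumptions; it is not Dolphin's implementation. `OperandKind` and `SelectSubfConstA` are hypothetical names, the returned strings are informal mnemonics, and the model ignores the carry, overflow and record-bit variants of subfx, which would need flags that `lea` does not produce.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: where the b operand currently lives on the host.
enum class OperandKind { HostRegister, Memory };

// Illustrative only: picks an instruction sequence for d = b - imm,
// where imm is the known constant value of a.
std::vector<std::string> SelectSubfConstA(bool d_is_b, OperandKind b_kind, uint32_t imm)
{
  if (imm == 0)
  {
    // Subtracting zero: nothing to do if d already is b, else a plain copy.
    if (d_is_b)
      return {};
    return {"mov d, b"};
  }
  if (d_is_b)
    return {"sub d, imm"};  // subtract in place
  if (b_kind == OperandKind::HostRegister)
    return {"lea d, [b - imm]"};  // one instruction, no flags touched
  return {"mov d, b", "sub d, imm"};  // b in memory: load, then subtract
}
```

With b in a register and imm = 0x1E the model yields the single `lea` of After 1; with b in memory it yields the `mov` plus `sub` pair of After 2.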

a == 0

Occurs a bunch of times in Super Mario Sunshine.

Before:
41 83 EE 00          sub         r14d,0

After:
Nothing!

b == 0

Happens in Super Mario Sunshine. You could probably do something similar
for b == -1 (like we do for subfic), but I couldn't find any titles that
do this.

- Case 1: d == a

Before:
41 8B C7             mov         eax,r15d
41 BF 00 00 00 00    mov         r15d,0
44 2B F8             sub         r15d,eax

After:
41 F7 DF             neg         r15d

- Case 2: d != a

Before:
BF 00 00 00 00       mov         edi,0
41 2B FD             sub         edi,r13d

After:
41 8B FD             mov         edi,r13d
F7 DF                neg         edi

a == b

Soul Calibur II does this.

Before:
2B F6                sub         esi,esi

After:
Nothing! The destination register is set to constant zero.
@lioncash lioncash merged commit 0c2bc35 into dolphin-emu:master Jan 15, 2021
10 checks passed
@Sintendo Sintendo deleted the jit64subfx branch January 15, 2021 07:04
Sintendo added a commit to Sintendo/dolphin that referenced this pull request Jan 22, 2021
This doesn't really add any new optimizations, but fixes an issue that
prevented the optimizations introduced in dolphin-emu#8551 and dolphin-emu#8755 from being
applied in specific cases. A similar issue was solved for subfx as part
of dolphin-emu#9425.

Consider the case where the destination register is also an input
register and happens to hold an immediate value. This results in a set
of constraints that forces the RegCache to allocate a register and move
the immediate value into it for us. By the time we check for immediate
values in the JIT, we're too late.

We solve this by refactoring the code in such a way that we can check
for immediates before involving the RegCache.

- Example 1
Before:
41 BF 00 68 00 CC    mov         r15d,0CC006800h
44 03 FF             add         r15d,edi

After:
44 8D BF 00 68 00 CC lea         r15d,[rdi-33FF9800h]

- Example 2
Before:
41 BE 00 00 00 00    mov         r14d,0
44 03 F7             add         r14d,edi

After:
44 8B F7             mov         r14d,edi

- Example 3
Before:
41 BD 03 00 00 00    mov         r13d,3
44 03 6D 8C          add         r13d,dword ptr [rbp-74h]

After:
44 8B 6D 8C          mov         r13d,dword ptr [rbp-74h]
41 83 C5 03          add         r13d,3
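
In Example 1 the disassembler prints the immediate 0CC006800h as the displacement -33FF9800h, because a 32-bit displacement is interpreted as signed. The following sketch checks that both spellings give the same 32-bit result; `AsSignedDisp32` and `Lea32` are hypothetical helpers written for this illustration, not part of Dolphin.

```cpp
#include <cstdint>

// Hypothetical helper: reinterpret a 32-bit immediate as a signed displacement.
constexpr int64_t AsSignedDisp32(uint32_t imm)
{
  return imm < 0x80000000u ? int64_t{imm} : int64_t{imm} - (int64_t{1} << 32);
}

// Hypothetical helper: a 32-bit `lea` keeps only the low 32 bits of base + disp.
constexpr uint32_t Lea32(uint32_t base, int64_t disp)
{
  return static_cast<uint32_t>(int64_t{base} + disp);
}

// The same bit pattern read two ways.
static_assert(AsSignedDisp32(0xCC006800u) == -0x33FF9800LL, "signed view of the immediate");

// Adding the signed displacement matches adding the unsigned immediate mod 2^32.
static_assert(Lea32(0x10u, AsSignedDisp32(0xCC006800u)) == 0x10u + 0xCC006800u,
              "lea matches add");
```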