Jit64: subfx optimizations #9425
Merged
Consider the case where d and a refer to the same PowerPC register, which is known to hold an immediate value by the RegCache. We place a ReadWrite constraint on this register and bind it to an x86 register. The RegCache then allocates a new register, initializes it with the immediate, and returns a RCX64Reg for both d and a. At this point information about the immediate value becomes unreachable. In the case of subfx, this generates suboptimal code:

Before 1:
BF 1E 00 00 00       mov         edi,1Eh  <- done by RegCache
8B C7                mov         eax,edi
8B FE                mov         edi,esi
2B F8                sub         edi,eax

Before 2:
BE 00 AC 3F 80       mov         esi,803FAC00h  <- done by RegCache
8B C6                mov         eax,esi
8B 75 EC             mov         esi,dword ptr [rbp-14h]
2B F0                sub         esi,eax

The solution is to explicitly handle the constant a case before having the RegCache allocate registers for us.

After 1:
8D 7E E2             lea         edi,[rsi-1Eh]

After 2:
8B 75 EC             mov         esi,dword ptr [rbp-14h]
81 EE 00 AC 3F 80    sub         esi,803FAC00h
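The equivalence behind the "After" listings can be checked with a few lines of C++. This is only an illustrative sketch, not code from the PR: the function names `subf` and `lea_with_negated_disp` are made up here, and the operand values other than the two immediates (1Eh and 803FAC00h) are arbitrary. It assumes the PowerPC definition of subf as rD = ~rA + rB + 1, which is rB - rA modulo 2^32.

```cpp
#include <cstdint>

// PowerPC subf: rD = ~rA + rB + 1, which is rB - rA modulo 2^32.
constexpr std::uint32_t subf(std::uint32_t a, std::uint32_t b)
{
  return ~a + b + 1u;
}

// Models what `lea d,[b-imm]` (or `sub d,imm` after `mov d,b`) computes:
// b plus the negated constant, truncated to 32 bits.
// Hypothetical helper name, used only for this illustration.
constexpr std::uint32_t lea_with_negated_disp(std::uint32_t b, std::uint32_t imm)
{
  return b + (0u - imm);
}

// "After 1": a is the constant 1Eh, so d = b - 1Eh.
static_assert(subf(0x1Eu, 100u) == 70u, "100 - 30 == 70");
static_assert(lea_with_negated_disp(100u, 0x1Eu) == subf(0x1Eu, 100u),
              "lea with negated displacement matches subf");

// "After 2": a is the constant 803FAC00h; the result wraps modulo 2^32.
static_assert(lea_with_negated_disp(5u, 0x803FAC00u) == subf(0x803FAC00u, 5u),
              "sub with 32-bit immediate matches subf");
```

Because the constant is known at JIT time, negating it costs nothing, which is why a single lea (or a sub with an immediate) can stand in for the four-instruction sequence.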
Occurs a bunch of times in Super Mario Sunshine.

Before:
41 83 EE 00          sub         r14d,0

After:
Nothing!
Happens in Super Mario Sunshine. You could probably do something similar for b == -1 (like we do for subfic), but I couldn't find any titles that do this.

- Case 1: d == a

Before:
41 8B C7             mov         eax,r15d
41 BF 00 00 00 00    mov         r15d,0
44 2B F8             sub         r15d,eax

After:
41 F7 DF             neg         r15d

- Case 2: d != a

Before:
BF 00 00 00 00       mov         edi,0
41 2B FD             sub         edi,r13d

After:
41 8B FD             mov         edi,r13d
F7 DF                neg         edi
Soul Calibur II does this.

Before:
2B F6                sub         esi,esi

After:
Nothing!
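Taken together, the four commits amount to a set of checks made before any host register is allocated. The sketch below models that decision in standalone C++. It is not the Jit64 code: `Operand`, `SubfPlan` and `plan_subf` are hypothetical names, the order of the checks is an assumption, the `Move` case (a == 0 with d != b) is inferred rather than shown in the commit messages, and the case where both inputs are constants is not modeled.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of a guest operand: which PowerPC register it is, and
// the value it holds if that value is known at JIT time.
struct Operand
{
  int reg;
  std::optional<std::uint32_t> imm;
};

// Hypothetical labels for the code shapes shown in the commit messages.
enum class SubfPlan
{
  SetZero,       // a == b: the result is the constant zero
  Nothing,       // a == 0 and d == b: the value is already in place
  Move,          // a == 0 and d != b: d = b (inferred, not shown in the PR)
  Negate,        // b == 0: d = -a (neg, preceded by a mov when d != a)
  SubImmediate,  // a is a known constant: lea or sub with an immediate
  Generic        // no shortcut: register-register sub
};

// subf computes d = b - a. All checks run on guest-level information only,
// before any x86 register would be bound.
inline SubfPlan plan_subf(int d, const Operand& a, const Operand& b)
{
  if (a.reg == b.reg)
    return SubfPlan::SetZero;
  if (a.imm && *a.imm == 0)
    return d == b.reg ? SubfPlan::Nothing : SubfPlan::Move;
  if (b.imm && *b.imm == 0)
    return SubfPlan::Negate;
  if (a.imm)
    return SubfPlan::SubImmediate;
  return SubfPlan::Generic;
}
```

For instance, `plan_subf(3, Operand{3, 0x1Eu}, Operand{4, std::nullopt})` models the d == a constant case from the first commit and yields `SubfPlan::SubImmediate`. The point of the ordering is that the constant is still visible when the decision is made, rather than already materialized in a host register.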
lioncash approved these changes on Jan 15, 2021
Sintendo added a commit to Sintendo/dolphin that referenced this pull request on Jan 22, 2021
This doesn't really add any new optimizations, but fixes an issue that prevented the optimizations introduced in dolphin-emu#8551 and dolphin-emu#8755 from being applied in specific cases. A similar issue was solved for subfx as part of dolphin-emu#9425.

Consider the case where the destination register is also an input register and happens to hold an immediate value. This results in a set of constraints that forces the RegCache to allocate a register and move the immediate value into it for us. By the time we check for immediate values in the JIT, we're too late.

We solve this by refactoring the code in such a way that we can check for immediates before involving the RegCache.

- Example 1

Before:
41 BF 00 68 00 CC    mov         r15d,0CC006800h
44 03 FF             add         r15d,edi

After:
44 8D BF 00 68 00 CC lea         r15d,[rdi-33FF9800h]

- Example 2

Before:
41 BE 00 00 00 00    mov         r14d,0
44 03 F7             add         r14d,edi

After:
44 8B F7             mov         r14d,edi

- Example 3

Before:
41 BD 03 00 00 00    mov         r13d,3
44 03 6D 8C          add         r13d,dword ptr [rbp-74h]

After:
44 8B 6D 8C          mov         r13d,dword ptr [rbp-74h]
41 83 C5 03          add         r13d,3
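Example 1 can look odd at first, since the immediate 0CC006800h shows up as the displacement -33FF9800h in the lea. The short C++ check below illustrates why the two are the same 32-bit value. It is a sketch for illustration only: `signed_disp`, `add32` and `lea32` are hypothetical names, and the register value 10h is arbitrary.

```cpp
#include <cstdint>

// Reinterpret a 32-bit immediate as the signed displacement that a
// disassembler prints for `lea r,[base+disp32]`.
constexpr std::int64_t signed_disp(std::uint32_t imm)
{
  return imm < 0x80000000u ? std::int64_t{imm}
                           : std::int64_t{imm} - (std::int64_t{1} << 32);
}

// 32-bit add, as in `add r15d,edi` with r15d preloaded with the immediate.
constexpr std::uint32_t add32(std::uint32_t x, std::uint32_t imm)
{
  return x + imm;
}

// Models `lea r32,[x+disp]`: a wider add whose result is truncated to 32 bits.
constexpr std::uint32_t lea32(std::uint32_t x, std::uint32_t imm)
{
  return static_cast<std::uint32_t>(std::int64_t{x} + signed_disp(imm));
}

static_assert(signed_disp(0xCC006800u) == -0x33FF9800LL,
              "0CC006800h reads as -33FF9800h when sign-extended");
static_assert(lea32(0x10u, 0xCC006800u) == add32(0x10u, 0xCC006800u),
              "lea with the sign-extended displacement matches the 32-bit add");
static_assert(add32(42u, 0u) == 42u, "adding the constant 0 is a plain move");
```

The last assertion corresponds to Example 2, where an add of constant zero collapses to a single mov.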
Sintendo added a commit to Sintendo/dolphin that referenced this pull request on Jan 26, 2021
Improved code generation for subfx in various cases.

d == a and is constant
Example 1
Before:
After:
Example 2
Before:
After:
a == 0
Example
Before:
After:
Nothing!
b == 0
Example 1 (d == a)
Before:
After:
Example 2 (d != a)
Before:
After:
a == b
Example
Before:
After:
Nothing, destination register is set to constant zero.