Jit64: boolX constant optimizations #9481
Merged
PowerPC instructions andcx and orcx complement the value of register b before performing their respective bitwise operation. If this register happens to contain a known value, we can precompute the complement, allowing us to generate simpler code.
- andcx
Before:
    BF 00 01 00 00      mov  edi,100h
    F7 D7               not  edi
    41 23 FE            and  edi,r14d
After:
    41 8B FE            mov  edi,r14d
    81 E7 FF FE FF FF   and  edi,0FFFFFEFFh
- orc
Before:
    41 BE 04 00 00 00   mov  r14d,4
    41 F7 D6            not  r14d
    45 0B F5            or   r14d,r13d
After:
    45 8B F5            mov  r14d,r13d
    41 83 CE FB         or   r14d,0FFFFFFFBh
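The folding described above can be checked with a few lines of Python. This is only an illustrative sketch, not code from the PR: `fold_complement` is a hypothetical helper name, and registers are modelled as plain 32-bit integers.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit values


def andc(a: int, b: int) -> int:
    """Reference semantics of andcx: a AND (NOT b), truncated to 32 bits."""
    return a & (~b & MASK32)


def orc(a: int, b: int) -> int:
    """Reference semantics of orcx: a OR (NOT b), truncated to 32 bits."""
    return (a | (~b & MASK32)) & MASK32


def fold_complement(b_const: int) -> int:
    """Hypothetical helper: complement a known value of register b once,
    ahead of time, so the result can be used directly as an immediate."""
    return ~b_const & MASK32


# The two constants that appear in the listings (100h and 4).
and_imm = fold_complement(0x100)
or_imm = fold_complement(0x4)

a = 0x12345678  # arbitrary run-time value of the other operand
assert andc(a, 0x100) == a & and_imm
assert orc(a, 0x4) == a | or_imm
print(hex(and_imm), hex(or_imm))  # 0xfffffeff 0xfffffffb
```

The two printed immediates match the ones in the "After" listings, which is why the separate NOT instruction can be dropped.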
In the case of eqvx, the final complement can always be baked directly into the immediate value.
Before:
    45 8B EF            mov  r13d,r15d
    41 F7 D5            not  r13d
    41 83 F5 04         xor  r13d,4
After:
    45 8B EF            mov  r13d,r15d
    41 83 F5 FB         xor  r13d,0FFFFFFFBh
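The reason this always works is the identity NOT (a XOR b) = a XOR (NOT b). A small sketch under the same assumptions as before (32-bit integers standing in for registers; `eqv_folded_imm` is a hypothetical name, not something from the Dolphin source):

```python
MASK32 = 0xFFFFFFFF


def eqv(a: int, b: int) -> int:
    """Reference semantics of eqvx: NOT (a XOR b), truncated to 32 bits."""
    return ~(a ^ b) & MASK32


def eqv_folded_imm(b_const: int) -> int:
    """Hypothetical helper: the immediate to XOR with once the final
    complement has been folded into the known operand."""
    return ~b_const & MASK32


# Check the identity for a handful of run-time values and the constant 4
# used in the listing.
for a in (0, 1, 0x80000000, 0xDEADBEEF, MASK32):
    assert eqv(a, 4) == a ^ eqv_folded_imm(4)

print(hex(eqv_folded_imm(4)))  # 0xfffffffb
```

Because XOR is commutative, the same folding applies whichever of the two source registers holds the known value.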
No computation necessary, but we may need a MOV.
Before:
    8B FE               mov  edi,esi
    83 F7 00            xor  edi,0
After:
    8B FE               mov  edi,esi
Ever so slightly shorter. When the condition register needs updating, we still prefer xor over not+test.
Before:
    45 8B F5            mov  r14d,r13d
    41 83 F6 FF         xor  r14d,0FFFFFFFFh
After:
    45 8B F5            mov  r14d,r13d
    41 F7 D6            not  r14d
XOR allows for a more compact representation for constants that can be represented by a signed 8-bit integer, while MOV does not. By letting MOV handle the larger constants we can occasionally save a byte.
Before:
    44 89 F7            mov  edi,r14d
    81 F7 A0 52 57 01   xor  edi,15752A0h
After:
    BF A0 52 57 01      mov  edi,15752A0h
    41 33 FE            xor  edi,r14d
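The byte counting behind this choice can be sketched as follows. This is a simplified model, not the JIT's actual logic: the sizes are the standard x86 encodings with REX prefixes and the shorter accumulator-only forms ignored, the destination is assumed to differ from the source register, and all function names are made up for illustration.

```python
def fits_imm8(value: int) -> bool:
    """True if the 32-bit pattern equals a sign-extended 8-bit integer."""
    value &= 0xFFFFFFFF
    signed = value - (1 << 32) if value & 0x80000000 else value
    return -128 <= signed <= 127


# Assumed instruction sizes in bytes (no REX prefix).
MOV_REG_REG = 2    # mov r32, r32
MOV_REG_IMM32 = 5  # mov r32, imm32
OP_REG_REG = 2     # and/or/xor r32, r32
OP_REG_IMM8 = 3    # and/or/xor r32, imm8 (sign-extended)
OP_REG_IMM32 = 6   # and/or/xor r32, imm32


def size_copy_then_op_imm(imm: int) -> int:
    """mov dst, src followed by op dst, imm."""
    return MOV_REG_REG + (OP_REG_IMM8 if fits_imm8(imm) else OP_REG_IMM32)


def size_load_imm_then_op_reg(imm: int) -> int:
    """mov dst, imm32 followed by op dst, src."""
    return MOV_REG_IMM32 + OP_REG_REG


def prefer_load_imm(imm: int) -> bool:
    """True when letting MOV carry the constant gives the shorter sequence."""
    return size_load_imm_then_op_reg(imm) < size_copy_then_op_imm(imm)


assert prefer_load_imm(0x015752A0)      # needs imm32: 7 bytes instead of 8
assert not prefer_load_imm(0xFFFFFFFB)  # fits imm8: 5 bytes beats 7
```

In this model the saving is exactly one byte, and only for constants that do not fit in a sign-extended 8-bit immediate, which matches the "occasionally save a byte" wording.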
All cases involving immediate values are now guaranteed to be handled elsewhere, making these checks redundant.
Bitwise and with zero is always zero.
Before:
    45 8B F8            mov  r15d,r8d
    41 83 E7 00         and  r15d,0
After:
Nothing, register a is set to constant 0.
Bitwise and with all ones doesn't accomplish much.
Before:
    41 8B F5            mov  esi,r13d
    83 E6 FF            and  esi,0FFFFFFFFh
After:
    41 8B F5            mov  esi,r13d
AND allows for a more compact representation for constants that can be represented by a signed 8-bit integer, while MOV does not. By letting MOV handle the larger constants we can occasionally save a byte.
Before:
    41 8B FE            mov  edi,r14d
    81 E7 FF FE FF FF   and  edi,0FFFFFEFFh
After:
    BF FF FE FF FF      mov  edi,0FFFFFEFFh
    41 23 FE            and  edi,r14d
Bitwise or with zero is just a fancy MOV, really.
- Example 1
Before:
    41 BA 00 00 00 00   mov  r10d,0
    45 0B D1            or   r10d,r9d
After:
    45 8B D1            mov  r10d,r9d
- Example 2
Before:
    41 83 CA 00         or   r10d,0
After:
Nothing!
OR allows for a more compact representation for constants that can be represented by a signed 8-bit integer, while MOV does not. By letting MOV handle the larger constants we can occasionally save a byte.
Before:
    45 8B F5            mov  r14d,r13d
    41 81 CE 00 80 01 00  or   r14d,18000h
After:
    41 BE 00 80 01 00   mov  r14d,18000h
    45 0B F5            or   r14d,r13d
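Read together, the special cases in these commits amount to a small table of algebraic identities. The sketch below classifies an operation with one known 32-bit operand and checks each shortcut against the plain operation. The labels (`const0`, `copy`, `not`, `generic`) and function names are invented for illustration, and the Rc = 1 preference for xor over not+test is deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def simplify(op: str, imm: int) -> str:
    """Classify `a OP imm` for a known 32-bit immediate.

    Returns "const0" (result is zero), "copy" (result is a),
    "not" (result is NOT a) or "generic" (emit the normal instruction).
    """
    imm &= MASK32
    if op == "and":
        if imm == 0:
            return "const0"
        if imm == MASK32:
            return "copy"
    elif op == "or":
        # OR with 0xFFFFFFFF is intentionally not special-cased,
        # mirroring the PR description.
        if imm == 0:
            return "copy"
    elif op == "xor":
        if imm == 0:
            return "copy"
        if imm == MASK32:
            return "not"
    return "generic"


def evaluate(op: str, a: int, imm: int) -> int:
    """Reference result of the unsimplified operation."""
    if op == "and":
        return a & imm
    if op == "or":
        return a | imm
    return a ^ imm


def apply_simplified(kind: str, op: str, a: int, imm: int) -> int:
    """Result produced by the simplified form chosen by simplify()."""
    if kind == "const0":
        return 0
    if kind == "copy":
        return a
    if kind == "not":
        return ~a & MASK32
    return evaluate(op, a, imm)


# Every shortcut must agree with the plain operation.
for op in ("and", "or", "xor"):
    for imm in (0, MASK32, 0x18000):
        for a in (0, 0xDEADBEEF, MASK32):
            kind = simplify(op, imm)
            assert apply_simplified(kind, op, a, imm) == evaluate(op, a, imm)
```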
lioncash reviewed Jan 28, 2021
Yearly bump
Sintendo pushed a commit to Sintendo/dolphin that referenced this pull request Sep 25, 2022
A (partial) port of dolphin-emu#9481 to ARM64. This commit adds special cases for immediate values equal to 0 or 0xFFFFFFFF, allowing for more efficient or no code to be generated.
Sintendo pushed a commit to Sintendo/dolphin that referenced this pull request Sep 25, 2022
A (partial) port of dolphin-emu#9481 to ARM64. This commit adds special cases for immediate values equal to 0 or 0xFFFFFFFF, allowing for more efficient or no code to be generated.
@Sintendo Is this PR still ready to merge?
Oh did we never merge this? I guess since we merged the ARM one this should be fine as well.
Those who've seen some of my previous PRs (#9262 or #9425) probably know what to expect by now.
Major optimizations include precomputing the complement (andcx, orcx, and eqvx), handling special cases (0 and 0xFFFFFFFF), and preferring shorter instructions.
For those wondering, special case 0xFFFFFFFF is missing for OR because I couldn't find a game doing this. I also had some ideas for nandx and norx when Rc = 1, but I've never seen that happen so I didn't bother.
and
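Precompute complement (andcx)
Before:
    BF 00 01 00 00      mov  edi,100h
    F7 D7               not  edi
    41 23 FE            and  edi,r14d
After:
    41 8B FE            mov  edi,r14d
    81 E7 FF FE FF FF   and  edi,0FFFFFEFFh
Special case 0
Before:
    45 8B F8            mov  r15d,r8d
    41 83 E7 00         and  r15d,0
After:
Nothing, register is set to constant zero.
Special case 0xFFFFFFFF
Before:
    41 8B F5            mov  esi,r13d
    83 E6 FF            and  esi,0FFFFFFFFh
After:
    41 8B F5            mov  esi,r13d
Size optimization
Before:
    41 8B FE            mov  edi,r14d
    81 E7 FF FE FF FF   and  edi,0FFFFFEFFh
After:
    BF FF FE FF FF      mov  edi,0FFFFFEFFh
    41 23 FE            and  edi,r14d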
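or
Precompute complement (orcx)
Before:
    41 BE 04 00 00 00   mov  r14d,4
    41 F7 D6            not  r14d
    45 0B F5            or   r14d,r13d
After:
    45 8B F5            mov  r14d,r13d
    41 83 CE FB         or   r14d,0FFFFFFFBh
Special case 0 (Example 1)
Before:
    41 BA 00 00 00 00   mov  r10d,0
    45 0B D1            or   r10d,r9d
After:
    45 8B D1            mov  r10d,r9d
Special case 0 (Example 2)
Before:
    41 83 CA 00         or   r10d,0
After:
Nothing!
Size optimization
Before:
    45 8B F5            mov  r14d,r13d
    41 81 CE 00 80 01 00  or   r14d,18000h
After:
    41 BE 00 80 01 00   mov  r14d,18000h
    45 0B F5            or   r14d,r13d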
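xor
Precompute complement (eqvx)
Before:
    45 8B EF            mov  r13d,r15d
    41 F7 D5            not  r13d
    41 83 F5 04         xor  r13d,4
After:
    45 8B EF            mov  r13d,r15d
    41 83 F5 FB         xor  r13d,0FFFFFFFBh
Special case 0
Before:
    8B FE               mov  edi,esi
    83 F7 00            xor  edi,0
After:
    8B FE               mov  edi,esi
Special case 0xFFFFFFFF
Before:
    45 8B F5            mov  r14d,r13d
    41 83 F6 FF         xor  r14d,0FFFFFFFFh
After:
    45 8B F5            mov  r14d,r13d
    41 F7 D6            not  r14d
Size optimization
Before:
    44 89 F7            mov  edi,r14d
    81 F7 A0 52 57 01   xor  edi,15752A0h
After:
    BF A0 52 57 01      mov  edi,15752A0h
    41 33 FE            xor  edi,r14d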