Jit64: boolX constant optimizations #9481

Merged
merged 12 commits into from Sep 28, 2022


Sintendo
Member

Those who've seen some of my previous PRs (#9262 or #9425) probably know what to expect by now.

Major optimizations include precomputing the complement (andcx, orcx, and eqvx), handling special cases (0 and 0xFFFFFFFF), and preferring shorter instructions.

For those wondering, the 0xFFFFFFFF special case is missing for OR because I couldn't find a game that does this. I also had some ideas for nandx and norx when Rc = 1, but I've never seen that happen, so I didn't bother.


and

Precompute complement (andcx)

Before:

BF 00 01 00 00       mov         edi,100h
F7 D7                not         edi
41 23 FE             and         edi,r14d

After:

41 8B FE             mov         edi,r14d
81 E7 FF FE FF FF    and         edi,0FFFFFEFFh

Special case 0

Before:

45 8B F8             mov         r15d,r8d
41 83 E7 00          and         r15d,0

After:
Nothing, register is set to constant zero.

Special case 0xFFFFFFFF

Before:

41 8B F5             mov         esi,r13d
83 E6 FF             and         esi,0FFFFFFFFh

After:

41 8B F5             mov         esi,r13d

Size optimization

Before:

41 8B FE             mov         edi,r14d
81 E7 FF FE FF FF    and         edi,0FFFFFEFFh

After:

BF FF FE FF FF       mov         edi,0FFFFFEFFh
41 23 FE             and         edi,r14d

or

Precompute complement (orcx)

Before:

41 BE 04 00 00 00    mov         r14d,4
41 F7 D6             not         r14d
45 0B F5             or          r14d,r13d

After:

45 8B F5             mov         r14d,r13d
41 83 CE FB          or          r14d,0FFFFFFFBh

Special case 0 (Example 1)

Before:

41 BA 00 00 00 00    mov         r10d,0
45 0B D1             or          r10d,r9d

After:

45 8B D1             mov         r10d,r9d

Special case 0 (Example 2)

Before:

41 83 CA 00          or          r10d,0

After:
Nothing!

Size optimization

Before:

45 8B F5             mov         r14d,r13d
41 81 CE 00 80 01 00 or          r14d,18000h

After:

41 BE 00 80 01 00    mov         r14d,18000h
45 0B F5             or          r14d,r13d

xor

Precompute complement (eqvx)

Before:

45 8B EF             mov         r13d,r15d
41 F7 D5             not         r13d
41 83 F5 04          xor         r13d,4

After:

45 8B EF             mov         r13d,r15d
41 83 F5 FB          xor         r13d,0FFFFFFFBh

Special case 0

Before:

8B FE                mov         edi,esi
83 F7 00             xor         edi,0

After:

8B FE                mov         edi,esi

Special case 0xFFFFFFFF

Before:

45 8B F5             mov         r14d,r13d
41 83 F6 FF          xor         r14d,0FFFFFFFFh

After:

45 8B F5             mov         r14d,r13d
41 F7 D6             not         r14d

Size optimization

Before:

44 89 F7             mov         edi,r14d
81 F7 A0 52 57 01    xor         edi,15752A0h

After:

BF A0 52 57 01       mov         edi,15752A0h
41 33 FE             xor         edi,r14d

PowerPC instructions andcx and orcx complement the value of register b
before performing their respective bitwise operation. If this register
happens to contain a known value, we can precompute the complement,
allowing us to generate simpler code.

- andcx
Before:
BF 00 01 00 00       mov         edi,100h
F7 D7                not         edi
41 23 FE             and         edi,r14d

After:
41 8B FE             mov         edi,r14d
81 E7 FF FE FF FF    and         edi,0FFFFFEFFh

- orcx
Before:
41 BE 04 00 00 00    mov         r14d,4
41 F7 D6             not         r14d
45 0B F5             or          r14d,r13d

After:
45 8B F5             mov         r14d,r13d
41 83 CE FB          or          r14d,0FFFFFFFBh
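
The folding described in this commit message can be checked with a small model. This is only a sketch of the arithmetic, assuming 32-bit registers; the names `andc`, `orc`, and `fold_complement` are made up for illustration and are not taken from the Dolphin source.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit values


def andc(a: int, b: int) -> int:
    """Reference semantics of andcx: a AND (NOT b), truncated to 32 bits."""
    return a & (~b & MASK32)


def orc(a: int, b: int) -> int:
    """Reference semantics of orcx: a OR (NOT b), truncated to 32 bits."""
    return (a | (~b & MASK32)) & MASK32


def fold_complement(known_b: int) -> int:
    """Immediate to emit once b is a known constant (computed at JIT time)."""
    return ~known_b & MASK32


# The two constants that appear in the listings above.
and_imm = fold_complement(0x100)  # used as: and edi, and_imm
or_imm = fold_complement(0x4)     # used as: or r14d, or_imm
```

With the complement folded into the immediate, the emitted code needs a single AND or OR instead of a MOV, NOT, and AND/OR sequence.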
In the case of eqvx, the final complement can always be baked directly
into the immediate value.

Before:
45 8B EF             mov         r13d,r15d
41 F7 D5             not         r13d
41 83 F5 04          xor         r13d,4

After:
45 8B EF             mov         r13d,r15d
41 83 F5 FB          xor         r13d,0FFFFFFFBh
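
Why the complement can always be folded for eqvx: NOT(a XOR b) equals a XOR NOT(b), so the NOT can move onto the constant. A minimal sketch of that identity follows, assuming 32-bit values; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit values


def eqv(a: int, b: int) -> int:
    """Reference semantics of eqvx: NOT (a XOR b), truncated to 32 bits."""
    return ~(a ^ b) & MASK32


def fold_eqv_immediate(imm: int) -> int:
    """Immediate for a plain XOR that gives the same result as eqvx."""
    return ~imm & MASK32


# Constant from the listing above: eqv with 4 becomes xor with 0xFFFFFFFB.
xor_imm = fold_eqv_immediate(0x4)
```

Unlike andcx and orcx, this works regardless of which operand is the known constant, because XOR is commutative and the final NOT applies to the whole result.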
No computation necessary, but we may need a MOV.

Before:
8B FE                mov         edi,esi
83 F7 00             xor         edi,0

After:
8B FE                mov         edi,esi

Ever so slightly shorter.

When the condition register needs updating, we still prefer xor over
not+test, since the x86 NOT instruction does not update the flags.

Before:
45 8B F5             mov         r14d,r13d
41 83 F6 FF          xor         r14d,0FFFFFFFFh

After:
45 8B F5             mov         r14d,r13d
41 F7 D6             not         r14d
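
The two XOR special cases and the flag caveat can be summarised as a small decision function. This is a sketch under assumptions: `pick_xor_lowering` and its `needs_flags` parameter are hypothetical names, and any flag handling for the zero case is not modelled.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit values


def pick_xor_lowering(imm: int, needs_flags: bool) -> str:
    """Name the operation to apply after copying the source register."""
    imm &= MASK32
    if imm == 0:
        return "none"  # x ^ 0 == x, so the copy alone is enough
    if imm == MASK32 and not needs_flags:
        return "not"   # x ^ 0xFFFFFFFF == ~x, and NOT is one byte shorter
    return "xor"       # XOR updates the flags, so it also covers Rc = 1


def xor32(x: int, imm: int) -> int:
    """Reference result of a 32-bit XOR with an immediate."""
    return (x ^ imm) & MASK32
```

For example, `pick_xor_lowering(0xFFFFFFFF, needs_flags=True)` keeps the XOR form, matching the preference stated in the commit message.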
XOR allows for a more compact representation for constants that can be
represented by a signed 8-bit integer, while MOV does not. By letting
MOV handle the larger constants we can occasionally save a byte.

Before:
44 89 F7             mov         edi,r14d
81 F7 A0 52 57 01    xor         edi,15752A0h

After:
BF A0 52 57 01       mov         edi,15752A0h
41 33 FE             xor         edi,r14d
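
The byte counts behind this choice can be estimated with a rough size model. This sketch assumes register-to-register operands only, one REX prefix byte whenever r8d..r15d is involved, and the generic `81 /r id` and `83 /r ib` immediate forms; it ignores the shorter accumulator-only encodings available for eax. All function names are hypothetical.

```python
def rex(*regs: int) -> int:
    """One REX prefix byte is needed if any register index is 8 or higher."""
    return 1 if any(r >= 8 for r in regs) else 0


def fits_imm8(imm: int) -> bool:
    """True if the 32-bit constant equals a sign-extended 8-bit value."""
    imm &= 0xFFFFFFFF
    return imm <= 0x7F or imm >= 0xFFFFFF80


def size_copy_then_op_imm(dst: int, src: int, imm: int) -> int:
    """Bytes for: mov dst, src ; op dst, imm"""
    mov_rr = 2 + rex(dst, src)
    op_imm = (3 if fits_imm8(imm) else 6) + rex(dst)
    return mov_rr + op_imm


def size_load_imm_then_op(dst: int, src: int) -> int:
    """Bytes for: mov dst, imm32 ; op dst, src"""
    mov_imm = 5 + rex(dst)
    op_rr = 2 + rex(dst, src)
    return mov_imm + op_rr


# x86 register indices used in the listings.
EDI, R13, R14, R15 = 7, 13, 14, 15
```

Under this model, `size_copy_then_op_imm(EDI, R14, 0x15752A0)` is 9 bytes and `size_load_imm_then_op(EDI, R14)` is 8, matching the Before and After listings. For a constant such as 0xFFFFFFFB, which fits in a signed 8-bit immediate, the copy-then-op order stays shorter.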
All cases involving immediate values are now guaranteed to be handled
elsewhere, making these checks redundant.

Bitwise and with zero is always zero.

Before:
45 8B F8             mov         r15d,r8d
41 83 E7 00          and         r15d,0

After:
Nothing, register a is set to constant 0.

Bitwise and with all ones doesn't accomplish much.

Before:
41 8B F5             mov         esi,r13d
83 E6 FF             and         esi,0FFFFFFFFh

After:
41 8B F5             mov         esi,r13d
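
Both AND special cases can be modelled by a function that returns the emitted instructions together with any constant the JIT now knows about the destination. This is a sketch only: `lower_and_imm` is a hypothetical name, the instruction strings are for illustration, and the size-based operand ordering is left out.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit values


def lower_and_imm(dst: str, src: str, imm: int):
    """Return (emitted instructions, known constant for dst or None)."""
    imm &= MASK32
    if imm == 0:
        # x & 0 == 0: emit nothing and just record that dst holds 0.
        return [], 0
    copy = [] if dst == src else [f"mov {dst}, {src}"]
    if imm == MASK32:
        # x & 0xFFFFFFFF == x: only the copy remains.
        return copy, None
    return copy + [f"and {dst}, {imm:#x}"], None
```

Calling `lower_and_imm("r15d", "r8d", 0)` yields no instructions and a known constant of 0, which corresponds to the "Nothing" case above.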
AND allows for a more compact representation for constants that can be
represented by a signed 8-bit integer, while MOV does not. By letting
MOV handle the larger constants we can occasionally save a byte.

Before:
41 8B FE             mov         edi,r14d
81 E7 FF FE FF FF    and         edi,0FFFFFEFFh

After:
BF FF FE FF FF       mov         edi,0FFFFFEFFh
41 23 FE             and         edi,r14d

Bitwise or with zero is just a fancy MOV, really.

- Example 1
Before:
41 BA 00 00 00 00    mov         r10d,0
45 0B D1             or          r10d,r9d

After:
45 8B D1             mov         r10d,r9d

- Example 2
Before:
41 83 CA 00          or          r10d,0

After:
Nothing!
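
The two OR examples differ only in whether the destination already holds the non-constant operand. A sketch that covers both, assuming operands are either register names or known constants; `lower_or` is a hypothetical helper, not the PR's code.

```python
def lower_or(dst: str, a, b):
    """a and b are register names (str) or known 32-bit constants (int)."""
    if isinstance(a, int) or b == dst:
        # OR is commutative: put the constant in b, or make a the
        # operand that already lives in dst.
        a, b = b, a
    if isinstance(a, int):
        raise ValueError("both operands are constants: fold at JIT time")
    copy = [] if dst == a else [f"mov {dst}, {a}"]
    if isinstance(b, int) and b & 0xFFFFFFFF == 0:
        return copy  # x | 0 == x: at most a MOV is needed
    src = f"{b & 0xFFFFFFFF:#x}" if isinstance(b, int) else b
    return copy + [f"or {dst}, {src}"]
```

Here `lower_or("r10d", 0, "r9d")` gives a single MOV (Example 1), while `lower_or("r10d", "r10d", 0)` gives an empty list (Example 2).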
OR allows for a more compact representation for constants that can be
represented by a signed 8-bit integer, while MOV does not. By letting
MOV handle the larger constants we can occasionally save a byte.

Before:
45 8B F5             mov         r14d,r13d
41 81 CE 00 80 01 00 or          r14d,18000h

After:
41 BE 00 80 01 00    mov         r14d,18000h
45 0B F5             or          r14d,r13d

@MasterofGalaxies

Yearly bump

Sintendo pushed a commit to Sintendo/dolphin that referenced this pull request Sep 25, 2022
A (partial) port of dolphin-emu#9481 to ARM64. This commit adds special cases for
immediate values equal to 0 or 0xFFFFFFFF, allowing for more efficient
or no code to be generated.
@MasterofGalaxies

@Sintendo Is this PR still ready to merge?

@AdmiralCurtiss
Contributor

Oh did we never merge this? I guess since we merged the ARM one this should be fine as well.

@AdmiralCurtiss AdmiralCurtiss merged commit dafe2c7 into dolphin-emu:master Sep 28, 2022
10 checks passed
@Sintendo Sintendo deleted the jit64boolx branch December 11, 2022 18:15