[RFC] PowerPC flags emulation optimization #527
{
    s64 sign_extended = (s64)(s32)value;
    u64 cr_val = (u64)sign_extended;
    cr_val = (cr_val & ~(1ull << 61)) | ((u64)GetXER_SO() << 61);
@magumagu @Sonicadvance1, please port your respective JITs to this change by sending me a PR on delroth/dolphin:flags-opt (and post links to these PRs here so other people get notified). I'd like to get this merged in the next 2 weeks.
Previously, using the new "lower 8 bits" registers (SIL, SPL, ...) with SETcc made it write to a different register (for example, SETcc SIL was encoded as SETcc DH).
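Some background on the x86-64 encoding rule involved: byte-register numbers 4 to 7 select AH, CH, DH and BH unless the instruction has a REX prefix. With a REX prefix they select SPL, BPL, SIL and DIL. Below is a minimal sketch of a SETcc encoder that follows this rule. It is a standalone function under stated assumptions: `EncodeSETcc` and `SetccBytes` are hypothetical and are not Dolphin's XEmitter API.

```cpp
#include <cstdint>

// Hypothetical encoder (not Dolphin's XEmitter API) for SETcc r8.
// reg is the low-byte register number 0..15; 4..7 are meant as SPL, BPL,
// SIL, DIL, and 8..15 as R8B..R15B.
struct SetccBytes {
  uint8_t bytes[4];
  int length;
};

constexpr SetccBytes EncodeSETcc(uint8_t cc, int reg) {
  SetccBytes out{{0, 0, 0, 0}, 0};
  // Without a REX prefix, numbers 4..7 select AH, CH, DH, BH instead of
  // SPL, BPL, SIL, DIL. Numbers 8..15 additionally need REX.B.
  if (reg >= 4) {
    out.bytes[out.length++] =
        static_cast<uint8_t>(0x40 | (reg >= 8 ? 0x01 : 0x00));
  }
  out.bytes[out.length++] = 0x0F;
  out.bytes[out.length++] = static_cast<uint8_t>(0x90 | (cc & 0x0F));
  // ModRM: mod = 11 (register operand), reg field = 000, rm = low 3 bits.
  out.bytes[out.length++] = static_cast<uint8_t>(0xC0 | (reg & 7));
  return out;
}
```

For example, `EncodeSETcc(0x4, 6)` (SETE SIL) produces `40 0F 94 C6`. Without the `0x40` prefix, the remaining bytes `0F 94 C6` decode as SETE DH.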
This is ready for a new round of review and I consider it mergeable. @JMC47 has done some fairly extensive testing and benchmarking, and this has also gone through a round of forum testing (which only showed that forum users can't test shit).
Performance benchmarks: https://docs.google.com/spreadsheets/d/1kdmarUISpO2lfM87_8H5xrDGa8WSLccYAcwxKCOzC9Q/edit#gid=0
    MOV(8, M(&PowerPC::ppcState.cr_fast[0]), Imm8(0x4));
else
    MOV(8, M(&PowerPC::ppcState.cr_fast[0]), Imm8(0x2));
// TODO(delroth): Moving a 32 bit immediate to the lower part of a 64
PowerPC has a 32-bit CR (condition register), which stores flags describing the
results of computations. Many instructions have an optional bit (Rc) that tells
the CPU whether the flags should be updated. This 32-bit register actually
contains 8 sets of 4 flags: Summary Overflow (SO), Equals (EQ), Greater Than
(GT), Less Than (LT). These 8 sets are usually called CR0-CR7 and are accessed
independently. For the most common operations, the flags are computed from the
result of the operation, compared as a signed 32-bit value:
* EQ is set iff result == 0
* LT is set iff result < 0
* GT is set iff result > 0
* (Dolphin does not emulate SO)
While x86 has a similar concept of flags, it is very difficult to read the
FLAGS register directly and translate its value into the equivalent PowerPC
value. With the current Dolphin implementation, updating a PPC CR field
requires conditional branches in the emitted code, which has a few performance
costs: the branches use space in the BTB, and the worst case (!GT, !LT, EQ) goes
through 2 not-taken branches.
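Here is a direct C++ rendering of the flag rules and of the CR layout, for comparison. It shows where those branches come from. This is a minimal sketch: the helper names are hypothetical. It assumes the architectural field encoding (LT = 0x8, GT = 0x4, EQ = 0x2, SO = 0x1), with CR0 in the most significant nibble of CR.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the architectural CR layout. Within a
// 4-bit field: LT = 0x8, GT = 0x4, EQ = 0x2, SO = 0x1. CR0 is the most
// significant nibble of the 32-bit CR register.
constexpr uint8_t kCR_LT = 0x8;
constexpr uint8_t kCR_GT = 0x4;
constexpr uint8_t kCR_EQ = 0x2;

// Extracts field CRn (n = 0..7) from the 32-bit CR register.
constexpr uint8_t GetCRField(uint32_t cr, int n) {
  return static_cast<uint8_t>((cr >> (28 - 4 * n)) & 0xF);
}

// Computes a field from a 32-bit result, comparing it as a signed value.
// SO is left clear. Emitted naively, this is a compare plus two
// conditional branches, and the EQ case falls through both of them.
constexpr uint8_t ComputeCRField(uint32_t result) {
  const int32_t value = static_cast<int32_t>(result);
  if (value > 0) return kCR_GT;
  if (value < 0) return kCR_LT;
  return kCR_EQ;
}
```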
After some brainstorming on IRC about how this could be improved, calc84maniac
figured out a neat trick that makes common CR operations much more efficient to
JIT on 64-bit x86. It emulates each CRn field internally with a 64-bit value:
the result of the operation that updated the flags, sign-extended to 64 bits.
Checking whether a CR bit is set can then be done as follows:
* EQ is set iff LOWER_32_BITS(cr_64b_val) == 0
* GT is set iff (s64)cr_64b_val > 0
* LT is set iff bit 62 of cr_64b_val is set
To take a few examples, if the result of an operation is:
* -1 (0xFFFFFFFFFFFFFFFF) -> lower 32 bits not 0 => !EQ
                          -> (s64)val (-1) is not > 0 => !GT
                          -> bit 62 is set => LT
                          !EQ, !GT, LT
* 0 (0x0000000000000000) -> lower 32 bits are 0 => EQ
                         -> (s64)val (0) is not > 0 => !GT
                         -> bit 62 is not set => !LT
                         EQ, !GT, !LT
* 1 (0x0000000000000001) -> lower 32 bits not 0 => !EQ
                         -> (s64)val (1) is > 0 => GT
                         -> bit 62 is not set => !LT
                         !EQ, GT, !LT
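Below is a minimal C++ sketch of the 64-bit representation and its three checks. The names `ResultToInternalCR`, `IsEQ`, `IsGT` and `IsLT` are hypothetical illustrations, not Dolphin's actual helpers. Updating a field costs a single sign extension. On x86-64, each check can be one flag-setting instruction: a 32-bit TEST for EQ, a 64-bit TEST followed by a signed "greater" condition for GT, and BT with bit 62 for LT.

```cpp
#include <cstdint>

// Hypothetical helpers sketching the 64-bit per-field CR representation.

// Updating a field is just a sign extension of the 32-bit result.
constexpr uint64_t ResultToInternalCR(uint32_t result) {
  return static_cast<uint64_t>(
      static_cast<int64_t>(static_cast<int32_t>(result)));
}

// EQ: the lower 32 bits are zero.
constexpr bool IsEQ(uint64_t cr) { return static_cast<uint32_t>(cr) == 0; }

// GT: the value is strictly positive as a signed 64-bit integer.
constexpr bool IsGT(uint64_t cr) { return static_cast<int64_t>(cr) > 0; }

// LT: bit 62 is set.
constexpr bool IsLT(uint64_t cr) { return ((cr >> 62) & 1) != 0; }
```

These checks reproduce the three worked examples above. Bit 62 works for LT because sign extension copies the sign of the 32-bit result into every bit from 32 to 63. One way to read the choice of bit 62 over bit 63 is that it leaves bit 63 free to encode !GT independently when converting from PPC values.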
Sometimes we need to convert PPC CR values to these 64-bit values. The
following convention is used in this case:
* Bit 0 (LSB) is set iff !EQ
* Bit 62 is set iff LT
* Bit 63 is set iff !GT
* Bit 32 is always set, to disambiguate between EQ and GT (when EQ and GT are
  both set, the lower 32 bits are 0 but the value must still be > 0)
Some more examples:
* !EQ, GT, LT -> 0x4000000100000001 (!B63, B62, B32, B0)
              -> lower 32 bits not 0 => !EQ
              -> (s64)val is > 0 => GT
              -> bit 62 is set => LT
* EQ, GT, !LT -> 0x0000000100000000
              -> lower 32 bits are 0 => EQ
              -> (s64)val is > 0 (note: B32) => GT
              -> bit 62 is not set => !LT
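A minimal sketch of this conversion convention follows, assuming 4-bit PPC fields with LT = 0x8, GT = 0x4 and EQ = 0x2, and ignoring SO. `PPCFieldToInternal`, `InternalToPPCField` and `RoundTripsAllFields` are hypothetical names used only for illustration. The reverse direction reuses the same three checks that apply to sign-extended results.

```cpp
#include <cstdint>

// Hypothetical helpers for the PPC <-> internal conversion convention above.
// Field bits: LT = 0x8, GT = 0x4, EQ = 0x2; SO (0x1) is ignored.
constexpr uint64_t PPCFieldToInternal(uint8_t field) {
  uint64_t cr = 1ull << 32;              // bit 32: always set
  if (!(field & 0x2)) cr |= 1ull;        // bit 0: set iff !EQ
  if (field & 0x8) cr |= 1ull << 62;     // bit 62: set iff LT
  if (!(field & 0x4)) cr |= 1ull << 63;  // bit 63: set iff !GT
  return cr;
}

// Reads the flags back using the same checks as for sign-extended results.
constexpr uint8_t InternalToPPCField(uint64_t cr) {
  uint8_t field = 0;
  if ((cr >> 62) & 1) field |= 0x8;                  // LT
  if (static_cast<int64_t>(cr) > 0) field |= 0x4;    // GT
  if (static_cast<uint32_t>(cr) == 0) field |= 0x2;  // EQ
  return field;
}

// True if every LT/GT/EQ combination survives a round trip.
constexpr bool RoundTripsAllFields() {
  for (int f = 0; f < 16; f += 2) {  // even values only: SO left clear
    if (InternalToPPCField(PPCFieldToInternal(static_cast<uint8_t>(f))) != f) {
      return false;
    }
  }
  return true;
}
```

`PPCFieldToInternal(0xC)` (!EQ, GT, LT) gives `0x4000000100000001`, and `PPCFieldToInternal(0x6)` (EQ, GT, !LT) gives `0x0000000100000000`, matching the examples above. Removing the `1ull << 32` line makes `RoundTripsAllFields()` return false: EQ with GT would then encode to 0, which reads back as EQ alone.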
Due to how the new CR flags work, supporting this on 32-bit systems isn't possible without some hefty work in the JITIL backend.
Looks good to me.
Code looks good, I just want to run a few benchmarks.
Benchmarks: I'm getting about the same results as JMC47 on my Core 2 Duo.
Commit delroth@8140449 explains in detail what this is about.
Fairly nice performance boost for CPU emulation. @JMC47 reports a 5-12% performance improvement in CPU-limited situations.
Caveats: currently broken for JITIL64, JITARM and JITILARM. Since it needs to touch interpreter code, it's difficult to isolate the change from these 3 JITs, and they will have to be modified.