Optimize PPC CR emulation by using magic 64 bit values
PowerPC has a 32 bit CR register, which is used to store flags for results of computations. Most instructions have an optional bit that tells the CPU whether the flags should be updated. This 32 bit register actually contains 8 sets of 4 flags: Summary Overflow (SO), Equals (EQ), Greater Than (GT), Less Than (LT). These 8 sets are usually called CR0-CR7 and accessed independently. In the most common operations, the flags are computed from the result of the operation in the following fashion:

* EQ is set iff result == 0
* LT is set iff result < 0
* GT is set iff result > 0
* (Dolphin does not emulate SO)

While X86 architectures have a similar concept of flags, it is very difficult to access the FLAGS register directly to translate its value to an equivalent PowerPC value.

With the current Dolphin implementation, updating a PPC CR register requires CPU branching, which has a few performance issues: it uses space in the BTB, and in the worst case (!GT, !LT, EQ) requires 2 branches not taken.

After some brainstorming on IRC about how this could be improved, calc84maniac figured out a neat trick that makes common CR operations way more efficient to JIT on 64 bit X86 architectures. It relies on emulating each CRn bitfield with a 64 bit register internally, whose value is the result of the operation from which flags are updated, sign extended to 64 bits.
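
To make the contrast concrete, here is a minimal C++ sketch of the two ways of updating a CR field from a 32 bit result. The function names and the `CR_*` constants are hypothetical and chosen for illustration only; they are not taken from the Dolphin sources. The constant values assume the usual PowerPC field layout (LT=8, GT=4, EQ=2, SO=1).

```cpp
#include <cstdint>

// Assumed 4 bit CR field layout (illustrative names).
enum : uint8_t { CR_SO = 1, CR_EQ = 2, CR_GT = 4, CR_LT = 8 };

// Branching approach: derive the flag nibble from the result.
// The EQ case falls through both comparisons, SO is ignored.
constexpr uint8_t FlagsFromResult(int32_t result) {
  if (result < 0) return CR_LT;
  if (result > 0) return CR_GT;
  return CR_EQ;
}

// 64 bit approach: no comparison at update time, only a sign extension.
// The flags are decoded later, when (and if) something reads them.
constexpr uint64_t InternalCRFromResult(int32_t result) {
  return static_cast<uint64_t>(static_cast<int64_t>(result));
}
```

With the second form, an instruction that updates CR0 only has to store one sign extended value, which is the kind of work a JIT can emit as a single move.
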
Then, checking if a CR bit is set can be done in the following way:

* EQ is set iff LOWER_32_BITS(cr_64b_val) == 0
* GT is set iff (s64)cr_64b_val > 0
* LT is set iff bit 62 of cr_64b_val is set

To take a few examples, if the result of an operation is:

* -1 (0xFFFFFFFFFFFFFFFF)
  -> lower 32 bits not 0 => !EQ
  -> (s64)val (-1) is not > 0 => !GT
  -> bit 62 is set => LT
  !EQ, !GT, LT

* 0 (0x0000000000000000)
  -> lower 32 bits are 0 => EQ
  -> (s64)val (0) is not > 0 => !GT
  -> bit 62 is not set => !LT
  EQ, !GT, !LT

* 1 (0x0000000000000001)
  -> lower 32 bits not 0 => !EQ
  -> (s64)val (1) is > 0 => GT
  -> bit 62 is not set => !LT
  !EQ, GT, !LT

Sometimes we need to convert PPC CR values to these 64 bit values. The following convention is used in this case:

* Bit 0 (LSB) is set iff !EQ
* Bit 62 is set iff LT
* Bit 63 is set iff !GT
* Bit 32 is always set, to disambiguate between EQ and GT

Some more examples:

* !EQ, GT, LT -> 0x4000000100000001 (!B63, B62, B32, B0)
  -> lower 32 bits not 0 => !EQ
  -> (s64)val is > 0 => GT
  -> bit 62 is set => LT

* EQ, GT, !LT -> 0x0000000100000000
  -> lower 32 bits are 0 => EQ
  -> (s64)val is > 0 (note: B32) => GT
  -> bit 62 is not set => !LT
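
The flag tests and the conversion convention above can be sketched in C++ as follows. This is an illustration under assumptions, not the code from this commit: all function names and the `CR_*` constants are hypothetical, and the nibble layout (LT=8, GT=4, EQ=2, SO=1) is the usual PowerPC one. SO is ignored, matching the note that Dolphin does not emulate it.

```cpp
#include <cstdint>

// Assumed 4 bit CR field layout (illustrative names).
enum : uint8_t { CR_SO = 1, CR_EQ = 2, CR_GT = 4, CR_LT = 8 };

// Flag tests on the 64 bit internal value.
constexpr bool IsEQ(uint64_t v) { return static_cast<uint32_t>(v) == 0; }
// Assumes two's complement conversion, as on x86-64.
constexpr bool IsGT(uint64_t v) { return static_cast<int64_t>(v) > 0; }
constexpr bool IsLT(uint64_t v) { return ((v >> 62) & 1) != 0; }

// PPC flag nibble -> 64 bit internal value, following the convention:
// bit 0 = !EQ, bit 62 = LT, bit 63 = !GT, bit 32 always set.
constexpr uint64_t InternalFromFlags(uint8_t flags) {
  uint64_t v = uint64_t{1} << 32;
  if (!(flags & CR_EQ)) v |= uint64_t{1};
  if (flags & CR_LT) v |= uint64_t{1} << 62;
  if (!(flags & CR_GT)) v |= uint64_t{1} << 63;
  return v;
}

// 64 bit internal value -> PPC flag nibble (SO left clear).
constexpr uint8_t FlagsFromInternal(uint64_t v) {
  uint8_t flags = 0;
  if (IsEQ(v)) flags |= CR_EQ;
  if (IsGT(v)) flags |= CR_GT;
  if (IsLT(v)) flags |= CR_LT;
  return flags;
}
```

Bit 32 is what keeps the EQ and GT tests independent: with EQ set the lower 32 bits are all zero, so without it the value could not be strictly positive and GT could never be reported together with EQ.
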
Showing with 452 additions and 381 deletions.
- +1 −1 Source/Core/Core/PowerPC/Interpreter/Interpreter.cpp
- +13 −23 Source/Core/Core/PowerPC/Interpreter/Interpreter_Integer.cpp
- +2 −4 Source/Core/Core/PowerPC/Jit64/Jit.cpp
- +10 −0 Source/Core/Core/PowerPC/Jit64/Jit.h
- +6 −18 Source/Core/Core/PowerPC/Jit64/Jit_Branch.cpp
- +16 −8 Source/Core/Core/PowerPC/Jit64/Jit_FloatingPoint.cpp
- +89 −265 Source/Core/Core/PowerPC/Jit64/Jit_Integer.cpp
- +253 −43 Source/Core/Core/PowerPC/Jit64/Jit_SystemRegisters.cpp
- +4 −2 Source/Core/Core/PowerPC/Jit64IL/IR_X86.cpp
- +3 −3 Source/Core/Core/PowerPC/Jit64IL/JitIL.cpp
- +6 −5 Source/Core/Core/PowerPC/PowerPC.cpp
- +49 −9 Source/Core/Core/PowerPC/PowerPC.h