JIT64: optimize CA calculations #852

FioraAeterna · 2014-08-22T04:04:27Z

JIT64: optimize CA calculations

Omit carry calculations that get overwritten later in the block before they're used. Very common in the case of srawix and friends.

I previously had a patch here that also did PS1 optimizations, but I dropped it for now because it was becoming rather scary and messy (and was still broken).

Please only review the last patch in this PR; the others are just patches that it depends on (the FPRF branch and integeropts branch).

Source/Core/Core/PowerPC/Interpreter/Interpreter_Tables.cpp

@@ -34,7 +34,7 @@ static GekkoOPTemplate primarytable[] =
 	{10, Interpreter::cmpli,        {"cmpli",    OPTYPE_INTEGER, FL_IN_A | FL_SET_CRn, 1, 0, 0, 0}},
 	{11, Interpreter::cmpi,         {"cmpi",     OPTYPE_INTEGER, FL_IN_A | FL_SET_CRn, 1, 0, 0, 0}},
 	{12, Interpreter::addic,        {"addic",    OPTYPE_INTEGER, FL_OUT_D | FL_IN_A | FL_SET_CA, 1, 0, 0, 0}},
-	{13, Interpreter::addic_rc,     {"addic_rc", OPTYPE_INTEGER, FL_OUT_D | FL_IN_A | FL_SET_CR0, 1, 0, 0, 0}},
+	{ 13, Interpreter::addic_rc, { "addic_rc", OPTYPE_INTEGER, FL_OUT_D | FL_IN_A | FL_SET_CA | FL_SET_CR0, 1, 0, 0, 0 } },


skidau · 2014-09-04T04:21:52Z

Code looks ok. What games/homebrew have you tested this on?

comex · 2014-09-04T04:53:37Z

I have reviewed some of this and it looks reasonable. ☂

FioraAeterna force-pushed the optimizeca branch 2 times, most recently from 3a9294c to 2c9dc1f Compare August 22, 2014 04:25

FioraAeterna changed the title ~~JIT64: optimize carry calculations~~ JIT64: optimize CA + PS1 calculations Aug 22, 2014

FioraAeterna force-pushed the optimizeca branch 3 times, most recently from 7ddfefd to e258345 Compare August 22, 2014 11:08

FioraAeterna changed the title ~~JIT64: optimize CA + PS1 calculations~~ JIT64: optimize CA calculations Aug 22, 2014

badkarma12 reviewed Aug 23, 2014

FioraAeterna force-pushed the optimizeca branch from e258345 to db24597 Compare August 23, 2014 02:33

FioraAeterna added 13 commits September 1, 2014 20:41

Rename Log2 and add IsPow2 to MathUtils for future use

b51aa4f

Also remove unused pow2/pow2f functions.

JIT64: optimize multiplication by immediate constants

58dc802

Factor out common code and handle a few more common cases.

JIT64: optimize rlwinmx/rlwinix and friends

41c3dde

Take advantage of movzx as a replacement for anding with 0xff or 0xffff, and abuse loads from the register cache to save ops.
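As a rough illustration of why the substitution is safe, the sketch below shows at the C++ level that masking with 0xff or 0xffff gives the same result as a zero-extending narrow conversion, which is what movzx does in a single instruction. The function names are hypothetical and are not taken from the Dolphin source.

```cpp
#include <cstdint>

// Masking form: what an AND with an immediate computes.
inline std::uint32_t MaskLow8(std::uint32_t x)
{
	return x & 0xFFu;
}

inline std::uint32_t MaskLow16(std::uint32_t x)
{
	return x & 0xFFFFu;
}

// Zero-extension form: truncate to the narrow type, then widen again.
// Compilers typically lower this to a single movzx on x86-64.
inline std::uint32_t ZeroExtend8(std::uint32_t x)
{
	return static_cast<std::uint8_t>(x);
}

inline std::uint32_t ZeroExtend16(std::uint32_t x)
{
	return static_cast<std::uint16_t>(x);
}
```

Because movzx takes a separate source and destination, the mask and the move into the destination register can be folded into one instruction.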

JIT64: Optimize cmpXX

61af91f

Use TEST instead of CMP if we're comparing against 0 (rather common), and optimize the case of immediate compares further.

JIT64: optimize sign/zero-extend

355850f

Also remove some comments that no longer apply since x86_32 was dropped.

JIT64: avoid using LEA for adds when not necessary

cd0c52b

JIT64: use LEA for the "a = b + imm" case of addi

27996a6

JIT64: use xor instead of mov for loading a zero regcache immediate

ad51fc7

JIT64: tweak srwx/slwx BindToRegister arguments

ee24d47

Register B gets immediately moved into the shift register, so even if a == b it doesn't need to be loaded.

JIT64: Optimize carry handling

805be80

Carries are rather common and unpredictable, so do them branchlessly wherever we can.
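A minimal sketch of the idea in plain C++, not the x86 sequence the JIT emits: the carry out of a 32-bit add can be derived from a comparison or a wide intermediate rather than from a conditional branch. The helper names are hypothetical.

```cpp
#include <cstdint>

// Carry out of a 32-bit add, computed without a branch: the sum wraps
// around exactly when it ends up smaller than the first operand.
inline std::uint32_t AddCarry(std::uint32_t a, std::uint32_t b)
{
	std::uint32_t sum = a + b;
	return static_cast<std::uint32_t>(sum < a);
}

// Carry out of a + b + carry_in (the adde-style case), using a 64-bit
// intermediate so the carry is simply bit 32 of the result.
inline std::uint32_t AddExtendedCarry(std::uint32_t a, std::uint32_t b, std::uint32_t carry_in)
{
	std::uint64_t wide = static_cast<std::uint64_t>(a) + b + (carry_in & 1u);
	return static_cast<std::uint32_t>(wide >> 32);
}
```

Since the carry bit depends on guest data, a branch on it tends to mispredict, which is the motivation the commit message gives for the branchless form.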

JIT64: optimize some special cases of srawix

10d691a

Shifts by 31 and 1, both of which are pretty common, can be done in a few fewer instructions. Tested with a hwtest.
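To show why these two shift amounts admit shorter sequences, here is a sketch of the srawi semantics (arithmetic shift right, with CA set when the source is negative and any 1 bits are shifted out) next to simplified forms for shifts of 1 and 31. This is an illustration in C++ with hypothetical names, not the instruction sequence the commit generates.

```cpp
#include <cassert>
#include <cstdint>

struct SrawiResult
{
	std::uint32_t value;  // shifted result
	std::uint32_t carry;  // CA bit, 0 or 1
};

// General case for a shift amount of 0..31, written with unsigned
// arithmetic only so the behaviour does not depend on the compiler.
inline SrawiResult SrawiGeneric(std::uint32_t rs, unsigned sh)
{
	assert(sh < 32);
	std::uint32_t sign = rs >> 31;
	std::uint32_t fill = sign ? ~(0xFFFFFFFFu >> sh) : 0u;
	std::uint32_t lost = rs & ((1u << sh) - 1u);
	return {(rs >> sh) | fill, sign & static_cast<std::uint32_t>(lost != 0)};
}

// Shift by 1: the only bit shifted out is bit 0, so CA = sign & bit 0.
inline SrawiResult SrawiBy1(std::uint32_t rs)
{
	std::uint32_t sign = rs >> 31;
	return {(rs >> 1) | (sign << 31), sign & rs & 1u};
}

// Shift by 31: the result is all ones or zero, and CA is set exactly
// when the sign bit is set and at least one lower bit is also set.
inline SrawiResult SrawiBy31(std::uint32_t rs)
{
	std::uint32_t sign = rs >> 31;
	return {0u - sign, static_cast<std::uint32_t>(rs > 0x80000000u)};
}
```

Comparing `SrawiBy1` and `SrawiBy31` against `SrawiGeneric` over a range of inputs is a cheap way to check the simplified formulas, in the same spirit as the hwtest mentioned above.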

JIT64: support merged branching for rlwinmx, too

a40278b

Not quite as common a branch instruction as cmpwi, but close.

JIT64: optimize carry calculations

3aa40da

Omit carry calculations that get overwritten later in the block before they're used. Very common in the case of srawix and friends.
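The analysis described here can be sketched as a backward pass over the block that tracks whether CA is live. The `OpInfo` struct and `ComputeCarryNeeded` function below are hypothetical names for illustration and are not taken from Dolphin's analyzer. The sketch also assumes CA must be treated as live at the end of the block, since a following block may read it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-instruction summary of how the carry bit is used.
struct OpInfo
{
	bool reads_ca;   // instruction consumes CA (e.g. adde)
	bool writes_ca;  // instruction overwrites CA (e.g. srawi, addic)
};

// Returns, for each instruction, whether the carry it writes can be
// observed before being overwritten. Entries for instructions that do
// not write CA stay false.
std::vector<bool> ComputeCarryNeeded(const std::vector<OpInfo>& block)
{
	std::vector<bool> needed(block.size(), false);
	bool ca_live = true;  // conservative assumption at block exit
	for (std::size_t i = block.size(); i-- > 0;)
	{
		const OpInfo& op = block[i];
		if (op.writes_ca)
		{
			needed[i] = ca_live;
			ca_live = false;  // earlier writes are dead until a reader is found
		}
		if (op.reads_ca)
			ca_live = true;  // the value produced before this op is consumed here
	}
	return needed;
}
```

For a block of two consecutive srawi instructions followed by an adde and an addic, only the second srawi and the final addic would need to compute their carry under these assumptions.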

FioraAeterna force-pushed the optimizeca branch from db24597 to 3aa40da Compare September 2, 2014 03:41

comex added a commit that referenced this pull request Sep 5, 2014

Merge pull request #852 from FioraAeterna/optimizeca

97420c6

JIT64: optimize CA calculations

comex merged commit 97420c6 into dolphin-emu:master Sep 5, 2014
