JitArm64: Use LogicalImm in boolX #12060
3c574d8 to a871b10
Overall this is a good idea, but your example output for andcx is worse than before. While the instruction count is the same as before, the critical path becomes one cycle longer.
Now that I think about it, even if both the cycle counts and the critical path latency were the same, the old one that materializes the immediate value would be preferable. This is because if the register later has to be written back to the ppcState struct, it has to be materialized sooner or later anyway (unless it's 0). So: If you can save one instruction by not materializing the immediate, please do so (it helps in the cases where the register doesn't have to be written to ppcState), but otherwise you should keep materializing the immediate.
ARM64 has a special logical immediate encoding scheme that can be used with AND, ORR, and EOR. By taking advantage of this, we no longer need to materialize the immediate value in a register, saving instructions and/or reducing register pressure.

- orx
Before:
    mov w23, #0x1
    orr w23, w25, w23
After:
    orr w23, w25, #0x1

- andx
Before:
    mov w26, #-0x80000000
    and w27, w27, w26
    sxtw x24, w27
After:
    and w27, w27, #0x80000000
    sxtw x26, w27

- eqvx
Before:
    mov w23, #0x4
    eon w26, w23, w22
After:
    eor w26, w22, #0xfffffffb

- xorx
Before:
    mov w23, #0x1e
    eor w23, w27, w23
After:
    eor w23, w27, #0x1e

- norx
Before:
    mov w25, #-0x2001
    orr w23, w23, w25
    mvn w23, w23
After:
    orr w23, w23, #0xffffdfff
    mvn w23, w23
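For readers unfamiliar with the encoding: a 32-bit value is representable as a logical immediate when it consists of a repeating 2-, 4-, 8-, 16- or 32-bit element whose bits form a rotated contiguous run of ones (all-zeros and all-ones are not encodable). Below is a minimal standalone sketch of such a check. The names `IsLogicalImm32` and `RotateRight1` and the local `u32` alias are hypothetical and chosen for illustration only; this is not the LogicalImm implementation the PR uses.

```cpp
#include <bitset>
#include <cstdint>

using u32 = std::uint32_t;  // local alias for this sketch

// Rotate the low `size` bits of x right by one position (x must fit in `size` bits).
static u32 RotateRight1(u32 x, unsigned size)
{
  const u32 mask = size == 32 ? 0xFFFFFFFFu : ((1u << size) - 1);
  return ((x >> 1) | (x << (size - 1))) & mask;
}

// Hypothetical helper: true if `value` fits the 32-bit logical immediate scheme.
bool IsLogicalImm32(u32 value)
{
  // All-zeros and all-ones cannot be encoded.
  if (value == 0 || value == 0xFFFFFFFFu)
    return false;

  for (unsigned size = 2; size <= 32; size *= 2)
  {
    const u32 mask = size == 32 ? 0xFFFFFFFFu : ((1u << size) - 1);
    const u32 element = value & mask;

    // The element must repeat across the whole 32-bit value.
    bool repeats = true;
    for (unsigned shift = size; shift < 32; shift += size)
      repeats = repeats && ((value >> shift) & mask) == element;
    if (!repeats)
      continue;

    // Viewed circularly, a single run of ones has exactly two bit transitions.
    const u32 transitions = element ^ RotateRight1(element, size);
    if (std::bitset<32>(transitions).count() == 2)
      return true;
  }
  return false;
}
```

Under this sketch, the immediates from the examples above (0x1, 0x80000000, 0xfffffffb, 0x1e, 0xffffdfff) all pass, while a value such as 0x5 does not and would still need to be materialized with a MOV. The eqvx case relies on the identity ~(a ^ imm) == a ^ ~imm, which is why the check applies to the inverted immediate 0xfffffffb rather than to 0x4.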
a871b10 to a486168
That's an excellent point. Indeed, in cases where the instruction sequence is equal in length, materializing the immediate in a register could allow subsequent uses to leech off of it. Effectively, this means we shouldn't do the optimization for
AND(gpr.R(a), gpr.R(j), log_imm);
if (final_not)
  MVN(gpr.R(a), gpr.R(a));
In the final_not case, we could make use of de Morgan's laws and turn ~(s & b) into ~s | ~b. Since inverting the immediate has no runtime cost, this would let us replace the AND+MVN with ORN. But inverting the immediate after we already have log_imm seems like effort... So I'll leave it up to you if you want to try implementing this in this PR or not.
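To make the suggested identity concrete, here is a small standalone sketch that checks it at compile time. The function names are hypothetical, and mapping the rewritten forms onto the register-operand ORN (Rn | ~Rm) and BIC (Rn & ~Rm) instructions is an assumption about what the suggestion would compile to, not code from this PR.

```cpp
#include <cstdint>

using u32 = std::uint32_t;  // local alias for this sketch

// nand with an immediate as currently emitted: AND followed by MVN.
constexpr u32 NandViaAndMvn(u32 s, u32 imm)
{
  return ~(s & imm);
}

// Same result via de Morgan: ~s | ~imm. The immediate is inverted at JIT time,
// so only the OR-with-inverted-register remains at runtime (assumed: ORN).
constexpr u32 NandViaOrn(u32 s, u32 imm)
{
  const u32 inverted_imm = ~imm;  // free: folded when the code is emitted
  return inverted_imm | ~s;
}

// nor with an immediate as currently emitted: ORR followed by MVN.
constexpr u32 NorViaOrrMvn(u32 s, u32 imm)
{
  return ~(s | imm);
}

// Same result via de Morgan: ~s & ~imm (assumed: BIC with the inverted immediate).
constexpr u32 NorViaBic(u32 s, u32 imm)
{
  const u32 inverted_imm = ~imm;
  return inverted_imm & ~s;
}

static_assert(NandViaAndMvn(0x12345678u, 0x80000000u) == NandViaOrn(0x12345678u, 0x80000000u),
              "nand forms agree");
static_assert(NorViaOrrMvn(0x12345678u, 0xffffdfffu) == NorViaBic(0x12345678u, 0xffffdfffu),
              "nor forms agree");
```

Note that in both rewritten forms the inverted immediate is the non-inverted operand, so with register-operand ORN/BIC it would have to live in a register.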
Sorry, forgot about this for a while.
Not a bad idea. You can even use this approach for any immediate, not just those that can be expressed as LogicalImm. But you still need to materialize the immediate somehow and that might take more than one MOV instruction, in which case using LogicalImm might still be preferable...
I should also note that I haven't seen a single game use nand with immediates. And the only game that I've seen use nor is Zelda Master Quest.
So given the complexity and how uncommon these instruction patterns are, I think it would be better to leave this for a follow-up PR for now, if that's alright with you.
gpr.BindToRegister(a, a == j);
ORR(gpr.R(a), gpr.R(j), log_imm);
if (final_not)
  MVN(gpr.R(a), gpr.R(a));
Same here regarding de Morgan's laws.
I'm planning to merge this after the beta is out.
ARM64 has a special logical immediate encoding scheme that can be used with AND, ORR, and EOR. By taking advantage of this, we no longer need to materialize the immediate value in a register, saving instructions and/or reducing register pressure.
andx
Before:
After:
orx
Before:
After:
norx
Before:
After:
xorx
Before:
After:
eqvx
Before:
After:
This one has been removed.
andcx
Before:
After: