Add more performant in-reg byte-reverse series of instr for P8 & P9 #5702
fyi: @aviansie-ben @zl-wang
Passes Tril LogicalTest on a P9 machine:
I would like to review before this is merged. @aviansie-ben could you please add me as a reviewer? @zl-wang I hope you can review as well.
Updated PR description with more details and performance results.
Ready for another review...
Was this run with a special version of Tril? AFAIK, the CPU detection logic doesn't currently run in Tril, so unless you hacked something together the new code wouldn't have been tested.
I didn't know about that, thanks. Will use another test that checks byte-swap (sanity/functional/manual).
500c36c to 53f07b6
Updated code with suggested changes. P8 BumbleBench results are as follows:
P9 BumbleBench results are as follows:
* The P8 short byteswap result is unexpected; I'll look into it further.
Looking further into the openj9-omr v0.23 release vs OMR Dec. 14 master d255f37:
Looking at the openj9-omr v0.23 release vs OMR Dec. 14 master d255f37 plus this PR:
One potentially significant change on hand is reverting the short ByteSwap to the original improved implementation (using rlwimi instructions, with a different implementation for P7). That would bring back the false dependency, but it improves P8 short ByteSwap by 1% (reason for the updated change: #5702 (comment)). BumbleBench results compared to the openj9-omr v0.23 release:
@aviansie-ben @zl-wang @gita-omr
Given the ~1% performance gain on hand, weighed against a possible, situational unnecessary dependency, this change gives the following performance numbers compared to before the PR:
And the following performance numbers compared to the openj9-omr v0.23 release:
As rlwimi became cheaper in P8 & P9, using it for the in-register byte-reverse instruction sequence results in fewer total instructions and better performance.
Signed-off-by: Abdulrahman Alattas <rmnattas@gmail.com>
This is to ensure the same constant and argument type length in 64- and 32-bit run modes.
Signed-off-by: Abdulrahman Alattas <rmnattas@gmail.com>
53f07b6 to 0edd401
PR ready for review
Thanks! I will take a look soon.
Seems there is no change from the last time I reviewed. Approved!
Looks good!
LGTM
@genie-omr build plinux,aix
In-register byte-reverse uses the same instruction sequence for P7, P8 & P9.
Given that rlwimi became cheaper in P8 & P9, using it instead of rlwinm in P8 & P9 results in better performance and fewer total instructions to execute.
Also, given the higher throughput of rlwimi in P8 & P9, instructions were ordered in a non-dependent manner to allow for parallel execution when possible.
rlwimi Cost
Number of Instructions:
BumbleBench Performance Result on P9 (Updated numbers in comments)
Signed-off-by: Abdulrahman Alattas <rmnattas@gmail.com>
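For readers unfamiliar with the two instructions named in the description, the following is a minimal C sketch that models rlwinm (rotate left, then AND with mask) and rlwimi (rotate left, then insert under mask) and uses them for a 32-bit byte-reverse. It shows the widely known three-instruction PowerPC idiom, which is an assumption here: it is not necessarily the exact sequence or ordering this PR emits. The helper names (rotl32, mask32, rlwinm, rlwimi, bswap32_sketch) are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is reduced modulo 32). */
uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Build the MB..ME mask in IBM bit numbering (bit 0 is the MSB).
 * When mb > me the mask wraps around, as it does in the ISA. */
uint32_t mask32(unsigned mb, unsigned me) {
    uint32_t from_mb = 0xFFFFFFFFu >> mb;
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of rlwinm: rotate left, AND with mask; the old target is discarded. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    return rotl32(rs, sh) & mask32(mb, me);
}

/* Model of rlwimi: rotate left, insert under mask; the target bits
 * outside the mask are preserved, so the target is also an input. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    uint32_t m = mask32(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Classic 3-instruction byte-reverse (assumed idiom, not taken from the PR). */
uint32_t bswap32_sketch(uint32_t x) {
    uint32_t r = rlwinm(x, 8, 0, 31);   /* r = rotl(x, 8): bytes 1 and 3 are in place */
    r = rlwimi(r, x, 24, 0, 7);         /* insert the low byte of x into the top byte */
    r = rlwimi(r, x, 24, 16, 23);       /* insert byte 2 of x into byte 1             */
    return r;
}
```

Because rlwimi reads its target register as well as writing it, consecutive inserts into the same register form a dependency chain. That is the kind of dependency the description refers to when it mentions ordering instructions in a non-dependent manner.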