Add bitpermute opcode #2156
@0dvictor Could you please review the X86 implementation?
@fjeremic You may be interested in a Z implementation of this operation to improve JProfiling performance.
compiler/il/ILProps.hpp
Outdated
@@ -228,8 +228,7 @@ namespace ILProp2
ZeroExtension = 0x01000000,
SignExtension = 0x02000000,
ByteSwap = 0x04000000,
// Available = 0x08000000,
// Available = 0x10000000,
BitPermute = 0x08000000,
We seem to have deleted the 0x10000000 available line. We should leave that line in so others can use it. Otherwise a glossing eye may overlook that we jump from 0x08000000 to 0x20000000.
Also, why do we need this flag exactly? Doesn't the IL imply this already?
Yes, this was an accidental removal carried over from another set of changes.
The flag was added in an attempt to mirror ByteSwap's changes as a template for adding an opcode. Neither of the opcodes seems to make use of it. I'll remove BitPermute.
9852f34
to
7ee28df
Compare
Any chance of getting 8-bit and 16-bit variants? Maybe not as part of this PR, but could you open an issue about it? 🙂
1458d3e
to
5674d61
Compare
8-bit and 16-bit variants have been added.
Looks reasonable to me.
Looks good to me.
I have added Tril tests for the new opcodes, which should help in keeping behaviour consistent across the platforms.
d23e71c
to
781c616
Compare
Huge thumbs up for the Tril tests. Great addition @ncough!
Tests look great! Awesome job @ncough 👍
genie-omr build all
781c616
to
d91cb31
Compare
Minor suggestions only.
@@ -1023,6 +1023,17 @@ OMR::CodeGenerator::getSupportsIbyteswap()
return false;
}

/**
* Query whether ibitpermute and lbitpermute are supported
Should this be: "whether [bsil]bitpermute IL opcodes are supported"? It covers more than just i and l.
Yes, this was missed with the addition of the other variants. Thanks for finding it.
if (x < 8)
generateRegRegInstruction(OR1RegReg, node, resultReg, tmpReg, cg);
else
generateRegRegInstruction(ORRegReg(nodeIs64Bit), node, resultReg, tmpReg, cg);
You can save some DLL code size here by generating one call with differing opcodes, something like:
TR_X86OpCodes op = (x < 8) ? OR1RegReg : ORRegReg(nodeIs64Bit);
generateRegRegInstruction(op, node, resultReg, tmpReg, cg);
Good point. I have followed your suggestion and improved it.
d91cb31
to
a2c4537
Compare
genie-omr build all
a2c4537
to
c0f38c1
Compare
This change adds a new opcode: bitpermute. The instruction takes an original value and an array of bit indices. It then constructs a new value by extracting bits at these indices in the original value and placing them in the lower bits of the new. Signed-off-by: Nicholas Coughlin <cnic@ca.ibm.com>
c0f38c1
to
d660085
Compare
There was an issue with the guard used to disable the Tril tests by platform, as the opcode is currently only supported on X86. This has been corrected and other platforms should now be excluded. It should be noted that this does not use |
genie-omr build all
Tests have passed. Merging.
@ncough Just to clarify: does this mean that running comptest on non-x86 platforms will result in the "Opcode not implemented" asserts once the BitPermute.cpp tests run?
@fjeremic The BitPermute.cpp tests won't be instantiated on non-x86 platforms, as they are guarded by |
@dchopra001 FYI. You can use this as an example for Z on how to avoid VectorTest.cpp in #1941 until we have proper Tril support for platform determination.
This change adds a new opcode, bitpermute, with 8, 16, 32 and 64 bit variants.
The opcode takes a value and an array of bit indices. It then constructs a new value, of the same width as the original, by extracting bits at these indices in the original value and placing them in the lower bits of the result, corresponding to their position in the bit index array. Unspecified bits are zeroed. This calculation can be expressed as:
The node has three children:
The opcode is specified using shifts, rather than bit numbering, to remain consistent across platforms. As with the shift operations, the behaviour for bit indices greater than the width of the value is undefined. This covers both an index in the byte array that is greater than the width and an array length (the number of specified bits) that exceeds the width of the result.
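The semantics described above can be sketched as a small reference model in C++. This is an illustration only, not OMR code: the function name `bitPermute`, its template parameter and its signature are hypothetical, and the cases the opcode leaves undefined are simply rejected with assertions here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical reference model of the bitpermute semantics (not OMR code).
// Bit i of the result is bit indices[i] of value; unspecified bits are zero.
template <typename T>
T bitPermute(T value, const std::uint8_t *indices, std::size_t length)
   {
   static_assert(std::is_unsigned<T>::value, "model uses unsigned 8/16/32/64-bit types");
   const std::size_t width = sizeof(T) * 8;

   // The opcode leaves an over-long array undefined; this sketch asserts instead.
   assert(length <= width);

   T result = 0;
   for (std::size_t i = 0; i < length; ++i)
      {
      // Out-of-range bit indices are likewise undefined for the opcode.
      assert(indices[i] < width);

      // Extract the selected bit with a shift, then place it at position i.
      T bit = static_cast<T>((value >> indices[i]) & 1);
      result = static_cast<T>(result | static_cast<T>(bit << i));
      }
   return result;
   }
```

With a value of 0xA and the index array {3, 2, 1, 0}, the sketch reverses the low four bits and returns 0x5; with an empty array it returns 0.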
This change includes an X86 implementation, which consists of a series of bit tests and sets. The current implementation has an optimization for constant length arrays, a common case under JProfiling.