New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64Emitter: Simplify LogicalImm logic #10807
Conversation
Source/Core/Common/Arm64Emitter.h
Outdated
// it. | ||
n = out_n; | ||
const u64 inverse_mask_from_trailing_ones = ~value | (value + 1); | ||
const size_t rotation = Common::LeastSignificantSetBit(value & inverse_mask_from_trailing_ones); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LeastSignificantSetBit(x)
can just return __builtin_ctzll(x)
- the result of which is undefined if x is 0, which it can be in this case. Maybe add a Common::CountTrailingZeros
matching the zero-handling of Common::CountLeadingZeros
?
I'd usually clear the trailing ones with just value & (value + 1)
, but otherwise this looks good.
(In case it's of interest, I wrote a blog post on optimising logical immediate encoding a while ago, which also removes the initial loop - it'd probably look something like this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! Why am I not surprised this isn't a novel insight! Neat!
I intuitively knew you could mask trailing ones again after rotation to determine z+o, but the observation that verifying rotation also ensured esize was a power of two wasn't obvious to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yeah, I don't know if that one's obvious to anyone. (I'd consider adding an explicit "is power of two" check to spare future people who stumble upon this code from having to understand that, but it feels silly to add code that I know is redundant.)
Heavily simplify logical immediate encoding. This is based on the observation that if a valid repeating element exists, it repeats through `value`. Thus it does not matter which one you analyse. Thus we skip over the least significent element if LSB = 1 by masking it out with `inverse_mask_from_trailing_ones`, to avoid the degenerate case of a stretch of 1 bits going 'round the end' of the word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review the intermediate version of the algorithm (the one in the first commit), but the final version looks good. Just some style nits.
LGTM after squashing the changes in the last commit into the appropriate commits. |
Simplify logical immediate encoding logic.
This is based on the observation that if a valid repeating element exists, it repeats through
value
. Thus it does not matter which one you analyse. Thus we skip over the least significant element if LSB = 1 by masking it out withinverse_mask_from_trailing_ones
, to avoid the degenerate case of a stretch of 1 bits going 'round the end' of the word.