-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BitOperations.IsPow2 for all supported integral types #36163
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
734390c
to
d273780
Compare
src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
Outdated
Show resolved
Hide resolved
{ | ||
// 1 set bit means number is a power of 2 unless it is the sign bit, so get rid of that | ||
// Cast to unsigned type so logic shift not signed arithmetic shift | ||
return PopCount((uint)value << 1) == 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(int)0x80000001
is not a power of 2, so you can't just discard the sign bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, it doesn't match software fallback behavior (add more test cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually matches the bug in the SoftwareFallback((uint)value << 1)
function call. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, indeed, I didn't notice << 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops sorry - will add a test for that and fix the fallback. I think the best way to do it in the accelerated route is
uint unsigned = (uint)value;
uint withoutSign = unsigned << 1;
uint mask = unsigned >> 31;
return PopCount(withoutSign) & ~mask == 1;
which is probably cheaper than a branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, if you are using >> 31
and & ~mask
to ensure that IsPow2(value)
returns false
when value < 0
, then it does not matter what PopCount
returns for those values, so the << 1
is not necessary.
Is the code with PopCount
actually faster than the software fallback, in which the value >= 0
check can be combined with the value != 0
check that is necessary anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on Skylake, where popcount isn't as fast, and it seems branchless popcount is still generally faster than branch + popcount or the software fallback. Different should be even bigger on Ryzen where popcount is faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide some numbers and the benchmark you are running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this just (value > 0) && IsPow2((uint)value)
?
Negative values generally aren't considered to be powers of 2, since you can't raise either 2
or -2
to n
to get them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsPow2((uint)value)
almost works, except that -2147483648
would return true.
src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
Outdated
Show resolved
Hide resolved
@john-h-k are you able to provide the benchmark numbers requested? thanks for the contribution |
Tagging subscribers to this area: @tannergooding, @pgovind, @jeffhandley |
@tannergooding @pgovind Is this something we could get perf numbers for ourselves? |
From a discussion on discord, @john-h-k reported the following using https://gist.github.com/john-h-k/f69f3017143e746c5f6e9f904a32767c:
|
My own box reports:
|
Given the above, it looks like the right approach is to go with The difference between using |
I think for // C++ compiler output
// x64 ARM64
// -------------------------------------------------
// blsr eax, ecx sub w1, w0, #1
// sete dl tst w1, w0
// test ecx, ecx ccmp w0, 0, 4, eq
// setg al cset w0, gt
// and al, dl ret
// ret
static bool IsPow2A(int x)
{
return (x & (x - 1)) == 0 && x > 0;
}
// C++ compiler output
// x64 ARM64
// -------------------------------------------------
// blsi eax, ecx neg w1, w0
// shr ecx, 1 and w1, w1, w0
// cmp eax, ecx cmp w1, w0, lsr 1
// setg al cset w0, gt
// ret ret
static bool IsPow2B(int x)
{
// Unsigned shift, signed comparison
return (x & -x) > (int)((uint)x >> 1);
} |
/// <param name="value">The value.</param> | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
[CLSCompliant(false)] | ||
public static bool IsPow2(uint value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also update:
runtime/src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
Line 224 in 64da821
if (PopCount(value) != 1) |
and
runtime/src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
Line 237 in 64da821
if (PopCount(value) != 1) |
to use
IsPow2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to use just if ((value & (value - 1)) != 0)
without checking for 0. Otherwise Log2Ceiling(0)
would return 1
, which is greater than Log2Ceiling(1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log2Ceiling(0) doesn't matter right now because no code will do that. If Log2Ceiling is every made public, erroneous inputs will need to be considered then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only it would be correct, it should be faster as well. Ideally (value & (value - 1))
should be compiled into a single blsr
instruction on x64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever implementation is chosen for IsPow2, it doesn't change the fact that these call sites should use IsPow2. That's what my initial comment here is about.
[CLSCompliant(false)] | ||
public static bool IsPow2(uint value) | ||
{ | ||
if (Popcnt.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PopCount also has a path that special-cases AdvSimd.Arm64.IsSupported. Is that not worth checking for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current stance is that there looks to be some open questions around popcnt
perf, particularly with regards to random inputs: #36163 (comment), and so it might be better to just not have Popcnt
until we can get some more information as to what's causing this.
That is, is it due to popcnt
or is it due to bad branch prediction?
If it's the latter, then what is the "optimal" ordering based on expected inputs?
I think that these can largely be answered in a follow PR given the perf difference, even in the winning cases, is fairly minor.
When we do get to looking at that, I'd expect that IsPow2 is largely used with validation and with positive inputs for the int
case, in which case doing the "popcnt" check before the "edge case check" would be better.
This is because the "popcnt" check will cover everything except int.MinValue
(0x8000_0000
).
This means that if the assumption about inputs is correct, then the "expected" case will be you always do two compares and in the case where it's wrong, the first check results in an early exit in more cases than the "edge case check".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that Aarch64 has no POPCNT
instruction for general-purpose registers. It would normally take 4 instructions: fmov
, cnt
, addv
, fmov
, where the first fmov
instruction copies the value from a general-purpose register to a SIMD register and the last fmov
instruction copies the calculated population count back to a general-purpose register for a comparison.
@tannergooding Should you run tests again, please try the IsPow2B
variation I mentioned above.
Discussed with tanner in DMs
@@ -53,30 +53,30 @@ static class BitOperations | |||
/// </summary> | |||
/// <param name="value">The value.</param> | |||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | |||
public static bool IsPow2(int value) => value >= 0 && (value & (value - 1)) == 0; | |||
public static bool IsPow2(int value) => (value & (value - 1)) == 0 && value >= 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be && (value > 0)
because ((value & (value - 1)) == 0)
will be true
for 0
(hence why we have && (value != 0)
for the unsigned variant)
Hello @tannergooding! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
Fixes #31297
Currently writing tests locally