Proposal: Add a BitManipulation class #18876
Comments
public int IsBitSet(long value, int position); Should this return |
@nguerrera I recall you had mentioned some internal helpers that Roslyn was using that were related to this sort of functionality. Any input here? |
CountBits is the main one. Search for BitArithmetic in Roslyn and corefx. I've long wanted a nice class to hide the scariness of https://graphics.stanford.edu/~seander/bithacks.html |
Yes that should return bool. If this seems like something that would be liked I'd happily submit a pr for little endian architecture. |
This issue, at least partially, overlaps with https://github.com/dotnet/coreclr/issues/6906 |
Shouldn't the methods be declared as static?
Corrected. All in all, I like the idea of a central location for common bit manipulating functions. |
@mburbea All the Set methods need a return value or the value as an out param. |
Very, very much +1 for this idea. Could contain other useful methods such as public int CountBits(int value);
public int RoundUpToPowerOfTwo(int value);
public bool IsPowerOfTwo(int value); to prevent people from having their own hand-rolled, inefficient implementations. IMO, we may also want to name it something like |
Isn't CountBits just pop count? |
Yeah, @jamesqo, all I was saying was that the method was covered. I wouldn't be opposed to calling it something else; I went with the name because it was suggested by @mellinoe.
public static bool IsPowerOfTwo(long value) => value == (value & -value) && value != 0;
public static long RoundUpToPowerOfTwo(long value) => 1L << (64 - CountLeadingZeroes(value - 1)); |
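For readers following along, here is a minimal, self-contained sketch of the two helpers quoted above. It is only an illustration: the class name `PowerOfTwoSketch` and the loop-based `CountLeadingZeroes` are assumptions made for this example, not part of the proposal, and the sketch uses `ulong` to sidestep the sign questions that `long` raises.

```csharp
using System;

// Hypothetical helper class; names are illustrative, not the proposed API.
public static class PowerOfTwoSketch
{
    // Software count of leading zero bits in a 64-bit value (64 for zero).
    public static int CountLeadingZeroes(ulong value)
    {
        if (value == 0) return 64;
        int count = 0;
        // Shift left until the top bit is set, counting the steps.
        while ((value & 0x8000000000000000UL) == 0)
        {
            value <<= 1;
            count++;
        }
        return count;
    }

    // True when exactly one bit is set.
    public static bool IsPowerOfTwo(ulong value)
        => value != 0 && (value & (value - 1)) == 0;

    // Smallest power of two >= value. Returns 1 for 0 and 1.
    // Assumes value <= 2^63; larger inputs have no 64-bit answer.
    public static ulong RoundUpToPowerOfTwo(ulong value)
    {
        if (value <= 1) return 1;
        return 1UL << (64 - CountLeadingZeroes(value - 1));
    }
}
```

Note the `1UL` literal: with a plain `1` the shift would be performed on a 32-bit `int`, and C# would mask the shift count to 5 bits.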
Ah, ok, I see. Question-- do you think it's necessary to have methods like Also, there is somewhat of an ambiguity-- people don't know if the |
@jamesqo I think it's easier to read a single method call with a descriptive name than an expression with three operators and two magic constants (even if they are just 0 and 1). Especially when it comes to negation, I think it's relatively easy to confuse You are free to keep writing it the way you're used to, but I think having those methods in the base library as an option would be useful. |
I'm not opposed to removing them if they're considered a hindrance to API acceptance, but I like having convenience methods. None of those methods are hard to write, but like most C# developers most of my code is higher level, so I rarely twiddle bits. When I need to, I often end up googling these things, so centralizing them made sense to me. |
@mburbea @jamesqo I write a lot of bit twiddling code and would still appreciate a proper API I could reason about instead of having my entire codebase filled with magic constants, shifts and binary operations, as long as the generated code is guaranteed to be as efficient as the hand-written equivalent. |
@redknightlois Hmm, ok. I guess popular vote wins since everyone seems to want those methods 😄 There is still the issue of ambiguity whether |
That might be a question of endianness (as in big endian long, the msb is where the 0th bit is stored and the lsb is where the 63rd bit is stored, but in little endian it's reversed). Intuitively, I feel it's implied that it's in sequential order for your hardware. e.g. |
@mellinoe @CarolEidt https://arxiv.org/abs/1611.07612 faster popcnt than popcnt using AVX2 |
@mellinoe , @CarolEidt What else needs to be resolved for this API proposal to move forward? |
@mburbea I think that this is in a good state and should be ready for our meeting next week. For our review, I think having a real implementation for all of these methods would actually be very helpful. A big part of the benefit of these APIs is that they will hide the "scariness" of complicated bit manipulation from the user. It would be good if we could see just how "scary" the code is on a case-by-case basis. For example, it's not clear that the code for |
Additionally, can you update the original post with your finalized proposal, including any extra methods that were suggested that you believe are valuable? |
@mellinoe, would these methods have intrinsic support? (Many of these functions have CPU level instructions that are usable/preferred in some scenarios.) |
It's something to consider as an optimization, but I don't think it's anything we need to block the proposal or an implementation on. |
Are we still considering different names for these methods? IMO, in their current form they are kinda verbose-- for example, I think we may want to rename some of the methods, e.g.:
We can also consider another name for the class, such as BitTwiddler.RotateLeft(i, 5);
BitManipulation.CircularShiftLeft(i, 5); edit: In addition, |
IMO, C# has always been a bit more on the verbose side. I felt that was intentional as code is more often read than written. The longer names have always been explained away with the fact that most people are writing C# using an IDE which offers IntelliSense.
SqlDataRecord has both a |
We use
|
I wonder if this is one of the types that is likely going to be used with |
Verbose does not always equal more readable-- the meaning that's conveyed is what's important.
😄 🎉 We should consider both
IMO, I think people who don't have a background in assembly deciphering what
Yes, but the point is
|
I think we should make it so people don't have to add |
@CarolEidt, their activity log doesn't show them to be very active. I may just create a new issue to track this if it doesn't get a response soon. |
@tannergooding, my current role doesn't allow me to participate too much on GitHub :( If you'd like to create a new issue on this, by all means. I think my initial API & PoC are pretty close, but obviously deficient. I mostly agree with your points, but I do feel that an enum for the acceleration check would be a good idea, since software workarounds could be a brutal performance hit. |
I am happy to take this one, but I propose an incremental approach that starts with a simpler API and takes advantage of intrinsics but is software emulated. In a later PR, we can add more functions and specific hardware optimizations:
public static partial class BitOps // .Primitive
{
/// <summary>
/// Reads whether the specified bit in a mask is set.
/// </summary>
/// <param name="value">The bit mask.</param>
/// <param name="offset">The ordinal position of the bit to read.</param>
public static bool ExtractBit(in byte value, in byte offset);
public static bool ExtractBit(in ushort value, in byte offset);
public static bool ExtractBit(in uint value, in byte offset);
public static bool ExtractBit(in ulong value, in byte offset);
/// <summary>
/// Sets the specified bit in a mask and returns whether it was originally set,
/// like BTS/BTR.
/// </summary>
/// <param name="value">The bit mask.</param>
/// <param name="offset">The ordinal position of the bit to write.</param>
/// <param name="on">True to set the bit to 1, or false to set it to 0.</param>
public static bool InsertBit(ref uint value, in byte offset, in bool on);
public static bool InsertBit(ref int value, in byte offset, in bool on);
public static bool InsertBit(ref ulong value, in byte offset, in bool on);
public static bool InsertBit(ref long value, in byte offset, in bool on);
/// <summary>
/// Negates the specified bit in a mask and returns whether it was originally set,
/// like BTC.
/// </summary>
/// <param name="value">The bit mask.</param>
/// <param name="offset">The ordinal position of the bit to flip.</param>
public static bool FlipBit(ref byte value, in byte offset);
public static bool FlipBit(ref ushort value, in byte offset);
public static bool FlipBit(ref uint value, in byte offset);
public static bool FlipBit(ref ulong value, in byte offset);
/// <summary>
/// Rotates the specified mask left by the specified number of bits.
/// </summary>
/// <param name="value">The value to rotate.</param>
/// <param name="offset">The number of bits to rotate by.</param>
/// <returns>The rotated value.</returns>
public static byte RotateLeft(in byte value, in byte offset);
public static byte RotateRight(in byte value, in byte offset);
public static ushort RotateLeft(in ushort value, in byte offset);
public static ushort RotateRight(in ushort value, in byte offset);
// Takes advantage of existing intrinsics (https://github.com/dotnet/coreclr/pull/1830)
public static uint RotateLeft(in uint value, in byte offset);
public static uint RotateRight(in uint value, in byte offset);
public static ulong RotateLeft(in ulong value, in byte offset);
public static ulong RotateRight(in ulong value, in byte offset);
/// <summary>
/// Returns the population count (number of bits set) in a mask.
/// </summary>
/// <param name="value">The bit mask.</param>
public static int PopCount(in byte value);
public static int PopCount(in ushort value);
public static int PopCount(in uint value);
public static int PopCount(in ulong value);
/// <summary>
/// Count the number of leading bits in a mask.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="on">True to count each 1, or false to count each 0.</param>
public static int LeadingCount(in byte value, in bool on);
public static int LeadingCount(in ushort value, in bool on);
public static int LeadingCount(in uint value, in bool on);
public static int LeadingCount(in ulong value, in bool on);
/// <summary>
/// Count the number of trailing bits in a mask.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="on">True to count each 1, or false to count each 0.</param>
public static int TrailingCount(in byte value, in bool on);
public static int TrailingCount(in ushort value, in bool on);
public static int TrailingCount(in uint value, in bool on);
public static int TrailingCount(in ulong value, in bool on);
// In a separate PR
// Read/Write:byte/ushort/uint
// Span<byte> overloads
// FloorLog2
// Other...
}
I will create a formal API proposal if there's any interest in doing this. |
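To make the "software emulated" part of the proposal above concrete, here is a rough sketch of what fallbacks for the `uint` overloads might look like. This is an assumption-laden illustration, not the proposed implementation: the class name `BitOpsSketch` is hypothetical, the `in` modifiers are dropped for brevity, and only the 32-bit overloads are shown.

```csharp
using System;

// Hypothetical software fallbacks for the 32-bit overloads only.
public static class BitOpsSketch
{
    // Reads bit `offset`. C# masks a uint shift count to its low 5 bits,
    // so offsets of 32 or more wrap around rather than throw.
    public static bool ExtractBit(uint value, byte offset)
        => (value & (1u << offset)) != 0;

    // Sets or clears bit `offset` and returns whether it was originally set.
    public static bool InsertBit(ref uint value, byte offset, bool on)
    {
        uint mask = 1u << offset;
        bool wasSet = (value & mask) != 0;
        value = on ? (value | mask) : (value & ~mask);
        return wasSet;
    }

    // Toggles bit `offset` and returns whether it was originally set.
    public static bool FlipBit(ref uint value, byte offset)
    {
        uint mask = 1u << offset;
        bool wasSet = (value & mask) != 0;
        value ^= mask;
        return wasSet;
    }

    // Rotate left; the shift-count masking makes offset 0 and offsets > 31 safe.
    public static uint RotateLeft(uint value, byte offset)
        => (value << offset) | (value >> (32 - offset));

    // Population count by repeatedly clearing the lowest set bit.
    public static int PopCount(uint value)
    {
        int count = 0;
        while (value != 0)
        {
            value &= value - 1;
            count++;
        }
        return count;
    }
}
```

Returning the previous bit state from `InsertBit` and `FlipBit` mirrors the BTS/BTR/BTC behaviour mentioned in the doc comments of the proposal.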
PopCount, LeadingOnes, LeadingZeroes, TrailingOnes, TrailingZeroes, etc could use a Span interface. In that way the API would add something not readily available on intrinsics interface |
I am keeping this design minimal so that we can get the basic functionality out the door. Span overloads are a good idea and (per comments in code) planned for the next iteration. |
@grant-d, creating a formal API proposal would be great. It slipped off my radar a while back and I never got around to it. It would likely be useful to incorporate the feedback @CarolEidt and I gave here (and a couple posts down): https://github.com/dotnet/corefx/issues/12425#issuecomment-356079499 I think the primary feedback was: |
@redknightlois, the point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable. Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted. |
Why would Span overloads prevent getting this "out of the door"? I think at absolute minimum we should design APIs as we want them long term, review the full design, and then possibly implement in stages. |
I honestly don't see the use-case for the Span APIs here. Could someone elaborate on why you would want them? |
I will create a formal PR. In the meantime, I have updated the |
@tannergooding For example, Popcount is typically used on data structures that are compressed into bitstreams (the same applies to LeadingOnes, TrailingZeroes and the like), which means that you are not doing it on a long, but on a variable size byte stream. Furthermore, optimized Popcount is not as simple as applying popcount and summing the results; in fact, depending on the architecture, different implementations may be exercised for those operations. Depending on the bitstream size it would make sense to prime the L2 cache, etc... If the size is big enough, the Avx2 based Harley Seal algorithm has 50% more memory throughput than the built-in version because of higher instruction parallelism, etc... So, it makes sense that all of that should be abstracted away, given that those primitives are quintessential to high performance compressed data structures and many other performance sensitive operations. Not including the batch versions, while easier, could counter the actual gains in code simplification that could be obtained in the intended usage scenario. |
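As a point of reference for the batch scenario described above, the naive baseline is "popcount each word and sum". The sketch below shows only that baseline, using a standard SWAR popcount per 64-bit word; it is explicitly not the Harley-Seal algorithm and makes no cache or SIMD optimizations. The class and method names are assumptions for illustration, and `ReadOnlySpan<T>` requires .NET Core 2.1 or later (or the System.Memory package).

```csharp
using System;

// Hypothetical baseline: per-word popcount summed over a span.
public static class SpanPopCountSketch
{
    // Classic SWAR popcount for one 64-bit word.
    private static int PopCount64(ulong x)
    {
        unchecked
        {
            x -= (x >> 1) & 0x5555555555555555UL;                               // 2-bit sums
            x = (x & 0x3333333333333333UL) + ((x >> 2) & 0x3333333333333333UL); // 4-bit sums
            x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FUL;                          // 8-bit sums
            return (int)((x * 0x0101010101010101UL) >> 56);                     // add all bytes
        }
    }

    // Total number of set bits across all words in the span.
    public static long PopCount(ReadOnlySpan<ulong> values)
    {
        long total = 0;
        foreach (ulong word in values)
        {
            total += PopCount64(word);
        }
        return total;
    }
}
```

A vectorized or cache-aware implementation could sit behind the same signature, which is the abstraction argument being made in the comment above.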
This seems like a case of complicating the implementation/exposed APIs for something that may be very use-case specific and which may be better suited for exposure elsewhere. |
How does it complicate APIs? I do agree it's more complex implementation, but we don't split logical overloads into separate classes/features just because implementation of some of the overloads is more tricky. |
Because,
I would think that |
Indeed
That's exactly what a pit of success looks like IMHO. A sensible API should give you the latter by default and not expect the user to have to deal with that, until there is no other choice. |
Right. But is For
There are many algorithms which can benefit from So, personally, I believe |
Agree with @tannergooding. I don't deny that there might be a need for |
Where I come from popcount over higher than 64 bits is hardly esoteric, it is the base operation. I don't care about
I usually tend to agree, but having seen EDIT: And the problem is that the Case in point: https://github.com/dotnet/corefx/issues/2209#issuecomment-139636114 |
@redknightlois trying to understand the requirement: would |
EDIT: If say |
I have to agree with the position that having a |
Here's a stab at the Span overloads:
public static partial class BitOps // .Span
{
public static bool ExtractBit(ReadOnlySpan<byte> value, in uint offset);
public static bool ExtractBit(ReadOnlySpan<uint> value, in uint offset);
public static bool ExtractBit(ReadOnlySpan<ulong> value, in uint offset);
public static bool ExtractBit(ReadOnlySpan<ushort> value, in uint offset);
public static bool FlipBit(Span<byte> value, in uint offset);
public static bool FlipBit(Span<uint> value, in uint offset);
public static bool FlipBit(Span<ulong> value, in uint offset);
public static bool FlipBit(Span<ushort> value, in uint offset);
public static bool InsertBit(Span<byte> value, in uint offset, in bool on);
public static bool InsertBit(Span<uint> value, in uint offset, in bool on);
public static bool InsertBit(Span<ulong> value, in uint offset, in bool on);
public static bool InsertBit(Span<ushort> value, in uint offset, in bool on);
public static long PopCount(ReadOnlySpan<byte> value);
public static long PopCount(ReadOnlySpan<uint> value);
public static long PopCount(ReadOnlySpan<ulong> value);
public static long PopCount(ReadOnlySpan<ushort> value);
public static long LeadingCount(ReadOnlySpan<byte> value, in bool on);
public static long LeadingCount(ReadOnlySpan<uint> value, in bool on);
public static long LeadingCount(ReadOnlySpan<ulong> value, in bool on);
public static long LeadingCount(ReadOnlySpan<ushort> value, in bool on);
public static long TrailingCount(ReadOnlySpan<byte> value, in bool on);
public static long TrailingCount(ReadOnlySpan<uint> value, in bool on);
public static long TrailingCount(ReadOnlySpan<ulong> value, in bool on);
public static long TrailingCount(ReadOnlySpan<ushort> value, in bool on);
} |
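One way the span-based bit accessors above could work is to split the bit offset into an element index and a bit index within that element. The sketch below does this for `byte` spans only. It assumes bits are numbered from the least significant bit of element 0 upward, which is an assumption of this example (bit ordering was an open question earlier in the thread), and the class name is hypothetical.

```csharp
using System;

// Hypothetical span accessors; assumes LSB-first bit numbering per byte.
public static class SpanBitSketch
{
    // Reads the bit at `offset`; out-of-range offsets throw IndexOutOfRangeException.
    public static bool ExtractBit(ReadOnlySpan<byte> value, uint offset)
    {
        int index = (int)(offset >> 3);   // which byte
        int bit = (int)(offset & 7);      // which bit inside that byte
        return (value[index] & (1 << bit)) != 0;
    }

    // Toggles the bit at `offset` and returns whether it was originally set.
    public static bool FlipBit(Span<byte> value, uint offset)
    {
        int index = (int)(offset >> 3);
        byte mask = (byte)(1 << (int)(offset & 7));
        bool wasSet = (value[index] & mask) != 0;
        value[index] ^= mask;
        return wasSet;
    }
}
```

The wider element types (`ushort`, `uint`, `ulong`) would follow the same pattern with shifts of 4, 5 and 6 and masks of 15, 31 and 63.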
You shouldn't need |
I like the self-documenting nature of them, even if in the case of |
Formal proposal here: https://github.com/dotnet/corefx/issues/32269 |
Closing this in favor of https://github.com/dotnet/corefx/issues/32269 |
I propose adding a new class to CoreFx, which will contain many common bit manipulation techniques.
Part of the benefit of having a class like this is that the Jit can target the methods of this class for intrinsics, similar to what was done for Vector<T>.
There is still some open discussion as to what the shape of this API should be, and even the class name.
Class Name
There are several class names being thrown around and none are particularly winning everyone over.
Two Classes?
@benaadams correctly notes that there seem to be two different APIs here.
A low level view allowing you to manipulate a numeric type to extract or inject a bit (like a bit vector), byte, short, or int value.
These methods are the equivalent of the following but safer (and possibly faster)
And another set of utility methods for treating an integer register type like a unit of data, allowing you to manipulate it or introspect it.
Does it make sense to keep these APIs in one class?
Method Names
Another point of contention. What should we call these methods? For the "view" members, there is some dislike of the naming convention of Get/Set even though there is prior art like BitVector32 & SqlDataRecord. Everyone seems to like Read/Write more, but while Read is fine, Write isn't necessarily the operation being done. I'm still looking for some verbiage to note that this really takes a value, modifies it a bit (no pun intended), and spits out a new one.
PopCount/HammingWeight/CountSetBits - We can't decide on this name. I personally like PopCount as it is a well known name for the algorithm. However, for someone who does not use CPU intrinsics or bit tricks, this name might mean nothing. The .NET API is split: common or well known algorithms are sometimes simply called by that name (e.g. Rijndael) and sometimes given descriptive names for new users. I think that this class in general is fairly low-level, so even a novice should be expected to do a quick google search in the subject area.
There is also the question of naming the methods as actions (e.g. CountTrailingZeroes) or properties (e.g. TrailingZeroesCount).
Hardware Acceleration / Detection
).Hardware Acceleration / Detection
@benaadams, suggested adding a simple flag to determine if the class has hardware acceleration. I personally suggest going a step further and adding an
enum
to describe which methods are accelerated (not in the PoC).Unfortunately, the
enum
based approach does raise the question if the jit could do branch elimination on a comparison like the following if it could replaceAcceleratedOperations
as a runtime constant. (unknown)I admittedly question what you would do differently if it isn't accelerated. This isn't like
Vector<T>
where you might switch to a different approach and useulong
like smaller vectors. The methods should be pretty good solutions to these problems and outside of switching to some native code I don't see us doing better.Methods that could be accelerated (in AMD64 at least)::
PopCount
=>POPCNT
RotateRight
=>ROR
(already accelerated!)RotateLeft
=>ROL
(already accelerated!)CountLeadingZeroes
=>LZCNT
(in ABM compliant hardware) or (BSR
followed by xor 63)CountTrailingZeroes
=>TZCNT
/BSF
ReadBit
=>BT
WriteBit
=>BTS
/BTC
(maybe?)ReadByte
/ReadInt16
/ReadInt32
=>BEXTR
(possibly)Updated spec:
Edit: POC class
https://gist.github.com/mburbea/c9a71ac1b1a25762c38c9fee7de0ddc2
More updates! Removed signed rotate operators.
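The "BSR followed by xor 63" note in the list of accelerable methods can be checked in plain software: for a non-zero 64-bit value, the leading zero count equals the index of the highest set bit XOR 63. The sketch below emulates that relationship with loops. The class and method names are assumptions for illustration, and returning 64 for a zero input is a choice made here to match LZCNT/TZCNT (BSR and BSF leave the result undefined for zero).

```csharp
using System;

// Hypothetical software emulation of the BSR/LZCNT and BSF/TZCNT relationship.
public static class LeadingZeroSketch
{
    // Index of the highest set bit, as BSR would report; -1 for zero here.
    public static int HighestSetBitIndex(ulong value)
    {
        int index = -1;
        while (value != 0)
        {
            value >>= 1;
            index++;
        }
        return index;
    }

    // Leading zero count expressed as "highest bit index xor 63".
    public static int CountLeadingZeroes(ulong value)
        => value == 0 ? 64 : HighestSetBitIndex(value) ^ 63;

    // Trailing zero count: number of right shifts until the low bit is set.
    public static int CountTrailingZeroes(ulong value)
    {
        if (value == 0) return 64;
        int count = 0;
        while ((value & 1) == 0)
        {
            value >>= 1;
            count++;
        }
        return count;
    }
}
```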