Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add a BitManipulation class #18876

Closed
mburbea opened this issue Oct 6, 2016 · 101 comments
Closed

Proposal: Add a BitManipulation class #18876

mburbea opened this issue Oct 6, 2016 · 101 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Numerics
Milestone

Comments

@mburbea
Copy link

mburbea commented Oct 6, 2016

I propose adding a new class to CoreFx, which will contain many common bit manipulation techniques.
Part of having a class like this is that the Jit can target these methods class for intrinsics, similar to what was done for Vector<T>.
There is still some open discussion as to what the shape of this API should be, and even the class name.

Class Name

There are several class names being thrown around and none are particularly winning everyone over.

  • Bits
  • BitOperations or BitOps
  • BitUtility
  • BitTwiddler
  • BitView (see the next section)

Two Classes?

@benaadams, correctly notes that there seems to be two different APIs here.
A low level view allowing you to manipulate a numeric type to extract or inject a bit (like a bit vector), byte, short, or int value.
These methods are the equivalent of the following but safer (and possibly faster)

public static unsafe int ExtractInt(ulong u,int pos)=>((int*)&u)[pos];
public static unsafe ulong InjectInt(ulong u, int pos, int toSet) {
    ((int*)&u)[pos] = toSet;
     return u;
}

And another set of utility exposing methods for treating an integer register type like a unit of data allowing you to manipulate it or introspect it.

Does it make sense to keep these APIs in one class?

Method Names

Another point of contention. What should we call these methods? For the "view" members, there is some dislike of the naming convention of Get/Set even though there is prior art like BitVector32 & SqlDataRecord. Everyone seems to like Read/Write more, but while Read is fine. Write isn't neccessarily the operation being done. I'm still looking for some verbiage to note that this really takes a value, modifies it a bit (no pun intended), and spit out a new one.

PopCount/HammingWeight/ CountSetBits - We can't decide on this name. I personally like PopCount as it is a well known name for the algorithm. However, for someone who does not CPU intrinsics or bit tricks this name might mean nothing to you. the .net api is split where common or well known algorithms are simply called that (e.g. Rijndael) and sometimes descriptive for new users. I think that this class in general is fairly low-level so even a novice should be expected to do a quick google search in the subject area.

And even naming the methods as actions (e.g. CountTrailingZeroes) or properties (e.g. TrailingZeroesCount).

Hardware Acceleration / Detection

@benaadams, suggested adding a simple flag to determine if the class has hardware acceleration. I personally suggest going a step further and adding an enum to describe which methods are accelerated (not in the PoC).
Unfortunately, the enum based approach does raise the question if the jit could do branch elimination on a comparison like the following if it could replace AcceleratedOperations as a runtime constant. (unknown)

// 
if (Bits.AcceleratedOperations == BitOps.PopCount){
   // use popcount
}
else{
  // work around it?
}

I admittedly question what you would do differently if it isn't accelerated. This isn't like Vector<T> where you might switch to a different approach and use ulong like smaller vectors. The methods should be pretty good solutions to these problems and outside of switching to some native code I don't see us doing better.
Methods that could be accelerated (in AMD64 at least)::
PopCount => POPCNT
RotateRight => ROR (already accelerated!)
RotateLeft => ROL (already accelerated!)
CountLeadingZeroes => LZCNT (in ABM compliant hardware) or (BSR followed by xor 63)
CountTrailingZeroes => TZCNT / BSF
ReadBit => BT
WriteBit => BTS/BTC (maybe?)
ReadByte/ReadInt16/ReadInt32 => BEXTR (possibly)

Updated spec:

public static class Bits{
       public static int PopCount(ulong value);
       public static int PopCount(long value);
       public static int PopCount(uint value);
       public static int PopCount(int value);

       public static ulong RotateRight(ulong value, int offset);
       public static uint  RotateRight(uint value, int offset);

       public static ulong RotateLeft(ulong value, int offset);
       public static uint  RotateLeft(uint value, int offset);

       public static int CountLeadingZeros(ulong value);
       public static int CountLeadingZeros(long value);
       public static int CountLeadingZeros(uint value);
       public static int CountLeadingZeros(int value);

       public static int CountTrailingZeroes(ulong value);
       public static int CountTrailingZeroes(long value);
       public static int CountTrailingZeroes(uint value);
       public static int CountTrailingZeroes(int value);

       public static  bool ReadBit(ulong value, int offset);
       public static  bool ReadBit(long value, int offset);
       public static  bool ReadBit(uint value, int offset);
       public static  bool ReadBit(int value, int offset);

       public static ulong WriteBit(ulong value, int offset, bool toSet);
       public static long WriteBit(long value, int offset, bool toSet);
       public static uint WriteBit(uint value, int offset, bool toSet);
       public static int WriteBit(int value, int offset, bool toSet);

       public static byte ReadByte(ulong value, int offset);
       public static byte ReadByte(long value, int offset);
       public static byte ReadByte(uint value, int offset);
       public static byte ReadByte(int value, int offset);

       public static short ReadInt16(ulong value, int offset);
       public static short ReadInt16(long value, int offset);
       public static short ReadInt16(uint value, int offset);
       public static short ReadInt16(int value, int offset);

       public static int ReadInt32(ulong value,  int offset);
       public static int ReadInt32(long value,  int offset);

       public static ulong WriteByte(ulong value, int offset, byte toSet);
       public static long WriteByte(long value, int offset, byte toSet);
       public static uint WriteByte(uint value, int offset, byte toSet);
       public static int WriteByte(int value, int offset, byte toSet);

       public static ulong WriteInt16(ulong value, int offset, short toSet);
       public static long WriteInt16(long value, int offset, short toSet);
       public static uint WriteInt16(uint value, int offset, short toSet);
       public static int WriteInt16(int value, int offset, short toSet);

       public static ulong WriteInt32(ulong value, int position, int toSet);
       public static long WriteInt32(long value, int position, int toSet);
}

Edit: POC class
https://gist.github.com/mburbea/c9a71ac1b1a25762c38c9fee7de0ddc2

More updates! Removed signed rotate operators.

@svick
Copy link
Contributor

svick commented Oct 6, 2016

public int IsBitSet(long value, int position);

Should this return bool instead of int?

@mellinoe
Copy link
Contributor

mellinoe commented Oct 6, 2016

@nguerrera I recall you had mentioned some internal helpers that Roslyn was using that were related to this sort of functionality. Any input here?

@nguerrera
Copy link
Contributor

CountBits is the main one. Search for BitArithmetic in Roslyn and corefx. I've long wanted a nice class to hide the scariness of https://graphics.stanford.edu/~seander/bithacks.html

@mburbea
Copy link
Author

mburbea commented Oct 7, 2016

Yes that should return bool. If this seems like something that would be liked I'd happily submit a pr for little endian architecture.

@Tornhoof
Copy link
Contributor

Tornhoof commented Oct 7, 2016

This issue, at least partially, overlaps with https://github.com/dotnet/coreclr/issues/6906

@SunnyWar
Copy link
Contributor

SunnyWar commented Oct 7, 2016

Shouldn't the methods be declared as static?
And these have variable name conflicts and a missing comma on SetInt16.

public void SetBit(long value, int position, bool value); public void SetByte(long value, int position, byte value); public void SetInt16(long value, int position. short value); public void SetInt32(long value, int position, int value);

corrected
public static void SetBit(long value, int position, bool to); public static void SetByte(long value, int position, byte to); public static void SetInt16(long value, int position, short to); public static void SetInt32(long value, int position, int to);

All and all I like the idea of a central location for common bit manipulating functions.

@mburbea
Copy link
Author

mburbea commented Oct 7, 2016

Yes, I fixed the error thanks @SunnyWar .
@Tornhoof ,Yes it does, for some of the intrinsics.

@SunnyWar
Copy link
Contributor

SunnyWar commented Oct 7, 2016

@mburbea All the Set methods need a return value or the value as an out param.

@jamesqo
Copy link
Contributor

jamesqo commented Oct 9, 2016

Very, very much +1 for this idea. Could contain other useful methods such as

public int CountBits(int value);
public int RoundUpToPowerOfTwo(int value);
public int IsPowerOfTwo(int value);

to prevent people from having their own hand-rolled, inefficient implementations.

IMO, we may also want to name it something like BitUtility instead of BitManipulation. The latter feels a bit verbose, and is more prone to typos.

@mburbea
Copy link
Author

mburbea commented Oct 9, 2016

Isn't CountBits just pop count?

@jamesqo
Copy link
Contributor

jamesqo commented Oct 9, 2016

@mburbea Yes, indeed. I believe the JIT does not currently generate that instruction yet though (related issue: #14781) because the code pattern is too hard to recognize, so this might be a good entry point to expose it.

@mburbea
Copy link
Author

mburbea commented Oct 10, 2016

Yeah, @jamesqo, all I was saying that the method was covered. I wouldn't be opposed to calling it something else, I went with the name because it was suggested by @mellinoe .
I think these are good additions and once we have some of these intrinsics could be implemented quite easily.

public static long IsPowerOfTwo(long value)=> x == (x&-x) && x!=0;
public static long RoundUpToPowerOfTwo(long value)=> 1 << (64 -CountLeadingZeroes(value - 1));

@jamesqo
Copy link
Contributor

jamesqo commented Oct 10, 2016

@mburbea

Yeah, @jamesqo, all I was saying that the method was covered. I wouldn't be opposed to calling it something else, I went with the name because it was suggested by @mellinoe .

Ah, ok, I see.

Question-- do you think it's necessary to have methods like IsBitSet, GetByte, SetByte, etc? I mean most people with basic knowledge of bit manipulation would know they could just write (value & (1 << position)) != 0 instead of IsBitSet, (value & 0xFF00) >> 8 instead of GetByte(value, 1), etc. I think the most useful methods in this class would be the ones that aren't very obvious to implement.

Also, there is somewhat of an ambiguity-- people don't know if the position parameter starts from the msb or lsb.

@svick
Copy link
Contributor

svick commented Oct 10, 2016

@jamesqo I think it's easier to read a single method call with a descriptive name than an expression with three operators and two magic constants (even if they are just 0 and 1). Especially when it comes to negation, I think it's relatively easy to confuse (value & (1 << position)) != 0 and (value & (1 << position)) == 0, but you aren't likely to confuse IsBitSet(value, position) with !IsBitSet(value, position).

You are free to keep writing it the way you're used to, but I think having those methods in the base library as an option would be useful.

@mburbea
Copy link
Author

mburbea commented Oct 10, 2016

I'm not opposed to removing them if they're considered a hindrance to api acceptance, but I like having convience methods. None of those methods are hard to write, but like most C# developers most of my code is higher level, so I rarely twiddle bits. When I need to, I often end up googling these things, so centralizing them made sense to me.

@redknightlois
Copy link

@mburbea @jamesqo I write a lot of bit twiddling code and still would appreciate to be able to get a proper API I could reason about instead of having my entire codebase filled with magic constants, shifts and binary operations. Always as long as the actual code generated is as efficient (guaranteed) as the code generated using them.

@jamesqo
Copy link
Contributor

jamesqo commented Oct 10, 2016

@redknightlois Hmm, ok. I guess popular vote wins since everyone seems to want those methods 😄 There is still the issue of ambiguity whether position starts from the msb or lsb, though.

@mburbea
Copy link
Author

mburbea commented Oct 10, 2016

That might be a question of endianness (as in big endian long, the msb is where the 0th bit is stored and the lsb is where the 63rd bit is stored, but in little endian it's reversed),

Intuitively, I feel its implied that its in sequential order for your hardware. e.g. SetBit(0, 0, true) should return the number 1 regardless of endian order.

@benaadams
Copy link
Member

@mellinoe @CarolEidt https://arxiv.org/abs/1611.07612 faster popcnt than popcnt using AVX2
Apache License 2.0 source https://github.com/CountOnes/hamming_weight

@mburbea
Copy link
Author

mburbea commented Dec 5, 2016

@mellinoe , @CarolEidt What else needs to be resolved for this API proposal to move forward?

@mellinoe
Copy link
Contributor

mellinoe commented Dec 6, 2016

@mburbea I think that this is in a good state and should be ready for our meeting next week.

For our reviewing, I think having a real implementation for all of these methods would actually be very helpful. A big part of the benefit of these API's is that they will hide the "scariness" of complicated bit manipulation from the user. It would be good if we could see just how "scary" the code is on a case-by-case basis. For example, it's not clear that the code for SetBit (above) is scary enough to justify inclusion, but I would bet that the code for PopulationCount definitely is. Having the code ready to look at would be very helpful for our review.

@mellinoe
Copy link
Contributor

mellinoe commented Dec 6, 2016

Additionally, can you update the original post with your finalized proposal, including any extra methods that were suggested that you believe are valuable?

@tannergooding
Copy link
Member

@mellinoe, would these methods have intrinsic support (many of these functions have CPU level instructions that are useable/preferred in some scenarios).

@mellinoe
Copy link
Contributor

mellinoe commented Dec 7, 2016

@mellinoe, would these methods have intrinsic support (many of these functions have CPU level instructions that are useable/preferred in some scenarios).

It's something to consider as an optimization, but I don't think it's anything we need to block the proposal or an implementation on.

@jamesqo
Copy link
Contributor

jamesqo commented Dec 7, 2016

Are we still considering different names for these methods? IMO, in their current form they are kinda verbose-- for example, BitManipulation.PopulationCount is pretty long to type, and its meaning is not immediately obvious to non-assembly programmers. (It may not even be implemented with popcnt.)

I think we may want to rename some of the methods, e.g.:

  • PopulationCount => CountBits
  • CircularShiftLeft => RotateLeft
  • CircularShiftRight => RotateRight

We can also consider another name for the class, such as BitTwiddler. Compare the number of characters:

BitTwiddler.RotateLeft(i, 5);
BitManipulation.CircularShiftLeft(i, 5);

edit: In addition, GetInt32 / SetInt32 and family sound a bit too Java-like for me. (Set also implies that something is being mutated, rather than a new value being returned.) Not sure what would be a good rename, though.

@mburbea
Copy link
Author

mburbea commented Dec 7, 2016

IMO, C# has always been a bit more on the verbose side. I felt that was intentional as code is more often read than written. The longer names have always been explained away with the fact that most people are writing C# using an IDE which offers Intelisense.

  • I am okay with renaming the class to maybe something like BitUtility.
  • I'll rename CircularShift*=> Rotate*.
  • PopulationCount: hows about PopCount? It is a well known name for this operation, and is shorter than CountBits.

SqlDataRecord has both a GetInt32 and SetInt32, so the names are not exactly unique to Java.
As for the mutation question, would you prefer that the method takes a ref parameter and modify their input?
Edit: Any suggestion for namespace? System.Numerics?

@redknightlois
Copy link

We use Bits and Sparrow.Binary as a namespace to put all that stuff and in practice it reads quite well. https://github.com/ravendb/ravendb/blob/v4.0/src/Sparrow/Binary/Bits.cs (Sparrow is our fast library name, call that System.Binary here instead).

PopCount is a very well known name, the same with RotateLeft32, RotateRight32, RotateLeft64, RotateRight64

@svick
Copy link
Contributor

svick commented Dec 7, 2016

I wonder if this is one of the types that is likely going to be used with using static. If that's the case, the length of the class name does not matter much, since it's going to appear only once per file.

@jamesqo
Copy link
Contributor

jamesqo commented Dec 7, 2016

@mburbea

IMO, C# has always been a bit more on the verbose side. I felt that was intentional as code is more often read than written.

Verbose does not always equal more readable-- the meaning that's conveyed is what's important. CircularShiftLeft does not tell the reader more information than RotateLeft.

I am okay with renaming the class to maybe something like BitUtility.

I'll rename CircularShift*=> Rotate*.

😄 🎉 We should consider both BitUtility (in tandem with WebUtility) and BitTwiddling for the name.

PopulationCount: hows about PopCount? It is a well known name for this operation, and is shorter than CountBits.

IMO, I think people who don't have a background in assembly deciphering what PopCount means without Googling it. CountBits is longer, but only by 1 character. I think the API reviewers should consider both.

SqlDataRecord has both a GetInt32 and SetInt32, so the names are not exactly unique to Java.

Yes, but the point is Get / Set kind of goes against the grain-- .NET has properties, so get/set-naming isn't very common in code. The example you pointed out is an object which mutates state, which is different from the scenario we're dealing with currently.

As for the mutation question, would you prefer that the method takes a ref parameter and modify their input?

ref parameters are not very flexible and do not work well when you want to do multiple operations on the int. I am OK with the returning-a-new-value-each-time scheme, I'm just trying to think of a better name.

Any suggestion for namespace? System.Numerics?

BitConverter is already in System, and it would be inconvenient if someone had to add another using every time they wanted to do bit twiddling. I think it should be System.

@jamesqo
Copy link
Contributor

jamesqo commented Dec 7, 2016

@svick

I wonder if this is one of the types that is likely going to be used with using static. If that's the case, the length of the class name does not matter much, since it's going to appear only once per file.

I think we should make it so people don't have to add using static in the first place. I'm not sure what you mean by "one of the types that is likely going to be used with using static;" I rarely use it because it can be very ambiguous to the reader.

@tannergooding
Copy link
Member

@CarolEidt, their activity log doesn't show them to be very active. I may just create a new issue to track this if it doesn't get a response soon.

@mburbea
Copy link
Author

mburbea commented Jan 16, 2018

@tannergooding , my current role doesn't allow me to participate too much on github :( If you'd like to create a new issue on this, by all means. I think my initial API & PoC are pretty close, but obviously deficient.

I mostly agree with your points but I do feel that an enum for acceleration check would be a good idea though, software workarounds could be brutal performance hit.

@grant-d
Copy link
Contributor

grant-d commented Sep 11, 2018

I am happy to take this one, but I propose an incremental approach that starts with a simpler API and takes advantage of intrinsics but is software emulated. In later PR, we can add more functions and specific hardware optimizations:

public static partial class BitOps // .Primitive
{
       /// <summary>
       /// Reads whether the specified bit in a mask is set.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       /// <param name="offset">The ordinal position of the bit to read.</param>
       public static bool ExtractBit(in byte value, in byte offset);
       public static bool ExtractBit(in ushort value, in byte offset);
       public static bool ExtractBit(in uint value, in byte offset);
       public static bool ExtractBit(in ulong value, in byte offset);

       /// <summary>
       /// Sets the specified bit in a mask and returns whether it was originally set,
       /// like BTS/BTR.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       /// <param name="offset">The ordinal position of the bit to write.</param>
       /// <param name="on">True to set the bit to 1, or false to set it to 0.</param>
       public static bool InsertBit(ref uint value, in byte offset, in bool on);
       public static bool InsertBit(ref int value, in byte offset, in bool on);
       public static bool InsertBit(ref ulong value, in byte offset, in bool on);
       public static bool InsertBit(ref long value, in byte offset, in bool on);

       /// <summary>
       /// Negates the specified bit in a mask and returns whether it was originally set,
       /// like BTC.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       /// <param name="offset">The ordinal position of the bit to flip.</param>
       public static bool FlipBit(ref byte value, in byte offset);
       public static bool FlipBit(ref ushort value, in byte offset);
       public static bool FlipBit(ref uint value, in byte offset);
       public static bool FlipBit(ref ulong value, in byte offset);

       /// <summary>
       /// Rotates the specified mask left by the specified number of bits.
       /// </summary>
       /// <param name="value">The value to rotate.</param>
       /// <param name="offset">The number of bits to rotate by.</param>
       /// <returns>The rotated value.</returns>
       public static byte RotateLeft(in byte value, in byte offset);
       public static byte RotateRight(in byte value, in byte offset);
       public static ushort RotateLeft(in ushort value, in byte offset);
       public static ushort RotateRight(in ushort value, in byte offset);

       // Takes advantage of existing intrinsics (https://github.com/dotnet/coreclr/pull/1830)
       public static uint RotateLeft(in uint value, in byte offset);
       public static uint RotateRight(in uint value, in byte offset);
       public static ulong RotateLeft(in ulong value, in byte offset);
       public static ulong RotateRight(in ulong value, in byte offset);

       /// <summary>
       /// Returns the population count (number of bits set) in a mask.
       /// </summary>
       /// <param name="value">The bit mask.</param>
       public static int PopCount(in byte value);
       public static int PopCount(in ushort value);
       public static int PopCount(in uint value);
       public static int PopCount(in ulong value);

       /// <summary>
       /// Count the number of leading bits in a mask.
       /// </summary>
       /// <param name="value">The mask.</param>
       /// <param name="on">True to count each 1, or false to count each 0.</param>
       public static int LeadingCount(in byte value, in bool on);
       public static int LeadingCount(in ushort value, in bool on);
       public static int LeadingCount(in uint value, in bool on);
       public static int LeadingCount(in ulong value, in bool on);

       /// <summary>
       /// Count the number of trailing bits in a mask.
       /// </summary>
       /// <param name="value">The mask.</param>
       /// <param name="on">True to count each 1, or false to count each 0.</param>
       public static int TrailingCount(in byte value, in bool on);
       public static int TrailingCount(in ushort value, in bool on);
       public static int TrailingCount(in uint value, in bool on);
       public static int TrailingCount(in ulong value, in bool on);

       // In a separate PR
       // Read/Write:byte/ushort/uint
       // Span<byte> overloads
       // FloorLog2
       // Other...
}

I will create a formal API proposal if there's any interest in doing this.
PoC code here

@redknightlois
Copy link

PopCount, LeadingOnes, LeadingZeroes, TrailingOnes, TrailingZeroes, etc could use a Span interface. In that way the API would add something not readily available on intrinsics interface

@grant-d
Copy link
Contributor

grant-d commented Sep 11, 2018

I am keeping this design minimal so that we can get the basic functionality out the door. Span overloads are a good idea and (per comments in code) planned for the next iteration.

@tannergooding
Copy link
Member

@grant-d, creating a formal API proposal would be great. It slipped off my radar a while back and I never got around to it.

It would likely be useful to incorporate the feedback @CarolEidt and I gave here (and a couple posts down): https://github.com/dotnet/corefx/issues/12425#issuecomment-356079499

I think the primary feedback was: BitOps for the class name and Insert/Extract rather than Write/Read.

@tannergooding
Copy link
Member

@redknightlois, the point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable.

Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.

@KrzysztofCwalina
Copy link
Member

Why would Span overloads prevent getting this "out of the door"? I think at absolute minimum we should design APIs as we want them long term, review the full design, and then possibly implement in stages.

@tannergooding
Copy link
Member

I honestly don't see the use-case for the Span APIs here. Could someone elaborate on why you would want them?

@grant-d
Copy link
Contributor

grant-d commented Sep 11, 2018

I will create a formal PR. In the meantime, I have updated the class above with @tannergooding / @CarolEidt feedback (and had already incorporated previous feedback re: BTR/BTS/BTC).

@redknightlois
Copy link

@tannergooding For example, whenever you want to use Popcount is for data structures which are compressed in bitstreams (the same apply for LeadingOnes, TrailingZeroes and the like) which means that you are not doing it on a long, but on a variable size byte stream.

Furthermore, optimized Popcount is not as simple as apply popcount and sum the results, in fact depending on the architecture different implementations may be exercised for those operations. Depending on the bitstream size it would make sense to prime the L2 cache, etc... If size is big enough, the Avx2 based Harley Seal algorithm have 50% memory throughput than using the built-in version because of higher instruction parallelism, etc...

So, it makes sense that all of that should be abstracted away given that those primitives are quintessential to high performance compressed data structures and many other performance sensitive operations. Not including the batch versions while easier could counter the actual gains that could be obtained in code simplification on its intended usage scenario.

@tannergooding
Copy link
Member

Furthermore, optimized Popcount is not as simple as apply popcount and sum the results, in fact depending on the architecture different implementations may be exercised for those operations.

This seems like a case of complicating the implementation/exposed APIs for something that may be very use-case specific and which may be better suited for exposure elsewhere.

@KrzysztofCwalina
Copy link
Member

How does it complicate APIs? I do agree it's more complex implementation, but we don't split logical overloads into separate classes/features just because implementation of some of the overloads is more tricky.

@tannergooding
Copy link
Member

tannergooding commented Sep 11, 2018

How does it complicate APIs?

Because, PopCount(uint) is a simple, general-purpose API that everyone can use and which is generally useful. It can also be used to build up more complex implementations. (and for which we have known use-cases today, even in the framework/runtime).

PopCount(Span<uint>) is a more complex API that, for the naive implementation, is just a for loop and sum (which is trivial to implement), but for an "optimized" implementation may have a lot of fine-tuning for your exact target/scenario.

I would think that PopCount(Span<uint>) is something better suited to a 3rd party library, or which should be reviewed separately if enough evidence indicating it was needed came up. The way I see it is that you are basically talking about the difference between supporting primitive types and adding APIs which support the concept of a BitVector type.

@redknightlois
Copy link

redknightlois commented Sep 11, 2018

Because, PopCount(uint) is a simple, general-purpose API that everyone can use and which is generally useful. It can also be used to build up more complex implementations. (and for which we have known use-cases today, even in the framework/runtime).

Indeed PopCount is a simple primitive that just happen to have a single instruction to act on registers on harware, but the operation is defined over every bitstream of data. Say for simplication a Span<byte> to avoid having to introduce an entirely unrelated concept like BitVector.

PopCount(Span<uint>) is a more complex API that, for the naive implementation, is just a for loop and sum (which is trivial to implement), but for an "optimized" implementation may have a lot of fine-tuning for your exact target/scenario.

That's exactly what a pit of success looks like IMHO. A sensible API should give you the latter by default and not expect the user to have to deal with that, until there is no other choice.

@tannergooding
Copy link
Member

That's exactly what a pit of success looks like IMHO. A sensible API should give you the latter by default and not expect the user to have to deal with that, until there is no other choice.

Right. But is PopCount(Span<uint>) (and similar overloads for the other BitOps methods) a common enough use-case that it warrants being part of the BCL?

For PopCount(uint) (and most of the other methods listed above), we already know they are useful. Not only have we had a number of requests for this, but we need it ourselves for various parts of the framework/runtime. However, it is still a "niche" scenario and generally targets people wanting or needing to write performance oriented code.

PopCount(Span<>) is likely even more esoteric; potentially to the point where implementing it as part of the BCL is not worthwhile.

There are many algorithms which can benefit from PopCount, but, for most inputs; it may be more desirable to intersperse the PopCount with other instructions (for better throughput, overall) or to just do a trivial for loop (because the dataset is small enough).

So, personally, I believe PopCount(Span<>) falls into the category of: "I want to see more data to support this first"

@grant-d
Copy link
Contributor

grant-d commented Sep 11, 2018

Agree with @tannergooding. I don't deny that there might be a need for Span et al, but my aim is to create a highly scoped feature set (just the primitives) in order to get it out of limbo. We can create an independent issue to track more advanced scenarios, I am more than happy to help on that too.
(I do agree there's a risk in that approach in that a staggered api might be inconsistent, we'll need diligence)

@redknightlois
Copy link

redknightlois commented Sep 11, 2018

Where I come from popcount over higher than 64 bits is hardly esoteric, it is the base operation. I don't care about PopCount anymore now that I can have the intrinsic because I built the Span one myself (with AVX2). But, I know the pitfalls, I am being paid to know exactly that. Most of that knowledge is akin to dark arts in most development circles, having a single operation aka PopCount(Span<byte>) makes yet another usual pitfall not matter for the 99%.

So, personally, I believe PopCount(Span<>) falls into the category of: "I want to see more data to support this first"

I usually tend to agree, but having seen PopCount, TrailingZeroes, etc routines crop up in several places with absurdly suboptimal implementations over the years just because 'we want to see more data' makes me a bit hesitant on this front.

EDIT: And the problem is that the for and sum is one of those cases. There is 60% difference between the two in GB/sec.

Case in point: https://github.com/dotnet/corefx/issues/2209#issuecomment-139636114

@grant-d
Copy link
Contributor

grant-d commented Sep 11, 2018

@redknightlois trying to understand the requirement: would RotateLeft(Span<byte> span, int n) rotate just the specified byte or would it rotate the whole array by n bits?

@redknightlois
Copy link

redknightlois commented Sep 11, 2018

Rotate is too implementation dependent, I personally haven't seen yet any use for that one on bitstreams though. I am talking about PopCount, Trailing\Leading|Ones\Zeroes which are typical operations. We could consider say Extract\Flip\Insert because they are easy to implement but they don't have obvious pitfalls that I have seen.

EDIT: If say RotateLeft is implemented on a Span (I probably would not implement it) the whole Span should be considered an atom, so rotate should rotate the whole thing left to be consistent.

@CarolEidt
Copy link
Contributor

I have to agree with the position that having a Span<uint> overload makes a lot of sense. I would wager that the number of places where a simple uint version would be used without it being a building block for a larger dense bitvector would be a very small percentage. Perhaps I'm looking at it too simplistically, but I can't see that it would be overly complex to support.

@grant-d
Copy link
Contributor

grant-d commented Sep 11, 2018

Here's a stab at the Span<T> signatures, should we decide it is part of the initial design.

    public static partial class BitOps // .Span
    {
        bool ExtractBit(ReadOnlySpan<byte> value, in uint offset);
        bool ExtractBit(ReadOnlySpan<uint> value, in uint offset);
        bool ExtractBit(ReadOnlySpan<ulong> value, in uint offset);
        bool ExtractBit(ReadOnlySpan<ushort> value, in uint offset);

        bool FlipBit(Span<byte> value, in uint offset);
        bool FlipBit(Span<uint> value, in uint offset);
        bool FlipBit(Span<ulong> value, in uint offset);
        bool FlipBit(Span<ushort> value, in uint offset);

        bool InsertBit(Span<byte> value, in uint offset, in bool on);
        bool InsertBit(Span<uint> value, in uint offset, in bool on);
        bool InsertBit(Span<ulong> value, in uint offset, in bool on);
        bool InsertBit(Span<ushort> value, in uint offset, in bool on);

        long PopCount(ReadOnlySpan<byte> value);
        long PopCount(ReadOnlySpan<uint> value);
        long PopCount(ReadOnlySpan<ulong> value);
        long PopCount(ReadOnlySpan<ushort> value);

        long LeadingCount(ReadOnlySpan<byte> value, in bool on);
        long LeadingCount(ReadOnlySpan<uint> value, in bool on);
        long LeadingCount(ReadOnlySpan<ulong> value, in bool on);
        long LeadingCount(ReadOnlySpan<ushort> value, in bool on);

        long TrailingCount(ReadOnlySpan<byte> value, in bool on);
        long TrailingCount(ReadOnlySpan<uint> value, in bool on);
        long TrailingCount(ReadOnlySpan<ulong> value, in bool on);
        long TrailingCount(ReadOnlySpan<ushort> value, in bool on);
    }

@tannergooding
Copy link
Member

You shouldn't need in or ref for any of the Span<T> signatures.

@grant-d
Copy link
Contributor

grant-d commented Sep 12, 2018

I like the self-documenting nature of them, even if in the case of Span they are redundant. I have removed them for consistency's sake.

@grant-d
Copy link
Contributor

grant-d commented Sep 13, 2018

Formal proposal here: https://github.com/dotnet/corefx/issues/32269

@tannergooding
Copy link
Member

Closing this in favor of https://github.com/dotnet/corefx/issues/32269

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Numerics
Projects
None yet
Development

No branches or pull requests