Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sdk] Consider supporting popcnt, ctz & clz #52673

Open
modulovalue opened this issue Jun 11, 2023 · 5 comments
Open

[sdk] Consider supporting popcnt, ctz & clz #52673

modulovalue opened this issue Jun 11, 2023 · 5 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-core type-enhancement A request for a change that isn't a bug

Comments

@modulovalue
Copy link
Contributor

Dart has no support for several operations for which efficient instructions appear to exist on most supported platforms.

The three that this issue focuses on are:

  • popcount: returns the number of 1s.
  • ctz: returns the number of trailing zeros.
  • clz: returns the number of leading zeros.

It is not uncommon to support these operations in standard libraries. GHC and C# are examples where that is the case.

For reference:

  • WASM supports all of them (popcnt, ctz, clz).
  • CRoaring contains a utility library for portable implementations of popcount, ctz and clz on native platforms.

Proposal

Add support for popcount, ctz & clz that exposes some optimal implementation on each supported platform.


Cf. #10212
Cf. #5798

@mit-mit
Copy link
Member

mit-mit commented Jun 12, 2023

cc @mraleph @lrhn

@mit-mit mit-mit added the area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. label Jun 12, 2023
@lrhn
Copy link
Member

lrhn commented Jun 12, 2023

One isssue with adding these to int is that they're not directly supported on the web. It has Math.clz32 (the "32" is fine, they'll be 32-bit operations like all other bit-operations on the web), but not Math.ctz or Math.popcount.

We can do ctz as

function ctz(n) {
  n |= 0;
  if (n == 0) return 32;
  return 31 - Math.clz32(n ^ (n - 1));
}

in JavaScript, and popcount probably also 32-bit in some fairly efficient way, but that's still not "one opcode".

It just doesn't feel like it really belongs on Dart int, which is (on the web) not really a normal n-bit two's complement number. But the other bit-operations are there, so that's probably a lost cause.

I'm also not sure ARM has a popcnt instruction. like intel x86/AMD x64, so it's not a single operation there either.
(There's efficient code, maybe even short enough to inline, but not one instruction).

Since there'll be a difference between native (64-bit operatons) and web (32-bit operations with truncation), it's likely to be operations that are used in platform specific code in any case. But so is int.operator~.

We also already have bitLength, which is the same as "index of highest one bit" for positive integers, which is closely related to clz for positive numbers. The clz is pretty boring for negative numbers, because it's consistently zero.

Anyway, it's a possible addition. Even reasonable. I hate the names, but they are historical.

abstract class int {
   // ...

   /// Number of leading (most significant) zero-digits of the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// The leading zero-bit count is always zero if the integer is negative.
   /// Otherwise, the leading zero bit count is the count of leading zeros
   /// the integer would have in base 2, if padded to the size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   int get leadingZeroBitCount;

   /// Number of trailing (least significant) zero-digits of the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// The trailing zero-bit count is position of the least signficant 
   /// 1-bit in the binary representation of the integer.
   /// If the integer is zero, the value is the size size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   int get trailingZeroBitCount;

   /// The count of non-zero bits in the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// the non-zero-bit count (aka one-bit count) is the number of "1" digits
   /// in the binary representation of that integer. A negative integer has
   /// one-digits up to the size size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   /// The value of `n.nonZeroBitCount + (~n).nonZeroBitCount` is always the
   /// size the platform uses for bit operations (on the web, at least if the
   /// value starts out as a 32-bit integer).
   int get nonZeroBitCount;
}

@lrhn lrhn added library-core type-enhancement A request for a change that isn't a bug labels Jun 12, 2023
@modulovalue
Copy link
Contributor Author

I'm also not sure ARM has a popcnt instruction

Unless I misunderstood you, this article claims that popcount has existed in the ARM NEON extension since 2005 under VCNT. Pure ARM appears to just have a byte-based popcount instruction CNT

One isssue with adding these to int is that they're not directly supported on the web.

Maybe it would be useful to have something like a dart:native system library that is only supported by the wasm and vm compilation targets? I think it would be really unfortunate if users who need better performance would have to be limited by the quirks of js.

@lrhn
Copy link
Member

lrhn commented Jun 13, 2023

As I read the ARM documentation (not an ARM expert!), I think the NEON VCNT instructions is also a per-byte SIMD instruction. It's probably a good first instruction to using a 64-bit popcount, then the rest can be done by a few shift+add operations, without needing extra masks.

@mraleph
Copy link
Member

mraleph commented Jun 13, 2023

the rest can be done by a few shift+add operations, without needing extra masks.

I have checked what other compilers generate and learned that vcnt + uaddlv does the trick in two instructions (plus you need two moves in and out of vector registers)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-core type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

4 participants