[sdk] Consider supporting popcnt, ctz & clz #52673

modulovalue · 2023-06-11T13:29:16Z

Dart has no support for several operations for which efficient instructions appear to exist on most supported platforms.

The three that this issue focuses on are:

popcount: returns the number of 1s.
ctz: returns the number of trailing zeros.
clz: returns the number of leading zeros.

It is not uncommon to support these operations in standard libraries. GHC and C# are examples where that is the case.

For reference:

WASM supports all of them (popcnt, ctz, clz).
CRoaring contains a utility library for portable implementations of popcount, ctz and clz on native platforms.

Proposal

Add support for popcount, ctz & clz that exposes some optimal implementation on each supported platform.

Cf. #10212
Cf. #5798

mit-mit · 2023-06-12T12:01:42Z

cc @mraleph @lrhn

lrhn · 2023-06-12T12:55:59Z

One isssue with adding these to int is that they're not directly supported on the web. It has Math.clz32 (the "32" is fine, they'll be 32-bit operations like all other bit-operations on the web), but not Math.ctz or Math.popcount.

We can do ctz as

function ctz(n) {
  n |= 0;
  if (n == 0) return 32;
  return 31 - Math.clz32(n ^ (n - 1));
}

in JavaScript, and popcount probably also 32-bit in some fairly efficient way, but that's still not "one opcode".

It just doesn't feel like it really belongs on Dart int, which is (on the web) not really a normal n-bit two's complement number. But the other bit-operations are there, so that's probably a lost cause.

I'm also not sure ARM has a popcnt instruction. like intel x86/AMD x64, so it's not a single operation there either.
(There's efficient code, maybe even short enough to inline, but not one instruction).

Since there'll be a difference between native (64-bit operatons) and web (32-bit operations with truncation), it's likely to be operations that are used in platform specific code in any case. But so is int.operator~.

We also already have bitLength, which is the same as "index of highest one bit" for positive integers, which is closely related to clz for positive numbers. The clz is pretty boring for negative numbers, because it's consistently zero.

Anyway, it's a possible addition. Even reasonable. I hate the names, but they are historical.

abstract class int {
   // ...

   /// Number of leading (most significant) zero-digits of the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// The leading zero-bit count is always zero if the integer is negative.
   /// Otherwise, the leading zero bit count is the count of leading zeros
   /// the integer would have in base 2, if padded to the size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   int get leadingZeroBitCount;

   /// Number of trailing (least significant) zero-digits of the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// The trailing zero-bit count is position of the least signficant 
   /// 1-bit in the binary representation of the integer.
   /// If the integer is zero, the value is the size size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   int get trailingZeroBitCount;

   /// The count of non-zero bits in the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// the non-zero-bit count (aka one-bit count) is the number of "1" digits
   /// in the binary representation of that integer. A negative integer has
   /// one-digits up to the size size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   /// The value of `n.nonZeroBitCount + (~n).nonZeroBitCount` is always the
   /// size the platform uses for bit operations (on the web, at least if the
   /// value starts out as a 32-bit integer).
   int get nonZeroBitCount;
}

modulovalue · 2023-06-12T16:43:24Z

I'm also not sure ARM has a popcnt instruction

Unless I misunderstood you, this article claims that popcount has existed in the ARM NEON extension since 2005 under VCNT. Pure ARM appears to just have a byte-based popcount instruction CNT

One isssue with adding these to int is that they're not directly supported on the web.

Maybe it would be useful to have something like a dart:native system library that is only supported by the wasm and vm compilation targets? I think it would be really unfortunate if users who need better performance would have to be limited by the quirks of js.

lrhn · 2023-06-13T10:10:57Z

As I read the ARM documentation (not an ARM expert!), I think the NEON VCNT instructions is also a per-byte SIMD instruction. It's probably a good first instruction to using a 64-bit popcount, then the rest can be done by a few shift+add operations, without needing extra masks.

mraleph · 2023-06-13T10:28:10Z

the rest can be done by a few shift+add operations, without needing extra masks.

I have checked what other compilers generate and learned that vcnt + uaddlv does the trick in two instructions (plus you need two moves in and out of vector registers)

mit-mit added the area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. label Jun 12, 2023

lrhn added library-core type-enhancement A request for a change that isn't a bug labels Jun 12, 2023

modulovalue mentioned this issue Jun 13, 2023

[ffi] function call overhead? #52692

Open

This was referenced Dec 22, 2023

Consider using ctz for iteration. isoos/bit_array#19

Open

Consider using popcount for finding the cardinality of a BitArray. isoos/bit_array#20

Open

Performance tracking issue isoos/bit_array#24

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[sdk] Consider supporting popcnt, ctz & clz #52673

[sdk] Consider supporting popcnt, ctz & clz #52673

modulovalue commented Jun 11, 2023

mit-mit commented Jun 12, 2023

lrhn commented Jun 12, 2023

modulovalue commented Jun 12, 2023

lrhn commented Jun 13, 2023

mraleph commented Jun 13, 2023 •

edited

Loading

[sdk] Consider supporting popcnt, ctz & clz #52673

[sdk] Consider supporting popcnt, ctz & clz #52673

Comments

modulovalue commented Jun 11, 2023

Proposal

mit-mit commented Jun 12, 2023

lrhn commented Jun 12, 2023

modulovalue commented Jun 12, 2023

lrhn commented Jun 13, 2023

mraleph commented Jun 13, 2023 • edited Loading

mraleph commented Jun 13, 2023 •

edited

Loading