Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optimal "numberOfTrailingZeros" method #38346

Open
jtmcdole opened this issue Sep 11, 2019 · 32 comments
Open

Add optimal "numberOfTrailingZeros" method #38346

jtmcdole opened this issue Sep 11, 2019 · 32 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-core type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size

Comments

@jtmcdole
Copy link
Contributor

#37789 Did a lot of speed up NTZ, but it was suggested that this basic operation could/should be provided by the SDK as the most optimal for the given platform. Example GCC:

Built-in Function: int __builtin_ctz (unsigned int x)
Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.
@aartbik aartbik self-assigned this Sep 12, 2019
@aartbik aartbik added area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. vm-perf-or-size-improvement labels Sep 12, 2019
@aartbik
Copy link
Contributor

aartbik commented Sep 12, 2019

@mraleph @mkustermann @a-siva what is the proper procedure to add methods to the library, do we have a review committee for that?

The steps to be taken

  • add leading/trailing zeros to library, perhaps popcount too while we are at it, using Dart implementations as base functionality
  • recognize these method in the DartVM/compiler, lower these to the best possible instruction on each given architecture

@mraleph
Copy link
Member

mraleph commented Sep 12, 2019

do we have a review committee for that?

Lasse (@lrhn) owns core libraries, so he decides where things should go.

@lrhn lrhn added the type-enhancement A request for a change that isn't a bug label Sep 12, 2019
@lrhn
Copy link
Member

lrhn commented Sep 12, 2019

Adding methods to int is currently easy - since you can't extend or implement int, it won't break anyone. (That won't stay true after adding extension methods).

If I read the GCC docs correctly, "trailing zeros" are the low bits (not at all obvious if you don't know it already, especially when most platforms are little-endian).
We already have bitLength which is equivalent to the high zero bits for positive numbers, so there is precedence for such operations.
I assume we'd want both leading and trailing zero counts, and maybe even popcount. Do we want to count leading/trailing ones?

It's not clear what the JS compiled version of this would be, but counting low zero bits on doubles is definitely possible (negative numbers may need twiddling), but more likely they will just convert to 32-bit integers before doing any bitwise operations.

Now, names are hard.
Maybe int get lowZeroBitCount, int get highZeroBitCount, int get nonZeroBitCount? I want bit in there, I personally do not like "leading" and "trailing", but I do recognize that they are considered standard (writing the digits as a normal base-2 number, so we can define clz as (n.toRadixString(2) + "1").indexOf("1") if necessary). If we use them then countLeadingZeroBits, countTrailingZeroBits, countNonZeroBits.

Does it matter that our numbers are signed (it doesn't affect the bit patterns, but will it perhaps confuse users that -1 has no leading zeros/)

So, we can add these operations. If there is sufficient support for doing so (and people willing to do the implementation on all platforms), then I'm fine with doing it.

@jtmcdole
Copy link
Contributor Author

+1 for "{low|high}ZeroBitCount" .

Not sure how I feel about signed numbers, but if you look at the quoted issue (now closed); you can do ntz with a const array, you'd just want to make it work for 2^53?

https://gist.github.com/jtmcdole/297434f327077dbfe5fb19da3b4ef5be

The fixnum package already had some API as well:

https://pub.dev/documentation/fixnum/latest/fixnum/Int64/numberOfTrailingZeros.html

@aartbik
Copy link
Contributor

aartbik commented Sep 12, 2019

The difference between signed and unsigned is a bit moot for the DartVM, since the operations are purely defined in terms of the bits in the 64-bit int (with possible two's complement interpretation). Big or little endian does not play a role at all here, since that only relates to the order in which the individual bytes appear to the outside world (memory, network, etc). The bits in a 64-bit int are well-defined from lsb to msb on all architectures.

Or, if my words are unclear, let my table do the talking :-)

Below I generated a table with all interesting bit patterns (in hex), the unsigned and signed interpretation of these bits, and the expected values of NTZ, NLZ, and popcount.

I think the only bike shedding we can do on the int behavior is what should be returned for the corner case (0x0000000000000000) for NTZ and NLZ (all zeros).

Bit pattern (hex)   Unsigned             Signed               NTZ NLZ Popcount
------------------------------------------------------------------------------
0x8000000000000000  9223372036854775808 -9223372036854775808   63    0    1
0x8000000000000001  9223372036854775809 -9223372036854775807    0    0    2
0x8000000000000002  9223372036854775810 -9223372036854775806    1    0    2
0x8000000000000003  9223372036854775811 -9223372036854775805    0    0    3
0x8000000000000004  9223372036854775812 -9223372036854775804    2    0    2
0x8000000000000005  9223372036854775813 -9223372036854775803    0    0    3
....
0xfffffffffffffffb 18446744073709551611                   -5    0    0   63
0xfffffffffffffffc 18446744073709551612                   -4    2    0   62
0xfffffffffffffffd 18446744073709551613                   -3    0    0   63
0xfffffffffffffffe 18446744073709551614                   -2    1    0   63
0xffffffffffffffff 18446744073709551615                   -1    0    0   64
0x0000000000000000                    0                    0  <bike-shed> 0
0x0000000000000001                    1                    1    0   63    1
0x0000000000000002                    2                    2    1   62    1
0x0000000000000003                    3                    3    0   62    2
0x0000000000000004                    4                    4    2   61    1
0x0000000000000005                    5                    5    0   61    2
....
0x7ffffffffffffffa  9223372036854775802  9223372036854775802    1    1   61
0x7ffffffffffffffb  9223372036854775803  9223372036854775803    0    1   62
0x7ffffffffffffffc  9223372036854775804  9223372036854775804    2    1   61
0x7ffffffffffffffd  9223372036854775805  9223372036854775805    0    1   62
0x7ffffffffffffffe  9223372036854775806  9223372036854775806    1    1   62
0x7fffffffffffffff  9223372036854775807  9223372036854775807    0    1   63

@aartbik
Copy link
Contributor

aartbik commented Sep 13, 2019

I am okay with lowZeroBitCount/highZeroBitCount/nonZeroBitCount. Slightly non-standard nomenclature, but that is perhaps the Dart way :-) If we want to be more conventional, numberOfTrailingZeros/numberOfLeadingZeros would be the choice.

Also, no bike shedding the NTZ/NTL(0)? My suggestions is to keep it simple and return 64 in both cases.....

Also, I am mainly interested in adding this to int (not to num or double of course). Also, I am willing to do the implementation, provided I understand correctly what that involves at the library side (which is still a bit new to me). The DartVM work is certainly on me!

@rakudrama
Copy link
Member

I'd prefer we didn't add methods to int that are antagonistic to the web platform.

count-leading-zeros assumes a word length.

int.bitLength has existed for some time and is a word-length agnostic method related to counting leading zeros or ones (depending on what the most-leading, or sign, bit is).
Being word-length agnostic, it tends to 'keep working' better when the code is run compiled to JavaScript or the code is converted to BigInt, both of which have the property that there is no clear word length.

Would it be possible to better optimize the existing int.bitLength and derive the methods like clz64() and clz32() from that, implemented as extension methods or methods in dart:math? If the machine has a clz64 instruction, int.bitLength would be implemented in terms of that instruction and subtraction, so perhaps the compiler could recognize that 64-(64-x) --> x for fixed-word-length arithmetic.

@lrhn
Copy link
Member

lrhn commented Sep 16, 2019

I've assume that the JS operations would be bitwise operations and convert to Int32 before counting.
That would mean that 0.highZeroBits returns 32 in JavaScript and 64 in native code. That's consistent and compatible with the other bitwise operations in JavaScript (|, &, ^, ~), but not with bitLength.

The problem with using the JS double's representation for anything is that most of the operations gives different results even for numbers that are (negative) 32 bit integers.

We might be able to find a reasonable compromise that still gives the VM a useful result, which also allowing dart2js to not truncate to 32 bits, but I'm not sure it's worth it. Truncating is probably better (or at least more consistent, I'm not particularly fond of of that discrepancy to begin with).

Leading zeros is the only one which assumes a finite length for all operands (trailing zeros only does so for 0). It is definable in terms of bitLength (x < 0 ? 0 : 64 - x.bitLength) so omitting it is not impossible. We can also do it as countHighZeroBits([int bitSize]) which first truncates to toSigned(bitSize) and then counts relative to that, or countHighZeroBits32 and countHighZeroBits64 both being available (although the 32 looks spurious).

The population count can work for positive doubles, but for negative values the result is different from the same on the VM if we just count bits in mantissa. We can flip the result for negative numbers, but only if we know the bit-size, so again, maybe: countNonZeroBits([int bitSize]) which truncates/sign-extends to that number of bits and counts the ones.

That leaves the low zero bits. Converting to a finite width isn't as useful here, this is more like bitLength. Also, for 2's complement numbers, it's symmetric around zero (-2 and 2 are both even numbers), so you can ignore the sign. We still need a value for zero.
If we define it as returning the position of the least signficant non-zero bit (where bitLength returns the most significant one) for positive numbes, and -1 for zero, then JS can answer this truthfully for doubles as well.
This effectively sign/zero-extends the number to "infinite precision two's complement" and gives the position of the first 1 bit (if any).

Just truncating to 32-bit operations for JS is easier. It gives different results for counting leading zeros, but I think that's as acceptable as what we do for and/or/xor.

dart-bot pushed a commit that referenced this issue Sep 17, 2019
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The added instructions for IA32 and X64 will enable
very efficient direct native implementations of
these operations in the DartVM codegen.

#38346

Change-Id: I04da6138eed45eb7b21cc310594c5c7ba7942750
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117585
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@mraleph mraleph added type-performance Issue relates to performance or code size and removed type-enhancement A request for a change that isn't a bug vm-perf-or-size-improvement labels Sep 18, 2019
dart-bot pushed a commit that referenced this issue Sep 18, 2019
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The added features will enable very efficient direct
native implementations of these operations in the
DartVM codegen.

#38346

Change-Id: Id734a7590666c6abdc2a683a240094c6bf6cd46a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117768
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@mraleph mraleph added this to Backlog in Dart VM Performance Sep 20, 2019
@mraleph mraleph moved this from Backlog to Icebox in Dart VM Performance Sep 20, 2019
@mraleph mraleph moved this from Icebox to In Progress in Dart VM Performance Sep 20, 2019
@lrhn lrhn added type-enhancement A request for a change that isn't a bug library-core and removed area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. labels Sep 26, 2019
@lrhn
Copy link
Member

lrhn commented Sep 26, 2019

I'm still open to adding these operations, if we can find a good design that people can agree on (one that people can agree on and that I think is good, obviously 😄).

The operations are not VM-only if they exist on int, so they should work in some way when compiled to JS. The JS compiler people can decide exactly how they work, as long as it is somewhat reasonable, and preferably gives the same result as the VM on the same numbers, at least in a specific known range (up to the Int32 range is one option).

@leafpetersen
How do you feel about the proposed names?

We have generally not used "assembler mnemonic" names for operations in Dart , so bsf, popcnt, and clz are not good Dart names.

Another option, which is not as general, is to introduce a VM-only library, dart:bits, with (assembler-mnemonic named?) "intrinsic" functions. That would mean that the operations are not available on the web (but then, neither are 64-bit integers, so operations on the bits of 64-bit integers are not directly convertible without getting into WASM, and then complexity ensues).
I'd hope to avoid going down that road since it may increase fragmentation in the ecosystem.

@rakudrama
Copy link
Member

ddc/dart2js bitLength works outside the 32-bit range - it is effectively integer log base 2.
If the accumulated errors from doing integer arithmetic using doubles does not affect the most-significant bit, then the result is always correct and consistent with the VM.

The implementation is written to take advantage of JavaScript's Math.clz32, which we will do when we drop support for IE11.

If we provide countHighZeroBits32 then it could compile directly to Math.clz32
countHighZeroBits64 would be a small method.
For naming, I would consider removing 'count', i.e. highZeroBits32 and highZeroBits64.

@mraleph If we decided on the word-bit-length being a parameter, would you think it a reasonable burden to put on the compiler(s) to recognize .highZeroBits(32) and .highZeroBits(64)?

Counting trailing zeros, of course, is much more sensitive to rounding errors.

dart-bot pushed a commit that referenced this issue Oct 2, 2019
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The rbit on AM64 will enable very efficient direct
native implementations of these operations in the
DartVM codegen.

#38346

Change-Id: I4456a622d7bad1a2cfbf9383927c76983c1de9ab
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119586
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Oct 2, 2019
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The rbit on AM32 will enable very efficient direct
native implementations of these operations in the
DartVM codegen.

#38346

Change-Id: Icedf4af301b5c8011922fccb1530a9c28e5c0964
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119600
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Oct 2, 2019
Rationale:
Useful to implement the simulator, but also useful
in the future when compiler wants to constant fold
operations related to bit reversal.

#38346

Change-Id: I2f09ba6b1897202b5c07a2cff771afd7bbc99495
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119703
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@aartbik
Copy link
Contributor

aartbik commented Oct 3, 2019

While implementing the VM part of this feature, I had to add the rbit instruction to ARM. Given this direct native support, we may want to consider adding the int.reverse() method (reverse bits lsb <-> msb) as well.

@leafpetersen
Copy link
Member

@leafpetersen
How do you feel about the proposed names?

I'm fine with the names. Are we in alignment on the VM and JS sides now?

@aartbik
Copy link
Contributor

aartbik commented Oct 3, 2019

I have completed the backend work for the VM (waiting for the library methods of course). Here are the results on a x86_64 with a quick prototype recognizer that maps John's ntz64 to the internal bitwise operator. On JIT and AOT, this roughly gives 3x and 2.3x speedup, respectively. I am starting a perf job to get some results across other architectures.

JIT
DartNTZ.NTZ32(RunTime): 437.4761430446194 us.
DartNTZ.NTZ64(RunTime): 373.9482237801458 us.  -> 124.38441940298509 us.

AOT
DartNTZ.NTZ32(RunTime): 166.85242671227164 us.
DartNTZ.NTZ64(RunTime): 168.2568121477244 us.   ->  72.05758981841764 us.

@aartbik
Copy link
Contributor

aartbik commented Oct 3, 2019

And very promising results with the prototype on x86 and arm on our performance cluster (% improvement).

JIT
DartNTZ.NTZ64 (Intel Core i5) 53.62% (3.1 noise)
DartNTZ.NTZ64 (Intel Xeon) 79.46% (9.8 noise)
DartNTZ.NTZ64 (Odroid-C2) 76.70% (2.6 noise)

AOT
DartNTZ.NTZ64 (Intel Xeon)   121.5% (17.1 noise)
DartNTZ.NTZ64 (Intel Core i5) 160.9% (79.4 noise)
DartNTZ.NTZ64 (Odroid-C2) 164.9% (6.8 noise)

@jtmcdole
Copy link
Contributor Author

jtmcdole commented Oct 3, 2019 via email

@aartbik
Copy link
Contributor

aartbik commented Oct 3, 2019

@jtmcdole yes, lacking the library support, I have a prototype that naively replaces every "ntz64" call with the VM internal implementation of NTZ (bsf on intel, rbit/clz on arm). So the speedup obtained is relative to your table-optimized implementation. Once we have the library methods, I can also provide a relative speedup over these reference implementations, of course.

dart-bot pushed a commit that referenced this issue Oct 7, 2019
Rationale:
Useful in the future when compiler wants to constant fold
operations related to leading/trailing/counting (non)zeros.

#38346

Change-Id: I158aca125d3a15019af20ca7c2a516b74e4d84c2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119920
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@aartbik
Copy link
Contributor

aartbik commented Oct 9, 2019

Can we make progress please? I initially proposed to add the library methods myself but the library team decided to take over. I am fine with that but let's not stall for no apparent reason. I have provided comments on the pending library CL and I have another CL ready with all the required backend VM support.

@mraleph
Copy link
Member

mraleph commented Oct 9, 2019

I have a high-level question: what is the motivating use case for this beyond a microbenchmark?

@jtmcdole John, could you share what are you trying to speed up?

@jtmcdole
Copy link
Contributor Author

jtmcdole commented Oct 9, 2019 via email

@aartbik
Copy link
Contributor

aartbik commented Oct 9, 2019

In ART (Android Runtime) we added a lot of bit manipulating intrinsics for the public API (besides the one above, also reverse, lowestOneBit, highestOneBit). Some of these were responsible for great improvements in my 8x8 checkers app on Google Play, something that was mentioned on Google I/O. Just saying ;-)

@aartbik aartbik removed their assignment Oct 22, 2019
@mraleph mraleph moved this from In Progress to Icebox in Dart VM Performance Nov 7, 2019
@rokob
Copy link

rokob commented Aug 10, 2020

Any hope of this progressing anytime soon? It looks like all the backend work is in place. I'd be happy to help if there is a willingness to move this forward.

@mraleph
Copy link
Member

mraleph commented Aug 10, 2020

@rokob This is hanging in the limbo - due to the lack of practical motivation (i.e. outside of some microbenchmarks it does not look like there are a lot of use cases that can benefit). Any particular reason you are interested in this API?

@rokob
Copy link

rokob commented Aug 10, 2020

I was implementing a specialized variant of a skip list and needed to be able to compute the value N for which an integer i is divisible by 2^k for all k in [1,N], which is basically the count of trailing zeros. I don't much mind using the lookup table approach, but I have used the intrinsic before and decided to look if it was available on the int type. I saw bitLength and assumed this was similar enough to just be an omission. I eventually found my way to this issue.

@rokob
Copy link

rokob commented Aug 10, 2020

I also thought it already existed and I just realized why. I have a need for 64-bit cross platform ints, so I end up using the fixnum package which has https://pub.dev/documentation/fixnum/latest/fixnum/Int64/numberOfTrailingZeros.html

tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The added instructions for IA32 and X64 will enable
very efficient direct native implementations of
these operations in the DartVM codegen.

dart-lang#38346

Change-Id: I04da6138eed45eb7b21cc310594c5c7ba7942750
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117585
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The added features will enable very efficient direct
native implementations of these operations in the
DartVM codegen.

dart-lang#38346

Change-Id: Id734a7590666c6abdc2a683a240094c6bf6cd46a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117768
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The rbit on AM64 will enable very efficient direct
native implementations of these operations in the
DartVM codegen.

dart-lang#38346

Change-Id: I4456a622d7bad1a2cfbf9383927c76983c1de9ab
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119586
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Rationale:
We are adding library support for bit operations:
  lowZeroBitCount/highZeroBitCount/nonZeroBitCount
The rbit on AM32 will enable very efficient direct
native implementations of these operations in the
DartVM codegen.

dart-lang#38346

Change-Id: Icedf4af301b5c8011922fccb1530a9c28e5c0964
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119600
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Rationale:
Useful in the future when compiler wants to constant fold
operations related to leading/trailing/counting (non)zeros.

dart-lang#38346

Change-Id: I158aca125d3a15019af20ca7c2a516b74e4d84c2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119920
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@dupuchba
Copy link

We are also very interested with popcnt for Clojuredart.
We use it extensively for computing bitmaps for PersistentHashMap on lookups, insert and removal.
We are using the popular implementation from "Bit Twiddling Hacks" but it takes ~12 instructions.

I would be grateful to have updates on this issue. Thanks a lot 👌

@mraleph
Copy link
Member

mraleph commented May 20, 2021

@lrhn any idea on how we can get https://dart-review.googlesource.com/c/sdk/+/117321 moving?

Alternatively we could just add an "intrinsic" mechanism to dart:ffi to enable people write their own inline assembly, then we don't need to add this to core libraries, but that's might be a bit too radical of a proposal.

@lrhn
Copy link
Member

lrhn commented May 20, 2021

I'm not sure we ever reached complete agreement on the naming or operations.

Should we have highZeroBitCount(int bitWidth), or a pair of highZeroBitCount32/highZeroBitCount64?
@rakudrama suggested that above.

Should lowZeroBitCount instead be lowestOneBitPosition (giving -1 when asked on zero)? That avoids having the answer depend on the bit width of the number, but -1 is a weird result.

@dupuchba
Copy link

Alternatively we could just add an "intrinsic" mechanism to dart:ffi to enable people write their own inline assembly, then we don't need to add this to core libraries, but that's might be a bit too radical of a proposal.

@mraleph that would be 🏆 💯 🥇

@SteveAlexander
Copy link

Should lowZeroBitCount instead be lowestOneBitPosition (giving -1 when asked on zero)? That avoids having the answer depend on the bit width of the number, but -1 is a weird result.

Should it return an int, and -1 for being asked on zero, or should it return an int? and null when asked on zero?

If it gives a -1 when asked on zero, then the analyzer is not able to help me ensure I deal with this edge case. If it returns int? and returns null when asked on zero, then the analyzer (via the type system) helps me ensure I've dealt with the case for zero.

@mraleph
Copy link
Member

mraleph commented May 20, 2021

int? is pretty bad choice from the performance perspective: would require branching to handle null case even if user knows that null is not possible as a return value.

I think from performance perspective it might be enough to provide lowZeroBitCount and highZeroBitCount - which have very clear lowerings to 1-2 machine instructions, the rest can be computed from those by the user.

@rakudrama
Copy link
Member

I hope we can find a solution that works consistently and well for int, Int64 and BigInt.

Many of these operations (e.g. highZeroBitCount, reverseBits) need a word size.
(Intrinsics in C libraries have variants for different word sizes.)

When writing code to be portable between VM and web, the VM will need 32-bit versions of some of these operations.

I like the idea of a width as a parameter.
It generalizes to something useful for BigInt, and can be implemented on Int64 up to width 64.
I think both BigInt and Int64 implementations are important to provide easy-to-find options for correct manipulation of 64-bit quantities on the web.

Presumably the int code can be written so that compilers generate the special instructions for the special cases 32 and 64.
Maybe something like this would work on the VM:

@pragma('vm:prefer-inline') // perhaps conditional on constant [width]
int highZeroBitCount(int width) {
  if (width == 64) return _highZeroBitCount64(this); // select version that has machine instruction
  if (width == 32) return _highZeroBitCount32(this); // ""
  return _highZeroBitCountVariable(this, width); // general version for completeness.
}

(The default implementation of the fixed-width code can call the variable-width code as an aid to porting to new architectures.)

@gnudles
Copy link
Contributor

gnudles commented Aug 18, 2021

It might be out of topic, and I'm sorry for that, but some may find it useful or cheering:
A collection of fast bit operations when they are not available natively:
https://graphics.stanford.edu/~seander/bithacks.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-core type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size
Projects
Development

No branches or pull requests

10 participants