New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optimal "numberOfTrailingZeros" method #38346
Comments
@mraleph @mkustermann @a-siva what is the proper procedure to add methods to the library, do we have a review committee for that? The steps to be taken
|
Lasse (@lrhn) owns core libraries, so he decides where things should go. |
Adding methods to If I read the GCC docs correctly, "trailing zeros" are the low bits (not at all obvious if you don't know it already, especially when most platforms are little-endian). It's not clear what the JS compiled version of this would be, but counting low zero bits on doubles is definitely possible (negative numbers may need twiddling), but more likely they will just convert to 32-bit integers before doing any bitwise operations. Now, names are hard. Does it matter that our numbers are signed (it doesn't affect the bit patterns, but will it perhaps confuse users that So, we can add these operations. If there is sufficient support for doing so (and people willing to do the implementation on all platforms), then I'm fine with doing it. |
+1 for "{low|high}ZeroBitCount" . Not sure how I feel about signed numbers, but if you look at the quoted issue (now closed); you can do ntz with a const array, you'd just want to make it work for 2^53? https://gist.github.com/jtmcdole/297434f327077dbfe5fb19da3b4ef5be The fixnum package already had some API as well: https://pub.dev/documentation/fixnum/latest/fixnum/Int64/numberOfTrailingZeros.html |
The difference between signed and unsigned is a bit moot for the DartVM, since the operations are purely defined in terms of the bits in the 64-bit int (with possible two's complement interpretation). Big or little endian does not play a role at all here, since that only relates to the order in which the individual bytes appear to the outside world (memory, network, etc). The bits in a 64-bit int are well-defined from lsb to msb on all architectures. Or, if my words are unclear, let my table do the talking :-) Below I generated a table with all interesting bit patterns (in hex), the unsigned and signed interpretation of these bits, and the expected values of NTZ, NLZ, and popcount. I think the only bike shedding we can do on the int behavior is what should be returned for the corner case (0x0000000000000000) for NTZ and NLZ (all zeros).
|
I am okay with lowZeroBitCount/highZeroBitCount/nonZeroBitCount. Slightly non-standard nomenclature, but that is perhaps the Dart way :-) If we want to be more conventional, numberOfTrailingZeros/numberOfLeadingZeros would be the choice. Also, no bike shedding the NTZ/NTL(0)? My suggestions is to keep it simple and return 64 in both cases..... Also, I am mainly interested in adding this to int (not to num or double of course). Also, I am willing to do the implementation, provided I understand correctly what that involves at the library side (which is still a bit new to me). The DartVM work is certainly on me! |
I'd prefer we didn't add methods to count-leading-zeros assumes a word length.
Would it be possible to better optimize the existing |
I've assume that the JS operations would be bitwise operations and convert to Int32 before counting. The problem with using the JS double's representation for anything is that most of the operations gives different results even for numbers that are (negative) 32 bit integers. We might be able to find a reasonable compromise that still gives the VM a useful result, which also allowing dart2js to not truncate to 32 bits, but I'm not sure it's worth it. Truncating is probably better (or at least more consistent, I'm not particularly fond of of that discrepancy to begin with). Leading zeros is the only one which assumes a finite length for all operands (trailing zeros only does so for The population count can work for positive doubles, but for negative values the result is different from the same on the VM if we just count bits in mantissa. We can flip the result for negative numbers, but only if we know the bit-size, so again, maybe: That leaves the low zero bits. Converting to a finite width isn't as useful here, this is more like Just truncating to 32-bit operations for JS is easier. It gives different results for counting leading zeros, but I think that's as acceptable as what we do for and/or/xor. |
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The added instructions for IA32 and X64 will enable very efficient direct native implementations of these operations in the DartVM codegen. #38346 Change-Id: I04da6138eed45eb7b21cc310594c5c7ba7942750 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117585 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The added features will enable very efficient direct native implementations of these operations in the DartVM codegen. #38346 Change-Id: Id734a7590666c6abdc2a683a240094c6bf6cd46a Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117768 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
I'm still open to adding these operations, if we can find a good design that people can agree on (one that people can agree on and that I think is good, obviously 😄). The operations are not VM-only if they exist on @leafpetersen We have generally not used "assembler mnemonic" names for operations in Dart , so Another option, which is not as general, is to introduce a VM-only library, |
ddc/dart2js The implementation is written to take advantage of JavaScript's If we provide @mraleph If we decided on the word-bit-length being a parameter, would you think it a reasonable burden to put on the compiler(s) to recognize Counting trailing zeros, of course, is much more sensitive to rounding errors. |
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The rbit on AM64 will enable very efficient direct native implementations of these operations in the DartVM codegen. #38346 Change-Id: I4456a622d7bad1a2cfbf9383927c76983c1de9ab Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119586 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The rbit on AM32 will enable very efficient direct native implementations of these operations in the DartVM codegen. #38346 Change-Id: Icedf4af301b5c8011922fccb1530a9c28e5c0964 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119600 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Useful to implement the simulator, but also useful in the future when compiler wants to constant fold operations related to bit reversal. #38346 Change-Id: I2f09ba6b1897202b5c07a2cff771afd7bbc99495 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119703 Reviewed-by: Alexander Markov <alexmarkov@google.com> Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
While implementing the VM part of this feature, I had to add the rbit instruction to ARM. Given this direct native support, we may want to consider adding the int.reverse() method (reverse bits lsb <-> msb) as well. |
I'm fine with the names. Are we in alignment on the VM and JS sides now? |
I have completed the backend work for the VM (waiting for the library methods of course). Here are the results on a x86_64 with a quick prototype recognizer that maps John's ntz64 to the internal bitwise operator. On JIT and AOT, this roughly gives 3x and 2.3x speedup, respectively. I am starting a perf job to get some results across other architectures. JIT
DartNTZ.NTZ32(RunTime): 437.4761430446194 us.
DartNTZ.NTZ64(RunTime): 373.9482237801458 us. -> 124.38441940298509 us.
AOT
DartNTZ.NTZ32(RunTime): 166.85242671227164 us.
DartNTZ.NTZ64(RunTime): 168.2568121477244 us. -> 72.05758981841764 us. |
And very promising results with the prototype on x86 and arm on our performance cluster (% improvement). JIT
DartNTZ.NTZ64 (Intel Core i5) 53.62% (3.1 noise)
DartNTZ.NTZ64 (Intel Xeon) 79.46% (9.8 noise)
DartNTZ.NTZ64 (Odroid-C2) 76.70% (2.6 noise)
AOT
DartNTZ.NTZ64 (Intel Xeon) 121.5% (17.1 noise)
DartNTZ.NTZ64 (Intel Core i5) 160.9% (79.4 noise)
DartNTZ.NTZ64 (Odroid-C2) 164.9% (6.8 noise) |
Is this with your vm version of ntz?
|
@jtmcdole yes, lacking the library support, I have a prototype that naively replaces every "ntz64" call with the VM internal implementation of NTZ (bsf on intel, rbit/clz on arm). So the speedup obtained is relative to your table-optimized implementation. Once we have the library methods, I can also provide a relative speedup over these reference implementations, of course. |
Rationale: Useful in the future when compiler wants to constant fold operations related to leading/trailing/counting (non)zeros. #38346 Change-Id: I158aca125d3a15019af20ca7c2a516b74e4d84c2 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119920 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Can we make progress please? I initially proposed to add the library methods myself but the library team decided to take over. I am fine with that but let's not stall for no apparent reason. I have provided comments on the pending library CL and I have another CL ready with all the required backend VM support. |
I have a high-level question: what is the motivating use case for this beyond a microbenchmark? @jtmcdole John, could you share what are you trying to speed up? |
two things:
(1) software encryption i was working on as a learning exercise.
(2) walking over bit patterns sent isochronously from embedded devices for the "fun" tag.
|
In ART (Android Runtime) we added a lot of bit manipulating intrinsics for the public API (besides the one above, also reverse, lowestOneBit, highestOneBit). Some of these were responsible for great improvements in my 8x8 checkers app on Google Play, something that was mentioned on Google I/O. Just saying ;-) |
Any hope of this progressing anytime soon? It looks like all the backend work is in place. I'd be happy to help if there is a willingness to move this forward. |
@rokob This is hanging in the limbo - due to the lack of practical motivation (i.e. outside of some microbenchmarks it does not look like there are a lot of use cases that can benefit). Any particular reason you are interested in this API? |
I was implementing a specialized variant of a skip list and needed to be able to compute the value N for which an integer i is divisible by 2^k for all k in [1,N], which is basically the count of trailing zeros. I don't much mind using the lookup table approach, but I have used the intrinsic before and decided to look if it was available on the int type. I saw bitLength and assumed this was similar enough to just be an omission. I eventually found my way to this issue. |
I also thought it already existed and I just realized why. I have a need for 64-bit cross platform ints, so I end up using the fixnum package which has https://pub.dev/documentation/fixnum/latest/fixnum/Int64/numberOfTrailingZeros.html |
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The added instructions for IA32 and X64 will enable very efficient direct native implementations of these operations in the DartVM codegen. dart-lang#38346 Change-Id: I04da6138eed45eb7b21cc310594c5c7ba7942750 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117585 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The added features will enable very efficient direct native implementations of these operations in the DartVM codegen. dart-lang#38346 Change-Id: Id734a7590666c6abdc2a683a240094c6bf6cd46a Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/117768 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The rbit on AM64 will enable very efficient direct native implementations of these operations in the DartVM codegen. dart-lang#38346 Change-Id: I4456a622d7bad1a2cfbf9383927c76983c1de9ab Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119586 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: We are adding library support for bit operations: lowZeroBitCount/highZeroBitCount/nonZeroBitCount The rbit on AM32 will enable very efficient direct native implementations of these operations in the DartVM codegen. dart-lang#38346 Change-Id: Icedf4af301b5c8011922fccb1530a9c28e5c0964 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119600 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Useful in the future when compiler wants to constant fold operations related to leading/trailing/counting (non)zeros. dart-lang#38346 Change-Id: I158aca125d3a15019af20ca7c2a516b74e4d84c2 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/119920 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
We are also very interested with popcnt for Clojuredart. I would be grateful to have updates on this issue. Thanks a lot 👌 |
@lrhn any idea on how we can get https://dart-review.googlesource.com/c/sdk/+/117321 moving? Alternatively we could just add an "intrinsic" mechanism to |
I'm not sure we ever reached complete agreement on the naming or operations. Should we have Should |
@mraleph that would be 🏆 💯 🥇 |
Should it return an If it gives a -1 when asked on zero, then the analyzer is not able to help me ensure I deal with this edge case. If it returns |
I think from performance perspective it might be enough to provide |
I hope we can find a solution that works consistently and well for Many of these operations (e.g. When writing code to be portable between VM and web, the VM will need 32-bit versions of some of these operations. I like the idea of a width as a parameter. Presumably the @pragma('vm:prefer-inline') // perhaps conditional on constant [width]
int highZeroBitCount(int width) {
if (width == 64) return _highZeroBitCount64(this); // select version that has machine instruction
if (width == 32) return _highZeroBitCount32(this); // ""
return _highZeroBitCountVariable(this, width); // general version for completeness.
} (The default implementation of the fixed-width code can call the variable-width code as an aid to porting to new architectures.) |
It might be out of topic, and I'm sorry for that, but some may find it useful or cheering: |
#37789 Did a lot of speed up NTZ, but it was suggested that this basic operation could/should be provided by the SDK as the most optimal for the given platform. Example GCC:
The text was updated successfully, but these errors were encountered: