[fix][misc] Fix bugs in hash bucket algorithm implementation in Pulsar collections #21107
lhotari wants to merge 2 commits into apache:master
…multiple elements of array
Fixes apache#21106
- Fix a bug in ConcurrentLongLongPairHashMap, ConcurrentLongPairSet and ConcurrentOpenHashMap where the hash bucket's storage index was incorrectly calculated.
- The impact of the bug is that the implementations have been rather suboptimal because of frequent collisions when storing items in the underlying array. Essentially, the hash bucket algorithm hasn't been used properly.
+ if (mod < 0) {
+     mod += divisor;
+ }
Since max - 1 >= 0, (int) n & (max - 1) can never be negative, even when n is negative, so this check is unnecessary.
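To make the sign behaviour concrete, here is a minimal sketch comparing the two forms discussed in this thread. The class name `SignSafeModDemo` and the method names `maskMod` and `remainderMod` are illustrative only and do not appear in the PR; the mask form assumes `max` is a power of two.

```java
public class SignSafeModDemo {
    // Mask form from the original code; assumes max is a power of two.
    static int maskMod(long n, int max) {
        return (int) n & (max - 1);
    }

    // Remainder form proposed in the PR, including the sign correction.
    static int remainderMod(long dividend, int divisor) {
        int mod = (int) (dividend % divisor);
        if (mod < 0) {
            mod += divisor;
        }
        return mod;
    }

    public static void main(String[] args) {
        System.out.println(-7L % 16);              // -7: plain % keeps the dividend's sign
        System.out.println(maskMod(-7L, 16));      // 9: the mask clears the sign bit
        System.out.println(remainderMod(-7L, 16)); // 9: corrected by adding the divisor
    }
}
```

With a plain `%` the correction is needed, while with the mask it is not; both forms agree on this input.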
- static final int signSafeMod(long n, int max) {
-     return (int) n & (max - 1);
+ static final int signSafeMod(long dividend, int divisor) {
+     int mod = (int) (dividend % divisor);
Using & instead of % is a speed-up trick: & is faster than %.
- int bucket = signSafeMod(keyHash, capacity);
+ int bucketIndex = signSafeMod(keyHash, capacity) * ITEM_SIZE;
* 4 is equivalent to << 2, so moving this logic out of the signSafeMod method seems pointless?
The problem with the old code is that it is cryptic. In the original code, the signSafeMod function's name is misleading. In the cases where an item takes 2 or 4 array elements, the signSafeMod function itself applies a left shift to multiply by 2 (<< 1) or 4 (<< 2). Having functions whose names don't match their implementation is bad for maintainability. I doubt that using bitwise operations to optimize execution speed is worthwhile in Java code; such optimizations should be left to the compiler.
Renaming the method from signSafeMod to hash might be better, right?
Also, I wonder whether the compiler really does such optimizations. Could you verify this? Thanks.
IMO, these utility classes borrow ideas from common utility classes such as HashMap, which also uses & to avoid %.
Modern Java compilers are smart enough to recognize when a multiplication is equivalent to a bit shift and to use the shift instead. It's better to focus on writing clear, readable code.
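The two spellings of the storage-index calculation can be compared with a small sketch. It assumes each item occupies 4 consecutive array slots and that `capacity` is a power-of-two bucket count; the class name `BucketIndexDemo` and both method names are hypothetical. It only checks that the results are identical and says nothing about what the JIT compiler actually emits.

```java
public class BucketIndexDemo {
    // Assumption: every item occupies 4 consecutive slots in the backing array.
    static final int ITEM_SIZE = 4;

    // Old style: the shift is folded into the index calculation.
    static int bucketIndexByShift(long keyHash, int capacity) {
        return ((int) keyHash & (capacity - 1)) << 2;
    }

    // New style: pick the bucket first, then scale by the item size.
    static int bucketIndexByMultiply(long keyHash, int capacity) {
        int bucket = (int) keyHash & (capacity - 1);
        return bucket * ITEM_SIZE;
    }

    public static void main(String[] args) {
        int capacity = 16; // number of buckets, a power of two
        for (long hash = -1000; hash <= 1000; hash++) {
            if (bucketIndexByShift(hash, capacity) != bucketIndexByMultiply(hash, capacity)) {
                throw new IllegalStateException("mismatch for hash " + hash);
            }
        }
        System.out.println("both forms produce the same storage index");
    }
}
```

Since the results are the same, the choice between the two is about readability and, possibly, generated code, which would need a benchmark to settle.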
+ static final int signSafeMod(long dividend, int divisor) {
+     int mod = (int) (dividend % divisor);
+
+     if (mod < 0) {
+         mod += divisor;
+     }
+
+     return mod;
Can we move the sign-safe mod computation to a utility class?
We have a similar utility class in pulsar-client (not in pulsar-client-api).
Would it be better to move it to pulsar-common to reduce the duplicated code?
@lhotari Also, could you please provide more context about this statement:
"the impact of the bug has been that the implementations have been rather suboptimal because of frequent collisions in storing the item to the underlying array. Essentially the hash bucket algorithm hasn't been properly used."
HashMap in the JDK calculates the table index the same way. Why is % better than & at avoiding frequent collisions in this case?
Here is the code snippet from JDK 17:
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
I also left a comment on another related PR. Maybe we should consider replacing the customized map with ConcurrentHashMap.
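The question about & versus % can be explored with a small sketch. It counts how often the HashMap-style mask and a sign-safe remainder (here `Math.floorMod`) pick different buckets over a range of hash values. The class name `IndexEquivalenceDemo` and its methods are illustrative and not taken from the PR or the JDK.

```java
public class IndexEquivalenceDemo {
    // HashMap-style index: only meaningful when n is a power of two.
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    // Sign-safe remainder: works for any positive n.
    static int floorModIndex(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    // Number of hash values in [from, to] for which the two forms disagree.
    static int countMismatches(int n, int from, int to) {
        int mismatches = 0;
        for (int hash = from; hash <= to; hash++) {
            if (maskIndex(hash, n) != floorModIndex(hash, n)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Power-of-two table size: both forms select the same bucket.
        System.out.println("n=16 mismatches: " + countMismatches(16, -1000, 1000));
        // Non-power-of-two table size: the mask no longer behaves like a modulo.
        System.out.println("n=12 mismatches: " + countMismatches(12, -1000, 1000));
    }
}
```

For a power-of-two table size the two forms select the same bucket for every hash, so they cannot differ in collision rate; a difference can only appear when the table size is not a power of two.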
The PR had no activity for 30 days; marked with the Stale label.
Fixes #21106
Fixes #21108
Motivation
- … signSafeMod function which is duplicated in all classes. The buggy implementation didn't return correct results for negative input values.
Modifications
- … bucket to bucketIndex to indicate that it's the bucket index in the storage array that needs to be calculated.
- … signSafeMod function which wasn't sign safe at all. Currently the logic is duplicated in all classes.
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete