[fix][misc] Fix bugs in hash bucket algorithm implementation in Pulsar collections by lhotari · Pull Request #21107 · apache/pulsar

lhotari · 2023-09-01T08:54:13Z

Motivation

Fix a bug in ConcurrentLongLongPairHashMap, ConcurrentLongPairSet and ConcurrentOpenHashMap where the hash bucket's storage index was incorrectly calculated.
- As a result, the implementations have been rather suboptimal because of frequent collisions when storing items in the underlying array. Essentially, the hash bucket algorithm hasn't been used properly.
Fix a bug in the signSafeMod function, which is duplicated in all classes. The buggy implementation didn't return correct results for negative input values.
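Background for the sign-safety point: Java's % operator takes the sign of the dividend, so the remainder of a negative hash is negative and cannot be used as an array index. The sketch below copies the signSafeMod body proposed in this PR (from the ConcurrentOpenHashSet diff in the review) into a hypothetical SignSafeModDemo class and contrasts it with the raw remainder.

```java
final class SignSafeModDemo {

    // Body as proposed in this PR: Java's % keeps the sign of the dividend,
    // so a negative remainder is shifted into [0, divisor).
    static int signSafeMod(long dividend, int divisor) {
        int mod = (int) (dividend % divisor);
        if (mod < 0) {
            mod += divisor;
        }
        return mod;
    }

    public static void main(String[] args) {
        long negativeHash = -7L;
        // Raw remainder: negative, unusable as an array index.
        System.out.println(negativeHash % 16);             // -7
        // Sign-safe variant: always within [0, 16).
        System.out.println(signSafeMod(negativeHash, 16)); // 9
    }
}
```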

Modifications

Rename bucket to bucketIndex to indicate that the value to be calculated is the index in the storage array, not the hash bucket number itself.
Calculate bucketIndex by multiplying the hash bucket by the item size (4 or 2).
Fix the signSafeMod function, which wasn't sign safe at all. The logic is currently duplicated in all classes.
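To illustrate what bucketIndex means, here is a minimal, single-threaded sketch of a flat long[] table in which every item occupies ITEM_SIZE = 4 consecutive slots (two key longs, two value longs). The class FlatLongPairTable, its hash function, and the EMPTY sentinel are hypothetical and much simpler than the Pulsar classes (no sections, locking, or resizing); the point is only that the bucket number from signSafeMod is scaled by ITEM_SIZE to get the first array slot of the item, and that linear probing advances by whole items.

```java
final class FlatLongPairTable {
    // Each item occupies ITEM_SIZE consecutive slots: key1, key2, value1, value2.
    static final int ITEM_SIZE = 4;
    // Hypothetical sentinel: assumes -1 is never used as key1.
    static final long EMPTY = -1L;

    private final long[] table;
    private final int capacity; // number of buckets (items), not array slots

    FlatLongPairTable(int capacity) {
        this.capacity = capacity;
        this.table = new long[capacity * ITEM_SIZE];
        java.util.Arrays.fill(table, EMPTY);
    }

    static int signSafeMod(long dividend, int divisor) {
        int mod = (int) (dividend % divisor);
        return mod < 0 ? mod + divisor : mod;
    }

    // Hypothetical hash of a (key1, key2) pair, for illustration only.
    static long hash(long key1, long key2) {
        return key1 * 31 + key2;
    }

    // Bucket number in [0, capacity), scaled to the first array slot of that item.
    private int bucketIndex(long key1, long key2) {
        return signSafeMod(hash(key1, key2), capacity) * ITEM_SIZE;
    }

    boolean put(long key1, long key2, long value1, long value2) {
        int bucketIndex = bucketIndex(key1, key2);
        for (int probes = 0; probes < capacity; probes++) {
            long storedKey1 = table[bucketIndex];
            if (storedKey1 == EMPTY || (storedKey1 == key1 && table[bucketIndex + 1] == key2)) {
                table[bucketIndex] = key1;
                table[bucketIndex + 1] = key2;
                table[bucketIndex + 2] = value1;
                table[bucketIndex + 3] = value2;
                return true;
            }
            // Linear probing: step to the next item, not the next array slot.
            bucketIndex = (bucketIndex + ITEM_SIZE) % table.length;
        }
        return false; // table full; a real map would resize here
    }

    long[] get(long key1, long key2) {
        int bucketIndex = bucketIndex(key1, key2);
        for (int probes = 0; probes < capacity; probes++) {
            long storedKey1 = table[bucketIndex];
            if (storedKey1 == EMPTY) {
                return null;
            }
            if (storedKey1 == key1 && table[bucketIndex + 1] == key2) {
                return new long[] {table[bucketIndex + 2], table[bucketIndex + 3]};
            }
            bucketIndex = (bucketIndex + ITEM_SIZE) % table.length;
        }
        return null;
    }
}
```

If the bucket number were used as the array index without the ITEM_SIZE scaling, adjacent buckets would share array slots and overwrite each other's fields.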

Documentation

doc
doc-required
doc-not-needed
doc-complete

…multiple elements of array Fixes apache#21106

lhotari · 2023-09-01T09:27:16Z

The signSafeMod bug must also have impacted all other collection classes where the invalid implementation was used.

thetumbled · 2023-09-01T12:40:00Z

...ar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongHashMap.java

+        if (mod < 0) {
+            mod += divisor;
+        }


Since max - 1 > 0, (int) n & (max - 1) never yields a negative result, even if n is negative. This check is unnecessary.
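The reviewer's argument can be checked directly. The sketch below (hypothetical MaskModDemo class) evaluates the original expression; because the cast binds tighter than &, it computes ((int) n) & (max - 1). Assuming max is a power of two, max - 1 has its sign bit cleared, so the result is never negative and matches Math.floorMod.

```java
final class MaskModDemo {

    // Original helper from ConcurrentLongHashMap: the cast binds tighter than &,
    // so this is ((int) n) & (max - 1).
    static int maskMod(long n, int max) {
        return (int) n & (max - 1);
    }

    public static void main(String[] args) {
        int max = 16; // assumption: max is a power of two
        long[] samples = {-1L, -17L, 42L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long n : samples) {
            // max - 1 has its sign bit cleared, so the AND can never be negative,
            // and for a power-of-two max it equals the floor modulus.
            System.out.println(n + " -> " + maskMod(n, max)
                    + " (floorMod: " + Math.floorMod(n, (long) max) + ")");
        }
    }
}
```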

thetumbled · 2023-09-01T12:40:37Z

...ar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongHashMap.java

-    static final int signSafeMod(long n, int max) {
-        return (int) n & (max - 1);
+    static final int signSafeMod(long dividend, int divisor) {
+        int mod = (int) (dividend % divisor);


Using & is faster than using %; it's a common trick to speed up the computation.
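One caveat to the & trick: n & (divisor - 1) equals the floor modulus only when divisor is a power of two, which is why hash tables that use it, such as java.util.HashMap, keep their table length a power of two. The sketch below shows the mismatch for a divisor of 12 and a hypothetical alignToPowerOfTwo helper that rounds a requested capacity up; the class and method names are illustrative assumptions.

```java
final class PowerOfTwoMaskDemo {

    static boolean isPowerOfTwo(int x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    // Hypothetical helper: smallest power of two >= n, valid for 1 <= n <= 2^30.
    static int alignToPowerOfTwo(int n) {
        return 1 << (32 - Integer.numberOfLeadingZeros(n - 1));
    }

    // Masking is a valid substitute for a floor modulus only for power-of-two divisors.
    static int fastMod(long n, int divisor) {
        if (!isPowerOfTwo(divisor)) {
            throw new IllegalArgumentException("divisor must be a power of two: " + divisor);
        }
        return (int) (n & (divisor - 1));
    }

    public static void main(String[] args) {
        System.out.println(12 & (12 - 1));           // 8, although 12 % 12 == 0
        int capacity = alignToPowerOfTwo(12);        // 16
        System.out.println(fastMod(12, capacity));   // 12
    }
}
```

Rounding the capacity up wastes some slots but lets the cheaper mask stand in for %.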

thetumbled · 2023-09-01T12:44:35Z

...n/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongLongPairHashMap.java

-            int bucket = signSafeMod(keyHash, capacity);
+            int bucketIndex = signSafeMod(keyHash, capacity) * ITEM_SIZE;


* 4 is equivalent to << 2, so moving this logic out of the signSafeMod method seems pointless?
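On the * 4 versus << 2 question: for Java int values the two are interchangeable in every case, including negative values and overflow, because the Java Language Specification (§15.19) defines n << s as equivalent to multiplication by two to the power s even if overflow occurs. The hypothetical ShiftVsMultiplyDemo below checks this on a few edge values, so the choice between them comes down to readability.

```java
final class ShiftVsMultiplyDemo {
    static final int ITEM_SIZE = 4;

    // The two ways of scaling a bucket number discussed in this thread.
    static int scaleByShift(int bucket) {
        return bucket << 2;
    }

    static int scaleByMultiply(int bucket) {
        return bucket * ITEM_SIZE;
    }

    public static void main(String[] args) {
        int[] buckets = {0, 1, 7, -3, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int b : buckets) {
            // JLS 15.19: n << s equals n * 2^s, even when the multiplication overflows.
            System.out.println(b + ": " + scaleByShift(b) + " == " + scaleByMultiply(b));
        }
    }
}
```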

The problem with the old code is that it is cryptic. In the original code, the signSafeMod function's name is misleading: in the cases where an item takes 2 or 4 array elements, the signSafeMod function itself shifts the result left to multiply it by 2 (<< 1) or 4 (<< 2). From a maintainability perspective, it's bad to have functions whose names don't match their implementation. I doubt that using bitwise operations to optimize execution speed is worthwhile in Java code; such optimizations should be left to the compiler.

Wouldn't renaming the signSafeMod method to hash be better?
I also wonder whether the compiler actually performs such optimizations; could you verify this? Thanks.
IMO, these util classes adopt ideas from common util classes like HashMap, which also use & to avoid %.

Modern Java compilers are smart enough to recognize when a multiplication is equivalent to a bit shift and can optimize the multiplication to use a bit shift instead. It's better to focus on writing clear, readable code.

codelipenghui · 2023-09-05T13:32:49Z

...ar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentOpenHashSet.java

+    static final int signSafeMod(long dividend, int divisor) {
+        int mod = (int) (dividend % divisor);
+
+        if (mod < 0) {
+            mod += divisor;
+        }
+
+        return mod;


Can we move the sign-safe mod computation to a utility class?
We have a similar utility class in pulsar-client (not pulsar-client-api).
Would it be better to move it to pulsar-common to reduce the duplicated code?
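A shared helper, as suggested here, could look roughly like the sketch below. The class name CollectionHashUtils is hypothetical and its module placement is left open; the body is the signSafeMod implementation from this PR. For a positive divisor, the JDK's Math.floorMod already has the same semantics, which may make a custom helper unnecessary.

```java
// Hypothetical shared helper; the class name and module placement are assumptions.
final class CollectionHashUtils {

    private CollectionHashUtils() {
    }

    // Same contract as the signSafeMod copies in this PR: result in [0, divisor)
    // for any dividend, assuming divisor > 0.
    static int signSafeMod(long dividend, int divisor) {
        int mod = (int) (dividend % divisor);
        if (mod < 0) {
            mod += divisor;
        }
        return mod;
    }

    // The JDK already offers the same semantics for a positive divisor.
    static int floorModViaJdk(long dividend, int divisor) {
        return (int) Math.floorMod(dividend, (long) divisor);
    }
}
```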

@lhotari Also, could you please provide more context about:

the impact of the bug has been that the implementations have been rather unoptimal because of of frequent collisions in storing the item to the underlying array. Essentially the hash bucket algorithm hasn't been properly used.

The JDK's HashMap also uses the same approach to calculate the table index. Why is % better than & for avoiding frequent collisions in this case?

Here is the code snippet from JDK 17:

if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null);
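For comparison, the quoted line relies on HashMap's hash spreading: HashMap.hash(Object) XORs the high 16 bits of hashCode() into the low 16 bits before the (n - 1) & hash mask is applied, and the table length n is always a power of two. The hypothetical JdkStyleIndexDemo class below reproduces those two steps for illustration; it is not the JDK source.

```java
final class JdkStyleIndexDemo {

    // Mirrors the idea of java.util.HashMap.hash(Object): XOR the high 16 bits into
    // the low 16 bits so that small power-of-two tables still see the high bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Table index as in the quoted snippet; n must be a power of two.
    static int tableIndex(Object key, int n) {
        return (n - 1) & spread(key);
    }

    public static void main(String[] args) {
        int n = 16;
        for (Object key : new Object[] {"ledger-1", 42L, -7L, null}) {
            System.out.println(key + " -> " + tableIndex(key, n));
        }
    }
}
```

Without the spreading step, keys whose hash codes differ only above bit 3 (for example 0 and 0x10000) would all land in the same bucket of a 16-slot table.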

codelipenghui · 2023-09-05T14:15:56Z

I also left a comment on another related PR. Maybe we should consider replacing the customized maps with ConcurrentHashMap.
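A rough sketch of what replacing a custom long-pair map with ConcurrentHashMap could look like is shown below. LongPairKey is a hypothetical key class, not an existing Pulsar type. The trade-off is that each entry allocates a key object and a boxed value, overhead that primitive open-addressing maps avoid.

```java
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentHashMapSketch {

    // Hypothetical immutable key holding a pair of longs.
    static final class LongPairKey {
        final long first;
        final long second;

        LongPairKey(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof LongPairKey)) {
                return false;
            }
            LongPairKey other = (LongPairKey) o;
            return first == other.first && second == other.second;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(first) + Long.hashCode(second);
        }
    }

    public static void main(String[] args) {
        ConcurrentHashMap<LongPairKey, Long> map = new ConcurrentHashMap<>();
        map.put(new LongPairKey(1, 2), 100L);
        map.computeIfAbsent(new LongPairKey(-3, 5), k -> 200L);
        System.out.println(map.get(new LongPairKey(1, 2))); // 100
    }
}
```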

github-actions · 2023-10-06T01:47:32Z

The PR has had no activity for 30 days; marking it with the Stale label.

lhotari added the type/bug label Sep 1, 2023

lhotari added this to the 3.2.0 milestone Sep 1, 2023

lhotari requested review from Technoboy-, codelipenghui, lordcheng10 and merlimat September 1, 2023 08:54

lhotari self-assigned this Sep 1, 2023

lhotari added the ready-to-test label Sep 1, 2023

github-actions bot added the doc-not-needed label Sep 1, 2023

lhotari changed the title ~~[fix][common] Fix bug in hash bucket algorithm implementation with Long pair keys~~ [fix][misc] Fix bug in hash bucket algorithm implementation with Long pair keys in Pulsar collections Sep 1, 2023

Fix signSafeMod which wasn't sign safe

eab2d24

lhotari changed the title ~~[fix][misc] Fix bug in hash bucket algorithm implementation with Long pair keys in Pulsar collections~~ [fix][misc] Fix bugs in hash bucket algorithm implementation in Pulsar collections Sep 1, 2023

This was referenced Sep 1, 2023

[fix][broker] fix bug caused by optimistic locking #18390

Merged

Fix bugs in hash bucket algorithm implementation in collection classes apache/bookkeeper#4067

Closed

lhotari requested review from BewareMyPower, eolivelli, hangc0276, mattisonchao, michaeljmarshall and rdhabalia September 1, 2023 10:00

thetumbled reviewed Sep 1, 2023


codelipenghui reviewed Sep 5, 2023


github-actions bot added the Stale label Oct 6, 2023

Technoboy- modified the milestones: 3.2.0, 3.3.0 Dec 22, 2023

coderzc removed this from the 3.3.0 milestone May 8, 2024

coderzc added this to the 3.4.0 milestone May 8, 2024

lhotari modified the milestones: 4.0.0, 4.1.0 Oct 11, 2024

lhotari added the release/4.0.1 label Oct 11, 2024

lhotari added release/4.0.2 and removed release/4.0.1 labels Dec 6, 2024

lhotari removed the release/4.0.2 label Jan 6, 2025

lhotari removed this from the 4.1.0 milestone Jan 6, 2025


lhotari wants to merge 2 commits into apache:master from lhotari:lh-fix-bucket-index


5 participants
