
Optimize memory: Support shrinking in ConcurrentLongLongPairHashMap #3061

Merged

Conversation

@lordcheng10 (Contributor) commented Feb 17, 2022

Motivation

We found that the Pulsar broker frequently ran into full GCs. By dumping the heap we found that org.apache.pulsar.broker.service.Consumer#pendingAcks occupied 9.9 GB of memory. The pendingAcks variable is defined as follows:
[screenshot: pendingAcks field definition]

The heap memory usage is as follows:
[screenshot: heap memory usage]

The old generation keeps rising until a full GC is triggered, which causes the connection between the broker and ZooKeeper to time out and finally makes the Pulsar broker process exit:
[screenshot: old-generation growth]

Full GC:
[screenshot: full GC log]

ZooKeeper session timeout:
[screenshot: ZooKeeper session timeout log]

I found that ConcurrentLongLongPairHashMap only supports expansion, not shrinking, so most of this memory is unused and wasted.

This structure is used not only in Pulsar but also in BookKeeper's read and write caches, which can likewise waste a lot of memory:
[screenshot: read/write cache usage in BookKeeper]

Changes

On remove, check whether to shrink: if the current size is less than resizeThresholdBelow, the map should shrink.
    if (size < resizeThresholdBelow) {
        try {
            // Shrink the hashmap
            rehash(capacity / 2);
        } finally {
            unlockWrite(stamp);
        }
    } else {
        unlockWrite(stamp);
    }
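For context, the new shrink threshold mirrors the existing expand threshold; a minimal sketch of how both would be derived on each rehash, assuming a MapIdleFactor constant alongside the existing MapFillFactor (the exact formula is an assumption):

    // Recomputed whenever the table is rehashed to a new capacity (sketch).
    resizeThreshold = (int) (capacity * MapFillFactor);       // expand when usedBuckets rises above this
    resizeThresholdBelow = (int) (capacity * MapIdleFactor);  // shrink when size falls below this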

@lordcheng10 (Author) commented Feb 17, 2022

@merlimat @hangc0276 @eolivelli PTAL, thanks!

@lordcheng10 (Author) commented Feb 17, 2022

In addition, I added the following logic to the cleanBucket method to reduce unnecessary rehashing: @merlimat @hangc0276 @eolivelli

    // Walk backwards and turn any trailing DeletedKey tombstones into
    // EmptyKey: the probe chain now ends at this freed bucket, so the
    // tombstones are no longer needed and usedBuckets can be decremented.
    bucket = (bucket - 4) & (table.length - 1);
    while (table[bucket] == DeletedKey) {
        table[bucket] = EmptyKey;
        table[bucket + 1] = EmptyKey;
        table[bucket + 2] = ValueNotFound;
        table[bucket + 3] = ValueNotFound;
        --usedBuckets;

        bucket = (bucket - 4) & (table.length - 1);
    }

@lordcheng10 (Author) commented Feb 17, 2022

When clearing, should we also shrink, or fall back to the initial size? @merlimat

    void clear() {
        long stamp = writeLock();
        try {
            Arrays.fill(table, EmptyKey);
            this.size = 0;
            this.usedBuckets = 0;
        } finally {
            unlockWrite(stamp);
        }
    }
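One option, as a sketch (assuming the map keeps its initial capacity in an initCapacity field and an autoShrink flag; threshold recomputation omitted), is to fall back to the initial table size on clear():

    void clear() {
        long stamp = writeLock();
        try {
            if (autoShrink && capacity > initCapacity) {
                // Drop the oversized table entirely and fall back to the
                // initial capacity instead of zeroing a huge array.
                capacity = initCapacity;
                table = new long[4 * capacity];
            }
            Arrays.fill(table, EmptyKey);
            this.size = 0;
            this.usedBuckets = 0;
        } finally {
            unlockWrite(stamp);
        }
    }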

@lordcheng10 (Author):

rerun failure checks

1 similar comment
@lordcheng10 (Author):

rerun failure checks

@eolivelli (Contributor) left a comment:

Good idea.

I left one comment. Also, we should add test cases that verify that the map behaves as coded.

@lordcheng10 changed the title from "Support shrinking in ConcurrentLongLongPairHashMap" to "Fix memory leak: Support shrinking in ConcurrentLongLongPairHashMap" Feb 17, 2022
@eolivelli (Contributor) left a comment:

What about making this new behaviour configurable?
Like adding a new boolean parameter 'autoShrink'?
The default is false, to keep previous behaviour.
My concern is that this implementation may help in your case but this is a utility class possibly used by many other applications

1. verify that the map is able to expand after shrink;
2. does not keep shrinking at every remove() operation;
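A test for the first point might look roughly like this (a sketch against the pre-builder constructor; the exact counts are illustrative):

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import org.apache.bookkeeper.util.collections.ConcurrentLongLongPairHashMap;
    import org.junit.Test;

    public class ConcurrentLongLongPairHashMapShrinkTest {

        @Test
        public void testExpandAfterShrink() {
            ConcurrentLongLongPairHashMap map = new ConcurrentLongLongPairHashMap(16, 1);
            for (long k = 0; k < 1000; k++) {
                assertTrue(map.put(k, k, k, k));   // forces several expansions
            }
            for (long k = 0; k < 1000; k++) {
                assertTrue(map.remove(k, k));      // should eventually trigger a shrink
            }
            for (long k = 0; k < 1000; k++) {
                assertTrue(map.put(k, k, k, k));   // the map must be able to expand again
            }
            assertEquals(1000, map.size());
        }
    }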
@lordcheng10 (Author) commented Feb 17, 2022

> What about making this new behaviour configurable?
> Like adding a new boolean parameter 'autoShrink'?
> The default is false, to keep previous behaviour.
> My concern is that this implementation may help in your case but this is a utility class possibly used by many other applications

@eolivelli @merlimat I agree with adding autoShrink, but when autoShrink=false is configured, the following situation remains unsolved:

step1:
put k1=4,k2=4,v1=4,v2=4 :
table[10000]= {
[],
[],
......
[],
[4,4,4,4]
}
usedBuckets=1
size=1
resizeThreshold=int(0.66*10000) = 6600

step2:
put k1=3,k2=3,v1=3,v2=3 :
table[10000]= {
[],
[],
......
[3,3,3,3],
[4,4,4,4]
}
usedBuckets=2
size=2
resizeThreshold= 6600

step3:
remove k1=3,k2=3,v1=3,v2=3 :
table[10000]= {
[],
[],
......
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[4,4,4,4]
}
usedBuckets=2
size=1
resizeThreshold= 6600

....
put/remove
......

step6601:
put k1=6601,k2=6601,v1=6601,v2=6601
table[10000]= {
[],
[],
......
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[4,4,4,4]
}
usedBuckets=6601
size=1
resizeThreshold= 6600

usedBuckets=6601 > resizeThreshold=6600, so the map expands.
However, the actual size of the map is 1 and there is still a lot of free space (buckets in the [DeletedKey,DeletedKey,ValueNotFound,ValueNotFound] state), so it should not expand.
Over a long run, if this situation occurs repeatedly, continual expansion leaves more and more memory allocated but unused.
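To make the failure mode concrete, a sketch of the churn pattern above using the map's public API (pre-builder constructor; whether a given remove() leaves a tombstone depends on the neighbouring buckets):

    ConcurrentLongLongPairHashMap map = new ConcurrentLongLongPairHashMap(16, 1);
    map.put(4, 4, 4, 4);   // the single long-lived entry

    for (long k = 0; k < 1_000_000; k++) {
        map.put(k + 10, k + 10, k, k);
        // When the bucket following the removed entry is occupied, remove()
        // leaves a DeletedKey tombstone, and tombstones still count toward
        // usedBuckets. Over a long run usedBuckets can cross resizeThreshold
        // and force an expansion even though size is still 1.
        map.remove(k + 10, k + 10);
    }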

@eolivelli (Contributor) left a comment:

@merlimat could you please take a look?

@lordcheng10 (Author) commented Feb 17, 2022

> I agree that the condition for the expansion is not good and probably we should fix it, but maybe not in this PR

OK, I will create another PR to fix the resize-condition problem.
Do you have any good ideas for the expansion condition? @eolivelli @merlimat

@lordcheng10 (Author) commented Feb 17, 2022

> What about making this new behaviour configurable?
> Like adding a new boolean parameter 'autoShrink'?

In the constructor of ConcurrentLongLongPairHashMap we could pass in a mapIdleFactor parameter, defaulting to 0; setting mapIdleFactor > 0 would enable automatic shrinking, like:

    public ConcurrentLongLongPairHashMap(int expectedItems, int concurrencyLevel, float mapIdleFactor) {
        ...
    }

What do you think? @eolivelli

@merlimat merlimat added this to the 4.15.0 milestone Feb 17, 2022
@merlimat (Contributor) left a comment:

Change looks mostly good to me. Thanks for initiating this.

I agree with Enrico that we should make it configurable whether to shrink the capacity or not, and we should probably keep the current default and change it based on the intended use.

For example, the main use case in BK code, the DbLedgerStorage WriteCache, involves filling the map and clearing it, constantly repeating. In this scenario we should not shrink the map because it will get re-expanded immediately.

In other use cases, like the subscriptions' pending acks, it does indeed make sense to shrink after a peak.

Also, we could make the thresholds and the step up/down factors configurable. Given that this is many parameters, we should consider adding a "Builder" interface.

@@ -48,6 +48,7 @@
     private static final long ValueNotFound = -1L;

     private static final float MapFillFactor = 0.66f;
+    private static final float MapIdleFactor = 0.25f;
Contributor:

I think we should be cautious in avoiding constantly flickering between shrink & expand. We should try to use a smaller threshold here to limit that. Maybe 0.15?

Contributor (Author):

OK

@@ -388,6 +401,18 @@ private void cleanBucket(int bucket) {
         table[bucket + 2] = ValueNotFound;
         table[bucket + 3] = ValueNotFound;
         --usedBuckets;

+        // Reduce unnecessary rehash
Contributor:

Suggested change:
-        // Reduce unnecessary rehash
+        // Cleanup all the buckets that were in `DeletedKey` state, so that we can reduce unnecessary expansions

Contributor (Author):

OK

try {
-    rehash();
+    // Expand the hashmap
+    rehash(capacity * 2);
Contributor:

The existing "double the size" strategy was probably not the best one for all use cases either, as it can end up wasting a considerable amount of memory on empty buckets.

We could leave it configurable (both the step up and step down) to accommodate different needs.

Contributor (Author):

@merlimat like this?

    float upFactor = 0.5f;
    float downFactor = 0.5f;

    // Expand the hashmap
    rehash((int) (capacity * (1 + upFactor)));

    // Shrink the hashmap
    rehash((int) (capacity * (1 - downFactor)));

Contributor (Author):

> Also we could make the thresholds and step up/down factor configurable.

What do the thresholds mean here?
Are you referring to the MapFillFactor and MapIdleFactor variables?
I agree with making these two variables configurable. @merlimat

2. add config: ① MapFillFactor ② MapIdleFactor ③ autoShrink ④ expandFactor ⑤ shrinkFactor
@lordcheng10 (Author):

@eolivelli @merlimat PTAL, thanks!
Based on your suggestions, I have made the following adjustments:
1. add builder;
2. add config: ① MapFillFactor ② MapIdleFactor ③ autoShrink ④ expandFactor ⑤ shrinkFactor

try {
-    rehash();
+    // Expand the hashmap
+    rehash((int) (capacity * (1 + expandFactor)));
Contributor:

If expandFactor == 2, I would expect the size to double each time.

I think this should be:

Suggested change:
-    rehash((int) (capacity * (1 + expandFactor)));
+    rehash((int) (capacity * expandFactor));

Contributor (Author):

fixed

if (autoShrink && size < resizeThresholdBelow) {
    try {
        // Shrink the hashmap
        rehash((int) (capacity * (1 - shrinkFactor)));
Contributor:

Shouldn't this be:

Suggested change:
-    rehash((int) (capacity * (1 - shrinkFactor)));
+    rehash((int) (capacity / shrinkFactor));

Contributor (Author):

I agree.
In addition, should parameter range checks be added, like the following? @merlimat
shrinkFactor > 1
expandFactor > 1
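For example, roughly (a sketch using Guava's checkArgument; the exact validation style and messages are assumptions):

    import static com.google.common.base.Preconditions.checkArgument;

    // In the constructor / builder, before storing the factors:
    checkArgument(expandFactor > 1, "expandFactor must be greater than 1");
    checkArgument(shrinkFactor > 1, "shrinkFactor must be greater than 1");
    checkArgument(mapIdleFactor < mapFillFactor, "mapIdleFactor must be below mapFillFactor");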

Contributor:

👍

Contributor (Author):

fixed.
PTAL, thanks! @merlimat

2. add check: shrinkFactor > 1, expandFactor > 1
@lordcheng10 (Author) commented Feb 19, 2022

> Should we also do some shrinking on "clear()"? We should try to follow what ConcurrentHashMap is doing in this case.

I think it is necessary to shrink on clear(): if the user never calls remove() and uses clear() instead, shrinking would otherwise never be triggered.
What do you think? @merlimat @eolivelli

@lordcheng10 (Author):

rerun failure checks

① setMapFillFactor ② setMapIdleFactor ③ setExpandFactor ④ setShrinkFactor ⑤ setAutoShrink
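Putting the pieces together, construction could look roughly like this (a sketch based on the setters above; newBuilder(), expectedItems() and concurrencyLevel() are assumed names for the remaining builder methods):

    ConcurrentLongLongPairHashMap map = ConcurrentLongLongPairHashMap.newBuilder()
            .expectedItems(256)
            .concurrencyLevel(16)
            .setAutoShrink(true)       // off by default, keeping the previous behaviour
            .setMapFillFactor(0.66f)   // expand when usedBuckets exceeds this fraction of capacity
            .setMapIdleFactor(0.15f)   // shrink when size falls below this fraction of capacity
            .setExpandFactor(2)        // multiply capacity by this on expansion
            .setShrinkFactor(2)        // divide capacity by this on shrink
            .build();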
@lordcheng10 (Author) commented Feb 20, 2022

Since we use bucket = (bucket + 4) & (table.length - 1) rather than (bucket + 4) % table.length, the table length must stay a power of two, which means expandFactor and shrinkFactor must be powers of 2, like 2, 4, 8, ...
Do you have any thoughts on this? @merlimat @eolivelli
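One way to lift this restriction would be to round the requested capacity up to the next power of two, so the bitmask indexing stays valid for any factor. A sketch (this helper is an assumption, not necessarily what was merged):

    static int alignToPowerOfTwo(int n) {
        // Round n up to the next power of two (n <= 1 yields 1).
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // e.g. when expanding:
    // rehash(alignToPowerOfTwo((int) (capacity * expandFactor)));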

@lordcheng10 (Author):

rerun failure checks

@eolivelli (Contributor) left a comment:

Great work.

I can't merge.
@merlimat @dlg99 would you mind?

@lordcheng10 (Author):

rerun failure checks

2 similar comments
@lordcheng10 (Author):

rerun failure checks

@lordcheng10 (Author):

rerun failure checks

@lordcheng10 (Author) commented Feb 21, 2022

> do some shrinking on "clear()"

added shrink in clear(). PTAL, thanks! @merlimat

2. fix initCapacity value
@lordcheng10 (Author):

rerun failure checks

@merlimat (Contributor) left a comment:

Nice!

@merlimat merlimat merged commit 794cdbb into apache:master Feb 22, 2022
lordcheng10 added a commit to lordcheng10/pulsar that referenced this pull request Feb 22, 2022
…hMap in this version supports shrinking when removing, which can solve the problem of continuously rising memory and frequent full GCs in the Pulsar broker.

See the corresponding BookKeeper PR: apache/bookkeeper#3061
@lordcheng10 (Author):

The following classes have the same problem, so they need the same changes. What do you think? @merlimat
ConcurrentLongHashMap: Map<long, Object>
ConcurrentLongHashSet: Set
ConcurrentLongLongHashMap: Map<long, long>
ConcurrentOpenHashMap: Map<Object, Object>
ConcurrentOpenHashSet: Set

@eolivelli (Contributor):

@lordcheng10
Your proposal of updating all the other similar classes makes sense to me

@lordcheng10 (Author):

> @lordcheng10 Your proposal of updating all the other similar classes makes sense to me

OK, I will open a new PR.

@lordcheng10 (Author) commented Feb 22, 2022

> @lordcheng10 Your proposal of updating all the other similar classes makes sense to me

@eolivelli I've opened a new PR to support shrinking in the other map structures: #3074

zymap pushed a commit that referenced this pull request Jul 26, 2022
### Motivation

Optimize the concurrent collections' shrink and clear logic.

### Changes
1. Reduce the repeated `Arrays.fill` in the clear process
2. When `capacity` is already equal to `initCapacity`, `rehash` should not be executed
3. Reduce the `rehash` logic in the `clear` process
4. Shrinking must never go below `initCapacity`, to avoid flapping between shrink and expand near `initCapacity`; the extra arrays allocated by such flapping would consume more memory and add GC pressure

If this PR is accepted, I will apply the same optimization to the concurrent collections' shrink and clear logic defined in Pulsar.

Related to #3061 and #3074
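Points 2 and 4 together amount to clamping the shrink target at initCapacity and skipping the rehash when already there; a minimal sketch, with field names as used earlier in this thread:

    if (autoShrink && size < resizeThresholdBelow && capacity > initCapacity) {
        // Never shrink below the initial capacity, and skip the rehash
        // entirely when there is nothing to shrink.
        int newCapacity = Math.max((int) (capacity / shrinkFactor), initCapacity);
        if (newCapacity < capacity) {
            rehash(newCapacity);
        }
    }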
zymap pushed a commit that referenced this pull request Aug 2, 2022 (same change as above; cherry picked from commit a580547)