
Optimize memory: Support shrinking in ConcurrentLongLongPairHashMap #3061

Merged

Conversation

@lordcheng10 (Contributor) commented Feb 17, 2022

Motivation

We found that the Pulsar broker frequently ran into full GCs. By dumping the heap we found that org.apache.pulsar.broker.service.Consumer#pendingAcks occupied 9.9 GB of memory. The pendingAcks variable is defined as follows:
[screenshot: pendingAcks field definition]

The heap memory usage is as follows:
[screenshot: heap memory usage]

The old generation keeps rising until a full GC is triggered, which causes the connection between the broker and ZooKeeper to time out and finally makes the Pulsar broker process exit:
[screenshot: old-generation growth]

Full GC:
[screenshot: full GC log]

ZooKeeper session timeout:
[screenshot: ZooKeeper session timeout log]

I found that ConcurrentLongLongPairHashMap only supports expansion, not shrinking, so most of this memory is unused and wasted.

This structure is used not only in Pulsar but also in BookKeeper's read and write caches, which can likewise waste a lot of memory:
[screenshot: read/write cache usage in BookKeeper]

Changes

On remove, check whether to shrink: if the current size is less than resizeThresholdBelow, the map should shrink.
    if (size < resizeThresholdBelow) {
        try {
            // Shrink the hashmap
            rehash(capacity / 2);
        } finally {
            unlockWrite(stamp);
        }
    } else {
        unlockWrite(stamp);
    }
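For context, the new shrink threshold mirrors the existing expand threshold; a minimal sketch of how both would be derived on each rehash, assuming a MapIdleFactor constant alongside the existing MapFillFactor (the exact formula is an assumption):

    // Recomputed whenever the table is rehashed to a new capacity (sketch).
    resizeThreshold = (int) (capacity * MapFillFactor);       // expand when usedBuckets rises above this
    resizeThresholdBelow = (int) (capacity * MapIdleFactor);  // shrink when size falls below this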

@lordcheng10 (Author) commented Feb 17, 2022

@merlimat @hangc0276 @eolivelli PTAL, thanks!

@lordcheng10 (Author) commented Feb 17, 2022

In addition, I added the following logic to the cleanBucket method to reduce unnecessary rehashing: @merlimat @hangc0276 @eolivelli

    // Walk backwards and turn any trailing DeletedKey tombstones into
    // EmptyKey: the probe chain now ends at this freed bucket, so the
    // tombstones are no longer needed and usedBuckets can be decremented.
    bucket = (bucket - 4) & (table.length - 1);
    while (table[bucket] == DeletedKey) {
        table[bucket] = EmptyKey;
        table[bucket + 1] = EmptyKey;
        table[bucket + 2] = ValueNotFound;
        table[bucket + 3] = ValueNotFound;
        --usedBuckets;

        bucket = (bucket - 4) & (table.length - 1);
    }

@lordcheng10 (Author) commented Feb 17, 2022

When clearing, should we also shrink, or fall back to the initial size? @merlimat

    void clear() {
        long stamp = writeLock();
        try {
            Arrays.fill(table, EmptyKey);
            this.size = 0;
            this.usedBuckets = 0;
        } finally {
            unlockWrite(stamp);
        }
    }
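One option, as a sketch (assuming the map keeps its initial capacity in an initCapacity field and an autoShrink flag; threshold recomputation omitted), is to fall back to the initial table size on clear():

    void clear() {
        long stamp = writeLock();
        try {
            if (autoShrink && capacity > initCapacity) {
                // Drop the oversized table entirely and fall back to the
                // initial capacity instead of zeroing a huge array.
                capacity = initCapacity;
                table = new long[4 * capacity];
            }
            Arrays.fill(table, EmptyKey);
            this.size = 0;
            this.usedBuckets = 0;
        } finally {
            unlockWrite(stamp);
        }
    }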

@lordcheng10 (Author):

rerun failure checks

1 similar comment
@lordcheng10 (Author):

rerun failure checks

@eolivelli (Contributor) left a comment:

Good idea.

I left one comment. Also, we should add test cases that verify that the map behaves as coded.

@lordcheng10 changed the title from "Support shrinking in ConcurrentLongLongPairHashMap" to "Fix memory leak: Support shrinking in ConcurrentLongLongPairHashMap" Feb 17, 2022
@eolivelli (Contributor) left a comment:

What about making this new behaviour configurable?
Like adding a new boolean parameter 'autoShrink'?
The default is false, to keep previous behaviour.
My concern is that this implementation may help in your case but this is a utility class possibly used by many other applications

1. verify that the map is able to expand after shrink;
2. does not keep shrinking at every remove() operation;
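A test for the first point might look roughly like this (a sketch against the pre-builder constructor; the exact counts are illustrative):

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import org.apache.bookkeeper.util.collections.ConcurrentLongLongPairHashMap;
    import org.junit.Test;

    public class ConcurrentLongLongPairHashMapShrinkTest {

        @Test
        public void testExpandAfterShrink() {
            ConcurrentLongLongPairHashMap map = new ConcurrentLongLongPairHashMap(16, 1);
            for (long k = 0; k < 1000; k++) {
                assertTrue(map.put(k, k, k, k));   // forces several expansions
            }
            for (long k = 0; k < 1000; k++) {
                assertTrue(map.remove(k, k));      // should eventually trigger a shrink
            }
            for (long k = 0; k < 1000; k++) {
                assertTrue(map.put(k, k, k, k));   // the map must be able to expand again
            }
            assertEquals(1000, map.size());
        }
    }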
@lordcheng10 (Author) commented Feb 17, 2022

> What about making this new behaviour configurable?
> Like adding a new boolean parameter 'autoShrink'?
> The default is false, to keep previous behaviour.
> My concern is that this implementation may help in your case but this is a utility class possibly used by many other applications

@eolivelli @merlimat I agree with adding autoShrink, but when autoShrink=false is configured, the following situation remains unsolved:

step1:
put k1=4,k2=4,v1=4,v2=4 :
table[10000]= {
[],
[],
......
[],
[4,4,4,4]
}
usedBuckets=1
size=1
resizeThreshold=int(0.66*10000) = 6600

step2:
put k1=3,k2=3,v1=3,v2=3 :
table[10000]= {
[],
[],
......
[3,3,3,3],
[4,4,4,4]
}
usedBuckets=2
size=2
resizeThreshold= 6600

step3:
remove k1=3,k2=3,v1=3,v2=3 :
table[10000]= {
[],
[],
......
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[4,4,4,4]
}
usedBuckets=2
size=1
resizeThreshold= 6600

....
put/remove
......

step6601:
put k1=6601,k2=6601,v1=6601,v2=6601
table[10000]= {
[],
[],
......
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[DeletedKey,DeletedKey,ValueNotFound,ValueNotFound],
[4,4,4,4]
}
usedBuckets=6601
size=1
resizeThreshold= 6600

usedBuckets=6601 > resizeThreshold=6600, so the map expands.
However, the actual size of the map is 1 and there is still a lot of free space (buckets in the [DeletedKey,DeletedKey,ValueNotFound,ValueNotFound] state), so it should not expand.
Over a long run, if this situation occurs repeatedly, continual expansion leaves more and more memory allocated but unused.
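To make the failure mode concrete, a sketch of the churn pattern above using the map's public API (pre-builder constructor; whether a given remove() leaves a tombstone depends on the neighbouring buckets):

    ConcurrentLongLongPairHashMap map = new ConcurrentLongLongPairHashMap(16, 1);
    map.put(4, 4, 4, 4);   // the single long-lived entry

    for (long k = 0; k < 1_000_000; k++) {
        map.put(k + 10, k + 10, k, k);
        // When the bucket following the removed entry is occupied, remove()
        // leaves a DeletedKey tombstone, and tombstones still count toward
        // usedBuckets. Over a long run usedBuckets can cross resizeThreshold
        // and force an expansion even though size is still 1.
        map.remove(k + 10, k + 10);
    }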

@eolivelli (Contributor) left a comment:

@merlimat could you please take a look?

@lordcheng10 (Author) commented Feb 17, 2022

> I agree that the condition for the expansion is not good and probably we should fix it, but maybe not in this PR

OK, I will create another PR to fix the resize-condition problem.
Do you have any good ideas for the expansion condition? @eolivelli @merlimat

@lordcheng10 (Author) commented Feb 17, 2022

> What about making this new behaviour configurable?
> Like adding a new boolean parameter 'autoShrink'?

In the constructor of ConcurrentLongLongPairHashMap we could pass in a mapIdleFactor parameter, defaulting to 0; setting mapIdleFactor > 0 would enable automatic shrinking, like:

    public ConcurrentLongLongPairHashMap(int expectedItems, int concurrencyLevel, float mapIdleFactor) {
        ...
    }

What do you think? @eolivelli

@merlimat merlimat added this to the 4.15.0 milestone Feb 17, 2022
@merlimat (Contributor) left a comment:

Change looks mostly good to me. Thanks for initiating this.

I agree with Enrico that we should make it configurable whether to shrink the capacity or not, and we should probably keep the current default and change it based on the intended use.

For example, the main use case in BK code, the DbLedgerStorage WriteCache, involves filling the map and clearing it, constantly repeating. In this scenario we should not shrink the map because it will get re-expanded immediately.

In other use cases, like the subscriptions' pending acks, it does indeed make sense to shrink after a peak.

Also, we could make the thresholds and the step up/down factors configurable. Given that this is many parameters, we should consider adding a "Builder" interface.

@@ -48,6 +48,7 @@
     private static final long ValueNotFound = -1L;

     private static final float MapFillFactor = 0.66f;
+    private static final float MapIdleFactor = 0.25f;
Contributor:

I think we should be cautious in avoiding constantly flickering between shrink & expand. We should try to use a smaller threshold here to limit that. Maybe 0.15?

Contributor (Author):

OK

@@ -388,6 +401,18 @@ private void cleanBucket(int bucket) {
         table[bucket + 2] = ValueNotFound;
         table[bucket + 3] = ValueNotFound;
         --usedBuckets;

+        // Reduce unnecessary rehash
Contributor:

Suggested change:
-        // Reduce unnecessary rehash
+        // Cleanup all the buckets that were in `DeletedKey` state, so that we can reduce unnecessary expansions

Contributor (Author):

OK

try {
-    rehash();
+    // Expand the hashmap
+    rehash(capacity * 2);
Contributor:

The existing "double the size" strategy was probably not the best one for all use cases either, as it can end up wasting a considerable amount of memory on empty buckets.

We could leave it configurable (both the step up and step down) to accommodate different needs.

Contributor (Author):

@merlimat like this?

    float upFactor = 0.5f;
    float downFactor = 0.5f;

    // Expand the hashmap
    rehash((int) (capacity * (1 + upFactor)));

    // Shrink the hashmap
    rehash((int) (capacity * (1 - downFactor)));

Contributor (Author):

> Also we could make the thresholds and step up/down factor configurable.

What do the thresholds mean here?
Are you referring to the MapFillFactor and MapIdleFactor variables?
I agree with making these two variables configurable. @merlimat

2. add config: ① MapFillFactor ② MapIdleFactor ③ autoShrink ④ expandFactor ⑤ shrinkFactor
@lordcheng10 (Author):

@eolivelli @merlimat PTAL, thanks!
Based on your suggestions, I have made the following adjustments:
1. add builder;
2. add config: ① MapFillFactor ② MapIdleFactor ③ autoShrink ④ expandFactor ⑤ shrinkFactor

try {
-    rehash();
+    // Expand the hashmap
+    rehash((int) (capacity * (1 + expandFactor)));
Contributor:

If expandFactor == 2, I would expect the size to double each time.

I think this should be:

Suggested change:
-    rehash((int) (capacity * (1 + expandFactor)));
+    rehash((int) (capacity * expandFactor));

Contributor (Author):

fixed

if (autoShrink && size < resizeThresholdBelow) {
    try {
        // Shrink the hashmap
        rehash((int) (capacity * (1 - shrinkFactor)));
Contributor:

Shouldn't this be:

Suggested change:
-    rehash((int) (capacity * (1 - shrinkFactor)));
+    rehash((int) (capacity / shrinkFactor));

Contributor (Author):

I agree.
In addition, should parameter range checks be added, like the following? @merlimat
shrinkFactor > 1
expandFactor > 1
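For example, roughly (a sketch using Guava's checkArgument; the exact validation style and messages are assumptions):

    import static com.google.common.base.Preconditions.checkArgument;

    // In the constructor / builder, before storing the factors:
    checkArgument(expandFactor > 1, "expandFactor must be greater than 1");
    checkArgument(shrinkFactor > 1, "shrinkFactor must be greater than 1");
    checkArgument(mapIdleFactor < mapFillFactor, "mapIdleFactor must be below mapFillFactor");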

Contributor:

👍

Contributor (Author):

fixed.
PTAL, thanks! @merlimat

2. add check: shrinkFactor > 1, expandFactor > 1
@lordcheng10 (Author) commented Feb 19, 2022

> Should we also do some shrinking on "clear()"? We should try to follow what ConcurrentHashMap is doing in this case.

I think it is necessary to shrink on clear(): if the user never calls remove() and uses clear() instead, shrinking would otherwise never be triggered.
What do you think? @merlimat @eolivelli

@lordcheng10 (Author):

rerun failure checks

① setMapFillFactor ② setMapIdleFactor ③ setExpandFactor ④ setShrinkFactor ⑤ setAutoShrink
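Putting the pieces together, construction could look roughly like this (a sketch based on the setters above; newBuilder(), expectedItems() and concurrencyLevel() are assumed names for the remaining builder methods):

    ConcurrentLongLongPairHashMap map = ConcurrentLongLongPairHashMap.newBuilder()
            .expectedItems(256)
            .concurrencyLevel(16)
            .setAutoShrink(true)       // off by default, keeping the previous behaviour
            .setMapFillFactor(0.66f)   // expand when usedBuckets exceeds this fraction of capacity
            .setMapIdleFactor(0.15f)   // shrink when size falls below this fraction of capacity
            .setExpandFactor(2)        // multiply capacity by this on expansion
            .setShrinkFactor(2)        // divide capacity by this on shrink
            .build();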
@lordcheng10 (Author) commented Feb 20, 2022

Since we use bucket = (bucket + 4) & (table.length - 1) rather than (bucket + 4) % table.length, the table length must stay a power of two, which means expandFactor and shrinkFactor must be powers of 2, like 2, 4, 8, ...
Do you have any thoughts on this? @merlimat @eolivelli
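One way to lift this restriction would be to round the requested capacity up to the next power of two, so the bitmask indexing stays valid for any factor. A sketch (this helper is an assumption, not necessarily what was merged):

    static int alignToPowerOfTwo(int n) {
        // Round n up to the next power of two (n <= 1 yields 1).
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // e.g. when expanding:
    // rehash(alignToPowerOfTwo((int) (capacity * expandFactor)));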

@lordcheng10 (Author):

rerun failure checks

@eolivelli (Contributor) left a comment:

Great work.

I can't merge.
@merlimat @dlg99 would you mind?

@lordcheng10 (Author):

rerun failure checks

2 similar comments
@lordcheng10 (Author):

rerun failure checks

@lordcheng10 (Author):

rerun failure checks

@lordcheng10 (Author) commented Feb 21, 2022

> do some shrinking on "clear()"

added shrink in clear(). PTAL, thanks! @merlimat

2. fix initCapacity value
@lordcheng10 (Author):

rerun failure checks

@merlimat (Contributor) left a comment:

Nice!

@merlimat merlimat merged commit 794cdbb into apache:master Feb 22, 2022
lordcheng10 added a commit to lordcheng10/pulsar that referenced this pull request Feb 22, 2022
…hMap in this version supports shrinking when removing, which can solve the problem of continuously rising memory and frequent full GCs in the Pulsar broker.

See the corresponding BookKeeper PR: apache/bookkeeper#3061
@lordcheng10 (Author):

The following classes have the same problem, so they need the same changes. What do you think? @merlimat
ConcurrentLongHashMap: Map<long, Object>
ConcurrentLongHashSet: Set
ConcurrentLongLongHashMap: Map<long, long>
ConcurrentOpenHashMap: Map<Object, Object>
ConcurrentOpenHashSet: Set

@eolivelli (Contributor):

@lordcheng10
Your proposal of updating all the other similar classes makes sense to me

@lordcheng10 (Author):

> @lordcheng10 Your proposal of updating all the other similar classes makes sense to me

OK, I will open a new PR.

@lordcheng10 (Author) commented Feb 22, 2022

> @lordcheng10 Your proposal of updating all the other similar classes makes sense to me

@eolivelli I've opened a new PR to support shrinking in the other map structures: #3074

zymap pushed a commit that referenced this pull request Jul 26, 2022
### Motivation

Optimize the concurrent collections' shrink and clear logic.

### Changes
1. Reduce the repeated `Arrays.fill` in the clear process
2. When `capacity` is already equal to `initCapacity`, `rehash` should not be executed
3. Reduce the `rehash` logic in the `clear` process
4. Shrinking must never go below `initCapacity`, to avoid flapping between shrink and expand near `initCapacity`; the extra arrays allocated by such flapping would consume more memory and add GC pressure

If this PR is accepted, I will apply the same optimization to the concurrent collections' shrink and clear logic defined in Pulsar.

Related to #3061 and #3074
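Points 2 and 4 together amount to clamping the shrink target at initCapacity and skipping the rehash when already there; a minimal sketch, with field names as used earlier in this thread:

    if (autoShrink && size < resizeThresholdBelow && capacity > initCapacity) {
        // Never shrink below the initial capacity, and skip the rehash
        // entirely when there is nothing to shrink.
        int newCapacity = Math.max((int) (capacity / shrinkFactor), initCapacity);
        if (newCapacity < capacity) {
            rehash(newCapacity);
        }
    }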
zymap pushed a commit that referenced this pull request Aug 2, 2022 (same change as above; cherry picked from commit a580547)