-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pulsar-common] add open Concurrent LongPair RangeSet #3818
Conversation
rerun java8 tests |
rerun java8 tests |
1 similar comment
rerun java8 tests |
@rdhabalia Sure, I forgot about this one. Looking now |
.../src/main/java/org/apache/pulsar/common/util/collections/ConcurrentOpenLongPairRangeSet.java
Outdated
Show resolved
Hide resolved
private volatile String cachedToString = "[]"; | ||
private volatile boolean updatedAfterCached = true; | ||
|
||
public ConcurrentOpenLongPairRangeSet(LongPairConsumer<T> consumer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what is the purpose of the "consumer"? Since it's not typical to have a consumer on a set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConcurrentOpenLongPairRangeSet<T>
stores T (PositionImpl)
into bits. However, many of the apis require converter to create T (PositionImpl)
from the stored bit. eg: Range<T> firstRange(), void forEach(RangeProcessor<T> action), Range<T> span(), Collection<Range<T>> asRanges()
so, instead adding Converter-Consumer
in signature of all apis, I have added as part of constructor.
Also, com.google.common.collect.RangeSet
doesn't use this converter-consumer because it directly stores T(PositionImpl)
and retrieves it from the stored data-structure. So, we have LongPairRangeSet interface which mostly follows com.google.common.collect.RangeSet
api-definition and i don't want to change them by adding converter-consumer
in each of those methods.
* Usage: | ||
* a. This can be used if one doesn't want to create object for every new inserted {@code range} | ||
* b. It creates {@link BitSet} for every unique first-key of the range. | ||
* So, this rangeSet is not suitable for large number of unique keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be improved by pointing to a static bitset for unique keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think static bitset will work because every RangeSet will have its own state.
eg:
topic T1 has two managed-cursors mc1, mc2.
mc1 and mc2 need separate BitSet to store their separate ranges. so, static bitset will not work here.
.../src/main/java/org/apache/pulsar/common/util/collections/ConcurrentOpenLongPairRangeSet.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public String toString() { | ||
if (updatedAfterCached) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The toString()
should typically never be called in "real-world" (other than tests or debug logs). Updating the volatile
flag will have a small overhead that we could avoid by removing this optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually toString()
is also called everytime when we call stats or stats-internal. so, this flag can help us to use cached output if it's not changed and can save cpu.
* @throws NegativeArraySizeException | ||
* if the specified initial size is negative | ||
*/ | ||
public ConcurrentBitSet(int nbits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify why use a bit-set instead of (first, last) pair for the ranges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I don't understand the question fully. we use bit-set instead pair-objects of ranges because we can avoid object allocation for all those pair-objects and store all of them as bits into long[]
.
Also I have added this description at the very beginning of class java-doc.
@merlimat I answered your comments and addressed the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
rerun cpp tests |
Motivation
Pulsar-broker caches individually deleted message-ids into the RangeSet which stores large number of messageId objects into main memory for longer time. Sometimes some of the clients don't behave properly and generate large number of unack messages and triggers frequent redelivery which continuously allocates large number of messageId objects and it creates very high GC pressure and broker ends up into high gc-pauses which is not acceptable.
To solve this problem, broker should cache messageId into OpenRangeSet to avoid object allocation of messageIds.
Modification
This PR introduces
ConcurrentOpenLongPairRangeSet
that stores ranges without allocating objects.Note:
This PR contains data-structure change for RangeSet and I will create separate PR to use
ConcurrentOpenLongPairRangeSet
intomanaged-ledger
.