New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimisation: Add zero-garbage deserialiser for ByteBuffer to RoaringBitmap #650
Conversation
- existing most performant way was to convert it to a BitSet and then use BitSetUtil - this adds a helper which you can use to get a RoaringBitmap directly from the byte array you read on the wire
Benchmark Setup
Show ResultsSmall bitsets, wordSize = 64 represents 4096 bits (512 bytes)
Medium sized bitsets, wordSize = 512 represents 32768 bits (~4kb)
Large bitsets, wordSize = 8192 represents 524288 bits (~64kb)
Very large bitsets, wordSize = 131072 represents 8388608 bits (~8.3 million, ~1mb)
|
Also, anyone with more experience with the codebase please confirm this. This is a bug: https://github.com/RoaringBitmap/RoaringBitmap/pull/650/files#diff-608ac1c40d6f95be23548cf97937dfcee083b21634337f6bb57617565c467f05R163 The copy should be (from, to) and not (from, from + BLOCK_LENGTH) Once we fix this, I won't need to zero-out the thread local block buffer after every use. The fixed impl will look like:
EDIT: Went ahead and made the change. As it doesn't change any existing behaviour and benchmarks showed a consistent gain of ~5% when not zeroing out the thread local word buffer after each use. |
- this removes the need to zero-out the threadlocal buffer everytime
Would you consider porting your code over to RoaringBitmap/src/main/java/org/roaringbitmap/buffer/BufferBitSetUtil.java We try to keep them in sync. It should be easy work. |
@@ -71,6 +72,71 @@ public static RoaringBitmap bitmapOf(final long[] words) { | |||
return ans; | |||
} | |||
|
|||
// To avoid memory allocation, reuse ThreadLocal buffers | |||
private static final ThreadLocal<long[]> WORD_BLOCK = ThreadLocal.withInitial(() -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is 8 kB that may get allocated for each thread for one function (below) and it never gets released.
I don't think we want that.
What would be accessible is to allow the user to (optionally) pass a buffer to the function below. If the user passes a buffer, then you use it, otherwise, you allocate it.
In this manner, you give the use full control over the performance, and you don't make people pay 8 kB that they don't want to lock down for one function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will change
ans.highLowContainer.insertNewKeyValueAt(containerIndex++, Util.highbits(offset), | ||
BitSetUtil.containerOf(0, blockLength, blockCardinality, words)); | ||
} | ||
offset += (BLOCK_LENGTH * Long.SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though this can be reasonably dismissed, there is the possibility that offset overflows. Make sure that the offset variable cannot overflow (hopefully it cannot due to the the max size of a Java Bitset, but please be specific, maybe with a comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't do much here, will add a comment.
As you said, this won't overflow unless the BitSet size is more than Integer.MAX_VALUE - 64
.
And even in that case, right at the end where it's not needed anymore.
This is a very good PR. Please consider my comments. |
@lemire made the changes you recommended and also ported to BufferBitSetUtil |
Regarding... private static Container containerOf(final int from, final int to, final int blockCardinality,
final long[] words) {
// find the best container available
if (blockCardinality <= ArrayContainer.DEFAULT_MAX_SIZE) {
// containers with DEFAULT_MAX_SIZE or less integers should be
// ArrayContainers
return arrayContainerOf(from, to, blockCardinality, words);
} else {
// otherwise use bitmap container
return new BitmapContainer(Arrays.copyOfRange(words, from, from + BLOCK_LENGTH),
blockCardinality);
}
} A possible alternative would be... if(to - from < BLOCK_LENGTH) {
long [] newbuffer = new long[BLOCK_LENGTH];
System.arraycopy(words, from, newbuffer, 0, to - from);
Object dest_arr, int destPos, int len)
return new BitmapContainer(newbuffer, blockCardinality);
} else {
return new BitmapContainer(Arrays.copyOfRange(words, from, from + BLOCK_LENGTH),
blockCardinality);
} (This code is untested... it is conceptually correct but could be technically wrong.) If you'd like to make this change (it will need to be done in both version of containerOf, one in the buffer package and one in the main package), or a related change, that would be fine. Please advise. |
Can't really see if that helps with anything. public static long[] copyOfRange(long[] original, int from, int to) {
int newLength = to - from;
if (newLength < 0)
throw new IllegalArgumentException(from + " > " + to);
long[] copy = new long[newLength];
System.arraycopy(original, from, copy, 0,
Math.min(original.length - from, newLength));
return copy;
} Doesn't seem to have changed since 8. So the changes I made are essentially the same, without the extra if checks: private static Container containerOf(final int from, final int to, final int blockCardinality,
final long[] words) {
// find the best container available
if (blockCardinality <= ArrayContainer.DEFAULT_MAX_SIZE) {
// containers with DEFAULT_MAX_SIZE or less integers should be
// ArrayContainers
return arrayContainerOf(from, to, blockCardinality, words);
} else {
// otherwise use bitmap container
long[] container = new long[BLOCK_LENGTH];
System.arraycopy(words, from, container, 0, to - from);
return new BitmapContainer(container, blockCardinality);
}
} |
@@ -15,7 +16,7 @@ public class BitSetUtil { | |||
|
|||
// a block consists has a maximum of 1024 words, each representing 64 bits, | |||
// thus representing at maximum 65536 bits | |||
static final private int BLOCK_LENGTH = BitmapContainer.MAX_CAPACITY / Long.SIZE; // | |||
public static final int BLOCK_LENGTH = BitmapContainer.MAX_CAPACITY / Long.SIZE; // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to make BLOCK_LENGTH public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see any other neat way to expose the information.
Since the user can provide the buffer, they need to know atleast what size it needs to be.
Can just mention it in Javadoc, the bounds check will anyways raise error if bad sized buffer is provided.
had hidden locally, forgot to uncomment
Ok. The PR looks good. Running tests. I will give some time for folks to come in and comment. This will be part of the next release. |
Hey @lemire, any plans of merging/releasing this soon? |
Merging. I will issue a release. |
### What changes were proposed in this pull request? - The pr aims to upgrade RoaringBitmap from 0.9.45 to 1.0.0. - From version 1.0.0, the `ArraysShim` class has been moved from `shims-x.x.x.jar` jar to `RoaringBitmap-x.x.x.jar` jar, so we no longer need to rely on it. ### Why are the changes needed? - The newest brings some improvments, eg: Add zero-garbage deserialiser for ByteBuffer to RoaringBitmap by shikharid in RoaringBitmap/RoaringBitmap#650 More specialized method for value decrementation by xtonik in RoaringBitmap/RoaringBitmap#640 Duplicated small array sort routine by xtonik in RoaringBitmap/RoaringBitmap#638 Avoid intermediate byte array creation by xtonik in RoaringBitmap/RoaringBitmap#635 Useless back and forth BD bytes conversion by xtonik in RoaringBitmap/RoaringBitmap#636 - The full release notes: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.0.0 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.49 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.48 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.47 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.46 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #42143 from panbingkun/SPARK-44539. Authored-by: panbingkun <pbk1982@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>
SUMMARY
RESULTS
Automated Checks
./gradlew test
and made sure that my PR does not break any unit test../gradlew checkstyleMain
or the equivalent and corrected the formatting warnings reported.