[SPARK-12937][SQL] bloom filter serialization #10920
/**
 * Version number of the serialized binary format for bloom filter or count-min sketch.
 */
public enum Version {
Bloom filter and count-min sketch can have different version values, but they can share the same Version class.
I think we should move it back, because:
- The version enum is actually the best place to document the binary protocol.
- This will be really confusing when BloomFilter has v2 and yet count-min sketch has only v1.
- The amount of code duplication you save is teeny (actually you probably added more LOC by having an Apache license header).
cc @liancheng on point 1 - the best place to document the binary protocol is in Version!
cc @rxin @liancheng
@@ -24,6 +27,9 @@
  private long bitCount;

  static int numWords(long numBits) {
    if (numBits <= 0) {
      throw new IllegalArgumentException("numBits must be positive");
Also include the current value of numBits in the exception message.
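A minimal sketch of what that suggestion could look like. Only the `numBits <= 0` check comes from the diff above; the wrapper class name `NumWordsSketch`, the message wording, and the word-count arithmetic (assuming 64-bit words) are assumptions for illustration, not the code that was merged.

```java
// Hypothetical wrapper class, used only so the snippet compiles on its own.
final class NumWordsSketch {

    // Returns how many 64-bit words are needed to hold numBits bits.
    static int numWords(long numBits) {
        if (numBits <= 0) {
            // Reporting the offending value makes the failure easier to diagnose.
            throw new IllegalArgumentException(
                "numBits must be positive, but got " + numBits);
        }
        // Ceiling division without overflow for any positive long (assumed formula).
        long words = (numBits - 1) / 64 + 1;
        if (words > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Can't allocate enough space for " + numBits + " bits");
        }
        return (int) words;
    }
}
```

With this shape, a call such as `NumWordsSketch.numWords(-5)` fails with a message that contains `-5` rather than only stating the constraint.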
Test build #50081 has finished for PR 10920 at commit
    return versionNumber;
  }
}

public abstract Version version();
cc @liancheng, I removed this because the design doc says users should not care about the version being used.
Test build #50084 has finished for PR 10920 at commit
Test build #50086 has finished for PR 10920 at commit
I'm going to merge this first. Please move the num hash function thing in your next PR. Thanks.
This PR adds serialization support for BloomFilter.
A version number is added to version the serialized binary format.
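
To illustrate the idea of a version-prefixed binary format, here is a rough sketch. The byte layout (an int version number, an int word count, then the words as big-endian longs), the class name `VersionedBitsSketch`, and the method names `write`, `read`, and `getVersionNumber` are all assumptions made for this example; the thread does not spell out the actual format used by the PR. Following the review comment above, the layout is documented on the version enum itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example class; not part of the PR.
final class VersionedBitsSketch {

    enum Version {
        /**
         * Assumed layout for this sketch (all values big-endian):
         *   int     version number (= 1)
         *   int     number of words
         *   long[]  the words themselves
         */
        V1(1);

        private final int versionNumber;

        Version(int versionNumber) {
            this.versionNumber = versionNumber;
        }

        int getVersionNumber() {
            return versionNumber;
        }
    }

    // Serializes the words, writing the version number first.
    static byte[] write(long[] words) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(Version.V1.getVersionNumber());
            out.writeInt(words.length);
            for (long word : words) {
                out.writeLong(word);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the words back, rejecting data written with an unknown version.
    static long[] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != Version.V1.getVersionNumber()) {
                throw new IOException("Unexpected serialized version number: " + version);
            }
            long[] words = new long[in.readInt()];
            for (int i = 0; i < words.length; i++) {
                words[i] = in.readLong();
            }
            return words;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the reader checks the version before touching the payload, a later format could be introduced as a new enum constant with its own documented layout, without readers silently misinterpreting old or new data.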