[SPARK-12938][SQL] DataFrame API for Bloom filter #10937
Conversation
cc @rxin @liancheng
Force-pushed from 139e56f to a0dcaa8.
Test build #50156 has finished for PR 10937 at commit
val seqOp: (BloomFilter, InternalRow) => BloomFilter = if (colType == StringType) {
  (filter, row) =>
    filter.putBinary(row.getUTF8String(0).getBytes)
    filter
Also add a comment to explain the branching here?
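For context, a minimal sketch of how the type-specialized seqOp might be structured; the makeSeqOp helper name, the else branch, and the LongType handling are illustrative assumptions, not the PR's exact code:

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.util.sketch.BloomFilter

// Illustrative sketch only: the branch on the column type is resolved once, up
// front, so the per-row closure adds values without any per-row type dispatch.
def makeSeqOp(colType: DataType): (BloomFilter, InternalRow) => BloomFilter =
  if (colType == StringType) {
    // String column: feed the raw UTF-8 bytes through the putBinary path.
    (filter, row) => {
      filter.putBinary(row.getUTF8String(0).getBytes)
      filter
    }
  } else {
    // Integral column (assumed LongType here for brevity): use putLong.
    (filter, row) => {
      filter.putLong(row.getLong(0))
      filter
    }
  }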
@@ -96,6 +96,16 @@ int getVersionNumber() {
  public abstract boolean put(Object item);

  /**
   * A specific version of {@link #put(Object)}, that can only be used to put byte array.
specific -> specialized
version -> variant
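As a small, illustrative usage sketch of the two specialized put methods this hunk documents (the sizing value 1000 and the sample inputs are made up):

import java.nio.charset.StandardCharsets
import org.apache.spark.util.sketch.BloomFilter

val bf = BloomFilter.create(1000)  // expect roughly 1000 distinct items
bf.putLong(42L)                                          // long values, no boxing
bf.putBinary("spark".getBytes(StandardCharsets.UTF_8))   // raw byte arrays, e.g. UTF-8 encoded strings
assert(bf.mightContain(42L))  // always true for values that were put; false positives are possible for others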
Since the two (cms and bf) are implemented by two different people, it'd be great for one of you to go through both to make sure everything is consistent. We can do that in a follow-up pull request.
Force-pushed from d4e27bc to bd0671c.
retest this please
Test build #50208 has finished for PR 10937 at commit
Thanks - going to merge this.
This PR integrates the Bloom filter from spark-sketch into DataFrame. This version uses RDD.aggregate to build the filter (see the sketch below); a more performant UDAF-based version can be built in follow-up PRs.
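Roughly, the aggregation looks like the following sketch; the buildFilter helper, the RDD[Long] input, and the sizing parameters are illustrative assumptions, not the PR's exact implementation:

import org.apache.spark.rdd.RDD
import org.apache.spark.util.sketch.BloomFilter

// Illustrative sketch: fold each partition into a filter, then merge the partial filters.
def buildFilter(values: RDD[Long], expectedNumItems: Long, fpp: Double): BloomFilter = {
  val zero = BloomFilter.create(expectedNumItems, fpp)
  values.aggregate(zero)(
    (filter, v) => { filter.putLong(v); filter },  // seqOp: add each value within a partition
    (f1, f2) => f1.mergeInPlace(f2)                // combOp: merge the per-partition filters
  )
}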
This PR also adds two specialized put variants (putBinary and putLong) to BloomFilter, which makes it easier to build a Bloom filter over a DataFrame.
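For reference, end-to-end usage of the DataFrame API added here would look roughly like this, assuming a Spark 2.x SparkSession named spark; the column name and sizing parameters are illustrative:

import org.apache.spark.util.sketch.BloomFilter

val df = spark.range(0, 10000).toDF("id")

// Build a Bloom filter over the "id" column: expect ~10000 items, ~3% false positive rate.
val filter: BloomFilter = df.stat.bloomFilter("id", 10000L, 0.03)

assert(filter.mightContain(42L))  // always true for inserted values
// filter.mightContain(-1L) is usually false, but false positives are possible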