Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12937][SQL] bloom filter serialization #10920

Closed
wants to merge 3 commits into from

Conversation

cloud-fan
Copy link
Contributor

This PR adds serialization support for BloomFilter.

A version number is added to version the serialized binary format.

/**
* Version number of the serialized binary format for bloom filter or count-min sketch.
*/
public enum Version {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bloom filter and count-min sketch can have different version values, but we can share same version class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move it back, because:

  1. The version enum is actually the best place to document the binary protocol.
  2. This will be really confusing when bloomfilter has v2 and yet count-min sketch has only v1.
  3. The amount of code duplication you save is teeny (actually you probably added more loc by having an apache licensing header).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @liancheng on point 1 - the best place to document the binary protocol is in Version!

@cloud-fan
Copy link
Contributor Author

cc @rxin @liancheng

@@ -24,6 +27,9 @@
private long bitCount;

static int numWords(long numBits) {
if (numBits <= 0) {
throw new IllegalArgumentException("numBits must be positive");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also include the current value

@SparkQA
Copy link

SparkQA commented Jan 26, 2016

Test build #50081 has finished for PR 10920 at commit 4b05a35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

return versionNumber;
}
}

public abstract Version version();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @liancheng , I removed this as the design doc says users should not care about the version being used.

@SparkQA
Copy link

SparkQA commented Jan 26, 2016

Test build #50084 has finished for PR 10920 at commit 38d674c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 26, 2016

Test build #50086 has finished for PR 10920 at commit c9b29c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 26, 2016

I'm going to merge this first. Please move the num hash function thing in your next pr. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants