[CARBONDATA-2454][DataMap] Add fpp property for bloom datamap #2279

Closed

xuchuanyin
Contributor

Add an fpp (false positive probability) property to configure the bloom filter
that is used by the bloom datamap.

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    Only internally used interfaces were changed
  • Any backward compatibility impacted?
    NO
  • Document update required?
    NO
  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    Tests will be added in the future
    - How it is tested? Please attach test report.
    Tested on a standalone 3-node cluster
    - Is it a performance related change? Please attach the performance test report.
    Yes, a properly chosen fpp can improve query performance
    - Any additional information to help reviewers in testing this change.
    NA
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5722/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4561/

/**
 * Property for the fpp (false positive probability) of the bloom filter.
 */
private static final String BLOOM_FPP = "bloom_fpp";
Contributor

Does the user need to configure this? What is its relationship with DEFAULT_BLOOM_FILTER_SIZE?

Contributor Author

@xuchuanyin xuchuanyin May 8, 2018

Yes, it is a configuration for the bloom filter. In the previous implementation, p was
fixed at 0.00001.

If
n : bloomfilterSize
p : fpp

then
p decides the number of hash functions used internally by the bloom filter
n and p together decide the number of bytes used internally by the bloom filter
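
For reference, the relationship between n, p, the number of hash functions, and the filter length can be sketched with the textbook bloom filter sizing formulas. This is an illustration only, not the CarbonData code: the class and method names are hypothetical, and the bloom filter library actually used by the datamap may round or allocate differently.

```java
/**
 * Textbook bloom filter sizing. Names here are illustrative assumptions,
 * not identifiers from the CarbonData code base.
 */
final class BloomSizing {
  private BloomSizing() {}

  /** Optimal number of hash functions: k = -log2(p), at least 1. */
  static int optimalNumHashFunctions(double p) {
    checkFpp(p);
    return Math.max(1, (int) Math.round(-Math.log(p) / Math.log(2)));
  }

  /** Optimal number of bits: m = -n * ln(p) / (ln 2)^2, rounded up. */
  static long optimalNumBits(long n, double p) {
    if (n <= 0) {
      throw new IllegalArgumentException("n must be positive: " + n);
    }
    checkFpp(p);
    double ln2 = Math.log(2);
    return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
  }

  /** Bytes needed to hold the bit array, rounded up to a whole byte. */
  static long optimalNumBytes(long n, double p) {
    return (optimalNumBits(n, p) + 7) / 8;
  }

  /** The false positive probability must lie strictly between 0 and 1. */
  private static void checkFpp(double p) {
    if (!(p > 0.0 && p < 1.0)) {
      throw new IllegalArgumentException("fpp must be in (0, 1): " + p);
    }
  }
}
```

Under these formulas, p = 0.00001 corresponds to 17 hash functions, while relaxing p to 0.01 corresponds to 7 hash functions and a smaller bit array for the same n.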

Contributor

Can you add a test case for it?

Contributor

Can the user always control the bloom filter memory size by setting n?

Contributor Author

Yes, by setting n and p, the user can control the bloom filter size (memory) and the number of hash functions (CPU).

Contributor Author

The test case will be added after your PR #2255 is merged, since that PR makes a lot of changes to the code.

@jackylk
Contributor

jackylk commented May 9, 2018

LGTM

@asfgit asfgit closed this in 6b94971 May 9, 2018
anubhav100 pushed a commit to anubhav100/incubator-carbondata that referenced this pull request Jun 22, 2018
Add an fpp (false positive probability) property to configure the bloom filter
that is used by the bloom datamap.

This closes apache#2279