
PARQUET-41: Add Bloom filter #112

Merged
merged 3 commits into from
Oct 12, 2018


chenjunjiedada
Contributor

Moved the original PR here and added a doc.

@chenjunjiedada
Contributor Author

Hi @rdblue ,

I have added a doc here. Could you please help review it?

@jbapple-cloudera

I had many suggested revisions to the Bloom filter prose, so I thought sending you, @cjjnjust, a pull request would be easier than using GitHub's weak code-review tool.

chenjunjiedada#1

Grammar and structure tweaking for Bloom filter prose.
@jbapple-cloudera

ok, patch LGTM. +1

@jbapple-cloudera

@majetideepak Can you take a look, too?

@majetideepak

@jbapple-cloudera sure! I will make a pass by end of today. I have to catch up on the recent updates.

following formula. The output is in bits per distinct element:

```c
-8 / log(1 - pow(p, 1.0 / 8));
```
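For reference, the expression can be read as the standard Bloom filter sizing approximation solved for bits per element. This sketch assumes the 8 in the formula is the number of hash functions $k$, that $m$ is the filter size in bits and $n$ the number of distinct elements, and that hash functions are independent and uniform, so a given bit stays zero with probability about $e^{-kn/m}$:

$$
\begin{aligned}
p &\approx \left(1 - e^{-kn/m}\right)^{k} \\
p^{1/k} &\approx 1 - e^{-k/b}, \qquad b = m/n \\
b &\approx \frac{-k}{\ln\left(1 - p^{1/k}\right)} \\
b\big|_{k=8} &\approx \frac{-8}{\ln\left(1 - p^{1/8}\right)}
\end{aligned}
$$

Because $\ln(1 - p^{1/k}) < 0$ for $0 < p < 1$, the result $b$ is positive, and the logarithm is the natural one.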


Can you point me to the source of this formula? I tried substituting 0.5% into this and it did not come out to 11.54. I probably missed something.


This is the classic formula for Bloom filters. See the Network Applications paper at the bottom for a proof.

It's very sensitive to p, so 0.4% is closer, and 0.39% closer still.


@majetideepak left a comment


+1 LGTM

@jbapple-cloudera

jbapple-cloudera commented Oct 11, 2018

@majetideepak, you're the only committer to review so far. Do you want to merge this PR?

@majetideepak merged commit 28b84d8 into apache:bloom-filter Oct 12, 2018
majetideepak added a commit that referenced this pull request Oct 12, 2018
majetideepak pushed a commit that referenced this pull request Oct 12, 2018
* PARQUET-41: Add Bloom filter

* Grammar and structure tweaking for Bloom filter prose.

3 participants