
@chenjunjiedada
Contributor

Add bloom filter for parquet

  • A bloom filter is built per column, and its data is stored after each row group.

  • The bloom filter size can be set by the user or calculated automatically.

  • The row group filter can skip a row group after evaluating the predicate against the bloom filter data.
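
To make the three points above concrete, here is a minimal Python sketch of the idea. It is not the code in this pull request: the hash scheme (SHA-256 with double hashing), the sizing formula (the textbook m = -n ln p / (ln 2)^2), and the names `optimal_num_bits`, `BloomFilter`, `build_filter`, and `row_groups_to_read` are all assumptions chosen for illustration.

```python
import hashlib
import math


def optimal_num_bits(ndv: int, fpp: float) -> int:
    """Textbook sizing: m = -n * ln(p) / (ln 2)^2, rounded up (assumed formula)."""
    return math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))


class BloomFilter:
    """A plain bit-array bloom filter for the values of one column in one row group."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: bytes):
        # Double hashing over one SHA-256 digest (an illustrative choice only).
        digest = hashlib.sha256(value).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: bytes) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value)
        )


def build_filter(values, fpp: float = 0.01, num_bits=None) -> BloomFilter:
    """Build a filter with a user-supplied size, or size it from the distinct count."""
    distinct = set(values)
    n = max(len(distinct), 1)
    if num_bits is None:
        num_bits = optimal_num_bits(n, fpp)
    num_hashes = max(1, round(num_bits / n * math.log(2)))
    bloom = BloomFilter(num_bits, num_hashes)
    for value in distinct:
        bloom.add(value)
    return bloom


def row_groups_to_read(row_group_filters, column: str, value: bytes):
    """Return indices of row groups that may satisfy `column == value`."""
    keep = []
    for index, filters in enumerate(row_group_filters):
        bloom = filters.get(column)
        # A row group without a filter for this column cannot be skipped.
        if bloom is None or bloom.might_contain(value):
            keep.append(index)
    return keep
```

A reader would call `row_groups_to_read(filters, "id", b"a")` before scanning and read only the returned row groups. Because a bloom filter can return false positives but never false negatives, a skipped row group is guaranteed not to contain the value, while a kept one still has to be scanned.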

Please see the related document here: https://docs.google.com/document/d/1mIZ0W24Cr79QHJWN1sQ3dIUc4lAK5AVqozwSwtpFhW8/edit?usp=sharing

Note that this change depends on a parquet-format change.

Junjie chen added 2 commits August 28, 2017 15:32
@chenjunjiedada
Contributor Author

chenjunjiedada commented Aug 30, 2017

I'm not sure why the thrift-0.7.0 build failed.

@chenjunjiedada
Contributor Author

I am closing this for now because it depends on the parquet-format pull request.
