[C++][Parquet] Support read BloomFilter in reader #15164

Closed
2 of 6 tasks
mapleFU opened this issue Jan 3, 2023 · 4 comments

Comments

@mapleFU
Member

mapleFU commented Jan 3, 2023

Describe the enhancement requested

Currently, Parquet C++ provides a bloom_filter class, but it does not conform to the Parquet standard. Parquet C++ also doesn't support reading bloom filters from files. I'd like to add support for reading bloom filters. I may split the task into these steps:

Component(s)

C++, Parquet

@rip-nsk
Contributor

rip-nsk commented Jan 11, 2023

@mapleFU
Is BloomFilter supported in the C++ writer? It seems not.

@wgtmac
Member

wgtmac commented Jan 12, 2023

@mapleFU Is BloomFilter supported in the C++ writer? It seems not.

@rip-nsk The short answer is no.

Write support is already planned and will follow once read support is complete.

@mapleFU
Member Author

mapleFU commented Jan 12, 2023

Is BloomFilter supported in the C++ writer? It seems not.

No. I would like to add support for it during China's Spring Festival, and some bug fixes might be added later this week.

pitrou added a commit that referenced this issue Feb 1, 2023
#33776)

The original Parquet Bloom Filter spec, added in 2018 (PARQUET-41), was based on the Murmur3 hash function. The spec was later changed to use the XXH64 hash function from xxHash (PARQUET-1609); however, Parquet C++ wasn't updated for this and kept implementing the original spec.

This PR switches the bloom filter implementation to the current version of the Bloom Filter spec (as of apache/parquet-format@3fb10e0). Conformance is tested using a dedicated test file in the parquet-testing repository (`bloom_filter.xxhash.bin`).

Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
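
For readers unfamiliar with the structure the commit message refers to, here is a minimal sketch of a split-block Bloom filter along the lines of the current Parquet Bloom Filter spec. It is not the Arrow/Parquet C++ API: the class name `SplitBlockBloomFilterSketch` and its methods are hypothetical, and the salt constants and block-selection rule are reproduced from the spec as best understood, so check them against parquet-format before relying on them. Hashing is deliberately left out; the sketch assumes the caller has already computed a 64-bit hash of the value (XXH64 with seed 0 under the current spec, Murmur3 under the original one).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Salt constants for the eight 32-bit words of a block
// (assumed to match the Parquet Bloom Filter spec).
constexpr uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
                               0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// Hypothetical illustration class, not part of Arrow or Parquet C++.
class SplitBlockBloomFilterSketch {
 public:
  // A block is 256 bits, stored as eight 32-bit words.
  using Block = std::array<uint32_t, 8>;

  explicit SplitBlockBloomFilterSketch(uint32_t num_blocks) : blocks_(num_blocks) {
    assert(num_blocks > 0);
  }

  // Record a value by its precomputed 64-bit hash.
  void InsertHash(uint64_t hash) {
    Block& block = blocks_[BlockIndex(hash)];
    const Block mask = MakeMask(static_cast<uint32_t>(hash));
    for (int i = 0; i < 8; ++i) block[i] |= mask[i];
  }

  // Returns false if the value is definitely absent, true if it may be present.
  bool FindHash(uint64_t hash) const {
    const Block& block = blocks_[BlockIndex(hash)];
    const Block mask = MakeMask(static_cast<uint32_t>(hash));
    for (int i = 0; i < 8; ++i) {
      if ((block[i] & mask[i]) == 0) return false;
    }
    return true;
  }

 private:
  // The upper 32 bits of the hash select the block.
  std::size_t BlockIndex(uint64_t hash) const {
    return static_cast<std::size_t>(((hash >> 32) * blocks_.size()) >> 32);
  }

  // The lower 32 bits of the hash select one bit in each of the eight words.
  static Block MakeMask(uint32_t key) {
    Block mask{};
    for (int i = 0; i < 8; ++i) {
      mask[i] = uint32_t{1} << ((key * kSalt[i]) >> 27);
    }
    return mask;
  }

  std::vector<Block> blocks_;
};
```

Because only the hash function differs between the two spec versions, a filter written with one hash cannot be queried reliably with the other, which is why the conformance test file matters.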
pitrou added this to the 12.0.0 milestone Feb 1, 2023
@pitrou
Member

pitrou commented Feb 1, 2023

Issue resolved by pull request 33776
#33776

pitrou closed this as completed Feb 1, 2023
sjperkins pushed a commit to sjperkins/arrow that referenced this issue Feb 10, 2023
…er spec (apache#33776)
gringasalpastor pushed a commit to gringasalpastor/arrow that referenced this issue Feb 17, 2023
…er spec (apache#33776)
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Feb 24, 2023
…er spec (apache#33776)