-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++][Parquet] Merging Parquet stats can preserve distinct counts when either side is zero #37014
Comments
sfc-gh-ebrossard
added a commit
to sfc-gh-ebrossard/arrow
that referenced
this issue
Aug 3, 2023
sfc-gh-ebrossard
changed the title
[C++] Merging Parquet stats can preserve distinct counts when either side is zero
[C++][Parquet] Merging Parquet stats can preserve distinct counts when either side is zero
Aug 3, 2023
pitrou
added a commit
that referenced
this issue
Aug 31, 2023
…erging stats (#37016) ### Rationale for this change Statistics merging is unnecessarily conservative with respect to preserving distinct counts. If either side is 0, the merge stats can preserve the distinct count, instead of clearing it. ### What changes are included in this PR? Distinct count is preserved if either the lhs or rhs has a distinct count of 0 when merging stats. ### Are these changes tested? Yes, added a unit test. ### Are there any user-facing changes? I don't think this change to stats is considered to be user-facing. * Closes: #37014 Lead-authored-by: Elliott Brossard <elliott.brossard@snowflake.com> Co-authored-by: Elliott Brossard <64754120+sfc-gh-ebrossard@users.noreply.github.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
loicalleyne
pushed a commit
to loicalleyne/arrow
that referenced
this issue
Nov 13, 2023
…when merging stats (apache#37016) ### Rationale for this change Statistics merging is unnecessarily conservative with respect to preserving distinct counts. If either side is 0, the merge stats can preserve the distinct count, instead of clearing it. ### What changes are included in this PR? Distinct count is preserved if either the lhs or rhs has a distinct count of 0 when merging stats. ### Are these changes tested? Yes, added a unit test. ### Are there any user-facing changes? I don't think this change to stats is considered to be user-facing. * Closes: apache#37014 Lead-authored-by: Elliott Brossard <elliott.brossard@snowflake.com> Co-authored-by: Elliott Brossard <64754120+sfc-gh-ebrossard@users.noreply.github.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss
pushed a commit
to dgreiss/arrow
that referenced
this issue
Feb 19, 2024
…when merging stats (apache#37016) ### Rationale for this change Statistics merging is unnecessarily conservative with respect to preserving distinct counts. If either side is 0, the merge stats can preserve the distinct count, instead of clearing it. ### What changes are included in this PR? Distinct count is preserved if either the lhs or rhs has a distinct count of 0 when merging stats. ### Are these changes tested? Yes, added a unit test. ### Are there any user-facing changes? I don't think this change to stats is considered to be user-facing. * Closes: apache#37014 Lead-authored-by: Elliott Brossard <elliott.brossard@snowflake.com> Co-authored-by: Elliott Brossard <64754120+sfc-gh-ebrossard@users.noreply.github.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
arrow/cpp/src/parquet/statistics.cc
Line 565 in 710f960
this
orother
has a distinct count of 0.I'm preparing a PR, assuming this sounds reasonable to fix.
Component(s)
C++, Parquet
The text was updated successfully, but these errors were encountered: