[C++][Parquet] Page header does not save statistics once page index is enabled #34375

Closed
wgtmac opened this issue Feb 28, 2023 · 0 comments · Fixed by #35455

wgtmac commented Feb 28, 2023

Describe the enhancement requested

Once writing the page index is supported, we should stop saving page statistics in the data page header, because they duplicate what the page index already stores.

Although parquet-mr disables page statistics globally (link), we cannot do the same because many test cases in parquet-cpp rely on the statistics in the page header.

Component(s)

C++, Parquet

wgtmac added a commit to wgtmac/arrow that referenced this issue May 6, 2023
wgtmac added a commit to wgtmac/arrow that referenced this issue May 8, 2023
wjones127 added the "Breaking Change" label (includes a breaking change to the API) May 9, 2023
wgtmac added a commit to wgtmac/arrow that referenced this issue May 24, 2023
wgtmac added a commit to wgtmac/arrow that referenced this issue Jun 6, 2023
pitrou added this to the 13.0.0 milestone Jun 6, 2023
pitrou pushed a commit that referenced this issue Jun 6, 2023
…bled (#35455)

### Rationale for this change

Page-level statistics in the data page header are probably not used in production, and once column indexes are written they are redundant.
parquet-mr already stopped writing them in https://issues.apache.org/jira/browse/PARQUET-1365.

### What changes are included in this PR?

Once the page index is enabled for a column, the writer no longer writes page statistics to that column's data page headers.
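
The per-column rule described above can be sketched as a small predicate. This is only an illustration under stated assumptions: `ColumnWriteOptions`, `ShouldWritePageHeaderStats`, and `VerifyDecisionTable` are hypothetical names invented for this sketch and are not part of the Arrow or Parquet C++ API, and the actual change in the PR may be structured differently.

```cpp
#include <cassert>

// Hypothetical per-column settings (illustration only, not the parquet-cpp API).
struct ColumnWriteOptions {
  bool statistics_enabled = true;   // statistics are collected for this column
  bool page_index_enabled = false;  // a page index is written for this column
};

// Returns true when page statistics should be embedded in the data page header.
// With a page index enabled, the header copy would be a duplicate, so it is skipped.
inline bool ShouldWritePageHeaderStats(const ColumnWriteOptions& opts) {
  return opts.statistics_enabled && !opts.page_index_enabled;
}

// Small usage example covering the three interesting combinations.
inline void VerifyDecisionTable() {
  ColumnWriteOptions stats_only;  // defaults: statistics on, page index off
  assert(ShouldWritePageHeaderStats(stats_only));

  ColumnWriteOptions with_index;
  with_index.page_index_enabled = true;  // page index on: header stats skipped
  assert(!ShouldWritePageHeaderStats(with_index));

  ColumnWriteOptions no_stats;
  no_stats.statistics_enabled = false;  // nothing collected, nothing to write
  assert(!ShouldWritePageHeaderStats(no_stats));
}
```

Because the decision is made per column, a file can mix columns that still carry statistics in their page headers with columns that rely on the page index alone.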

### Are these changes tested?

Added a test to check page stats have been skipped.

### Are there any user-facing changes?

Yes (behavior change when page index is enabled).

* Closes: #34375

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>