[C++][Parquet] Fix updating page statistics for WriteArrowDictionary #34106

Closed
wgtmac opened this issue Feb 9, 2023 · 0 comments · Fixed by #34107

wgtmac commented Feb 9, 2023

Describe the bug, including details regarding any error messages, version, and platform.

The commit for issue #15042 fixed the missing statistics, but problems remain. For example, when a single write is split into more than one batch and therefore produces several data pages, the statistics of each page are not updated correctly.
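To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use the Parquet C++ classes: `PageStats`, `WriteInBatches`, and the `update_every_batch` flag are hypothetical names invented for this example, and the "buggy" path is only one plausible way per-page statistics can go stale, not a reproduction of the actual `WriteArrowDictionary` code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical min/max statistics for one data page (not a Parquet C++ type).
struct PageStats {
  int64_t min = std::numeric_limits<int64_t>::max();
  int64_t max = std::numeric_limits<int64_t>::min();
  int64_t num_values = 0;

  // Fold n values into the running min/max/count.
  void Update(const int64_t* values, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      min = std::min(min, values[i]);
      max = std::max(max, values[i]);
    }
    num_values += static_cast<int64_t>(n);
  }
};

// Toy writer: splits one write into batches and emits one page per batch.
// With update_every_batch == false, only the first batch updates its
// statistics, which models (as an assumption) the stale-statistics symptom.
std::vector<PageStats> WriteInBatches(const std::vector<int64_t>& values,
                                      std::size_t batch_size,
                                      bool update_every_batch) {
  std::vector<PageStats> pages;
  if (batch_size == 0) return pages;  // nothing sensible to do

  for (std::size_t offset = 0; offset < values.size(); offset += batch_size) {
    const std::size_t n = std::min(batch_size, values.size() - offset);
    PageStats stats;
    if (update_every_batch || offset == 0) {
      stats.Update(values.data() + offset, n);
    }
    pages.push_back(stats);  // one page per batch in this toy model
  }
  return pages;
}
```

Calling `WriteInBatches(values, 2, true)` on six values yields three pages, each with its own min and max; with `false`, the second and third pages keep their empty default statistics. A regression test for this kind of bug would therefore need a write large enough to span more than one page.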

Component(s)

C++, Parquet

wgtmac changed the title from "[C++][Parquet] Fix" to "[C++][Parquet] Fix updating page statistics for WriteArrowDictionary" Feb 9, 2023
wjones127 pushed a commit that referenced this issue Feb 21, 2023
…nary (#34107)

### Rationale for this change

`ColumnWriter::WriteArrowDictionary` tries to update the statistics, but does so incorrectly when a single write is split into batches and more than one page is written.

### What changes are included in this PR?

Make sure the write of every batch updates the statistics.

### Are these changes tested?

Add a test case that fails without the fix.

### Are there any user-facing changes?

No.
* Closes: #34106

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
@wjones127 wjones127 added this to the 12.0.0 milestone Feb 21, 2023
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Feb 24, 2023
…Dictionary (apache#34107)
