ARROW-8127: [C++] [Parquet] Incorrect column chunk metadata for multipage batch writes

For buffered column writers and non-dictionary encoded columns, Parquet column chunks that span more than one page get the wrong data_page_offset recorded in the column chunk metadata. This causes the file to be unreadable and unscannable. (Some more info and a test case [here](https://issues.apache.org/jira/browse/ARROW-8127).)

This patch fixes the error by setting the metadata values from information in the underlying ("final") stream sink. It reorders but retains similar logic for dictionary page offsets introduced in [PARQUET-1706](#5922).

Closes #6637 from tpboudreau/ARROW-8127

Authored-by: TP Boudreau <tpboudreau@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
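
The idea behind the fix can be illustrated with a reduced sketch. It is not the patch's code: `Sink`, `ChunkMeta`, and `BufferedChunkWriter` are hypothetical stand-ins, and the sketch assumes a chunk that contains only data pages (no dictionary page). While pages accumulate in an in-memory buffer, any position read from that buffer is relative to the buffer, so the sketch takes the chunk's data_page_offset from the position of the final sink at flush time instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an output stream (not the Arrow OutputStream API).
struct Sink {
  std::string bytes;
  int64_t Tell() const { return static_cast<int64_t>(bytes.size()); }
  void Write(const std::string& data) { bytes += data; }
};

// Hypothetical, reduced column chunk metadata.
struct ChunkMeta {
  int64_t data_page_offset = -1;  // absolute offset of the first data page
  int64_t total_size = 0;         // bytes occupied by the whole chunk
};

// Buffers every page of one column chunk in memory, then copies the pages to
// the final sink when the chunk is closed.
class BufferedChunkWriter {
 public:
  explicit BufferedChunkWriter(Sink* final_sink) : final_sink_(final_sink) {
    assert(final_sink_ != nullptr);
  }

  void WriteDataPage(const std::string& page) {
    // A position taken from buffer_ here would be relative to the in-memory
    // buffer, so it is deliberately not stored as a file offset.
    buffer_.Write(page);
  }

  ChunkMeta Close() {
    ChunkMeta meta;
    // Assumption: the chunk holds only data pages, so the first data page
    // lands exactly where the final sink currently ends.
    meta.data_page_offset = final_sink_->Tell();
    meta.total_size = buffer_.Tell();
    final_sink_->Write(buffer_.bytes);
    buffer_.bytes.clear();
    return meta;
  }

 private:
  Sink* final_sink_;
  Sink buffer_;
};
```

Writing two multipage chunks through this writer into the same `Sink` yields a data_page_offset for the second chunk equal to the end of the first, regardless of how many pages each chunk buffered.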
1 parent 76fd44c, commit 774a9a4
Showing 2 changed files with 54 additions and 1 deletion.