ARROW-7080: [C++][Parquet] Read and write "field_id" attribute in Parquet files, propagate to Arrow field metadata. Assorted additional changes

The `field_id` is used for schema evolution and other things. It is surfaced in Python in the `Field.metadata` as `b'PARQUET:field_id'`

* `ChunkedArray::Equals` would fail if a child field had unequal metadata, now it does not check the metadata
* Improved diffing output in AssertTablesEqual in testing/gtest_util.h (may need some more tests around this)
* Added a generic binary ChunkedArray iterator (see `internal::MultipleChunkIterator`) and helpful applicator `internal::ApplyToChunkOverlaps`. I retrofitted `ChunkedArray::Equals` to use this (needed it to improve the diffing output in AssertTablesEqual)
* Add `KeyValueMetadata::Merge` method
* Add `Field::WithMergedMetadata` method that calls `KeyValueMetadata::Merge`
* Print metadata in `Field::ToString`
* Add `parquet.ParquetFile.schema_arrow` property to return the effective Arrow schema
* Print field_ids in `parquet::SchemaPrinter`

This also adds a flag `print_metadata` to `Field::ToString` and `Schema::ToString` with default `false` whether to print out the key value metadata, per ARROW-7063.

I figure it's OK to merge this change and then decide whether we want to keep it like that before releasing the software

Closes #6408 from wesm/ARROW-7080 and squashes the following commits:

e0c7396 <Yosuke Shiro> Fix test cases
239932c <Wes McKinney> Remove field metadata outputs from GLib unit test
03f2f18 <Wes McKinney> Add print_metadata option to Field::ToString / Schema::ToString and use expect_equivalent in R unit tests
169f274 <Yosuke Shiro> Use check_metadata instead of metadata
7b1f5a9 <Yosuke Shiro> Use true as the default argument
222af57 <Yosuke Shiro> Fix document of garrow_table_equal()
14fde57 <Yosuke Shiro> Add metadata parameter instead of using true
45f0c79 <Yosuke Shiro> Fix schema equality check
0ce996e <Wes McKinney> export internal::MultipleChunkIterator
2c3f3ac <Wes McKinney> Correct inconsistent comments about null field_id's
6e3bdfd <Wes McKinney> Fix dataset Parquet unit tests
fd099f9 <Wes McKinney> Code review comments
f220767 <Wes McKinney> Start working on properly preserving and deserializing field_id in C++. Some field_id round trips working

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Yosuke Shiro <yosuke.shiro615@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
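
The commit message describes `internal::MultipleChunkIterator` and `internal::ApplyToChunkOverlaps` only briefly. The sketch below illustrates the general idea of walking two chunked sequences with different chunk layouts in lockstep. It is not Arrow's implementation: it uses plain Python lists in place of Arrow arrays, and the names `iter_chunk_overlaps` and `chunked_equals` are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Chunks = Sequence[Sequence[int]]


def iter_chunk_overlaps(left: Chunks, right: Chunks) -> Iterator[Tuple[list, list]]:
    """Yield equal-length (left_piece, right_piece) pairs covering both inputs.

    Each pair is the longest run that lies inside a single chunk on both
    sides, so differing chunk layouts can still be compared piecewise.
    """
    li = ri = 0  # index of the current chunk on each side
    lo = ro = 0  # offset inside the current chunk on each side
    while li < len(left) and ri < len(right):
        lchunk, rchunk = left[li], right[ri]
        # Advance past chunks that are empty or fully consumed.
        if lo == len(lchunk):
            li, lo = li + 1, 0
            continue
        if ro == len(rchunk):
            ri, ro = ri + 1, 0
            continue
        # Length of the overlap between the two current chunks.
        n = min(len(lchunk) - lo, len(rchunk) - ro)
        yield list(lchunk[lo:lo + n]), list(rchunk[ro:ro + n])
        lo += n
        ro += n


def chunked_equals(left: Chunks, right: Chunks) -> bool:
    """Compare two chunked sequences by value, ignoring chunk boundaries."""
    if sum(len(c) for c in left) != sum(len(c) for c in right):
        return False
    return all(a == b for a, b in iter_chunk_overlaps(left, right))


# Same values, different chunk layouts.
a: List[List[int]] = [[1, 2, 3], [4, 5]]
b: List[List[int]] = [[1, 2], [3, 4, 5]]
pieces = list(iter_chunk_overlaps(a, b))
```

Comparing piece by piece, rather than concatenating each side first, means a mismatch can be reported against the specific overlap where it occurs. That is one plausible reason such an iterator helps with diffing output, though the commit message does not spell out the details.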