ARROW-3728: [Python] Ignore differences in schema custom metadata when writing table to ParquetWriter #3029

Closed
kszucs wants to merge 3 commits into apache:master from kszucs:ARROW-3728

Conversation

@kszucs
Member

@kszucs kszucs commented Nov 25, 2018

Merging tables whose schemas are identical field-wise but differ in custom metadata fails.
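To make the failure mode concrete, here is a minimal sketch of a schema comparison that can optionally ignore custom metadata. It is a simplified stand-in and not pyarrow's implementation: the `Field` and `Schema` classes, the `equals` method, and its `check_metadata` flag are hypothetical names chosen for illustration, and the `pandas` metadata key is assumed as the kind of custom metadata that differs between tables.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Field:
    """Hypothetical field: just a name and a type label."""
    name: str
    type: str


@dataclass
class Schema:
    """Hypothetical schema: ordered fields plus optional custom metadata."""
    fields: Tuple[Field, ...]
    metadata: Optional[Dict[str, str]] = None

    def equals(self, other: "Schema", check_metadata: bool = True) -> bool:
        # Fields must always match, in order, by name and type.
        if self.fields != other.fields:
            return False
        # Custom metadata is compared only when explicitly requested.
        if check_metadata:
            return (self.metadata or {}) == (other.metadata or {})
        return True


fields = (Field("id", "int64"), Field("value", "double"))
plain = Schema(fields)
with_pandas = Schema(fields, {"pandas": '{"index_columns": []}'})

# A strict comparison rejects the pair; a metadata-agnostic one accepts it.
strict_match = plain.equals(with_pandas)
relaxed_match = plain.equals(with_pandas, check_metadata=False)
```

Under these assumptions, a writer that validates incoming tables with the strict comparison would reject the second table, while one that uses the relaxed comparison would accept it as long as the fields agree.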

@wesm
Member

wesm commented Nov 25, 2018

Can you explain the issue in the PR description? Thank you :)

@kszucs
Member Author

kszucs commented Nov 25, 2018

Do we want to provide an API for merging Parquet files?

@codecov-io

Codecov Report

Merging #3029 into master will increase coverage by 1.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3029      +/-   ##
==========================================
+ Coverage   86.99%   88.03%   +1.04%     
==========================================
  Files         494      425      -69     
  Lines       70410    64819    -5591     
==========================================
- Hits        61253    57066    -4187     
+ Misses       9061     7753    -1308     
+ Partials       96        0      -96
| Impacted Files | Coverage Δ |
|---|---|
| python/pyarrow/parquet.py | 93.89% <100%> (+0.19%) ⬆️ |
| python/pyarrow/tests/test_parquet.py | 97.39% <100%> (+0.02%) ⬆️ |
| cpp/src/arrow/util/thread-pool-test.cc | 98.91% <0%> (-0.55%) ⬇️ |
| rust/src/record_batch.rs | |
| go/arrow/array/table.go | |
| rust/src/array.rs | |
| go/arrow/math/uint64_amd64.go | |
| go/arrow/internal/testing/tools/bool.go | |
| go/arrow/array/bufferbuilder.go | |
| go/arrow/internal/bitutil/bitutil.go | |
| ... and 64 more | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7281731...2e12f24.

@wesm
Member

wesm commented Nov 25, 2018

Could you clarify what you mean by "merging parquet files"? Do you mean reading multiple files and writing out a new file? It might be useful; feel free to open a JIRA issue.

Member

@wesm wesm left a comment

+1, thanks! I opened https://issues.apache.org/jira/browse/ARROW-3876 to think more about schema normalization on write in general. We need to be careful to preserve metadata without being too much of a nuisance to the user.

@wesm wesm changed the title from "ARROW-3728: [Python] Merging Parquet Files - Pandas Meta in Schema Mismatch" to "ARROW-3728: [Python] Ignore differences in schema custom metadata when writing table to ParquetWriter" on Nov 25, 2018
@wesm wesm closed this in 10b204e on Nov 25, 2018
3 participants