ARROW-3728: [Python] Ignore differences in schema custom metadata when writing table to ParquetWriter#3029
kszucs wants to merge 3 commits into apache:master
Can you explain the issue in the PR description? Thank you :)
Do we want to provide an API for merging Parquet files?
Codecov Report
@@            Coverage Diff             @@
##           master    #3029      +/-   ##
==========================================
+ Coverage   86.99%   88.03%   +1.04%
==========================================
  Files         494      425      -69
  Lines       70410    64819    -5591
==========================================
- Hits        61253    57066    -4187
+ Misses       9061     7753    -1308
+ Partials       96        0      -96
Could you clarify what you mean by "merging Parquet files"? Like reading multiple files and writing out a new file? It might be useful, so feel free to open a JIRA issue.
wesm left a comment
+1, thanks! I opened https://issues.apache.org/jira/browse/ARROW-3876 to think more about schema normalization on write in general. We need to be careful about preserving metadata without being too much of a nuisance to the user.
Merging tables with schemas identical field-wise but different in metadata fails.