[SPARK-10301] [SPARK-10428] [SQL] [BRANCH-1.5] Fixes schema merging for nested structs #8583
cc @yhuai @cloud-fan
Test build #41965 has finished for PR 8583 at commit
Test build #41963 has finished for PR 8583 at commit
Here is a simple repo of mine that I use to experiment with various Parquet compatibility and interoperability issues (https://github.com/liancheng/parquet-compat). The most convenient part is that you can interactively create Parquet files with arbitrary physical structures using a DSL from the SBT Scala console. It could be helpful for reviewers who would like to verify various corner cases.
Test build #41972 has finished for PR 8583 at commit
Can you change the title of this PR to something like
@yhuai Done.
LGTM
Test build #42007 has finished for PR 8583 at commit
LGTM
}
}

ignore("SPARK-10301 requested schema clipping - schemas with disjoint sets of fields") {
This test case is ignored because of a bug that probably comes from the parquet-mr side. I'm verifying this.
Confirmed that this is a parquet-mr bug.
Test build #42123 has finished for PR 8583 at commit
Test build #42131 has finished for PR 8583 at commit
Test build #42139 has finished for PR 8583 at commit
Test build #42140 has finished for PR 8583 at commit
Test cases are good.
LGTM
Thanks. I am merging it to branch 1.5.
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. This PR therefore backports PR #8509, which was previously considered too large a change to merge into branch-1.5 at the last minute, so that both SPARK-10301 and SPARK-10428 are fixed in Spark 1.5. It also adds more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 LOC. All other changes are for testing. In particular, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe, since #8454 only touches testing code.

Author: Cheng Lian <lian@databricks.com>

Closes #8583 from liancheng/spark-10301/for-1.5.
OK. It has been merged.
Thanks all for your review efforts!
…8509 for master Author: Cheng Lian <lian@databricks.com> Closes #8670 from liancheng/spark-10301/address-pr-comments.
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR apache#8515), but it doesn't cover the case described in SPARK-10428. This PR therefore backports PR apache#8509, which was previously considered too large a change to merge into branch-1.5 at the last minute, so that both SPARK-10301 and SPARK-10428 are fixed in Spark 1.5. It also adds more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 LOC. All other changes are for testing. In particular, PR apache#8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR apache#8515 depends on it. This should be safe, since apache#8454 only touches testing code.

Author: Cheng Lian <lian@databricks.com>

Closes apache#8583 from liancheng/spark-10301/for-1.5.

(cherry picked from commit fca16c5)

Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. This PR therefore backports PR #8509, which was previously considered too large a change to merge into branch-1.5 at the last minute, so that both SPARK-10301 and SPARK-10428 are fixed in Spark 1.5. It also adds more test cases for SPARK-10428.
This PR looks big, but the essential change is only ~200 LOC. All other changes are for testing. In particular, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe, since #8454 only touches testing code.
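For reviewers unfamiliar with the idea behind "requested schema clipping" (the term used in the test names of this PR), here is a minimal sketch of what clipping a file schema against a requested schema could look like for nested structs. It is not the code from this PR: `Node`, `Leaf`, `Group`, and `SchemaClipping.clip` are hypothetical names in a toy model, and the type strings are placeholders rather than real Parquet or Catalyst types. The assumed behavior is that fields found in the file keep the file's type, fields only in the file are dropped, and fields only in the request are kept as requested so that a reader can fill them with nulls.

```scala
// Toy schema model for illustration only; these are NOT Spark or parquet-mr types.
sealed trait Node
final case class Leaf(tpe: String) extends Node
final case class Group(fields: List[(String, Node)]) extends Node

object SchemaClipping {
  // Builds a schema that follows the field order of `requested`.
  // Assumption: type mismatches (leaf vs. group) are not handled specially;
  // the file's node simply wins in that case.
  def clip(file: Group, requested: Group): Group = {
    val inFile: Map[String, Node] = file.fields.toMap
    Group(requested.fields.map { case (name, want) =>
      val clipped: Node = (inFile.get(name), want) match {
        // Both sides are structs: recurse so nested fields are clipped too.
        case (Some(have: Group), wantGroup: Group) => clip(have, wantGroup)
        // Field exists in the file: keep the file's own type.
        case (Some(have), _) => have
        // Field is missing from the file: keep the requested definition.
        case (None, _) => want
      }
      name -> clipped
    })
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val file = Group(List(
      "s" -> Group(List("a" -> Leaf("int32"), "b" -> Leaf("binary"))),
      "x" -> Leaf("int32")))
    // Requested nested fields are disjoint from the ones stored in the file.
    val requested = Group(List("s" -> Group(List("c" -> Leaf("int")))))
    println(SchemaClipping.clip(file, requested))
  }
}
```

In this toy model, the disjoint case in `Demo` yields a struct `s` that contains only the requested field `c`, which roughly mirrors the scenario of the ignored test case mentioned in the review comments.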