Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10301] [SPARK-10428] [SQL] [BRANCH-1.5] Fixes schema merging for nested structs #8583

Closed
wants to merge 8 commits into from

Conversation

liancheng
Copy link
Contributor

We used to workaround SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR #8509, which had once been considered too big a change to be merged into branch-1.5 in the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. Also added more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 loc. All other changes are for testing. Especially, PR #8454 is also backported here because the ParquetInteroperabilitySuite introduced in PR #8515 depends on it. This should be safe since #8454 only touches testing code.

@liancheng
Copy link
Contributor Author

cc @yhuai @cloud-fan

@SparkQA
Copy link

SparkQA commented Sep 3, 2015

Test build #41965 has finished for PR 8583 at commit 2d7949c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 3, 2015

Test build #41963 has finished for PR 8583 at commit c2a6177.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

Here is a simple repo of mine, used for experiment with various Parquet compatibility and interoperability issues https://github.com/liancheng/parquet-compat The most convenient part is that, you can create Parquet files with arbitrary physical structures using a DSL interactively with the help of SBT Scala console. Could be helpful for reviewers who would like to verify various corner cases.

@SparkQA
Copy link

SparkQA commented Sep 3, 2015

Test build #41972 has finished for PR 8583 at commit bc36329.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Sep 4, 2015

Can you change the title of this PR to something like [SPARK-10301] [SPARK-10428] [SQL] [BRANCH-1.5] Fixes schema merging for nested structs?

@liancheng liancheng changed the title [SPARK-10301] [SPARK-10428] Backports PR #8509 to branch-1.5 [SPARK-10301] [SPARK-10428] [SQL] [BRANCH-1.5] Fixes schema merging for nested structs Sep 4, 2015
@liancheng
Copy link
Contributor Author

@yhuai Done.

@liancheng
Copy link
Contributor Author

@davies @yhuai I addressed your comments in #8509 here. Will open a separate PR later to fix them for master.

@davies
Copy link
Contributor

davies commented Sep 4, 2015

LGTM

@SparkQA
Copy link

SparkQA commented Sep 4, 2015

Test build #42007 has finished for PR 8583 at commit cab8701.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM

}
}

ignore("SPARK-10301 requested schema clipping - schemas with disjoint sets of fields") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is ignored because of a bug probably coming from parquet-mr side. I'm verifying this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that this is a parquet-mr bug.

@SparkQA
Copy link

SparkQA commented Sep 8, 2015

Test build #42123 has finished for PR 8583 at commit cd91a26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 8, 2015

Test build #42131 has finished for PR 8583 at commit 7eb7e09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 8, 2015

Test build #42139 has finished for PR 8583 at commit 85ce339.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 8, 2015

Test build #42140 has finished for PR 8583 at commit 8812f64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Sep 8, 2015

test cases are good.

@yhuai
Copy link
Contributor

yhuai commented Sep 8, 2015

LGTM

@yhuai
Copy link
Contributor

yhuai commented Sep 9, 2015

Thanks. I am merging it to branch 1.5.

asfgit pushed a commit that referenced this pull request Sep 9, 2015
…or nested structs

We used to workaround SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR #8509, which had once been considered too big a change to be merged into branch-1.5 in the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. Also added more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 loc. All other changes are for testing. Especially, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe since #8454 only touches testing code.

Author: Cheng Lian <lian@databricks.com>

Closes #8583 from liancheng/spark-10301/for-1.5.
@yhuai
Copy link
Contributor

yhuai commented Sep 9, 2015

ok. It has been merged.

@liancheng
Copy link
Contributor Author

Thanks all for your review efforts!

@liancheng liancheng closed this Sep 9, 2015
@liancheng liancheng deleted the spark-10301/for-1.5 branch September 9, 2015 07:19
liancheng added a commit to liancheng/spark that referenced this pull request Sep 9, 2015
asfgit pushed a commit that referenced this pull request Sep 10, 2015
…8509 for master

Author: Cheng Lian <lian@databricks.com>

Closes #8670 from liancheng/spark-10301/address-pr-comments.
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…or nested structs

We used to workaround SPARK-10301 with a quick fix in branch-1.5 (PR apache#8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR apache#8509, which had once been considered too big a change to be merged into branch-1.5 in the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. Also added more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 loc. All other changes are for testing. Especially, PR apache#8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR apache#8515 depends on it. This should be safe since apache#8454 only touches testing code.

Author: Cheng Lian <lian@databricks.com>

Closes apache#8583 from liancheng/spark-10301/for-1.5.

(cherry picked from commit fca16c5)

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants