
[SPARK-21884][SQL][BRANCH-2.2] Fix StackOverflowError on MetadataOnlyQuery #19094

Closed · wants to merge 1 commit

Conversation

@dongjoon-hyun (Member) commented Aug 31, 2017

What changes were proposed in this pull request?

This PR aims to fix a StackOverflowError in branch-2.2. It happens when OptimizeMetadataOnlyQuery returns a LocalRelation carrying partition information without materialization, e.g. for data source tables (Parquet/ORC) or Hive tables stored as Parquet with convertMetastore enabled.
The master branch has the same logic, but it does not throw a StackOverflowError due to other differences.

```scala
scala> spark.version
res0: String = 2.2.0   // 2.2.1-SNAPSHOT is the same.

scala> sql("CREATE TABLE t_1000 (a INT, p INT) USING PARQUET PARTITIONED BY (p)")
res1: org.apache.spark.sql.DataFrame = []

scala> (1 to 1000).foreach(p => sql(s"ALTER TABLE t_1000 ADD PARTITION (p=$p)"))

scala> sql("SELECT COUNT(DISTINCT p) FROM t_1000").collect
java.lang.StackOverflowError
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522)
```
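As context for the stack trace above, here is a minimal, Spark-independent sketch (`Node` and `DeepGraphDemo` are illustrative names, not from the PR) of why default Java serialization can blow the stack: `ObjectOutputStream.writeObject` descends one stack frame per linked object, so a deep object graph, like the un-materialized partition metadata held by the LocalRelation, exhausts the call stack:

```scala
// Illustrative only: not Spark code.
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Each node references the next, so serialization recurses once per element.
case class Node(value: Int, next: Node) extends Serializable

object DeepGraphDemo {
  def main(args: Array[String]): Unit = {
    var head: Node = null
    for (i <- 1 to 1000000) head = Node(i, head)  // a million-deep chain
    val result =
      try {
        new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(head)
        "serialized"
      } catch {
        case _: StackOverflowError => "StackOverflowError"
      }
    println(result)
  }
}
```

Running this on a default JVM stack prints `StackOverflowError`, matching the `defaultWriteFields` frames in the report.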

How was this patch tested?

Pass the Jenkins with a new test case.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-21884][SQL] Fix StackOverflowError on MetadataOnlyQuery [SPARK-21884][SQL][BRANCH-2.2] Fix StackOverflowError on MetadataOnlyQuery Aug 31, 2017
@SparkQA commented Aug 31, 2017

Test build #81278 has finished for PR 19094 at commit 07126f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author)

Hi, @lianhuiwang and @hvanhovell .
Could you review this PR? When this was introduced in 2.1.0, there was no problem.
It started happening when the underlying classes of `fsRelation.location.listFiles` changed in 2.2.0.

@gatorsmile (Member)

#18686

My fix resolves your issue, right?

@dongjoon-hyun (Member, Author)

Thank you!

@dongjoon-hyun (Member, Author)

I'm closing this issue. Thank you again.

asfgit pushed a commit that referenced this pull request Sep 1, 2017
…'s input data transient

This PR is to backport #18686 for resolving the issue in #19094

---

## What changes were proposed in this pull request?
This PR marks the parameters `rows` and `unsafeRow` of LocalTableScanExec transient, which avoids serializing the unneeded objects.

## How was this patch tested?
N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes #19101 from gatorsmile/backport-21477.
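The effect of the backported fix can be sketched with plain Java serialization. The class below is a hypothetical stand-in, not LocalTableScanExec itself: a `@transient` field is skipped by `ObjectOutputStream`, so the heavy driver-side rows never enter the serialized graph and come back as `null` after deserialization:

```scala
// Hypothetical stand-in for the fix; not the actual Spark class.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream}

// Marking `rows` @transient keeps Java serialization from walking it,
// mirroring the change to LocalTableScanExec's `rows`/`unsafeRow` fields.
class ScanStandIn(@transient val rows: Seq[Array[Int]], val name: String)
  extends Serializable

object TransientDemo {
  // Serialize and immediately deserialize, as Spark does when shipping
  // an operator from the driver to executors.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      .readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val scan = new ScanStandIn(Seq.fill(1000)(Array(1, 2, 3)), "t_1000")
    val back = roundTrip(scan)
    println(back.rows == null)  // true: the transient field was not serialized
    println(back.name)          // t_1000
  }
}
```

Only the lightweight metadata (`name` here) survives the round trip; the transient rows are dropped, which is exactly what keeps them out of the recursive serialization walk.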
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…'s input data transient

@dongjoon-hyun dongjoon-hyun deleted the SPARK-21884 branch January 7, 2019 07:04