[SPARK-27439][SQL] Use analyzed plan when explaining Dataset #24415

Closed

@viirya viirya commented Apr 19, 2019

What changes were proposed in this pull request?

Because a view is resolved during analysis when we create a dataset, the content of the view is determined when the dataset is created, not when it is evaluated. Currently, the explain result of a dataset can be inconsistent with its collected result, because the explain command is given the pre-analysis logical plan of the dataset and analyzes that plan again. So if a view is changed after the dataset was created, the plans shown by the explain command are not the same as the plan of the dataset.

scala> spark.range(10).createOrReplaceTempView("test")
scala> spark.range(5).createOrReplaceTempView("test2")
scala> spark.sql("select * from test").createOrReplaceTempView("tmp001")
scala> val df = spark.sql("select * from tmp001")
scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001")
scala> df.show
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
scala> df.explain

Before:

== Physical Plan ==
*(1) Range (0, 5, step=1, splits=12)

After:

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
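To illustrate the mechanism, here is a minimal sketch in plain Scala that does not use Spark at all. Every name in it (`Plan`, `RangePlan`, `UnresolvedView`, `Catalog`, `Analyzer`, `ToyDataset`, `ViewResolutionDemo`) is hypothetical and only models the idea that a view reference is replaced by whatever the catalog holds at analysis time; it is not how Spark's analyzer is implemented.

```scala
// Toy model only: these types are hypothetical and are not Spark classes.
sealed trait Plan
final case class RangePlan(start: Long, end: Long) extends Plan
final case class UnresolvedView(name: String) extends Plan

// A mutable catalog mapping temp-view names to plans.
final class Catalog {
  private val views = scala.collection.mutable.Map.empty[String, Plan]
  def createOrReplaceTempView(name: String, plan: Plan): Unit = views.update(name, plan)
  def lookup(name: String): Plan = views(name)
}

object Analyzer {
  // Analysis replaces a view reference with the plan registered at this moment.
  def analyze(plan: Plan, catalog: Catalog): Plan = plan match {
    case UnresolvedView(name) => analyze(catalog.lookup(name), catalog)
    case resolved             => resolved
  }
}

// A toy dataset analyzes its plan once, when it is created.
final class ToyDataset(val logical: Plan, catalog: Catalog) {
  val analyzed: Plan = Analyzer.analyze(logical, catalog)
  // Models the old behaviour: explain analyzes the unresolved plan again.
  def explainFromLogical: Plan = Analyzer.analyze(logical, catalog)
  // Models the proposed behaviour: explain reuses the already analyzed plan.
  def explainFromAnalyzed: Plan = analyzed
}

object ViewResolutionDemo {
  // Mirrors the shell session above: tmp001 is redefined after df is created.
  def run(): (Plan, Plan) = {
    val catalog = new Catalog
    catalog.createOrReplaceTempView("test", RangePlan(0, 10))
    catalog.createOrReplaceTempView("test2", RangePlan(0, 5))
    catalog.createOrReplaceTempView("tmp001", Analyzer.analyze(UnresolvedView("test"), catalog))
    val df = new ToyDataset(UnresolvedView("tmp001"), catalog)
    catalog.createOrReplaceTempView("tmp001", Analyzer.analyze(UnresolvedView("test2"), catalog))
    (df.explainFromLogical, df.explainFromAnalyzed)
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(s"explain from logical plan:  $before")
    println(s"explain from analyzed plan: $after")
  }
}
```

In this model, re-analyzing the unresolved plan picks up the redefined `tmp001` (the range ending at 5), while reusing the analyzed plan keeps the range ending at 10 that the dataset actually evaluates.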

How was this patch tested?

Manual testing and a unit test.

SparkQA commented Apr 19, 2019

Test build #104747 has finished for PR 24415 at commit e1fe974.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya commented Apr 19, 2019

cc @cloud-fan @dongjoon-hyun

// Views may already be resolved in the analyzed plan of this dataset, so we use the analyzed
// plan in `ExplainCommand` for consistency. Otherwise, the plans shown by the explain command
// might be inconsistent with the evaluated data of this dataset.
val explain = ExplainCommand(queryExecution.analyzed, extended = extended)

@dongjoon-hyun dongjoon-hyun Apr 19, 2019

Thank you for pinging me, @viirya. I like this PR. First of all, could you add a test case for this, please? I'll take a closer look at this.

Member Author

Sure. I originally thought a manual test might be enough. Will add a unit test soon.

Member

Thank you. Yes. This is an important fix. I want to prevent future regressions.

Member Author

Added a unit test. Thanks for the review.

SparkQA commented Apr 20, 2019

Test build #104771 has finished for PR 24415 at commit c7b99c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Merged to master.
Thank you, @viirya and @HyukjinKwon.

viirya commented Apr 22, 2019

Thanks!

@HyukjinKwon HyukjinKwon left a comment

Looks good to me too.

@dongjoon-hyun
Member

This was reverted due to a regression. Please see the JIRA for details.

viirya commented Apr 26, 2019

Thanks @dongjoon-hyun. I will look into how to fix it.

@HyukjinKwon
Member

Yes, it looks like it should be reverted, and the logical plan should be passed as is. Thanks for the quick action, @dongjoon-hyun.

@viirya viirya deleted the SPARK-27439 branch December 27, 2023 18:22