
[SPARK-2918] [SQL] [WIP] Support the extended & native command for EXPLAIN #1847

Closed
wants to merge 1 commit into from


chenghao-intel
Contributor

Currently, EXPLAIN supports neither SQL native commands nor printing the logical plan. This PR adds both.
For example:

```
spark-sql> explain create table temp__ as select * from src;
Physical execution plan:InsertIntoHiveTable (MetastoreRelation default, temp__, None), Map(), false
 HiveTableScan [key#6,value#7], (MetastoreRelation default, src, None), None

spark-sql> explain  Extended create table temp__a as select * from src;
Logical execution plan (Parsed):InsertIntoCreatedTable None, temp__a
 Project [*]
  UnresolvedRelation None, src, None

Logical execution plan (Analyzed):InsertIntoCreatedTable None, temp__a
 Project [key#1,value#2]
  MetastoreRelation default, src, None

Logical execution plan (Optimized):InsertIntoTable Map(), false
 MetastoreRelation default, temp__a, None
 MetastoreRelation default, src, None

Physical execution plan:InsertIntoHiveTable (MetastoreRelation default, temp__a, None), Map(), false
 HiveTableScan [key#1,value#2], (MetastoreRelation default, src, None), None

spark-sql> explain extended select key+3>2+3 as b from src;
Logical execution plan (Parsed):Project [(('key + 3) > (2 + 3)) AS b#10]
 UnresolvedRelation None, src, None

Logical execution plan (Analyzed):Project [((CAST(key#13, DoubleType) + CAST(3, DoubleType)) > CAST((2 + 3), DoubleType)) AS b#10]
 MetastoreRelation default, src, None

Logical execution plan (Optimized):Project [((CAST(key#13, DoubleType) + 3.0) > 5.0) AS b#10]
 MetastoreRelation default, src, None

Physical execution plan:Project [((CAST(key#13, DoubleType) + 3.0) > 5.0) AS b#10]
 HiveTableScan [key#13], (MetastoreRelation default, src, None), None
```
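In the last example, the Analyzed and Optimized plans differ mainly by constant folding: `CAST(3, DoubleType)` becomes `3.0` and `CAST((2 + 3), DoubleType)` becomes `5.0`. A minimal sketch of that rewrite over a toy expression tree follows. `Expr`, `Lit`, `Attr`, `Add`, `Gt` and `ConstantFolding` are hypothetical names, not Catalyst's classes, and casts are left out by treating every literal as a `Double`.

```scala
// Toy expression tree (hypothetical; not Catalyst's expression classes).
sealed trait Expr
case class Lit(value: Double) extends Expr
case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Gt(left: Expr, right: Expr) extends Expr

object ConstantFolding {
  // Bottom-up rewrite: fold the children first, then collapse an Add
  // whose operands are both literals into a single literal.
  def fold(e: Expr): Expr = e match {
    case Add(l, r) =>
      (fold(l), fold(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (fl, fr)         => Add(fl, fr)
      }
    case Gt(l, r) => Gt(fold(l), fold(r))
    case other    => other // literals and attributes are already folded
  }

  // Render an expression roughly the way the plans above print it.
  def show(e: Expr): String = e match {
    case Lit(v)    => v.toString
    case Attr(n)   => n
    case Add(l, r) => s"(${show(l)} + ${show(r)})"
    case Gt(l, r)  => s"(${show(l)} > ${show(r)})"
  }
}
```

For `key + 3 > 2 + 3`, folding leaves the attribute-dependent `key + 3.0` alone and reduces the right-hand side to `5.0`, matching the shape of the Optimized plan.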

BTW, this PR depends on #1846; it will keep failing until #1846 is merged.

@SparkQA

SparkQA commented Aug 8, 2014

QA tests have started for PR 1847. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18177/consoleFull

@SparkQA

SparkQA commented Aug 8, 2014

QA results for PR 1847:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class ExplainCommand(plan: LogicalPlan, extended: Boolean = false) extends Command {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18177/consoleFull

```scala
@transient context: SQLContext)
extends LeafNode with Command {

// Run through the optimizer to generate the physical plan.
override protected[sql] lazy val sideEffectResult: Seq[String] = try {
"Physical execution plan:" +: context.executePlan(logicalPlan).executedPlan.toString.split("\n")
// TODO in Hive, the "extended" ExplainCommand prints the AST as well, and detailed properties.
val analyzed = context.executePlan(logicalPlan)
```
Contributor

Nit: `analyzed` seems like a weird name here, since it's actually the whole `queryExecution`, not just the analysis.
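To illustrate the naming point: the value returned by `context.executePlan(logicalPlan)` carries every planning stage, so a name like `queryExecution` describes it better. A minimal sketch of how an `extended` flag could select which stages are printed, using the section titles from the PR description. `QueryExecutionSketch` and `ExplainSketch` are hypothetical stand-ins holding plan strings, not Spark's actual classes.

```scala
// Hypothetical stand-in for the object returned by context.executePlan(...):
// it holds every planning stage, which is why a name like `queryExecution`
// fits better than `analyzed`.
case class QueryExecutionSketch(
    parsed: String,
    analyzed: String,
    optimized: String,
    physical: String)

object ExplainSketch {
  // One titled section: the title line followed by the plan's lines.
  private def section(title: String, plan: String): Seq[String] =
    title +: plan.split("\n").toSeq

  // Mirrors the section titles in this PR's description; `extended`
  // adds the three logical-plan stages before the physical plan.
  def explain(queryExecution: QueryExecutionSketch, extended: Boolean): Seq[String] = {
    val physical = section("Physical execution plan:", queryExecution.physical)
    if (!extended) physical
    else
      section("Logical execution plan (Parsed):", queryExecution.parsed) ++
        section("Logical execution plan (Analyzed):", queryExecution.analyzed) ++
        section("Logical execution plan (Optimized):", queryExecution.optimized) ++
        physical
  }
}
```

Building each section through a helper also sidesteps mixing the right-associative `+:` with `++` in one expression, which Scala rejects because both operators share the same precedence.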

@marmbrus
Contributor

Thanks for doing this. I was thinking about adding this myself this morning!

@marmbrus
Contributor

Will you have time to update this / address the comments? It would be great to include this in 1.1.

@chenghao-intel
Contributor Author

Thank you @marmbrus for reviewing this. I will update the code as you suggested, but this PR depends on #1846: we don't want the table to be created while the logical plan is being generated for a statement like `explain create table xx as select xxx`. Can you take a look at the test failure in #1846? Thanks.
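The dependency on #1846 is about side effects: if generating the plan for `explain create table xx as select xxx` already created the table, EXPLAIN would modify the metastore. A minimal sketch of the separation EXPLAIN needs, where describing a CTAS plan is pure and only `run()` touches the catalog. `CatalogSketch`, `CreateTableAsSelectSketch` and `ExplainCtasSketch` are hypothetical, and this is not necessarily how #1846 addresses it.

```scala
import scala.collection.mutable

// Hypothetical in-memory catalog standing in for the Hive metastore.
class CatalogSketch {
  val tables: mutable.Set[String] = mutable.Set.empty[String]
}

// A CTAS node whose side effect (creating the table) happens only in run().
// describe only renders the plan text, so EXPLAIN can call it safely.
case class CreateTableAsSelectSketch(table: String, query: String, catalog: CatalogSketch) {
  def describe: String = s"InsertIntoCreatedTable None, $table\n $query"
  def run(): Unit = { catalog.tables += table }
}

object ExplainCtasSketch {
  // EXPLAIN renders the plan without executing the command.
  def explain(cmd: CreateTableAsSelectSketch): String = cmd.describe
}
```

The design choice is to keep plan construction free of side effects, so that EXPLAIN and actual execution can share the same plan object.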

@chenghao-intel
Contributor Author

@marmbrus I've created another PR, #1962, which only provides the `extended` support (it doesn't support EXPLAIN for CTAS); I hope we can merge that first. I am not sure whether we will still be able to merge EXPLAIN for CTAS in the Spark 1.1 release, as another dependency is failing unit tests.

@chenghao-intel
Contributor Author

@marmbrus can you review #1846 and #1962? This PR depends on them.

asfgit pushed a commit that referenced this pull request Aug 26, 2014
Provide `extended` keyword support for `explain` command in SQL. e.g.
```
explain extended select key as a1, value as a2 from src where key=1;
== Parsed Logical Plan ==
Project ['key AS a1#3,'value AS a2#4]
 Filter ('key = 1)
  UnresolvedRelation None, src, None

== Analyzed Logical Plan ==
Project [key#8 AS a1#3,value#9 AS a2#4]
 Filter (CAST(key#8, DoubleType) = CAST(1, DoubleType))
  MetastoreRelation default, src, None

== Optimized Logical Plan ==
Project [key#8 AS a1#3,value#9 AS a2#4]
 Filter (CAST(key#8, DoubleType) = 1.0)
  MetastoreRelation default, src, None

== Physical Plan ==
Project [key#8 AS a1#3,value#9 AS a2#4]
 Filter (CAST(key#8, DoubleType) = 1.0)
  HiveTableScan [key#8,value#9], (MetastoreRelation default, src, None), None

Code Generation: false
== RDD ==
(2) MappedRDD[14] at map at HiveContext.scala:350
  MapPartitionsRDD[13] at mapPartitions at basicOperators.scala:42
  MapPartitionsRDD[12] at mapPartitions at basicOperators.scala:57
  MapPartitionsRDD[11] at mapPartitions at TableReader.scala:112
  MappedRDD[10] at map at TableReader.scala:240
  HadoopRDD[9] at HadoopRDD at TableReader.scala:230
```

It's a subtask of #1847, but it can go in without any dependency.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #1962 from chenghao-intel/explain_extended and squashes the following commits:

295db74 [Cheng Hao] Fix bug in printing the simple execution plan
48bc989 [Cheng Hao] Support EXTENDED for EXPLAIN

(cherry picked from commit 156eb39)
Signed-off-by: Michael Armbrust <michael@databricks.com>
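Between the Parsed and Analyzed plans in the commit message above, unresolved attributes such as `'key` become resolved attributes with expression IDs such as `key#8`. A toy sketch of that name-resolution step follows, assuming a relation whose output attributes already carry IDs. `AttributeSketch` and `AnalyzerSketch` are hypothetical names, not Spark's analyzer API, and the exact-match lookup is a simplification.

```scala
// Hypothetical resolved attribute; prints like the plans above, e.g. key#8.
case class AttributeSketch(name: String, exprId: Int) {
  override def toString: String = s"$name#$exprId"
}

object AnalyzerSketch {
  // Resolves names written as unresolved attributes (leading quote, e.g. 'key)
  // against a relation's output. Returns Left(name) for the first name that
  // cannot be resolved, otherwise Right with the resolved attributes in order.
  def resolve(
      names: Seq[String],
      output: Seq[AttributeSketch]): Either[String, Seq[AttributeSketch]] = {
    val byName = output.map(a => a.name -> a).toMap
    names.foldLeft[Either[String, Vector[AttributeSketch]]](Right(Vector.empty)) {
      case (Right(acc), n) =>
        byName.get(n.stripPrefix("'")) match {
          case Some(attr) => Right(acc :+ attr)
          case None       => Left(n)
        }
      case (failed, _) => failed // keep the first failure
    }
  }
}
```

Resolving `'key` and `'value` against a relation with output `key#8, value#9` yields those attributes, which is how `Project ['key AS a1#3,'value AS a2#4]` turns into `Project [key#8 AS a1#3,value#9 AS a2#4]`.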
asfgit pushed a commit that referenced this pull request Aug 26, 2014
@marmbrus
Contributor

Hi @chenghao-intel, what remains in this PR now that #1962 is merged? Mind updating the description?

@chenghao-intel
Contributor Author

@marmbrus thanks for merging #1962. This PR aims to support showing the logical and physical plans for native commands, particularly statements like `explain create table xxx as select xxxx`; hence it depends on #1846. I'm not sure whether you have more comments on #1846. I can do a quick rebase after #1846 is merged.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@SparkQA

SparkQA commented Sep 12, 2014

QA tests have started for PR 1847 at commit e52090a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 12, 2014

QA tests have finished for PR 1847 at commit e52090a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor Author

I will close this PR, since most of the work was done in #1846 and #1962, and native command support for EXPLAIN is probably not necessary; even Hive doesn't support it.

@chenghao-intel chenghao-intel deleted the explain branch September 15, 2014 07:51
@chenghao-intel chenghao-intel restored the explain branch November 19, 2014 02:18
viirya added a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023
…ng config names (apache#1847)

This is to address the comment https://github.pie.apple.com/IPR/apache-spark/pull/1845/files#r9742208.
This patch removes BosonConf dependency from core to reduce possible dependency issues.

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
3 participants