
[SPARK-31673][SQL] QueryExecution.debug.toFile() to take an additional explain mode param #28493

Closed
wants to merge 3 commits

Conversation

dilipbiswal (Contributor)

What changes were proposed in this pull request?

Currently, QueryExecution.debug.toFile dumps the query plan information in a fixed format. This PR adds an additional explain mode parameter so that the debug information is written in the user-supplied format.

df.queryExecution.debug.toFile("/tmp/plan.txt", explainMode = ExplainMode.fromString("formatted"))
== Physical Plan ==
* Filter (2)
+- Scan hive default.s1 (1)


(1) Scan hive default.s1
Output [2]: [c1#15, c2#16]
Arguments: [c1#15, c2#16], HiveTableRelation `default`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#15, c2#16]

(2) Filter [codegen id : 1]
Input [2]: [c1#15, c2#16]
Condition : (isnotnull(c1#15) AND (c1#15 > 0))


== Whole Stage Codegen ==
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 (maxMethodCodeSize:220; maxConstantPoolSize:105(0.16% used); numInnerClasses:0) ==
*(1) Filter (isnotnull(c1#15) AND (c1#15 > 0))
+- Scan hive default.s1 [c1#15, c2#16], HiveTableRelation `default`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#15, c2#16]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private scala.collection.Iterator inputadapter_input_0;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] filter_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
/* 011 */
/* 012 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
/* 013 */     this.references = references;
/* 014 */   }
/* 015 */
/* 016 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 017 */     partitionIndex = index;
/* 018 */     this.inputs = inputs;
/* 019 */     inputadapter_input_0 = inputs[0];
/* 020 */     filter_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(2, 0);
/* 021 */
/* 022 */   }
/* 023 */
/* 024 */   protected void processNext() throws java.io.IOException {
/* 025 */     while ( inputadapter_input_0.hasNext()) {
/* 026 */       InternalRow inputadapter_row_0 = (InternalRow) inputadapter_input_0.next();
/* 027 */
/* 028 */       do {
/* 029 */         boolean inputadapter_isNull_0 = inputadapter_row_0.isNullAt(0);
/* 030 */         int inputadapter_value_0 = inputadapter_isNull_0 ?
/* 031 */         -1 : (inputadapter_row_0.getInt(0));
/* 032 */
/* 033 */         boolean filter_value_2 = !inputadapter_isNull_0;
/* 034 */         if (!filter_value_2) continue;
/* 035 */
/* 036 */         boolean filter_value_3 = false;
/* 037 */         filter_value_3 = inputadapter_value_0 > 0;
/* 038 */         if (!filter_value_3) continue;
/* 039 */
/* 040 */         ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(1);
/* 041 */
/* 042 */         boolean inputadapter_isNull_1 = inputadapter_row_0.isNullAt(1);
/* 043 */         int inputadapter_value_1 = inputadapter_isNull_1 ?
/* 044 */         -1 : (inputadapter_row_0.getInt(1));
/* 045 */         filter_mutableStateArray_0[0].reset();
/* 046 */
/* 047 */         filter_mutableStateArray_0[0].zeroOutNullBytes();
/* 048 */
/* 049 */         filter_mutableStateArray_0[0].write(0, inputadapter_value_0);
/* 050 */
/* 051 */         if (inputadapter_isNull_1) {
/* 052 */           filter_mutableStateArray_0[0].setNullAt(1);
/* 053 */         } else {
/* 054 */           filter_mutableStateArray_0[0].write(1, inputadapter_value_1);
/* 055 */         }
/* 056 */         append((filter_mutableStateArray_0[0].getRow()));
/* 057 */
/* 058 */       } while(false);
/* 059 */       if (shouldStop()) return;
/* 060 */     }
/* 061 */   }
/* 062 */
/* 063 */ }
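
For context, a short usage sketch (not part of the PR text) mirroring the call above with other explain modes; it assumes ExplainMode.fromString accepts the usual mode strings ("simple", "extended", "codegen", "cost", "formatted"), and note the diff reviewed below ultimately takes the mode as an Option[String] rather than an ExplainMode object:

import org.apache.spark.sql.execution.ExplainMode

// Dump the plan in the default (fixed) format, as before.
df.queryExecution.debug.toFile("/tmp/plan.txt")

// Pick the output format explicitly via the new explain mode parameter.
df.queryExecution.debug.toFile("/tmp/plan_extended.txt", explainMode = ExplainMode.fromString("extended"))
df.queryExecution.debug.toFile("/tmp/plan_cost.txt", explainMode = ExplainMode.fromString("cost"))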

Why are the changes needed?

Hopefully this enhances the usability of debug.toFile(..).

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a test in QueryExecutionSuite

@SparkQA commented May 11, 2020

Test build #122484 has finished for PR 28493 at commit 62dbc83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal (Contributor Author)

cc @gatorsmile @maropu

@maropu (Member) commented May 11, 2020

The change looks useful to me. cc: @MaxGekk

@HyukjinKwon (Member)

+1 too

@SparkQA commented May 12, 2020

Test build #122522 has finished for PR 28493 at commit e5e360f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) left a comment:

Looks okay.

@maropu maropu changed the title [SPARK-31673][SQL] QueryExection.debug.toFile() to take an addtional explain mode param. [SPARK-31673][SQL] QueryExection.debug.toFile() to take an addtional explain mode param May 12, 2020
@HyukjinKwon (Member)

retest this please

@SparkQA commented May 24, 2020

Test build #123048 has finished for PR 28493 at commit e5e360f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) commented May 24, 2020

retest this please

@SparkQA commented May 24, 2020

Test build #123054 has finished for PR 28493 at commit e5e360f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

@MaxGekk did you find some time to take a look? If not, I will just merge after double checking.

}
}

private def simpleString(formatted: Boolean,
Review comment (Member):

nit: move formatted: Boolean, to the next line
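
A minimal sketch of the suggested wrapping (only formatted: Boolean comes from the diff context above; the remaining parameters are assumed for illustration):

private def simpleString(
    formatted: Boolean,
    maxFields: Int,
    append: String => Unit): Unit = {
  // body unchanged
}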

@@ -53,6 +53,21 @@ class QueryExecutionSuite extends SharedSparkSession {
s"*(1) Range (0, $expected, step=1, splits=2)",
""))
}

def checkDumpedPlansInFormattedMode(path: String, expected: Int): Unit = {
Review comment (Member):

The function is used only in one test. Could you embed its body into that test?
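
A rough sketch of what inlining the helper could look like (the test name, query, and assertion are illustrative assumptions, not the actual suite code):

test("dump query plans to a file in formatted mode") {
  withTempDir { dir =>
    val path = s"${dir.getCanonicalPath}/plans.txt"
    // Dump the plans using the new explain mode parameter ...
    spark.range(10).queryExecution.debug.toFile(path, explainMode = Some("formatted"))
    // ... and assert on the dumped file directly, instead of going through
    // checkDumpedPlansInFormattedMode.
    val source = scala.io.Source.fromFile(path)
    try {
      assert(source.getLines().next() == "== Physical Plan ==")
    } finally {
      source.close()
    }
  }
}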

@@ -249,18 +277,26 @@ class QueryExecution(
* Dumps debug information about query execution into the specified file.
*
Review comment (Member):

Could you describe path too?
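
A possible shape for the requested doc comment (wording is illustrative, not the merged text):

/**
 * Dumps debug information about query execution into the specified file.
 *
 * @param path path of the file the debug info is written to.
 * @param maxFields maximum number of fields converted to string representation.
 * @param explainMode the explain mode to be used to generate the string
 *                    representation of the plan.
 */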

def toFile(path: String, maxFields: Int = Int.MaxValue): Unit = {
def toFile(path: String,
maxFields: Int = Int.MaxValue,
explainMode: Option[String] = None): Unit = {
val filePath = new Path(path)
val fs = filePath.getFileSystem(sparkSession.sessionState.newHadoopConf())
val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath)))
val append = (s: String) => {
Review comment (Member):

Please remove append, and pass writer.write directly:

explainString(mode, maxFields, writer.write)
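
A rough sketch of the resulting toFile body (based on the diff context above; the explainString overload taking an append function and the ExtendedMode fallback are assumptions, not necessarily the merged code):

def toFile(
    path: String,
    maxFields: Int = Int.MaxValue,
    explainMode: Option[String] = None): Unit = {
  val filePath = new Path(path)
  val fs = filePath.getFileSystem(sparkSession.sessionState.newHadoopConf())
  val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath)))
  try {
    // Resolve the requested mode, falling back to the previous extended output.
    val mode = explainMode.map(ExplainMode.fromString).getOrElse(ExtendedMode)
    // Pass writer.write directly instead of wrapping it in a local append function.
    explainString(mode, maxFields, writer.write)
  } finally {
    writer.close()
  }
}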

@SparkQA commented May 25, 2020

Test build #123095 has finished for PR 28493 at commit 1fea0e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

Merged to master.

@dilipbiswal (Contributor Author)

Thank you very much @HyukjinKwon @maropu @MaxGekk
