[SPARK-23303][SQL] improve the explain result for data source v2 relations #20477

Closed
wants to merge 2 commits into master from cloud-fan:explain

Conversation

7 participants
@cloud-fan
Contributor

cloud-fan commented Feb 1, 2018

What changes were proposed in this pull request?

The current explain result for data source v2 relations is unreadable:

== Parsed Logical Plan ==
'Filter ('i > 6)
+- AnalysisBarrier
      +- Project [j#1]
         +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Filter (i#0 > 6)
   +- Project [j#1, i#0]
      +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Optimized Logical Plan ==
Project [j#1]
+- Filter isnotnull(i#0)
   +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Physical Plan ==
*(1) Project [j#1]
+- *(1) Filter isnotnull(i#0)
   +- *(1) DataSourceV2Scan [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

After this PR:

== Parsed Logical Plan ==
'Project [unresolvedalias('j, None)]
+- AnalysisBarrier
      +- Relation AdvancedDataSourceV2[i#0, j#1]

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Relation AdvancedDataSourceV2[i#0, j#1]

== Optimized Logical Plan ==
Relation AdvancedDataSourceV2[j#1]

== Physical Plan ==
*(1) Scan AdvancedDataSourceV2[j#1]

And for a query with a pushed-down filter:

== Analyzed Logical Plan ==
i: int, j: int
Filter (i#88 > 3)
+- Relation JavaAdvancedDataSourceV2[i#88, j#89]

== Optimized Logical Plan ==
Filter isnotnull(i#88)
+- Relation JavaAdvancedDataSourceV2[i#88, j#89] (PushedFilter: [GreaterThan(i,3)])

== Physical Plan ==
*(1) Filter isnotnull(i#88)
+- *(1) Scan JavaAdvancedDataSourceV2[i#88, j#89] (PushedFilter: [GreaterThan(i,3)])
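
For reference, plans like the ones above can be printed with Dataset.explain(true), which emits all four sections. Below is a minimal sketch, assuming the AdvancedDataSourceV2 test source from Spark's DSv2 test suite is on the classpath; the session setup and app name are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dsv2-explain-demo")   // illustrative app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Load the DSv2 test source by its fully qualified class name.
val df = spark.read
  .format("org.apache.spark.sql.sources.v2.AdvancedDataSourceV2")
  .load()

// explain(true) prints the parsed, analyzed, optimized, and physical plans.
df.select($"j").explain(true)       // column pruning, as in the first example
df.filter($"i" > 3).explain(true)   // filter pushdown, as in the second example
```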

An example for a streaming query:

== Parsed Logical Plan ==
Aggregate [value#6], [value#6, count(1) AS count(1)#11L]
+- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#6]
   +- MapElements <function1>, class java.lang.String, [StructField(value,StringType,true)], obj#5: java.lang.String
      +- DeserializeToObject cast(value#25 as string).toString, obj#4: java.lang.String
         +- Streaming Relation FakeDataSourceV2$[value#25]

== Analyzed Logical Plan ==
value: string, count(1): bigint
Aggregate [value#6], [value#6, count(1) AS count(1)#11L]
+- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#6]
   +- MapElements <function1>, class java.lang.String, [StructField(value,StringType,true)], obj#5: java.lang.String
      +- DeserializeToObject cast(value#25 as string).toString, obj#4: java.lang.String
         +- Streaming Relation FakeDataSourceV2$[value#25]

== Optimized Logical Plan ==
Aggregate [value#6], [value#6, count(1) AS count(1)#11L]
+- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#6]
   +- MapElements <function1>, class java.lang.String, [StructField(value,StringType,true)], obj#5: java.lang.String
      +- DeserializeToObject value#25.toString, obj#4: java.lang.String
         +- Streaming Relation FakeDataSourceV2$[value#25]

== Physical Plan ==
*(4) HashAggregate(keys=[value#6], functions=[count(1)], output=[value#6, count(1)#11L])
+- StateStoreSave [value#6], state info [ checkpoint = *********(redacted)/cloud/dev/spark/target/tmp/temporary-549f264b-2531-4fcb-a52f-433c77347c12/state, runId = f84d9da9-2f8c-45c1-9ea1-70791be684de, opId = 0, ver = 0, numPartitions = 5], Complete, 0
   +- *(3) HashAggregate(keys=[value#6], functions=[merge_count(1)], output=[value#6, count#16L])
      +- StateStoreRestore [value#6], state info [ checkpoint = *********(redacted)/cloud/dev/spark/target/tmp/temporary-549f264b-2531-4fcb-a52f-433c77347c12/state, runId = f84d9da9-2f8c-45c1-9ea1-70791be684de, opId = 0, ver = 0, numPartitions = 5]
         +- *(2) HashAggregate(keys=[value#6], functions=[merge_count(1)], output=[value#6, count#16L])
            +- Exchange hashpartitioning(value#6, 5)
               +- *(1) HashAggregate(keys=[value#6], functions=[partial_count(1)], output=[value#6, count#16L])
                  +- *(1) SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#6]
                     +- *(1) MapElements <function1>, obj#5: java.lang.String
                        +- *(1) DeserializeToObject value#25.toString, obj#4: java.lang.String
                           +- *(1) Scan FakeDataSourceV2$[value#25]
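
For completeness, a hedged reconstruction of the streaming query behind these plans, assuming the FakeDataSourceV2 test source; the typed map is a stand-in for the <function1> shown in the MapElements node, and the setup is illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode

val spark = SparkSession.builder()
  .appName("dsv2-streaming-explain-demo")   // illustrative app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// A typed map followed by a streaming aggregation, mirroring the
// DeserializeToObject / MapElements / SerializeFromObject / Aggregate chain.
val counts = spark.readStream
  .format("org.apache.spark.sql.sources.v2.FakeDataSourceV2")
  .load()
  .as[String]
  .map(s => s)          // stand-in for <function1>
  .groupBy("value")
  .count()

val query = counts.writeStream
  .outputMode(OutputMode.Complete())   // matches Complete in the StateStoreSave node
  .format("console")
  .start()

// StreamingQuery.explain(true) prints the plans once a batch has executed.
query.explain(true)
```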

How was this patch tested?

N/A

@SparkQA
SparkQA commented Feb 1, 2018

Test build #86937 has finished for PR 20477 at commit d7cf774.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 2, 2018

Test build #86957 has finished for PR 20477 at commit 1f61965.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan
Contributor
cloud-fan commented Feb 2, 2018

retest this please

@SparkQA
SparkQA commented Feb 2, 2018

Test build #86962 has finished for PR 20477 at commit 1f61965.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 2, 2018

Test build #86963 has finished for PR 20477 at commit 1f61965.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 2, 2018

Test build #86974 has finished for PR 20477 at commit 4ca2c40.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan
Contributor
cloud-fan commented Feb 2, 2018

retest this please

@SparkQA
SparkQA commented Feb 2, 2018

Test build #86986 has finished for PR 20477 at commit 4ca2c40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 5, 2018

Test build #87064 has finished for PR 20477 at commit a40d18e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 6, 2018

Test build #87087 has finished for PR 20477 at commit 1556a9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@huaxingao
Contributor
huaxingao commented Feb 7, 2018

@cloud-fan
I have a question about the optimized logical plan. The "What changes were proposed" section says that after this PR, the optimized logical plan will be as follows:

== Optimized Logical Plan ==
Relation AdvancedDataSourceV2[i#0, j#1]

== Physical Plan ==
*(1) Scan AdvancedDataSourceV2[i#0, j#1] (PushedFilter: [IsNotNull(i), GreaterThan(i,3)])

It seems to me that the pushdown happens during optimization. Should the optimized logical plan also contain the pushed filter, like this?

== Optimized Logical Plan ==
Relation AdvancedDataSourceV2[i#0, j#1] (PushedFilter: [IsNotNull(i), GreaterThan(i,3)])
@cloud-fan
Contributor
cloud-fan commented Feb 7, 2018

The result was outdated; I've updated the PR description. Please check again, thanks!

@SparkQA
SparkQA commented Feb 7, 2018

Test build #87145 has finished for PR 20477 at commit c4bfbf4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 7, 2018

Test build #87146 has finished for PR 20477 at commit c0c5895.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan
Contributor
cloud-fan commented Feb 7, 2018

retest this please

@SparkQA
SparkQA commented Feb 7, 2018

Test build #87152 has finished for PR 20477 at commit c0c5895.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan
Contributor
cloud-fan commented Feb 7, 2018

retest this please

@SparkQA
SparkQA commented Feb 7, 2018

Test build #87158 has finished for PR 20477 at commit c0c5895.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 8, 2018

Test build #87189 has finished for PR 20477 at commit 2b4a095.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 8, 2018

Test build #87197 has finished for PR 20477 at commit 0efd5d3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan
Contributor
cloud-fan commented Feb 8, 2018

retest this please

@SparkQA
SparkQA commented Feb 8, 2018

Test build #87208 has finished for PR 20477 at commit 0efd5d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 8, 2018

Test build #87220 has finished for PR 20477 at commit 4bff16d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Feb 9, 2018

Test build #87242 has finished for PR 20477 at commit 0cc0600.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan
Contributor
cloud-fan commented Feb 9, 2018

retest this please

@SparkQA
SparkQA commented Feb 9, 2018

Test build #87247 has finished for PR 20477 at commit 0cc0600.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@gatorsmile
Member
gatorsmile commented Feb 12, 2018

retest this please

@SparkQA
SparkQA commented Feb 13, 2018

Test build #87350 has finished for PR 20477 at commit 0cc0600.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@kiszk
Contributor
kiszk commented Feb 13, 2018

retest this please

@SparkQA
SparkQA commented Feb 13, 2018

Test build #87358 has finished for PR 20477 at commit 0cc0600.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@gatorsmile
Member
gatorsmile commented Feb 13, 2018

LGTM. Merged to master.

@asfgit asfgit closed this in f17b936 Feb 13, 2018

@gatorsmile
Member
gatorsmile commented Feb 14, 2018

As pointed out by @tdas, since this PR impacts the streaming side, I am reverting it from master. Thanks!

@tdas
Contributor
tdas commented Feb 14, 2018

To be clear, the MicroBatchReader -> DataSourceV2 map added to MicroBatchExecution has potential implications for the self-join scenario that I am trying to debug in #20598.
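
For context, the self-join shape in question can be sketched as below; the source name and setup are illustrative, and self-joins of DSv2 streaming sources are what #20598 is investigating. The point is that the same source appears twice in one plan, so any per-reader bookkeeping in MicroBatchExecution must keep the two legs distinct.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dsv2-self-join-demo")   // illustrative app name
  .master("local[*]")
  .getOrCreate()

// One streaming DataFrame over a DSv2 source...
val stream = spark.readStream
  .format("org.apache.spark.sql.sources.v2.FakeDataSourceV2")  // illustrative source
  .load()

// ...referenced twice in the same query: both legs resolve to the same
// MicroBatchReader instance, which the reader -> source map must handle.
val joined = stream.as("a").join(stream.as("b"), Seq("value"))
```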

@gatorsmile
Member
gatorsmile commented Feb 14, 2018

Thanks! The PR has been reverted.

@tdas
Contributor
tdas commented Feb 14, 2018

Thank you very much @gatorsmile, I promise I will do a proper review of the streaming side when you reopen this PR.

wangyum added a commit to wangyum/spark that referenced this pull request Mar 5, 2018

[SPARK-23303][SQL] improve the explain result for data source v2 relations

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20477 from cloud-fan/explain.

jcuquemelle pushed a commit to jcuquemelle/spark that referenced this pull request Mar 11, 2018

[SPARK-23303][SQL] improve the explain result for data source v2 relations

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20477 from cloud-fan/explain.