
[SPARK-20281][SQL] Print the identical Range parameters of SparkContext APIs and SQL in explain #17670

Closed · 4 commits

Conversation

@maropu (Member) commented Apr 18, 2017

What changes were proposed in this pull request?

This PR modifies the code so that the SparkContext APIs and SQL print identical `Range` parameters in `explain` output. In the current master, both internally use `defaultParallelism` for `splits` by default, yet they print different strings in explain output:

```
scala> spark.range(4).explain
== Physical Plan ==
*Range (0, 4, step=1, splits=Some(8))

scala> sql("select * from range(4)").explain
== Physical Plan ==
*Range (0, 4, step=1, splits=None)
```

How was this patch tested?

Added tests in SQLQuerySuite and modified some results in the existing tests.
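The mismatch above comes down to how the optional split count is carried: the Dataset API fills it in eagerly with `defaultParallelism`, while the SQL table-valued function leaves it unset until later. A minimal plain-Scala model of this (no Spark required; `RangeNode` is an illustrative stand-in for Spark's physical `Range` node, not its real class):

```scala
// Illustrative stand-in for Spark's Range node: splits is an Option, and only
// the Dataset API path fills it in before explain output is rendered.
case class RangeNode(start: Long, end: Long, step: Long, splits: Option[Int]) {
  def explainStr: String = s"*Range ($start, $end, step=$step, splits=$splits)"
}

val defaultParallelism = 8
val apiRange = RangeNode(0, 4, 1, Some(defaultParallelism)) // spark.range(4)
val sqlRange = RangeNode(0, 4, 1, None)                     // select * from range(4)

println(apiRange.explainStr) // *Range (0, 4, step=1, splits=Some(8))
println(sqlRange.explainStr) // *Range (0, 4, step=1, splits=None)
```

Both paths end up with the same effective parallelism at runtime; only the rendered `Option` differs, which is what this PR unifies.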

@SparkQA commented Apr 18, 2017

Test build #75900 has finished for PR 17670 at commit 18360ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jaceklaskowski (Contributor) commented:

I think the change should rather be here where the built-in table-valued function range is resolved.
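One way to read this suggestion, sketched in plain Scala (the function and case class below are simplified, hypothetical stand-ins for Spark's `ResolveTableValuedFunctions` rule and logical `Range` node, not Spark's actual API):

```scala
// Simplified stand-in for the logical Range node produced by range(...).
case class LogicalRange(start: Long, end: Long, step: Long, numSlices: Option[Int])

// When the built-in range(...) table-valued function is resolved without an
// explicit slice count, fill it in from the session's default parallelism.
def resolveRange(r: LogicalRange, defaultParallelism: Int): LogicalRange =
  if (r.numSlices.isDefined) r
  else r.copy(numSlices = Some(defaultParallelism))

val resolved = resolveRange(LogicalRange(0, 10, 1, None), defaultParallelism = 8)
println(resolved.numSlices) // Some(8)
```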

@maropu (Member, Author) commented Apr 18, 2017

@gatorsmile WDYT?

```scala
val scRange = sqlContext.range(10)
val sqlRange = sqlContext.sql("SELECT * FROM range(10)")
assert(explainStr(scRange) === explainStr(sqlRange))
```
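`explainStr` is a helper defined elsewhere in the suite; one plausible (hypothetical) implementation captures what `explain()` prints, since `Dataset.explain` writes to the console rather than returning a `String`. The console-capturing part is plain Scala:

```scala
import java.io.ByteArrayOutputStream

// Capture everything a thunk prints to stdout. An explainStr(df) helper could
// then be written as captureOut(df.explain()) — an assumed usage, since the
// suite's actual helper is not shown in this thread.
def captureOut(thunk: => Unit): String = {
  val out = new ByteArrayOutputStream()
  Console.withOut(out)(thunk)
  out.toString.trim
}

println(captureOut(println("== Physical Plan =="))) // == Physical Plan ==
```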
Review comment (Member):
I think this test case is not needed.

Reply (Member Author):
I'll revert

@gatorsmile (Member) commented:
As @jaceklaskowski said, it would be good to fill `numSlices` in the rule `ResolveTableValuedFunctions` if it is `None`.

@maropu (Member, Author) commented Apr 20, 2017

okay, I'll fix soon. Thanks!

@maropu (Member, Author) commented Apr 20, 2017

Looking at the related code, I think we cannot easily set `defaultParallelism` inside `catalyst.plans.logical` because there is no obvious way to access `SQLContext` there. So, IIUC, we cannot easily set the value of `numSlices` in `ResolveTableValuedFunctions`. Another approach to sharing the default `Range` `numSlices` between the SparkContext APIs and SQL is to not set the default value in `SparkSession` and instead set it in `RangeExec` for both cases. Thoughts? @jaceklaskowski @gatorsmile
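The alternative described above can be modeled in plain Scala (the `RangeExecModel` below is an illustrative stand-in for Spark's `RangeExec` physical operator, not its real signature): both front ends leave `numSlices` unset, and the default is resolved only when the physical operator needs it.

```scala
// Illustrative stand-in: both the Dataset API and SQL leave numSlices as None,
// and the physical operator falls back to defaultParallelism at execution time.
case class RangeExecModel(start: Long, end: Long, step: Long, numSlices: Option[Int]) {
  def effectiveSlices(defaultParallelism: Int): Int =
    numSlices.getOrElse(defaultParallelism)
}

val fromApi = RangeExecModel(0, 4, 1, None) // spark.range(4)
val fromSql = RangeExecModel(0, 4, 1, None) // select * from range(4)

// Both paths now agree, so explain can print the same splits value for both.
println(fromApi.effectiveSlices(8)) // 8
println(fromSql.effectiveSlices(8)) // 8
```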

@SparkQA commented Apr 20, 2017

Test build #75971 has started for PR 17670 at commit a65d5c6.

```
@@ -527,7 +527,7 @@ class SparkSession private(
  @Experimental
  @InterfaceStability.Evolving
  def range(start: Long, end: Long, step: Long): Dataset[java.lang.Long] = {
    range(start, end, step, numPartitions = sparkContext.defaultParallelism)
```
@gatorsmile (Member) commented Apr 20, 2017:
How about reverting the changes in this file? That keeps the PR small enough, and we can backport it to 2.2.

Reply (Member Author): Yeah, sounds good to me. I'll revert.

@gatorsmile (Member) commented:
Ok, I am fine to keep the existing way.

@gatorsmile (Member) commented:
LGTM except a comment.

@SparkQA commented Apr 20, 2017

Test build #75974 has started for PR 17670 at commit e940c6f.

@maropu (Member, Author) commented Apr 20, 2017

Better to open another PR to backport into v2.2?

@maropu (Member, Author) commented Apr 20, 2017

Jenkins, retest this please.

@SparkQA commented Apr 20, 2017

Test build #75975 has finished for PR 17670 at commit e940c6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member, Author) commented Apr 21, 2017

ping

@gatorsmile (Member) commented:
Thanks! Merging to master/2.2

asfgit pushed a commit that referenced this pull request Apr 21, 2017
…xt APIs and SQL in explain

## What changes were proposed in this pull request?
This PR modifies the code so that the SparkContext APIs and SQL print identical `Range` parameters in `explain` output. In the current master, both internally use `defaultParallelism` for `splits` by default, yet they print different strings in explain output:

```
scala> spark.range(4).explain
== Physical Plan ==
*Range (0, 4, step=1, splits=Some(8))

scala> sql("select * from range(4)").explain
== Physical Plan ==
*Range (0, 4, step=1, splits=None)
```

## How was this patch tested?
Added tests in `SQLQuerySuite` and modified some results in the existing tests.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #17670 from maropu/SPARK-20281.

(cherry picked from commit 48d760d)
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
@asfgit asfgit closed this in 48d760d Apr 21, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…xt APIs and SQL in explain
