
[SPARK-20281][SQL] Print the identical Range parameters of SparkContext APIs and SQL in explain #17670

Closed · 4 commits

Conversation

@maropu (Member) commented Apr 18, 2017

What changes were proposed in this pull request?

This PR modifies the code so that the SparkContext APIs and SQL print identical `Range` parameters in `explain` output. In the current master, both internally use `defaultParallelism` for `splits` by default, yet they print different strings in explain output:

```
scala> spark.range(4).explain
== Physical Plan ==
*Range (0, 4, step=1, splits=Some(8))

scala> sql("select * from range(4)").explain
== Physical Plan ==
*Range (0, 4, step=1, splits=None)
```

How was this patch tested?

Added tests in SQLQuerySuite and modified some results in the existing tests.
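The mismatch above comes down to how the optional split count is carried: the Dataset API fills it in eagerly with `defaultParallelism`, while the SQL table-valued function leaves it unset until later. A minimal plain-Scala model of this (no Spark required; `RangeNode` is an illustrative stand-in for Spark's physical `Range` node, not its real class):

```scala
// Illustrative stand-in for Spark's Range node: splits is an Option, and only
// the Dataset API path fills it in before explain output is rendered.
case class RangeNode(start: Long, end: Long, step: Long, splits: Option[Int]) {
  def explainStr: String = s"*Range ($start, $end, step=$step, splits=$splits)"
}

val defaultParallelism = 8
val apiRange = RangeNode(0, 4, 1, Some(defaultParallelism)) // spark.range(4)
val sqlRange = RangeNode(0, 4, 1, None)                     // select * from range(4)

println(apiRange.explainStr) // *Range (0, 4, step=1, splits=Some(8))
println(sqlRange.explainStr) // *Range (0, 4, step=1, splits=None)
```

Both paths end up with the same effective parallelism at runtime; only the rendered `Option` differs, which is what this PR unifies.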

@SparkQA commented Apr 18, 2017

Test build #75900 has finished for PR 17670 at commit 18360ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jaceklaskowski (Contributor) commented:

I think the change should rather be here where the built-in table-valued function range is resolved.
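One way to read this suggestion, sketched in plain Scala (the function and case class below are simplified, hypothetical stand-ins for Spark's `ResolveTableValuedFunctions` rule and logical `Range` node, not Spark's actual API):

```scala
// Simplified stand-in for the logical Range node produced by range(...).
case class LogicalRange(start: Long, end: Long, step: Long, numSlices: Option[Int])

// When the built-in range(...) table-valued function is resolved without an
// explicit slice count, fill it in from the session's default parallelism.
def resolveRange(r: LogicalRange, defaultParallelism: Int): LogicalRange =
  if (r.numSlices.isDefined) r
  else r.copy(numSlices = Some(defaultParallelism))

val resolved = resolveRange(LogicalRange(0, 10, 1, None), defaultParallelism = 8)
println(resolved.numSlices) // Some(8)
```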

@maropu (Member, Author) commented Apr 18, 2017

@gatorsmile WDYT?

```scala
val scRange = sqlContext.range(10)
val sqlRange = sqlContext.sql("SELECT * FROM range(10)")
assert(explainStr(scRange) === explainStr(sqlRange))
```
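`explainStr` is a helper defined elsewhere in the suite; one plausible (hypothetical) implementation captures what `explain()` prints, since `Dataset.explain` writes to the console rather than returning a `String`. The console-capturing part is plain Scala:

```scala
import java.io.ByteArrayOutputStream

// Capture everything a thunk prints to stdout. An explainStr(df) helper could
// then be written as captureOut(df.explain()) — an assumed usage, since the
// suite's actual helper is not shown in this thread.
def captureOut(thunk: => Unit): String = {
  val out = new ByteArrayOutputStream()
  Console.withOut(out)(thunk)
  out.toString.trim
}

println(captureOut(println("== Physical Plan =="))) // == Physical Plan ==
```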
Review comment (Member):
I think this test case is not needed.

Reply (Member Author):
I'll revert

@gatorsmile (Member) commented:
As @jaceklaskowski said, it would be good to fill `numSlices` in the rule `ResolveTableValuedFunctions` if it is `None`.

@maropu (Member, Author) commented Apr 20, 2017

okay, I'll fix soon. Thanks!

@maropu (Member, Author) commented Apr 20, 2017

Looking at the related code, I think we cannot easily set `defaultParallelism` inside `catalyst.plans.logical` because there is no obvious way to access `SQLContext` there. So, IIUC, we cannot easily set the value of `numSlices` in `ResolveTableValuedFunctions`. Another approach to sharing the default `Range` `numSlices` between the SparkContext APIs and SQL is to not set the default value in `SparkSession` and instead set it in `RangeExec` for both cases. Thoughts? @jaceklaskowski @gatorsmile
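The alternative described above can be modeled in plain Scala (the `RangeExecModel` below is an illustrative stand-in for Spark's `RangeExec` physical operator, not its real signature): both front ends leave `numSlices` unset, and the default is resolved only when the physical operator needs it.

```scala
// Illustrative stand-in: both the Dataset API and SQL leave numSlices as None,
// and the physical operator falls back to defaultParallelism at execution time.
case class RangeExecModel(start: Long, end: Long, step: Long, numSlices: Option[Int]) {
  def effectiveSlices(defaultParallelism: Int): Int =
    numSlices.getOrElse(defaultParallelism)
}

val fromApi = RangeExecModel(0, 4, 1, None) // spark.range(4)
val fromSql = RangeExecModel(0, 4, 1, None) // select * from range(4)

// Both paths now agree, so explain can print the same splits value for both.
println(fromApi.effectiveSlices(8)) // 8
println(fromSql.effectiveSlices(8)) // 8
```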

@SparkQA commented Apr 20, 2017

Test build #75971 has started for PR 17670 at commit a65d5c6.

```
@@ -527,7 +527,7 @@ class SparkSession private(
  @Experimental
  @InterfaceStability.Evolving
  def range(start: Long, end: Long, step: Long): Dataset[java.lang.Long] = {
    range(start, end, step, numPartitions = sparkContext.defaultParallelism)
```
@gatorsmile (Member) commented Apr 20, 2017:
How about reverting the changes in this file? That keeps the PR small enough, and we can backport it to 2.2.

Reply (Member Author): Yeah, sounds good to me. I'll revert.

@gatorsmile (Member) commented:
Ok, I am fine to keep the existing way.

@gatorsmile (Member) commented:
LGTM except a comment.

@SparkQA commented Apr 20, 2017

Test build #75974 has started for PR 17670 at commit e940c6f.

@maropu (Member, Author) commented Apr 20, 2017

Better to open another PR to backport into v2.2?

@maropu (Member, Author) commented Apr 20, 2017

Jenkins, retest this please.

@SparkQA commented Apr 20, 2017

Test build #75975 has finished for PR 17670 at commit e940c6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member, Author) commented Apr 21, 2017

ping

@gatorsmile (Member) commented:
Thanks! Merging to master/2.2

asfgit pushed a commit that referenced this pull request Apr 21, 2017
…xt APIs and SQL in explain

## What changes were proposed in this pull request?
This PR modifies the code so that the SparkContext APIs and SQL print identical `Range` parameters in `explain` output. In the current master, both internally use `defaultParallelism` for `splits` by default, yet they print different strings in explain output:

```
scala> spark.range(4).explain
== Physical Plan ==
*Range (0, 4, step=1, splits=Some(8))

scala> sql("select * from range(4)").explain
== Physical Plan ==
*Range (0, 4, step=1, splits=None)
```

## How was this patch tested?
Added tests in `SQLQuerySuite` and modified some results in the existing tests.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #17670 from maropu/SPARK-20281.

(cherry picked from commit 48d760d)
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
@asfgit asfgit closed this in 48d760d Apr 21, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…xt APIs and SQL in explain
