
[SPARK-24336][SQL] Support 'pass through' transformation in BasicOperators #21388

Closed
wants to merge 3 commits into from

Conversation

HeartSaVioR
Contributor

What changes were proposed in this pull request?

Enable 'pass through' transformations in BasicOperators via reflection, so that every pair of plans that only requires converting a LogicalPlan to a SparkPlan via planLater() is transformed automatically. Adding a new pass-through case then only requires adding the pair to a map.
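To make the idea concrete, here is a minimal, self-contained sketch of a reflection-based pass-through registry. All names (LogicalNode, PhysicalNode, PassThrough, the Limit classes) are illustrative stand-ins, not Spark's actual API; it only assumes the physical node's constructor arguments line up with the logical case class's fields.

```scala
import java.lang.reflect.Constructor

// Simplified stand-ins for LogicalPlan / SparkPlan.
trait LogicalNode extends Product
trait PhysicalNode
case class LogicalLimit(n: Int) extends LogicalNode
case class PhysicalLimit(n: Int) extends PhysicalNode

object PassThrough {
  // The only place a new pass-through pair needs to be registered.
  val passThroughOperators: Map[Class[_ <: LogicalNode], Class[_ <: PhysicalNode]] =
    Map(classOf[LogicalLimit] -> classOf[PhysicalLimit])

  // Cache one constructor per target class so reflection cost is paid once.
  private val ctors: Map[Class[_ <: LogicalNode], Constructor[_]] =
    passThroughOperators.map { case (src, tgt) => src -> tgt.getConstructors.head }

  // Instantiate the physical node from the logical node's case-class fields.
  def plan(node: LogicalNode): Option[PhysicalNode] =
    ctors.get(node.getClass).map { ctor =>
      val args = node.productIterator.map(_.asInstanceOf[AnyRef]).toSeq
      ctor.newInstance(args: _*).asInstanceOf[PhysicalNode]
    }
}
```

For example, `PassThrough.plan(LogicalLimit(10))` yields `Some(PhysicalLimit(10))` without an explicit `case LogicalLimit(...)` clause; the trade-off, discussed below in this thread, is that a field mismatch only surfaces at runtime.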

How was this patch tested?

Existing unit tests.

@SparkQA

SparkQA commented May 22, 2018

Test build #90924 has finished for PR 21388 at commit 971abb6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


lazy val operatorToTargetConstructor: Map[Class[_ <: LogicalPlan], Constructor[_]] =
  passThroughOperators.map {
    case (srcOpCls, tgtOpCls) =>
Member

ditto


lazy val operatorToConstructorParameters: Map[Class[_ <: LogicalPlan], Seq[(String, Type)]] =
  passThroughOperators.map {
    case (srcOpCls, _) =>
Member

def getClassFromTypeHandleArray(tpe: Type): Class[_] = cleanUpReflectionObjects {
  tpe.dealias match {
    case ty if ty <:< localTypeOf[Array[_]] =>
      def arrayClassFromType(tpe: `Type`): Class[_] =
Member

hmm .. Could we avoid this nested function?
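One way to address this suggestion is to hoist the nested def out to a private sibling method, so the enclosing match stays flat. A hedged sketch of that refactoring, simplified to plain `java.lang.Class` (the real code works on `scala.reflect` types, and the names here are illustrative):

```scala
object TypeUtil {
  // Formerly a def nested inside the public method; hoisted to a
  // private helper that recursively unwraps array component types.
  private def arrayElementClass(c: Class[_]): Class[_] =
    if (c.isArray) arrayElementClass(c.getComponentType) else c

  // Public entry point stays a one-liner with no nested definitions.
  def elementClass(c: Class[_]): Class[_] = arrayElementClass(c)
}
```

For example, `TypeUtil.elementClass(classOf[Array[Array[Int]]])` resolves to `classOf[Int]`. The behavior is unchanged; the helper just becomes independently readable and testable.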

@SparkQA

SparkQA commented May 22, 2018

Test build #90922 has finished for PR 21388 at commit 139aefa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

Thanks @HyukjinKwon for reviewing. Addressed review comments.

@SparkQA

SparkQA commented May 22, 2018

Test build #90946 has finished for PR 21388 at commit 6e6c375.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this please

@hvanhovell
Contributor

I don’t think it is a good idea to put reflection magic in the planner. If you want to add cases to the planner, please use the existing hooks (SparkSessionExtensions, ExperimentalMethods, or overriding SparkSessionBuilder). -1 on this PR.

@SparkQA

SparkQA commented May 22, 2018

Test build #90952 has finished for PR 21388 at commit 6e6c375.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

HeartSaVioR commented May 22, 2018

@hvanhovell
I also suspected some people might not want reflection magic here (I was one of them, but came to think it was necessary), so I'm happy to close the PR if others voice the same opinion.

For me, reflection looks like the only way to address "Can we automate these 'pass through' operations?" (I might be wrong, since I'm not an expert on Scala). So if we decide to reject this approach, we would be better off either removing that comment line or documenting the restriction(s) instead, unless we have another immediate idea for achieving it without reflection.

Btw, I'd be very happy if you could spend some time explaining what concerns you about reflection in the planner. Documenting that explicitly would keep other contributors from attempting something similar and save everyone's time.

@hvanhovell
Contributor

@HeartSaVioR I really don't think we should automate these things at all. The planner is a pretty critical component, and I'd rather be explicit about how a LogicalPlan maps to a SparkPlan and have the benefit of compile-time checks, than have some reflection glue doing this at runtime (which can potentially blow up).

What is the problem you are trying to solve here?
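The compile-time-checks argument can be illustrated with a small sketch (not Spark code; all names are hypothetical): with explicit, exhaustive pattern matching over a sealed hierarchy, a forgotten mapping is a compiler diagnostic, whereas a reflection lookup can only fail at runtime.

```scala
// Simplified stand-ins for logical and physical plan nodes.
sealed trait Logical
case class Limit(n: Int) extends Logical
case class Sample(fraction: Double) extends Logical

sealed trait Physical
case class LimitExec(n: Int) extends Physical
case class SampleExec(fraction: Double) extends Physical

object Planner {
  // Because Logical is sealed, omitting a case here triggers a
  // non-exhaustive-match warning at compile time (an error under
  // -Xfatal-warnings), rather than a runtime failure.
  def plan(l: Logical): Physical = l match {
    case Limit(n)  => LimitExec(n)
    case Sample(f) => SampleExec(f)
  }
}
```

So `Planner.plan(Limit(5))` yields `LimitExec(5)`, and every logical-to-physical mapping is visible and checked in one place, which is the property the reflection approach gives up.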

@HeartSaVioR
Contributor Author

@hvanhovell
To be honest, I found the rationale for the issue in a comment in the Spark code:

// Can we automate these 'pass through' operations?

and I thought the comment made sense: it would be beneficial if we could just register a matching pair of (LogicalPlan, SparkPlan) for the cases that don't require any transformation beyond planning the children.

At first I tried my best to stick to compile-time constructs, but after a couple of hours realized it is not achievable (at least for me) without runtime reflection. Another couple of hours went into getting the reflection approach working.

I have no strong opinion on adopting reflection in the planner (so I'm fine with the approach being rejected), but if we agree it cannot be done without reflection, the original comment should either be removed or describe the limitations, so that others who try can work around them.

HeartSaVioR added a commit to HeartSaVioR/spark that referenced this pull request Jun 20, 2018
* The option is no longer the preferred one, per the comment below
  * apache#21388 (comment)
* Removing this to keep contributors from wasting their time
@HeartSaVioR
Contributor Author

I just provided a new patch to remove the comment, as it no longer looks like the preferred option.
#21595

Closing this one.

@HeartSaVioR HeartSaVioR deleted the SPARK-24336 branch January 25, 2019 22:24