
[SPARK-24336][SQL] Support 'pass through' transformation in BasicOperators #21388

Closed
wants to merge 3 commits into from

Conversation

HeartSaVioR
Contributor

What changes were proposed in this pull request?

Enable 'pass through' transformations in BasicOperators via reflection, so that every pair of plans that only requires converting a LogicalPlan to a SparkPlan via planLater() is transformed automatically. Adding a new pass-through case then only requires adding the pair to a map.
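To make the idea concrete, here is a minimal, self-contained sketch of a reflection-based pass-through registry. All names (LogicalNode, PhysicalNode, PassThrough, the Limit classes) are illustrative stand-ins, not Spark's actual API; it only assumes the physical node's constructor arguments line up with the logical case class's fields.

```scala
import java.lang.reflect.Constructor

// Simplified stand-ins for LogicalPlan / SparkPlan.
trait LogicalNode extends Product
trait PhysicalNode
case class LogicalLimit(n: Int) extends LogicalNode
case class PhysicalLimit(n: Int) extends PhysicalNode

object PassThrough {
  // The only place a new pass-through pair needs to be registered.
  val passThroughOperators: Map[Class[_ <: LogicalNode], Class[_ <: PhysicalNode]] =
    Map(classOf[LogicalLimit] -> classOf[PhysicalLimit])

  // Cache one constructor per target class so reflection cost is paid once.
  private val ctors: Map[Class[_ <: LogicalNode], Constructor[_]] =
    passThroughOperators.map { case (src, tgt) => src -> tgt.getConstructors.head }

  // Instantiate the physical node from the logical node's case-class fields.
  def plan(node: LogicalNode): Option[PhysicalNode] =
    ctors.get(node.getClass).map { ctor =>
      val args = node.productIterator.map(_.asInstanceOf[AnyRef]).toSeq
      ctor.newInstance(args: _*).asInstanceOf[PhysicalNode]
    }
}
```

For example, `PassThrough.plan(LogicalLimit(10))` yields `Some(PhysicalLimit(10))` without an explicit `case LogicalLimit(...)` clause; the trade-off, discussed below in this thread, is that a field mismatch only surfaces at runtime.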

How was this patch tested?

Existing unit tests.

@SparkQA

SparkQA commented May 22, 2018

Test build #90924 has finished for PR 21388 at commit 971abb6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


lazy val operatorToTargetConstructor: Map[Class[_ <: LogicalPlan], Constructor[_]] =
  passThroughOperators.map {
    case (srcOpCls, tgtOpCls) =>
Member

ditto


lazy val operatorToConstructorParameters: Map[Class[_ <: LogicalPlan], Seq[(String, Type)]] =
  passThroughOperators.map {
    case (srcOpCls, _) =>
Member

def getClassFromTypeHandleArray(tpe: Type): Class[_] = cleanUpReflectionObjects {
  tpe.dealias match {
    case ty if ty <:< localTypeOf[Array[_]] =>
      def arrayClassFromType(tpe: `Type`): Class[_] =
Member

hmm .. Could we avoid this nested function?
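One way to address this suggestion is to hoist the nested def out to a private sibling method, so the enclosing match stays flat. A hedged sketch of that refactoring, simplified to plain `java.lang.Class` (the real code works on `scala.reflect` types, and the names here are illustrative):

```scala
object TypeUtil {
  // Formerly a def nested inside the public method; hoisted to a
  // private helper that recursively unwraps array component types.
  private def arrayElementClass(c: Class[_]): Class[_] =
    if (c.isArray) arrayElementClass(c.getComponentType) else c

  // Public entry point stays a one-liner with no nested definitions.
  def elementClass(c: Class[_]): Class[_] = arrayElementClass(c)
}
```

For example, `TypeUtil.elementClass(classOf[Array[Array[Int]]])` resolves to `classOf[Int]`. The behavior is unchanged; the helper just becomes independently readable and testable.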

@SparkQA

SparkQA commented May 22, 2018

Test build #90922 has finished for PR 21388 at commit 139aefa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

Thanks @HyukjinKwon for reviewing. Addressed review comments.

@SparkQA

SparkQA commented May 22, 2018

Test build #90946 has finished for PR 21388 at commit 6e6c375.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this please

@hvanhovell
Contributor

I don’t think it is a good idea to put reflection magic in the planner. If you want to add cases to the planner, please use the existing hooks (SparkSessionExtensions, ExperimentalMethods, or overriding SparkSessionBuilder). -1 on this PR.

@SparkQA

SparkQA commented May 22, 2018

Test build #90952 has finished for PR 21388 at commit 6e6c375.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

HeartSaVioR commented May 22, 2018

@hvanhovell
I also suspected some people might not want reflection magic here (I was one of them, but came to think it was necessary), so I'm happy to close the PR if others voice the same opinion.

For me, reflection looks like the only way to address "Can we automate these 'pass through' operations?" (I might be wrong, since I'm not an expert on Scala). So if we decide to reject this approach, we would be better off either removing that comment line or documenting the restriction(s) instead, unless we have another immediate idea for achieving it without reflection.

Btw, I'd be very happy if you could spend some time explaining what concerns you about reflection in the planner. Documenting that explicitly would keep other contributors from attempting something similar and save everyone's time.

@hvanhovell
Contributor

@HeartSaVioR I really don't think we should automate these things at all. The planner is a pretty critical component, and I'd rather be explicit about how a LogicalPlan maps to a SparkPlan and have the benefit of compile-time checks, than have some reflection glue doing this at runtime (which can potentially blow up).

What is the problem you are trying to solve here?
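The compile-time-checks argument can be illustrated with a small sketch (not Spark code; all names are hypothetical): with explicit, exhaustive pattern matching over a sealed hierarchy, a forgotten mapping is a compiler diagnostic, whereas a reflection lookup can only fail at runtime.

```scala
// Simplified stand-ins for logical and physical plan nodes.
sealed trait Logical
case class Limit(n: Int) extends Logical
case class Sample(fraction: Double) extends Logical

sealed trait Physical
case class LimitExec(n: Int) extends Physical
case class SampleExec(fraction: Double) extends Physical

object Planner {
  // Because Logical is sealed, omitting a case here triggers a
  // non-exhaustive-match warning at compile time (an error under
  // -Xfatal-warnings), rather than a runtime failure.
  def plan(l: Logical): Physical = l match {
    case Limit(n)  => LimitExec(n)
    case Sample(f) => SampleExec(f)
  }
}
```

So `Planner.plan(Limit(5))` yields `LimitExec(5)`, and every logical-to-physical mapping is visible and checked in one place, which is the property the reflection approach gives up.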

@HeartSaVioR
Contributor Author

@hvanhovell
To be honest, I found the rationale for the issue in a comment in the Spark code:

// Can we automate these 'pass through' operations?

and I thought the comment made sense: it would be beneficial if we could just register a matching pair of (LogicalPlan, SparkPlan) for the cases that don't require any transformation beyond planning the children.

At first I tried my best to stick to compile-time constructs, but after a couple of hours realized it is not achievable (at least for me) without runtime reflection. Another couple of hours went into getting the reflection approach working.

I have no strong opinion on adopting reflection in the planner (so I'm fine with the approach being rejected), but if we agree it cannot be done without reflection, the original comment should either be removed or describe the limitations, so that others who try can work around them.

HeartSaVioR added a commit to HeartSaVioR/spark that referenced this pull request Jun 20, 2018
* The option is no longer the preferred one, per the comment below
  * apache#21388 (comment)
* Removing this to keep contributors from wasting their time
@HeartSaVioR
Contributor Author

I just provided a new patch to remove the comment, as it no longer looks like the preferred option.
#21595

Closing this one.

@HeartSaVioR HeartSaVioR deleted the SPARK-24336 branch January 25, 2019 22:24