
[SPARK-12991] [SQL] Establish a link between SparkPlan and LogicalPlan nodes #11036

Closed

Conversation

mbautin (Contributor) commented Feb 2, 2016

This is a prerequisite for reusing RDDs corresponding to shared query
fragments between different Spark SQL queries, which helps improve
performance significantly on many analytical workloads even without
explicitly caching any tables.

mbautin and others added 2 commits February 2, 2016 15:48

  • …n nodes
  • Add an accessor method for `_logicalPlan`.
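For context, the shape of the change can be pictured roughly as follows. This is a hypothetical sketch, not the actual patch: only the field name `_logicalPlan` appears in the commit message; the stub trait names and method signatures here are assumptions.

```scala
// Hypothetical sketch: a physical plan node keeps an optional back-reference
// to the logical plan it was generated from. Names other than `_logicalPlan`
// are illustrative stand-ins, not Spark's actual classes.
trait LogicalPlanStub

trait SparkPlanStub {
  // The logical plan this physical plan was planned from, if known.
  private var _logicalPlan: Option[LogicalPlanStub] = None

  // The planner would call this when it converts a logical node.
  def setLogicalPlan(plan: LogicalPlanStub): Unit = {
    _logicalPlan = Some(plan)
  }

  // Accessor method for `_logicalPlan`, as the second commit describes.
  def logicalPlan: Option[LogicalPlanStub] = _logicalPlan
}
```

With such a link, any consumer holding a physical plan (or an RDD derived from one) could walk back to the originating logical fragment.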
cloud-fan (Contributor) commented
Did you link to the correct JIRA ticket? And what's your overall design? This is a big change, and we need to discuss whether it's worth it.


SparkQA commented Feb 3, 2016

Test build #50614 has finished for PR 11036 at commit b79fd07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mbautin mbautin changed the title [SPARK-12291] [SQL] Establish a link between SparkPlan and LogicalPlan nodes [SPARK-12991] [SQL] Establish a link between SparkPlan and LogicalPlan nodes Feb 3, 2016
mbautin (Contributor, Author) commented Feb 3, 2016

Sorry -- linked to the wrong JIRA ticket. Should be https://issues.apache.org/jira/browse/SPARK-12991.

mbautin (Contributor, Author) commented Feb 3, 2016

@cloud-fan: the overall design for the feature this is needed for (query fragment RDD reuse) is at https://issues.apache.org/jira/browse/SPARK-11838. This seems to be the only part that cannot be done without modifying the Spark SQL code, because we need some way to find the logical plans corresponding to generated RDDs.
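The reuse mechanism mbautin describes could be sketched like this. Everything here is an illustrative assumption, not Spark API: the cache, its key type, and all names are invented; in the real proposal the key would presumably be a (canonicalized) logical plan and the value an RDD.

```scala
import scala.collection.mutable

// Illustrative sketch of query-fragment reuse: once a generated RDD can be
// traced back to its logical plan, the RDD can be keyed by that plan and
// reused when another query contains the same fragment.
final case class Fragment(description: String) // stand-in for a canonicalized logical plan

final class FragmentCache[R] {
  private val cache = mutable.Map.empty[Fragment, R]
  private var computations = 0

  // Return the cached result for this fragment, computing it only on a miss.
  def getOrCompute(fragment: Fragment)(compute: => R): R =
    cache.getOrElseUpdate(fragment, { computations += 1; compute })

  def computeCount: Int = computations
}
```

Two queries sharing a fragment would then materialize it once, which is the performance win the PR description claims for analytical workloads.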

rxin (Contributor) commented Feb 3, 2016

This is not going to work at all with whole-stage codegen, in which we collapse all pipelinable operators into a single generated function.

mbautin (Contributor, Author) commented Feb 3, 2016

Even in that case, we could still obtain RDDs corresponding to SparkPlan nodes at stage boundaries, right? We would still find that useful in our query workload.

rxin (Contributor) commented Feb 4, 2016

How would the change here help you with that?


SparkQA commented Feb 21, 2016

Test build #51606 has finished for PR 11036 at commit b79fd07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
