Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20392][SQL][followup] should not add extra AnalysisBarrier #20094

Closed
wants to merge 3 commits into from

Conversation

cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Dec 27, 2017

What changes were proposed in this pull request?

I found this problem while auditing the analyzer code. It's dangerous to introduce extra AnalysisBarrer during analysis, as the plan inside it will bypass all analysis afterward, which may not be expected. We should only preserve AnalysisBarrer but not introduce new ones.

How was this patch tested?

existing tests

@cloud-fan
Copy link
Contributor Author

cc @viirya @gatorsmile

if aggregate.resolved =>
case Filter(cond, AnalysisBarrier(agg: Aggregate)) =>
apply(Filter(cond, agg)).mapChildren(AnalysisBarrier)
case f @ Filter(cond, agg @ Aggregate(grouping, originalAggExprs, child)) if agg.resolved =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make the names shorter

val resolved = resolveExpression(expr, plan)
if (resolved.resolved) {
resolved
private def resolveExprsAndAddMissingAttrs(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the code to resolve expressions and add missing attributes in one shot, so that we have a central place to deal with analysis barrier and to decide which operator is supported and which is not.

@SparkQA
Copy link

SparkQA commented Dec 27, 2017

Test build #85439 has finished for PR 20094 at commit 64709fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -665,14 +664,18 @@ class Analyzer(
* Generate a new logical plan for the right child with different expression IDs
* for all conflicting attributes.
*/
private def dedupRight (left: LogicalPlan, originalRight: LogicalPlan): LogicalPlan = {
// Remove analysis barrier if any.
val right = EliminateBarriers(originalRight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If right plan is wrapped (e.g., we join two datasets) in an analysis barrier, the later right.collect doesn't work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see, you have recursively dedupRight on it.

@@ -723,7 +726,7 @@ class Analyzer(
s.withNewPlan(dedupOuterReferencesInSubquery(s.plan, attributeRewrites))
}
}
AnalysisBarrier(newRight)
newRight
Copy link
Member

@viirya viirya Dec 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newRight is introduced before to be wrapped in AnalysisBarrier. We can get rid of this redundant variable now.

case d: Distinct =>
(exprs.map(resolveExpression(_, d)), d)

case u: UnaryNode =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we stop at SubqueryAlias as before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch! I missed that because the logic was in resolveExpressionRecursively instead of addMissingAttr.

It indicates that it's more clear to merge these 2 methods :)

@viirya
Copy link
Member

viirya commented Dec 28, 2017

LGTM with two minor comments.

@SparkQA
Copy link

SparkQA commented Dec 28, 2017

Test build #85452 has finished for PR 20094 at commit 8879870.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 28, 2017

Test build #85450 has finished for PR 20094 at commit cd39760.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 28, 2017

Test build #85453 has finished for PR 20094 at commit 6a25d60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 755f2f5 Dec 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants