[SPARK-20392][SQL][followup] should not add extra AnalysisBarrier #20094

cloud-fan · 2017-12-27T18:57:49Z

What changes were proposed in this pull request?

I found this problem while auditing the analyzer code. It's dangerous to introduce an extra AnalysisBarrier during analysis, because the plan inside it bypasses all subsequent analysis, which may not be expected. We should only preserve existing AnalysisBarriers, not introduce new ones.
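To make the risk concrete, here is a minimal sketch using a toy plan model. The types `Plan`, `Relation`, `Filter`, `Barrier` and the functions `analyze` and `fullyResolved` are hypothetical stand-ins, not Spark's classes. It only illustrates the idea that a rule which stops at a barrier never touches the plan underneath it.

```scala
// Toy model for illustration only; none of these types are Spark's classes.
object BarrierDemo {
  sealed trait Plan
  case class Relation(name: String, resolved: Boolean) extends Plan
  case class Filter(cond: String, child: Plan) extends Plan
  case class Barrier(child: Plan) extends Plan

  // A stand-in "analysis rule": resolves relations, but never descends into a Barrier.
  def analyze(plan: Plan): Plan = plan match {
    case Relation(name, _)   => Relation(name, resolved = true)
    case Filter(cond, child) => Filter(cond, analyze(child))
    case b: Barrier          => b // everything under the barrier is left untouched
  }

  // Helper that looks through barriers to report whether every relation is resolved.
  def fullyResolved(plan: Plan): Boolean = plan match {
    case Relation(_, resolved) => resolved
    case Filter(_, child)      => fullyResolved(child)
    case Barrier(child)        => fullyResolved(child)
  }
}
```

In this toy model, a plan that gets wrapped in a barrier before it is fully analyzed stays unresolved no matter how often `analyze` runs, which is the kind of surprise the description above warns about.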

How was this patch tested?

existing tests

cloud-fan · 2017-12-27T18:57:59Z

cc @viirya @gatorsmile

cloud-fan · 2017-12-27T19:00:59Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

-          if aggregate.resolved =>
+      case Filter(cond, AnalysisBarrier(agg: Aggregate)) =>
+        apply(Filter(cond, agg)).mapChildren(AnalysisBarrier)
+      case f @ Filter(cond, agg @ Aggregate(grouping, originalAggExprs, child)) if agg.resolved =>


just make the names shorter

cloud-fan · 2017-12-27T19:01:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

-      val resolved = resolveExpression(expr, plan)
-      if (resolved.resolved) {
-        resolved
+    private def resolveExprsAndAddMissingAttrs(


I refactored the code to resolve expressions and add missing attributes in one shot, so that we have a central place to deal with analysis barrier and to decide which operator is supported and which is not.
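As a rough sketch of what "one shot" can look like, the toy model below resolves the wanted columns and widens projections in a single recursive function, with one explicit case for barriers. All names here (`Plan`, `Relation`, `Project`, `Barrier`, `resolveAndAddMissing`) are hypothetical, columns are plain strings, and this is not the code from the patch.

```scala
// Toy model for illustration only; not Spark's classes and not the patch itself.
object ResolveAndAddDemo {
  sealed trait Plan { def output: Set[String] }
  case class Relation(output: Set[String]) extends Plan
  case class Project(cols: Set[String], child: Plan) extends Plan {
    def output: Set[String] = cols
  }
  case class Barrier(child: Plan) extends Plan {
    def output: Set[String] = child.output
  }

  // Returns the wanted columns that could be found, plus a plan that exposes them.
  def resolveAndAddMissing(wanted: Set[String], plan: Plan): (Set[String], Plan) =
    if (wanted.subsetOf(plan.output)) {
      (wanted, plan) // nothing is missing at this level
    } else plan match {
      // Single place that handles barriers: look through, then re-wrap exactly once.
      case Barrier(child) =>
        val (found, newChild) = resolveAndAddMissing(wanted, child)
        (found, Barrier(newChild))
      // Supported operator: recurse, then widen the projection with what was found.
      case Project(cols, child) =>
        val (found, newChild) = resolveAndAddMissing(wanted, child)
        (found, Project(cols ++ found, newChild))
      // Anything else is treated as unsupported in this sketch: stop here.
      case other =>
        (wanted.intersect(other.output), other)
    }
}
```

Because the barrier case sits in the same `match` as the supported operators, the result contains exactly as many barriers as the input did: existing ones are preserved and none are added.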

SparkQA · 2017-12-27T20:41:58Z

Test build #85439 has finished for PR 20094 at commit 64709fc.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-12-27T23:21:23Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -665,14 +664,18 @@ class Analyzer(
     * Generate a new logical plan for the right child with different expression IDs
     * for all conflicting attributes.
     */
-    private def dedupRight (left: LogicalPlan, originalRight: LogicalPlan): LogicalPlan = {
-      // Remove analysis barrier if any.
-      val right = EliminateBarriers(originalRight)


If the right plan is wrapped in an analysis barrier (e.g., when we join two datasets), the later right.collect doesn't work.

Oh, I see, you call dedupRight recursively on it.
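The exchange above can be sketched with a toy model in which the barrier hides its child from generic traversal (an assumption of this sketch), so a `collect`-style walk finds nothing under it. The names `Plan`, `Relation`, `Project`, `Barrier`, `collectRelations`, and `dedup` are hypothetical, and integer ids stand in for expression IDs.

```scala
// Toy model for illustration only; not Spark's classes.
object DedupDemo {
  sealed trait Plan { def children: Seq[Plan] }
  case class Relation(attrIds: Set[Int]) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  // Assumption of this sketch: the barrier reports no children to generic traversal.
  case class Barrier(child: Plan) extends Plan {
    def children: Seq[Plan] = Nil
  }

  // A collect-style traversal that only follows `children`.
  def collectRelations(plan: Plan): Seq[Relation] = plan match {
    case r: Relation => Seq(r)
    case other       => other.children.flatMap(collectRelations)
  }

  // Gives conflicting relations fresh ids; barriers are matched explicitly and re-wrapped.
  def dedup(leftIds: Set[Int], right: Plan, fresh: Int => Int): Plan = right match {
    case Barrier(child) => Barrier(dedup(leftIds, child, fresh))
    case Project(child) => Project(dedup(leftIds, child, fresh))
    case Relation(ids) if ids.intersect(leftIds).nonEmpty => Relation(ids.map(fresh))
    case r: Relation => r
  }
}
```

Recursing on the barrier's child reaches the conflicting relation without stripping the barrier first, and the result is wrapped again so the number of barriers does not change.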

viirya · 2017-12-28T02:14:15Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -723,7 +726,7 @@ class Analyzer(
                s.withNewPlan(dedupOuterReferencesInSubquery(s.plan, attributeRewrites))
            }
          }
-          AnalysisBarrier(newRight)
+          newRight


newRight was introduced earlier only so that it could be wrapped in AnalysisBarrier. We can get rid of this redundant variable now.

viirya · 2017-12-28T02:37:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+          case d: Distinct =>
+            (exprs.map(resolveExpression(_, d)), d)
+
+          case u: UnaryNode =>


Shouldn't we stop at SubqueryAlias as before?

Ah, good catch! I missed that because the logic was in resolveExpressionRecursively instead of addMissingAttr.

This shows that it's clearer to merge these 2 methods :)
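One way to picture the "stop at SubqueryAlias" behaviour is a dedicated case placed ahead of the generic recursion. The sketch below uses hypothetical toy types (`Plan`, `Relation`, `Project`, `SubqueryAlias`, `addMissing`) with string column names; it is not the merged method from the patch.

```scala
// Toy model for illustration only; not Spark's classes.
object StopAtAliasDemo {
  sealed trait Plan { def output: Set[String] }
  case class Relation(output: Set[String]) extends Plan
  case class Project(cols: Set[String], child: Plan) extends Plan {
    def output: Set[String] = cols
  }
  case class SubqueryAlias(alias: String, child: Plan) extends Plan {
    def output: Set[String] = child.output
  }

  def addMissing(wanted: Set[String], plan: Plan): (Set[String], Plan) =
    if (wanted.subsetOf(plan.output)) {
      (wanted, plan)
    } else plan match {
      // Stop here: columns hidden below the alias must not be pulled up through it.
      case s: SubqueryAlias =>
        (wanted.intersect(s.output), s)
      case Project(cols, child) =>
        val (found, newChild) = addMissing(wanted, child)
        (found, Project(cols ++ found, newChild))
      case other =>
        (wanted.intersect(other.output), other)
    }
}
```

With the stop case living in the same function as the recursion, it is harder to lose it when the traversal logic changes, which is the point made in the reply above.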

viirya · 2017-12-28T02:54:31Z

LGTM with two minor comments.

SparkQA · 2017-12-28T04:43:17Z

Test build #85452 has finished for PR 20094 at commit 8879870.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-28T05:11:33Z

Test build #85450 has finished for PR 20094 at commit cd39760.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-28T06:10:17Z

Test build #85453 has finished for PR 20094 at commit 6a25d60.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile

LGTM

gatorsmile · 2017-12-28T13:30:20Z

Thanks! Merged to master.

should not add extra AnalysisBarrier

cd39760

cloud-fan force-pushed the barrier branch from 64709fc to cd39760 on December 28, 2017 02:25

cloud-fan added 2 commits December 28, 2017 10:55

address comments

8879870

one more comment

6a25d60

asfgit closed this in 755f2f5 Dec 28, 2017
