[SPARK-30872][SQL] Constraints inferred from inferred attributes #27632
Conversation
Test build #118663 has finished for PR 27632 at commit

retest this please

Test build #118668 has started for PR 27632 at commit

Test build #118693 has finished for PR 27632 at commit
```diff
@@ -75,7 +85,7 @@ trait ConstraintHelper {
       inferredConstraints ++= replaceConstraints(predicates - eq, l, r)
     case _ => // No inference
   }
-  inferredConstraints -- constraints
+  (inferredConstraints -- constraints).filterNot(i => constraints.exists(_.semanticEquals(i)))
```
If the constraints already contain `a = b`, this change filters out the semantically equivalent inferred constraint `b = a`.
cc @cloud-fan
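The effect of the added `filterNot` can be shown with a minimal, self-contained sketch. The `Eq` case class and its `semanticEquals` below are toy stand-ins for Spark's `Expression` API, not the real thing: the point is that a plain set difference (`--`) compares expressions structurally, so an inferred `b = a` survives when the original constraints contain `a = b`, while a semantic filter drops it.

```scala
// Toy stand-in for Spark's Expression: equality of `l = r` is symmetric.
case class Eq(l: String, r: String) {
  // Hypothetical analogue of Expression.semanticEquals.
  def semanticEquals(o: Eq): Boolean =
    (l == o.l && r == o.r) || (l == o.r && r == o.l)
}

object FilterSketch {
  def main(args: Array[String]): Unit = {
    val constraints = Set(Eq("a", "b"))
    val inferred    = Set(Eq("b", "a"), Eq("a", "c"))

    // Plain `--` compares structurally, so Eq("b", "a") is NOT removed:
    val structural = inferred -- constraints
    assert(structural.contains(Eq("b", "a")))

    // The PR's extra filter drops anything semantically equal to an
    // existing constraint, leaving only the genuinely new one:
    val semantic = (inferred -- constraints)
      .filterNot(i => constraints.exists(_.semanticEquals(i)))
    assert(semantic == Set(Eq("a", "c")))
  }
}
```

Without the semantic filter, the redundant `b = a` keeps re-appearing on every optimizer pass, which is exactly what breaks the `Once` batch's idempotence check.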
Test build #118842 has finished for PR 27632 at commit

retest this please

Test build #118855 has finished for PR 27632 at commit
# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
```diff
@@ -3404,6 +3404,15 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         """.stripMargin)
     checkAnswer(df, Row(Row(1, 2)) :: Nil)
   }

+  test("SPARK-30872: Constraints inferred from inferred attributes") {
```
Before this PR, this test throws `TreeNodeException: Once strategy's idempotence is broken for batch Infer Filters`:
[info] - SPARK-30872: Constraints inferred from inferred attributes *** FAILED *** (146 milliseconds)
[info] org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Once strategy's idempotence is broken for batch Infer Filters
[info] Aggregate [count(1) AS count(1)#19182L] Aggregate [count(1) AS count(1)#19182L]
[info] +- Project +- Project
[info] ! +- Filter ((((((a#19179L = c#19181L) AND isnotnull(b#19180L)) AND isnotnull(c#19181L)) AND ((b#19180L = 3) OR (b#19180L = 13))) AND isnotnull(a#19179L)) AND (((a#19179L = b#19180L) AND (b#19180L = c#19181L)) AND ((c#19181L = 3) OR (c#19181L = 13)))) +- Filter (((a#19179L = 3) OR (a#19179L = 13)) AND ((((((a#19179L = c#19181L) AND isnotnull(b#19180L)) AND isnotnull(c#19181L)) AND ((b#19180L = 3) OR (b#19180L = 13))) AND isnotnull(a#19179L)) AND (((a#19179L = b#19180L) AND (b#19180L = c#19181L)) AND ((c#19181L = 3) OR (c#19181L = 13)))))
[info] +- Relation[a#19179L,b#19180L,c#19181L] parquet +- Relation[a#19179L,b#19180L,c#19181L] parquet
[info] , tree:
[info] Aggregate [count(1) AS count(1)#19182L]
[info] +- Project
[info] +- Filter (((a#19179L = 3) OR (a#19179L = 13)) AND ((((((a#19179L = c#19181L) AND isnotnull(b#19180L)) AND isnotnull(c#19181L)) AND ((b#19180L = 3) OR (b#19180L = 13))) AND isnotnull(a#19179L)) AND (((a#19179L = b#19180L) AND (b#19180L = c#19181L)) AND ((c#19181L = 3) OR (c#19181L = 13)))))
[info] +- Relation[a#19179L,b#19180L,c#19181L] parquet
[info] at org.apache.spark.sql.catalyst.rules.RuleExecutor.checkBatchIdempotence(RuleExecutor.scala:100)
[info] at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:187)
[info] at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:132)
[info] at scala.collection.immutable.List.foreach(List.scala:392)
[info] at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:132)
[info] at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:111)
[info] at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
[info] at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:111)
[info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:82)
[info] at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
[info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:119)
[info] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:762)
[info] at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:119)
[info] at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:82)
[info] at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:79)
[info] at org.apache.spark.sql.QueryTest.assertEmptyMissingInput(QueryTest.scala:231)
[info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:154)
[info] at org.apache.spark.sql.SQLQuerySuite.$anonfun$new$746(SQLQuerySuite.scala:3413)
[info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
Test build #118963 has finished for PR 27632 at commit
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR fixes a special case in inferring additional constraints. How to reproduce this issue:
We can infer more constraints, such as:
`(a#34L = 3) OR (a#34L = 13)`
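To make the inference concrete, here is a hedged, self-contained sketch. The predicate ADT and `infer` function below are illustrative toys, not Spark's real `ConstraintHelper`: they show how substituting equal attributes into existing predicates derives the extra constraint `(a = 3) OR (a = 13)` from `a = c` and `(c = 3) OR (c = 13)`, matching the plan in the test failure log.

```scala
// Toy predicate language: attribute equality, attribute-literal
// equality, and disjunction.
sealed trait Pred
case class EqAttr(l: String, r: String) extends Pred
case class EqLit(a: String, v: Int) extends Pred
case class Or(l: Pred, r: Pred) extends Pred

object InferSketch {
  // Replace attribute `from` with `to` everywhere in a predicate.
  def replace(p: Pred, from: String, to: String): Pred = p match {
    case EqAttr(l, r) =>
      EqAttr(if (l == from) to else l, if (r == from) to else r)
    case EqLit(a, v) => EqLit(if (a == from) to else a, v)
    case Or(l, r)    => Or(replace(l, from, to), replace(r, from, to))
  }

  // For each equality constraint, rewrite the other constraints by
  // substituting in both directions, then keep only the new ones.
  def infer(constraints: Set[Pred]): Set[Pred] = {
    val inferred = constraints.flatMap {
      case eq @ EqAttr(l, r) =>
        (constraints - eq).map(replace(_, l, r)) ++
          (constraints - eq).map(replace(_, r, l))
      case _ => Set.empty[Pred]
    }
    inferred -- constraints
  }

  def main(args: Array[String]): Unit = {
    val cs = Set[Pred](EqAttr("a", "c"), Or(EqLit("c", 3), EqLit("c", 13)))
    val extra = infer(cs)
    // From a = c and (c = 3) OR (c = 13) we derive (a = 3) OR (a = 13):
    assert(extra.contains(Or(EqLit("a", 3), EqLit("a", 13))))
  }
}
```

The derived disjunction is the filter the optimized plan in the failure log pushes down as `((a#19179L = 3) OR (a#19179L = 13))`; the PR's fix ensures re-running the inference does not keep producing restatements of constraints that are already present.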
Why are the changes needed?
Improve query performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test and benchmark test.
Benchmark code and benchmark result: