-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child ordering is semantically same as required ordering #14841
Conversation
…ring is semantically same as required ordering
ok to test |
Test build #64512 has finished for PR 14841 at commit
|
cc'ing @rxin and @hvanhovell for review |
LGTM - merging to master. Thanks! |
@@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection) | |||
override def sql: String = child.sql + " " + direction.sql | |||
|
|||
def isAscending: Boolean = direction == Ascending | |||
|
|||
def semanticEquals(other: SortOrder): Boolean = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expression
has a default implementation of semanticEquals
, doesn't it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan : If you look at the old version of EnsureRequirements
below at L253, it compared raw SortOrder
objects which will use equals()
generated for it. In scala, equals()
for case classes is merely doing equals()
over all its fields so that lead to Expression
's equals()
being used instead of its semanticEquals()
.
My fix here was to introduce a semanticEquals
in SortOrder
which compares the underlying Expression
semantically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I understand in EnsureRequirements
we should use semanticEquals
instead of ==
to compare SortOrder
, but why we need to implement samanticEquals
again in SortOrder
? What's wrong with the default implementation?
I mean, there is no need to "introduce" a semanticEquals
in SortOrder
, it already has, because SortOrder
is also Expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan : I see what you were trying to say before. I tried that and it worked. I have created a PR to clean it up : #14910 Thanks for pointing this out !!
## What changes were proposed in this pull request? Removing `semanticEquals()` from `SortOrder` because it can use the `semanticEquals()` provided by its parent class (`Expression`). This was as per suggestion by cloud-fan at https://github.com/apache/spark/pull/14841/files/7192418b3a26a14642fc04fc92bf496a954ffa5d#r77106801 ## How was this patch tested? Ran the test added in apache#14841 Author: Tejas Patil <tejasp@fb.com> Closes apache#14910 from tejasapatil/SPARK-17271_remove_semantic_ordering.
Should this go into 2.0? |
yea, I think so, it's pretty safe. @tejasapatil do you mind sending a new PR against branch 2.0 to backport this PR and the follow-up one? thanks! |
## What changes were proposed in this pull request? Ports #14841 and #14910 from `master` to `branch-2.0` Jira : https://issues.apache.org/jira/browse/SPARK-17271 Planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 `SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects. eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` Expression in required SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId, qualifier = Some("a") ) ``` Expression in child SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId) ``` Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order. This PR includes following changes: - Added a `semanticEquals` method to `SortOrder` so that it can compare underlying child expressions semantically (and not using default Object.equals) - Fixed `EnsureRequirements` to use semantic comparison of SortOrder ## How was this patch tested? - Added a test case to `PlannerSuite`. Ran rest tests in `PlannerSuite` Author: Tejas Patil <tejasp@fb.com> Closes #14920 from tejasapatil/SPARK-17271_2.0_port.
What changes were proposed in this pull request?
Jira : https://issues.apache.org/jira/browse/SPARK-17271
Planner is adding un-needed SORT operation due to bug in the way comparison for
SortOrder
is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253SortOrder
needs to be compared semantically becauseExpression
within twoSortOrder
can be "semantically equal" but not literally equal objects.eg. In case of
sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")
Expression in required SortOrder:
Expression in child SortOrder:
Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.
This PR includes following changes:
semanticEquals
method toSortOrder
so that it can compare underlying child expressions semantically (and not using default Object.equals)EnsureRequirements
to use semantic comparison of SortOrderHow was this patch tested?
PlannerSuite
. Ran rest tests inPlannerSuite