Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child ordering is semantically same as required ordering #14841

Conversation

tejasapatil
Copy link
Contributor

@tejasapatil tejasapatil commented Aug 27, 2016

What changes were proposed in this pull request?

Jira : https://issues.apache.org/jira/browse/SPARK-17271

Planner is adding un-needed SORT operation due to bug in the way comparison for SortOrder is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253
SortOrder needs to be compared semantically because Expression within two SortOrder can be "semantically equal" but not literally equal objects.

eg. In case of sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")

Expression in required SortOrder:

      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId,
        qualifier = Some("a")
      )

Expression in child SortOrder:

      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId)

Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.

This PR includes following changes:

  • Added a semanticEquals method to SortOrder so that it can compare underlying child expressions semantically (and not using default Object.equals)
  • Fixed EnsureRequirements to use semantic comparison of SortOrder

How was this patch tested?

  • Added a test case to PlannerSuite. Ran rest tests in PlannerSuite

…ring is semantically same as required ordering
@tejasapatil
Copy link
Contributor Author

ok to test

@SparkQA
Copy link

SparkQA commented Aug 27, 2016

Test build #64512 has finished for PR 14841 at commit 7192418.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Copy link
Contributor Author

cc'ing @rxin and @hvanhovell for review

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection)
override def sql: String = child.sql + " " + direction.sql

def isAscending: Boolean = direction == Ascending

def semanticEquals(other: SortOrder): Boolean =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expression has a default implementation of semanticEquals, doesn't it work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan : If you look at the old version of EnsureRequirements below at L253, it compared raw SortOrder objects which will use equals() generated for it. In scala, equals() for case classes is merely doing equals() over all its fields so that lead to Expression's equals() being used instead of its semanticEquals().

My fix here was to introduce a semanticEquals in SortOrder which compares the underlying Expression semantically.

Copy link
Contributor

@cloud-fan cloud-fan Sep 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I understand in EnsureRequirements we should use semanticEquals instead of == to compare SortOrder, but why we need to implement samanticEquals again in SortOrder? What's wrong with the default implementation?

I mean, there is no need to "introduce" a semanticEquals in SortOrder, it already has, because SortOrder is also Expression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan : I see what you were trying to say before. I tried that and it worked. I have created a PR to clean it up : #14910 Thanks for pointing this out !!

ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 1, 2016
## What changes were proposed in this pull request?

Removing `semanticEquals()` from `SortOrder` because it can use the `semanticEquals()` provided by its parent class (`Expression`). This was as per suggestion by cloud-fan at https://github.com/apache/spark/pull/14841/files/7192418b3a26a14642fc04fc92bf496a954ffa5d#r77106801

## How was this patch tested?

Ran the test added in apache#14841

Author: Tejas Patil <tejasp@fb.com>

Closes apache#14910 from tejasapatil/SPARK-17271_remove_semantic_ordering.
@rxin
Copy link
Contributor

rxin commented Sep 1, 2016

Should this go into 2.0?

@cloud-fan
Copy link
Contributor

yea, I think so, it's pretty safe. @tejasapatil do you mind sending a new PR against branch 2.0 to backport this PR and the follow-up one? thanks!

asfgit pushed a commit that referenced this pull request Sep 1, 2016
## What changes were proposed in this pull request?

Ports #14841 and #14910 from `master` to `branch-2.0`

Jira : https://issues.apache.org/jira/browse/SPARK-17271

Planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253
`SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects.

eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`

Expression in required SortOrder:
```
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId,
        qualifier = Some("a")
      )
```

Expression in child SortOrder:
```
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId)
```

Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.

This PR includes following changes:
- Added a `semanticEquals` method to `SortOrder` so that it can compare underlying child expressions semantically (and not using default Object.equals)
- Fixed `EnsureRequirements` to use semantic comparison of SortOrder

## How was this patch tested?

- Added a test case to `PlannerSuite`. Ran rest tests in `PlannerSuite`

Author: Tejas Patil <tejasp@fb.com>

Closes #14920 from tejasapatil/SPARK-17271_2.0_port.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants