Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child orde… #14920

Closed

Conversation

tejasapatil
Copy link
Contributor

@tejasapatil tejasapatil commented Sep 1, 2016

What changes were proposed in this pull request?

Ports #14841 and #14910 from master to branch-2.0

Jira : https://issues.apache.org/jira/browse/SPARK-17271

Planner is adding un-needed SORT operation due to bug in the way comparison for SortOrder is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253
SortOrder needs to be compared semantically because Expression within two SortOrder can be "semantically equal" but not literally equal objects.

eg. In case of sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")

Expression in required SortOrder:

      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId,
        qualifier = Some("a")
      )

Expression in child SortOrder:

      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId)

Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.

This PR includes following changes:

  • Added a semanticEquals method to SortOrder so that it can compare underlying child expressions semantically (and not using default Object.equals)
  • Fixed EnsureRequirements to use semantic comparison of SortOrder

How was this patch tested?

  • Added a test case to PlannerSuite. Ran rest tests in PlannerSuite

…ring is semantically same as required ordering
@tejasapatil
Copy link
Contributor Author

ok to test

@hvanhovell
Copy link
Contributor

LGTM - pending jenkins.

@SparkQA
Copy link

SparkQA commented Sep 1, 2016

Test build #64786 has finished for PR 14920 at commit 2c9ab99.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 1, 2016
## What changes were proposed in this pull request?

Ports #14841 and #14910 from `master` to `branch-2.0`

Jira : https://issues.apache.org/jira/browse/SPARK-17271

Planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253
`SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects.

eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`

Expression in required SortOrder:
```
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId,
        qualifier = Some("a")
      )
```

Expression in child SortOrder:
```
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId)
```

Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.

This PR includes following changes:
- Added a `semanticEquals` method to `SortOrder` so that it can compare underlying child expressions semantically (and not using default Object.equals)
- Fixed `EnsureRequirements` to use semantic comparison of SortOrder

## How was this patch tested?

- Added a test case to `PlannerSuite`. Ran rest tests in `PlannerSuite`

Author: Tejas Patil <tejasp@fb.com>

Closes #14920 from tejasapatil/SPARK-17271_2.0_port.
@hvanhovell
Copy link
Contributor

Merging to branch-2.0. Thanks for the backport! Could you close the PR (the merge script can only close PRs against master)?

@tejasapatil tejasapatil closed this Sep 1, 2016
@tejasapatil tejasapatil deleted the SPARK-17271_2.0_port branch September 1, 2016 17:13
@tejasapatil
Copy link
Contributor Author

Thanks @hvanhovell !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants