Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinct in Subqueries leads to unnecessary sort operations #349

Open
joka921 opened this issue Oct 9, 2020 · 0 comments
Open

Distinct in Subqueries leads to unnecessary sort operations #349

joka921 opened this issue Oct 9, 2020 · 0 comments

Comments

@joka921
Copy link
Member

joka921 commented Oct 9, 2020

If we have a subquery with a DISTINCT clause and the result of this subquery is reordered afterwards (because of joins or an order-By clause,
the query planner will often generate the following operations: (TODO: example)

SORT-A (by all columns to enable distict) -> DISTINCT -> SORT-B (according to the way the result is used afterwards).

If the sort oder of the columns of SORT-A would be chosen correctly, the SORT-B operation could be dropped.

There are several ways to achieve this:

  • Let the query planner create all possible permutations of sort columns for SORT-A, then the query planner would favor the one(s),
    that don't require the SORT-B operation. Unfortunately, this would require n! different entries for SORT-A where n is the number of columns. On the other hand it would be relatively straightforward to implement.

  • Make the query planner aware of the Sort order required by the SORT-B operation when creating the SORT-A + distinct.
    This would be the cleanest approach, but would require quite some changes to the current bottom up DP approach.

  • Let the query planner create the plans as it does at the moments, and afterwards check for DISTINCT operations with sorts before and after them, and rewire SORT-A. This should be relatively straightforward and solve the issue at Hand. However this might lead to nonoptimal join orders up the Execution tree.

  • Change the optimizer, s.T. it can assume that after a DISTINCT the result is sorted in any order it wants. (this is generally true, the special case that the SORT-A operation is not needed because the input to the DISTINCT operation is already ordered is rare and can easily be detected). The we get optimal query plans and afterwards insert the correct sort operation before the DISTINCT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant