Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: Unnecessary sort operator added by SELECT with WHERE clause #33023

Open
andy-kimball opened this issue Dec 11, 2018 · 6 comments
Open

opt: Unnecessary sort operator added by SELECT with WHERE clause #33023

andy-kimball opened this issue Dec 11, 2018 · 6 comments
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team

Comments

@andy-kimball
Copy link
Contributor

andy-kimball commented Dec 11, 2018

Repro:

CREATE TABLE xyz (x INT PRIMARY KEY, y INT, z INT);
CREATE TABLE abcd (a INT, b INT, c INT, d INT, INDEX ab(a, b) STORING (c, d), INDEX cd(c, d) STORING (a, b));
EXPLAIN (OPT) SELECT * FROM [INSERT INTO xyz SELECT b, c, d FROM abcd ORDER BY c, d LIMIT 100 RETURNING *] WHERE x=y ORDER BY y;

Expected: No explicit sort should be necessary
Actual: An explicit sort is added.

  sort
   ├── columns: x:5(int!null) y:6(int!null) z:7(int)
   ├── ordering: +(5|6) [provided: +5]
   └── select
        ├── columns: b:5(int!null) c:6(int!null) d:7(int)
        ├── insert xyz
        │    ├── columns: b:5(int!null) c:6(int) d:7(int)
        │    ├── insert-mapping:
        │    │    ├──  b:5(int!null) => x:1(int)
        │    │    ├──  c:6(int) => y:2(int)
        │    │    └──  d:7(int) => z:3(int)
        │    ├── internal-ordering: +6,+7
        │    └── scan abcd@cd
        │         ├── columns: b:5(int) c:6(int) d:7(int)
        │         ├── limit: 100
        │         ├── stats: [rows=100, distinct(5)=65.132156, null(5)=1, distinct(6)=65.132156, null(6)=1]
        │         └── ordering: +6,+7
        └── filters
             └── eq [type=bool, outer=(5,6), constraints=(/5: (/NULL - ]; /6: (/NULL - ]), fd=(5)==(6), (6)==(5)]
                  ├── variable: b [type=int]
                  └── variable: c [type=int]

Jira issue: CRDB-4704

@andy-kimball andy-kimball added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Dec 11, 2018
@andy-kimball andy-kimball added this to To do in BACKLOG, NO NEW ISSUES: SQL Optimizer via automation Dec 11, 2018
@andy-kimball
Copy link
Contributor Author

Note that removing the WHERE x=y clause no longer causes the sort to be added.

@RaduBerinde
Copy link
Member

Fixed by #33023:

  select                                                                                                      
   ├── columns: x:5(int!null) y:6(int!null) z:7(int)                                                          
   ├── cardinality: [0 - 100]                                                                                 
   ├── side-effects, mutations                                                                                
   ├── stats: [rows=1.53533993, distinct(5)=1.53533993, null(5)=0, distinct(6)=1.53533993, null(6)=0]         
   ├── cost: 109.03                                                                                           
   ├── fd: (5)==(6), (6)==(5)                                                                                 
   ├── ordering: +(5|6) [provided: +6]                                                                        
   ├── insert xyz                                                                                             
   │    ├── columns: b:5(int!null) c:6(int) d:7(int)                                                          
   │    ├── insert-mapping:                                                                                   
   │    │    ├──  b:5(int!null) => x:1(int)                                                                   
   │    │    ├──  c:6(int) => y:2(int)                                                                        
   │    │    └──  d:7(int) => z:3(int)                                                                        
   │    ├── internal-ordering: +6,+7                                                                          
   │    ├── cardinality: [0 - 100]                                                                            
   │    ├── side-effects, mutations                                                                           
   │    ├── stats: [rows=100]                                                                                 
   │    ├── cost: 108.02                                                                                      
   │    ├── ordering: +(5|6) [provided: +6]                                                                   
   │    └── scan abcd@cd                                                                                      
   │         ├── columns: b:5(int) c:6(int) d:7(int)                                                          
   │         ├── limit: 100                                                                                   
   │         ├── stats: [rows=100, distinct(5)=65.132156, null(5)=1, distinct(6)=65.132156, null(6)=1]        
   │         ├── cost: 108.01                                                                                 
   │         └── ordering: +6,+7                                                                              
   └── filters                                                                                                
        └── eq [type=bool, outer=(5,6), constraints=(/5: (/NULL - ]; /6: (/NULL - ]), fd=(5)==(6), (6)==(5)]  
             ├── variable: b [type=int]                                                                       
             └── variable: c [type=int]                                                                       

BACKLOG, NO NEW ISSUES: SQL Optimizer automation moved this from To do to Done Dec 11, 2018
@RaduBerinde
Copy link
Member

I am backing out the "fix". Columns in an OrderingChoice column group must be equivalent. There are subtle issues that show up if we relax that, and queries where we have equalities between columns (other than joins) shouldn't be very common. In the Select case, the columns are not equivalent under the select and we must make an (arbitrary) choice between them.

As an example of a subtle issue: say that we have something like

SELECT * FROM (SELECT a,b,c,k FROM t UNION ALL <something>) WHERE b=k ORDER BY a,b,c

We have a Select (b=k) with required ordering +a,+b,+c. If we require ordering +a,+(b|k),+c from its input, then that might choose to do a streaming UNION along columns +a,+b,+c. Say that k is a key in t (but it won't be a key anymore at the level of the UNION ALL). Under the UNION, the ordering would get incorrectly simplified to +a,+(b|k). Now that operator might emit rows where column c is incorrectly ordered (which would break the assumption used by streaming union execution). The key issue here is that Simplify assumes that columns in groups are equivalent. I don't know how it would work if it didn't make that assumption. In this case, we would want to express that we require ordering +a,+k OR +a,+b,+c, but OrderingChoice is not capable of expressing that.

@RaduBerinde RaduBerinde reopened this Dec 12, 2018
BACKLOG, NO NEW ISSUES: SQL Optimizer automation moved this from Done to In progress Dec 12, 2018
@RaduBerinde
Copy link
Member

An idea is to use interesting orderings when making a choice for which subset of columns to keep.

@andy-kimball andy-kimball moved this from In progress to Higher Priority Backlog in BACKLOG, NO NEW ISSUES: SQL Optimizer Feb 7, 2019
@RaduBerinde RaduBerinde moved this from Higher Priority Backlog to Lower Priority Backlog in BACKLOG, NO NEW ISSUES: SQL Optimizer Feb 20, 2019
@RaduBerinde RaduBerinde moved this from Lower Priority Backlog to Plan enhancements (lower priority) in BACKLOG, NO NEW ISSUES: SQL Optimizer Apr 18, 2020
rytaft added a commit to rytaft/cockroach that referenced this issue May 3, 2021
This commit updates the logic for determining whether an OrderingChoice
can imply another OrderingChoice. The new logic allows columns in a group to be
treated like optional columns after one of the columns is used. For example,
"+1,+2,+3" implies "+(1|2),+3" now returns true. Prior to this commit, it
returned false, which resulted in unnecessary sort operations in some cases.
This commit also updates the corresponding logic for intersecting and finding
the common prefix of two OrderingChoices.

Furthermore, we now use interesting orderings to inform the required ordering
passed down to the children of Select expressions. This makes it more likely
that the child will be able to provide the required ordering.

Informs cockroachdb#33023

Release note (performance improvement): increased the intelligence
of the optimizer around orderings that can be provided by certain
relational expressions when there are equalities between columns.
This can allow the optimizer to remove unnecessary sort operations in
some cases, thus improving performance.
rytaft added a commit to rytaft/cockroach that referenced this issue May 3, 2021
This commit updates the logic for determining whether an OrderingChoice
can imply another OrderingChoice. The new logic allows columns in a group to be
treated like optional columns after one of the columns is used. For example,
"+1,+2,+3" implies "+(1|2),+3" now returns true. Prior to this commit, it
returned false, which resulted in unnecessary sort operations in some cases.
This commit also updates the corresponding logic for intersecting and finding
the common prefix of two OrderingChoices.

Furthermore, we now use interesting orderings to inform the required ordering
passed down to the children of Select expressions. This makes it more likely
that the child will be able to provide the required ordering.

Informs cockroachdb#33023

Release note (performance improvement): increased the intelligence
of the optimizer around orderings that can be provided by certain
relational expressions when there are equalities between columns.
This can allow the optimizer to remove unnecessary sort operations in
some cases, thus improving performance.
rytaft added a commit to rytaft/cockroach that referenced this issue May 3, 2021
This commit updates the logic for determining whether an OrderingChoice
can imply another OrderingChoice. The new logic allows columns in a group to be
treated like optional columns after one of the columns is used. For example,
"+1,+2,+3" implies "+(1|2),+3" now returns true. Prior to this commit, it
returned false, which resulted in unnecessary sort operations in some cases.
This commit also updates the corresponding logic for intersecting and finding
the common prefix of two OrderingChoices.

Furthermore, we now use interesting orderings to inform the required ordering
passed down to the children of Select expressions. This makes it more likely
that the child will be able to provide the required ordering.

Informs cockroachdb#33023

Release note (performance improvement): increased the intelligence
of the optimizer around orderings that can be provided by certain
relational expressions when there are equalities between columns.
This can allow the optimizer to remove unnecessary sort operations in
some cases, thus improving performance.
rytaft added a commit to rytaft/cockroach that referenced this issue May 4, 2021
This commit updates the logic for determining whether an OrderingChoice
can imply another OrderingChoice. The new logic allows columns in a group to be
treated like optional columns after one of the columns is used. For example,
"+1,+2,+3" implies "+(1|2),+3" now returns true. Prior to this commit, it
returned false, which resulted in unnecessary sort operations in some cases.
This commit also updates the corresponding logic for intersecting and finding
the common prefix of two OrderingChoices.

Furthermore, we now use interesting orderings to inform the required ordering
passed down to the children of Select expressions. This makes it more likely
that the child will be able to provide the required ordering.

Informs cockroachdb#33023

Release note (performance improvement): increased the intelligence
of the optimizer around orderings that can be provided by certain
relational expressions when there are equalities between columns.
This can allow the optimizer to remove unnecessary sort operations in
some cases, thus improving performance.
craig bot pushed a commit that referenced this issue May 4, 2021
64593: opt: allow OrderingChoices to imply more orderings r=rytaft a=rytaft

This commit updates the logic for determining whether an `OrderingChoice`
can imply another `OrderingChoice`. The new logic allows columns in a group to be
treated like optional columns after one of the columns is used. For example,
"+1,+2,+3" implies "+(1|2),+3" now returns true. Prior to this commit, it
returned false, which resulted in unnecessary sort operations in some cases.
This commit also updates the corresponding logic for intersecting and finding
the common prefix of two `OrderingChoice`s.

Furthermore, we now use interesting orderings to inform the required ordering
passed down to the children of `Select` expressions. This makes it more likely
that the child will be able to provide the required ordering.

Informs #33023

Release note (performance improvement): increased the intelligence
of the optimizer around orderings that can be provided by certain
relational expressions when there are equalities between columns.
This can allow the optimizer to remove unnecessary sort operations in
some cases, thus improving performance.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@github-actions
Copy link

github-actions bot commented Jun 5, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team
Projects
BACKLOG, NO NEW ISSUES: SQL Optimizer
Plan enhancements (lower priority)
Status: Backlog
Development

No branches or pull requests

3 participants