Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Underestimated row count for rank window function #81831

Open
msirek opened this issue May 25, 2022 · 2 comments
Open

Underestimated row count for rank window function #81831

msirek opened this issue May 25, 2022 · 2 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects

Comments

@msirek
Copy link
Contributor

msirek commented May 25, 2022

Describe the problem
A query with a predicate rank_window_function_result = 1 is returning an estimate of 1 row, but the answer should be taking into account the number of distinct values in the PARTITION BY columns and ORDER BY columns. If there are subsequent joins in the query, a sub-optimal join plan may be picked to evaluate those joins due to the bad estimate.

To Reproduce

create table t1 (a int, b int, c int not null, primary key(a));
insert into t1 select g,g,g FROM generate_series(1,100000) g(g);
analyze t1;
EXPLAIN SELECT * FROM 
  (SELECT t1.*, rank() OVER (PARTITION BY a ORDER BY b) AS rank_ab FROM t1) 
  AS ab WHERE rank_ab = 1;
                                             info
----------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • filter
  │ estimated row count: 1
  │ filter: rank_ab = 1
  │
  └── • window
      │ estimated row count: 100,000
      │
      └── • scan
            estimated row count: 100,000 (100% of the table; stats collected 15 minutes ago)
            table: t1@t1_pkey
            spans: FULL SCAN

The esimated row count after the filter is 1, but it should be 100000.

Jira issue: CRDB-16087

@msirek msirek added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 25, 2022
@msirek msirek self-assigned this May 25, 2022
@msirek msirek added this to Triage in SQL Queries via automation May 25, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label May 25, 2022
@msirek
Copy link
Contributor Author

msirek commented May 25, 2022

This is related to support issue #1609

@yuzefovich
Copy link
Member

Nice find! I'd guess that other window functions like row_number, dense_rank, and maybe others also have the same problem.

@mgartner mgartner moved this from Triage to Backlog in SQL Queries May 31, 2022
@mgartner mgartner moved this from Backlog to 22.2 Release in SQL Queries May 31, 2022
@rytaft rytaft moved this from 22.2 Release to Active in SQL Queries May 31, 2022
@rytaft rytaft moved this from Active to 23.1 Release in SQL Queries Aug 2, 2022
@mgartner mgartner moved this from 23.1 Release to Backlog in SQL Queries Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Status: Backlog
SQL Queries
Backlog (DO NOT ADD NEW ISSUES)
Development

No branches or pull requests

2 participants