Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suspicious Estimated Rows by DISTINCT #90113

Open
bajinsheng opened this issue Oct 18, 2022 · 2 comments
Open

Suspicious Estimated Rows by DISTINCT #90113

bajinsheng opened this issue Oct 18, 2022 · 2 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-quick-win Likely to be a quick win for someone experienced. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner
Projects

Comments

@bajinsheng
Copy link

bajinsheng commented Oct 18, 2022

Describe the problem

The number of estimated rows of the SELECT statement with RIGHT JOIN is smaller than that with INNER JOIN.

To Reproduce

CREATE TABLE t0 (c0 INT);
CREATE TABLE t1 (c0 INT, c1 TIMESTAMPTZ);
CREATE TABLE t2 (c0 TIMESTAMPTZ[]);
INSERT INTO t0 (rowid) VALUES(438922823), (1547647092);
INSERT INTO t1 (c0) VALUES(-909104110);
INSERT INTO t1 (c1) VALUES(TIMESTAMPTZ '1969-12-08T02:39:43');
INSERT INTO t2 (c0) VALUES(ARRAY[]), (ARRAY[]), (ARRAY[]);
CREATE INDEX ON t1(c1);
CREATE INDEX ON t2(c0);
ANALYZE t0;
ANALYZE t1;
ANALYZE t2;

EXPLAIN SELECT DISTINCT t0.rowid, t1.c1 FROM t1, t0 RIGHT OUTER JOIN t2 ON ((t2.c0)>=(NULLIF(t2.c0, t2.c0))); -- 3 rows
EXPLAIN SELECT DISTINCT t0.rowid, t1.c1 FROM t1, t0 INNER JOIN t2 ON ((t2.c0)>=(NULLIF(t2.c0, t2.c0))); -- 4 rows

Expected behavior
The query plan of the first SELECT

                                           info
-------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • distinct
  │ estimated row count: 3
  │ distinct on: c1, rowid
  │
  └── • cross join
      │ estimated row count: 6
      │
      ├── • cross join (left outer)
      │   │ estimated row count: 3
      │   │ pred: c0 >= CASE c0 WHEN c0 THEN CAST(NULL AS TIMESTAMPTZ[]) ELSE c0 END
      │   │
      │   ├── • scan
      │   │     estimated row count: 3 (100% of the table; stats collected 0 seconds ago)
      │   │     table: t2@t2_pkey
      │   │     spans: FULL SCAN
      │   │
      │   └── • scan
      │         estimated row count: 2 (100% of the table; stats collected 0 seconds ago)
      │         table: t0@t0_pkey
      │         spans: FULL SCAN
      │
      └── • scan
            estimated row count: 2 (100% of the table; stats collected 0 seconds ago)
            table: t1@t1_c1_idx
            spans: FULL SCAN
(28 rows)

The query plan of the second SELECT

                                             info
-----------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • distinct
  │ estimated row count: 4
  │ distinct on: c1, rowid
  │
  └── • cross join
      │ estimated row count: 4
      │
      ├── • scan
      │     estimated row count: 2 (100% of the table; stats collected 0 seconds ago)
      │     table: t1@t1_c1_idx
      │     spans: FULL SCAN
      │
      └── • cross join
          │ estimated row count: 2
          │
          ├── • scan
          │     estimated row count: 2 (100% of the table; stats collected 0 seconds ago)
          │     table: t0@t0_pkey
          │     spans: FULL SCAN
          │
          └── • filter
              │ estimated row count: 1
              │ filter: c0 >= CASE c0 WHEN c0 THEN CAST(NULL AS TIMESTAMPTZ[]) ELSE c0 END
              │
              └── • scan
                    estimated row count: 3 (100% of the table; stats collected 0 seconds ago)
                    table: t2@t2_pkey
                    spans: FULL SCAN
(31 rows)

For the CROSS JOIN operator in query plans, the first SELECT returns more estimated rows than the second SELECT. However, after the DISTINCT operator, the first SELECT returns less estimated rows than the second SELECT. The results are unexpected.

Environment:

  • CockroachDB version [fbfb71b]
  • Server OS: [Linux/Ubuntu 20.04]
  • Client app [cockroach sql]

Jira issue: CRDB-20590

@bajinsheng bajinsheng added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 18, 2022
@blathers-crl
Copy link

blathers-crl bot commented Oct 18, 2022

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/sql-queries (found keywords: vectorized,plan)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Oct 18, 2022
@mgartner mgartner added this to Triage in SQL Queries via automation Oct 18, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Oct 18, 2022
@mgartner mgartner moved this from Triage to Backlog in SQL Queries Oct 18, 2022
@rytaft rytaft added the E-quick-win Likely to be a quick win for someone experienced. label Oct 18, 2022
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-quick-win Likely to be a quick win for someone experienced. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner
Projects
Status: Backlog
SQL Queries
Backlog (DO NOT ADD NEW ISSUES)
Development

No branches or pull requests

3 participants