Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Issue for Estimated Rows #88455

Closed
bajinsheng opened this issue Sep 22, 2022 · 3 comments · Fixed by #88555
Closed

Potential Issue for Estimated Rows #88455

bajinsheng opened this issue Sep 22, 2022 · 3 comments · Fixed by #88555
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team
Projects

Comments

@bajinsheng
Copy link

bajinsheng commented Sep 22, 2022

Describe the problem

The estimated row count of the first SELECT statement may be far from real number.

To Reproduce

CREATE TABLE t0 (c0 INT);
CREATE TABLE t1 (c0 INT);
INSERT INTO t0 VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13);
INSERT INTO t1 VALUES (21),(22),(23),(24),(25);  
ANALYZE t0;
ANALYZE t1;

EXPLAIN SELECT * FROM t0 LEFT OUTER JOIN t1 ON t0.c0<1 OR t0.c0>1; -- estimated row count: 20
EXPLAIN SELECT * FROM t0 LEFT OUTER JOIN t1 ON t0.c0!=1; -- estimated row count: 60
EXPLAIN SELECT * FROM t0 INNER JOIN t1 ON t0.c0<1 OR t0.c0>1; -- estimated row count: 60

Expected behavior
The first and second SELECT statements are semantic equivalents, so I think both statements should have the same number of estimated rows? They don't in fact.

As a reference, I also rewrite LEFT JOIN to INNER JOIN and get the third SELECT statement. In my understanding, the INNER JOIN should have no greater estimated row count than LEFT JOIN.

Therefore, I suspect there is an issue for the estimated rows of the first SELECT statement.

Environment:

  • CockroachDB version [7cde315]
  • Server OS: [Ubuntu 20.04]
  • Client app [cockroach sql]

Additional context
Although the estimated row count is an approximate number, I think this may be a potential issue which may impact the query optimization. I hope it would provide valuable information. Looking forward your reply!

Jira issue: CRDB-19831

@bajinsheng bajinsheng added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Sep 22, 2022
@blathers-crl
Copy link

blathers-crl bot commented Sep 22, 2022

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I was unable to automatically find someone to ping.

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels Sep 22, 2022
@yuzefovich yuzefovich removed the X-blathers-untriaged blathers was unable to find an owner label Sep 22, 2022
@yuzefovich yuzefovich added this to Triage in SQL Queries via automation Sep 22, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 22, 2022
@DrewKimball DrewKimball moved this from Triage to Active in SQL Queries Sep 23, 2022
@DrewKimball DrewKimball self-assigned this Sep 23, 2022
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Sep 23, 2022
Previously, an `OR` expression with tight constraints would have its
selecitivy applied in `applyFilters` as expected, without incrementing
`numUnappliedConjuncts`. However, joins call into
`selectivityFromOredEquivalencies`, which would then increment
`numUnappliedConjuncts` for that `OR` if the disjuncts weren't all
conjunctions of equalities. This caused an additional factor of `1/3`
(`memo.unknownFilterSelectivity`) to be applied to the join row count
estimate.

This commit modifies `selectivityFromOredEquivalencies` to avoid
incrementing `numUnappliedConjuncts` for `OR` conditions with tight
constraints. This prevents the double-counting behavior. This commit
also removes a few `FiltersItem` copies from loops.

Fixes cockroachdb#88455

Release note: None
@DrewKimball
Copy link
Collaborator

Thanks for opening this issue! Looks like we were doing some double-counting when estimating the selectivity of OR expressions in join ON conditions.

@bajinsheng
Copy link
Author

bajinsheng commented Sep 24, 2022

Thanks for your prompt reply! I'm glad we can help CockroachDB get better!

craig bot pushed a commit that referenced this issue Sep 30, 2022
88555: opt: don't double count OR selectivity for joins r=DrewKimball a=DrewKimball

Previously, an `OR` expression with tight constraints would have its selecitivy applied in `applyFilters` as expected, without incrementing `numUnappliedConjuncts`. However, joins call into `selectivityFromOredEquivalencies`, which would then increment `numUnappliedConjuncts` for that `OR` if the disjuncts weren't all conjunctions of equalities. This caused an additional factor of `1/3` (`memo.unknownFilterSelectivity`) to be applied to the join row count estimate.

This commit modifies `selectivityFromOredEquivalencies` to avoid incrementing `numUnappliedConjuncts` for `OR` conditions with tight constraints. This prevents the double-counting behavior. This commit also removes a few `FiltersItem` copies from loops.

Fixes #88455

Release note: None

89050: rowexec: fix usage of the shared "single datum" acc in windower r=yuzefovich a=yuzefovich

This commit fixes the usage of the shared "single datum agg mem account" by the window builtins. Previously, we were updating the eval context with the memory account after the builtins were constructed. The impact of the bug on its own is pretty minor (namely that we could reserve more memory than necessary - the initial allocation is 10KiB), but I have some other changes brewing which make it more important to be precise about the attribution of the memory allocations.

Release note: None

Co-authored-by: DrewKimball <drewk@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig craig bot closed this as completed in 1c05628 Sep 30, 2022
SQL Queries automation moved this from Active to Done Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants