stats: fix cardinality estimation bug #603

Merged: adsharma merged 1 commit into main from fix/hll-stats-selection-vector on Jun 19, 2026
@adsharma

Contributor

Honor the selection vector so that we estimate the same cardinality whether rows are inserted one at a time or via UNWIND. For example:

CREATE NODE TABLE stats_node(
    id INT64,
    common INT64,
    rare INT64,
    PRIMARY KEY(id)
);

UNWIND range(0, 19) AS i
CREATE (:stats_node {id: i, common: i % 2, rare: i});

ANALYZE stats_node;

EXPLAIN LOGICAL
MATCH (n:stats_node)
WHERE n.common = 1 AND n.rare = 7
RETURN n.id;

This now produces the same query plan as loading the rows with 20 individual CREATE statements.

@adsharma adsharma merged commit a921a4c into main Jun 19, 2026
4 checks passed
@adsharma adsharma deleted the fix/hll-stats-selection-vector branch June 19, 2026 18:40
@M0nkeyFl0wer


Followed this over from the note about UNWIND vs a CREATE loop changing the optimizer's behavior. Built from a checkout that still had the bug and reproduced it independently before applying #603.

With UNWIND range(0,19) AS i CREATE (:stats_node {id:i, common:i%2, rare:i}), rare comes out with numDistinct around 1 (vs. ~20 with 20 literal CREATEs), so the stats-aware filter pushdown inverts its ordering and applies the non-selective common predicate first.
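To make the inversion concrete, here is a minimal sketch of how a distinct-count estimate can drive predicate ordering. It assumes the textbook rule that an equality predicate's selectivity is roughly 1 / numDistinct; the function names are hypothetical and this is not taken from the optimizer's actual code.

```python
def equality_selectivity(num_distinct: int) -> float:
    """Estimated fraction of rows matching `col = constant`.

    Assumption: the textbook 1 / NDV rule, clamped so NDV is at least 1.
    """
    return 1.0 / max(num_distinct, 1)


def order_predicates(ndv_by_column: dict[str, int]) -> list[str]:
    """Return column names ordered most selective (smallest fraction) first."""
    return sorted(ndv_by_column, key=lambda col: equality_selectivity(ndv_by_column[col]))


# Correct stats: rare has ~20 distinct values, so it is filtered first.
correct_order = order_predicates({"common": 2, "rare": 20})

# Buggy stats: rare collapses to ~1 distinct value, so its estimated
# selectivity becomes 1.0 and common (0.5) is filtered first instead.
buggy_order = order_predicates({"common": 2, "rare": 1})

print(correct_order)  # ['rare', 'common']
print(buggy_order)    # ['common', 'rare']
```

Under this assumption, a collapsed numDistinct for rare is enough on its own to flip the order of the two filters.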

One useful tell while narrowing it down: rare: i (a direct reference to the UNWIND variable) collapses to 1, but rare: i + 0 (any computed expression) is fine. That lines up exactly with the flat-position vs. selection-vector read issue that #603 fixes.
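The following toy model illustrates this class of bug. It is a sketch under stated assumptions, not the project's code: `ValueVector`, `observe_buggy`, and `observe_fixed` are hypothetical names, and an exact Python set stands in for the HLL sketch that the branch name suggests.

```python
from dataclasses import dataclass


@dataclass
class ValueVector:
    """Toy column batch: raw values plus the positions that are actually selected."""
    values: list[int]
    selected_positions: list[int]  # the selection vector


def observe_buggy(seen: set[int], vec: ValueVector) -> None:
    # Hypothetical bug: reads one fixed (flat) position for every selected row.
    for _ in vec.selected_positions:
        seen.add(vec.values[0])


def observe_fixed(seen: set[int], vec: ValueVector) -> None:
    # Honors the selection vector: reads each selected position.
    for pos in vec.selected_positions:
        seen.add(vec.values[pos])


def count_distinct(observe, batches: list[ValueVector]) -> int:
    """Feed all batches through `observe` and return the exact distinct count."""
    seen: set[int] = set()
    for batch in batches:
        observe(seen, batch)
    return len(seen)


# UNWIND-style load: one batch carrying all 20 values.
unwind_batch = [ValueVector(list(range(20)), list(range(20)))]
# CREATE-loop-style load: 20 batches of one value each.
single_row_batches = [ValueVector([i], [0]) for i in range(20)]

print(count_distinct(observe_buggy, unwind_batch))        # 1
print(count_distinct(observe_buggy, single_row_batches))  # 20
print(count_distinct(observe_fixed, unwind_batch))        # 20
print(count_distinct(observe_fixed, single_row_batches))  # 20
```

In this model the buggy reader only looks wrong on the multi-row batch, which matches the observation that the single-row CREATE path reported ~20 all along.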

Applied #603 locally and it resolves it. All three variants then report rare numDistinct around 20 and produce the identical plan. 👍

One small observation in case it's useful: the existing FilterPushDownOrdersMostSelectivePredicateFirst test loads its rows via a literal CREATE loop, so it wouldn't catch this if it regressed, since the bug only surfaces on the UNWIND / selection vector path.

@adsharma

Contributor Author

Yes, the fix is incomplete without a regression test. Will add one.

@adsharma adsharma mentioned this pull request Jun 20, 2026