Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix exploding Delim Joins #9564

Merged
merged 4 commits into from
Nov 6, 2023
Merged

Fix exploding Delim Joins #9564

merged 4 commits into from
Nov 6, 2023

Conversation

Tmonster
Copy link
Contributor

@Tmonster Tmonster commented Nov 3, 2023

After refactoring the join order optimizer, delim scans were allowed to be reordered. Unfortunately, getting statistics from delim scans wasn't possible and the cardinality was estimated to be 0. This throws off the join order optimizer. Delim scans become perfect candidates for first joins because the cardinality of any join with a (supposedly empty) table is 0, which is great for the cost model.

In some cases, this delim scan has a lot of data, and the join at the bottom of the table can blow up significantly. A proper fix for this would be to inspect the delim join and get the query from there, but to match 0.8.1 functionality, the idea is to not allow delim scans to be reordered.

Added a benchmark for testing. On my laptop 8 cores, 16GB memory, results look like this

current

benchmark/micro/join/delim_join_no_blowup.benchmark 1   14.247793
benchmark/micro/join/delim_join_no_blowup.benchmark 2   13.947187
benchmark/micro/join/delim_join_no_blowup.benchmark 3   13.267580
benchmark/micro/join/delim_join_no_blowup.benchmark 4   13.384650
benchmark/micro/join/delim_join_no_blowup.benchmark 5   13.450060

with fix

benchmark/micro/join/delim_join_no_blowup.benchmark 1   1.472236
benchmark/micro/join/delim_join_no_blowup.benchmark 2   1.508781
benchmark/micro/join/delim_join_no_blowup.benchmark 3   1.618396
benchmark/micro/join/delim_join_no_blowup.benchmark 4   1.430282
benchmark/micro/join/delim_join_no_blowup.benchmark 5   1.412266

@Tmonster Tmonster assigned Mytherin and unassigned Mytherin Nov 3, 2023
@github-actions github-actions bot marked this pull request as draft November 3, 2023 13:20
@Tmonster Tmonster marked this pull request as ready for review November 6, 2023 08:52
Copy link
Contributor

@lnkuiper lnkuiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Mytherin
Copy link
Collaborator

Mytherin commented Nov 6, 2023

Thanks! LGTM. For the future we should be able to use the cardinality estimate obtained in the original delim join and propagate that estimate to each of the delim scans.

@Mytherin Mytherin merged commit cb27693 into duckdb:main Nov 6, 2023
43 of 45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants