Reorder semi and anti joins. #11815
Merged: Mytherin merged 118 commits into duckdb:feature from Tmonster:reorder_semi_anti_joins_easier_fix_refactor on Jun 5, 2024.
…uld be able to reorder the semi join
…so add better heuristics for determining a distinct count
…kdb into reorder-semi-and-anti-joins
…_column_lifetime_analyzer you will find why operator expressions need to be visited first. or why rilters cannot be removed
…i_joins_easier_fix_refactor
@Mytherin this is ready to go now. You reviewed it once already, but I had to battle CI for a while since I also needed to patch substrait.
Thanks!
Follow-up of #11573.
To do this, we do the following:
Since this cardinality estimation method uses multiplication, it is also symmetric, so we don't have to worry about different join plans for the same set of relations receiving different estimated cardinalities.
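A minimal sketch of why a multiplicative estimate is order-independent. It assumes, hypothetically, that the estimate for a set of relations is the product of their cardinalities divided by a denominator that depends only on that set; `estimate_cardinality` and the numbers are illustrative and are not DuckDB's API.

```python
from functools import reduce
from itertools import permutations
import operator


def estimate_cardinality(cardinalities, denominator):
    """Hypothetical estimator: product of relation cardinalities / denominator."""
    numerator = reduce(operator.mul, cardinalities, 1)
    return numerator / denominator


# Integer multiplication is commutative and associative, so every join
# order over the same relations produces the same numerator and estimate.
relation_sizes = [10, 200, 3000]
estimates = {estimate_cardinality(list(order), 50) for order in permutations(relation_sizes)}
print(estimates)  # {120000.0}: one estimate shared by all 6 orders
```

Because the estimate never depends on the order in which relations are combined, two plans for the same relation set cannot disagree on their output size.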
Calculating the denominator of the estimated cardinality is easier now: it works like finding a maximum spanning tree. If relations are nodes and join filters are weighted edges, then finding the most selective filters is a maximum spanning tree problem. The weight of each edge comes from the total domain shared by the columns of the filter.
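The denominator computation can be sketched with Kruskal's algorithm run in descending weight order. This is an illustration under assumptions, not DuckDB's code: each filter is a hypothetical `(tdom, left_relation, right_relation)` tuple, relations are numbered `0..n-1`, and the denominator is assumed to be the product of the total domains of the selected edges.

```python
def max_spanning_tree(num_relations, filters):
    """Kruskal's algorithm, heaviest edges first, over (tdom, a, b) filters."""
    parent = list(range(num_relations))  # union-find forest of relations

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for tdom, a, b in sorted(filters, reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # skip filters that would close a cycle
            parent[root_a] = root_b
            chosen.append((tdom, a, b))
    return chosen


def denominator(num_relations, filters):
    """Hypothetical denominator: product of the tdoms of the chosen filters."""
    result = 1
    for tdom, _, _ in max_spanning_tree(num_relations, filters):
        result *= tdom
    return result


# Three relations joined by three filters; the weakest one closes a cycle.
filters = [(100, 0, 1), (50, 1, 2), (10, 0, 2)]
print(denominator(3, filters))  # 5000
```

Choosing the heaviest edges first keeps the filters with the largest shared domains, which shrink the estimate the most, while the spanning-tree constraint stops redundant filters on an already connected set of relations from being counted twice.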
Some other small improvements:
Another heuristic was also added for determining the number of distinct elements in a column. For integral-type columns, if maxVal - minVal is less than the distinct count estimated by HLL (HyperLogLog), DuckDB prefers maxVal - minVal as the distinct count.
I thought this was the source of a bad join order. It turned out not to be, but I think it is still a good heuristic.
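A minimal sketch of the distinct-count heuristic described above. The function name and parameters are hypothetical; it follows the description literally and uses `max_val - min_val` as the bound.

```python
def distinct_count(hll_estimate, min_val, max_val, is_integral):
    """Hypothetical helper: cap an HLL distinct-count estimate by the value range.

    Follows the described rule of comparing against max_val - min_val. Note that
    an integer range [min_val, max_val] can hold at most max_val - min_val + 1
    distinct values.
    """
    if is_integral and min_val is not None and max_val is not None:
        span = max_val - min_val
        if span < hll_estimate:
            return span  # the range is a tighter bound than the HLL estimate
    return hll_estimate


print(distinct_count(1000, 0, 10, True))  # 10
```

The bound only applies to integral columns, because for other types the gap between the minimum and maximum says nothing about how many distinct values fit in it.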