Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #7969: Prefer Range Join #8092

Merged
merged 24 commits into from
Jul 8, 2023
Merged

Conversation

hawkfish
Copy link
Contributor

  • Make sure IEJoin is ready for right side projections.
  • Add PRAGMA for preferring range joins.
  • Disable PRAGMA in benchmark because it doesn't help when the code is correct(!)

@hawkfish hawkfish requested a review from Mytherin June 27, 2023 20:14
@Mytherin
Copy link
Collaborator

Thanks for the PR! Looks good to me in principle - but I wonder if this will not regress other queries. Which join is better to use (hash join vs range join) is likely heavily dependent on the selectivity of the join predicates. If the range predicate is non-selective then this will cause large regressions.

Could we run some more benchmarks testing the various scenarios?

Could we also add some more tests that trigger the various projection map scenarios with the IE Join?

@hawkfish
Copy link
Contributor Author

Thanks for the PR! Looks good to me in principle - but I wonder if this will not regress other queries. Which join is better to use (hash join vs range join) is likely heavily dependent on the selectivity of the join predicates. If the range predicate is non-selective then this will cause large regressions.

Could we run some more benchmarks testing the various scenarios?

Right now this is behind a pragma that is off by default. It's really just for users to force the issue if perf is terrible.

My thinking on how to make this smarter is to check the selectivity of the equality predicates and switch if they are obviously horrible. We could throw in some estimates from here but they specifically don't work in the common case (intervals) so it would be another vague heuristic. Ideally we would have something sort of smart but extend the pragma to force either way if we guess wrong.

Could we also add some more tests that trigger the various projection map scenarios with the IE Join?

AFAICT there is only one case here where we remove unused RHS columns so I have added a test for that (just a really small version of the benchmark). All the other cases seemed to be for indexed loop joins and the like.

@github-actions github-actions bot marked this pull request as draft June 30, 2023 20:48
@hawkfish hawkfish marked this pull request as ready for review June 30, 2023 20:54
Attempt to stabilise random number generation.
@github-actions github-actions bot marked this pull request as draft July 1, 2023 03:28
Another attempt to generate stable random data on Linux...
@hawkfish hawkfish marked this pull request as ready for review July 1, 2023 16:36
Try casting to DECIMAL to fix test...
@github-actions github-actions bot marked this pull request as draft July 2, 2023 12:52
Richard Wesley added 3 commits July 2, 2023 08:53
Match cast precision to ROUND.
Switch to exact aggregate.
Add magic skip_reload requirement.
@Mytherin Mytherin changed the base branch from feature to master July 4, 2023 13:18
@Mytherin Mytherin marked this pull request as ready for review July 4, 2023 15:29
@github-actions github-actions bot marked this pull request as draft July 5, 2023 15:40
@Mytherin Mytherin marked this pull request as ready for review July 7, 2023 07:10
@hawkfish
Copy link
Contributor Author

hawkfish commented Jul 7, 2023

Think the only thing that failed was Node download.

@Mytherin Mytherin merged commit b4c1529 into duckdb:master Jul 8, 2023
53 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Jul 8, 2023

Thanks!

@hawkfish hawkfish deleted the iejoin-projection branch July 26, 2023 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants