Issue #7969: Prefer Range Join #8092
hawkfish commented on Jun 27, 2023
- Make sure IEJoin is ready for right side projections.
- Add PRAGMA for preferring range joins.
- Disable PRAGMA in benchmark because it doesn't help when the code is correct(!)
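For context on the first item: a right-side projection map can be read as the list of build-side (RHS) column indices that the join actually has to emit. Columns that are not listed are dropped rather than carried into the join output. The snippet below is a hypothetical Python illustration of that idea, not DuckDB's IEJoin code. The function names `apply_projection_map`, `join_rows` and `range_join` are invented for the example, and treating an empty map as "keep every column" is an assumption of the sketch. The join loop is a naive nested-loop interval join, used only to show where the map is applied. IEJoin itself works on sorted permutation arrays and bit arrays.

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def apply_projection_map(row: Row, projection_map: Sequence[int]) -> Row:
    # Keep only the columns whose indices are listed in projection_map.
    # Assumption for this sketch: an empty map means "keep every column".
    if not projection_map:
        return row
    return tuple(row[i] for i in projection_map)


def join_rows(left: Row, right: Row, right_projection_map: Sequence[int]) -> Row:
    # Emit the whole left row plus only the projected right-side columns.
    return left + apply_projection_map(right, right_projection_map)


def range_join(
    left_rows: Sequence[Row],
    right_rows: Sequence[Row],
    left_key: int,
    right_lo: int,
    right_hi: int,
    right_projection_map: Sequence[int],
) -> List[Row]:
    # Naive nested-loop interval join: match when lo <= key < hi.
    # This is NOT the IEJoin algorithm; it only shows where the
    # right-side projection map is applied to matched rows.
    out: List[Row] = []
    for left in left_rows:
        for right in right_rows:
            if right[right_lo] <= left[left_key] < right[right_hi]:
                out.append(join_rows(left, right, right_projection_map))
    return out
```

For example, if the right rows are `(id, start, end, payload)` and the query only needs `payload`, a map of `[3]` keeps just that column: `range_join([(1, 5), (2, 15)], [(10, 0, 10, "a"), (11, 10, 20, "b")], 1, 1, 2, [3])` returns `[(1, 5, "a"), (2, 15, "b")]`.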
Make sure IEJoin is ready for right side projections.
Finish IEJoin projection map support. Add benchmark.
Add PRAGMA for preferring range joins.
Add projection support. Disable pragma in benchmark because it doesn't help when the code is correct.
Fix smart pointer tidy madness.
…into iejoin-projection
Thanks for the PR! This looks good to me in principle, but I wonder whether it will regress other queries. Which join is better (hash join vs. range join) likely depends heavily on the selectivity of the join predicates. If the range predicate is non-selective, this will cause large regressions. Could we run more benchmarks covering the various scenarios? Could we also add tests that trigger the different projection map scenarios with the IEJoin?
Review feedback: Add test.
Right now this is behind a pragma that is off by default. It's really just a way for users to force the issue if performance is terrible. My thinking on how to make this smarter is to check the selectivity of the equality predicates and switch if they are obviously horrible. We could pull in some estimates from here, but they specifically don't work in the common case (intervals), so it would be another vague heuristic. Ideally we would have something reasonably smart, and extend the pragma so users can force either choice if we guess wrong.
AFAICT there is only one case here where we remove unused RHS columns, so I have added a test for that (just a really small version of the benchmark). All the other cases seemed to be for indexed loop joins and the like.
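The "check the selectivity of the equality predicates and switch if they are obviously horrible" idea from the reply above could look roughly like the following. This is a hypothetical Python sketch and not what the PR implements, since the PR only adds an off-by-default pragma. The names `JoinChoice` and `choose_join`, the uniform-key estimate, and the `max_fanout` threshold are all assumptions made for illustration. As noted in the reply, estimates of this kind break down for interval data, and the sketch does nothing to address that.

```python
from enum import Enum
from typing import Optional


class JoinChoice(Enum):
    HASH = "hash"
    RANGE = "range"


def choose_join(
    right_rows: int,
    distinct_equality_keys: int,
    force: Optional[JoinChoice] = None,
    max_fanout: float = 1000.0,
) -> JoinChoice:
    # A user override (the pragma idea) always wins over the heuristic.
    if force is not None:
        return force
    # Estimated RHS matches per probe row from the equality predicates alone,
    # assuming uniformly distributed keys. Guard against a zero estimate.
    fanout = right_rows / max(distinct_equality_keys, 1)
    # Only switch when the equality predicates are "obviously horrible";
    # max_fanout is an arbitrary threshold chosen for illustration.
    return JoinChoice.RANGE if fanout > max_fanout else JoinChoice.HASH
```

With 1,000,000 right-side rows and only 2 distinct equality keys, each probe would hit about 500,000 rows, so `choose_join(1_000_000, 2)` picks `JoinChoice.RANGE`. With 1,000,000 distinct keys it stays with `JoinChoice.HASH`. Passing `force=` mirrors the "force either way if we guess wrong" escape hatch.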
Attempt to stabilise random number generation.
Another attempt to generate stable random data on Linux...
Try casting to DECIMAL to fix test...
Match cast precision to ROUND.
Switch to exact aggregate.
Add magic skip_reload requirement.
I think the only thing that failed was the Node download.
Thanks!