Once #8961 is merged, we will have an optimization for a JOIN that operates on two tables.
The next step is to extend this optimization to work with nested joins, and this is not trivial. See discussion in #8961 for context.
Reporter: Andy Grove / @andygrove
Note: This issue was originally created as ARROW-10964. Please see the migration documentation for further details.
Daniël Heres / @Dandandan: Found some nice material from Spark on this: https://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html
Basically, the idea is to use column-level statistics such as:
min/max
number of distinct values
null count
to estimate things like the selectivity of a filter.
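To make the selectivity idea concrete, here is a minimal sketch in Python. It assumes values are uniformly distributed between min and max and that NULLs never satisfy a comparison. The `ColumnStats` class and the function names are hypothetical, made up for illustration; they are not an existing Arrow/DataFusion or Spark API.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics (illustrative names only)."""
    min_value: float
    max_value: float
    distinct_count: int
    null_count: int
    row_count: int


def non_null_fraction(stats: ColumnStats) -> float:
    """Fraction of rows whose value is not NULL."""
    if stats.row_count == 0:
        return 0.0
    return (stats.row_count - stats.null_count) / stats.row_count


def eq_selectivity(stats: ColumnStats, value: float) -> float:
    """Estimated fraction of rows matching `col = value`."""
    out_of_range = value < stats.min_value or value > stats.max_value
    if out_of_range or stats.distinct_count == 0:
        return 0.0
    # Assumption: every distinct value is equally frequent.
    return non_null_fraction(stats) / stats.distinct_count


def lt_selectivity(stats: ColumnStats, value: float) -> float:
    """Estimated fraction of rows matching `col < value`."""
    if value <= stats.min_value:
        return 0.0
    if value > stats.max_value:
        return non_null_fraction(stats)
    # Assumption: values are spread uniformly over [min, max].
    covered = (value - stats.min_value) / (stats.max_value - stats.min_value)
    return non_null_fraction(stats) * covered
```

Multiplying a selectivity by the table's row count gives an estimated output size for the filter, which is the number a join optimizer would compare between inputs.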
Also there is a formula for (inner) join cardinality:
num(A IJ B) = num(A)*num(B)/max(distinct(A.k),distinct(B.k))
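As a quick illustration, the formula above (where "IJ" is the inner join and `k` is the join key) translates directly into a small function. This is only a sketch: the function and parameter names are hypothetical, and the guard for a zero distinct count is an added assumption, not part of the formula.

```python
def inner_join_cardinality(rows_a: int, rows_b: int,
                           distinct_a: int, distinct_b: int) -> float:
    """Estimate num(A IJ B) = num(A) * num(B) / max(distinct(A.k), distinct(B.k))."""
    denominator = max(distinct_a, distinct_b)
    if denominator == 0:
        # Assumption: no distinct key values on either side means no matches.
        return 0.0
    return rows_a * rows_b / denominator


# Example: 1000 rows joined with 200 rows, with 100 and 50 distinct keys.
estimate = inner_join_cardinality(1000, 200, 100, 50)
```

For nested joins, an estimate like this could be computed for each intermediate join result and fed into the same build-side choice that is made today for two tables, though how to propagate the distinct counts through a join is a separate question.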
Andrew Lamb / @alamb: Migrated to github: apache/datafusion#128