Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DPhyp Join Order Algorithm #14158

Open
4 tasks
mkleen opened this issue May 18, 2023 · 6 comments
Open
4 tasks

DPhyp Join Order Algorithm #14158

mkleen opened this issue May 18, 2023 · 6 comments
Labels

Comments

@mkleen
Copy link
Contributor

mkleen commented May 18, 2023

State-of-the-Art for Join Ordering is the DPhyp algorithm:

  • Generated order guaranteed to be optimal.
  • Enumerates all possibly optimal join orders without cross products.
  • Optimization for linear queries is which scales up to 100 relations.
  • Optimization for clique queries is
  • which scales up to 14 relations.
  • Widely used in many database systems.

Roadmap:

  • Improve cardinality estimation for joins
  • Find test data and setup the tests first
  • Implement simplified DPhyp version which only supports inner-joins
  • Extend to other join types

References:
https://15721.courses.cs.cmu.edu/spring2020/papers/20-optimizer2/p539-moerkotte.pdf

@mkleen mkleen changed the title Join Ordering Join Ordering May 18, 2023
@mkleen mkleen changed the title Join Ordering Join Order optimization May 18, 2023
@mfussenegger mfussenegger removed the meta A meta issue label May 31, 2023
@seut seut added the need refined description A maintainer should refine the description and clarify the scope label Jul 4, 2023
@seut
Copy link
Member

seut commented Jul 4, 2023

It's currently not completely clear what is missing in our planner to support any join order algorithm, like do our existing join operators (join pairs) have enough information declared, etc. Please add more information.

@mkleen
Copy link
Contributor Author

mkleen commented Jul 5, 2023

From my perspective, the following parts are missing:

  • Make Logical Plans identifiable, add ids to logical plans. The algorithm uses binary set operations to find the solution, each operator is represented by a number. See Add ids to logical plans #13525
  • Add a mechanism to convert a Logical Operator Tree to a Join Graph and back to Logical Operator Tree.

@mkleen
Copy link
Contributor Author

mkleen commented Jul 5, 2023

This is a general workflow how I would structure it:

flowchart TD
    A[Sql] -->B(Analyzed Statement)
    B -->C(Planner)
    C -->D(Logical Plan Tree)
    D -->E(Enter Optimizer)
    E -->F(Push down Filters/Projections for Cardinality Estimates)
    F -->G(Match Join Operators)
    G -->H(Build Join Graph)
    H -->I(Run Join Order Algorithm)
    I -->J(Rebuild Logical Plan Tree with new Join Order)
    J -->K( Optimize Logical Plan Tree further)
    K -->L(Build Execution Plan)

@mkleen
Copy link
Contributor Author

mkleen commented Jul 5, 2023

It's currently not completely clear what is missing in our planner to support any join order algorithm, like do our existing join operators (join pairs) have enough information declared, etc. Please add more information.

The Join Reordering will be integrated into an optimizer rule and operate on Logical Plans such as Collect/JoinPlan etc. The Logical Plans consist of all information to build a Graph.

@matriv

This comment was marked as resolved.

@matriv

This comment was marked as resolved.

@mkleen mkleen changed the title Join Order optimization DPhyp Join Order Algorithm Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants