Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move group id to OptRelTyp as Placeholder(GroupId) #7

Closed
skyzh opened this issue Nov 10, 2023 · 0 comments · Fixed by #12
Closed

move group id to OptRelTyp as Placeholder(GroupId) #7

skyzh opened this issue Nov 10, 2023 · 0 comments · Fixed by #12

Comments

@skyzh
Copy link
Member

skyzh commented Nov 10, 2023

currently, we have so many representation of plan nodes:

  • RelNode: the only thing we should have in the core framework
  • OptRelNode: a set of pre-defined plan node that works for most database systems, should be moved to another crate
  • RelRuleNode: that's the evil thing. When handling rule conversion, we do not have access to the full plan node. The only thing we have is some part of it and the group id of the children. This is so-called binding in Cascades/Columbia.

The RelRuleNode also makes it harder to convert from the dynamically-typed RelNode to statically-typed LogicalXXX structs. LogicalXXX::from_rel_node is only implemented on RelNode instead of RelRuleNode.

Given that the optimizer only supports Cascades, we can ask the user to have a placeholder(group_id) function over RelNodeTyp, so that the optimizer can put some placeholders into user's plan node representation, and everything can work as usual. With that, we can eliminate RelRuleNode from the system and use a unified representation of plan nodes everywhere.

On the user side, they only need to ensure their plan type enum contains a special placeholder type for optd internal use.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum OptRelNodeTyp {
    Placeholder(GroupId), // every set of plan nodes should have this
    // Plan nodes
    // Developers: update `is_plan_node` function after adding new elements
    Projection,
    ...
}
@skyzh skyzh closed this as completed in #12 Nov 10, 2023
Gun9niR added a commit that referenced this issue Mar 20, 2024
Generate the statistics in perftest and put them into `BaseCostModel` in
`DatafusionOptimizer`. Below is the comparison before & after stats are
added. You can check `PhysicalScan`, where the cost has changed.

The final cardinality remains the same because when stats on a column is
missing, we use a very small magic number `INVALID_SELECTIVITY` (0.001)
that just sets cardinality to 1.

### Todos in Future PRs

- Support generating stats on `Utf8`.
- Set a better magic number.
- Generate MCV.

### Before

```
plan space size budget used, not applying logical rules any more. current plan space: 1094
explain: PhysicalSort
├── exprs:SortOrder { order: Desc }
│   └── #1
├── cost: weighted=185.17,row_cnt=1.00,compute=179.17,io=6.00
└── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=182.12,row_cnt=1.00,compute=176.12,io=6.00 }
    └── PhysicalAgg
        ├── aggrs:Agg(Sum)
        │   └── Mul
        │       ├── #0
        │       └── Sub
        │           ├── 1
        │           └── #1
        ├── groups: [ #2 ]
        ├── cost: weighted=182.02,row_cnt=1.00,compute=176.02,io=6.00
        └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=64.90,row_cnt=1.00,compute=58.90,io=6.00 }
            └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=64.76,row_cnt=1.00,compute=58.76,io=6.00 }
                └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=64.54,row_cnt=1.00,compute=58.54,io=6.00 }
                    └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=64.24,row_cnt=1.00,compute=58.24,io=6.00 }
                        └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=63.82,row_cnt=1.00,compute=57.82,io=6.00 }
                            └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=63.32,row_cnt=1.00,compute=57.32,io=6.00 }
                                └── PhysicalNestedLoopJoin
                                    ├── join_type: Inner
                                    ├── cond:And
                                    │   ├── Eq
                                    │   │   ├── #11
                                    │   │   └── #14
                                    │   └── Eq
                                    │       ├── #3
                                    │       └── #15
                                    ├── cost: weighted=62.74,row_cnt=1.00,compute=56.74,io=6.00
                                    ├── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #1 ], cost: weighted=35.70,row_cnt=1.00,compute=32.70,io=3.00 }
                                    │   ├── PhysicalScan { table: customer, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                    │   └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=31.64,row_cnt=1.00,compute=29.64,io=2.00 }
                                    │       ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=27.40,row_cnt=1.00,compute=26.40,io=1.00 }
                                    │       │   └── PhysicalFilter
                                    │       │       ├── cond:And
                                    │       │       │   ├── Geq
                                    │       │       │   │   ├── #2
                                    │       │       │   │   └── 9131
                                    │       │       │   └── Lt
                                    │       │       │       ├── #2
                                    │       │       │       └── 9496
                                    │       │       ├── cost: weighted=27.30,row_cnt=1.00,compute=26.30,io=1.00
                                    │       │       └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 }
                                    │       │           └── PhysicalScan { table: orders, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                    │       └── PhysicalProjection { exprs: [ #0, #2, #5, #6 ], cost: weighted=1.18,row_cnt=1.00,compute=0.18,io=1.00 }
                                    │           └── PhysicalScan { table: lineitem, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                    └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=15.72,row_cnt=1.00,compute=12.72,io=3.00 }
                                        └── PhysicalHashJoin { join_type: Inner, left_keys: [ #3 ], right_keys: [ #0 ], cost: weighted=15.46,row_cnt=1.00,compute=12.46,io=3.00 }
                                            ├── PhysicalScan { table: supplier, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                            └── PhysicalHashJoin { join_type: Inner, left_keys: [ #2 ], right_keys: [ #0 ], cost: weighted=11.40,row_cnt=1.00,compute=9.40,io=2.00 }
                                                ├── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 }
                                                │   └── PhysicalScan { table: nation, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                                └── PhysicalProjection { exprs: [ #0 ], cost: weighted=7.20,row_cnt=1.00,compute=6.20,io=1.00 }
                                                    └── PhysicalFilter
                                                        ├── cond:Eq
                                                        │   ├── #1
                                                        │   └── "AMERICA"
                                                        ├── cost: weighted=7.14,row_cnt=1.00,compute=6.14,io=1.00
                                                        └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=1.10,row_cnt=1.00,compute=0.10,io=1.00 }
                                                            └── PhysicalScan { table: region, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
plan space size budget used, not applying logical rules any more. current plan space: 1094
qerrors: {"DataFusion": [5.0]}
```

### After

```
plan space size budget used, not applying logical rules any more. current plan space: 1094
explain: PhysicalSort
├── exprs:SortOrder { order: Desc }
│   └── #1
├── cost: weighted=336032.32,row_cnt=1.00,compute=259227.32,io=76805.00
└── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=336029.27,row_cnt=1.00,compute=259224.27,io=76805.00 }
    └── PhysicalAgg
        ├── aggrs:Agg(Sum)
        │   └── Mul
        │       ├── #0
        │       └── Sub
        │           ├── 1
        │           └── #1
        ├── groups: [ #2 ]
        ├── cost: weighted=336029.17,row_cnt=1.00,compute=259224.17,io=76805.00
        └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=335912.05,row_cnt=1.00,compute=259107.05,io=76805.00 }
            └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=335911.91,row_cnt=1.00,compute=259106.91,io=76805.00 }
                └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=335911.69,row_cnt=1.00,compute=259106.69,io=76805.00 }
                    └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=335911.39,row_cnt=1.00,compute=259106.39,io=76805.00 }
                        └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=335910.97,row_cnt=1.00,compute=259105.97,io=76805.00 }
                            └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=335910.47,row_cnt=1.00,compute=259105.47,io=76805.00 }
                                └── PhysicalNestedLoopJoin
                                    ├── join_type: Inner
                                    ├── cond:And
                                    │   ├── Eq
                                    │   │   ├── #11
                                    │   │   └── #14
                                    │   └── Eq
                                    │       ├── #3
                                    │       └── #15
                                    ├── cost: weighted=335909.89,row_cnt=1.00,compute=259104.89,io=76805.00
                                    ├── PhysicalProjection { exprs: [ #6, #7, #8, #9, #10, #11, #12, #13, #0, #1, #2, #3, #4, #5 ], cost: weighted=335619.21,row_cnt=1.00,compute=258944.21,io=76675.00 }
                                    │   └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #0 ], cost: weighted=335618.63,row_cnt=1.00,compute=258943.63,io=76675.00 }
                                    │       ├── PhysicalProjection { exprs: [ #4, #5, #0, #1, #2, #3 ], cost: weighted=332616.57,row_cnt=1.00,compute=257441.57,io=75175.00 }
                                    │       │   └── PhysicalProjection { exprs: [ #0, #2, #5, #6, #16, #17 ], cost: weighted=332616.31,row_cnt=1.00,compute=257441.31,io=75175.00 }
                                    │       │       └── PhysicalProjection { exprs: [ #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #0, #1 ], cost: weighted=332616.05,row_cnt=1.00,compute=257441.05,io=75175.00 }
                                    │       │           └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=332615.31,row_cnt=1.00,compute=257440.31,io=75175.00 }
                                    │       │               ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=212263.25,row_cnt=1.00,compute=197263.25,io=15000.00 }
                                    │       │               │   └── PhysicalFilter
                                    │       │               │       ├── cond:And
                                    │       │               │       │   ├── Geq
                                    │       │               │       │   │   ├── #2
                                    │       │               │       │   │   └── 9131
                                    │       │               │       │   └── Lt
                                    │       │               │       │       ├── #2
                                    │       │               │       │       └── 9496
                                    │       │               │       ├── cost: weighted=212263.15,row_cnt=1.00,compute=197263.15,io=15000.00
                                    │       │               │       └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=16050.07,row_cnt=15000.00,compute=1050.07,io=15000.00 }
                                    │       │               │           └── PhysicalScan { table: orders, cost: weighted=15000.00,row_cnt=15000.00,compute=0.00,io=15000.00 }
                                    │       │               └── PhysicalScan { table: lineitem, cost: weighted=60175.00,row_cnt=60175.00,compute=0.00,io=60175.00 }
                                    │       └── PhysicalScan { table: customer, cost: weighted=1500.00,row_cnt=1500.00,compute=0.00,io=1500.00 }
                                    └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=279.36,row_cnt=1.00,compute=149.36,io=130.00 }
                                        └── PhysicalProjection { exprs: [ #4, #5, #6, #7, #8, #9, #10, #0, #1, #2, #3 ], cost: weighted=279.10,row_cnt=1.00,compute=149.10,io=130.00 }
                                            └── PhysicalProjection { exprs: [ #1, #2, #3, #0, #4, #5, #6, #7, #8, #9, #10 ], cost: weighted=278.64,row_cnt=1.00,compute=148.64,io=130.00 }
                                                └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #3 ], cost: weighted=278.18,row_cnt=1.00,compute=148.18,io=130.00 }
                                                    ├── PhysicalProjection { exprs: [ #3, #0, #1, #2 ], cost: weighted=76.12,row_cnt=1.00,compute=46.12,io=30.00 }
                                                    │   └── PhysicalProjection { exprs: [ #0, #1, #2, #4 ], cost: weighted=75.94,row_cnt=1.00,compute=45.94,io=30.00 }
                                                    │       └── PhysicalProjection { exprs: [ #1, #2, #3, #4, #0 ], cost: weighted=75.76,row_cnt=1.00,compute=45.76,io=30.00 }
                                                    │           └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #2 ], cost: weighted=75.54,row_cnt=1.00,compute=45.54,io=30.00 }
                                                    │               ├── PhysicalProjection { exprs: [ #0 ], cost: weighted=23.48,row_cnt=1.00,compute=18.48,io=5.00 }
                                                    │               │   └── PhysicalFilter
                                                    │               │       ├── cond:Eq
                                                    │               │       │   ├── #1
                                                    │               │       │   └── "AMERICA"
                                                    │               │       ├── cost: weighted=23.42,row_cnt=1.00,compute=18.42,io=5.00
                                                    │               │       └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=5.30,row_cnt=5.00,compute=0.30,io=5.00 }
                                                    │               │           └── PhysicalScan { table: region, cost: weighted=5.00,row_cnt=5.00,compute=0.00,io=5.00 }
                                                    │               └── PhysicalScan { table: nation, cost: weighted=25.00,row_cnt=25.00,compute=0.00,io=25.00 }
                                                    └── PhysicalScan { table: supplier, cost: weighted=100.00,row_cnt=100.00,compute=0.00,io=100.00 }
plan space size budget used, not applying logical rules any more. current plan space: 1094
qerrors: {"DataFusion": [5.0]}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant