-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rule DSL #6
Comments
Gun9niR
added a commit
that referenced
this issue
Mar 20, 2024
Generate the statistics in perftest and put them into `BaseCostModel` in `DatafusionOptimizer`. Below is the comparison before & after stats are added. You can check `PhysicalScan`, where the cost has changed. The final cardinality remains the same because when stats on a column is missing, we use a very small magic number `INVALID_SELECTIVITY` (0.001) that just sets cardinality to 1. ### Todos in Future PRs - Support generating stats on `Utf8`. - Set a better magic number. - Generate MCV. ### Before ``` plan space size budget used, not applying logical rules any more. current plan space: 1094 explain: PhysicalSort ├── exprs:SortOrder { order: Desc } │ └── #1 ├── cost: weighted=185.17,row_cnt=1.00,compute=179.17,io=6.00 └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=182.12,row_cnt=1.00,compute=176.12,io=6.00 } └── PhysicalAgg ├── aggrs:Agg(Sum) │ └── Mul │ ├── #0 │ └── Sub │ ├── 1 │ └── #1 ├── groups: [ #2 ] ├── cost: weighted=182.02,row_cnt=1.00,compute=176.02,io=6.00 └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=64.90,row_cnt=1.00,compute=58.90,io=6.00 } └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=64.76,row_cnt=1.00,compute=58.76,io=6.00 } └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=64.54,row_cnt=1.00,compute=58.54,io=6.00 } └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=64.24,row_cnt=1.00,compute=58.24,io=6.00 } └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=63.82,row_cnt=1.00,compute=57.82,io=6.00 } └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=63.32,row_cnt=1.00,compute=57.32,io=6.00 } └── PhysicalNestedLoopJoin ├── join_type: Inner ├── cond:And │ ├── Eq │ │ ├── #11 │ │ └── #14 │ └── Eq │ ├── #3 │ └── #15 ├── cost: weighted=62.74,row_cnt=1.00,compute=56.74,io=6.00 ├── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #1 ], cost: weighted=35.70,row_cnt=1.00,compute=32.70,io=3.00 } │ ├── PhysicalScan { table: customer, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=31.64,row_cnt=1.00,compute=29.64,io=2.00 } │ ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=27.40,row_cnt=1.00,compute=26.40,io=1.00 } │ │ └── PhysicalFilter │ │ ├── cond:And │ │ │ ├── Geq │ │ │ │ ├── #2 │ │ │ │ └── 9131 │ │ │ └── Lt │ │ │ ├── #2 │ │ │ └── 9496 │ │ ├── cost: weighted=27.30,row_cnt=1.00,compute=26.30,io=1.00 │ │ └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 } │ │ └── PhysicalScan { table: orders, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } │ └── PhysicalProjection { exprs: [ #0, #2, #5, #6 ], cost: weighted=1.18,row_cnt=1.00,compute=0.18,io=1.00 } │ └── PhysicalScan { table: lineitem, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=15.72,row_cnt=1.00,compute=12.72,io=3.00 } └── PhysicalHashJoin { join_type: Inner, left_keys: [ #3 ], right_keys: [ #0 ], cost: weighted=15.46,row_cnt=1.00,compute=12.46,io=3.00 } ├── PhysicalScan { table: supplier, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } └── PhysicalHashJoin { join_type: Inner, left_keys: [ #2 ], right_keys: [ #0 ], cost: weighted=11.40,row_cnt=1.00,compute=9.40,io=2.00 } ├── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 } │ └── PhysicalScan { table: nation, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } └── PhysicalProjection { exprs: [ #0 ], cost: weighted=7.20,row_cnt=1.00,compute=6.20,io=1.00 } └── PhysicalFilter ├── cond:Eq │ ├── #1 │ └── "AMERICA" ├── cost: weighted=7.14,row_cnt=1.00,compute=6.14,io=1.00 └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=1.10,row_cnt=1.00,compute=0.10,io=1.00 } └── PhysicalScan { table: region, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } plan space size budget used, not applying logical rules any more. current plan space: 1094 qerrors: {"DataFusion": [5.0]} ``` ### After ``` plan space size budget used, not applying logical rules any more. current plan space: 1094 explain: PhysicalSort ├── exprs:SortOrder { order: Desc } │ └── #1 ├── cost: weighted=336032.32,row_cnt=1.00,compute=259227.32,io=76805.00 └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=336029.27,row_cnt=1.00,compute=259224.27,io=76805.00 } └── PhysicalAgg ├── aggrs:Agg(Sum) │ └── Mul │ ├── #0 │ └── Sub │ ├── 1 │ └── #1 ├── groups: [ #2 ] ├── cost: weighted=336029.17,row_cnt=1.00,compute=259224.17,io=76805.00 └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=335912.05,row_cnt=1.00,compute=259107.05,io=76805.00 } └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=335911.91,row_cnt=1.00,compute=259106.91,io=76805.00 } └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=335911.69,row_cnt=1.00,compute=259106.69,io=76805.00 } └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=335911.39,row_cnt=1.00,compute=259106.39,io=76805.00 } └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=335910.97,row_cnt=1.00,compute=259105.97,io=76805.00 } └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=335910.47,row_cnt=1.00,compute=259105.47,io=76805.00 } └── PhysicalNestedLoopJoin ├── join_type: Inner ├── cond:And │ ├── Eq │ │ ├── #11 │ │ └── #14 │ └── Eq │ ├── #3 │ └── #15 ├── cost: weighted=335909.89,row_cnt=1.00,compute=259104.89,io=76805.00 ├── PhysicalProjection { exprs: [ #6, #7, #8, #9, #10, #11, #12, #13, #0, #1, #2, #3, #4, #5 ], cost: weighted=335619.21,row_cnt=1.00,compute=258944.21,io=76675.00 } │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #0 ], cost: weighted=335618.63,row_cnt=1.00,compute=258943.63,io=76675.00 } │ ├── PhysicalProjection { exprs: [ #4, #5, #0, #1, #2, #3 ], cost: weighted=332616.57,row_cnt=1.00,compute=257441.57,io=75175.00 } │ │ └── PhysicalProjection { exprs: [ #0, #2, #5, #6, #16, #17 ], cost: weighted=332616.31,row_cnt=1.00,compute=257441.31,io=75175.00 } │ │ └── PhysicalProjection { exprs: [ #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #0, #1 ], cost: weighted=332616.05,row_cnt=1.00,compute=257441.05,io=75175.00 } │ │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=332615.31,row_cnt=1.00,compute=257440.31,io=75175.00 } │ │ ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=212263.25,row_cnt=1.00,compute=197263.25,io=15000.00 } │ │ │ └── PhysicalFilter │ │ │ ├── cond:And │ │ │ │ ├── Geq │ │ │ │ │ ├── #2 │ │ │ │ │ └── 9131 │ │ │ │ └── Lt │ │ │ │ ├── #2 │ │ │ │ └── 9496 │ │ │ ├── cost: weighted=212263.15,row_cnt=1.00,compute=197263.15,io=15000.00 │ │ │ └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=16050.07,row_cnt=15000.00,compute=1050.07,io=15000.00 } │ │ │ └── PhysicalScan { table: orders, cost: weighted=15000.00,row_cnt=15000.00,compute=0.00,io=15000.00 } │ │ └── PhysicalScan { table: lineitem, cost: weighted=60175.00,row_cnt=60175.00,compute=0.00,io=60175.00 } │ └── PhysicalScan { table: customer, cost: weighted=1500.00,row_cnt=1500.00,compute=0.00,io=1500.00 } └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=279.36,row_cnt=1.00,compute=149.36,io=130.00 } └── PhysicalProjection { exprs: [ #4, #5, #6, #7, #8, #9, #10, #0, #1, #2, #3 ], cost: weighted=279.10,row_cnt=1.00,compute=149.10,io=130.00 } └── PhysicalProjection { exprs: [ #1, #2, #3, #0, #4, #5, #6, #7, #8, #9, #10 ], cost: weighted=278.64,row_cnt=1.00,compute=148.64,io=130.00 } └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #3 ], cost: weighted=278.18,row_cnt=1.00,compute=148.18,io=130.00 } ├── PhysicalProjection { exprs: [ #3, #0, #1, #2 ], cost: weighted=76.12,row_cnt=1.00,compute=46.12,io=30.00 } │ └── PhysicalProjection { exprs: [ #0, #1, #2, #4 ], cost: weighted=75.94,row_cnt=1.00,compute=45.94,io=30.00 } │ └── PhysicalProjection { exprs: [ #1, #2, #3, #4, #0 ], cost: weighted=75.76,row_cnt=1.00,compute=45.76,io=30.00 } │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #2 ], cost: weighted=75.54,row_cnt=1.00,compute=45.54,io=30.00 } │ ├── PhysicalProjection { exprs: [ #0 ], cost: weighted=23.48,row_cnt=1.00,compute=18.48,io=5.00 } │ │ └── PhysicalFilter │ │ ├── cond:Eq │ │ │ ├── #1 │ │ │ └── "AMERICA" │ │ ├── cost: weighted=23.42,row_cnt=1.00,compute=18.42,io=5.00 │ │ └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=5.30,row_cnt=5.00,compute=0.30,io=5.00 } │ │ └── PhysicalScan { table: region, cost: weighted=5.00,row_cnt=5.00,compute=0.00,io=5.00 } │ └── PhysicalScan { table: nation, cost: weighted=25.00,row_cnt=25.00,compute=0.00,io=25.00 } └── PhysicalScan { table: supplier, cost: weighted=100.00,row_cnt=100.00,compute=0.00,io=100.00 } plan space size budget used, not applying logical rules any more. current plan space: 1094 qerrors: {"DataFusion": [5.0]} ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
we already have the intermediate representation of rule matchers, and therefore it would be easy to implement a Rust macro that converts a rule DSL into the IR. one proposal:
!a
,!b
,!cond
will be picked up and the rule can use them to compose the new plan node.The text was updated successfully, but these errors were encountered: