Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement unop, binop, and func expressions #2

Merged
merged 3 commits into from
Oct 30, 2023
Merged

Conversation

yliang412
Copy link
Member

No description provided.

Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
@yliang412 yliang412 changed the title feat: implement unop, binop, and func expressions (WIP) feat: implement unop, binop, and func expressions Oct 25, 2023
@skyzh skyzh changed the title (WIP) feat: implement unop, binop, and func expressions feat: implement unop, binop, and func expressions Oct 30, 2023
@skyzh skyzh marked this pull request as ready for review October 30, 2023 22:35
@skyzh skyzh merged commit 746e0bf into main Oct 30, 2023
@skyzh skyzh deleted the yliang/expr-impls branch October 30, 2023 22:35
yliang412 added a commit that referenced this pull request Mar 11, 2024
Closes #65. This PR produces metadata in optimizer output. Currently,
the only metadata kept is the group id. The group id information has the
following benefits:
- allow the consumer of the optd plan to recover reusable fragments in
the DAG-shaped query plan.
- implement adaptive physical collector outside `optd-core`.
- display cardinality cost: #89 

Looking forward to feedbacks!

In the example below, we produce the same plan as before when
adaptiveness is enabled.

```sql
❯ create table t1(t1v1 int, t1v2 int);
0 rows in set. Query took 0.009 seconds.

Execution took 0.000 secs, Planning took 0.000 secs
❯ explain select * from t1 as a, t1 as b;
+--------------------------------------------------+-------------------------------------------------------------------------------------------------+
| plan_type                                        | plan                                                                                            |
+--------------------------------------------------+-------------------------------------------------------------------------------------------------+
| logical_plan after datafusion                    | Projection: a.t1v1, a.t1v2, b.t1v1, b.t1v2                                                      |
|                                                  |   CrossJoin:                                                                                    |
|                                                  |     SubqueryAlias: a                                                                            |
|                                                  |       TableScan: t1                                                                             |
|                                                  |     SubqueryAlias: b                                                                            |
|                                                  |       TableScan: t1                                                                             |
| logical_plan after optd                          | LogicalProjection { exprs: [ #0, #1, #2, #3 ] }                                                 |
|                                                  | └── LogicalJoin { join_type: Cross, cond: true }                                                |
|                                                  |     ├── LogicalScan { table: t1 }                                                               |
|                                                  |     └── LogicalScan { table: t1 }                                                               |
| physical_plan after optd                         | PhysicalProjection { exprs: [ #0, #1, #2, #3 ] }                                                |
|                                                  | └── PhysicalNestedLoopJoin { join_type: Cross, cond: true }                                     |
|                                                  |     ├── PhysicalScan { table: t1 }                                                              |
|                                                  |     └── PhysicalScan { table: t1 }                                                              |
| physical_plan after optd-join-order              | (NLJ t1 t1)                                                                                     |
| physical_plan after optd-all-join-orders         | SAME TEXT AS ABOVE                                                                              |
| physical_plan after optd-all-logical-join-orders | (Join t1 t1)                                                                                    |
| physical_plan                                    | CollectorExec group_id=!17                                                                      |
|                                                  |   ProjectionExec: expr=[<expr>@0 as col0, <expr>@1 as col1, <expr>@2 as col2, <expr>@3 as col3] |
|                                                  |     CollectorExec group_id=!5                                                                   |
|                                                  |       CrossJoinExec                                                                             |
|                                                  |         CollectorExec group_id=!1                                                               |
|                                                  |           MemoryExec: partitions=1, partition_sizes=[0]                                         |
|                                                  |         CollectorExec group_id=!1                                                               |
|                                                  |           MemoryExec: partitions=1, partition_sizes=[0]                                         |
|                                                  |                                                                                                 |
+--------------------------------------------------+-------------------------------------------------------------------------------------------------+
```

---------

Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
Gun9niR added a commit that referenced this pull request Mar 20, 2024
Generate the statistics in perftest and put them into `BaseCostModel` in
`DatafusionOptimizer`. Below is the comparison before & after stats are
added. You can check `PhysicalScan`, where the cost has changed.

The final cardinality remains the same because when stats on a column is
missing, we use a very small magic number `INVALID_SELECTIVITY` (0.001)
that just sets cardinality to 1.

### Todos in Future PRs

- Support generating stats on `Utf8`.
- Set a better magic number.
- Generate MCV.

### Before

```
plan space size budget used, not applying logical rules any more. current plan space: 1094
explain: PhysicalSort
├── exprs:SortOrder { order: Desc }
│   └── #1
├── cost: weighted=185.17,row_cnt=1.00,compute=179.17,io=6.00
└── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=182.12,row_cnt=1.00,compute=176.12,io=6.00 }
    └── PhysicalAgg
        ├── aggrs:Agg(Sum)
        │   └── Mul
        │       ├── #0
        │       └── Sub
        │           ├── 1
        │           └── #1
        ├── groups: [ #2 ]
        ├── cost: weighted=182.02,row_cnt=1.00,compute=176.02,io=6.00
        └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=64.90,row_cnt=1.00,compute=58.90,io=6.00 }
            └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=64.76,row_cnt=1.00,compute=58.76,io=6.00 }
                └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=64.54,row_cnt=1.00,compute=58.54,io=6.00 }
                    └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=64.24,row_cnt=1.00,compute=58.24,io=6.00 }
                        └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=63.82,row_cnt=1.00,compute=57.82,io=6.00 }
                            └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=63.32,row_cnt=1.00,compute=57.32,io=6.00 }
                                └── PhysicalNestedLoopJoin
                                    ├── join_type: Inner
                                    ├── cond:And
                                    │   ├── Eq
                                    │   │   ├── #11
                                    │   │   └── #14
                                    │   └── Eq
                                    │       ├── #3
                                    │       └── #15
                                    ├── cost: weighted=62.74,row_cnt=1.00,compute=56.74,io=6.00
                                    ├── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #1 ], cost: weighted=35.70,row_cnt=1.00,compute=32.70,io=3.00 }
                                    │   ├── PhysicalScan { table: customer, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                    │   └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=31.64,row_cnt=1.00,compute=29.64,io=2.00 }
                                    │       ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=27.40,row_cnt=1.00,compute=26.40,io=1.00 }
                                    │       │   └── PhysicalFilter
                                    │       │       ├── cond:And
                                    │       │       │   ├── Geq
                                    │       │       │   │   ├── #2
                                    │       │       │   │   └── 9131
                                    │       │       │   └── Lt
                                    │       │       │       ├── #2
                                    │       │       │       └── 9496
                                    │       │       ├── cost: weighted=27.30,row_cnt=1.00,compute=26.30,io=1.00
                                    │       │       └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 }
                                    │       │           └── PhysicalScan { table: orders, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                    │       └── PhysicalProjection { exprs: [ #0, #2, #5, #6 ], cost: weighted=1.18,row_cnt=1.00,compute=0.18,io=1.00 }
                                    │           └── PhysicalScan { table: lineitem, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                    └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=15.72,row_cnt=1.00,compute=12.72,io=3.00 }
                                        └── PhysicalHashJoin { join_type: Inner, left_keys: [ #3 ], right_keys: [ #0 ], cost: weighted=15.46,row_cnt=1.00,compute=12.46,io=3.00 }
                                            ├── PhysicalScan { table: supplier, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                            └── PhysicalHashJoin { join_type: Inner, left_keys: [ #2 ], right_keys: [ #0 ], cost: weighted=11.40,row_cnt=1.00,compute=9.40,io=2.00 }
                                                ├── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 }
                                                │   └── PhysicalScan { table: nation, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
                                                └── PhysicalProjection { exprs: [ #0 ], cost: weighted=7.20,row_cnt=1.00,compute=6.20,io=1.00 }
                                                    └── PhysicalFilter
                                                        ├── cond:Eq
                                                        │   ├── #1
                                                        │   └── "AMERICA"
                                                        ├── cost: weighted=7.14,row_cnt=1.00,compute=6.14,io=1.00
                                                        └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=1.10,row_cnt=1.00,compute=0.10,io=1.00 }
                                                            └── PhysicalScan { table: region, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
plan space size budget used, not applying logical rules any more. current plan space: 1094
qerrors: {"DataFusion": [5.0]}
```

### After

```
plan space size budget used, not applying logical rules any more. current plan space: 1094
explain: PhysicalSort
├── exprs:SortOrder { order: Desc }
│   └── #1
├── cost: weighted=336032.32,row_cnt=1.00,compute=259227.32,io=76805.00
└── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=336029.27,row_cnt=1.00,compute=259224.27,io=76805.00 }
    └── PhysicalAgg
        ├── aggrs:Agg(Sum)
        │   └── Mul
        │       ├── #0
        │       └── Sub
        │           ├── 1
        │           └── #1
        ├── groups: [ #2 ]
        ├── cost: weighted=336029.17,row_cnt=1.00,compute=259224.17,io=76805.00
        └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=335912.05,row_cnt=1.00,compute=259107.05,io=76805.00 }
            └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=335911.91,row_cnt=1.00,compute=259106.91,io=76805.00 }
                └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=335911.69,row_cnt=1.00,compute=259106.69,io=76805.00 }
                    └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=335911.39,row_cnt=1.00,compute=259106.39,io=76805.00 }
                        └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=335910.97,row_cnt=1.00,compute=259105.97,io=76805.00 }
                            └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=335910.47,row_cnt=1.00,compute=259105.47,io=76805.00 }
                                └── PhysicalNestedLoopJoin
                                    ├── join_type: Inner
                                    ├── cond:And
                                    │   ├── Eq
                                    │   │   ├── #11
                                    │   │   └── #14
                                    │   └── Eq
                                    │       ├── #3
                                    │       └── #15
                                    ├── cost: weighted=335909.89,row_cnt=1.00,compute=259104.89,io=76805.00
                                    ├── PhysicalProjection { exprs: [ #6, #7, #8, #9, #10, #11, #12, #13, #0, #1, #2, #3, #4, #5 ], cost: weighted=335619.21,row_cnt=1.00,compute=258944.21,io=76675.00 }
                                    │   └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #0 ], cost: weighted=335618.63,row_cnt=1.00,compute=258943.63,io=76675.00 }
                                    │       ├── PhysicalProjection { exprs: [ #4, #5, #0, #1, #2, #3 ], cost: weighted=332616.57,row_cnt=1.00,compute=257441.57,io=75175.00 }
                                    │       │   └── PhysicalProjection { exprs: [ #0, #2, #5, #6, #16, #17 ], cost: weighted=332616.31,row_cnt=1.00,compute=257441.31,io=75175.00 }
                                    │       │       └── PhysicalProjection { exprs: [ #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #0, #1 ], cost: weighted=332616.05,row_cnt=1.00,compute=257441.05,io=75175.00 }
                                    │       │           └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=332615.31,row_cnt=1.00,compute=257440.31,io=75175.00 }
                                    │       │               ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=212263.25,row_cnt=1.00,compute=197263.25,io=15000.00 }
                                    │       │               │   └── PhysicalFilter
                                    │       │               │       ├── cond:And
                                    │       │               │       │   ├── Geq
                                    │       │               │       │   │   ├── #2
                                    │       │               │       │   │   └── 9131
                                    │       │               │       │   └── Lt
                                    │       │               │       │       ├── #2
                                    │       │               │       │       └── 9496
                                    │       │               │       ├── cost: weighted=212263.15,row_cnt=1.00,compute=197263.15,io=15000.00
                                    │       │               │       └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=16050.07,row_cnt=15000.00,compute=1050.07,io=15000.00 }
                                    │       │               │           └── PhysicalScan { table: orders, cost: weighted=15000.00,row_cnt=15000.00,compute=0.00,io=15000.00 }
                                    │       │               └── PhysicalScan { table: lineitem, cost: weighted=60175.00,row_cnt=60175.00,compute=0.00,io=60175.00 }
                                    │       └── PhysicalScan { table: customer, cost: weighted=1500.00,row_cnt=1500.00,compute=0.00,io=1500.00 }
                                    └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=279.36,row_cnt=1.00,compute=149.36,io=130.00 }
                                        └── PhysicalProjection { exprs: [ #4, #5, #6, #7, #8, #9, #10, #0, #1, #2, #3 ], cost: weighted=279.10,row_cnt=1.00,compute=149.10,io=130.00 }
                                            └── PhysicalProjection { exprs: [ #1, #2, #3, #0, #4, #5, #6, #7, #8, #9, #10 ], cost: weighted=278.64,row_cnt=1.00,compute=148.64,io=130.00 }
                                                └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #3 ], cost: weighted=278.18,row_cnt=1.00,compute=148.18,io=130.00 }
                                                    ├── PhysicalProjection { exprs: [ #3, #0, #1, #2 ], cost: weighted=76.12,row_cnt=1.00,compute=46.12,io=30.00 }
                                                    │   └── PhysicalProjection { exprs: [ #0, #1, #2, #4 ], cost: weighted=75.94,row_cnt=1.00,compute=45.94,io=30.00 }
                                                    │       └── PhysicalProjection { exprs: [ #1, #2, #3, #4, #0 ], cost: weighted=75.76,row_cnt=1.00,compute=45.76,io=30.00 }
                                                    │           └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #2 ], cost: weighted=75.54,row_cnt=1.00,compute=45.54,io=30.00 }
                                                    │               ├── PhysicalProjection { exprs: [ #0 ], cost: weighted=23.48,row_cnt=1.00,compute=18.48,io=5.00 }
                                                    │               │   └── PhysicalFilter
                                                    │               │       ├── cond:Eq
                                                    │               │       │   ├── #1
                                                    │               │       │   └── "AMERICA"
                                                    │               │       ├── cost: weighted=23.42,row_cnt=1.00,compute=18.42,io=5.00
                                                    │               │       └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=5.30,row_cnt=5.00,compute=0.30,io=5.00 }
                                                    │               │           └── PhysicalScan { table: region, cost: weighted=5.00,row_cnt=5.00,compute=0.00,io=5.00 }
                                                    │               └── PhysicalScan { table: nation, cost: weighted=25.00,row_cnt=25.00,compute=0.00,io=25.00 }
                                                    └── PhysicalScan { table: supplier, cost: weighted=100.00,row_cnt=100.00,compute=0.00,io=100.00 }
plan space size budget used, not applying logical rules any more. current plan space: 1094
qerrors: {"DataFusion": [5.0]}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants