-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
(This section helps Arrow developers understand the context and why for this feature, in addition to the what)
Take TPC-H q6 for example.
Plan prior to simplification:
- has a strange "AND true" clause in the ParquetExec
- not able to push l_discount down into the ParquetExec
❯ explain select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1995-01-01'
and l_discount between 0.06 - 0.01 and 0.06 + 0.01
and l_quantity < 24;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #SUM(lineitem.l_extendedprice * lineitem.l_discount) AS revenue |
| | Aggregate: groupBy=[[]], aggr=[[SUM(#lineitem.l_extendedprice * #lineitem.l_discount)]] |
| | Filter: #lineitem.l_shipdate >= Date32("8766") AND #lineitem.l_shipdate < Date32("9131") AND #lineitem.l_discount BETWEEN Float64(0.049999999999999996) AND Float64(0.06999999999999999) AND #lineitem.l_quantity < Decimal128(Some(2400),15,2) |
| | TableScan: lineitem projection=[l_quantity, l_extendedprice, l_discount, l_shipdate], partial_filters=[#lineitem.l_shipdate >= Date32("8766"), #lineitem.l_shipdate < Date32("9131"), #lineitem.l_discount BETWEEN Float64(0.049999999999999996) AND Float64(0.06999999999999999), #lineitem.l_quantity < Decimal128(Some(2400),15,2)] |
| physical_plan | ProjectionExec: expr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)@0 as revenue] |
| | AggregateExec: mode=Final, gby=[], aggr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)] |
| | CoalescePartitionsExec |
| | AggregateExec: mode=Partial, gby=[], aggr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)] |
| | CoalesceBatchesExec: target_batch_size=4096 |
| | FilterExec: l_shipdate@3 >= 8766 AND l_shipdate@3 < 9131 AND CAST(l_discount@2 AS Decimal128(30, 15)) >= CAST(0.049999999999999996 AS Decimal128(30, 15)) AND CAST(l_discount@2 AS Decimal128(30, 15)) <= CAST(0.06999999999999999 AS Decimal128(30, 15)) AND l_quantity@0 < Some(2400),15,2 |
| | RepartitionExec: partitioning=RoundRobinBatch(20) |
| | ParquetExec: limit=None, partitions=[home/kmitchener/dev/arrow-datafusion/benchmarks/data-parquet/lineitem/part-0.parquet], predicate=l_shipdate_max@0 >= 8766 AND l_shipdate_min@1 < 9131 AND true AND l_quantity_min@2 < Some(2400),15,2, projection=[l_quantity, l_extendedprice, l_discount, l_shipdate] |
| | |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Describe the solution you'd like
A clear and concise description of what you want to happen.
Convert between expression into >= and <= expressions and there is more opportunity for further optimize it in the logical plan.
It results in a better plan overall, with more predicates pushed down to the tablescan.
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #SUM(lineitem.l_extendedprice * lineitem.l_discount) AS revenue |
| | Aggregate: groupBy=[[]], aggr=[[SUM(#lineitem.l_extendedprice * #lineitem.l_discount)]] |
| | Filter: #lineitem.l_shipdate >= Date32("8766") AND #lineitem.l_shipdate < Date32("9131") AND CAST(#lineitem.l_discount AS Decimal128(30, 15)) >= CAST(Float64(0.049999999999999996) AS Decimal128(30, 15)) AND CAST(#lineitem.l_discount AS Decimal128(30, 15)) <= CAST(Float64(0.06999999999999999) AS Decimal128(30, 15)) AND #lineitem.l_quantity < Decimal128(Some(2400),15,2) |
| | TableScan: lineitem projection=[l_quantity, l_extendedprice, l_discount, l_shipdate], partial_filters=[#lineitem.l_shipdate >= Date32("8766"), #lineitem.l_shipdate < Date32("9131"), #lineitem.l_discount >= Float64(0.049999999999999996), #lineitem.l_discount <= Float64(0.06999999999999999), #lineitem.l_quantity < Decimal128(Some(2400),15,2)] |
| physical_plan | ProjectionExec: expr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)@0 as revenue] |
| | AggregateExec: mode=Final, gby=[], aggr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)] |
| | CoalescePartitionsExec |
| | AggregateExec: mode=Partial, gby=[], aggr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)] |
| | CoalesceBatchesExec: target_batch_size=4096 |
| | FilterExec: l_shipdate@3 >= 8766 AND l_shipdate@3 < 9131 AND CAST(l_discount@2 AS Decimal128(30, 15)) >= CAST(0.049999999999999996 AS Decimal128(30, 15)) AND CAST(l_discount@2 AS Decimal128(30, 15)) <= CAST(0.06999999999999999 AS Decimal128(30, 15)) AND l_quantity@0 < Some(2400),15,2 |
| | RepartitionExec: partitioning=RoundRobinBatch(20) |
| | ParquetExec: limit=None, partitions=[home/kmitchener/dev/arrow-datafusion/benchmarks/data-parquet/lineitem/part-0.parquet], predicate=l_shipdate_max@0 >= 8766 AND l_shipdate_min@1 < 9131 AND CAST(l_discount_max@2 AS Decimal128(30, 15)) >= CAST(0.049999999999999996 AS Decimal128(30, 15)) AND CAST(l_discount_min@3 AS Decimal128(30, 15)) <= CAST(0.06999999999999999 AS Decimal128(30, 15)) AND l_quantity_min@4 < Some(2400),15,2, projection=[l_quantity, l_extendedprice, l_discount, l_shipdate] |
| | |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.