@Ted-Jiang
Member

Which issue does this PR close?

Thanks to @yjshen for the advice ❤️.
Closes #2073.

Rationale for this change

Before:

❯  explain select c1, c2 from test where  c2 = 0.000001;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                 |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: #test.c1, #test.c2                                                                                                                       |
|               |   Filter: #test.c2 = Float64(0.000001)                                                                                                               |
|               |     TableScan: test projection=Some([0, 1]), partial_filters=[#test.c2 = Float64(0.000001)]                                                          |
| physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                                        |
|               |   CoalesceBatchesExec: target_batch_size=4096                                                                                                        |
|               |     FilterExec: CAST(c2@1 AS Float64) = 0.000001                                                                                                     |
|               |       RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                              |
|               |         CsvExec: files=[/Users/yangjiang/CLionProjects/github/arrow-datafusion/testing/data/csv/aggregate_test_100.csv], has_header=true, limit=None |
|               |                                                                                                                                                      |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.004 seconds.

Now:

explain select l_orderkey, l_shipdate from parquet where l_shipdate < '1996-05-20';
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                           |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: #parquet.l_orderkey, #parquet.l_shipdate                                                                                                                                                                                                                           |
|               |   Filter: #parquet.l_shipdate < Utf8("1996-05-20")                                                                                                                                                                                                                             |
|               |     TableScan: parquet projection=Some([0, 10]), partial_filters=[#parquet.l_shipdate < Utf8("1996-05-20")]                                                                                                                                                                    |
| physical_plan | ProjectionExec: expr=[l_orderkey@0 as l_orderkey, l_shipdate@1 as l_shipdate]                                                                                                                                                                                                  |
|               |   CoalesceBatchesExec: target_batch_size=4096                                                                                                                                                                                                                                  |
|               |     FilterExec: l_shipdate@1 < CAST(1996-05-20 AS Date32)                                                                                                                                                                                                                      |
|               |       RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                                                                                                                                                        |
|               |         ParquetExec: limit=None, partitions=[/Users/yangjiang/test-data/tpch-1g-oneFile/lineitem/part-00000-41937a05-669a-4bfc-abbd-b4cdf90557f1-c000.snappy.parquet], pruning_predicate=l_shipdate_min@0 < CAST(1996-05-20 AS Date32), projected_col=[l_orderkey, l_shipdate] |
|               |                                                                                                                                                                                                                                                                                |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.008 seconds.



explain select c1,c2 from csv where c2 < 10;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                         |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: #csv.c1, #csv.c2                                                                                                                                                 |
|               |   Filter: #csv.c2 < Int64(10)                                                                                                                                                |
|               |     TableScan: csv projection=Some([0, 1]), partial_filters=[#csv.c2 < Int64(10)]                                                                                            |
| physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                                                                |
|               |   CoalesceBatchesExec: target_batch_size=4096                                                                                                                                |
|               |     FilterExec: CAST(c2@1 AS Int64) < 10                                                                                                                                     |
|               |       RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                                                      |
|               |         CsvExec: files=[/Users/yangjiang/CLionProjects/github/arrow-datafusion/testing/data/csv/aggregate_test_100.csv], has_header=true, limit=None, projected_col=[c1, c2] |
|               |                                                                                                                                                                              |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.005 seconds.

What changes are included in this PR?

Are there any user-facing changes?

  // normalize newlines (output on windows uses \r\n)
- let actual_output = actual_output.replace("\r\n", "\n");
+ let mut actual_output = actual_output.replace("\r\n", "\n");
+ actual_output.retain(|x| !x.is_ascii_whitespace());
Member Author

Every change to the explain structure changes the whitespace in the output.
Adding this filter makes the test pass regardless of that whitespace.
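
As a rough illustration of the idea in this comment, here is a minimal, self-contained sketch of a whitespace-insensitive comparison. The helper name `strip_whitespace` and the sample table rows are hypothetical and are not taken from the PR. The sketch also assumes the expected output is stripped the same way before comparing, which the snippet above does not show.

```rust
/// Hypothetical helper: normalize Windows newlines, then drop every ASCII
/// whitespace character so that table padding cannot affect a comparison.
fn strip_whitespace(s: &str) -> String {
    let mut out = s.replace("\r\n", "\n");
    out.retain(|c| !c.is_ascii_whitespace());
    out
}

fn main() {
    // Two renderings of the same rows; only the column padding and the
    // newline style differ (made-up sample data).
    let narrow = "| plan_type | plan |\n| logical_plan | Filter: #c2 < Int64(10) |\n";
    let wide = "| plan_type    | plan                    |\r\n| logical_plan | Filter: #c2 < Int64(10) |\r\n";

    // A plain string comparison would fail on the padding alone.
    assert_ne!(narrow, wide);
    // After stripping whitespace the two renderings compare equal.
    assert_eq!(strip_whitespace(narrow), strip_whitespace(wide));
}
```

The trade-off of this approach is that the comparison no longer detects changes that consist only of whitespace, such as altered indentation of plan nodes.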

Member

@yjshen yjshen left a comment

Thanks @Ted-Jiang !

Contributor

@alamb alamb left a comment

Thanks @Ted-Jiang and @yjshen -- this looks great

Contributor

@liukun4515 liukun4515 left a comment

LGTM

Member

@xudong963 xudong963 left a comment

Thanks @Ted-Jiang

@xudong963 xudong963 merged commit 3d31915 into apache:master Mar 25, 2022

Successfully merging this pull request may close these issues.

Add filters and projections to EXPLAIN PLAN for ParquetExec, CSVExec etc