ARROW-5945: [Rust] [DataFusion] Table trait can now be used to build real queries #4875

andygrove · 2019-07-13T14:02:54Z

The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit.
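To make this concrete, here is a minimal sketch of a DataFrame-style trait in which each method wraps the current logical plan in one more node (projection, selection, aggregate, limit) and returns a new table. Everything is hypothetical and heavily simplified. The `Expr` and `LogicalPlan` variants, the `Box<dyn Table>` return types, and every method name except `to_logical_plan` (which appears in the PR's tests) are assumptions. The real trait's signatures are not shown in this thread.

```rust
// Hypothetical, heavily simplified expression and plan types; NOT DataFusion's.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(usize),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    Max(Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    TableScan { table: String },
    Projection { exprs: Vec<Expr>, input: Box<LogicalPlan> },
    Selection { predicate: Expr, input: Box<LogicalPlan> },
    Aggregate { group_by: Vec<Expr>, aggr: Vec<Expr>, input: Box<LogicalPlan> },
    Limit { n: usize, input: Box<LogicalPlan> },
}

/// DataFrame-style trait: each call wraps the current plan in one new node
/// and returns a new table, so calls can be chained.
pub trait Table {
    fn select(&self, exprs: Vec<Expr>) -> Box<dyn Table>;
    fn filter(&self, predicate: Expr) -> Box<dyn Table>;
    fn aggregate(&self, group_by: Vec<Expr>, aggr: Vec<Expr>) -> Box<dyn Table>;
    fn limit(&self, n: usize) -> Box<dyn Table>;
    fn to_logical_plan(&self) -> LogicalPlan;
}

pub struct TableImpl {
    plan: LogicalPlan,
}

impl TableImpl {
    /// Start from a scan of the named table.
    pub fn scan(table: &str) -> Box<dyn Table> {
        Box::new(TableImpl { plan: LogicalPlan::TableScan { table: table.to_string() } })
    }

    /// Box a new TableImpl around `plan`.
    fn wrap(plan: LogicalPlan) -> Box<dyn Table> {
        Box::new(TableImpl { plan })
    }

    /// The current plan, boxed so it can become the child of a new node.
    fn input(&self) -> Box<LogicalPlan> {
        Box::new(self.plan.clone())
    }
}

impl Table for TableImpl {
    fn select(&self, exprs: Vec<Expr>) -> Box<dyn Table> {
        Self::wrap(LogicalPlan::Projection { exprs, input: self.input() })
    }
    fn filter(&self, predicate: Expr) -> Box<dyn Table> {
        Self::wrap(LogicalPlan::Selection { predicate, input: self.input() })
    }
    fn aggregate(&self, group_by: Vec<Expr>, aggr: Vec<Expr>) -> Box<dyn Table> {
        Self::wrap(LogicalPlan::Aggregate { group_by, aggr, input: self.input() })
    }
    fn limit(&self, n: usize) -> Box<dyn Table> {
        Self::wrap(LogicalPlan::Limit { n, input: self.input() })
    }
    fn to_logical_plan(&self) -> LogicalPlan {
        self.plan.clone()
    }
}
```

Chaining then reads like `TableImpl::scan("aggregate_test_100").filter(..).aggregate(..).limit(10).to_logical_plan()`. This produces a `Limit` node over an `Aggregate` over a `Selection` over the scan.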

Tests were moved from Table to TableImpl and are now more comprehensive.

Not all expressions are supported yet; separate PRs will expand the set of supported expressions.

Note that this PR also removes the `optimize` step from `ExecutionContext.create_logical_plan`, so it must now be called explicitly. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests.
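A minimal sketch of the resulting two-step flow (build the plan, then optimize it explicitly) follows. It assumes a toy plan type and a toy rule that collapses nested `Limit` nodes, and neither is one of DataFusion's actual optimizer rules. The `ExecutionContext` here is a hypothetical stand-in. Its `create_logical_plan` takes a list of limits instead of SQL text only to keep the example short.

```rust
/// Hypothetical, simplified plan type (not DataFusion's LogicalPlan).
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    TableScan { table: String },
    Limit { n: usize, input: Box<LogicalPlan> },
}

/// Hypothetical stand-in for ExecutionContext, showing only the two steps.
pub struct ExecutionContext;

impl ExecutionContext {
    /// Builds a plan: a scan wrapped in one Limit per entry of `limits`,
    /// innermost first. As described in the PR, no optimization happens here.
    pub fn create_logical_plan(&self, table: &str, limits: &[usize]) -> LogicalPlan {
        limits.iter().fold(
            LogicalPlan::TableScan { table: table.to_string() },
            |input, &n| LogicalPlan::Limit { n, input: Box::new(input) },
        )
    }

    /// Explicit optimization step. Toy rule, applied bottom-up so chains of
    /// any length collapse: Limit(a, Limit(b, x)) => Limit(min(a, b), x).
    pub fn optimize(&self, plan: &LogicalPlan) -> LogicalPlan {
        match plan {
            LogicalPlan::Limit { n, input } => match self.optimize(input) {
                LogicalPlan::Limit { n: inner, input: inner_input } => LogicalPlan::Limit {
                    n: (*n).min(inner),
                    input: inner_input,
                },
                other => LogicalPlan::Limit { n: *n, input: Box::new(other) },
            },
            other => other.clone(),
        }
    }
}
```

Separating the steps lets a test inspect the unoptimized plan. Callers that go on to execute the plan must remember the explicit `optimize` call.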

andygrove · 2019-07-13T14:05:41Z

@sunchao @nevi-me @kszucs @paddyhoran PTAL when you can

codecov-io · 2019-07-13T15:14:49Z

Codecov Report

Merging #4875 into master will decrease coverage by 5.02%.
The diff coverage is 88.88%.

@@             Coverage Diff             @@
##           master    #4875       +/-   ##
===========================================
- Coverage   87.41%   82.38%    -5.03%     
===========================================
  Files         996       85      -911     
  Lines      140343    24547   -115796     
  Branches     1418        0     -1418     
===========================================
- Hits       122679    20224   -102455     
+ Misses      17302     4323    -12979     
+ Partials      362        0      -362

Impacted Files	Coverage Δ
rust/datafusion/tests/sql.rs	`95.83% <100%> (+0.05%)`	⬆️
rust/datafusion/src/optimizer/type_coercion.rs	`81.81% <100%> (+0.16%)`	⬆️
rust/datafusion/src/execution/context.rs	`64.81% <100%> (+0.16%)`	⬆️
rust/datafusion/src/sql/planner.rs	`75% <66.66%> (ø)`	⬆️
rust/datafusion/src/execution/table_impl.rs	`90.06% <89.13%> (+5.21%)`	⬆️
python/pyarrow/ipc.pxi
cpp/src/arrow/csv/chunker-test.cc
cpp/src/parquet/column_page.h
cpp/src/parquet/bloom_filter-test.cc
... and 907 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 690823c...3767597.

sunchao · 2019-07-15T16:58:25Z

rust/datafusion/src/execution/table_impl.rs

+        }
+    }
+
+    /// Create an expression to represent the max() aggregate function


These comments need to be updated.
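The review comment points at the doc comment on the `max()` helper. As an illustration only, the sketch below shows a family of aggregate-expression helpers in which each doc comment names its own function. A `Display` impl renders expressions in the `MAX(#11)` style seen in this PR's plan strings. The `Expr` type, the `col`/`min`/`max`/`avg`/`sum`/`count` signatures, and the shared `aggregate` constructor are hypothetical, not taken from `table_impl.rs`.

```rust
use std::fmt;

/// Hypothetical, simplified expression type (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A column referenced by index, rendered as `#<index>`.
    Column(usize),
    /// An aggregate function call, rendered as `NAME(arg, ...)`.
    AggregateFunction { name: String, args: Vec<Expr> },
}

/// Create an expression referencing the column at `index`.
pub fn col(index: usize) -> Expr {
    Expr::Column(index)
}

/// Shared constructor, so each public helper differs only in the name it passes.
fn aggregate(name: &str, expr: Expr) -> Expr {
    Expr::AggregateFunction { name: name.to_string(), args: vec![expr] }
}

/// Create an expression to represent the min() aggregate function
pub fn min(expr: Expr) -> Expr { aggregate("MIN", expr) }

/// Create an expression to represent the max() aggregate function
pub fn max(expr: Expr) -> Expr { aggregate("MAX", expr) }

/// Create an expression to represent the avg() aggregate function
pub fn avg(expr: Expr) -> Expr { aggregate("AVG", expr) }

/// Create an expression to represent the sum() aggregate function
pub fn sum(expr: Expr) -> Expr { aggregate("SUM", expr) }

/// Create an expression to represent the count() aggregate function
pub fn count(expr: Expr) -> Expr { aggregate("COUNT", expr) }

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(i) => write!(f, "#{}", i),
            Expr::AggregateFunction { name, args } => {
                write!(f, "{}(", name)?;
                for (i, arg) in args.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{}", arg)?;
                }
                write!(f, ")")
            }
        }
    }
}
```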

sunchao · 2019-07-15T17:03:40Z

rust/datafusion/src/execution/table_impl.rs

+        let plan = t2.to_logical_plan();
+
+        assert_eq!(
+            "Aggregate: groupBy=[[#0]], aggr=[[MIN(#11), MAX(#11), AVG(#11), SUM(#11), COUNT(#11)]]\n  TableScan: aggregate_test_100 projection=None",


Is there a way to implement equality for query plans? It would be great if we could avoid equality checks based on string comparison.

andygrove · 2019-07-16 (reply)

It is not possible (without code changes) to compare the plans directly. I have re-implemented these tests using a different strategy, where each query built via the Table API is compared to the same query produced via SQL. This has made the tests much more concise and also detected a difference in the data type used for the LIMIT clause, which is now resolved.
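The strategy described in the reply checks that a plan built through the Table API matches the plan produced from equivalent SQL. It can be sketched as follows. This is a toy, not the DataFusion test code. The plan types, the one-statement "SQL planner", and `same_plan` are hypothetical. As in the thread, the plan type has no `PartialEq`, so plans are compared through their `Debug` rendering. The `Int64`/`UInt64` pair only illustrates how a literal-type mismatch (like the LIMIT difference mentioned) would show up. The actual types involved in that fix are not stated here.

```rust
/// Hypothetical literal type that records the integer type of a value.
#[derive(Debug, Clone)]
pub enum Scalar {
    Int64(i64),
    UInt64(u64),
}

/// Hypothetical plan type; deliberately has no PartialEq.
#[derive(Debug, Clone)]
pub enum LogicalPlan {
    TableScan { table: String },
    Limit { n: Scalar, input: Box<LogicalPlan> },
}

/// Toy "SQL planner" that only understands `SELECT * FROM <table> LIMIT <n>`.
pub fn plan_via_sql(sql: &str) -> Option<LogicalPlan> {
    let tokens: Vec<&str> = sql.split_whitespace().collect();
    match tokens.as_slice() {
        ["SELECT", "*", "FROM", table, "LIMIT", n] => Some(LogicalPlan::Limit {
            n: Scalar::UInt64(n.parse::<u64>().ok()?),
            input: Box::new(LogicalPlan::TableScan { table: table.to_string() }),
        }),
        _ => None,
    }
}

/// Toy Table-API path that builds the same query directly.
pub fn plan_via_table_api(table: &str, n: u64) -> LogicalPlan {
    LogicalPlan::Limit {
        n: Scalar::UInt64(n),
        input: Box::new(LogicalPlan::TableScan { table: table.to_string() }),
    }
}

/// Compare two plans through their Debug rendering. Literal types are part
/// of the rendering, so a type mismatch makes the comparison fail.
pub fn same_plan(a: &LogicalPlan, b: &LogicalPlan) -> bool {
    format!("{:?}", a) == format!("{:?}", b)
}
```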

sunchao · 2019-07-15T17:05:26Z

rust/datafusion/src/execution/table_impl.rs

+    }
+
+    #[test]
+    fn select_columns() -> Result<()> {


Nice to know we can return `Result<()>` here. We can avoid lots of `unwrap()` calls in all the unit tests we have.
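This relies on a standard Rust feature. A `#[test]` function may return `Result<(), E>` for any `E: Debug`, so `?` can replace `.unwrap()`. The sketch below assumes a hypothetical crate-level error enum and `Result<T>` alias, since DataFusion's own error type is not shown in this thread. The `column_index` helper is also hypothetical.

```rust
/// Hypothetical crate error type; its Debug form is printed if a test fails.
#[derive(Debug)]
pub enum ExecutionError {
    General(String),
}

/// Crate-wide alias, so signatures (and tests) can say `Result<()>`.
pub type Result<T> = std::result::Result<T, ExecutionError>;

/// Hypothetical fallible lookup of a column index by name.
pub fn column_index(columns: &[&str], name: &str) -> Result<usize> {
    columns
        .iter()
        .position(|c| *c == name)
        .ok_or_else(|| ExecutionError::General(format!("No field named '{}'", name)))
}

#[cfg(test)]
mod tests {
    use super::*;

    // A #[test] fn may return Result<(), E> with E: Debug. `?` replaces
    // `.unwrap()`, and returning Err fails the test and prints the error.
    #[test]
    fn select_columns() -> Result<()> {
        let columns = ["c1", "c2", "c3"];
        let idx = column_index(&columns, "c2")?;
        assert_eq!(idx, 1);
        Ok(())
    }
}
```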

andygrove · 2019-07-16T00:29:26Z

@sunchao Please take another look when you can. Thanks.

sunchao · 2019-07-16T05:51:32Z

Looks good, but there's a test failure. @andygrove could you take a look?

andygrove · 2019-07-16T13:29:35Z

Thanks @sunchao ... I needed to rebase and update a test that was in master but not in this PR branch. Should be good now 🤞

sunchao

LGTM

andygrove · 2019-07-16T22:34:36Z

Thanks @sunchao !

…real queries

The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit.

Tests were moved from Table to TableImpl and are now more comprehensive.

Not all expressions are supported yet and separate PRs will expand the number of expressions that are supported.

Note that this PR also removes the `optimize` step from `ExecutionContext.create_logical_plan` so it is necessary to explicitly call this now. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests.

Author: Andy Grove <andygrove73@gmail.com>

Closes #4875 from andygrove/ARROW-5945 and squashes the following commits:

3767597 <Andy Grove> Table API

andygrove requested review from sunchao, paddyhoran and kszucs July 13, 2019 14:03

andygrove added the Component: Rust and Component: Rust - DataFusion labels Jul 13, 2019

andygrove mentioned this pull request Jul 13, 2019

Use DataFusion Table API to build query plan in executor ballista-compute/ballista#5

Merged

sunchao reviewed Jul 15, 2019

Commit 3767597: Table API

andygrove force-pushed the ARROW-5945 branch from 1ae5d37 to 3767597 Compare July 16, 2019 13:28

sunchao approved these changes Jul 16, 2019

andygrove closed this in cbaa066 Jul 16, 2019

asfimport mentioned this pull request Aug 1, 2019

[Rust] [DataFusion] Table trait should support building complete queries #22354

Closed
