New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-5945: [Rust] [DataFusion] Table trait can now be used to build real queries #4875
Conversation
@sunchao @nevi-me @kszucs @paddyhoran PTAL when you can |
Codecov Report
@@ Coverage Diff @@
## master #4875 +/- ##
===========================================
- Coverage 87.41% 82.38% -5.03%
===========================================
Files 996 85 -911
Lines 140343 24547 -115796
Branches 1418 0 -1418
===========================================
- Hits 122679 20224 -102455
+ Misses 17302 4323 -12979
+ Partials 362 0 -362
Continue to review full report at Codecov.
|
} | ||
} | ||
|
||
/// Create an expression to represent the max() aggregate function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments need to be updated.
let plan = t2.to_logical_plan(); | ||
|
||
assert_eq!( | ||
"Aggregate: groupBy=[[#0]], aggr=[[MIN(#11), MAX(#11), AVG(#11), SUM(#11), COUNT(#11)]]\n TableScan: aggregate_test_100 projection=None", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to implement equality for query plan? It would be great if we can avoid equality check with string comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not possible (without code changes) to compare the plans. I have re-implemented these tests using a different strategy now, where each query build via the Table API is compared to the same query produced via SQL. This has made the tests much more concise and also detected a difference in the data type used for the LIMIT clause, which is resolved now.
} | ||
|
||
#[test] | ||
fn select_columns() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to know we can return Result<()>
here. We can avoid lots of unwrap
s in all the unit tests we have.
@sunchao Please take another look when you can. Thanks. |
Looks good but there's test failure. @andygrove could you take a look? |
Thanks @sunchao ... I needed to rebase and update a test that was in master but not in this PR branch. Should be good now 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @sunchao ! |
…real queries The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit. Tests were moved from Table to TableImpl and are now more comprehensive. Not all expressions are supported yet and separate PRs will expand the number of expressions that are supported. Note that this PR also removes the `optimize` step from `ExecutionContext.create_logical_plan` so it is necessary to explicitly call this now. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests. Author: Andy Grove <andygrove73@gmail.com> Closes #4875 from andygrove/ARROW-5945 and squashes the following commits: 3767597 <Andy Grove> Table API
…real queries The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit. Tests were moved from Table to TableImpl and are now more comprehensive. Not all expressions are supported yet and separate PRs will expand the number of expressions that are supported. Note that this PR also removes the `optimize` step from `ExecutionContext.create_logical_plan` so it is necessary to explicitly call this now. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests. Author: Andy Grove <andygrove73@gmail.com> Closes apache#4875 from andygrove/ARROW-5945 and squashes the following commits: 3767597 <Andy Grove> Table API
The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit.
Tests were moved from Table to TableImpl and are now more comprehensive.
Not all expressions are supported yet and separate PRs will expand the number of expressions that are supported.
Note that this PR also removes the
optimize
step fromExecutionContext.create_logical_plan
so it is necessary to explicitly call this now. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests.