Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-5945: [Rust] [DataFusion] Table trait can now be used to build real queries #4875

Closed
wants to merge 1 commit into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Jul 13, 2019

The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit.

Tests were moved from Table to TableImpl and are now more comprehensive.

Not all expressions are supported yet and separate PRs will expand the number of expressions that are supported.

Note that this PR also removes the optimize step from ExecutionContext.create_logical_plan so it is necessary to explicitly call this now. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests.

@andygrove
Copy link
Member Author

@sunchao @nevi-me @kszucs @paddyhoran PTAL when you can

@codecov-io
Copy link

codecov-io commented Jul 13, 2019

Codecov Report

Merging #4875 into master will decrease coverage by 5.02%.
The diff coverage is 88.88%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #4875       +/-   ##
===========================================
- Coverage   87.41%   82.38%    -5.03%     
===========================================
  Files         996       85      -911     
  Lines      140343    24547   -115796     
  Branches     1418        0     -1418     
===========================================
- Hits       122679    20224   -102455     
+ Misses      17302     4323    -12979     
+ Partials      362        0      -362
Impacted Files Coverage Δ
rust/datafusion/tests/sql.rs 95.83% <100%> (+0.05%) ⬆️
rust/datafusion/src/optimizer/type_coercion.rs 81.81% <100%> (+0.16%) ⬆️
rust/datafusion/src/execution/context.rs 64.81% <100%> (+0.16%) ⬆️
rust/datafusion/src/sql/planner.rs 75% <66.66%> (ø) ⬆️
rust/datafusion/src/execution/table_impl.rs 90.06% <89.13%> (+5.21%) ⬆️
python/pyarrow/ipc.pxi
cpp/src/arrow/csv/chunker-test.cc
cpp/src/parquet/column_page.h
cpp/src/parquet/bloom_filter-test.cc
... and 907 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 690823c...3767597. Read the comment docs.

}
}

/// Create an expression to represent the max() aggregate function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments need to be updated.

let plan = t2.to_logical_plan();

assert_eq!(
"Aggregate: groupBy=[[#0]], aggr=[[MIN(#11), MAX(#11), AVG(#11), SUM(#11), COUNT(#11)]]\n TableScan: aggregate_test_100 projection=None",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to implement equality for query plan? It would be great if we can avoid equality check with string comparison.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not possible (without code changes) to compare the plans. I have re-implemented these tests using a different strategy now, where each query build via the Table API is compared to the same query produced via SQL. This has made the tests much more concise and also detected a difference in the data type used for the LIMIT clause, which is resolved now.

}

#[test]
fn select_columns() -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to know we can return Result<()> here. We can avoid lots of unwraps in all the unit tests we have.

@andygrove
Copy link
Member Author

@sunchao Please take another look when you can. Thanks.

@sunchao
Copy link
Member

sunchao commented Jul 16, 2019

Looks good but there's test failure. @andygrove could you take a look?

@andygrove
Copy link
Member Author

Thanks @sunchao ... I needed to rebase and update a test that was in master but not in this PR branch. Should be good now 🤞

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andygrove andygrove closed this in cbaa066 Jul 16, 2019
@andygrove
Copy link
Member Author

Thanks @sunchao !

kszucs pushed a commit that referenced this pull request Jul 22, 2019
…real queries

The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit.

Tests were moved from Table to TableImpl and are now more comprehensive.

Not all expressions are supported yet and separate PRs will expand the number of expressions that are supported.

Note that this PR also removes the `optimize` step from `ExecutionContext.create_logical_plan` so it is necessary to explicitly call this now. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests.

Author: Andy Grove <andygrove73@gmail.com>

Closes #4875 from andygrove/ARROW-5945 and squashes the following commits:

3767597 <Andy Grove> Table API
pprudhvi pushed a commit to pprudhvi/arrow that referenced this pull request Aug 11, 2019
…real queries

The Table (DataFrame) trait can now be used to build real queries, since it now supports projection, selection, aggregate, and limit.

Tests were moved from Table to TableImpl and are now more comprehensive.

Not all expressions are supported yet and separate PRs will expand the number of expressions that are supported.

Note that this PR also removes the `optimize` step from `ExecutionContext.create_logical_plan` so it is necessary to explicitly call this now. I was planning on doing this in a separate PR (https://issues.apache.org/jira/browse/ARROW-5948) but ended up needing it here to implement the unit tests.

Author: Andy Grove <andygrove73@gmail.com>

Closes apache#4875 from andygrove/ARROW-5945 and squashes the following commits:

3767597 <Andy Grove> Table API
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants