-
Notifications
You must be signed in to change notification settings - Fork 970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional pruning tests with casts, handle unsupported predicates better #3454
Conversation
7d27c57
to
c9c475d
Compare
}, | ||
// result was a column | ||
ColumnarValue::Scalar(ScalarValue::Boolean(v)) => { | ||
let v = v.unwrap_or(true); // None -> true per comments above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is new code -- to handle boolean exprs (aka true
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this code and the code to make record batches with 0 columns below, some of the new negative pruning tests (the ones with invalid casts) error rather than return true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there test case for this case? get a Scalar
after evaluating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- this code is covered. I verified this by applying this change locally:
--- a/datafusion/core/src/physical_optimizer/pruning.rs
+++ b/datafusion/core/src/physical_optimizer/pruning.rs
@@ -205,8 +205,7 @@ impl PruningPredicate {
},
// result was a column
ColumnarValue::Scalar(ScalarValue::Boolean(v)) => {
- let v = v.unwrap_or(true); // None -> true per comments above
- Ok(vec![v; statistics.num_containers()])
+ unimplemented!();
}
other => {
Err(DataFusionError::Internal(format!(
And then running this command
cargo test -p datafusion -- pruning
...
failures:
---- physical_optimizer::pruning::tests::prune_int32_col_lte_zero_cast stdout ----
thread 'physical_optimizer::pruning::tests::prune_int32_col_lte_zero_cast' panicked at 'not implemented', datafusion/core/src/physical_optimizer/pruning.rs:208:17
stack backtrace:
0: rust_begin_unwind
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
2: core::panicking::panic
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:48:5
3: datafusion::physical_optimizer::pruning::PruningPredicate::prune
at ./src/physical_optimizer/pruning.rs:208:17
4: datafusion::physical_optimizer::pruning::tests::prune_int32_col_lte_zero_cast
at ./src/physical_optimizer/pruning.rs:1963:22
5: datafusion::physical_optimizer::pruning::tests::prune_int32_col_lte_zero_cast::{{closure}}
at ./src/physical_optimizer/pruning.rs:1949:5
6: core::ops::function::FnOnce::call_once
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- physical_optimizer::pruning::tests::prune_int32_col_eq_zero_cast_as_str stdout ----
thread 'physical_optimizer::pruning::tests::prune_int32_col_eq_zero_cast_as_str' panicked at 'not implemented', datafusion/core/src/physical_optimizer/pruning.rs:208:17
stack backtrace:
0: rust_begin_unwind
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
2: core::panicking::panic
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:48:5
3: datafusion::physical_optimizer::pruning::PruningPredicate::prune
at ./src/physical_optimizer/pruning.rs:208:17
4: datafusion::physical_optimizer::pruning::tests::prune_int32_col_eq_zero_cast_as_str
at ./src/physical_optimizer/pruning.rs:2046:22
5: datafusion::physical_optimizer::pruning::tests::prune_int32_col_eq_zero_cast_as_str::{{closure}}
at ./src/physical_optimizer/pruning.rs:2030:5
6: core::ops::function::FnOnce::call_once
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
@@ -390,8 +406,13 @@ fn build_statistics_record_batch<S: PruningStatistics>( | |||
} | |||
|
|||
let schema = Arc::new(Schema::new(fields)); | |||
RecordBatch::try_new(schema, arrays) | |||
.map_err(|err| DataFusionError::Plan(err.to_string())) | |||
// provide the count in case there were no needed statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, if the pruning predicate did not call for any columns (b/c it was "true" for example) this would error because in earlier versions of arrow-rs
RecordBatch
es could not have 0
columns -- now that they can, we use that feature to build a 0 column RecordBatch when the predicate has no columns that can be transformed
@@ -237,7 +237,7 @@ async fn prune_int32_scalar_fun() { | |||
test_prune( | |||
Scenario::Int32, | |||
"SELECT * FROM t where abs(i) = 1", | |||
Some(4), | |||
Some(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file are due to the fact that these predicates no longer error (instead they evaluate successfully to all true
, meaning all record groups are kept)
The 3
is the number of rows that pass this predicate -- so no actual answers are changed here
@@ -1921,6 +1946,45 @@ mod tests { | |||
assert_eq!(result, expected_ret); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new tests in this file are the main reason for this PR
c9c475d
to
15aa56d
Compare
I will review it later. |
.as_any() | ||
.downcast_ref::<BooleanArray>() | ||
.ok_or_else(|| { | ||
DataFusionError::Internal(format!( | ||
"Expected pruning predicate evaluation to be BooleanArray, \ | ||
but was {:?}", | ||
array | ||
)) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably just use the downcast_value!
macro here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea -- done in 5ad465c
Unless there are any objections, I plan to merge this PR tomorrow |
All comments have been addressed, so merging this in |
Benchmark runs are scheduled for baseline = 67002a0 and contender = c7f3a70. c7f3a70 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Draft as it builds on #3422 from @liukun4515 (I will rebase once that is merged)Which issue does this PR close?
re #3377 and #3414
Rationale for this change
I wanted to add more tests in #3422 for pruning with casts -- see #3422 (comment)
While writing these tests, I also ended up cleaning up the handling of predicates without columns (aka ones that got turned into constants)
What changes are included in this PR?
false
ortrue
) without errorAre there any user-facing changes?