-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support cast/try_cast in prune with signed integer and decimal #3422
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3422 +/- ##
==========================================
+ Coverage 85.58% 85.66% +0.07%
==========================================
Files 296 296
Lines 54252 54544 +292
==========================================
+ Hits 46432 46723 +291
- Misses 7820 7821 +1
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
], | ||
negated: true, | ||
}; | ||
let expected_expr = "CAST(#c1_min AS Int64) != Int64(1) OR Int64(1) != CAST(#c1_max AS Int64) AND CAST(#c1_min AS Int64) != Int64(2) OR Int64(2) != CAST(#c1_max AS Int64) AND CAST(#c1_min AS Int64) != Int64(3) OR Int64(3) != CAST(#c1_max AS Int64)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what is happening with the negated case. Could you add a comment here explaining this? Why is there !=
in here rather than <=
or >=
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andygrove I don't change any logical about the list.
you can get this from #2282 (comment)
I get the meanings from the code context.
if the min_c1 and max_c1 is 1, all the value in the column or chunk must be 1 or null, so the c1 not in (1)
can skip this column or chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @liukun4515 -- this is great
I will look at this code in the morning as well, but I think we need to handle the case where casting won't preserve sortedness (aka UTF8 -> numeric), as described on #3377
I got your point. We can add rule for cast/try_cast case. For example, just support cast/try_cast from numeric to numeric data type, I think the data type of numeric has the order. We can support other complex case in the future pr. |
cc @alamb |
if support_type_for_prune(&from_type, data_type) { | ||
let (left, op, right) = | ||
rewrite_expr_to_prunable(expr, op, scalar_expr, schema)?; | ||
Ok((cast(left, data_type.clone()), op, right)) | ||
} else { | ||
return Err(DataFusionError::Plan(format!( | ||
"Cast with from type {} to type {} is not supported", | ||
from_type, data_type | ||
))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably avoid the duplication of the error generation by moving the error return into support_type_for_prune
like:
if support_type_for_prune(&from_type, data_type) { | |
let (left, op, right) = | |
rewrite_expr_to_prunable(expr, op, scalar_expr, schema)?; | |
Ok((cast(left, data_type.clone()), op, right)) | |
} else { | |
return Err(DataFusionError::Plan(format!( | |
"Cast with from type {} to type {} is not supported", | |
from_type, data_type | |
))); | |
} | |
verify_support_type_for_prune(&from_type, data_type)?; | |
let (left, op, right) = rewrite_expr_to_prunable(expr, op, scalar_expr)?; | |
Ok((cast(left, data_type.clone()), op, right)) |
This is not needed for correctness, just a code structure suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
@@ -552,6 +577,25 @@ fn is_compare_op(op: Operator) -> bool { | |||
) | |||
} | |||
|
|||
// The pruning logic is based on the comparing the min/max bounds. | |||
// Must make sure the two type has order. | |||
// For example, casts from string to numbers is not correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thank you for these comments
// For example, casts from string to numbers is not correct. | ||
// Because the "13" is less than "3" with UTF8 comparison order. | ||
fn support_type_for_prune(from_type: &DataType, to_type: &DataType) -> bool { | ||
// TODO: support other data type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to file a follow on ticket listing the other types (you did so in #3414 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let result = | ||
rewrite_expr_to_prunable(&left_input, Operator::Gt, &right_input, df_schema); | ||
assert!(result.is_err()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add some tests that exercise the pruning code with actual values too
For example, perhaps you can take prune_int32_cast
from #3378?
#[test] | ||
fn test_rewrite_expr_to_prunable_error() { | ||
// cast string value to numeric value | ||
// this cast is not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
.unwrap(); | ||
assert_eq!(result_left, left_input); | ||
assert_eq!(result_right, right_input); | ||
// TODO: add test for other case and op |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than extending this test for other ops, I think we should add tests which actually run the pruning expressions on values (see my other comments).
Would it be helpful for me to write some more tests? I can do so tomorrow morning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find i have added the test for executing with real data in the prune_cast_column_scalar
function.
You can check them, it just include cast(col, int64) and try_cast(col,int64). If you think it is not enough, you can add more test for it or give some advices for the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @alamb
To be explicit I think the code in this PR looks good to me, but I would like to see some actual tests with pruning expressions on data prior to merging it. I am happy to help write these tests too |
@alamb Thanks for you help. |
4b1a405
to
eb9a540
Compare
rewrite_expr_to_prunable(expr, op, scalar_expr, schema)?; | ||
Ok((try_cast(left, data_type.clone()), op, right)) | ||
} | ||
// `col > lit()` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be removed I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
removed
eb9a540
to
706357e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-reviewed this @liukun4515 and I think it looks good to go. Thank you.
I wrote several more tests which I will put up in a follow on PR but they all pass and I don't think there is any reason to hold this PR.
Follow on PR is #3454 |
Benchmark runs are scheduled for baseline = 3b8a20a and contender = 69d05aa. 69d05aa is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Thanks for your careful review. @alamb |
Which issue does this PR close?
part of #3414
Closes #3377
In order to merge this #3396, we should support the cast/try_cast in the prune first.
cc @andygrove @alamb
Just support cast from unsigned integer or decimal to unsigned integer or decimal.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?