-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Support unsigned integers in unwrap_cast_in_comparison Optimizer rule
#4149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d8fa119 to
23ca07a
Compare
|
Here is the next PR for unwrapping casts for your review @liukun4515 |
| "Explain [plan_type:Utf8, plan:Utf8]", | ||
| " Projection: t1.t1_id, t1.t1_name, t1.t1_int, t2.t2_id, t2.t2_name, t2.t2_int [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]", | ||
| " Inner Join: t1.t1_id = t2.t2_id [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]", | ||
| " Filter: CAST(t1.t1_id AS Int64) < Int64(100) [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 the casts have been removed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the original regression noted in #3699 (comment) that lead to #3702
| /// Until https://github.com/apache/arrow-rs/issues/1043 is done | ||
| /// (support for unsigned <--> decimal casts) we also don't do that | ||
| /// kind of cast in this optimizer | ||
| fn is_unsupported_cast(dt1: &DataType, dt2: &DataType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this with the test (which failed when it tried to invoke the arrow cast kernels for decimal <--> unsigned)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Casting for unsigned value <-> decimal is not supported now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to implement them recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // the rewriter will not throw error and just return the original expr | ||
| let schema = expr_test_schema(); | ||
| let expr_input = cast(col("c6"), DataType::Int64).eq(lit(0i64)); | ||
| let expr_input = cast(col("c6"), DataType::Float64).eq(lit(0f64)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment of "c6 > 0" will be cast to `cast(c6 as int64) > 0 should be change to "c6 > 0"` will be cast to `cast(c6 as float64) > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in c6946f0
liukun4515
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @alamb Thanks for your test
There is a bit comments.
|
Benchmark runs are scheduled for baseline = c9361e0 and contender = 4653df4. 4653df4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Draft as it builds on #4148Which issue does this PR close?
Closes #3702
Rationale for this change
This will make predicate evaluation on unsigned columns faster in some cases
What changes are included in this PR?
unwrap_cast_in_comparison#4147)Are these changes tested?
Yes
Are there any user-facing changes?
Predicates on unsigned columns should be faster