New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype #7253
Conversation
rust/datafusion/src/logicalplan.rs
Outdated
_ => false, | ||
}, | ||
UInt8 => match type_from { | ||
UInt8 => true, | ||
_ => false, | ||
}, | ||
UInt16 => match type_from { | ||
Int8 => true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my own curiosity, what's the expected behavior when a negative int8 is coerced into uint16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the code, looks like it will return null for negative values. if that's the case, we should be able to support conversion from signed to unsigned with the same width as well right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is actually a bug. The goal of this can_coalesce method is to determine if DataFusion can safely cast from one type to another without any data loss, so it can add implicit casts in queries. This is convenient when the user is comparing two expressions of different types.
Separately from this, users can add any explicit CAST they want to in their query plan.
Float32 | Float64 => Some(r.clone()), | ||
_ => None, | ||
}, | ||
Int8 => match r { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we support signed and unsigned with the same width as well? for example (int8, uint8) -> int16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woudln't (int8, uint8) -> int8 instead of int16? I think signed & unsigned of same length should return signed, so I agree with your question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with regards to int8 vs int16, I was thinking 255 from uint8 doesn't fit into range of int8, so increase of width to 16 is required.
@houqp I pushed more changes. There will be more to do but for now, DataFusion will only add implicit casts when safe (so no more conversions between signed and unsigned int types). Also, for now, no unsafe casts are supported explicitly either. I will create a JIRA to add support for explicit casts between signed and unsigned types. |
Since integer to float conversion can also cause data loss, do we want to leave that to explicit cast path as well? |
I'm rethinking the approach here. I am fixing two things in this PR and it might be better to attempt to address them separately. The initial goal was to re-implement |
I'm going to take another run at this, with smaller changes in new PRs. Thanks for the reviews so far. |
Re-implement get_supertype and add unit tests for currently supported casts.