Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype #7253

Closed
wants to merge 4 commits into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 23, 2020

Re-implement get_supertype and add unit tests for currently supported casts.

@github-actions
Copy link

_ => false,
},
UInt8 => match type_from {
UInt8 => true,
_ => false,
},
UInt16 => match type_from {
Int8 => true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my own curiosity, what's the expected behavior when a negative int8 is coerced into uint16?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the code, looks like it will return null for negative values. if that's the case, we should be able to support conversion from signed to unsigned with the same width as well right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is actually a bug. The goal of this can_coalesce method is to determine if DataFusion can safely cast from one type to another without any data loss, so it can add implicit casts in queries. This is convenient when the user is comparing two expressions of different types.

Separately from this, users can add any explicit CAST they want to in their query plan.

Float32 | Float64 => Some(r.clone()),
_ => None,
},
Int8 => match r {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we support signed and unsigned with the same width as well? for example (int8, uint8) -> int16.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woudln't (int8, uint8) -> int8 instead of int16? I think signed & unsigned of same length should return signed, so I agree with your question

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with regards to int8 vs int16, I was thinking 255 from uint8 doesn't fit into range of int8, so increase of width to 16 is required.

@andygrove
Copy link
Member Author

@houqp I pushed more changes. There will be more to do but for now, DataFusion will only add implicit casts when safe (so no more conversions between signed and unsigned int types).

Also, for now, no unsafe casts are supported explicitly either. I will create a JIRA to add support for explicit casts between signed and unsigned types.

@houqp
Copy link
Member

houqp commented May 23, 2020

Since integer to float conversion can also cause data loss, do we want to leave that to explicit cast path as well?

@andygrove
Copy link
Member Author

I'm rethinking the approach here. I am fixing two things in this PR and it might be better to attempt to address them separately. The initial goal was to re-implement get_supertype for the implicit casting case.

@andygrove
Copy link
Member Author

I'm going to take another run at this, with smaller changes in new PRs. Thanks for the reviews so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants