-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Bug fix: Window frame range value outside the type range #5384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix: Window frame range value outside the type range #5384
Conversation
# Conflicts: # datafusion/core/tests/sql/window.rs
| /// Determine whether the given data type `dt` is a `Utf8`. | ||
| pub fn is_utf8(dt: &DataType) -> bool { | ||
| matches!(dt, DataType::Utf8) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but I think we missed DataType::LargeUtf8 in #5234. cc @jackwener
| Err(DataFusionError::NotImplemented(format!( | ||
| "Cannot cast {:?} to {:?}", | ||
| value, target_type | ||
| ))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the largest type doesn't work then this coercion is impossible rather than unimplemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed you are right. Fixed it. Thanks.
avantgardnerio
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much cleaner overall @mustafasrepo . If you could clarify my confusion on the behavior of Null I'd be willing to hit "merge".
| value, target_type | ||
| ))) | ||
| }, | ||
| |_| ScalarValue::try_from(target_type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand - this is ignoring the result of coerce_scalar() and re-parsing it as ScalarValue::try_from(target_type)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the case when target type is Int8 and range value used in the query is 10000 like following: OVER(ORDER BY c1 RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING)). 10000 cannot be casted to Int8. In this case we try to cast value to Int64 type. If it succeeds it means that 10000 couldn't be casted to Int8 because of overflow in the first case. It means that we can treat the OVER(ORDER BY c1 RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING) as OVER(ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) without loss of generality, since we know that range would cover the whole table. In this case we are not interested in the casted version (We ignore the result. Then return NULL to convert range to UNBOUNDED. By the way we also use NULL for the types in the form ScalarValue::Int8(None) not just for the types ScalarValue::Null in case that terminology is misleading.). We just need to know if value can be casted to large type. If large type casting fails also it means that some strange range entered not compatible with the original column like following: OVER(ORDER BY c1 RANGE BETWEEN '3 day' PRECEDING AND '1 hour' FOLLOWING). In that case we return error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification!
| /// If the coercion is successful, we return an `Ok` value with the result. | ||
| /// If the coercion fails because `target_type` is not wide enough (i.e. we | ||
| /// can not coerce to `target_type`, but we can to a wider type in the same | ||
| /// family), we return a `Null` value of this type to signal this situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the Null introduced? ScalarValue::try_from(target_type)?
Rust has Option and Result to indicate an operation was unsuccessful, is there a reason to use a valid value (Null) which callers must be aware of, vs returning one of the above to indicate an error? I could see how callers that did not read this documentation might continue on with the Null introducing subtle errors later in the call stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it looks like this is because that's the existing behavior of WindowFrameBound?:
pub enum WindowFrameBound {
Preceding(ScalarValue),
CurrentRow,
Following(ScalarValue),
}
|
|
||
| pub fn is_uft8(dt: &DataType) -> bool { | ||
| /// Determine whether the given data type `dt` is a `Utf8`. | ||
| pub fn is_utf8(dt: &DataType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this.
| } | ||
|
|
||
| /// Casts the ScalarValue `value` to coerced type. | ||
| // When coerced type is `Interval` we use `parse_interval` since `try_from_string` not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was not introduced by this PR, but it seems like it would be a good idea to update try_from_string()...
|
Benchmark runs are scheduled for baseline = d6ef463 and contender = 49237a2. 49237a2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #5346.
Rationale for this change
What changes are included in this PR?
If we cannot cast range value to the target type. We now check for If range value can be successfully casted to largest type of the family (for
Inttypes largest type isInt64, forUinttypes it isUint64and so on). If we can accomplish so this range value is treated as unbounded. See #5346 for context.Are these changes tested?
Yes
Are there any user-facing changes?