Add FixedSizeBinaryArray::try_from_sparse_iter_with_size #3054
…lid array when the iterator only produces None values
Left some comments
pub fn try_from_sparse_iter_with_size<T, U>(
    mut iter: T,
    asserted_size: Option<i32>,
I don't follow the need for a separate asserted_size and detected_size. Looking at the logic I think we could just have mut size: Option<i32> and everything would work correctly?
I think if we were to just have a single size option we would need to have separate implementations of try_from_sparse_iter and try_from_sparse_iter_with_size.
(IMO, I think the former is problematic and should be deprecated / removed)
I don't see why you need to distinguish between the two different sizes in the implementation; it makes no difference whether the size is passed in or taken from the first non-null element. They must all be equal regardless?
I separated the implementations of try_from_sparse_iter and try_from_sparse_iter_with_size because I think handling the corner cases needed to make the two implementations match up wasn't worth the complexity.
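For reference, the single-size idea from this thread can be sketched outside of arrow. This is a minimal illustration, not the code in this PR: FixedBuf and collect_fixed_size are hypothetical names, and the sketch uses usize and String where the PR uses i32 and ArrowError. One mutable Option holds the width, whether it was passed in or learned from the first non-null element, and every later element is checked against it.

```rust
/// Hypothetical stand-in for a fixed-size binary array (not an arrow type).
#[derive(Debug, PartialEq)]
pub struct FixedBuf {
    pub width: usize,
    pub values: Vec<u8>,
    pub validity: Vec<bool>,
}

/// Collects a sparse iterator using a single mutable `size`.
/// `size` starts as the caller-supplied width, or `None` to infer it.
pub fn collect_fixed_size<'a, I>(iter: I, mut size: Option<usize>) -> Result<FixedBuf, String>
where
    I: IntoIterator<Item = Option<&'a [u8]>>,
{
    let mut values: Vec<u8> = Vec::new();
    let mut validity: Vec<bool> = Vec::new();
    // Nulls seen before the width is known; their zero padding is written later.
    let mut pending_nulls = 0usize;

    for item in iter {
        match item {
            Some(bytes) => {
                // The first non-null element fixes the width if none was given.
                let width = *size.get_or_insert(bytes.len());
                if bytes.len() != width {
                    return Err(format!("expected {} bytes, found {}", width, bytes.len()));
                }
                // Flush padding for any leading nulls now that the width is known.
                values.resize(values.len() + pending_nulls * width, 0);
                pending_nulls = 0;
                values.extend_from_slice(bytes);
                validity.push(true);
            }
            None => {
                match size {
                    Some(width) => values.resize(values.len() + width, 0),
                    None => pending_nulls += 1,
                }
                validity.push(false);
            }
        }
    }

    // With no explicit size and no non-null element the width is unknowable.
    let width = size.ok_or_else(|| "no size given and no non-null element".to_string())?;
    Ok(FixedBuf { width, values, validity })
}
```

In this sketch the only case where the two entry points differ is an all-None iterator with no size supplied, which is reported as an error here.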
@@ -2270,6 +2270,18 @@ macro_rules! typed_compares {
                as_largestring_array($RIGHT),
                $OP,
            ),
            (DataType::FixedSizeBinary(_), DataType::FixedSizeBinary(_)) => {
This change does not appear to be related?
Yeah, it's not, but it was something we were hoping to upstream and I think it accidentally fell into the branch. Would you like me to update this PR to mention this change?
I think we either need tests for it or we need to roll it back.
Tests added.
        Self::try_from_sparse_iter_with_size(iter, None)
    }

    pub fn try_from_sparse_iter_with_size<T, U>(
I think a doc comment similar to the one above would be good, especially to explain what the size argument is
Done.
pub fn try_from_sparse_iter_with_size<T, U>(
pub fn try_from_sparse_iter_with_size<T, U>(
pub fn try_from_sparse_iter_with_size<T, U>(iter: T, size: i32) -> Result<Self, ArrowError> {
    try_from_sparse_iter_with_size_opt::<T, U>(iter, Some(size))
}
fn try_from_sparse_iter_with_size_opt<T, U>(
I think this makes for better ergonomics
I made the parameter Optional instead
If we can get this PR merged in the next 4-5 hours it will be included in arrow 27.0.0.
rats
😢
But there are a few places using the deprecated function, see CI.
Yup, just saw the build failure and submitted a fix. I wasn't testing locally with the
There are some clippy errors that are not related to this change. Submitted #3096 to fix that.
@maxburke could you possibly merge master into this branch so the CI goes green? I don't seem to have permission to do this; there is a checkbox that allows maintainers to edit PR branches.
…rse_iter_with_size
done!
Thanks @maxburke
Benchmark runs are scheduled for baseline = b7af85c and contender = 20d81f5. 20d81f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
🎉
Closes #1390
Rationale for this change
FixedSizeBinaryArray::try_from_sparse_iter generates incorrect results when the provided iterator only produces None values.
What changes are included in this PR?
This change also implements the comparison kernel for FixedSizeBinary types.
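As a rough illustration of what an equality comparison over fixed-size binary values involves, here is a self-contained sketch. It does not reproduce the kernel added in this PR: FixedSizeBinaryColumn and eq_fixed_size_binary are hypothetical names, and the null handling (a null on either side yields a null result) is an assumption made for the sketch.

```rust
/// Hypothetical fixed-width byte column: `width` bytes per slot, stored
/// back to back, with one validity flag per slot.
pub struct FixedSizeBinaryColumn {
    width: usize,
    values: Vec<u8>,
    validity: Vec<bool>,
}

impl FixedSizeBinaryColumn {
    pub fn new(width: usize, values: Vec<u8>, validity: Vec<bool>) -> Result<Self, String> {
        if values.len() != width * validity.len() {
            return Err(format!(
                "expected {} value bytes, found {}",
                width * validity.len(),
                values.len()
            ));
        }
        Ok(Self { width, values, validity })
    }

    pub fn len(&self) -> usize {
        self.validity.len()
    }

    /// Returns the bytes of slot `i`, or `None` if the slot is null.
    pub fn value(&self, i: usize) -> Option<&[u8]> {
        if self.validity[i] {
            Some(&self.values[i * self.width..(i + 1) * self.width])
        } else {
            None
        }
    }
}

/// Element-wise equality. Slots of different widths simply compare unequal
/// because slice equality checks the length first.
pub fn eq_fixed_size_binary(
    left: &FixedSizeBinaryColumn,
    right: &FixedSizeBinaryColumn,
) -> Result<Vec<Option<bool>>, String> {
    if left.len() != right.len() {
        return Err("cannot compare columns of different lengths".to_string());
    }
    Ok((0..left.len())
        .map(|i| match (left.value(i), right.value(i)) {
            (Some(a), Some(b)) => Some(a == b),
            _ => None, // assumed: a null operand gives a null result
        })
        .collect())
}
```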
Are there any user-facing changes?
This change adds a new function, FixedSizeBinaryArray::try_from_sparse_iter_with_size, which takes a default size parameter, and marks the existing try_from_sparse_iter as deprecated.
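The shape of that user-facing change can be sketched with Rust's #[deprecated] attribute. Everything below is illustrative: SparseFixed is a hypothetical type, the size is a plain usize, and errors are Strings rather than ArrowError. The sketch also makes the inferring constructor return an error for an all-None iterator, which is an assumption; the description above only says the real function generates incorrect results in that case.

```rust
/// Hypothetical container used only for this sketch (not an arrow type).
#[derive(Debug, PartialEq)]
pub struct SparseFixed {
    pub width: usize,
    pub slots: Vec<Option<Vec<u8>>>,
}

impl SparseFixed {
    /// Infers the width from the first non-null item, so it has nothing
    /// to go on when every item is `None`.
    #[deprecated(note = "use `try_from_sparse_iter_with_size` so the width is always known")]
    pub fn try_from_sparse_iter<I>(iter: I) -> Result<Self, String>
    where
        I: IntoIterator<Item = Option<Vec<u8>>>,
    {
        let slots: Vec<Option<Vec<u8>>> = iter.into_iter().collect();
        let width = slots
            .iter()
            .flatten()
            .map(|v| v.len())
            .next()
            .ok_or_else(|| "cannot infer width: every item is None".to_string())?;
        Self::build(slots, width)
    }

    /// Takes the width up front, so an all-`None` iterator is still well defined.
    pub fn try_from_sparse_iter_with_size<I>(iter: I, size: usize) -> Result<Self, String>
    where
        I: IntoIterator<Item = Option<Vec<u8>>>,
    {
        Self::build(iter.into_iter().collect(), size)
    }

    /// Shared validation: every non-null item must be exactly `width` bytes.
    fn build(slots: Vec<Option<Vec<u8>>>, width: usize) -> Result<Self, String> {
        for v in slots.iter().flatten() {
            if v.len() != width {
                return Err(format!("expected {} bytes, found {}", width, v.len()));
            }
        }
        Ok(Self { width, slots })
    }
}
```

Callers of the deprecated constructor get a compiler warning pointing at the replacement, which is the kind of failure CI surfaced earlier in this conversation.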