-
Notifications
You must be signed in to change notification settings - Fork 758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add substring
support for binary
#1608
Add substring
support for binary
#1608
Conversation
fix some test for string array Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@@ -291,11 +575,14 @@ mod tests { | |||
|
|||
cases.into_iter().try_for_each::<_, Result<()>>( | |||
|(array, start, length, expected)| { | |||
let array = StringArray::from(array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did not cover LargeStringArray
in the past.
@@ -25,7 +26,68 @@ use crate::{ | |||
}; | |||
use std::cmp::Ordering; | |||
|
|||
fn generic_substring<OffsetSize: StringOffsetSizeTrait>( | |||
fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is similar to utf8_substring
except that there is no char boundary checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than replicate quite so much code, I wonder if it would be possible to make generic_substring
take a function to check char boundaries and and then pass in a function that does nothing
Maybe like(untested):
fn generic_substring<OffsetSize, F>(
array: &GenericBinaryArray<OffsetSize>,
start: OffsetSize,
length: Option<OffsetSize>,
check_char_boundary: F,
)
where
OffsetSize: StringOffsetSizeTrait,
F: Fn(OffsetSize) -> Result<OffsetSize>
{
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid that BinaryArray
and StringArray
can not share one API because GenericBinaryArray
and GenericStringArray
are two different types.
Maybe we could use macro to extract some common codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it -- makes sense -- duplication makes sense to me then
Codecov Report
@@ Coverage Diff @@
## master #1608 +/- ##
==========================================
+ Coverage 82.95% 83.00% +0.04%
==========================================
Files 193 193
Lines 55435 55571 +136
==========================================
+ Hits 45988 46125 +137
+ Misses 9447 9446 -1
Continue to review full report at Codecov.
|
Signed-off-by: remzi <13716567376yh@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @HaoYang670 -- I went through the tests carefully and it looks good to me.
cc @viirya and @jhorstmann
@@ -25,7 +26,68 @@ use crate::{ | |||
}; | |||
use std::cmp::Ordering; | |||
|
|||
fn generic_substring<OffsetSize: StringOffsetSizeTrait>( | |||
fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than replicate quite so much code, I wonder if it would be possible to make generic_substring
take a function to check char boundaries and and then pass in a function that does nothing
Maybe like(untested):
fn generic_substring<OffsetSize, F>(
array: &GenericBinaryArray<OffsetSize>,
start: OffsetSize,
length: Option<OffsetSize>,
check_char_boundary: F,
)
where
OffsetSize: StringOffsetSizeTrait,
F: Fn(OffsetSize) -> Result<OffsetSize>
{
...
}
fn with_nulls<T: 'static + Array + PartialEq + From<Vec<Option<&'static str>>>>( | ||
) -> Result<()> { | ||
#[allow(clippy::type_complexity)] | ||
fn with_nulls_generic_binary<O: BinaryOffsetSizeTrait>() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this follows string version with_nulls
, just wondering why in with_nulls_...
only edge cases are tested against. without_nulls_...
has normal test cases but it doesn't include nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks @HaoYang670
Which issue does this PR close?
Closes #1593 .
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
Update some docs