-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the performance of ltrim/rtrim/btrim #10006
Conversation
Nice! It would be a nice addition if the benchmark was expanded to cover btrim and rtrim as well |
@@ -78,6 +80,19 @@ pub(crate) fn general_trim<T: OffsetSizeTrait>( | |||
2 => { | |||
let characters_array = as_generic_string_array::<T>(&args[1])?; | |||
|
|||
if characters_array.len() == 1 { | |||
if characters_array.is_null(0) { | |||
return Ok(new_null_array(args[0].data_type(), args[0].len())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like new behavior for null handling? Do we have existing unit tests for this case or can we add a new test as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a new behavior. The reason for this logic characters_array.is_null(0)
is because initially, a test did not pass, and the error was as follows:
...
External error: query result mismatch:
[SQL] SELECT btrim(' xyxtrimyyx ', NULL)
[Diff] (-expected|+actual)
- NULL
+ xyxtrimyyx
at test_files/expr.slt:373
...
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To provide additional context, this logic is consistent with the _ => None
here:
let result = string_array
.iter()
.zip(characters_array.iter())
.map(|(string, characters)| match (string, characters) {
(Some(string), Some(characters)) => Some(func(string, characters)),
_ => None, // If characters is null, append None.
})
.collect::<GenericStringArray<T>>();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification @JasonLi-cn
it is not necessary, though it would be nice as @Omega359 said. We can also do it as a follow on PR. Thanks again @JasonLi-cn |
Thanks @Omega359 and @andygrove for the reviews! |
Which issue does this PR close?
Closes #10007
Rationale for this change
If the
trim
function includes a second argument, I believe it is predominantly a Scalar rather than an Array. Expanding the second argument into an Array would lead to performance degradation, and more critically, the codearg.clone().into_array(expansion_len)
would be invoked for every computation.Benchmark
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?