feat: optimize lower and upper functions #9971
Conversation
Thank you for the contribution @JasonLi-cn -- I am a little concerned this optimization isn't valid in general for UTF-8 strings (I think it may work only for ascii)
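The concern above can be made concrete: for some Unicode characters, case conversion changes the byte length of the string, so a byte-wise ASCII-style transform cannot be correct in general. A minimal sketch (standalone, not the PR's code):

```rust
fn main() {
    // LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) is 2 bytes in UTF-8.
    let s = "İ";
    // Its Unicode lowercase is 'i' followed by COMBINING DOT ABOVE (U+0307),
    // which is 3 bytes -- the string grows when lowercased.
    let lower = s.to_lowercase();
    assert_eq!(s.len(), 2);
    assert_eq!(lower.len(), 3);
    // Plain ASCII input, by contrast, maps byte-for-byte:
    assert_eq!("ABC".to_lowercase().len(), "ABC".len());
}
```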
I like the numbers, but the implementation looks a bit complicated. I would check how the Rust std lib implements string and char case conversion.
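For reference, the relevant std APIs: `str::to_lowercase` is fully Unicode-aware (and may change the length), `char::to_lowercase` returns an iterator because one char can map to several, and `make_ascii_lowercase` is the cheap in-place path for ASCII-only data:

```rust
fn main() {
    // Unicode-aware conversion over a whole &str.
    assert_eq!("ΑΒΓ".to_lowercase(), "αβγ");

    // char::to_lowercase yields an iterator: U+0130 lowercases to TWO chars.
    let expanded: String = 'İ'.to_lowercase().collect();
    assert_eq!(expanded.chars().count(), 2);

    // ASCII-only data has a cheaper in-place byte transform.
    let mut bytes = *b"HELLO";
    bytes.make_ascii_lowercase();
    assert_eq!(&bytes, b"hello");
}
```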
let item_len = string_array.len();

// Find the first nonascii string at the beginning.
let find_the_first_nonascii = || {
AFAIK it is quite a bit faster to do the check once on the entire string/byte array (including nulls) than to check each value individually.
This should simplify the logic as well, e.g. not searching for the index, but only taking the fast path when the entire array is ASCII.
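A hypothetical sketch of this suggestion (names are illustrative, not the PR's actual code): run the ASCII check once over all values, then choose one path for the whole array:

```rust
// Sketch: one ASCII check up front selects a fast or slow path for the
// entire batch, instead of checking and branching per value.
fn lower_all(values: &[&str]) -> Vec<String> {
    let all_ascii = values.iter().all(|v| v.is_ascii());
    if all_ascii {
        // Fast path: byte-wise ASCII conversion, no per-value re-checking.
        values.iter().map(|v| v.to_ascii_lowercase()).collect()
    } else {
        // Slow path: full Unicode-aware conversion for every value.
        values.iter().map(|v| v.to_lowercase()).collect()
    }
}

fn main() {
    assert_eq!(lower_all(&["FOO", "Bar"]), vec!["foo", "bar"]);
    assert_eq!(lower_all(&["FOO", "ÄÖÜ"]), vec!["foo", "äöü"]);
}
```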
Thank you @Dandandan for your suggestion. Based on your suggestion:
The benefits:
- Simpler logic
- Helps to further improve the performance of Case1 and Case2
The downside:
- Giving up on Case3
Is my understanding correct? 🤔
Your understanding seems correct :)
Ok. If the majority are in favor of this plan, I will implement it.
I agree it would be good to try the simpler approach. However, as long as this current implementation is well tested and shows performance improvements, I think we could merge it as is and simplify the implementation in a follow-on PR as well.
If this is your preference @JasonLi-cn I will try and find some more time to review the implementation carefully. A simpler implementation has the benefit that it is easier (and thus faster) to review.
Some other random optimization thoughts:
- We could upper/lower the values as a single string in one call, for example detecting when the converted value was a different length and taking a special path then
- We could also special-case when the string had nulls and when it didn't (which can make the inner loop simpler and allow a better chance for auto-vectorization)
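The null special-casing idea might look like this (a hypothetical sketch over raw value bytes and offsets, not the actual kernel): without a validity bitmap the conversion is one tight pass over the whole buffer, which gives the compiler a better chance to auto-vectorize; with nulls, a per-row loop only touches the byte ranges of valid rows.

```rust
// No nulls: single pass over every byte, no per-row branching.
fn lower_no_nulls(values: &mut [u8]) {
    values.make_ascii_lowercase();
}

// Nulls present: walk rows via offsets and only convert valid slots.
fn lower_with_nulls(values: &mut [u8], offsets: &[usize], valid: &[bool]) {
    for (i, ok) in valid.iter().enumerate() {
        if *ok {
            values[offsets[i]..offsets[i + 1]].make_ascii_lowercase();
        }
    }
}

fn main() {
    let mut a = *b"ABCDEF";
    lower_no_nulls(&mut a);
    assert_eq!(&a, b"abcdef");

    // Two rows of 3 bytes each; the second row is null and stays untouched.
    let mut b = *b"ABCDEF";
    lower_with_nulls(&mut b, &[0, 3, 6], &[true, false]);
    assert_eq!(&b, b"abcDEF");
}
```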
Gnuplot not found, using plotters backend
lower_all_values_are_ascii: 1024
time: [5.2583 µs 5.2623 µs 5.2670 µs]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
lower_the_first_value_is_nonascii: 1024
time: [53.434 µs 53.474 µs 53.524 µs]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
lower_the_middle_value_is_nonascii: 1024
time: [53.889 µs 53.975 µs 54.083 µs]
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
lower_all_values_are_ascii: 4096
time: [20.936 µs 20.950 µs 20.965 µs]
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
lower_the_first_value_is_nonascii: 4096
time: [222.68 µs 222.90 µs 223.17 µs]
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
lower_the_middle_value_is_nonascii: 4096
time: [223.73 µs 223.98 µs 224.30 µs]
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
lower_all_values_are_ascii: 8192
time: [41.524 µs 41.796 µs 42.172 µs]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
lower_the_first_value_is_nonascii: 8192
time: [449.05 µs 449.71 µs 450.57 µs]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
lower_the_middle_value_is_nonascii: 8192
time: [451.06 µs 452.63 µs 454.66 µs]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
Thank you @JasonLi-cn -- I think this PR looks really nice now.
Thank you for bearing with us through the process. I think the result is quite good
// conversion
let converted_values = op(str_values);
assert_eq!(converted_values.len(), str_values.len());
thank you for this check
lgtm thanks @JasonLi-cn and @alamb for the proper review.
The code is much cleaner now 👍
I hope the bench is even better after all changes applied
❤️ -- Thanks again @JasonLi-cn and @Dandandan and @comphead -- I love seeing these functions optimized like this. I also really like the idea of making the common patterns (e.g. the code for handling special cases) reusable / elegant. Can't wait to see what you come up with next
Which issue does this PR close?
Closes #9970
Rationale for this change
When converting case, there is no need to transform each value individually. Instead, you can treat the data in the StringArray as a single value and then perform the case conversion on it.
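The idea above can be sketched as follows (hypothetical names, operating on the array's concatenated value bytes and offsets): when all data is ASCII, lowercase the entire value buffer in one call, and reuse the offsets unchanged because ASCII case mapping is byte-for-byte.

```rust
// Sketch: convert the whole value buffer at once instead of per value.
// Offsets can be reused because ASCII lowercasing preserves byte lengths.
fn lower_buffer(value_data: &[u8], offsets: &[usize]) -> (Vec<u8>, Vec<usize>) {
    assert!(value_data.is_ascii()); // caller must have checked the fast path
    let mut converted = value_data.to_vec();
    converted.make_ascii_lowercase(); // one pass over all values
    (converted, offsets.to_vec()) // offsets unchanged
}

fn main() {
    // Two logical values "FOO" and "Bar" stored back-to-back.
    let data = b"FOOBar";
    let offsets = [0, 3, 6];
    let (out, offs) = lower_buffer(data, &offsets);
    assert_eq!(&out, b"foobar");
    assert_eq!(offs, vec![0, 3, 6]);
}
```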
Benchmark
Lower
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?