
feat: optimize lower and upper functions #9971

Merged: 10 commits into apache:main on Apr 15, 2024

Conversation

@JasonLi-cn (Contributor) commented Apr 6, 2024

Which issue does this PR close?

Closes #9970 (Improve the performance of lower and upper function)

Rationale for this change

When converting case, there is no need to transform each value individually. Instead, the data in the StringArray can be treated as a single value and the case conversion performed on it once; a rough sketch of this idea follows the list below. Benefits include:

  • Avoiding iteration.
  • Directly allocating a contiguous block of memory, thereby avoiding the overhead of repeatedly allocating memory when creating a StringArray through an Iterator.
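
A minimal sketch of the idea (the function name is hypothetical and this is not the exact PR code; it assumes arrow-rs's StringArray API): if every byte in the values buffer is ASCII, lowercasing preserves each value's length, so the whole buffer can be converted in one pass and the original offsets and null buffer reused.

use std::sync::Arc;

use arrow::array::{Array, ArrayRef, StringArray};

// ASCII fast path: if every byte in the values buffer is ASCII, lowercasing
// never changes a value's length, so the original offsets and null buffer can
// be reused and the whole buffer converted in a single pass.
fn lower_ascii_fast_path(array: &StringArray) -> Option<ArrayRef> {
    let values = array.value_data(); // contiguous bytes of all values
    if !values.is_ascii() {
        return None; // caller falls back to a per-value, Unicode-aware path
    }
    let lowered: Vec<u8> = values.iter().map(|b| b.to_ascii_lowercase()).collect();
    // Offsets and nulls are unchanged because no value changed length.
    let converted = StringArray::new(
        array.offsets().clone(),
        lowered.into(),
        array.nulls().cloned(),
    );
    Some(Arc::new(converted))
}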

Benchmark

Lower

Gnuplot not found, using plotters backend
lower full optimization: 1024
                        time:   [5.8336 µs 5.8401 µs 5.8481 µs]
                        change: [-77.800% -77.702% -77.608%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  8 (8.00%) high mild
  6 (6.00%) high severe

lower maybe optimization: 1024
                        time:   [52.137 µs 52.206 µs 52.287 µs]
                        change: [-3.1792% -2.9748% -2.7567%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

lower partial optimization: 1024
                        time:   [28.266 µs 28.289 µs 28.316 µs]
                        change: [-47.687% -47.535% -47.391%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  10 (10.00%) high severe

lower full optimization: 4096
                        time:   [23.473 µs 23.506 µs 23.549 µs]
                        change: [-77.596% -77.519% -77.434%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

lower maybe optimization: 4096
                        time:   [216.41 µs 216.65 µs 216.92 µs]
                        change: [-4.0400% -3.7733% -3.5015%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

lower partial optimization: 4096
                        time:   [118.70 µs 118.87 µs 119.08 µs]
                        change: [-47.772% -47.681% -47.588%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  9 (9.00%) high severe

lower full optimization: 8192
                        time:   [49.088 µs 49.232 µs 49.401 µs]
                        change: [-77.615% -77.497% -77.359%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe

lower maybe optimization: 8192
                        time:   [435.56 µs 435.96 µs 436.47 µs]
                        change: [-5.1301% -4.8973% -4.6614%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

lower partial optimization: 8192
                        time:   [238.99 µs 239.86 µs 240.89 µs]
                        change: [-46.649% -46.348% -46.021%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@alamb (Contributor) left a comment

Thank you for the contribution @JasonLi-cn -- I am a little concerned this optimization isn't valid in general for UTF-8 strings (I think it may work only for ascii)

Review threads (resolved) on: datafusion/functions/src/string/common.rs, datafusion/functions/src/string/lower.rs, datafusion/functions/src/string/upper.rs
@JasonLi-cn JasonLi-cn marked this pull request as draft April 6, 2024 11:27
@JasonLi-cn JasonLi-cn marked this pull request as ready for review April 7, 2024 12:38
@comphead (Contributor) left a comment

I like the numbers, but the implementation looks a little complicated. I would check how the Rust std lib implements string and char case conversion.
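
For reference, the straightforward per-value path built on std's Unicode-aware case conversion looks roughly like this (a sketch, not the PR code; the function name is illustrative):

use arrow::array::StringArray;

// Baseline: delegate to std's `str::to_lowercase` for each value. This
// allocates a new String per non-null value and rebuilds the offsets.
fn lower_per_value(array: &StringArray) -> StringArray {
    array.iter().map(|v| v.map(str::to_lowercase)).collect()
}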

let item_len = string_array.len();

// Find the first nonascii string at the beginning.
let find_the_first_nonascii = || {
@Dandandan (Contributor) commented:

AFAIK it is quite a bit faster to do the check once on the entire string/byte array (including nulls) than to check each value individually.
This should simplify the logic as well, e.g. not searching for the index of the first non-ASCII value but only applying the optimization when the entire array is ASCII.
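
A sketch of that whole-buffer check (the function name is illustrative; the values buffer also covers the bytes of null slots, which is harmless for an ASCII test):

use arrow::array::StringArray;

// True if every byte in the array's contiguous values buffer is ASCII.
// One linear scan over the whole buffer instead of a per-value check.
fn all_values_ascii(array: &StringArray) -> bool {
    array.value_data().is_ascii()
}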

@JasonLi-cn (Contributor, Author) replied:

Thank you @Dandandan for your suggestion. Based on it:
The benefits:

  • Simpler logic
  • Helps to further improve the performance of Case1 and Case2

The downside:

  • Giving up on Case3

Is my understanding correct? 🤔

Contributor replied:

Your understanding seems correct :)

@JasonLi-cn (Contributor, Author) replied:

Ok. If the majority are in favor of this plan, I will implement it.

Contributor commented:

I agree it would be good to try the simpler approach. However, as long as the current implementation is well tested and shows performance improvements, I think we could merge it as is and simplify the implementation in a follow-on PR.

If this is your preference @JasonLi-cn, I will try to find some more time to review the implementation carefully. A simpler implementation has the benefit that it is easier (and thus faster) to review.

Some other random optimization thoughts:

  • We could upper/lower the values as a single string in one call, for example detecting when a converted value has a different length and taking a special path in that case
  • We could also special-case when the array has nulls and when it doesn't, which can make the inner loop simpler and give a better chance for auto-vectorization (see the sketch after this list)
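
A hypothetical sketch of the second idea (names and dispatch are illustrative, not the PR code): when the array has no nulls, the Unicode path can iterate plain values and skip the per-row validity branch.

use arrow::array::{Array, StringArray};

// Dispatch on the presence of nulls so the dense case keeps a simpler inner loop.
fn lower_unicode(array: &StringArray) -> StringArray {
    if array.null_count() == 0 {
        // Dense path: every slot is valid, no Option handling per row.
        StringArray::from_iter_values(
            (0..array.len()).map(|i| array.value(i).to_lowercase()),
        )
    } else {
        // Null-aware path: propagate nulls through the iterator.
        array.iter().map(|v| v.map(str::to_lowercase)).collect()
    }
}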

Review thread (resolved) on: datafusion/functions/benches/lower.rs

@JasonLi-cn (Contributor, Author) commented:

New benchmark results:

Gnuplot not found, using plotters backend
lower_all_values_are_ascii: 1024
                        time:   [5.2583 µs 5.2623 µs 5.2670 µs]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

lower_the_first_value_is_nonascii: 1024
                        time:   [53.434 µs 53.474 µs 53.524 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

lower_the_middle_value_is_nonascii: 1024
                        time:   [53.889 µs 53.975 µs 54.083 µs]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

lower_all_values_are_ascii: 4096
                        time:   [20.936 µs 20.950 µs 20.965 µs]
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

lower_the_first_value_is_nonascii: 4096
                        time:   [222.68 µs 222.90 µs 223.17 µs]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

lower_the_middle_value_is_nonascii: 4096
                        time:   [223.73 µs 223.98 µs 224.30 µs]
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

lower_all_values_are_ascii: 8192
                        time:   [41.524 µs 41.796 µs 42.172 µs]
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) high mild
  14 (14.00%) high severe

lower_the_first_value_is_nonascii: 8192
                        time:   [449.05 µs 449.71 µs 450.57 µs]
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

lower_the_middle_value_is_nonascii: 8192
                        time:   [451.06 µs 452.63 µs 454.66 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

@alamb (Contributor) left a comment

Thank you @JasonLi-cn -- I think this PR looks really nice now.

Thank you for bearing with us through the process. I think the result is quite good

Review thread (resolved) on: datafusion/functions/src/string/common.rs

// conversion
let converted_values = op(str_values);
assert_eq!(converted_values.len(), str_values.len());
Contributor commented:

thank you for this check

@alamb changed the title to "feat: optimize lower and upper functions" Apr 13, 2024
@comphead (Contributor) left a comment

LGTM, thanks @JasonLi-cn and @alamb for the thorough review.
The code is much cleaner now 👍

I hope the benchmarks are even better after all the changes are applied

@alamb merged commit 483663b into apache:main on Apr 15, 2024
25 checks passed
@alamb (Contributor) commented Apr 15, 2024

❤️ -- Thanks again @JasonLi-cn and @Dandandan and @comphead -- I love seeing these functions optimized like this.

I also really like the idea of finding common patterns (e.g. for the special-case handling code) and making them reusable and elegant. Can't wait to see what you come up with next
