feat: optimize lower and upper functions #9971
Conversation
Thank you for the contribution @JasonLi-cn -- I am a little concerned this optimization isn't valid in general for UTF-8 strings (I think it may work only for ascii)
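The concern above can be made concrete: for some Unicode characters, case conversion changes the byte length of the string, so a byte-wise ASCII-style transform cannot be correct in general. A minimal sketch (standalone, not the PR's code):

```rust
fn main() {
    // LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) is 2 bytes in UTF-8.
    let s = "İ";
    // Its Unicode lowercase is 'i' followed by COMBINING DOT ABOVE (U+0307),
    // which is 3 bytes -- the string grows when lowercased.
    let lower = s.to_lowercase();
    assert_eq!(s.len(), 2);
    assert_eq!(lower.len(), 3);
    // Plain ASCII input, by contrast, maps byte-for-byte:
    assert_eq!("ABC".to_lowercase().len(), "ABC".len());
}
```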
I like the numbers, but the implementation looks a bit complicated. I would check how the Rust std lib implements string and char case conversion.
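For reference, the relevant std APIs: `str::to_lowercase` is fully Unicode-aware (and may change the length), `char::to_lowercase` returns an iterator because one char can map to several, and `make_ascii_lowercase` is the cheap in-place path for ASCII-only data:

```rust
fn main() {
    // Unicode-aware conversion over a whole &str.
    assert_eq!("ΑΒΓ".to_lowercase(), "αβγ");

    // char::to_lowercase yields an iterator: U+0130 lowercases to TWO chars.
    let expanded: String = 'İ'.to_lowercase().collect();
    assert_eq!(expanded.chars().count(), 2);

    // ASCII-only data has a cheaper in-place byte transform.
    let mut bytes = *b"HELLO";
    bytes.make_ascii_lowercase();
    assert_eq!(&bytes, b"hello");
}
```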
let item_len = string_array.len();

// Find the first nonascii string at the beginning.
let find_the_first_nonascii = || {
AFAIK it is quite a bit faster to do the check once on the entire string/byte array (including nulls) than to check each value individually.
This should simplify the logic as well, e.g. not searching for the index, but only taking the fast path when the entire array is ASCII.
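A hypothetical sketch of this suggestion (names are illustrative, not the PR's actual code): run the ASCII check once over all values, then choose one path for the whole array:

```rust
// Sketch: one ASCII check up front selects a fast or slow path for the
// entire batch, instead of checking and branching per value.
fn lower_all(values: &[&str]) -> Vec<String> {
    let all_ascii = values.iter().all(|v| v.is_ascii());
    if all_ascii {
        // Fast path: byte-wise ASCII conversion, no per-value re-checking.
        values.iter().map(|v| v.to_ascii_lowercase()).collect()
    } else {
        // Slow path: full Unicode-aware conversion for every value.
        values.iter().map(|v| v.to_lowercase()).collect()
    }
}

fn main() {
    assert_eq!(lower_all(&["FOO", "Bar"]), vec!["foo", "bar"]);
    assert_eq!(lower_all(&["FOO", "ÄÖÜ"]), vec!["foo", "äöü"]);
}
```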
Thank you @Dandandan for your suggestion. Based on your suggestion:
The benefits:
- Simpler logic
- Helps to further improve the performance of Case1 and Case2
The downside:
- Giving up on Case3
Is my understanding correct? 🤔
Your understanding seems correct :)
Ok. If the majority are in favor of this plan, I will implement it.
I agree it would be good to try the simpler approach. However, as long as this current implementation is well tested and shows performance improvements, I think we could merge it as is and simplify the implementation in a follow-on PR as well.
If this is your preference @JasonLi-cn I will try and find some more time to review the implementation carefully. A simpler implementation has the benefit that it is easier (and thus faster) to review.
Some other random optimization thoughts:
- We could upper/lower the values as a single string in one call, for example detecting when the converted value was a different length and taking a special path then
- We could also special-case when the string had nulls and when it didn't (which can make the inner loop simpler and allow a better chance for auto-vectorization)
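The null special-casing idea might look like this (a hypothetical sketch over raw value bytes and offsets, not the actual kernel): without a validity bitmap the conversion is one tight pass over the whole buffer, which gives the compiler a better chance to auto-vectorize; with nulls, a per-row loop only touches the byte ranges of valid rows.

```rust
// No nulls: single pass over every byte, no per-row branching.
fn lower_no_nulls(values: &mut [u8]) {
    values.make_ascii_lowercase();
}

// Nulls present: walk rows via offsets and only convert valid slots.
fn lower_with_nulls(values: &mut [u8], offsets: &[usize], valid: &[bool]) {
    for (i, ok) in valid.iter().enumerate() {
        if *ok {
            values[offsets[i]..offsets[i + 1]].make_ascii_lowercase();
        }
    }
}

fn main() {
    let mut a = *b"ABCDEF";
    lower_no_nulls(&mut a);
    assert_eq!(&a, b"abcdef");

    // Two rows of 3 bytes each; the second row is null and stays untouched.
    let mut b = *b"ABCDEF";
    lower_with_nulls(&mut b, &[0, 3, 6], &[true, false]);
    assert_eq!(&b, b"abcDEF");
}
```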
Gnuplot not found, using plotters backend
lower_all_values_are_ascii: 1024
time: [5.2583 µs 5.2623 µs 5.2670 µs]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
lower_the_first_value_is_nonascii: 1024
time: [53.434 µs 53.474 µs 53.524 µs]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
lower_the_middle_value_is_nonascii: 1024
time: [53.889 µs 53.975 µs 54.083 µs]
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
lower_all_values_are_ascii: 4096
time: [20.936 µs 20.950 µs 20.965 µs]
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
lower_the_first_value_is_nonascii: 4096
time: [222.68 µs 222.90 µs 223.17 µs]
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
lower_the_middle_value_is_nonascii: 4096
time: [223.73 µs 223.98 µs 224.30 µs]
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
lower_all_values_are_ascii: 8192
time: [41.524 µs 41.796 µs 42.172 µs]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
lower_the_first_value_is_nonascii: 8192
time: [449.05 µs 449.71 µs 450.57 µs]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
lower_the_middle_value_is_nonascii: 8192
time: [451.06 µs 452.63 µs 454.66 µs]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
Thank you @JasonLi-cn -- I think this PR looks really nice now.
Thank you for bearing with us through the process. I think the result is quite good
// conversion
let converted_values = op(str_values);
assert_eq!(converted_values.len(), str_values.len());
thank you for this check
lgtm thanks @JasonLi-cn and @alamb for the proper review.
The code is much cleaner now 👍
I hope the bench is even better after all changes applied
❤️ -- Thanks again @JasonLi-cn and @Dandandan and @comphead -- I love seeing these functions optimized like this. I also really like the idea of making the common patterns (e.g. the code for handling special cases) reusable / elegant. Can't wait to see what you come up with next
Which issue does this PR close?
Closes #9970
Rationale for this change
When converting case, there is no need to transform each value individually. Instead, you can treat the data in the StringArray as a single value and then perform the case conversion on it.
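The idea above can be sketched as follows (hypothetical names, operating on the array's concatenated value bytes and offsets): when all data is ASCII, lowercase the entire value buffer in one call, and reuse the offsets unchanged because ASCII case mapping is byte-for-byte.

```rust
// Sketch: convert the whole value buffer at once instead of per value.
// Offsets can be reused because ASCII lowercasing preserves byte lengths.
fn lower_buffer(value_data: &[u8], offsets: &[usize]) -> (Vec<u8>, Vec<usize>) {
    assert!(value_data.is_ascii()); // caller must have checked the fast path
    let mut converted = value_data.to_vec();
    converted.make_ascii_lowercase(); // one pass over all values
    (converted, offsets.to_vec()) // offsets unchanged
}

fn main() {
    // Two logical values "FOO" and "Bar" stored back-to-back.
    let data = b"FOOBar";
    let offsets = [0, 3, 6];
    let (out, offs) = lower_buffer(data, &offsets);
    assert_eq!(&out, b"foobar");
    assert_eq!(offs, vec![0, 3, 6]);
}
```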
Benchmark
Lower
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?