ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch #7418

wesm · 2020-06-12T13:12:38Z

Following up on the discussion in #7357. I also added a simple benchmark:
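The core idea in the title, processing the data buffer in one pass instead of iterating over each value as a string_view, can be sketched without Arrow itself. The `StringColumn` struct and both functions below are hypothetical stand-ins, not Arrow's API. Because ASCII case mapping never changes a value's byte length, the batch version can reuse the offsets unchanged and transform all the concatenated bytes in a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a string column (not Arrow's ArrayData):
// offsets has length + 1 entries and value i is data[offsets[i], offsets[i + 1]).
struct StringColumn {
  std::vector<int32_t> offsets;
  std::vector<uint8_t> data;
};

// Per-value approach: copy each value out, lower-case it, append it, and
// recompute the offsets one value at a time.
StringColumn LowerPerValue(const StringColumn& in) {
  assert(!in.offsets.empty());
  StringColumn out;
  out.offsets.push_back(0);
  for (std::size_t i = 0; i + 1 < in.offsets.size(); ++i) {
    std::string value(in.data.begin() + in.offsets[i],
                      in.data.begin() + in.offsets[i + 1]);
    for (char& c : value) {
      if (c >= 'A' && c <= 'Z') c = static_cast<char>(c + 32);
    }
    out.data.insert(out.data.end(), value.begin(), value.end());
    out.offsets.push_back(static_cast<int32_t>(out.data.size()));
  }
  return out;
}

// Batch approach: lengths do not change, so the offsets are reused as-is and
// the whole data buffer is transformed in one tight loop with no per-value work.
StringColumn LowerBatch(const StringColumn& in) {
  assert(!in.offsets.empty());
  StringColumn out;
  out.offsets = in.offsets;  // Arrow could share this buffer instead of copying
  out.data.resize(in.data.size());
  for (std::size_t j = 0; j < in.data.size(); ++j) {
    const uint8_t c = in.data[j];
    out.data[j] = (c >= 'A' && c <= 'Z') ? static_cast<uint8_t>(c + 32) : c;
  }
  return out;
}
```

Both functions produce identical output; the batch one simply avoids the per-value temporaries and offset bookkeeping.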

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiLower    4774004 ns    4773998 ns        149   3.24122GB/s   209.468M items/s
AsciiUpper    4708606 ns    4708590 ns        146   3.28625GB/s   212.378M items/s

wesm · 2020-06-12T13:12:49Z

cc @maartenbreddels @pitrou

github-actions · 2020-06-12T13:16:53Z

https://issues.apache.org/jira/browse/ARROW-9115

pitrou · 2020-06-12T13:24:13Z

I get similar numbers here. It seems to be a 15x speedup over git master.

pitrou

+1


pitrou · 2020-06-12T13:35:25Z

Before:

AsciiLower   76218768 ns     76206752 ns           28 bytes_per_second=207.921M/s items_per_second=13.7596M/s
AsciiUpper   83254436 ns     83232143 ns           26 bytes_per_second=190.371M/s items_per_second=12.5982M/s

After:

AsciiLower    4512754 ns      4510290 ns          483 bytes_per_second=3.43073G/s items_per_second=232.485M/s
AsciiUpper    4536864 ns      4534349 ns          462 bytes_per_second=3.41253G/s items_per_second=231.252M/s
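For reference, the speedup is the ratio of the wall times (the Time column) before and after:

```latex
\frac{76218768\ \text{ns}}{4512754\ \text{ns}} \approx 16.9 \quad (\text{AsciiLower}), \qquad
\frac{83254436\ \text{ns}}{4536864\ \text{ns}} \approx 18.4 \quad (\text{AsciiUpper})
```

The items_per_second ratios agree (232.485 / 13.7596 ≈ 16.9 and 231.252 / 12.5982 ≈ 18.4), so the "15x" estimate above is, if anything, conservative.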

cyb70289 · 2020-06-12T14:19:59Z

cpp/src/arrow/compute/kernels/scalar_string.cc

+    const ArrayData& input = *batch[0].array();
+    ArrayData* out_arr = out->mutable_array();
+    // Reuse offsets from input
+    out_arr->buffers[1] = input.buffers[1];


These buffers[1] and buffers[2] are mysterious to me. Any hints on how to figure them out? Thanks.

https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#buffer-listing-for-each-layout
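Per the linked section, a variable-size string array has three buffers: buffers[0] is the validity bitmap, buffers[1] the offsets, and buffers[2] the concatenated value bytes. Below is a hypothetical, simplified mirror of that layout; `SimpleStringArray`, `GetValue` and `AsciiUpper` are illustrative names, not Arrow classes. It shows why the diff can reuse buffers[1]: the output shares the input's validity and offsets buffers (shared_ptr stands in for Arrow's reference-counted buffers) and only allocates a new data buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified mirror of a StringArray's buffers (not Arrow's API).
struct SimpleStringArray {
  int64_t length = 0;
  // buffers[0]: validity bitmap, LSB-first; bit i set means value i is non-null.
  // A null pointer means "no nulls".
  std::shared_ptr<std::vector<uint8_t>> validity;
  // buffers[1]: length + 1 offsets; value i is data[offsets[i], offsets[i + 1]).
  std::shared_ptr<std::vector<int32_t>> offsets;
  // buffers[2]: the bytes of all values, concatenated back to back.
  std::shared_ptr<std::vector<uint8_t>> data;
};

bool IsValid(const SimpleStringArray& arr, int64_t i) {
  if (!arr.validity) return true;
  return ((*arr.validity)[i / 8] >> (i % 8)) & 1;
}

std::optional<std::string> GetValue(const SimpleStringArray& arr, int64_t i) {
  assert(i >= 0 && i < arr.length);
  if (!IsValid(arr, i)) return std::nullopt;
  const std::vector<int32_t>& off = *arr.offsets;
  return std::string(arr.data->begin() + off[i], arr.data->begin() + off[i + 1]);
}

// Upper-cases ASCII letters. Lengths do not change, so validity and offsets are
// shared with the input (cf. out_arr->buffers[1] = input.buffers[1]); only the
// data buffer is new. Bytes under null slots are transformed too, which is
// harmless because they are never read as values.
SimpleStringArray AsciiUpper(const SimpleStringArray& in) {
  SimpleStringArray out;
  out.length = in.length;
  out.validity = in.validity;  // shared, not copied
  out.offsets = in.offsets;    // shared, not copied
  out.data = std::make_shared<std::vector<uint8_t>>(in.data->size());
  for (std::size_t j = 0; j < in.data->size(); ++j) {
    const uint8_t c = (*in.data)[j];
    (*out.data)[j] = (c >= 'a' && c <= 'z') ? static_cast<uint8_t>(c - 32) : c;
  }
  return out;
}
```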

maartenbreddels · 2020-06-12T19:14:29Z

Excellent, I was planning to look at this next week, but this probably saved me quite some time and gives me some more examples. Thanks.

wesm · 2020-06-12T20:04:52Z

Sounds good. I'll probably implement some more example kernels soon that have to process each value individually while being more efficient than the prior examples (e.g. by not copying anything unnecessarily). I'm thinking strip/rstrip/lstrip.
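As a hedged illustration of what such a per-value kernel might look like (not code from Arrow), here is a hypothetical `AsciiStrip` over the same offsets/data layout. Stripping changes value lengths, so the offsets must be rebuilt value by value, but the surviving bytes are copied straight from the input buffer without any temporary strings. The whitespace set (that of C's isspace in the "C" locale) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a string column: value i is
// data[offsets[i], offsets[i + 1]).
struct StringColumn {
  std::vector<int32_t> offsets;
  std::vector<uint8_t> data;
};

// Assumed whitespace set: the characters isspace() accepts in the "C" locale.
bool IsAsciiSpace(uint8_t c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Strips leading and trailing ASCII whitespace from every value. Only the
// bytes that survive are copied, directly from input to output.
StringColumn AsciiStrip(const StringColumn& in) {
  assert(!in.offsets.empty());
  StringColumn out;
  out.offsets.reserve(in.offsets.size());
  out.data.reserve(in.data.size());  // output is never larger than the input
  out.offsets.push_back(0);
  for (std::size_t i = 0; i + 1 < in.offsets.size(); ++i) {
    int32_t first = in.offsets[i];
    int32_t last = in.offsets[i + 1];
    while (first < last && IsAsciiSpace(in.data[first])) ++first;
    while (last > first && IsAsciiSpace(in.data[last - 1])) --last;
    out.data.insert(out.data.end(), in.data.begin() + first, in.data.begin() + last);
    out.offsets.push_back(static_cast<int32_t>(out.data.size()));
  }
  return out;
}
```

lstrip and rstrip would follow by dropping one of the two while loops.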


Following up on #7418, I tried and benchmarked a different way for

* ascii_lower
* ascii_upper

Before (lower is similar):

```
--------------------------------------------------
Benchmark                 Time             CPU   Iterations
--------------------------------------------------
AsciiUpper_median   4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
```

After:

```
--------------------------------------------------
Benchmark                 Time             CPU   Iterations
--------------------------------------------------
AsciiUpper_median   1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s
```

This is a 3.7x speedup (on an AMD machine). Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found a 4.6x speedup for clang 9 and 6.4x for GCC 9.2.

Also, the test is expanded a bit to include a non-ASCII codepoint, to make explicit that it is fine to upper- or lower-case a UTF-8 string. The non-overlapping encoding of UTF-8 makes this OK (see section 2.5 of the Unicode Standard Core Specification v13.0).

Closes #7434 from maartenbreddels/ARROW-9131

Authored-by: Maarten A. Breddels <maartenbreddels@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
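The UTF-8 argument can be made concrete: every byte of a multi-byte UTF-8 sequence is at least 0x80, so it can never match 'a'..'z' or 'A'..'Z' and passes through untouched. The sketch below is a minimal per-byte compare-and-select transform in the spirit of #7434; the function names are hypothetical and it is not necessarily the code that was merged. Whether a compiler auto-vectorizes it depends on the compiler and flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Upper-cases ASCII letters in a buffer of UTF-8 bytes. Bytes of multi-byte
// sequences are >= 0x80, never in 'a'..'z', so they are copied unchanged.
void TransformAsciiUpper(const uint8_t* input, int64_t length, uint8_t* output) {
  assert(length >= 0);
  std::transform(input, input + length, output, [](uint8_t c) -> uint8_t {
    // Simple compare-and-select; compilers can often vectorize this loop.
    return (c >= 'a' && c <= 'z') ? static_cast<uint8_t>(c - 32) : c;
  });
}

// Lower-case counterpart: only 'A'..'Z' are changed.
void TransformAsciiLower(const uint8_t* input, int64_t length, uint8_t* output) {
  assert(length >= 0);
  std::transform(input, input + length, output, [](uint8_t c) -> uint8_t {
    return (c >= 'A' && c <= 'Z') ? static_cast<uint8_t>(c + 32) : c;
  });
}

// Hypothetical std::string wrappers, for illustration only.
std::string AsciiUpper(const std::string& s) {
  std::string out(s.size(), '\0');
  TransformAsciiUpper(reinterpret_cast<const uint8_t*>(s.data()),
                      static_cast<int64_t>(s.size()),
                      reinterpret_cast<uint8_t*>(&out[0]));
  return out;
}

std::string AsciiLower(const std::string& s) {
  std::string out(s.size(), '\0');
  TransformAsciiLower(reinterpret_cast<const uint8_t*>(s.data()),
                      static_cast<int64_t>(s.size()),
                      reinterpret_cast<uint8_t*>(&out[0]));
  return out;
}
```

For example, "café Ab" comes out as "CAFé AB": the two bytes encoding é are left alone because neither falls in the ASCII letter range.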

wesm added 3 commits June 12, 2020 08:06

423e2e2  Batch-based ASCII lower/upper implementations
e8f0ee9  Add simple benchmarks for upper/lower
8d71687  Rename function

pitrou reviewed Jun 12, 2020

cpp/src/arrow/compute/kernels/scalar_string.cc (resolved)

pitrou reviewed Jun 12, 2020

cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc (resolved)

42e9ac8  Review feedback

cyb70289 reviewed Jun 12, 2020


wesm closed this in 8d782b1 Jun 12, 2020

wesm deleted the ARROW-9115 branch June 12, 2020 14:38

pierrebelzile reviewed Jun 13, 2020

cpp/src/arrow/compute/kernels/scalar_string.cc (resolved)

maartenbreddels mentioned this pull request Jun 15, 2020

ARROW-9131: [C++] Faster ascii_lower and ascii_upper. #7434 (closed)

This was referenced Jun 13, 2020

[C++] Process data buffers in batch in ascii_lower / ascii_upper kernels rather than using string_view value iteration #25227 (closed)

[C++] Adapt ascii_lower/ascii_upper bulk transforms to work on sliced arrays #25234 (closed)
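The sliced-array issue arises because a slice has a nonzero offset: logical value i lives at physical slot offset + i, so the referenced bytes start at offsets[offset], not at 0. One possible slice-aware approach, sketched with hypothetical types (not the fix that was merged), transforms only that byte range and rebases the offsets so the output is an unsliced array starting at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of a possibly sliced string array: logical value i is
// data[offsets[offset + i], offsets[offset + i + 1]).
struct StringSlice {
  int64_t offset = 0;
  int64_t length = 0;
  const std::vector<int32_t>* offsets = nullptr;
  const std::vector<uint8_t>* data = nullptr;
};

// Unsliced output column: offsets start at 0.
struct StringColumn {
  std::vector<int32_t> offsets;
  std::vector<uint8_t> data;
};

StringColumn AsciiLowerSliced(const StringSlice& in) {
  assert(in.offsets != nullptr && in.data != nullptr);
  assert(in.offset >= 0 && in.length >= 0);
  const std::vector<int32_t>& off = *in.offsets;
  assert(in.offset + in.length + 1 <= static_cast<int64_t>(off.size()));
  const int32_t first = off[in.offset];             // first byte of the slice
  const int32_t last = off[in.offset + in.length];  // one past its last byte
  StringColumn out;
  out.offsets.resize(static_cast<std::size_t>(in.length + 1));
  for (int64_t i = 0; i <= in.length; ++i) {
    out.offsets[i] = off[in.offset + i] - first;  // rebase so output starts at 0
  }
  out.data.resize(static_cast<std::size_t>(last - first));
  for (int32_t j = first; j < last; ++j) {  // bulk transform of the slice's bytes only
    const uint8_t c = (*in.data)[j];
    out.data[j - first] = (c >= 'A' && c <= 'Z') ? static_cast<uint8_t>(c + 32) : c;
  }
  return out;
}
```

An alternative design would keep the input's offsets buffer and offset unchanged and write the transformed bytes at the same positions; the rebasing version above avoids transforming bytes outside the slice.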
