Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch #7418

Closed
wants to merge 4 commits into from

Conversation

wesm
Copy link
Member

@wesm wesm commented Jun 12, 2020

Following on discussion in #7357. I added a simple benchmark also.

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiLower    4774004 ns    4773998 ns        149   3.24122GB/s   209.468M items/s
AsciiUpper    4708606 ns    4708590 ns        146   3.28625GB/s   212.378M items/s

@wesm
Copy link
Member Author

wesm commented Jun 12, 2020

cc @maartenbreddels @pitrou

@github-actions
Copy link

@pitrou
Copy link
Member

pitrou commented Jun 12, 2020

I get similar numbers here. It seems to be a 15x speedup over git master.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pitrou
Copy link
Member

pitrou commented Jun 12, 2020

Before:

AsciiLower   76218768 ns     76206752 ns           28 bytes_per_second=207.921M/s items_per_second=13.7596M/s
AsciiUpper   83254436 ns     83232143 ns           26 bytes_per_second=190.371M/s items_per_second=12.5982M/s

After:

AsciiLower    4512754 ns      4510290 ns          483 bytes_per_second=3.43073G/s items_per_second=232.485M/s
AsciiUpper    4536864 ns      4534349 ns          462 bytes_per_second=3.41253G/s items_per_second=231.252M/s

const ArrayData& input = *batch[0].array();
ArrayData* out_arr = out->mutable_array();
// Reuse offsets from input
out_arr->buffers[1] = input.buffers[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These buffers[1], buffers[2] are mysterious to me. Any hint to figure it out? Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm wesm closed this in 8d782b1 Jun 12, 2020
@wesm wesm deleted the ARROW-9115 branch June 12, 2020 14:38
@maartenbreddels
Copy link
Contributor

Excellent, I was planning to look at this next week. But this probably saved me quite some time, and gives me some more examples, thanks.

@wesm
Copy link
Member Author

wesm commented Jun 12, 2020

Sounds good. I'll probably implement some more example kernels soon that have to process each value individually and be more efficient than the prior examples (by not copying anything if it isn't necessary, etc). Thinking strip/rstrip/lstrip

maartenbreddels added a commit to maartenbreddels/arrow that referenced this pull request Jun 15, 2020
Following up on apache#7418 I tried and benchmarked a different way for
 * ascii_lower
 * ascii_upper

Before (lower is similar):
```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
```

After:
```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s

```

This is a 3.7x speedup (on a AMD machine).

Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found 4.6x speedup for clang 9, 6.4x for GCC 9.2.

Also, the test is expanded a bit to include a non-ascii codepoint, to make explicit it is fine to upper
or lower case a utf8 string. The non-overlap encoding of utf8 make this ok (see section 2.5 of Unicode
Standard Core Specification v13.0).
pitrou pushed a commit that referenced this pull request Jun 15, 2020
Following up on #7418 I tried and benchmarked a different way for
 * ascii_lower
 * ascii_upper

Before (lower is similar):
```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
```

After:
```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s

```

This is a 3.7x speedup (on a AMD machine).

Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found 4.6x speedup for clang 9, 6.4x for GCC 9.2.

Also, the test is expanded a bit to include a non-ascii codepoint, to make explicit it is fine to upper
or lower case a utf8 string. The non-overlap encoding of utf8 make this ok (see section 2.5 of Unicode
Standard Core Specification v13.0).

Closes #7434 from maartenbreddels/ARROW-9131

Authored-by: Maarten A. Breddels <maartenbreddels@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants