enable avx512 support for base64 encoding. Reuse WojciechMula/base64-… #102

lucshi · 2022-08-18T07:29:51Z

Add AVX512 support for base64 encoding based on WojciechMula's code.

The aklomp code base is used by Node.js upstream, and I would like to enable AVX512 support, which can improve base64 encoding performance in Node.js by an additional 12%. The aklomp benchmark also showed an encoding improvement over AVX2 for data sizes up to 100 KB, of around 36%-86%.

Testing with buffer size 100 KB, fastest of 10 * 100
AVX2    encode  17238.22 MB/sec
AVX2    decode  16882.58 MB/sec
AVX512  encode  23613.74 MB/sec
AVX512  decode  16260.03 MB/sec
Testing with buffer size 10 KB, fastest of 100 * 100
AVX2    encode  16852.97 MB/sec
AVX2    decode  16605.95 MB/sec
AVX512  encode  31398.70 MB/sec
AVX512  decode  16316.29 MB/sec
Testing with buffer size 1 KB, fastest of 100 * 1000
AVX2    encode  11344.30 MB/sec
AVX2    decode  10261.57 MB/sec
AVX512  encode  19500.83 MB/sec
AVX512  decode  10841.66 MB/sec

The above results were collected on an Ice Lake CPU, which is widely used in Amazon AWS EC2 cloud instances.

…avx512 code

htot · 2022-08-18T20:28:16Z

Ordinarily, you would not benefit from the memory cache. That means that depending on your L3 cache size 10MB or 100MB would be the buffer size you need for the benchmark.

aklomp · 2022-08-18T22:07:27Z

Thanks for the pull request. I'll try to comment on it in the next few days. I like the idea of adding an AVX512 codec if @WojciechMula's code is under a compatible license. But there are a few things that I don't like in this pull request, such as copypasting code from Chromium and not updating the user documentation. I'll try to respond more substantially in the coming days.

lucshi · 2022-08-19T06:46:09Z

Hi htot, thank you for your prompt comments. I did not get your point exactly. Could you please kindly give more hints? Thanks!

htot · 2022-08-20T19:50:36Z

The benchmark repeatedly encodes the same array of data. If the array fits in your (L3) cache you measure very high speeds that you would normally not see when encoding a single image once. I think most of our benchmarks were done with 10MB, however I now have a system with 16MB cache.

lucshi · 2022-08-21T07:01:32Z

The L3 cache of each CPU socket is 48 MB (96 MB in total across 2 sockets). Are you suggesting that I should make the benchmark data larger than 48 MB/96 MB, or that I should only measure first-round performance to avoid the cache benefit? I don't understand why it would be a problem for the benchmark data to be smaller than the cache. I would have thought that the L3 cache cannot make two algorithms perform the same if they inherently differ in performance.

htot · 2022-08-21T11:24:32Z

That's a lot of cache :-) I was lazy and just added a benchmark with 100MB. I'm not sure how the split L3 cache works. But in principle I would say that caching code but not the data is fair. Caching data can be useful to compare the speed of the algorithms excluding time to access memory.

lucshi · 2022-09-29T07:48:09Z

Hi @aklomp, I updated my patch to address your review comments about adding README documentation and removing the Chromium code. Could you please review it? Thanks! Once it has landed, it will be merged back into the Node.js repo.

lucshi · 2022-10-09T01:16:45Z

Hi @aklomp , I'm wondering if you have bandwidth to review this PR. Thanks!

aklomp · 2022-10-09T01:26:52Z

@lucshi Sorry for the delay, I'm still here! Your pull request is near the top of my mental to-do list of things to process (along with #105), but I've been really busy the past weeks and haven't found a moment to deeply engage with it. I do plan on giving it a closer look soon. If you feel that I'm taking too long, don't hesitate to reach out.

lucshi · 2022-10-09T01:34:58Z

Thank you @aklomp for the response. I would like to get this landed as soon as possible to keep up with the Node.js upstream optimization work. Would it help speed up the review if I wrote some notes about the PR?

aklomp · 2022-10-09T01:38:22Z

lib/arch/avx512/dec_loop.c

+}
+
+static inline void
+dec_loop_avx2 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)


Typo? Should be dec_loop_avx512

lucshi · 2022-10-09

No. This PR only covers the encoding part for AVX512, because Node.js only depends on SIMD base64 encoding. In general, base64 decoding cannot be vectorized when there are space characters in the input. So as not to break your project, I reuse the AVX2 code for the decoding part in my PR.

aklomp · 2022-10-09T01:44:49Z

@lucshi The main question I had was whether WojciechMula's code was under a compatible licence, but apparently it is (3-clause BSD).

Before merging, I also want to do a thorough code review to ensure that the code conforms to the library's general structure. It's this code review that I'm blocked on, but I'll try to put it on the fast path. I do feel bad for making you wait this long for a proper PR review. I really do appreciate the effort that you put in this, and it's definitely a very nice enhancement.

lucshi · 2022-10-09T02:12:23Z

I'm totally OK with a careful code review. As I understand it, BSD-3 is one of the most popular licenses nowadays, and the code can be used freely in a BSD-2 project as long as the copyright text is included and the names of the authors and the project are not used for promotion.

I'll also start an IP scan of the code base together with my PR, using the Protex IP scanning tool, to see if there are any other IP issues. I will post the results soon.

lucshi · 2022-10-09T03:39:57Z

Other IP-related revisions may still be needed, but the Protex IP scanning tool did not find any IP conflicts.

aklomp · 2022-10-11T19:49:27Z

Hey, I've informally reviewed the updated PR and I think it looks good on the whole. It's something I think can be merged, but it needs some work to clean it up. I'm volunteering to do that; more on that below. The main sticking points regarding the code I have are:

The commit history is not nice, clean, composable and atomic. The first commit makes all kinds of changes, the next commit undoes a whole bunch of stuff. The commits are also either way too large or way too small. IMO, a patchset should tell a story and introduce (or remove) bits of functionality in a way that's atomic. A good split here would be "add codec", then "add cmake and makefile support for avx512", then "update the test suite", then "update the readme", or something along those lines.
There are a lot of code style errors: trailing whitespace, indents with spaces instead of tabs, overlong lines, overlong commit messages, inconsistent code formatting, and so on.
There's code duplication: the AVX2 decoder is copypasted. I understand the reason why, but it should be changed before merging. The "nicer" way of doing the same thing is to #include "../avx2/dec_loop.c" in lib/arch/avx512/codec.c.
There are some minor things missing, such as adding Daniel Lemire and Wojciech Muła to the copyright section of the LICENSE file.

I'm giving you this short list instead of a full review because I don't want to come across as mean by pointing out boring details like trailing whitespace in overlong detail. I'm saving us the effort. But if you want though, I can certainly do a full review.

For the way forward, what I had in mind was that I would take your PR and rework it to fix the issues above, retaining you as the author, and then come back to you for permission to merge that new branch. I think it's faster and less frustrating (and less work for you!) to do it that way. Of course, if you prefer, we can also do it classically and I'll be the reviewer/oracle while you push corrections. What do you say?

Then there are two ephemeral issues that should also be mentioned:

Muła's code is under BSD3, my code is under BSD2. Are they compatible? I'm not a license expert, but BSD3 is less permissive than BSD2, so I would guess that they might not be compatible. Any thoughts on this?
CI is currently failing in a number of places, but that's due to the issue mentioned in "benchmarks on Atom crashes on AVX2" (#77). It's not something that I can fix now without rearchitecting the library, so I suggest we leave it as-is.

(Sorry for accidentally closing this issue, I clicked the wrong button!)

aklomp · 2022-10-11T20:47:23Z

I pushed a branch called avx512 which squashes your work on top of master and makes some cleanups/fixes. I didn't break this up into smaller atomic commits yet, but it gives you an idea of the direction I'm moving in.

I also set up Intel SDE locally to test the AVX512 code.. very good for building confidence in the code! 😅

lucshi · 2022-10-12T01:32:52Z

Hi @aklomp, thank you for the quick review comments. I'm OK with going the fastest way, where you do the fixes and keep me as one of the authors.

When everything is done, I will submit a PR to Node.js to merge the updated code.

lucshi · 2022-10-12T01:46:37Z

Regarding the license issue: I have consulted the license expert at my company about the process for adding a BSD3 source file, avx512.c, to a BSD2 project.

The answer I got is below:

My understanding is that the bottom line is that you make sure the bsd-3 license text is in the avx512.c file.

aklomp · 2022-10-17T21:44:14Z

Again sorry for the delay. I intend to refactor the avx512 branch soon and merge it, so that we can both move on :)

Thanks for checking about the license. I agree with the proposed solution of putting the BSD-3 license text in the file with the imported code; that should indeed satisfy the license requirements.

I also created an issue, #106, to discuss moving this project from BSD-2 to BSD-3. It's a very small change that I think is for the better, and it will allow this project to import code from more locations. The one remaining question is if it would impact downstream users of this library, but I don't really think so because this library is generally included as a source bundle complete with license file.

aklomp · 2022-10-18T22:06:13Z

Regarding the license issue, I noticed that @WojciechMula has also published his AVX512 encoder in his base64simd project, specifically here. That library is under the BSD-2 license. That simplifies things: all we need to do is update the copyright year in Muła's mention in the LICENSE.

lucshi · 2022-10-19T01:03:30Z

Great!

aklomp · 2022-10-19T19:24:09Z

Alright, I pushed an avx512 branch that contains four commits, crediting you as the author, that implement the AVX512 encoder. The branch contains a number of relatively minor fixes I made to your PR; you can review them with a git diff.

Since your name is on these commits, I have to ask you: are you OK if I merge this?

lucshi · 2022-10-20T01:33:00Z

I'm OK to merge. Thank you!

enable avx512 support for base64 encoding. Reuse WojciechMula/base64-…

b42abe1

…avx512 code

lucshi added 3 commits September 26, 2022 17:24

fix bug

8731e47

updated the AVX512 encoding code and README, removed the Chromium code

ad528a9

resume to the main branch

234760e

rschu1ze mentioned this pull request Sep 30, 2022

AVX512VBMI implementation of base64Encode / base64Decode ClickHouse/ClickHouse#41957

Closed

aklomp reviewed Oct 9, 2022

aklomp closed this Oct 11, 2022

aklomp reopened this Oct 11, 2022

aklomp added enhancement performance labels Oct 11, 2022

aklomp force-pushed the master branch from 4a87aba to cba709a Compare October 13, 2022 13:15

aklomp closed this in 6b1a8b8 Oct 20, 2022

aklomp mentioned this pull request Nov 8, 2023

Create release 0.5.1 #122

Closed
