-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace "Turbo-Base64" to "lemire/fastbase64" #31643
Comments
The fastbase64 library seems to be a good option for avx2 (and avx2-512) SIMD support. It does not seem to have native support for Neon (AMD) or AltiVec (PowerPC), so if nothing else, there would be a fallback to the slower chromium base64 for those. I saw that the aklomp library was attempted previously and unfortunately seems to have some issues as well. |
Looks abandoned. |
Do you still want it to be done? I'd like to work on this. |
@mkmkme, Yes, it makes sense (although with low priority). It will be very appreciated! |
It seems that so far this library is x86-only. I pinged the author to see if they have any plans on supporting ARM as well. |
I've got the reply from them to use https://github.com/aklomp/base64 |
So, here's the summary of my research. lemire/fastbase64 is not a proper library that was pointed by the author, but rather a PoC (for instance, it is x86-only). The author recommended using aklomp/base64, as that one inherited many of his algorithms as well. WojciechMula/base64-avx512 mentioned in issue #41957 is also x86-only. It has a reference to a different repository by the same author, WojciechMula/base64simd, that is mentioned by the author to be problematic, and they recommend using While trying to integrate And the result is quite underwhelming. On ARM (M1 Max with macOS), Here's version with Turbo-Base64: And here's with I tried to poke to compile flags, but the result didn't get any better. I thought the reason might be ARM or macOS specific. My changes can be found here. Sorry I didn't make it into a pull request. I think it's not worth even a WIP pull request, because the results are not satisfactory so far. I'd like to ping @alexey-milovidov and @rschu1ze. Maybe I'm missing something obvious? Or maybe it's not yet time to replace TB64 with something different? |
BTW, the behaviour of ARM and AMD builds is different . |
Hey @den-crane! Are you still able to reproduce the issue? There was a merged PR (#33779) since the version in question was released. I just tested on my Macbook, and the command seems to work just fine: It's not inside the docker, but I doubt that would change the behaviour. |
Ampere A1 ClickHouse client version 23.6.2.18 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.6.2 revision 54464.
clickhouse :) select base64Decode('AGy1cKzasOC/yOPct7tO3w');
SELECT base64Decode('AGy1cKzasOC/yOPct7tO3w')
Query id: cacb7607-1a58-4066-94b1-434d72ef9209
0 rows in set. Elapsed: 0.021 sec.
Received exception from server (version 23.6.2):
Code: 117. DB::Exception: Received from localhost:9000.
DB::Exception: Failed to base64Decode input 'AGy1cKzasOC/yOPct7tO3w':
While processing base64Decode('AGy1cKzasOC/yOPct7tO3w'). (INCORRECT_DATA)
Linux clickhouse 5.15.0-1021-oracle #27~20.04.1-Ubuntu SMP Mon Oct 17 01:56:24 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux Mac M1 curl https://clickhouse.com/ | sh
./clickhouse local
ClickHouse local version 23.7.1.2417 (official build).
m1.local :) select base64Decode('AGy1cKzasOC/yOPct7tO3w');
SELECT base64Decode('AGy1cKzasOC/yOPct7tO3w')
Query id: 813728fb-c65d-4cb6-a3b7-5a9f2d6bced7
0 rows in set. Elapsed: 0.399 sec.
Received exception:
Code: 117.
DB::Exception: Failed to base64Decode input 'AGy1cKzasOC/yOPct7tO3w':
While processing base64Decode('AGy1cKzasOC/yOPct7tO3w'). (INCORRECT_DATA) arm
SELECT base64Decode('SGVsbG8gd29ybGQ');
DB::Exception: Failed to base64Decode input 'SGVsbG8gd29ybGQ': While processing base64Decode('SGVsbG8gd29ybGQ'). (INCORRECT_DATA)
amd
SELECT base64Decode('SGVsbG8gd29ybGQ');
┌─base64Decode('SGVsbG8gd29ybGQ')─┐
│ Hello world │
└─────────────────────────────────┘ |
Huh, that's interesting! I've downloaded CH via curl and got the same result as you. However, the binary built on my machine works just fine! So there's a chance something is going on with official build. Sorry, I'm also quite new to this project, so I'd like to raise two newbie questions:
|
yes.
yes.
I think they should be available in |
Hmm, I stand corrected. Yesterday I think I mistakenly ran the version with I double-checked now, and yes, the version with Turbo-Base64 does fail with your example. |
@mkmkme About #31643 (comment): Yes, different code gets called for x86 and ARM but I am sure that @mkmkme noticed that already:
(btw., this discussion really belongs into #45790)
Looking at your draft, I notice that it calls |
Hey @rschu1ze ! |
Okay, that is unfortunate from a performance POV (assuming your measurements compared AVX2 vs. AVX2, AVX512 vs. AVX512, i.e. in a fair way). On the other hand, the motivation to go for |
I'll try to build some small reproduction for that case. |
Accepting performance degradation is possible but not good. |
Right now I'm working on a minimal benchmark to properly compare these two libraries. Turbo-Base64
aklomp/base64 (note that it's upside down in comparison to TB64)
|
@rschu1ze @alexey-milovidov I've pushed a somewhat minimalistic benchmark to compare those libraries. In that repo, I tried to build Turbo-Base64 exactly as it is built in CH. For Surprisingly, this time I got Please take a look. Test on your machine, if needed. And please tell me if this benchmark makes sense. |
https://github.com/mkmkme/base64-benchmark/blob/main/main.cpp#L55 This isn't the best benchmark - unrealistic data. If the number 10000 means bytes, it is too much. Typical strings in a database are in order of 100 bytes. The best benchmark will be on URL, Title, SearchPhrase columns from ClickBench. |
@alexey-milovidov noted. I updated the benchmark to use the data taken from ClickBench. I used URL, Title and SearchPhrase from the dataset, where The new results are also there. And so far those are exactly the same as before: Please take a look at https://github.com/mkmkme/base64-benchmark/blob/main/README.md |
Hi, I've actually no arm64 benchmarks on M1. In your benchmark you're wrongly calling the scalar functions tb64senc/tb64sdec while the correct functions for all architectures are tb64enc/tb64dec (or the direct SIMD/NEON calls tb64v128enc/tb64v128dec) In general you can always call the auto functions: tb64enc/tb64dec (no need to initialize anything). |
Hey @powturbo, Thanks for your input! You're right I was using scalar functions in case of Turbo-Base64 functions on ClickHouse/src/Functions/FunctionBase64Conversion.h Lines 49 to 57 in 357fee9
|
I've now the results for apple M1. The benchmark was done by a third party. All libs with the latest versions. Note that base64 is using inline assembly.
|
@alexey-milovidov @rschu1ze what'd you say? Given the new information, do you still want to move to the other library? |
(sorry for the delay, just returned from vacation) The Turbo-Base64 version used by ClickHouse and in the benchmark is the last non-GPL version and ca. 3 years old. While the performance improvements since then (#31643 (comment)) are appreciated, they are not usable for ClickHouse due to licensing issues. The performance trade-off (slower x86, faster ARM) seems reasonable given that 1. base64-decoding is IMHO not exactly a bottleneck and 2. the version of aklomp/base64 used in the benchmark is also slightly outdated and there were more upstream optimizations for x86 since then. Overall, I think it's worth to move to aklomp/base64. |
The bug on AArch64
is a blocker for using this library. |
I'll double-check if I can still reproduce it |
Couldn't reproduce this on my AArch64 VM. |
https://github.com/lemire/fastbase64
The text was updated successfully, but these errors were encountered: