base64 decode faster #10032
- by using a lookup table instead of strchr()
- by doing full quantums first, then padding
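The two changes listed above can be sketched in C. This is a minimal illustration under our own assumptions, not the code from this PR. The names `b64_lookup`, `b64_init` and `b64_decode` are hypothetical. The table is filled at start-up rather than written out as a `static const` initializer. The decoder accepts only padded input with no whitespace. Each input byte maps to its 6-bit value with one table load instead of a strchr() scan of the alphabet. All complete 4-character quanta are decoded in a tight loop, and the final, possibly padded, quantum is handled once at the end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 256-entry reverse table: alphabet byte -> 6-bit value.
   0xff marks bytes that are not base64 digits (including '='). */
static unsigned char b64_lookup[256];

static void b64_init(void)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  memset(b64_lookup, 0xff, sizeof(b64_lookup));
  for(int i = 0; i < 64; i++)
    b64_lookup[(unsigned char)alphabet[i]] = (unsigned char)i;
}

/* Returns the number of decoded bytes, or -1 on malformed input.
   'out' must have room for srclen / 4 * 3 bytes. */
static long b64_decode(const char *src, size_t srclen, unsigned char *out)
{
  size_t i, full, padding = 0;
  unsigned char *o = out;

  if(srclen == 0 || srclen % 4)
    return -1;

  /* Count trailing '=' characters (at most two). */
  if(src[srclen - 1] == '=') {
    padding++;
    if(src[srclen - 2] == '=')
      padding++;
  }

  /* Full quanta first: every quantum that contains no padding. */
  full = srclen / 4 - (padding ? 1 : 0);
  for(i = 0; i < full; i++) {
    const unsigned char *q = (const unsigned char *)src + i * 4;
    unsigned char a = b64_lookup[q[0]], b = b64_lookup[q[1]];
    unsigned char c = b64_lookup[q[2]], d = b64_lookup[q[3]];
    if((a | b | c | d) & 0x80)   /* only 0xff entries have bit 7 set */
      return -1;
    unsigned long v = ((unsigned long)a << 18) | ((unsigned long)b << 12) |
                      ((unsigned long)c << 6) | d;
    *o++ = (unsigned char)(v >> 16);
    *o++ = (unsigned char)(v >> 8);
    *o++ = (unsigned char)v;
  }

  /* Then the single padded quantum, if any: "xx==" or "xxx=". */
  if(padding) {
    const unsigned char *q = (const unsigned char *)src + full * 4;
    unsigned char a = b64_lookup[q[0]], b = b64_lookup[q[1]];
    unsigned char c = (padding == 1) ? b64_lookup[q[2]] : 0;
    if((a | b | c) & 0x80)
      return -1;
    unsigned long v = ((unsigned long)a << 18) | ((unsigned long)b << 12) |
                      ((unsigned long)c << 6);
    *o++ = (unsigned char)(v >> 16);
    if(padding == 1)
      *o++ = (unsigned char)(v >> 8);
  }
  return (long)(o - out);
}
```

Call `b64_init()` once before decoding. With this sketch, decoding "TWFu" yields the three bytes "Man". Keeping the padding check out of the main loop means the loop body has no per-quantum branch on '='.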
I see a tiny speedup on your test program, roughly 10s -> 9s, with this patch.
Yes, I think that is pretty obvious if we can accept the added memory requirement. I went with my solution to keep it smaller.
(Email reply to Markus Linnala's comment of December 5, 2022, 6:13:56 PM, quoted above.)
Maybe I should let the user select which way to go with an ifdef...
Thanks @bagder for this great improvement. From a quick analysis, your original solution uses less memory, but there are some caveats:

The original solution has a higher peak memory use than the second one while the function executes. Also note that the operations that copy the data to the stack need space in the code segment too: any compiler with a decent optimizer replaces both memcpy calls with other instructions (see this example: https://godbolt.org/z/o8995W7qv).

- GCC (-O2, x86-64) turns the memcpy calls into instructions that take 112 bytes in the code segment, so the original solution costs 80 + 112 = 192 bytes, plus 256 bytes of stack when the function is called.
- Compiled with Clang, the original solution costs 80 + 192 = 272 bytes (more than the second solution), plus 256 bytes of stack.

Given these numbers, I strongly recommend using the second version, and perhaps adding an ifdef so users can change it if required, as you suggested. What do you think?
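The thread shows neither implementation. One hedged reading of the 80-byte figure: the character range from '+' (0x2B) to 'z' (0x7A) spans exactly 80 values, so a reverse table covering only that range is 80 bytes, compared with 256 bytes for a table indexed by every byte value. The sketch below illustrates that size trade-off under this assumption; it is not the code from this PR. The names `compact`, `BAD` and `compact_value` are ours.

```c
/* Hypothetical compact reverse table covering '+' .. 'z':
   'z' - '+' + 1 == 80 entries. BAD marks in-range bytes that are
   not base64 digits. */
#define BAD 0xff

static const unsigned char compact['z' - '+' + 1] = {
  62, BAD, BAD, BAD, 63,                    /* '+' ',' '-' '.' '/' */
  52, 53, 54, 55, 56, 57, 58, 59, 60, 61,   /* '0' .. '9' */
  BAD, BAD, BAD, BAD, BAD, BAD, BAD,        /* ':' ';' '<' '=' '>' '?' '@' */
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
  13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,  /* 'A' .. 'Z' */
  BAD, BAD, BAD, BAD, BAD, BAD,             /* '[' '\\' ']' '^' '_' '`' */
  26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
  39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51   /* 'a' .. 'z' */
};

/* Returns the 6-bit value of c, or -1 if c is not a base64 digit.
   The range check is the extra work paid for the smaller table; a full
   256-entry table needs only the indexed load. */
static int compact_value(unsigned char c)
{
  if(c < '+' || c > 'z')
    return -1;
  unsigned char v = compact[c - '+'];
  return v == BAD ? -1 : v;
}
```

The design choice is the one debated above: 176 fewer bytes of table data in exchange for two comparisons per input character.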
I've moved on. I don't believe in adding ifdefs for this, because nobody will care enough to change them. If someone feels strongly about further changes, propose them in a PR and explain why we want them.
Test
The test decodes "short" strings, up to 2948 bytes in length, and loops a thousand times.
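The test program itself is not reproduced here. The harness below is a hedged sketch of one way to isolate the first change (a table lookup instead of strchr()). It feeds every character of inputs of growing length through either lookup and repeats the whole pass `loops` times. The length schedule (multiples of 4 up to `maxlen`), the use of clock(), and all names are our assumptions. It times only the per-character lookup, not a full decode, so its ratio should not be expected to match the numbers reported here.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

static const char alphabet[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* strchr()-style lookup: a linear scan of the 64-character alphabet for
   every input byte. The NUL check stops strchr() from matching the
   string terminator. */
static int value_strchr(unsigned char c)
{
  const char *p = c ? strchr(alphabet, c) : NULL;
  return p ? (int)(p - alphabet) : -1;
}

/* Table lookup: one indexed load per input byte; -1 marks non-digits. */
static signed char table[256];

static void table_init(void)
{
  memset(table, -1, sizeof(table));
  for(int i = 0; i < 64; i++)
    table[(unsigned char)alphabet[i]] = (signed char)i;
}

static int value_table(unsigned char c)
{
  return table[c];
}

/* Hypothetical harness: for each length 4, 8, ... up to maxlen, map each
   byte of input[0..len) through lookup; repeat loops times. Returns CPU
   seconds and stores a checksum so the work is not optimized away.
   input must hold at least maxlen bytes. */
static double bench(int (*lookup)(unsigned char), const char *input,
                    size_t maxlen, int loops, long long *checksum)
{
  long long sum = 0;
  clock_t start = clock();
  for(int l = 0; l < loops; l++)
    for(size_t len = 4; len <= maxlen; len += 4)
      for(size_t i = 0; i < len; i++)
        sum += lookup((unsigned char)input[i]);
  *checksum = sum;
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To approximate the described workload, call `table_init()`, fill a 2948-byte buffer with alphabet characters, and compare `bench(value_strchr, input, 2948, 1000, &sum)` with `bench(value_table, input, 2948, 1000, &sum)`. Equal checksums confirm both lookups return the same values.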
test code
Numbers
Best out of three runs on my local machine.
Old code: 1m47.593s
New code: 24.098s
Speedup: 4.46 times.
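For reference, the reported speedup follows from converting the old time to seconds. The same two figures also mean that the new code cuts the run time by roughly 78%:

```latex
t_{\text{old}} = 1\,\text{min}\ 47.593\,\text{s} = 107.593\,\text{s}, \qquad
t_{\text{new}} = 24.098\,\text{s}

\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{107.593}{24.098} \approx 4.46, \qquad
1 - \frac{t_{\text{new}}}{t_{\text{old}}} \approx 0.776
```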