Refactor base64 encoding #14611
I keep forgetting this exists Co-authored-by: David Keller <davidkeller@tuta.io>
# Clear out each buffer before each spec
encode_buffer = IO::Memory.new
decode_buffer = IO::Memory.new
end
These 6 lines can be removed; put the initialization into `it` instead.
looks good.
this pr
private PAD = '='.ord.to_u8
private NL = '\n'.ord.to_u8
private NR = '\r'.ord.to_u8

{% begin %}
private STREAM_MAX_INPUT_BUFFER_SIZE = {{ IO::DEFAULT_BUFFER_SIZE // (LINE_SIZE + 1) * (LINE_SIZE // 4 * 3) }}
I don't understand why this is a macro and not just a constant.
It doesn't compile without the macro because the compiler cannot find the referenced constants (probably because they are private).
This should certainly work. They're in the same namespace. Might be a compiler bug.
Simplified:
module A
FOO = 1
BAR = FOO
end
buffer = uninitialized UInt8[A::BAR]
# => Error: undefined constant FOO
-> #13018
I also can't use the fix described in #13018, since I cannot refer to the constant by full path (private):
private STREAM_MAX_INPUT_BUFFER_SIZE = IO::DEFAULT_BUFFER_SIZE // (Base64::LINE_SIZE + 1) * Base64::LINE_BYTES
# Error: private constant Base64::LINE_SIZE referenced
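For readers following the arithmetic behind `STREAM_MAX_INPUT_BUFFER_SIZE`, here is a minimal sketch of the same expression with plain constants. The values `60` for `LINE_SIZE` and `32_768` for the buffer size are assumptions about the standard library and are not shown in the diff; check them against your Crystal version.

```crystal
# Assumed values (not taken from the diff): verify against your Crystal version.
LINE_SIZE           =     60 # encoded characters per output line (assumption)
DEFAULT_BUFFER_SIZE = 32_768 # stand-in for IO::DEFAULT_BUFFER_SIZE (assumption)

# Each output line occupies LINE_SIZE characters plus one newline.
lines_per_buffer = DEFAULT_BUFFER_SIZE // (LINE_SIZE + 1)
# Four base64 characters encode three input bytes.
input_per_line = LINE_SIZE // 4 * 3
# Largest input chunk whose encoded form still fits into one output buffer.
max_input = lines_per_buffer * input_per_line

puts lines_per_buffer # 537
puts input_per_line   # 45
puts max_input        # 24165
```

Under these assumptions, the constant limits each input chunk so that its encoded lines, including newlines, fit into a single default-sized output buffer.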
I thought memmove was the function which disallowed overlapped memory regions for performance reasons
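For reference, in C it is `memcpy` whose behaviour is undefined for overlapping regions; `memmove` explicitly permits overlap. Crystal's `Pointer#copy_from` and `Pointer#move_from` follow the same split. The sketch below is only an illustration of the overlapping case and is not code from this PR.

```crystal
buf = Bytes[1, 2, 3, 4, 5, 0, 0]

# Shift the first five bytes two positions to the right.
# Source (index 0..4) and destination (index 2..6) overlap,
# so move_from (memmove semantics) is used rather than copy_from.
(buf.to_unsafe + 2).move_from(buf.to_unsafe, 5)

p buf # Bytes[1, 2, 1, 2, 3, 4, 5]
```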
As a note to anyone reviewing this: In the bulk encoding part (-> Line 297 in b662f34)
This was adapted from the previous encoding method (Line 236 in f9ffa35).
The old method used an unaligned U32 access, which can also be seen in the generated IR:
body: ; preds = %while
%122 = load ptr, ptr %cstr, align 8, !dbg !17334
%123 = load i32, ptr %122, align 4, !dbg !17334 ; <- Notice the align 4
%124 = call i32 @"*UInt32#byte_swap:UInt32"(i32 %123), !dbg !17335
The new variant should now always work correctly, but may be less performant than only reading 3 bytes on weak-memory architectures. Hopefully we'll find an even more efficient method for the bulk encoding work once Crystal gets SIMD support.
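To illustrate the alignment point, here is a minimal sketch of one alignment-safe way to turn three input bytes into four base64 characters using only byte-wise loads. It is not the code from this PR; `encode_triplet` and `ALPHABET` are hypothetical names introduced for the example.

```crystal
# Standard base64 alphabet (RFC 4648).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Hypothetical helper: encodes exactly three input bytes into four characters.
def encode_triplet(input : Bytes) : String
  # Single-byte loads have no alignment requirement, unlike a UInt32 load.
  n = (input[0].to_u32 << 16) | (input[1].to_u32 << 8) | input[2].to_u32
  String.build(4) do |str|
    str << ALPHABET[(n >> 18) & 0x3f]
    str << ALPHABET[(n >> 12) & 0x3f]
    str << ALPHABET[(n >> 6) & 0x3f]
    str << ALPHABET[n & 0x3f]
  end
end

puts encode_triplet("Man".to_slice) # TWFu
```

The trade-off is three narrow loads per group instead of one wide load, which is the kind of cost the comment above alludes to.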
Co-authored-by: Jason Frey <fryguy9@gmail.com>
This looks great! Thanks for improving on my PR!
This PR rewrites the entire base64-encoding logic.
While doing so, it adds methods used to encode (normal, urlsafe and strict) base64 from one `IO` into another.

Also, `urlsafe_encode(data, io)` now receives an optional `padding = true` parameter to reach feature parity between all ways to encode base64 (buffer to buffer, buffer to IO, IO to IO).

Encoding from buffer to buffer only saw small performance improvements, but everything related to `IO`s saw very significant performance improvements.

The specs from #14604 have been copied over to this PR, although more should probably be added.
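As a point of reference for the encoding paths mentioned above, the following sketch uses only the buffer-to-buffer and buffer-to-IO methods that already exist in the standard library. The IO-to-IO methods and the new `padding` argument on the IO overload are introduced by this PR and are deliberately not shown, since their exact signatures are not quoted here.

```crystal
require "base64"

data = "hello"

# Buffer to buffer: returns a String.
puts Base64.strict_encode(data)         # aGVsbG8=
puts Base64.urlsafe_encode(data, false) # aGVsbG8 (padding disabled)

# Buffer to IO: writes the encoded form into an existing IO.
io = IO::Memory.new
Base64.strict_encode(data, io)
puts io.to_s # aGVsbG8=
```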
Benchmark Code
For this benchmark, I copied the `base64.cr` file from this repo into `base64blob.cr` and renamed the module inside to `Base64Blob`. This way, both variants can be used in parallel.
Direct Comparison
Throughput:
My Benchmark Results (Fedora 40, Ryzen 3600)
Direct comparisons
Throughput
Reading from and writing to `/dev/zero`, the code needed `490.62ms` per `1 GiB` (-> 2.04GiB / 2.19GB per second) in strict mode and `565.33ms` using the default `#encode` (-> 1.77GiB / 1.90GB per second).

Closes #14604
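The throughput figures quoted in the benchmark results follow directly from the measured times, using $1\,\text{GiB} = 1.073741824\,\text{GB}$:

```math
\begin{aligned}
\text{strict:} \quad & \frac{1\,\text{GiB}}{0.49062\,\text{s}} \approx 2.04\,\text{GiB/s} \approx 2.04 \times 1.0737\,\text{GB/s} \approx 2.19\,\text{GB/s} \\
\text{default:} \quad & \frac{1\,\text{GiB}}{0.56533\,\text{s}} \approx 1.77\,\text{GiB/s} \approx 1.77 \times 1.0737\,\text{GB/s} \approx 1.90\,\text{GB/s}
\end{aligned}
```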