
Allow passing an IO as the Base64.encode source #14604

Closed
wants to merge 2 commits

Conversation

jgaskins (Contributor):

Fixes #14603

I haven't added decoding yet because I didn't already have that code written. This is a decent proof of concept: it generates 100MB of binary data, writes it to a file on disk, then base64-encodes it as JSON to another file. It pauses before exiting so I could check how much RAM it was using, and it only uses 2.1MB of RAM for all of this.

require "base64"
require "json"

# Generate the giant binary file data
puts "Generating 100MB of binary data..."
File.open "foo.data", "w" do |f|
  (100 << 20).times do
    f.write_byte rand(UInt8)
  end
end

pp upload = FileUpload.new("output", Path["foo.data"])

puts "Writing JSON..."
File.open "output", "w" do |file|
  upload.to_json file
end

puts "Done."
# Pause before exiting to be able to check the amount of RAM used.
gets

struct FileUpload
  include JSON::Serializable

  getter filename : String
  @[JSON::Field(key: "data", converter: FileUpload::Base64Converter)]
  getter path : Path

  def initialize(@filename, @path)
  end

  module Base64Converter
    extend self

    # Output the file contents as base64 directly to the `JSON::Builder`'s `IO`
    def to_json(path : Path, json : JSON::Builder)
      File.open path do |file|
        json.string do |io|
          Base64.encode file, io
        end
      end
    end
  end
end

inc = 0
# Base64 operates in 3-byte segments, so we use a buffer size that is a
# multiple of 3.
buffer = StaticArray(UInt8, 8193).new { 0u8 }
Contributor:

Zeroing out 8193 bytes is way too expensive for small inputs.

Suggested change:
- buffer = StaticArray(UInt8, 8193).new { 0u8 }
+ buffer = uninitialized UInt8[8193]

jgaskins (Contributor Author):

Can you elaborate on "too expensive"? How expensive is it?

BlobCodes (Contributor), May 22, 2024:

crystal eval '
require "benchmark"

io = IO::Memory.new(10000)

Benchmark.ips do |x|
  x.report("zero") { buffer = StaticArray(UInt8, 8193).new { 0u8 }; io.read(buffer.to_slice); io.rewind; nil }
  x.report("nozero") { buffer = uninitialized UInt8[8193]; io.read(buffer.to_slice); io.rewind; nil }
end
' --release
  zero  14.52M ( 68.87ns) (± 0.09%)  0.0B/op  13.13× slower
nozero 190.61M (  5.25ns) (± 0.22%)  0.0B/op        fastest

Of course this is only synthetic, but encoding a few characters should take around 10-20ns on a modern CPU, so this would unnecessarily increase compute time for zero gain.

EDIT: By the way, note that the io is only there so LLVM doesn't optimize the buffers away; no data is copied at all in the benchmark above. (Copying would change the results to 140.48ns and 71.99ns, respectively.)

jgaskins (Contributor Author):

While I'm not disputing that it's faster not to zero it out (all else being equal, it is doing a subset of the work), calling an operation measured in tens of nanoseconds "way too expensive" is simply incorrect.

I benchmarked the end-to-end process of encoding data from a file to another file, using a very small input like you mentioned. No matter how many times I run it, or which order they run in, the performance is indistinguishable because opening the files is 3 orders of magnitude more expensive than zeroing out 8KB of RAM — this would be even more pronounced on cloud servers that typically have slower hard drives than my MacBook Pro.

➜  Code crystal run --release bench_b64_encode_io.cr
nozero  20.42k ( 48.96µs) (±10.30%)  65.2kB/op   1.00× slower
  zero  20.52k ( 48.74µs) (± 8.75%)  65.2kB/op        fastest
➜  Code crystal run --release bench_b64_encode_io.cr
nozero  20.31k ( 49.23µs) (± 8.96%)  65.2kB/op   1.03× slower
  zero  20.84k ( 47.99µs) (±10.20%)  65.2kB/op        fastest
➜  Code crystal run --release bench_b64_encode_io.cr
nozero  20.59k ( 48.56µs) (± 9.61%)  65.2kB/op        fastest
  zero  20.27k ( 49.33µs) (±11.81%)  65.2kB/op   1.02× slower
Benchmark code:
require "benchmark"

input = "data.bin"
output = "data.out"
File.open input, "w" do |file|
  5.times { file << "." }
end

Benchmark.ips do |x|
  x.report("nozero") do
    File.open input do |source|
      File.open output, "w" do |destination|
        Base64.encode source.rewind, destination.rewind, zero: false
      end
    end
  end
  x.report("zero") do
    File.open input do |source|
      File.open output, "w" do |destination|
        Base64.encode source.rewind, destination.rewind, zero: true
      end
    end
  end
end

module Base64
  def encode(data : IO, io : IO, zero : Bool)
    count = 0
    encode_with_new_lines(data, zero: zero) do |byte|
      io << byte.unsafe_chr
      count += 1
    end
  end

  private def encode_with_new_lines(io : IO, zero : Bool, &)
    inc = 0
    # Base64 operates in 3-byte segments, so we use a buffer size that is a
    # multiple of 3.
    if zero
      buffer = StaticArray(UInt8, 8193).new { 0u8 }
    else
      buffer = uninitialized UInt8[8193]
    end
    bytes = buffer.to_slice

    loop do
      bytes_read = io.read(bytes)
      break if bytes_read == 0

      to_base64(bytes[0, bytes_read], CHARS_STD, pad: bytes_read < bytes.bytesize) do |byte|
        yield byte
        inc += 1
        if inc >= LINE_SIZE
          yield NL
          inc = 0
        end
      end
    end
    if inc > 0
      yield NL
    end
  end
end

Member:

Let's talk about the reasons for zeroing out. Why do you think it's necessary in the first place?
We know how much data is written to the buffer and thus can scale the slice that points into it accordingly. So it shouldn't matter whether the parts that are not written to hold zeros or garbage.
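A minimal sketch of that idea (using an IO::Memory source purely for illustration): only the prefix of the buffer that #read actually filled is ever exposed, so any garbage in an uninitialized buffer is never observed.

```crystal
io = IO::Memory.new("hello")

# The buffer starts out holding garbage, but we never look past bytes_read.
buffer = uninitialized UInt8[8193]
bytes_read = io.read(buffer.to_slice)

# Scale the slice to the bytes actually written; garbage past it is ignored.
valid = buffer.to_slice[0, bytes_read]
puts String.new(valid) # => "hello"
```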

Member:

> Why do you think I think it's necessary?

Because you do it. And I understand your comments in this thread as supportive of using the zeroing constructor instead of an uninitialised StaticArray as suggested in #14604 (comment)

jgaskins (Contributor Author):

> Because you do it.

StaticArray(Type, Size).new doesn't work, so I passed the block because I know that works. That's really all there was to it.

 77 | buffer = StaticArray(UInt8, 8193).new
                                        ^--
Error: private method 'new' called for StaticArray(T, N).class

I don't use StaticArrays frequently enough to remember all the details. I use them about 1-2x a year and the only thing I ever remember about them is that they should never leave the stack frame they were instantiated in. 😄

> I understand your comments in this thread as supportive of using the zeroing constructor instead of an uninitialised StaticArray

There is a very important distinction between rejecting invalid criticism and defending minutiae in the code. I was doing the former and avoiding the latter.

Sija (Contributor), May 22, 2024:

@jgaskins

> StaticArray(Type, Size).new doesn't work, so I passed the block because I know that works [...]

StaticArray(UInt8, 8193).new(0_u8) should work as well.

straight-shoota (Member), May 22, 2024:

Okay. 👍

It's usually best to use uninitialized StaticArray(...) when you don't need any specific initialized data. That's the case here, so the original suggestion was good.
The argument about expensiveness might have some subjectivity, but I think we can agree that zeroing is completely unnecessary here. Not doing it has the potential of improving performance. However significant that may be in the greater scheme, it's the right thing to avoid unnecessary extras when possible.

Contributor:

> Dismissing actual performance numbers is an odd choice, though.

Benchmarks have lied to me far too much; for example, swap the x.report calls around and the first one is always faster 🤷 Benchmarks are useful, but they don't always tell the whole story. The assembly, at least, doesn't lie, however hard it is to decipher 😂

src/base64.cr (outdated, resolved)
bytes_read = io.read(bytes)
break if bytes_read == 0

to_base64(bytes[0, bytes_read], CHARS_STD, pad: bytes_read < bytes.bytesize) do |byte|
Contributor:

The pad parameter only does something if the number of bytes isn't divisible by 3.
Since that is impossible when bytes_read == bytes.bytesize (8193 is a multiple of 3), the check can be replaced by a constant true.

Suggested change:
- to_base64(bytes[0, bytes_read], CHARS_STD, pad: bytes_read < bytes.bytesize) do |byte|
+ to_base64(bytes[0, bytes_read], CHARS_STD, pad: true) do |byte|
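For illustration (using the stdlib's Base64.strict_encode): padding only ever appears for inputs whose size isn't a multiple of 3, so a completely filled 8193-byte buffer can never produce padding.

```crystal
require "base64"

# "=" padding is only emitted when the input size isn't a multiple of 3.
# Since 8193 = 3 * 2731, a completely filled buffer never produces padding,
# which is why passing `pad: true` unconditionally is harmless for full reads.
puts Base64.strict_encode("P")   # => "UA=="  (1 byte: two padding chars)
puts Base64.strict_encode("PP")  # => "UFA="  (2 bytes: one padding char)
puts Base64.strict_encode("PPP") # => "UFBQ"  (3 bytes: no padding)
```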

BlobCodes (Contributor), May 22, 2024:

Actually, thinking more about this, my comment only holds if the IO always fills the buffer completely when possible.

IO#read only specifies that it "Reads at most slice.size bytes".
If I create an IO which only fills the buffer with one byte at a time, the current code doesn't work at all.

NOTE: Maybe a new method IO#read_greedy should be added to the stdlib to accommodate this use case?

class BadButValidIO < IO
  property size : Int32 = 10
  property pos : Int32 = 0

  def read(slice : Bytes)
    return 0 if @size <= @pos
    slice[0] = 80
    @pos += 1
    1
  end

  def write(slice : Bytes) : Nil
    raise NotImplementedError.new
  end
end
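The IO#read_greedy idea floated above could be sketched roughly like this (read_greedy is hypothetical, not a real stdlib method, and OneByteIO is a made-up drip-feeding IO in the spirit of BadButValidIO):

```crystal
class IO
  # Hypothetical helper: keep calling `#read` until the slice is full or the
  # IO reports EOF, so a partially filled buffer can only occur at the very
  # end of the stream.
  def read_greedy(slice : Bytes) : Int32
    total = 0
    while total < slice.size
      bytes_read = read(slice + total)
      break if bytes_read == 0 # EOF: no more data will arrive
      total += bytes_read
    end
    total
  end
end

# Drip-feeding IO that returns one byte per #read call.
class OneByteIO < IO
  @data = "hello world".to_slice
  @pos = 0

  def read(slice : Bytes) : Int32
    return 0 if @pos >= @data.size
    slice[0] = @data[@pos]
    @pos += 1
    1
  end

  def write(slice : Bytes) : Nil
    raise NotImplementedError.new("write")
  end
end

buffer = Bytes.new(8)
puts OneByteIO.new.read_greedy(buffer) # => 8 (a single #read would return 1)
puts String.new(buffer)                # => "hello wo"
```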

jgaskins (Contributor Author):

My first implementation had true in there and it added padding internally. That's the reason I added the inequality expression.

Can you provide any failing tests for cases you've found where this doesn't work?

BlobCodes (Contributor), May 22, 2024:

> My first implementation had true in there and it added padding internally

Then disabling the explicit padding only hides that the code doesn't work.

Here's an example:

class BadButValidIO < IO
  property size : Int32 = 10
  property pos : Int32 = 0

  def read(slice : Bytes)
    return 0 if @size <= @pos
    slice[0] = 80
    @pos += 1
    1
  end

  def write(slice : Bytes) : Nil
    raise NotImplementedError.new
  end
end

puts "Input:"
puts BadButValidIO.new.gets_to_end

encoded = IO::Memory.new
Base64.encode(BadButValidIO.new, encoded)
encoded.rewind

puts "Encoded:"
puts encoded

puts "Decoded:"
puts Base64.decode_string(encoded.gets_to_end)

If you turn the BadButValidIO into a string before encoding it, everything works correctly.
With this PR, the IO is encoded to "UA==UA==UA==UA==UA==UA==UA==UA==UA==UA==".

jgaskins (Contributor Author):

> Then disabling the explicit padding only hides that the code doesn't work.

You mean it doesn't work for this one very specific and pathological case? Saying it doesn't work, unqualified, is incorrect. If it were true, it wouldn't have passed the specs I included.

Member:

IO#read is only guaranteed to read at least 1 byte. You need to account for that and cannot assume bytes is filled.
The specs work because they use IO::Memory which has all data available in memory and thus can always fill the read buffer to the brim. If you read from an IO that needs to wait for data to become available, it may give you much smaller chunks than bytes.bytesize (even a buffered IO).
io_spec.cr and other specs have special IO implementations that simulate this behaviour, like the BadButValidIO example.

jgaskins (Contributor Author):

I didn't object to handling that case. I was responding to an unqualified "the code doesn't work".

I keep forgetting this exists

Co-authored-by: David Keller <davidkeller@tuta.io>
spec/std/base64_spec.cr (resolved)
Comment on lines +65 to +72
encode_buffer = IO::Memory.new
decode_buffer = IO::Memory.new

before_each do
  # Clear out each buffer before each spec
  encode_buffer = IO::Memory.new
  decode_buffer = IO::Memory.new
end
Member:

question: Why do you define these variables outside the examples?
This could be useful if you're planning to reuse the buffers (with IO::Memory#clear), but with fresh instances it would be easier to just push the initialization into the test block.

jgaskins (Contributor Author):

> Why do you define these variables outside the examples?

It was how I wrote the specs in my app because, when I wrote it, I just wanted to make it work with sufficient confidence. It was a pure copy/paste from my app into this PR. I was trying to push the PR in between a bunch of other things during a very busy evening. I didn't have time to polish it up.
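For reference, the buffer-reuse pattern the reviewer mentions (IO::Memory#clear) would look roughly like this sketch:

```crystal
# Reusing one IO::Memory across iterations with #clear, instead of
# allocating a fresh instance each time as the specs in this PR do.
buffer = IO::Memory.new

2.times do |i|
  buffer.clear # empties the buffer and resets the position
  buffer << "run #{i}"
  puts buffer.to_s
end
# Each iteration sees an empty buffer: prints "run 0", then "run 1".
```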

jgaskins (Contributor Author) commented Jun 1, 2024:

Closing in favor of #14611. It's a more accurate and delightfully faster implementation. This PR was based on code I cranked out very quickly to handle a very specific scenario in one of my apps.

jgaskins closed this on Jun 1, 2024.
Successfully merging this pull request may close these issues:

- Add API for Base64.encode / Base64.decode with an IO as the source

6 participants