
Allow passing an IO as the Base64.encode source #14604

Closed
wants to merge 2 commits

Conversation

jgaskins (Contributor):

Fixes #14603

I haven't added decoding yet because I didn't already have that code written. This is a decent proof of concept: it generates 100MB of binary data, writes it to a file on disk, then base64-encodes it as JSON to another file. It pauses before exiting so I could check how much RAM it was using, and it only uses 2.1MB of RAM for all of this.

require "base64"
require "json"

# Generate the giant binary file data
puts "Generating 100MB of binary data..."
File.open "foo.data", "w" do |f|
  (100 << 20).times do
    f.write_byte rand(UInt8)
  end
end

pp upload = FileUpload.new("output", Path["foo.data"])

puts "Writing JSON..."
File.open "output", "w" do |file|
  upload.to_json file
end

puts "Done."
# Pause before exiting to be able to check the amount of RAM used.
gets

struct FileUpload
  include JSON::Serializable

  getter filename : String
  @[JSON::Field(key: "data", converter: FileUpload::Base64Converter)]
  getter path : Path

  def initialize(@filename, @path)
  end

  module Base64Converter
    extend self

    # Output the file contents as base64 directly to the `JSON::Builder`'s `IO`
    def to_json(path : Path, json : JSON::Builder)
      File.open path do |file|
        json.string do |io|
          Base64.encode file, io
        end
      end
    end
  end
end

inc = 0
# Base64 operates in 3-byte segments, so we use a buffer size that is a
# multiple of 3.
buffer = StaticArray(UInt8, 8193).new { 0u8 }
Contributor:

Zeroing out 8193 bytes is way too expensive for small inputs.

Suggested change:
- buffer = StaticArray(UInt8, 8193).new { 0u8 }
+ buffer = uninitialized UInt8[8193]

jgaskins (Contributor Author):

Can you elaborate on "too expensive"? How expensive is it?

BlobCodes (Contributor), May 22, 2024:

crystal eval '
require "benchmark"

io = IO::Memory.new(10000)

Benchmark.ips do |x|
  x.report("zero") { buffer = StaticArray(UInt8, 8193).new { 0u8 }; io.read(buffer.to_slice); io.rewind; nil }
  x.report("nozero") { buffer = uninitialized UInt8[8193]; io.read(buffer.to_slice); io.rewind; nil }
end
' --release
  zero  14.52M ( 68.87ns) (± 0.09%)  0.0B/op  13.13× slower
nozero 190.61M (  5.25ns) (± 0.22%)  0.0B/op        fastest

Of course this is only synthetic, but encoding a few characters should take around 10-20ns on a modern CPU, so this would unnecessarily increase compute time for zero gain.

EDIT: By the way, note that the io is only there so LLVM doesn't optimize the buffers away; no data is copied at all in the benchmark above. (Copying would change the results to 140.48ns and 71.99ns, respectively.)

jgaskins (Contributor Author):

While I'm not disputing that it's faster not to zero it out (all else being equal, it is doing a subset of the work), calling an operation measured in tens of nanoseconds "way too expensive" is simply incorrect.

I benchmarked the end-to-end process of encoding data from a file to another file, using a very small input like you mentioned. No matter how many times I run it, or which order they run in, the performance is indistinguishable because opening the files is 3 orders of magnitude more expensive than zeroing out 8KB of RAM — this would be even more pronounced on cloud servers that typically have slower hard drives than my MacBook Pro.

➜  Code crystal run --release bench_b64_encode_io.cr
nozero  20.42k ( 48.96µs) (±10.30%)  65.2kB/op   1.00× slower
  zero  20.52k ( 48.74µs) (± 8.75%)  65.2kB/op        fastest
➜  Code crystal run --release bench_b64_encode_io.cr
nozero  20.31k ( 49.23µs) (± 8.96%)  65.2kB/op   1.03× slower
  zero  20.84k ( 47.99µs) (±10.20%)  65.2kB/op        fastest
➜  Code crystal run --release bench_b64_encode_io.cr
nozero  20.59k ( 48.56µs) (± 9.61%)  65.2kB/op        fastest
  zero  20.27k ( 49.33µs) (±11.81%)  65.2kB/op   1.02× slower
Benchmark code:
require "benchmark"

input = "data.bin"
output = "data.out"
File.open input, "w" do |file|
  5.times { file << "." }
end

Benchmark.ips do |x|
  x.report("nozero") do
    File.open input do |source|
      File.open output, "w" do |destination|
        Base64.encode source.rewind, destination.rewind, zero: false
      end
    end
  end
  x.report("zero") do
    File.open input do |source|
      File.open output, "w" do |destination|
        Base64.encode source.rewind, destination.rewind, zero: true
      end
    end
  end
end

module Base64
  def encode(data : IO, io : IO, zero : Bool)
    count = 0
    encode_with_new_lines(data, zero: zero) do |byte|
      io << byte.unsafe_chr
      count += 1
    end
  end

  private def encode_with_new_lines(io : IO, zero : Bool, &)
    inc = 0
    # Base64 operates in 3-byte segments, so we use a buffer size that is a
    # multiple of 3.
    if zero
      buffer = StaticArray(UInt8, 8193).new { 0u8 }
    else
      buffer = uninitialized UInt8[8193]
    end
    bytes = buffer.to_slice

    loop do
      bytes_read = io.read(bytes)
      break if bytes_read == 0

      to_base64(bytes[0, bytes_read], CHARS_STD, pad: bytes_read < bytes.bytesize) do |byte|
        yield byte
        inc += 1
        if inc >= LINE_SIZE
          yield NL
          inc = 0
        end
      end
    end
    if inc > 0
      yield NL
    end
  end
end

Member:

Let's talk about the reasons for zeroing out. Why do you think it's necessary in the first place?
We know how much data is written to the buffer and thus can scale the slice that points into it accordingly. So it shouldn't matter whether the parts that are not written to hold zeros or garbage.
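A minimal sketch of that idea (using an IO::Memory source purely for illustration): only the prefix of the buffer that #read actually filled is ever exposed, so any garbage in an uninitialized buffer is never observed.

```crystal
io = IO::Memory.new("hello")

# The buffer starts out holding garbage, but we never look past bytes_read.
buffer = uninitialized UInt8[8193]
bytes_read = io.read(buffer.to_slice)

# Scale the slice to the bytes actually written; garbage past it is ignored.
valid = buffer.to_slice[0, bytes_read]
puts String.new(valid) # => "hello"
```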

Member:

> Why do you think I think it's necessary?

Because you do it. And I understand your comments in this thread as supportive of using the zeroing constructor instead of an uninitialised StaticArray as suggested in #14604 (comment)

jgaskins (Contributor Author):

> Because you do it.

StaticArray(Type, Size).new doesn't work, so I passed the block because I know that works. That's really all there was to it.

 77 | buffer = StaticArray(UInt8, 8193).new
                                        ^--
Error: private method 'new' called for StaticArray(T, N).class

I don't use StaticArrays frequently enough to remember all the details. I use them about 1-2x a year and the only thing I ever remember about them is that they should never leave the stack frame they were instantiated in. 😄

> I understand your comments in this thread as supportive of using the zeroing constructor instead of an uninitialised StaticArray

There is a very important distinction between rejecting invalid criticism and defending minutiae in the code. I was doing the former and avoiding the latter.

Sija (Contributor), May 22, 2024:

@jgaskins

> StaticArray(Type, Size).new doesn't work, so I passed the block because I know that works [...]

StaticArray(UInt8, 8193).new(0_u8) should work as well.

straight-shoota (Member), May 22, 2024:

Okay. 👍

It's usually best to use uninitialized StaticArray(...) when you don't need any specific initialized data. That's the case here, so the original suggestion was good.
The argument about expensiveness might have some subjectivity, but I think we can agree that zeroing is completely unnecessary here. Not doing it has the potential of improving performance. However significant that may be in the greater scheme, it's the right thing to avoid unnecessary extras when possible.

Contributor:

> Dismissing actual performance numbers is an odd choice, though.

Benchmarks have lied to me far too much; for example, swap the x.report calls around and the first one is always faster 🤷 Benchmarks are useful, but they don't always tell the whole story. The assembly, at least, doesn't lie, however hard it is to decipher 😂

src/base64.cr (outdated, resolved)
bytes_read = io.read(bytes)
break if bytes_read == 0

to_base64(bytes[0, bytes_read], CHARS_STD, pad: bytes_read < bytes.bytesize) do |byte|
Contributor:

The pad parameter only does something if the number of bytes isn't divisible by 3.
Since that is impossible when bytes_read == bytes.bytesize (8193 is a multiple of 3), the check can be replaced by a constant true.

Suggested change:
- to_base64(bytes[0, bytes_read], CHARS_STD, pad: bytes_read < bytes.bytesize) do |byte|
+ to_base64(bytes[0, bytes_read], CHARS_STD, pad: true) do |byte|
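For illustration (using the stdlib's Base64.strict_encode): padding only ever appears for inputs whose size isn't a multiple of 3, so a completely filled 8193-byte buffer can never produce padding.

```crystal
require "base64"

# "=" padding is only emitted when the input size isn't a multiple of 3.
# Since 8193 = 3 * 2731, a completely filled buffer never produces padding,
# which is why passing `pad: true` unconditionally is harmless for full reads.
puts Base64.strict_encode("P")   # => "UA=="  (1 byte: two padding chars)
puts Base64.strict_encode("PP")  # => "UFA="  (2 bytes: one padding char)
puts Base64.strict_encode("PPP") # => "UFBQ"  (3 bytes: no padding)
```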

BlobCodes (Contributor), May 22, 2024:

Actually, thinking more about this, my comment only holds if the IO always fills the buffer completely when possible.

IO#read only specifies that it "Reads at most slice.size bytes".
If I create an IO which only fills the buffer with one byte at a time, the current code doesn't work at all.

NOTE: Maybe a new method IO#read_greedy should be added to the stdlib to accommodate this use case?

class BadButValidIO < IO
  property size : Int32 = 10
  property pos : Int32 = 0

  def read(slice : Bytes)
    return 0 if @size <= @pos
    slice[0] = 80
    @pos += 1
    1
  end

  def write(slice : Bytes) : Nil
    raise NotImplementedError.new
  end
end
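The IO#read_greedy idea floated above could be sketched roughly like this (read_greedy is hypothetical, not a real stdlib method, and OneByteIO is a made-up drip-feeding IO in the spirit of BadButValidIO):

```crystal
class IO
  # Hypothetical helper: keep calling `#read` until the slice is full or the
  # IO reports EOF, so a partially filled buffer can only occur at the very
  # end of the stream.
  def read_greedy(slice : Bytes) : Int32
    total = 0
    while total < slice.size
      bytes_read = read(slice + total)
      break if bytes_read == 0 # EOF: no more data will arrive
      total += bytes_read
    end
    total
  end
end

# Drip-feeding IO that returns one byte per #read call.
class OneByteIO < IO
  @data = "hello world".to_slice
  @pos = 0

  def read(slice : Bytes) : Int32
    return 0 if @pos >= @data.size
    slice[0] = @data[@pos]
    @pos += 1
    1
  end

  def write(slice : Bytes) : Nil
    raise NotImplementedError.new("write")
  end
end

buffer = Bytes.new(8)
puts OneByteIO.new.read_greedy(buffer) # => 8 (a single #read would return 1)
puts String.new(buffer)                # => "hello wo"
```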

jgaskins (Contributor Author):

My first implementation had true in there and it added padding internally. That's the reason I added the inequality expression.

Can you provide any failing tests for cases you've found where this doesn't work?

BlobCodes (Contributor), May 22, 2024:

> My first implementation had true in there and it added padding internally

Then disabling the explicit padding only hides that the code doesn't work.

Here's an example:

class BadButValidIO < IO
  property size : Int32 = 10
  property pos : Int32 = 0

  def read(slice : Bytes)
    return 0 if @size <= @pos
    slice[0] = 80
    @pos += 1
    1
  end

  def write(slice : Bytes) : Nil
    raise NotImplementedError.new
  end
end

puts "Input:"
puts BadButValidIO.new.gets_to_end

encoded = IO::Memory.new
Base64.encode(BadButValidIO.new, encoded)
encoded.rewind

puts "Encoded:"
puts encoded

puts "Decoded:"
puts Base64.decode_string(encoded.gets_to_end)

If you turn the BadButValidIO into a string before encoding it, everything works correctly.
With this PR, the IO is encoded to "UA==UA==UA==UA==UA==UA==UA==UA==UA==UA==".

jgaskins (Contributor Author):

> Then disabling the explicit padding only hides that the code doesn't work.

You mean it doesn't work for this one very specific and pathological case? Saying it doesn't work, unqualified, is incorrect. If it were true, it wouldn't have passed the specs I included.

Member:

IO#read is only guaranteed to read at least 1 byte. You need to account for that and cannot assume bytes is filled.
The specs work because they use IO::Memory which has all data available in memory and thus can always fill the read buffer to the brim. If you read from an IO that needs to wait for data to become available, it may give you much smaller chunks than bytes.bytesize (even a buffered IO).
io_spec.cr and other specs have special IO implementations that simulate this behaviour, like the BadButValidIO example.

jgaskins (Contributor Author):

I didn't object to handling that case. I was responding to an unqualified "the code doesn't work".

I keep forgetting this exists

Co-authored-by: David Keller <davidkeller@tuta.io>
spec/std/base64_spec.cr (resolved)
Comment on lines +65 to +72
encode_buffer = IO::Memory.new
decode_buffer = IO::Memory.new

before_each do
  # Clear out each buffer before each spec
  encode_buffer = IO::Memory.new
  decode_buffer = IO::Memory.new
end
Member:

question: Why do you define these variables outside the examples?
This could be useful if you're planning to reuse the buffers (with IO::Memory#clear), but with fresh instances it would be easier to just push the initialization into the test block.

jgaskins (Contributor Author):

> Why do you define these variables outside the examples?

It was how I wrote the specs in my app because, when I wrote it, I just wanted to make it work with sufficient confidence. It was a pure copy/paste from my app into this PR. I was trying to push the PR in between a bunch of other things during a very busy evening. I didn't have time to polish it up.
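For reference, the buffer-reuse pattern the reviewer mentions (IO::Memory#clear) would look roughly like this sketch:

```crystal
# Reusing one IO::Memory across iterations with #clear, instead of
# allocating a fresh instance each time as the specs in this PR do.
buffer = IO::Memory.new

2.times do |i|
  buffer.clear # empties the buffer and resets the position
  buffer << "run #{i}"
  puts buffer.to_s
end
# Each iteration sees an empty buffer: prints "run 0", then "run 1".
```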

jgaskins (Contributor Author) commented Jun 1, 2024:

Closing in favor of #14611. It's a more accurate and delightfully faster implementation. This PR was based on code I cranked out very quickly to handle a very specific scenario in one of my apps.

jgaskins closed this on Jun 1, 2024.
Successfully merging this pull request may close these issues:

- Add API for Base64.encode / Base64.decode with an IO as the source

6 participants