Improve base64 decoding performance #13574

Status: Open — wants to merge 8 commits into `master`
Conversation

@BlobCodes (Contributor):
This pull request improves base64 decoding performance, especially when writing to IOs.
Additionally, more ways to decode base64 were added (IO to IO, Bytes to user-allocated Bytes).
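
For context, here is a quick sketch of how the proposed overloads would be used. The signatures below are assumed from the PR description and are not part of the released stdlib API; only the final `Base64.decode(String, IO)` call exists today.

```crystal
require "base64"

# Proposed overloads (assumed from the PR description, not released API):
#   Base64.decode(input : IO, output : IO)       # stream-decode IO to IO
#   Base64.decode(input : Bytes, output : Bytes) # decode into a caller-allocated buffer

# With the current stdlib, the IO-to-IO case must buffer the whole
# input as a String first:
input = IO::Memory.new("aGVsbG8=")
output = IO::Memory.new
Base64.decode(input.gets_to_end, output)
puts output.to_s # => "hello"
```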

For the benchmarks, I moved the new implementation into src/base64.cr but renamed it to Base64P.

Benchmark Code 1 (Comparison old vs new)
require "benchmark"

test_cases = {
  "8 chars" => "aGVsbG8=",
  "76 chars" => "YSB2ZXJ5IGxv" * 6 + "ab==",
  "6000 chars" => "YSB2ZXJ5IGxvbmcgc3RyaW5n" * 250,
  "1 MB" => "YSB2ZXJ5IGxvbmcgc3RyaW5n" * 41667
}

require "base64"
require "./src/base64"

io = IO::Memory.new(capacity: 2**21)

test_cases.each do |name, data|
  puts "\n#{name} (into IO)"
  Benchmark.ips do |x|
    x.report("old") { Base64.decode(data, io); io.rewind }
    x.report("new") { Base64P.decode(data, io); io.rewind }
  end
end

test_cases.each do |name, data|
  puts "\n#{name} (into Bytes)"
  Benchmark.ips do |x|
    x.report("old") { Base64.decode(data) }
    x.report("new") { Base64P.decode(data) }
  end
end
Benchmark Results 1 (Comparison old vs new)
8 chars (into IO)
old  50.46M ( 19.82ns) (± 0.93%)  0.0B/op        fastest
new  44.10M ( 22.68ns) (± 0.59%)  0.0B/op   1.14× slower

76 chars (into IO)
old   6.10M (163.88ns) (± 0.28%)  0.0B/op   3.01× slower
new  18.39M ( 54.38ns) (± 1.37%)  0.0B/op        fastest

6000 chars (into IO)
old  79.31k ( 12.61µs) (± 0.34%)  0.0B/op   4.40× slower
new 349.19k (  2.86µs) (± 0.16%)  0.0B/op        fastest

1 MB (into IO)
old 473.30  (  2.11ms) (± 0.36%)  0.0B/op   4.33× slower
new   2.05k (488.48µs) (± 0.81%)  0.0B/op        fastest

8 chars (into Bytes)
old  62.82M ( 15.92ns) (± 1.22%)  16.0B/op        fastest
new  41.69M ( 23.99ns) (± 1.70%)  16.0B/op   1.51× slower

76 chars (into Bytes)
old  18.03M ( 55.45ns) (± 0.58%)  64.0B/op        fastest
new  16.73M ( 59.76ns) (± 0.51%)  64.0B/op   1.08× slower

6000 chars (into Bytes)
old 295.23k (  3.39µs) (± 0.87%)  4.4kB/op   1.05× slower
new 310.40k (  3.22µs) (± 0.85%)  4.4kB/op        fastest

1 MB (into Bytes)
old   1.86k (538.21µs) (± 0.23%)  732kB/op   1.06× slower
new   1.97k (508.14µs) (± 0.42%)  732kB/op        fastest
Benchmark 2 (Throughput)

On my PC, I get 1.76GB/s of synthetic throughput.

Code:

require "./src/base64"

class IO::Spam < IO                                                                                                  
  @byte : UInt8
  getter pos : UInt64 = 0

  def initialize(@byte : UInt8)
  end

  def write(slice : Bytes) : Nil
  end

  def read(slice : Bytes) : Int32
    slice.fill(@byte)
    @pos += slice.size
    slice.size
  end
end

class IO::Void < IO
  def write(slice : Bytes) : Nil
  end

  def read(slice : Bytes) : Int32
    0
  end
end

INPUT = IO::Spam.new('f'.ord.to_u8)
OUTPUT = IO::Void.new

Thread.new do
  t1 = t2 = Time.monotonic
  last_pos = INPUT.pos

  while true
    t2 = Time.monotonic
    current_pos = INPUT.pos
    puts "#{((current_pos - last_pos) / (t2 - t1).total_seconds).humanize} bytes/s"
    t1 = t2; last_pos = current_pos
    sleep 0.5
  end
end

Base64P.decode(INPUT, OUTPUT)

As you can see from the first benchmark, the new implementation itself is faster, but its fixed per-call overhead is higher than the previous algorithm's. Still, that overhead is in the nanosecond range, so it doesn't matter all that much.
This constant cost could also be reduced by changing the buffer size from IO::DEFAULT_BUFFER_SIZE to something lower.

But since base64 is most often used for things like data embedding or JWT, this should not be a huge problem.

@BlobCodes changed the title from "Performance/base64 buffering" to "Improve base64 decoding performance" on Jun 17, 2023
@straight-shoota (Member) left a comment:
I found some odd things in the code, such as unused assignments. See the line comments.
I've only looked a bit into the diff so far. It's better to iron this out before continuing the review.
Could you please give the implementation a polish to ensure better code quality?

I suppose ameba should point out some of the issues (but not all).

(Five line comments on src/base64.cr; three marked outdated, all resolved.)
@straight-shoota (Member) left a comment:
Sorry for the long wait. This slipped a bit through the cracks. 🙇

I have one comment on a suspicious piece of code that I missed before.

Besides, I'm concerned about introducing completely new API methods (.decode(IO, IO) and .decode(Bytes, Bytes)). Such additions should be proposed in a dedicated issue, including a discussion of their use cases.
And their implementation will need spec coverage.

So, I'd ask to remove these extra methods from this PR and create a proposal issue.

Comment on lines +489 to +491:

    # Move the pointer by one byte until there is a valid base64 character
    while in_pos < in_size && in_ptr.value.in?(10_u8, 13_u8)
      in_pos &+= 1

@straight-shoota (Member):

This seems odd. Contrary to the comment, this loop does not move the pointer; it only increases in_pos, which is a simple integer counter. Should this include in_ptr += 1, perhaps?
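
The fix the reviewer suggests can be sketched like this; the names `in_pos`, `in_size`, and `in_ptr` follow the PR's diff, while the test harness around them is hypothetical:

```crystal
# Skip leading CR (13) / LF (10) bytes before decoding. Advancing only
# in_pos would leave in_ptr stuck on the first byte; the counter and
# the pointer must move together.
data = "\r\naGVsbG8=".to_slice
in_ptr = data.to_unsafe
in_size = data.size
in_pos = 0

while in_pos < in_size && in_ptr.value.in?(10_u8, 13_u8)
  in_pos &+= 1
  in_ptr += 1 # the missing step: move the pointer as well
end

puts in_pos # => 2 (both CR and LF were skipped)
```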

@BlobCodes (Contributor, Author):
> Sorry for the long wait. This slipped a bit through the cracks. 🙇

No problem, it's only a performance PR anyway.

> Besides, I'm concerned about introducing completely new API methods (.decode(IO, IO) and .decode(Bytes, Bytes)). Such additions should be proposed in a dedicated issue, including a discussion of their use cases.
> And their implementation will need spec coverage.
>
> So, I'd ask to remove these extra methods from this PR and create a proposal issue.

Yeah, I totally understand that!
I actually only started this PR because I wanted a way to decode from an IO to another IO, which is hard using the current API.
The .decode(Bytes, Bytes) method is used by other methods (like .decode(String)), so I can only make it private, not remove it outright.

@straight-shoota (Member):
Yeah, adding a :nodoc: for now sounds good. I'm pretty much certain we'll want to have this method in the public API as it seems to be generally useful.
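
A minimal sketch of the `:nodoc:` approach discussed here. The method name `decode_into` and its body are illustrative placeholders only (the PR's actual overload is `.decode(Bytes, Bytes)` with its own optimized implementation):

```crystal
require "base64"

module Base64
  # :nodoc:
  #
  # Decodes *input* into the caller-allocated *output* buffer and returns
  # the number of bytes written. The :nodoc: directive keeps the method
  # usable internally while hiding it from the generated API docs until
  # a public API is settled in a proposal issue.
  def self.decode_into(input : Bytes, output : Bytes) : Int32
    decoded = decode(String.new(input)) # delegate to the public overload
    output.copy_from(decoded.to_unsafe, decoded.size)
    decoded.size
  end
end

buffer = Bytes.new(16)
written = Base64.decode_into("aGVsbG8=".to_slice, buffer)
puts String.new(buffer[0, written]) # => "hello"
```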

4 participants