Improve base64 decoding performance #13574

Status: Open — wants to merge 8 commits into `master`
Conversation

@BlobCodes (Contributor):
This pull request improves base64 decoding performance, especially when writing to IOs.
Additionally, more ways to decode base64 were added (IO to IO, Bytes to user-allocated Bytes).
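
For context, here is a quick sketch of how the proposed overloads would be used. The signatures below are assumed from the PR description and are not part of the released stdlib API; only the final `Base64.decode(String, IO)` call exists today.

```crystal
require "base64"

# Proposed overloads (assumed from the PR description, not released API):
#   Base64.decode(input : IO, output : IO)       # stream-decode IO to IO
#   Base64.decode(input : Bytes, output : Bytes) # decode into a caller-allocated buffer

# With the current stdlib, the IO-to-IO case must buffer the whole
# input as a String first:
input = IO::Memory.new("aGVsbG8=")
output = IO::Memory.new
Base64.decode(input.gets_to_end, output)
puts output.to_s # => "hello"
```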

For the benchmarks, I moved the new implementation into src/base64.cr but renamed it to Base64P.

Benchmark Code 1 (Comparison old vs new)
require "benchmark"

test_cases = {
  "8 chars" => "aGVsbG8=",
  "76 chars" => "YSB2ZXJ5IGxv" * 6 + "ab==",
  "6000 chars" => "YSB2ZXJ5IGxvbmcgc3RyaW5n" * 250,
  "1 MB" => "YSB2ZXJ5IGxvbmcgc3RyaW5n" * 41667
}

require "base64"
require "./src/base64"

io = IO::Memory.new(capacity: 2**21)

test_cases.each do |name, data|
  puts "\n#{name} (into IO)"
  Benchmark.ips do |x|
    x.report("old") { Base64.decode(data, io); io.rewind }
    x.report("new") { Base64P.decode(data, io); io.rewind }
  end
end

test_cases.each do |name, data|
  puts "\n#{name} (into Bytes)"
  Benchmark.ips do |x|
    x.report("old") { Base64.decode(data) }
    x.report("new") { Base64P.decode(data) }
  end
end
Benchmark Results 1 (Comparison old vs new)
8 chars (into IO)
old  50.46M ( 19.82ns) (± 0.93%)  0.0B/op        fastest
new  44.10M ( 22.68ns) (± 0.59%)  0.0B/op   1.14× slower

76 chars (into IO)
old   6.10M (163.88ns) (± 0.28%)  0.0B/op   3.01× slower
new  18.39M ( 54.38ns) (± 1.37%)  0.0B/op        fastest

6000 chars (into IO)
old  79.31k ( 12.61µs) (± 0.34%)  0.0B/op   4.40× slower
new 349.19k (  2.86µs) (± 0.16%)  0.0B/op        fastest

1 MB (into IO)
old 473.30  (  2.11ms) (± 0.36%)  0.0B/op   4.33× slower
new   2.05k (488.48µs) (± 0.81%)  0.0B/op        fastest

8 chars (into Bytes)
old  62.82M ( 15.92ns) (± 1.22%)  16.0B/op        fastest
new  41.69M ( 23.99ns) (± 1.70%)  16.0B/op   1.51× slower

76 chars (into Bytes)
old  18.03M ( 55.45ns) (± 0.58%)  64.0B/op        fastest
new  16.73M ( 59.76ns) (± 0.51%)  64.0B/op   1.08× slower

6000 chars (into Bytes)
old 295.23k (  3.39µs) (± 0.87%)  4.4kB/op   1.05× slower
new 310.40k (  3.22µs) (± 0.85%)  4.4kB/op        fastest

1 MB (into Bytes)
old   1.86k (538.21µs) (± 0.23%)  732kB/op   1.06× slower
new   1.97k (508.14µs) (± 0.42%)  732kB/op        fastest
Benchmark 2 (Throughput)

On my PC, I get 1.76GB/s of synthetic throughput.

Code:

require "./src/base64"

class IO::Spam < IO                                                                                                  
  @byte : UInt8
  getter pos : UInt64 = 0

  def initialize(@byte : UInt8)
  end

  def write(slice : Bytes) : Nil
  end

  def read(slice : Bytes) : Int32
    slice.fill(@byte)
    @pos += slice.size
    slice.size
  end
end

class IO::Void < IO
  def write(slice : Bytes) : Nil
  end

  def read(slice : Bytes) : Int32
    0
  end
end

INPUT = IO::Spam.new('f'.ord.to_u8)
OUTPUT = IO::Void.new

Thread.new do
  t1 = t2 = Time.monotonic
  last_pos = INPUT.pos

  while true
    t2 = Time.monotonic
    current_pos = INPUT.pos
    puts "#{((current_pos - last_pos) / (t2 - t1).total_seconds).humanize} bytes/s"
    t1 = t2; last_pos = current_pos
    sleep 0.5
  end
end

Base64P.decode(INPUT, OUTPUT)

As you can see from the first benchmark, the new implementation itself is faster, but its fixed per-call overhead is higher than the previous algorithm's. Still, that overhead is in the nanosecond range, so it doesn't matter all that much.
This constant cost could also be reduced by changing the buffer size from IO::DEFAULT_BUFFER_SIZE to something lower.

But since base64 is most often used for things like data embedding or JWT, this should not be a huge problem.

@BlobCodes changed the title from "Performance/base64 buffering" to "Improve base64 decoding performance" on Jun 17, 2023
@straight-shoota (Member) left a comment:
I found some odd things in the code, such as unused assignments. See the line comments.
I've only looked a bit into the diff so far. It's better to iron this out before continuing the review.
Could you please give the implementation a polish to ensure better code quality?

I suppose ameba should point out some of the issues (but not all).

(Five line comments on src/base64.cr; three marked outdated, all resolved.)
@straight-shoota (Member) left a comment:
Sorry for the long wait. This slipped a bit through the cracks. 🙇

I have one comment on a suspicious piece of code that I missed before.

Besides, I'm concerned about introducing completely new API methods (.decode(IO, IO) and .decode(Bytes, Bytes)). Such additions should be proposed in a dedicated issue, including a discussion of their use cases.
And their implementation will need spec coverage.

So, I'd ask to remove these extra methods from this PR and create a proposal issue.

Comment on lines +489 to +491:

    # Move the pointer by one byte until there is a valid base64 character
    while in_pos < in_size && in_ptr.value.in?(10_u8, 13_u8)
      in_pos &+= 1

@straight-shoota (Member):

This seems odd. Contrary to the comment, this loop does not move the pointer; it only increases in_pos, which is a simple integer counter. Should this include in_ptr += 1, perhaps?
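
The fix the reviewer suggests can be sketched like this; the names `in_pos`, `in_size`, and `in_ptr` follow the PR's diff, while the test harness around them is hypothetical:

```crystal
# Skip leading CR (13) / LF (10) bytes before decoding. Advancing only
# in_pos would leave in_ptr stuck on the first byte; the counter and
# the pointer must move together.
data = "\r\naGVsbG8=".to_slice
in_ptr = data.to_unsafe
in_size = data.size
in_pos = 0

while in_pos < in_size && in_ptr.value.in?(10_u8, 13_u8)
  in_pos &+= 1
  in_ptr += 1 # the missing step: move the pointer as well
end

puts in_pos # => 2 (both CR and LF were skipped)
```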

@BlobCodes (Contributor, Author):
> Sorry for the long wait. This slipped a bit through the cracks. 🙇

No problem, it's only a performance PR anyway.

> Besides, I'm concerned about introducing completely new API methods (.decode(IO, IO) and .decode(Bytes, Bytes)). Such additions should be proposed in a dedicated issue, including a discussion of their use cases.
> And their implementation will need spec coverage.
>
> So, I'd ask to remove these extra methods from this PR and create a proposal issue.

Yeah, I totally understand that!
I actually only started this PR because I wanted a way to decode from an IO to another IO, which is hard using the current API.
The .decode(Bytes, Bytes) method is used by other methods (like .decode(String)), so I can only make it private, not remove it outright.

@straight-shoota (Member):
Yeah, adding a :nodoc: for now sounds good. I'm pretty much certain we'll want to have this method in the public API as it seems to be generally useful.
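
A minimal sketch of the `:nodoc:` approach discussed here. The method name `decode_into` and its body are illustrative placeholders only (the PR's actual overload is `.decode(Bytes, Bytes)` with its own optimized implementation):

```crystal
require "base64"

module Base64
  # :nodoc:
  #
  # Decodes *input* into the caller-allocated *output* buffer and returns
  # the number of bytes written. The :nodoc: directive keeps the method
  # usable internally while hiding it from the generated API docs until
  # a public API is settled in a proposal issue.
  def self.decode_into(input : Bytes, output : Bytes) : Int32
    decoded = decode(String.new(input)) # delegate to the public overload
    output.copy_from(decoded.to_unsafe, decoded.size)
    decoded.size
  end
end

buffer = Bytes.new(16)
written = Base64.decode_into("aGVsbG8=".to_slice, buffer)
puts String.new(buffer[0, written]) # => "hello"
```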

4 participants