
Optimized quoted-printable encoder #86

Merged
merged 1 commit into from Feb 2, 2019

Conversation

IceDragon200
Contributor

@IceDragon200 IceDragon200 commented Feb 2, 2019

Hello, I've returned with another optimization!

If you've ever used the quoted-printable encoder for anything over 4000 bytes, you've probably torn your hair out by now.

I learned the hard way when it nuked my RAM.

I've significantly reduced the memory usage and increased the throughput (up to 1000 times).

Here is the benchmark used:

len = 0xFFF
bin =
  (len * 2)
  |> :crypto.strong_rand_bytes()
  |> Base.encode64()
  |> String.slice(0, len)

encoded = Mail.Encoders.QuotedPrintable.encode(bin)

Benchee.run(%{
  "OldQuotedPrintable.encode/1" => fn ->
    Mail.Encoders.LegacyQuotedPrintable.encode(bin)
  end,
  "NewQuotedPrintable.encode/1" => fn ->
    Mail.Encoders.QuotedPrintable.encode(bin)
  end
}, time: 10, memory_time: 2)

Benchee.run(%{
  "OldQuotedPrintable.decode/1" => fn ->
    Mail.Encoders.LegacyQuotedPrintable.decode(encoded)
  end,
  "NewQuotedPrintable.decode/1" => fn ->
    Mail.Encoders.QuotedPrintable.decode(encoded)
  end
}, time: 10, memory_time: 2)
Compiling 1 file (.ex)
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-2710QE CPU @ 2.10GHz
Number of Available Cores: 8
Available memory: 15.58 GB
Elixir 1.8.0
Erlang 21.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 28 s


Benchmarking NewQuotedPrintable.encode/1...
Benchmarking OldQuotedPrintable.encode/1...

Name                                  ips        average  deviation         median         99th %
NewQuotedPrintable.encode/1        865.14      0.00116 s    ±19.58%      0.00105 s      0.00180 s
OldQuotedPrintable.encode/1          0.91         1.09 s     ±2.28%         1.09 s         1.16 s

Comparison: 
NewQuotedPrintable.encode/1        865.14
OldQuotedPrintable.encode/1          0.91 - 946.62x slower

Memory usage statistics:

Name                           Memory usage
NewQuotedPrintable.encode/1         0.52 MB
OldQuotedPrintable.encode/1       640.28 MB - 1236.38x memory usage

**All measurements for memory usage were the same**
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-2710QE CPU @ 2.10GHz
Number of Available Cores: 8
Available memory: 15.58 GB
Elixir 1.8.0
Erlang 21.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 28 s


Benchmarking NewQuotedPrintable.decode/1...
Benchmarking OldQuotedPrintable.decode/1...

Name                                  ips        average  deviation         median         99th %
NewQuotedPrintable.decode/1        1.54 K      647.39 μs    ±11.03%         622 μs     1058.47 μs
OldQuotedPrintable.decode/1        1.21 K      828.94 μs    ±10.90%         804 μs        1429 μs

Comparison: 
NewQuotedPrintable.decode/1        1.54 K
OldQuotedPrintable.decode/1        1.21 K - 1.28x slower

Memory usage statistics:

Name                           Memory usage
NewQuotedPrintable.decode/1       363.22 KB
OldQuotedPrintable.decode/1       711.98 KB - 1.96x memory usage

**All measurements for memory usage were the same**

Granted, I couldn't run the old encoder on anything over 4096 bytes and expect a result back in a timely manner.

So consider anything over that limit to be encoded soon™ (on the old encoder).

Tests

No test changes were needed; this is a drop-in replacement for the old module.

Changes proposed in this pull request

A faster, more memory-efficient quoted-printable encoder and decoder.

Notes

I've removed all the private helper functions and instead rely on tail-call optimization (TCO) by recursing on overloads of the same function name.

Some of the code is duplicated across some of the overloads; this was done to avoid extra utility functions.
I'm trying to keep the call stack as clean as possible to avoid extra memory allocations.
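To illustrate the technique (this is a simplified sketch with a hypothetical QPSketch module, not the PR's actual code, and it omits space handling and the 76-character soft line breaks required by real quoted-printable): every clause ends in a direct self-call with all state carried in arguments, so the BEAM reuses the same stack frame, and the output is accumulated as iodata rather than via binary concatenation.

```elixir
defmodule QPSketch do
  # Entry point; the second argument is the accumulator (reversed iodata).
  def encode(binary), do: encode(binary, [])

  # Done: flatten the accumulated iodata into a single binary once, at the end.
  def encode(<<>>, acc), do: IO.iodata_to_binary(Enum.reverse(acc))

  # Printable ASCII (except "=", byte 61) passes through untouched.
  def encode(<<c, rest::binary>>, acc) when c in 33..60 or c in 62..126 do
    encode(rest, [c | acc])
  end

  # Every other byte becomes an =XX escape sequence (uppercase hex).
  def encode(<<c, rest::binary>>, acc) do
    hex = c |> Integer.to_string(16) |> String.pad_leading(2, "0")
    encode(rest, [["=", hex] | acc])
  end
end
```

Because the recursion consumes the binary byte by byte and only prepends to a list, memory use stays proportional to the output size instead of growing with repeated binary copies.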

One may notice I use byte_size/1 instead of String.length/1; this is due to how they work:

byte_size/1 reports the actual number of bytes in the binary. Since the encoder works on 8-bit bytes, it makes sense to count the remaining bytes rather than the remaining characters. String.length/1 has to count each character manually, which causes a massive slowdown when it is invoked after every single character; this becomes very apparent with UTF-8 characters that span multiple bytes and must each be encoded to their escape sequences in the end.
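For concreteness, here is the difference on a small UTF-8 string (standard Elixir, not code from this PR): byte_size/1 is O(1) because binaries store their own size, while String.length/1 must walk the binary counting graphemes.

```elixir
s = "héllo"        # "é" (U+00E9) is two bytes in UTF-8

byte_size(s)       # => 6  (constant time: reads the binary's stored size)
String.length(s)   # => 5  (linear time: decodes and counts each grapheme)
```

Calling the O(1) function once per byte is cheap; calling the O(n) one once per character makes the whole pass quadratic.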

Fixed Bottlenecks:

* Stack thrashing
* String.length for ASCII binaries
* Excessive concatenations
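On the last point, a minimal sketch of the two accumulation styles (generic Elixir, not the PR's code): naively rebuilding a binary with <> on every step can copy the whole accumulated result each time, while collecting parts in a list and flattening once with IO.iodata_to_binary/1 stays linear.

```elixir
parts = ~w(a b c)

# Concatenation: each step may copy everything accumulated so far.
slow = Enum.reduce(parts, "", fn part, acc -> acc <> part end)

# Iodata: prepend (O(1)) per step, then one flatten at the end.
fast =
  parts
  |> Enum.reduce([], fn part, acc -> [part | acc] end)
  |> Enum.reverse()
  |> IO.iodata_to_binary()
```

Both produce the same binary; only the allocation behavior differs, which is what the memory numbers above reflect.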
@bcardarella
Member

Awesome!
