possible performance issue: mbedtls_gcm_update CPU utilization #254

@vustef

I've written a simple example to test the performance of MbedTLS. It's unoptimized and probably incorrect in some aspects, but I hope it shows the issue that I'm facing when using MbedTLS through HTTP.jl.

using MbedTLS
using Sockets
using Base.Threads

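function tls_test(num_iters, concurrency)
    # One entropy source and one seeded CTR_DRBG are shared by all connections.
    entropy = MbedTLS.Entropy()
    rng = MbedTLS.CtrDrbg()
    MbedTLS.seed!(rng, entropy)

    # 1 MiB payload; its contents do not matter, only the cost of encrypting it.
    nbytes = 1024 * 1024
    buffer = Array{UInt8}(undef, nbytes)
    # pointer(buffer) addresses the array's data; jl_value_ptr would instead
    # return the address of the Array object itself.
    p = pointer(buffer)

    # Limit the number of connections in flight to `concurrency`.
    sem = Base.Semaphore(concurrency)
    # Keep `buffer` rooted for as long as tasks write from its raw pointer.
    GC.@preserve buffer @sync begin
        for i in 1:num_iters
            @spawn begin
                Base.acquire(sem)
                try
                    sock = connect("httpbin.org", 443)

                    ctx = MbedTLS.SSLContext()
                    conf = MbedTLS.SSLConfig()

                    MbedTLS.config_defaults!(conf)
                    MbedTLS.authmode!(conf, MbedTLS.MBEDTLS_SSL_VERIFY_REQUIRED)
                    MbedTLS.rng!(conf, rng)

                    function show_debug(level, filename, number, msg)
                        @show level, filename, number, msg
                    end

                    MbedTLS.dbg!(conf, show_debug)

                    MbedTLS.ca_chain!(conf)

                    MbedTLS.setup!(ctx, conf)
                    MbedTLS.set_bio!(ctx, sock)

                    MbedTLS.handshake(ctx)

                    # Push the whole payload through the TLS layer (encrypt and send).
                    Base.unsafe_write(ctx, p, nbytes)
                    close(sock)
                finally
                    # Release the slot even if connecting or the handshake fails.
                    Base.release(sem)
                end
            end
        end
    end
end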

tls_test(4096, 512)

On a machine with 8 cores and 1.5 GB/s of NIC throughput, this achieves a bit less than 200 MB/s. CPU utilization is 100%, and the run takes ~22 s.
mbedtls_gcm_update accounts for 40% of the profile, which means that the CPU time spent in that function is ~70 s (22 s × 8 cores × 0.4). My assumption is that this function neither performs nor invokes network communication, and does pure processing.

So the throughput of mbedtls_gcm_update is effectively ~58 MB/s per core on this machine (4096 MB encrypted in ~70 s of CPU time).
This means that while the machine has 1.5 GB/s of NIC throughput, the time spent in mbedtls_gcm_update allows for only ~464 MB/s across 8 cores in ideal conditions (no other CPU usage in the call stack), and it would take more than 24 cores to utilise the full NIC.
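The estimate above can be reproduced with a few lines of Julia. This is only a sketch of the arithmetic: `gcm_estimate` and its keyword names are made up for illustration, the inputs are the figures quoted in this issue, and MiB and MB are treated as interchangeable, as in the text.

```julia
# Back-of-the-envelope check of the numbers quoted in this issue.
# `gcm_estimate` is a hypothetical helper, not part of MbedTLS.jl.
function gcm_estimate(; num_iters = 4096,   # connections made by tls_test
                        payload_mb = 1,     # payload written per connection
                        wall_s = 22,        # observed wall-clock time
                        cores = 8,          # all cores were at 100%
                        gcm_share = 0.40,   # profile share of mbedtls_gcm_update
                        nic_mb_s = 1500)    # NIC throughput
    total_mb = num_iters * payload_mb
    # CPU seconds spent inside mbedtls_gcm_update, summed over all cores.
    gcm_cpu_s = wall_s * cores * gcm_share
    # Data encrypted per CPU second spent in that function.
    per_core = total_mb / gcm_cpu_s
    return (observed = total_mb / wall_s,         # end-to-end throughput
            gcm_cpu_s = gcm_cpu_s,
            per_core = per_core,
            ceiling = per_core * cores,           # if the CPU did nothing but GCM
            cores_needed = nic_mb_s / per_core)   # cores of pure GCM to fill the NIC
end

est = gcm_estimate()
println("GCM CPU time:   ", round(est.gcm_cpu_s; digits = 1), " s")
println("per core:       ", round(est.per_core; digits = 1), " MB/s")
println("8-core ceiling: ", round(est.ceiling; digits = 1), " MB/s")
println("cores for NIC:  ", ceil(Int, est.cores_needed))
```

With these inputs the per-core figure comes out at roughly 58 MB/s and the core count at about 26, in line with the "more than 24 cores" estimate.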

For comparison, a similar test (at a slightly higher level of abstraction) with HTTP PUT requests in Go, on the same machine, achieves ~1.5 GB/s, at which point the NIC's throughput is the bottleneck.

Are there any ideas for how mbedtls_gcm_update could be optimized? Is this something worth submitting as an issue in https://github.com/Mbed-TLS/mbedtls ? I am not sure whether the same thing happens when the library is used directly, without the Julia wrapper.

PProf profile file:
prof_ssl1.pb.gz

Here's a screenshot of the profile file opened using PProf:
image
