
Good throughput, but very high latency #48

Closed
dfdx opened this issue Feb 22, 2015 · 6 comments

@dfdx

dfdx commented Feb 22, 2015

I've run several performance tests comparing HttpServer.jl with other frameworks; here are some results:

Flask (Python), 1 thread

  • latency: 1 ms
  • throughput: 1200 rps

Flask (Python), 1000 threads

  • latency: 102 ms
  • throughput: 1200 rps

Spray (Scala), 1 thread

  • latency: < 1 ms
  • throughput: 5500 rps

Spray (Scala), 1000 threads

  • latency: 12 ms
  • throughput: 18700 rps

HttpServer.jl, 1 thread

  • latency: 39 ms
  • throughput: 24 rps

HttpServer.jl, 1000 threads

  • latency: 160 ms
  • throughput: 6200 rps

All tests were done with Gatling on my laptop with an Intel Core i7 (2 physical / 4 virtual cores), using Julia v0.3.5. The number of threads refers to the number of simultaneous requests on the client side; latency and throughput refer to the 50th percentile of the corresponding values.

The results for Flask are pretty predictable: Python's GIL prevents it from using the full power of multithreading, so there is almost no difference between 1 and 1000 simultaneous client requests. Spray is also no surprise: being based on the actor model (via the Akka framework), it exploits both concurrency and asynchronous IO to achieve top-notch results.

I also like how HttpServer.jl handles multiple simultaneous requests (probably thanks to libuv, since as far as I know Julia's @async macro, used to handle separate clients, still runs on a single system thread). But I'm really surprised and disappointed by its latency: compared to the roughly 1 ms average of the other servers, 39 ms for HttpServer.jl sounds totally unreasonable.

Is this a known issue? Is it solvable within the current architecture? Or is HttpServer.jl just not meant to provide low latency?

@StefanKarpinski
Contributor

Given that literally no effort has been put into performance and Flask and Spray are both considered mature, high-performance web servers, this is a pretty good starting point. I've seen much larger performance gaps than 50x fall away quite quickly with a little targeted optimization work. Seems like time to start running @profile on some HttpServer code.
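
For reference, a minimal sketch of what that could look like (hedged: the trivial handler, port, and 30-second sampling window are illustrative; HttpHandler, Server, and run are HttpServer.jl's documented API, and @profile / Profile.print come from Julia's standard library):

using HttpServer

# Trivial handler, so the profile measures the server machinery rather than the app.
http = HttpHandler() do req::Request, res::Response
    Response("hello")
end

server = Server(http)
@async run(server, 8000)   # serve in a background task on the same thread

# Sample for 30 seconds while a load generator (e.g. Gatling) hits localhost:8000.
@profile sleep(30)
Profile.print()            # see which lines accumulate the samples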

@dfdx
Author

dfdx commented Feb 22, 2015

In my little benchmark, the server spends most of its time (~90%) on this line:

ccall(:jl_run_once,Int32,(Ptr{Void},),loop)

so it boils down to libuv itself. I'm not very experienced with this library; are there any performance tests or prior work on optimizing it in Julia?
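
If the time really does sink into libuv's C code, one option (a sketch: the C keyword of Profile.print is standard, while profile_benchmark() stands in for whatever drives the requests and is purely hypothetical) is to include C frames in the profile report:

@profile profile_benchmark()   # hypothetical driver that exercises the server
Profile.print(C = true)        # include C frames, so libuv functions show up by name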

@dfdx
Author

dfdx commented Mar 2, 2015

A little update. Here's the portion of the Profile.print() output where most of the time is spent:

  5914 ...ver/src/HttpServer.jl; process_client; line: 266
   5914 stream.jl; readavailable; line: 709
    5914 stream.jl; wait_readnb; line: 316
     5914 stream.jl; stream_wait; line: 263
      5914 ./task.jl; wait; line: 194
       5914 ./task.jl; wait; line: 273
        5914 ./stream.jl; process_events; line: 537
         1 ./inference.jl; typeinf_ext; line: 1216
          1 ./inference.jl; typeinf; line: 1544
           1 ./inference.jl; inlining_pass; line: 2553
            1 ./inference.jl; inlining_pass; line: 2590
             1 ./inference.jl; inlining_pass; line: 2656
              1 ./inference.jl; inlineable; line: 2300
  1    ...ver/src/HttpServer.jl; process_client; line: 267

So it starts at readavailable(client.sock) and goes down to process_events(true). However, I tried to measure the runtime of readavailable() by itself like this (the simplest possible server-client communication):

server = listen(2000)            # TCP server listening on port 2000
clientsock = connect(2000)       # client end of the connection
serversock = accept(server)      # server end of the connection
@time for i = 1:1_000_000
    println(clientsock, "hello") # write a line from the client...
    readavailable(serversock)    # ...and read whatever arrived on the server side
end

and it turns out to perform very well, taking only 25 microseconds per iteration (compare that to 37 milliseconds for HttpServer's full processing loop).

Thus, I believe the issue is not in the IO itself, but in its environment. My guess is that the slowdown comes from switching between tasks during read/write operations. I'm going to (slowly) dive further into the code, but if somebody has any pointers, that could speed things up a lot.
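
One rough way to test that guess (a sketch, not from the thread: two tasks ping-ponging via yield(), so each iteration pays roughly two task switches) would be:

other = @async while true
    yield()                # hand control back to the main task
end

@time for i = 1:1_000_000
    yield()                # switch to the other task and back
end
# Note: the helper task above runs forever; this is only meant for a throwaway REPL session.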

@dfdx
Author

dfdx commented Mar 17, 2015

It's totally crazy, but the reason for the high latencies in my tests is... Java. It turns out that Gatling (Scala), JMeter (Java), and a plain Scala console all produce a very stable 39 ms per request on my machine. At the same time, even a very naive test in Python (using the requests library) got 1.3 ms, and with Julia (Requests.jl) it was even lower: about 0.5 ms per request.
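
For reference, the Julia client check could look roughly like this (a sketch: Requests.jl's get is its real API, but the URL and iteration count are illustrative):

import Requests

@time for i = 1:1000
    Requests.get("http://localhost:8000/")   # divide total time by 1000 for per-request latency
end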

It's a curious case and still requires some investigation, but it seems to have nothing to do with bad server performance, so I'm closing this issue as irrelevant.

dfdx closed this as completed Mar 17, 2015
@IainNZ
Contributor

IainNZ commented Mar 17, 2015

Just so I understand, are you saying that the latencies you record depend on which language you test them from?

@dfdx
Author

dfdx commented Mar 18, 2015

I believe it depends on the HTTP library implementation; it's just that most JVM-based languages use the same library under the hood. #40 is probably related to this issue as well: both the JVM clients and AB spend much more time on a request than needed, but the JVM "cuts off" the request after 39 ms, while AB waits for the maximum timeout and fails.
