
Good throughput, but very high latency #48

Closed
dfdx opened this issue Feb 22, 2015 · 6 comments

@dfdx

dfdx commented Feb 22, 2015

I've run several performance tests comparing HttpServer.jl with other frameworks; here are some results:

Flask (Python), 1 thread

  • latency: 1 ms
  • throughput: 1200 rps

Flask (Python), 1000 threads

  • latency: 102 ms
  • throughput: 1200 rps

Spray (Scala), 1 thread

  • latency: < 1 ms
  • throughput: 5500 rps

Spray (Scala), 1000 threads

  • latency: 12 ms
  • throughput: 18700 rps

HttpServer.jl, 1 thread

  • latency: 39 ms
  • throughput: 24 rps

HttpServer.jl, 1000 threads

  • latency: 160 ms
  • throughput: 6200 rps

All tests were done with Gatling on my laptop with an Intel Core i7 (2 physical / 4 virtual cores), using Julia v0.3.5. The number of threads refers to the number of simultaneous requests on the client side; latency and throughput refer to the 50th percentile of the corresponding values.

The results for Flask are pretty predictable: Python's GIL prevents it from using the full power of multithreading, so there is almost no difference between 1 and 1000 simultaneous client requests. Spray is also no surprise: being based on the actor model (via the Akka framework), it exploits both concurrency and asynchronous IO to achieve top-notch results.

I also like how HttpServer.jl handles multiple simultaneous requests (probably thanks to libuv, since as far as I know Julia's @async macro, used to handle separate clients, still runs on a single system thread). But I'm really surprised and disappointed by its latency: compared to the roughly 1 ms average of the other servers, 39 ms for HttpServer.jl sounds totally unreasonable.

Is this a known issue? Is it solvable within the current architecture? Or is HttpServer.jl just not meant to provide low latency?

@StefanKarpinski
Contributor

Given that literally no effort has been put into performance and Flask and Spray are both considered mature, high-performance web servers, this is a pretty good starting point. I've seen much larger performance gaps than 50x fall away quite quickly with a little targeted optimization work. Seems like time to start running @profile on some HttpServer code.
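
For reference, a minimal sketch of what that could look like (hedged: the trivial handler, port, and 30-second sampling window are illustrative; HttpHandler, Server, and run are HttpServer.jl's documented API, and @profile / Profile.print come from Julia's standard library):

using HttpServer

# Trivial handler, so the profile measures the server machinery rather than the app.
http = HttpHandler() do req::Request, res::Response
    Response("hello")
end

server = Server(http)
@async run(server, 8000)   # serve in a background task on the same thread

# Sample for 30 seconds while a load generator (e.g. Gatling) hits localhost:8000.
@profile sleep(30)
Profile.print()            # see which lines accumulate the samples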

@dfdx
Author

dfdx commented Feb 22, 2015

In my little benchmark, the server spends most of its time (~90%) on this line:

ccall(:jl_run_once,Int32,(Ptr{Void},),loop)

so it boils down to libuv itself. I'm not very experienced with this library; are there any performance tests or prior work on optimizing it in Julia?
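
If the time really does sink into libuv's C code, one option (a sketch: the C keyword of Profile.print is standard, while profile_benchmark() stands in for whatever drives the requests and is purely hypothetical) is to include C frames in the profile report:

@profile profile_benchmark()   # hypothetical driver that exercises the server
Profile.print(C = true)        # include C frames, so libuv functions show up by name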

@dfdx
Author

dfdx commented Mar 2, 2015

A little update. Here's the portion of the Profile.print() output where most of the time is spent:

  5914 ...ver/src/HttpServer.jl; process_client; line: 266
   5914 stream.jl; readavailable; line: 709
    5914 stream.jl; wait_readnb; line: 316
     5914 stream.jl; stream_wait; line: 263
      5914 ./task.jl; wait; line: 194
       5914 ./task.jl; wait; line: 273
        5914 ./stream.jl; process_events; line: 537
         1 ./inference.jl; typeinf_ext; line: 1216
          1 ./inference.jl; typeinf; line: 1544
           1 ./inference.jl; inlining_pass; line: 2553
            1 ./inference.jl; inlining_pass; line: 2590
             1 ./inference.jl; inlining_pass; line: 2656
              1 ./inference.jl; inlineable; line: 2300
  1    ...ver/src/HttpServer.jl; process_client; line: 267

So it starts at readavailable(client.sock) and goes down to process_events(true). However, I tried to measure the runtime of readavailable() by itself like this (the simplest possible server-client communication):

server = listen(2000)            # TCP server listening on port 2000
clientsock = connect(2000)       # client end of the connection
serversock = accept(server)      # server end of the connection
@time for i = 1:1_000_000
    println(clientsock, "hello") # write a line from the client...
    readavailable(serversock)    # ...and read whatever arrived on the server side
end

and it turns out to perform very well, taking only 25 microseconds per iteration (compare that to 37 milliseconds for HttpServer's full processing loop).

Thus, I believe the issue is not in the IO itself, but in its environment. My guess is that the slowdown comes from switching between tasks during read/write operations. I'm going to (slowly) dive further into the code, but if somebody has any pointers, that could speed things up a lot.
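
One rough way to test that guess (a sketch, not from the thread: two tasks ping-ponging via yield(), so each iteration pays roughly two task switches) would be:

other = @async while true
    yield()                # hand control back to the main task
end

@time for i = 1:1_000_000
    yield()                # switch to the other task and back
end
# Note: the helper task above runs forever; this is only meant for a throwaway REPL session.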

@dfdx
Author

dfdx commented Mar 17, 2015

It's totally crazy, but the reason for the high latencies in my tests is... Java. It turns out that Gatling (Scala), JMeter (Java), and a plain Scala console all produce a very stable 39 ms per request on my machine. At the same time, even a very naive test in Python (using the requests library) got 1.3 ms, and with Julia (Requests.jl) it was even lower: about 0.5 ms per request.
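
For reference, the Julia client check could look roughly like this (a sketch: Requests.jl's get is its real API, but the URL and iteration count are illustrative):

import Requests

@time for i = 1:1000
    Requests.get("http://localhost:8000/")   # divide total time by 1000 for per-request latency
end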

It's a curious case and still requires some investigation, but it seems to have nothing to do with bad server performance, so I'm closing this issue as irrelevant.

dfdx closed this as completed Mar 17, 2015
@IainNZ
Contributor

IainNZ commented Mar 17, 2015

Just so I understand, are you saying that the latencies you record depend on which language you test them from?

@dfdx
Author

dfdx commented Mar 18, 2015

I believe it depends on the HTTP library implementation; it's just that most JVM-based languages use the same library under the hood. #40 is probably related to this issue as well: both the JVM clients and AB spend much more time on a request than needed, but the JVM "cuts off" the request after 39 ms, while AB waits for the maximum timeout and fails.
