Concurrency levels #49

Closed
Licenser opened this Issue Apr 3, 2013 · 20 comments

8 participants
@Licenser
Contributor

Licenser commented Apr 3, 2013

Okay I hope I don't start to get annoying ;)

The concurrency levels used for the benchmark seem ridiculously low. If someone has between 8 and 256 concurrent requests on a web application that does virtually no work, the framework should not be a concern.

Modern frameworks talk about concurrency levels in the range of tens of thousands, if not millions, of concurrent requests. I'd think something from 10k upwards would be more interesting and valuable information.

PS: I'll get some test hardware in the next few days and will try to run these tests under 10k+ conditions and provide some results, if you're interested.

@bhauer

Member

bhauer commented Apr 3, 2013

Not at all annoying! It's fun to talk about this stuff.

This particular question is a subject for a longer blog entry that I'd like to write. I am of the opinion that those who talk about concurrency in the tens or hundreds of thousands are more interested in a WebSocket-enabled future than the HTTP request-response present. Not that there is anything wrong with that! I love WebSockets.

But in the present, where WebSockets are not yet widely used, a web server's goal is to receive requests and respond as quickly as possible.

Simulations that process tens or hundreds of thousands of concurrent requests are often written to do something like a "sleep" operation on the server, basically putting the request into some idle state where the socket is connected but nothing (or very little) is happening. The reason for this is that such simulations are intended to demonstrate that the server can peel requests off of an inbound queue and then put them into some handler loop. If the requests are processed too quickly and taken out of that handler loop, they will be closed, reducing the concurrency number, which is working against the goal of such tests.

In the traditional web request-response model, the server's primary goal should be to successfully and fully respond to requests as fast as possible. There is no value in the server holding onto a request in an idle state.

That is why all of the tests we've written immediately work to provide a response as soon as the request is received and routed to the proper handler code. Fetch the object(s), serialize, and send. Get that request closed out so our CPU is free for the next request.

And it's because real work is being done (rather than putting a request into an idle state to increase a server-side concurrency number) that as our client-side concurrency levels rise (e.g., at 64, 128, and 256) most frameworks reach a plateau. That plateau is where the CPU and I/O capacity of the server is fully utilized and can no longer fulfill requests as fast as they arrive.

Increasing client-side concurrency once the server's CPU and I/O are fully utilized simply means that you are building a queue in front of the server component that peels requests off and begins the routing process. Increase it high enough and eventually you'll start seeing 500-series responses from some platforms/frameworks ("server too busy") to keep that queue from unbounded growth.
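The plateau and the queue described above can be illustrated with a minimal, idealized closed-loop model. This is only a sketch: the function name `closed_loop`, the worker count of 64, and the 2 ms service time are hypothetical values chosen for illustration, not measurements from the benchmark.

```python
def closed_loop(clients, workers, service_time_s):
    """Idealized closed-loop model: every client sends its next request
    as soon as the previous response arrives (no think time)."""
    busy = min(clients, workers)          # requests actually being served
    throughput = busy / service_time_s    # responses per second
    queued = max(0, clients - workers)    # requests waiting in front of the server
    latency = clients / throughput        # Little's Law: N = X * R
    return throughput, queued, latency


# Hypothetical server: 64 requests in service at once, 2 ms of work per request.
for clients in (8, 16, 32, 64, 128, 256, 1024):
    x, q, r = closed_loop(clients, workers=64, service_time_s=0.002)
    print(f"{clients:5d} clients: {x:8.0f} req/s, {q:4d} queued, {r * 1000:6.1f} ms")
```

In this model, throughput stops growing once the client count reaches the server's capacity. Beyond that point, extra clients only lengthen the queue and the per-request latency.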

All that said, we put this test on github precisely to encourage people to experiment. :) So I'd be happy to hear if you find something interesting when you run with 10,000 or more client-side concurrency.

@Licenser

Contributor

Licenser commented Apr 3, 2013

I agree, it's a very interesting topic. When I think about servers handling 10k+ requests, I mostly think of API servers, which aren't a very uncommon abstraction. I'm working on a project that exposes its entire functionality via a REST API, and having more than 256 requests arrive concurrently is something that can happen very fast in my use case. (PS: I use WebSockets for long-lived connections like live notifications too ;)

Nonetheless, I see that this might not apply to everyone, but in the end I think it comes down to this: a framework that can handle more requests concurrently is a framework that can serve more users at once without me needing to buy new hardware (or rent new cloud instances).

I totally agree that benchmarks that try to increase concurrency by delaying requests make no sense. I am thinking more about having the load-generating tool try to send off the concurrent requests at the same time. If some fall over with a 5xx, that is also very valuable information, in the sense of, let's say, 'gemini can handle up to 10k concurrent requests without dropping a request, while go reaches its limit at 5k' (numbers are entirely made up here). I think it's a question of how well a framework scales over time.

Also, in some first quick tests it seems that with higher concurrency levels frameworks reach new peaks, which would not show at all in the original benchmark. (Disclaimer: the numbers are from a VM on my laptop as of now, running both client and server on the same host.)

From the preliminary local tests I did, I'd say the plateaus in the original benchmark are not really the best the frameworks can do (at least speaking of the upper half), but just that it doesn't make much difference for them whether there are 128 or 256 requests.

Here is an example of nodejs, netty, elli and cowboy for concurrency levels of 1024 to 8192:

{
    "concurrencyLevels": [
        1024, 
        2048, 
        4096, 
        8192
    ], 
    "frameworks": [
        "gemini", 
        "cake", 
        "compojure", 
        "django", 
        "express", 
        "express-mongodb", 
        "express-mysql", 
        "grails", 
        "nodejs", 
        "nodejs-mongodb", 
        "nodejs-mysql", 
        "php", 
        "php-raw", 
        "play", 
        "rack-ruby", 
        "rack-jruby", 
        "rails-ruby", 
        "rails-jruby", 
        "servlet", 
        "servlet-raw", 
        "sinatra-ruby", 
        "sinatra-jruby", 
        "spring", 
        "tapestry", 
        "vertx", 
        "webgo", 
        "wicket", 
        "go", 
        "nodejs-mysql-raw", 
        "wsgi", 
        "netty", 
        "flask", 
        "play-scala", 
        "django-optimized", 
        "rails-optimized-ruby", 
        "rails-optimized-jruby", 
        "http-kit", 
        "cowboy", 
        "elli"
    ], 
    "queryIntervals": [
        1, 
        5, 
        10, 
        15, 
        20
    ], 
    "rawData": {
        "db": {}, 
        "json": {
            "30": [
                "22992", 
                "97647", 
                "145398", 
                "148603"
            ], 
            "37": [
                "30657", 
                "65578", 
                "128368", 
                "153825"
            ], 
            "38": [
                "39437", 
                "126977", 
                "197625", 
                "226097"
            ], 
            "8": [
                "19529", 
                "46478", 
                "93422", 
                "166966"
            ]
        }, 
        "query": {}
    }, 
    "weighttpData": {
        "db": {}, 
        "json": {
            "30": {
                "2xx": "2910", 
                "3xx": "0", 
                "4xx": "0", 
                "5xx": "0", 
                "errored": "0", 
                "failed": "97090", 
                "success": "2910", 
                "totalTime": 6.732
            }, 
            "37": {
                "2xx": "3103", 
                "3xx": "0", 
                "4xx": "0", 
                "5xx": "0", 
                "errored": "0", 
                "failed": "96897", 
                "success": "3103", 
                "totalTime": 6.214
            }, 
            "38": {
                "2xx": "0", 
                "3xx": "0", 
                "4xx": "0", 
                "5xx": "0", 
                "errored": "0", 
                "failed": "100000", 
                "success": "0", 
                "totalTime": 4.2700000000000005
            }, 
            "8": {
                "2xx": "1920", 
                "3xx": "0", 
                "4xx": "0", 
                "5xx": "0", 
                "errored": "0", 
                "failed": "98080", 
                "success": "1920", 
                "totalTime": 8.939000000000002
            }
        }, 
        "query": {}
    }
}

@Licenser

Contributor

Licenser commented Apr 3, 2013

On top of that, I just noticed that a lot of those requests failed. I guess that is what you were talking about? (Still getting used to reading the results ;)
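To make the failure counts easier to read, here is a small sketch that turns the weighttpData numbers from the results above into success rates. It assumes that the keys "8", "30", "37" and "38" are zero-based positions in the "frameworks" list (which matches the four frameworks named earlier), and it copies the counts in as integers rather than the strings used in the raw file. The function name `success_rates` is made up for this example.

```python
# Assumption: result keys are zero-based indexes into the "frameworks" list.
FRAMEWORK_BY_KEY = {"8": "nodejs", "30": "netty", "37": "cowboy", "38": "elli"}

# "success" and "failed" counts copied from the weighttpData "json" section.
WEIGHTTP_JSON = {
    "30": {"success": 2910, "failed": 97090},
    "37": {"success": 3103, "failed": 96897},
    "38": {"success": 0, "failed": 100000},
    "8": {"success": 1920, "failed": 98080},
}


def success_rates(results):
    """Return the fraction of successful requests per framework name."""
    rates = {}
    for key, counts in results.items():
        total = counts["success"] + counts["failed"]
        rates[FRAMEWORK_BY_KEY[key]] = counts["success"] / total
    return rates


for name, rate in sorted(success_rates(WEIGHTTP_JSON).items()):
    print(f"{name:8s} {rate:6.2%} of requests succeeded")
```

With roughly 97% or more of the requests failing in every case, the requests-per-second figures in rawData probably say more about how fast requests failed than about framework throughput.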

@bhauer

Member

bhauer commented Apr 3, 2013

Yes. In order to test that level of client-side concurrency, you will probably need to change the deployment settings for the frameworks and platforms being exercised.

For example, in the "easy" (at least easy to explain) case of a server running Apache HTTPD, the most important settings in question would be ListenBackLog and MaxClients.

http://httpd.apache.org/docs/2.2/mod/mpm_common.html#listenbacklog

http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxclients

Of course, Apache may not be the best suited to that kind of exercise; I'm just mentioning it because most readers will be familiar with the two configuration options above.

@Skamander

Skamander commented Apr 3, 2013

@Licenser

When you test on your hardware, could you please bump max-connections in the Resin conf to 256 (it defaults to 128)? The Java-based frameworks drop off on dedicated hardware after 128 concurrent requests. Seems related.

@bhauer

Member

bhauer commented Apr 5, 2013

Hi again @Licenser.

I wrote a blog entry today that, while only tangentially related to our conversation earlier today, was nevertheless inspired by that conversation.

Requests-per-second versus concurrent connections: http://tiamat.tsotech.com/rps-vs-connections

@Licenser

Contributor

Licenser commented Apr 5, 2013

Hi mate,
I just read your article and it is indeed very interesting :) It gave me a few insights and a better understanding of why you benchmark what you benchmark. It makes sense :).

@bhauer

Member

bhauer commented Apr 5, 2013

Incidentally, all of this talk about WebSockets makes me ever more interested in their widespread use. I've only built small applications using WebSockets, but the model feels completely natural: that the server and client can communicate at will whenever they want to.

I think we'll close this issue, but I welcome any further thoughts or findings! Thanks again!

@robertmeta

robertmeta commented Apr 13, 2013

In any sort of exchange (middle man) situation (where you talk to a client on side A, and a service or API outside of your control on side B) you can easily end up with thousands of concurrent requests. Dependencies on outside APIs are an ever-increasing problem.

Small businesses often find themselves in this place if they are the middle man between thousands of users across their small handful of machines and, on the other side, a massive network of company X, which responds slowly but consistently across a great number of machines.

It is also a design decision for how to work with any AJAX-style interaction: many req/rep with separate synchronization, or a single process/actor/etc. that handles all interaction within that page.

@bhauer

Member

bhauer commented Apr 13, 2013

I like that. Thank you for the notes!

We have a new test suggestion on the to-do list for future rounds that would exercise that scenario by simulating the use of (and waiting for) an external service. With that test, we'll have a good reason to leave the connection idle while we wait for the simulated external service to provide its result. That will be a good use case for increasing the client-side concurrency level.

@robertmeta

robertmeta commented Apr 13, 2013

I think that test will do a good job of testing concurrency, because it hits the true ugly recursive nature of it perfectly. When the external service adds an additional 100 ms delay, this increases concurrency on the local app, which slows performance, which increases concurrency, which slows performance, which increases concurrency... splat.
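The feedback loop described above can be sketched as a toy fixed-point iteration. Everything here is a hypothetical model for illustration only: the function name, the linear `slowdown` factor (how much each in-flight request stretches local processing time), and the example numbers are assumptions, not properties of any real framework.

```python
def in_flight_requests(rate, external_s, base_local_s, slowdown,
                       steps=200, limit=1_000_000):
    """Iterate 'concurrency -> slower local work -> more concurrency'
    until it settles, or return infinity if it runs away."""
    n = 0.0
    for _ in range(steps):
        # Toy assumption: local time grows linearly with in-flight requests.
        local_s = base_local_s * (1 + slowdown * n)
        # Little's Law: in-flight requests = arrival rate * time in system.
        n = rate * (external_s + local_s)
        if n > limit:
            return float("inf")  # the "splat" case
    return n


# 1000 req/s, 5 ms of local work, external service at 100 ms and then 200 ms.
print(in_flight_requests(1000, 0.100, 0.005, slowdown=0.01))
print(in_flight_requests(1000, 0.200, 0.005, slowdown=0.01))
# A stack that degrades faster under load never settles.
print(in_flight_requests(1000, 0.100, 0.005, slowdown=0.25))
```

In this toy model the loop settles only while each additional in-flight request adds less than one further request's worth of delay; past that point the concurrency grows without bound.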

Finding tools and frameworks that can handle this case elegantly is going to be one of the major challenges for developers over the next few years as they go from depending on one external service (mostly their own DB) to multiple. Having a tech stack they know can handle this also allows them to outsource things they don't care about in a sane way.

@purplefox

Contributor

purplefox commented May 19, 2013

Being able to support large numbers of connections is not all about websockets.

HTTP keep-alive is the default in 1.1, and a good web server should hold on to a connection for around 10s or so before closing it.

This means that if you expect to be able to handle 1000 req/s on a server from different clients, which seems entirely reasonable, you will need to support at least 10000 concurrent connections.
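The arithmetic behind that figure, as a minimal sketch. It assumes that every request arrives from a different client on a new connection and that the server keeps each connection open for the full keep-alive window; the function name is made up for this example.

```python
def open_connections(new_connections_per_s, keep_alive_s):
    """Steady-state number of open connections when each new connection
    is held for keep_alive_s seconds before the server closes it."""
    return new_connections_per_s * keep_alive_s


print(open_connections(1000, 10))  # 10000
```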

@bhauer

Member

bhauer commented May 19, 2013

Indeed, and we are using HTTP keep alive where supported.

The scenario you describe, while certainly realistic, implicitly introduces a pause between an initial socket connection and each subsequent request on that connection. Otherwise, it's not realistic. And if it doesn't introduce a pause, the net behavior will be much the same as the current 256-concurrency test: fully-utilized CPU for the CPU-limited tests or fully-utilized Ethernet for the network-limited tests.

We have not made implementing such a test a priority yet.

@purplefox

Contributor

purplefox commented May 19, 2013

I'm not sure I understand your point, but 256 connections seems a woefully low number to me, and I don't think it's representative of what an actual popular website will have to deal with.

@michaelhixson

Member

michaelhixson commented May 20, 2013

The present 256-concurrency tests are like: 256 users who live in the same data center as the server making requests one after another immediately and repeatedly for the duration of the test. To me that represents the kind of load you might expect from tens or hundreds of thousands of "real" concurrent users. So, 256 in this context does not seem low to me. If we had higher concurrency for the present tests, it's doubtful that we'd see much of interest because the server's CPU is already fully utilized for (almost) all of the tests.

A delay would make higher concurrency levels more interesting because then we'd potentially have CPU bandwidth to handle all the requests. The delay could be client side (idle time between each request) or server side (wait for an external service), or both. It would be interesting, not because it would simulate more "work" for the server, but because it would show whether the particulars of managing a large number of connections (as opposed to other kinds of work the server does) causes certain frameworks or platforms to break down.
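As a rough sketch of why 256 back-to-back connections can stand in for a much larger population of "real" users, the comparison can be written with Little's Law. The 10 ms response time and 5 s of idle time between a user's requests are hypothetical numbers picked for illustration, and `equivalent_users` is not part of the benchmark.

```python
def equivalent_users(connections, response_s, think_s):
    """Number of users who pause think_s seconds between requests and
    together produce the same request rate as `connections` clients
    that issue requests back to back."""
    throughput = connections / response_s       # req/s from the benchmark clients
    return throughput * (response_s + think_s)  # Little's Law: N = X * (R + Z)


# 256 connections, 10 ms per response, users idle for 5 s between requests.
print(round(equivalent_users(256, 0.010, 5.0)))
```

Under these assumptions the load corresponds to a population in the low hundreds of thousands of users. What the back-to-back test does not exercise is the cost of holding that many mostly idle connections open, which is the case a delay-based test would cover.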

@purplefox

Contributor

purplefox commented May 20, 2013

Your assumption that 256 connections firing a lot of requests per connection per second should behave the same as 10000 connections firing proportionately fewer requests per connection per second seems flawed to me, for at least a couple of reasons:

  1. Context switching. For frameworks that use a traditional "thread per connection" model, much more context switching will be going on with 10K+ connections, to the point where context-switching overhead eats the CPU (almost) completely, so expect to see performance drop off.
  2. Thread stack overhead. For "thread per connection" frameworks, each thread carries significant RAM overhead for its stack, to the point where you'll run out of RAM very quickly on your server unless you have a really tiny stack.
  3. TCP buffer RAM overhead. For both "thread per connection" and non-blocking servers, there will be RAM overhead for the TCP buffers.

Solving 1. + 2. is probably the biggest motivation for the new breed of non-blocking frameworks such as Vert.x, Play/Akka, Spray, Netty, etc. to exist. And this is going to be even more important as we're starting to enter a new world of (mainly mobile and embedded) devices which have really long-lived connections to the server (WebSockets, MQTT, etc.).
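Points 2 and 3 in the list above can be put into rough numbers. The sketch below assumes a 1 MiB stack per thread and 32 KiB of TCP buffer space per connection; both are illustrative values, not measured ones, and actual sizes depend on the runtime and kernel settings. The stack figure is an upper bound on reserved memory, since operating systems typically commit stack pages only as they are touched.

```python
MIB = 1024 * 1024
KIB = 1024


def connection_memory_bytes(connections, stack_bytes, tcp_buffer_bytes,
                            thread_per_connection):
    """Upper-bound memory estimate for holding `connections` open."""
    per_connection = tcp_buffer_bytes        # paid by every server model
    if thread_per_connection:
        per_connection += stack_bytes        # one dedicated stack per connection
    return connections * per_connection


# Hypothetical sizes: 1 MiB stack, 32 KiB of TCP buffers, 10,000 connections.
for threaded in (True, False):
    total = connection_memory_bytes(10_000, 1 * MIB, 32 * KIB, threaded)
    print(f"thread per connection={threaded}: {total / MIB:,.1f} MiB")
```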

@bhauer

Member

bhauer commented May 20, 2013

Hi @purplefox. Thanks for the additional insights. We will be testing with higher concurrency levels in a future "external API simulation" test, and we will reconsider adding higher concurrency levels to other tests as well.

@hrj

hrj commented Dec 18, 2013

@bhauer If you are going to reconsider higher concurrency levels, I suggest you mark the issue as "open".

@bhauer

Member

bhauer commented Dec 18, 2013

@hrj Sounds reasonable.

@bhauer bhauer reopened this Dec 18, 2013

@bhauer bhauer changed the title from concurrency levels to Concurrency levels Apr 3, 2014

@LadyMozzarella

Member

LadyMozzarella commented Mar 23, 2015

Closing due to inactivity, and because it appears that this issue would ultimately be covered by the future test type (#12, "Tests that exercise requests made to external services and therefore must go idle until the external service provides a response.") listed in issue #133.
