Sync vs Async workers #1409

gukoff opened this Issue Dec 15, 2016 · 3 comments



gukoff commented Dec 15, 2016 edited


The documentation doesn't state clearly enough how the async workers differ from the sync ones, or what a programmer has to do to make use of the difference.

I assume that asynchronous workers are spawned in separate processes based on the pre-fork model.

Say we want to see the difference between the sync and gevent worker classes using a simple application as an example. Here are four scenarios:

Scenario №1

The application accepts a request, makes 10 external calls using the requests library and returns an array of 10 responses.

My assumption: there is no difference; the worker class doesn't matter.
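Scenario 1 as a minimal sketch. The `fetch` helper is a hypothetical stand-in for a real `requests.get(url)` call; the sleep simulates network latency:

```python
import json
import time

# Hypothetical stand-in for one external HTTP call; a real app would
# call requests.get(url) here. The sleep simulates network latency.
def fetch(i):
    time.sleep(0.01)
    return {"call": i, "status": 200}

# Minimal WSGI app: one incoming request fans out into 10 sequential
# external calls and returns the collected responses as JSON.
def app(environ, start_response):
    responses = [fetch(i) for i in range(10)]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(responses).encode()]
```

With a sync worker the 10 calls run back-to-back and the worker accepts nothing else in the meantime; the question is whether an async worker changes anything for this exact code.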

Scenario №2

The application calls gevent.monkey.patch_all() in the pre_fork() function of the master process. Then the first scenario takes place: the app accepts a request, makes 10 external calls using the requests library and returns an array of 10 responses.

My assumption: synchronous workers implicitly turn into gevent workers.
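Scenario 2 as a `gunicorn.conf.py` sketch (assuming gevent is installed; `pre_fork` is a real gunicorn server hook that runs in the master process just before each worker is forked):

```python
# gunicorn.conf.py — sketch of Scenario 2
workers = 4
worker_class = "sync"

def pre_fork(server, worker):
    # Runs in the master process just before each fork, so the patched
    # state is inherited by every worker.
    from gevent import monkey
    monkey.patch_all()
```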

Scenario №3

The same as Scenario №2, but the monkey-patch is applied in a worker.

My assumptions:

  1. gevent.monkey.patch_all() doesn't affect the way workers listen on the socket. Synchronous workers don't turn into gevent workers and don't accept new calls until the previous ones are handled.
  2. A gevent worker might accept a new call while an external call via requests is in flight. The number of concurrently handled calls is capped by the worker_connections setting. That's the only difference.
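Scenario 3 as a `gunicorn.conf.py` sketch (again assuming gevent is installed; `post_fork` is the hook that runs inside each worker right after the fork, so only the worker's interpreter is patched, never the master's):

```python
# gunicorn.conf.py — sketch of Scenario 3
workers = 4
worker_class = "sync"
worker_connections = 1000  # only consulted by async workers such as gevent

def post_fork(server, worker):
    # Runs in each worker process right after the fork.
    from gevent import monkey
    monkey.patch_all()
```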

Scenario №4

The application accepts a request, spawns 5 gevent jobs and joins them; their 5 responses will be the result. After that it spawns another 10 jobs, doesn't join them and returns immediately.

My assumptions:

  1. After finishing the first request, a synchronous worker keeps listening on the socket. The 10 spawned jobs wait until a second request is accepted, where gevent.joinall(...) will be called and they might get scheduled.
  2. After finishing the first request, a gevent worker keeps executing the 10 spawned jobs until the second request arrives. It can switch to handling the second request only at a gevent context switch (gevent.joinall(...), one of the jobs finishing, etc.).
  3. When workers reload (SIGHUP, MAX_REQUESTS, etc.), synchronous ones lose all pending spawned jobs, while gevent workers are terminated gracefully.
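The spawn-vs-join distinction in Scenario 4 can be illustrated with plain threads standing in for greenlets (this is not how gevent actually schedules, just a sketch of the two kinds of jobs):

```python
from concurrent.futures import ThreadPoolExecutor
import time

results = []

def job(i):
    time.sleep(0.01)  # stand-in for an external call
    results.append(i)

pool = ThreadPoolExecutor(max_workers=5)

# "Joined" jobs: the request handler blocks until all five finish,
# analogous to gevent.joinall(...).
joined = [pool.submit(job, i) for i in range(5)]
for f in joined:
    f.result()

# "Fire-and-forget" jobs: the handler returns immediately; whether
# these ever finish depends on the worker staying alive long enough.
fired = [pool.submit(job, i) for i in range(5, 15)]
```

The reload question is exactly about the second group: a worker that restarts before those jobs run has nothing to hand them off to.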

I suspect many of these assumptions are wrong. Could you please correct me and expand on this in the documentation accordingly?

benoitc commented Dec 22, 2016

This page describes the design and gives some information about the workers:

I will answer in a generic manner, if that's OK with you. Hopefully it will give you enough hints to answer the scenarios above yourself.

If you run gunicorn behind a proxy that buffers the connection, the key point is not the number of connections gunicorn can accept, but rather the number of slow connections (a worker doing a huge task, connecting to an API, a database, ...) or connections held open for a long time (long polling, ...). In such cases an async worker is advised. When you return almost immediately, a sync worker is enough; in most cases, when the database is local or on the same low-latency network, it can also be used.

If you run gunicorn standalone, you will need a threaded or an async worker if you expect a lot of concurrent connections. Increasing the number of workers with the sync worker class is also sometimes enough, when you don't expect a large number of connections or can tolerate some latency in the responses.

benoitc commented Dec 22, 2016

I will also add that monkey patching has some side effects on your application, which may or may not be an issue. The other async workers don't suffer from such side effects; at least the tornado and threaded workers don't.

gukoff commented Jan 16, 2017 edited

@benoitc thanks for your answer!

I've already read the docs. Essentially, my point is that the docs are way too short. There are important implementation details that aren't mentioned yet.

Firstly, it came as a surprise to me that gevent workers implicitly call gevent.monkey.patch_all(). That is quite a blunt strategy, unacceptable in many cases. There should be another type of gevent worker that simply listens on a gevent socket and doesn't monkey-patch anything. This behaviour isn't explicitly documented either. It's also important to know whether the main process gets monkey-patched along with the worker processes.

Secondly, it's not very clear how the max-requests option works. If it is set, does it use the graceful_timeout option? If so, how does graceful_timeout work? Does it make the worker stop accepting new requests, or is that up to the developer?
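For reference, these are the settings in question as they would appear in a `gunicorn.conf.py` (values illustrative, not recommendations):

```python
# gunicorn.conf.py — the options the question is about
max_requests = 1000         # restart a worker after serving this many requests
max_requests_jitter = 50    # randomize the restart point per worker
graceful_timeout = 30       # seconds a stopping worker gets to finish in-flight work
```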

Thirdly, how exactly does gunicorn restart after a HUP signal? The documentation states:

HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers

So, if I have a server with 30 workers, a long-running pre_fork function (1 minute) and a graceful timeout of 20 seconds, what happens after the HUP? I suppose the steps are:

  1. Reload the application and configuration in the master process;
  2. run the pre_fork function in the master process. Wait a minute for it to finish. Don't touch the workers;
  3. fork 30 new workers. Let them run alongside the older ones; in other words, for a short period of time consume double the RAM and let 60 workers serve the same socket;
  4. gracefully shut down the older workers. Give them 20 seconds to handle the pending queries and terminate.

Am I right?
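The reload trigger itself is straightforward; a hypothetical helper (the pidfile path is whatever was configured with gunicorn's `--pid` option):

```python
import os
import signal

# Hypothetical helper: trigger the reload sequence by sending SIGHUP
# to the gunicorn master, whose pid is stored in the pidfile.
def reload_gunicorn(pidfile):
    with open(pidfile) as f:
        master_pid = int(f.read().strip())
    os.kill(master_pid, signal.SIGHUP)
```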

Fourthly, what happens if the master process receives two HUP signals at the same time? Are they put in some kind of signal queue and handled consecutively? What about other signals?

Fifthly, does the recommendation of 2*CORES + 1 workers have anything to do with asynchronous workers? I'd think that gevent workers are expected to utilise the CPU to the limit and never wait on IO-bound tasks, so ~CORES workers should be OK. Otherwise the load isn't high enough and the number of workers can be even lower.
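The rule of thumb being questioned, as code (this is the docs' formula for sync workers; whether it applies to async workers is exactly the open question):

```python
import multiprocessing

# The (2 x cores) + 1 heuristic from the gunicorn docs, aimed at sync
# workers: enough processes to keep every core busy while roughly half
# of them block on I/O.
def recommended_sync_workers():
    return multiprocessing.cpu_count() * 2 + 1
```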

And so on.
