Workers silently exit when allocating more memory than the system allows. #1937

Closed
jonathanlunt opened this issue Dec 17, 2018 · 5 comments

jonathanlunt commented Dec 17, 2018

Problem Description

Gunicorn workers silently exit when allocating more memory than the system allows. This causes the master to enter an infinite loop where it keeps booting new workers unsuccessfully. No hook/logging exists to detect or handle this behavior.

[2018-12-16 23:29:45 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2018-12-16 23:29:45 +0000] [1] [INFO] Listening at: http://127.0.0.1:8080 (1)
[2018-12-16 23:29:45 +0000] [1] [INFO] Using worker: sync
[2018-12-16 23:29:45 +0000] [10] [INFO] Booting worker with pid: 10
post_fork 10
child_exit 10
[2018-12-16 23:29:46 +0000] [11] [INFO] Booting worker with pid: 11
post_fork 11
child_exit 11
[2018-12-16 23:29:47 +0000] [12] [INFO] Booting worker with pid: 12
post_fork 12
child_exit 12
[2018-12-16 23:29:47 +0000] [13] [INFO] Booting worker with pid: 13
post_fork 13
child_exit 13

Files

These files will allow you to reproduce the behavior (a clone-able version is available here: https://github.com/jonathanlunt/gunicorn-memory-example). Docker is used to artificially constrain system resources.

app.py: Generic "hello world" from gunicorn documentation

def app(environ, start_response):
    data = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    return iter([data])

config.py: This includes a post_fork hook that allocates ~100MB

bind = '%s:%s' % ('127.0.0.1', '8080')
workers = 1

def post_fork(server, worker):
    """Allocate ~100MB of integers"""
    print('post_fork', worker.pid)
    items = list(range(4300800))

def child_exit(server, worker):
    print('child_exit', worker.pid)

Dockerfile

FROM python:3.6-alpine3.8

RUN pip install gunicorn

COPY *.py /opt/
WORKDIR /opt

ENTRYPOINT ["gunicorn", "-c", "config.py", "app:app"]

Usage

The run command limits the container to 50MB of memory:

docker build -t gunicorn-example .
docker run -m 50mb -t gunicorn-example

Proposed Solutions

Arbiter Logging:

Since no error is logged, it is difficult to determine when this behavior is triggered other than by tracking the number of times a worker is created. One possible approach would be to log the status code in the Arbiter.

Update Arbiter.reap_workers:

if exitcode == self.WORKER_BOOT_ERROR:

To include:

WORKER_SUCCESS = 0
if status != WORKER_SUCCESS:
    self.log.error("Worker exited with status code: %s", status)
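
As a rough sketch of where this could live (paraphrased and trimmed from the 19.9.0 arbiter, so the surrounding code is approximate and this is not a drop-in patch):

def reap_workers(self):
    try:
        while True:
            wpid, status = os.waitpid(-1, os.WNOHANG)
            if not wpid:
                break
            if self.reexec_pid == wpid:
                self.reexec_pid = 0
                continue
            exitcode = status >> 8
            if exitcode == self.WORKER_BOOT_ERROR:
                raise HaltServer("Worker failed to boot.", self.WORKER_BOOT_ERROR)
            # Proposed addition: an OOM-killed worker arrives here with
            # exitcode 0 (the raw status carries the signal number instead),
            # so the check above never fires. Logging any non-zero status
            # makes the abnormal exit visible in the master's log.
            if status != 0:
                self.log.error("Worker (pid:%s) exited with status %s", wpid, status)
            worker = self.WORKERS.pop(wpid, None)
            if not worker:
                continue
            worker.tmp.close()
            self.cfg.child_exit(self, worker)
    except OSError as e:
        if e.errno != errno.ECHILD:
            raise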

Worker Exit Code Tracking:

Another way to handle the error is to allow the user to perform an action depending on the process exit code. However, currently exitcode doesn't appear to be tracked by the worker class.

Update Worker.__init__:

def __init__(self, age, ppid, sockets, app, timeout, cfg, log):

To include:

self.exitcode = None

Update Arbiter.reap_workers:

worker.tmp.close()

To include:

worker.exitcode = status

The status comes up as 9 in this case (the worker is killed with SIGKILL by the OOM killer). This would allow user-provided child_exit code to make a decision based on Worker.exitcode.
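
For example, a child_exit hook in config.py could then look roughly like this (hypothetical: it assumes the proposed Worker.exitcode attribute holds the raw waitpid status, which does not exist in gunicorn today):

import os

def child_exit(server, worker):
    # Hypothetical: worker.exitcode holding the raw waitpid status is the
    # proposal above, not an existing gunicorn attribute.
    status = getattr(worker, "exitcode", None)
    if not status:
        return
    if os.WIFSIGNALED(status):
        # An OOM-killed worker shows up here with signal 9 (SIGKILL).
        server.log.error("Worker %s killed by signal %s", worker.pid, os.WTERMSIG(status))
    else:
        server.log.error("Worker %s exited with code %s", worker.pid, os.WEXITSTATUS(status))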

Comments

If there are other solutions to this issue, I'd be happy to hear them, but for now I don't know if there's a good way to track/handle this situation with gunicorn by default.

I would be willing to submit a PR for the proposed solutions, but I wanted to raise this as an issue first in order to get feedback on the best way to handle this behavior.


tilgovi commented Jan 22, 2019

I like both proposals.


benoitc commented Jan 22, 2019

I don't think we should try to respawn indefinitely in any case anyway. We should probably track the number of times we have tried to respawn a worker within a time window and decide to stop at some point, shouldn't we?

Additionally, we should indeed log the status code error. IMO it's better to crash and let the user do something at that point if they need to restart and so on.
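
Something like this, roughly (just a sketch of the time-window idea, not actual gunicorn code):

import time
from collections import deque

MAX_FAILURES = 5      # give up after this many worker deaths...
WINDOW_SECONDS = 30   # ...within this many seconds

failures = deque()

def should_halt_after_worker_death():
    # Record the death, drop entries older than the window, and decide
    # whether the master should stop respawning and halt instead.
    now = time.monotonic()
    failures.append(now)
    while failures and now - failures[0] > WINDOW_SECONDS:
        failures.popleft()
    return len(failures) >= MAX_FAILURES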

benoitc self-assigned this Jan 22, 2019
@adoukkali

Hi @benoitc
I would like to know if there is any mechanism by which we can prevent the worker from exiting silently. I would prefer to drop the request being processed by the worker rather than kill the worker (if the worker is bootstrapping things, for instance, it will be expensive to restart it).


benoitc commented Sep 16, 2021

@adoukkali What do you mean by "silently"? If something is crashing, then it will crash. If you don't want the worker to crash, you should take measures in your application so that it does not trigger an exception that will make it crash.


sp1rs commented Mar 29, 2022

Any update on this, @adoukkali @benoitc? What about the second proposal, Worker Exit Code Tracking?
