Workers silently exit when allocating more memory than the system allows. #1937

Closed
jonathanlunt opened this issue Dec 17, 2018 · 5 comments

jonathanlunt commented Dec 17, 2018

Problem Description

Gunicorn workers silently exit when allocating more memory than the system allows. This causes the master to enter an infinite loop where it keeps booting new workers unsuccessfully. No hook/logging exists to detect or handle this behavior.

[2018-12-16 23:29:45 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2018-12-16 23:29:45 +0000] [1] [INFO] Listening at: http://127.0.0.1:8080 (1)
[2018-12-16 23:29:45 +0000] [1] [INFO] Using worker: sync
[2018-12-16 23:29:45 +0000] [10] [INFO] Booting worker with pid: 10
post_fork 10
child_exit 10
[2018-12-16 23:29:46 +0000] [11] [INFO] Booting worker with pid: 11
post_fork 11
child_exit 11
[2018-12-16 23:29:47 +0000] [12] [INFO] Booting worker with pid: 12
post_fork 12
child_exit 12
[2018-12-16 23:29:47 +0000] [13] [INFO] Booting worker with pid: 13
post_fork 13
child_exit 13

Files

These files will allow you to reproduce the behavior (a clone-able version is available here: https://github.com/jonathanlunt/gunicorn-memory-example). Docker is used to artificially constrain system resources.

app.py: Generic "hello world" from gunicorn documentation

def app(environ, start_response):
    data = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    return iter([data])

config.py: This includes a post_fork hook that allocates ~100MB

bind = '%s:%s' % ('127.0.0.1', '8080')
workers = 1

def post_fork(server, worker):
    """Allocate ~100MB of integers"""
    print('post_fork', worker.pid)
    items = list(range(4300800))

def child_exit(server, worker):
    print('child_exit', worker.pid)

Dockerfile

FROM python:3.6-alpine3.8

RUN pip install gunicorn

COPY *.py /opt/
WORKDIR /opt

ENTRYPOINT ["gunicorn", "-c", "config.py", "app:app"]

Usage

The run command limits the container to 50MB of memory:

docker build -t gunicorn-example .
docker run -m 50mb -t gunicorn-example

Proposed Solutions

Arbiter Logging:

Since no error is logged, it is difficult to determine when this behavior is triggered other than by tracking the number of times a worker is created. One possible approach would be to log the status code in the Arbiter.

Update Arbiter.reap_workers:

if exitcode == self.WORKER_BOOT_ERROR:

To include:

WORKER_SUCCESS = 0
if status != WORKER_SUCCESS:
    self.log.error("Worker exited with status code: %s", status)
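
As a rough sketch of where this could live (paraphrased and trimmed from the 19.9.0 arbiter, so the surrounding code is approximate and this is not a drop-in patch):

def reap_workers(self):
    try:
        while True:
            wpid, status = os.waitpid(-1, os.WNOHANG)
            if not wpid:
                break
            if self.reexec_pid == wpid:
                self.reexec_pid = 0
                continue
            exitcode = status >> 8
            if exitcode == self.WORKER_BOOT_ERROR:
                raise HaltServer("Worker failed to boot.", self.WORKER_BOOT_ERROR)
            # Proposed addition: an OOM-killed worker arrives here with
            # exitcode 0 (the raw status carries the signal number instead),
            # so the check above never fires. Logging any non-zero status
            # makes the abnormal exit visible in the master's log.
            if status != 0:
                self.log.error("Worker (pid:%s) exited with status %s", wpid, status)
            worker = self.WORKERS.pop(wpid, None)
            if not worker:
                continue
            worker.tmp.close()
            self.cfg.child_exit(self, worker)
    except OSError as e:
        if e.errno != errno.ECHILD:
            raise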

Worker Exit Code Tracking:

Another way to handle the error is to allow the user to perform an action depending on the process exit code. However, currently exitcode doesn't appear to be tracked by the worker class.

Update Worker.__init__:

def __init__(self, age, ppid, sockets, app, timeout, cfg, log):

To include:

self.exitcode = None

Update Arbiter.reap_workers:

worker.tmp.close()

To include:

worker.exitcode = status

The status comes up as 9 in this case (the worker is killed with SIGKILL by the OOM killer). This would allow user-provided child_exit code to make a decision based on Worker.exitcode.
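
For example, a child_exit hook in config.py could then look roughly like this (hypothetical: it assumes the proposed Worker.exitcode attribute holds the raw waitpid status, which does not exist in gunicorn today):

import os

def child_exit(server, worker):
    # Hypothetical: worker.exitcode holding the raw waitpid status is the
    # proposal above, not an existing gunicorn attribute.
    status = getattr(worker, "exitcode", None)
    if not status:
        return
    if os.WIFSIGNALED(status):
        # An OOM-killed worker shows up here with signal 9 (SIGKILL).
        server.log.error("Worker %s killed by signal %s", worker.pid, os.WTERMSIG(status))
    else:
        server.log.error("Worker %s exited with code %s", worker.pid, os.WEXITSTATUS(status))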

Comments

If there are other solutions to this issue, I'd be happy to hear them, but for now I don't know if there's a good way to track/handle this situation with gunicorn by default.

I would be willing to submit a PR for the proposed solutions, but I wanted to raise this as an issue first in order to get feedback on the best way to handle this behavior.


tilgovi commented Jan 22, 2019

I like both proposals.


benoitc commented Jan 22, 2019

I don't think we should try to respawn indefinitely in any case anyway. We should probably track the number of times we have tried to respawn a worker within a time window and decide to stop at some point, shouldn't we?

Additionally, we should indeed log the status code error. IMO it's better to crash and let the user do something at that point if they need to restart and so on.
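
Something like this, roughly (just a sketch of the time-window idea, not actual gunicorn code):

import time
from collections import deque

MAX_FAILURES = 5      # give up after this many worker deaths...
WINDOW_SECONDS = 30   # ...within this many seconds

failures = deque()

def should_halt_after_worker_death():
    # Record the death, drop entries older than the window, and decide
    # whether the master should stop respawning and halt instead.
    now = time.monotonic()
    failures.append(now)
    while failures and now - failures[0] > WINDOW_SECONDS:
        failures.popleft()
    return len(failures) >= MAX_FAILURES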

benoitc self-assigned this Jan 22, 2019
@adoukkali

Hi @benoitc
I would like to know if there is any mechanism by which we can prevent the worker from exiting silently. I would prefer to drop the request being processed by the worker rather than kill the worker (if the worker is bootstrapping things, for instance, it will be expensive to restart it).


benoitc commented Sep 16, 2021

@adoukkali What do you mean by "silently"? If something is crashing, then it will crash. If you don't want the worker to crash, you should take measures in your application so that it does not trigger an exception that will make it crash.


sp1rs commented Mar 29, 2022

Any update on this, @adoukkali @benoitc? What about the second proposal, Worker Exit Code Tracking?
