Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-1235] In webserver, when gunicorn master dies, bubble up exc #3144

Closed
wants to merge 1 commit into from

Conversation

johnarnold
Copy link
Contributor

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:

This catches the condition where gunicorn master process exits (usually due to a problem starting workers, e.g. SQL unavailable), but airflow isn't aware and the webserver just keeps running. This change will not leave airflow webserver in a hung state. It should crash all the way out, so that systemd/supervisord/docker/whatever init system can restart it.

This is a partial fix so we should not close out the Issue. We need to fix the underlying condition and make the airflow www app more robust (perhaps some retries) and potentially consider restarting gunicorn if it crashes.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    This is a trivial change. The positive case is already covered in test coverage. I'm not totally sure how to cover the negative case (e.g. gunicorn dies). Mocking?

Commits

  • [X ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":

    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
  • [X ] Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@artwr
Copy link
Contributor

artwr commented Mar 20, 2018

LGTM. +1
Is there any chance you could also add a test covering this? Since the behaviour to reproduce is documented in the ticket, it would be helpful.

@johnarnold
Copy link
Contributor Author

@artwr I added a unit test for the new code. I didn't specifically repro the root cause that i saw in the Issue (database comms issues) but just mocked the conditions being checked, e.g. the gunicorn process not running or in a zombie or stopped state. Should be good enough.

I also fixed a minor bug for the difference between running in daemon mode or not -- a different object class is used (Process vs Popen).

@codecov-io
Copy link

codecov-io commented Mar 21, 2018

Codecov Report

Merging #3144 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3144      +/-   ##
==========================================
+ Coverage   73.05%   73.13%   +0.07%     
==========================================
  Files         180      180              
  Lines       12654    12656       +2     
==========================================
+ Hits         9245     9256      +11     
+ Misses       3409     3400       -9
Impacted Files Coverage Δ
airflow/bin/cli.py 56.35% <100%> (+1.33%) ⬆️
airflow/models.py 87.33% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f642242...8fa5b6b. Read the comment docs.

@sekikn
Copy link
Contributor

sekikn commented Mar 21, 2018

I’m afraid that this fix doesn't seem to cover all infinity-loop cases. For example, if gunicorn becomes a zombie within calling wait_until_true, it doesn't return and the newly added if-statement won't be executed. I've already sent a PR for this problem as #2330. Is that insufficient?

@johnarnold
Copy link
Contributor Author

@sekikn I didn't see your PR before writing this. I'll check it out!

@johnarnold
Copy link
Contributor Author

Close in favor of #2330

@johnarnold johnarnold closed this Mar 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants