Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-1235] Fix webserver's odd behaviour #2330

Closed
wants to merge 1 commit into from

Conversation

sekikn
Copy link
Contributor

@sekikn sekikn commented May 27, 2017

Dear Airflow maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:

In some cases, the gunicorn master shuts down
but the webserver monitor process doesn't.
This PR add timeout functionality to shutdown
all related processes in such cases.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

tests.core:CliTests.test_cli_webserver_shutdown_when_gunicorn_master_is_killed

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@mention-bot
Copy link

@sekikn, thanks for your PR! By analyzing the history of the files in this pull request, we identified @plypaul, @mistercrunch and @saguziel to be potential reviewers.

@sekikn
Copy link
Contributor Author

sekikn commented May 28, 2017

CI failed on test_trigger_dag_for_date, which is reported as AIRFLOW-1245. It's irrelevant to this fix since CI failed only on Python 2.7 x PostgreSQL backend.

@icorbett
Copy link

icorbett commented Aug 19, 2017

AIRFLOW-1245 has been merged: #2325

@sekikn
Copy link
Contributor Author

sekikn commented Oct 1, 2017

Thanks for the information @icorbett. I've just rebased the PR.

@codecov-io
Copy link

codecov-io commented Oct 1, 2017

Codecov Report

Merging #2330 into master will increase coverage by 0.49%.
The diff coverage is 44.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2330      +/-   ##
==========================================
+ Coverage   72.98%   73.47%   +0.49%     
==========================================
  Files         180      180              
  Lines       12654    12668      +14     
==========================================
+ Hits         9235     9308      +73     
+ Misses       3419     3360      -59
Impacted Files Coverage Δ
airflow/exceptions.py 100% <100%> (ø) ⬆️
airflow/bin/cli.py 62.73% <41.86%> (+7.71%) ⬆️
airflow/jobs.py 80.08% <0%> (+0.44%) ⬆️
airflow/www/app.py 100% <0%> (+1.04%) ⬆️
airflow/utils/helpers.py 54.02% <0%> (+2.87%) ⬆️
airflow/task/task_runner/bash_task_runner.py 100% <0%> (+6.66%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8754cb1...70fc3f3. Read the comment docs.

@sekikn sekikn force-pushed the AIRFLOW-1235 branch 2 times, most recently from f10acf9 to ba1c283 Compare February 24, 2018 07:58
@sekikn
Copy link
Contributor Author

sekikn commented Feb 24, 2018

Rebased on master.

@Fokko
Copy link
Contributor

Fokko commented Feb 24, 2018

Why isn't this test executed on Travis?

@sekikn
Copy link
Contributor Author

sekikn commented Feb 25, 2018

Thanks for the review @Fokko, I have some reasons to disable it on Travis:

  • AFAIR accessing pidfile doesn't work as expected on Travis for some environmental reason. Strangely, when I tried to run this test on Travis, it seemed to cause another test failure.
  • This test uses another thread to kill gunicorn with some delay. It's non-deterministic and can occur transient test failure.

Instead of enabling this on Travis, I ran it locally and confirmed that it succeeded.

@Fokko
Copy link
Contributor

Fokko commented Feb 26, 2018

@sekikn Thanks for elaboration on the problem. I think we should find a way of isolating these tests. For me, a test that isn't automated, is as valuable as no test. I hope you understand, and this is something at the core of Airflow, so we have to make sure that it is tested well.

@sekikn
Copy link
Contributor Author

sekikn commented Feb 26, 2018

I understand your concern. I'll update the PR if I come up with some good solution. Thanks :)

@Fokko
Copy link
Contributor

Fokko commented Feb 27, 2018

@sekikn Thanks!

@sekikn
Copy link
Contributor Author

sekikn commented Mar 3, 2018

Updated PR. Instead of killing gunicorn actually using pidfile, it uses mock.patch to emulate the situation that gunicorn doesn't respond. @Fokko would you take a look?

@sekikn sekikn force-pushed the AIRFLOW-1235 branch 2 times, most recently from 14065d2 to 52c944d Compare March 13, 2018 22:59
@sekikn
Copy link
Contributor Author

sekikn commented Mar 13, 2018

Updated PR. Removed unnecessary code in newly added test as #3123 was merged.

@Fokko
Copy link
Contributor

Fokko commented Mar 14, 2018

@sekikn Could you rebase on master?

@sekikn
Copy link
Contributor Author

sekikn commented Mar 14, 2018

Rebased on master. CI failure occurred on test_authorized and it's irrelevant to this PR. It's continued for at least 3 days (e.g. https://travis-ci.org/apache/incubator-airflow/builds/351858876).

Copy link
Contributor

@artwr artwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from one variable which seems to need renaming, this looks good. Let's fix this and merge. @sekikn

if 0 < timeout and timeout <= time.time() - t:
raise AirflowWebServerTimeout(
"No response from gunicorn master within {0} seconds"
.format(master_timeout))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called just timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I'll update the PR. Thanks!

In some cases, the gunicorn master shuts down
but the webserver monitor process doesn't.
This PR add timeout functionality to shutdown
all related processes in such cases.
@johnarnold
Copy link
Contributor

+1 I like it.

@johnarnold
Copy link
Contributor

@artwr merge?

@asfgit asfgit closed this in 7e762d4 Mar 22, 2018
@artwr
Copy link
Contributor

artwr commented Mar 22, 2018

Done

@sekikn
Copy link
Contributor Author

sekikn commented Mar 23, 2018

@artwr @johnarnold Thanks a lot!

aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
In some cases, the gunicorn master shuts down
but the webserver monitor process doesn't.
This PR add timeout functionality to shutdown
all related processes in such cases.

Dear Airflow maintainers,

Please accept this PR. I understand that it will
not be reviewed until I have checked off all the
steps below!

### JIRA
- [x] My PR addresses the following [Airflow JIRA]
(https://issues.apache.org/jira/browse/AIRFLOW/)
issues and references them in the PR title. For
example, "[AIRFLOW-XXX] My Airflow PR"
    -
https://issues.apache.org/jira/browse/AIRFLOW-1235

### Description
- [x] Here are some details about my PR, including
screenshots of any UI changes:

In some cases, the gunicorn master shuts down
but the webserver monitor process doesn't.
This PR add timeout functionality to shutdown
all related processes in such cases.

### Tests
- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:

tests.core:CliTests.test_cli_webserver_shutdown_wh
en_gunicorn_master_is_killed

### Commits
- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit
message](http://chris.beams.io/posts/git-
commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not
"adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Closes apache#2330 from sekikn/AIRFLOW-1235
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants