Skip to content

[AIRFLOW-1885] Fix access of cmdline when a gunicorn workers becomes a zombie#2844

Closed
j16r wants to merge 1 commit intoapache:masterfrom
j16r:bugfix/poll_zombie_process
Closed

[AIRFLOW-1885] Fix access of cmdline when a gunicorn workers becomes a zombie#2844
j16r wants to merge 1 commit intoapache:masterfrom
j16r:bugfix/poll_zombie_process

Conversation

@j16r
Copy link
Contributor

@j16r j16r commented Dec 5, 2017

JIRA

https://issues.apache.org/jira/browse/AIRFLOW-1885

Description

  • Here are some details about my PR, including screenshots of any UI changes:

If while trying to obtain a list of ready gunicorn workers, one of them
becomes a zombie, psutil.cmdline returns [] (see here:
https://github.com/giampaolo/psutil/blob/release-4.2.0/psutil/_pslinux.py#L1007)

Boom:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 28, in <module>
    args.func(args)
  File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 803, in webserver
    restart_workers(gunicorn_master_proc, num_workers)
  File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 687, in restart_workers
    num_ready_workers_running = get_num_ready_workers_running(gunicorn_master_proc)
  File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 663, in get_num_ready_workers_running
    proc for proc in workers
  File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 664, in <listcomp>
    if settings.GUNICORN_WORKER_READY_PREFIX in proc.cmdline()[0]
IndexError: list index out of range

So ensure a cmdline is actually returned before doing the cmdline prefix
check in ready_prefix_on_cmdline.

Also treats psutil.NoSuchProcess error as non ready worker.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

My PR adds 4 unit tests for the get_num_ready_workers_running function within cli.py, this module previously had no unit testing and I moved get_num_ready_workers_running into a higher scope in order to test it.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":

    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@bolkedebruin
Copy link
Contributor

Thanks for the contribution. Please follow the commit guidelines.

@j16r
Copy link
Contributor Author

j16r commented Dec 5, 2017

@bolkedebruin yup! just working backwards...

@j16r j16r changed the title Fix access of cmdline when a gunicorn workers becomes a zombie [AIRFLOW-1885] Fix access of cmdline when a gunicorn workers becomes a zombie Dec 5, 2017
@codecov-io
Copy link

codecov-io commented Dec 6, 2017

Codecov Report

Merging #2844 into master will increase coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2844      +/-   ##
==========================================
+ Coverage   73.98%   74.03%   +0.05%     
==========================================
  Files         160      160              
  Lines       12153    12161       +8     
==========================================
+ Hits         8991     9003      +12     
+ Misses       3162     3158       -4
Impacted Files Coverage Δ
airflow/bin/cli.py 54.92% <100%> (+1.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02ff8ae...e8ec908. Read the comment docs.

@j16r
Copy link
Contributor Author

j16r commented Dec 11, 2017

:bowtie:

If while trying to obtain a list of ready gunicorn workers, one of them
becomes a zombie, psutil.cmdline returns [] (see here:
https://github.com/giampaolo/psutil/blob/release-4.2.0/psutil/_pslinux.py#L1007)

Boom:

    Traceback (most recent call last):
      File "/usr/local/bin/airflow", line 28, in <module>
        args.func(args)
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 803, in webserver
        restart_workers(gunicorn_master_proc, num_workers)
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 687, in restart_workers
        num_ready_workers_running = get_num_ready_workers_running(gunicorn_master_proc)
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 663, in get_num_ready_workers_running
        proc for proc in workers
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 664, in <listcomp>
        if settings.GUNICORN_WORKER_READY_PREFIX in proc.cmdline()[0]
    IndexError: list index out of range

So ensure a cmdline is actually returned before doing the cmdline prefix
check in ready_prefix_on_cmdline.

Also:

 * Treat psutil.NoSuchProcess error as non ready worker
 * Add in tests for get_num_ready_workers_running
@asfgit asfgit closed this in be54f04 Dec 12, 2017
Acehaidrey pushed a commit to Acehaidrey/incubator-airflow that referenced this pull request Jan 19, 2018
If while trying to obtain a list of ready gunicorn
workers, one of them
becomes a zombie, psutil.cmdline returns [] (see
here:
https://github.com/giampaolo/psutil/blob/release-4
.2.0/psutil/_pslinux.py#L1007)

Boom:

    Traceback (most recent call last):
      File "/usr/local/bin/airflow", line 28, in
<module>
        args.func(args)
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 803, in
webserver
        restart_workers(gunicorn_master_proc, num_workers)
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 687, in
restart_workers
        num_ready_workers_running = get_num_ready_workers_
running(gunicorn_master_proc)
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 663, in
get_num_ready_workers_running
        proc for proc in workers
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 664, in
<listcomp>
        if settings.GUNICORN_WORKER_READY_PREFIX in
proc.cmdline()[0]
    IndexError: list index out of range

So ensure a cmdline is actually returned before
doing the cmdline prefix
check in ready_prefix_on_cmdline.

Also:

 * Treat psutil.NoSuchProcess error as non ready
worker
 * Add in tests for get_num_ready_workers_running

Closes apache#2844 from j16r/bugfix/poll_zombie_process
hougs pushed a commit to stitchfix/incubator-airflow that referenced this pull request Jun 29, 2018
If while trying to obtain a list of ready gunicorn
workers, one of them
becomes a zombie, psutil.cmdline returns [] (see
here:
https://github.com/giampaolo/psutil/blob/release-4
.2.0/psutil/_pslinux.py#L1007)

Boom:

    Traceback (most recent call last):
      File "/usr/local/bin/airflow", line 28, in
<module>
        args.func(args)
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 803, in
webserver
        restart_workers(gunicorn_master_proc, num_workers)
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 687, in
restart_workers
        num_ready_workers_running = get_num_ready_workers_
running(gunicorn_master_proc)
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 663, in
get_num_ready_workers_running
        proc for proc in workers
      File "/usr/local/lib/python3.5/dist-
packages/airflow/bin/cli.py", line 664, in
<listcomp>
        if settings.GUNICORN_WORKER_READY_PREFIX in
proc.cmdline()[0]
    IndexError: list index out of range

So ensure a cmdline is actually returned before
doing the cmdline prefix
check in ready_prefix_on_cmdline.

Also:

 * Treat psutil.NoSuchProcess error as non ready
worker
 * Add in tests for get_num_ready_workers_running

Closes apache#2844 from j16r/bugfix/poll_zombie_process
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants