Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue150 force termination upon pgctl stop/restart #181

Merged

Conversation

yalinhuang
Copy link
Contributor

No description provided.

pgctl/cli.py Outdated
@@ -517,6 +520,10 @@ def parser():
'--json', action='store_true', default=False,
help='output in JSON (only supported by some commands)',
)
parser.add_argument(
'--force', action='store_true', default=False,
help='forcefully terminate zombie processes that prevent services from starting/stopping',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't call these zombie processes, "zombie processes" is a specific thing in unix terminology (processes which have exited but not been reaped)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "running-away-from-supervisor processes"? Or suggestions I can get some help please

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's some messageing already in pgctl calling them "run-away processes", I think.

pgctl/service.py Outdated
"""attempt to show the user a better message on failure, and handle the race condition"""
def handle_race(path):
if force_release:
terminate_runaway_processes(path)
show_runaway_processes(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to do this from within show-runaway-processes, after listing them to the user.

Maybe?

def it_kills_processes_holding_the_lock(self, tmpdir, mock_subprocess_call):
lockfile = tmpdir.ensure('lock')
lock = lockfile.open()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd instead set the lock fd as inheritable, Popen a 'sleep infinity' (which holds the lock forever), then assert that the sleep died after calling terminate-runaway-processes.

@@ -50,7 +50,7 @@ class DirtyTest(object):
\\S+ +\\d+ +\\d+ +\\d+ +\\d+ +\\d+ +\\S+ +\\S+ +\\S+ +\\S+ +{cmd}

There are two ways you can fix this:
\\* temporarily: lsof -t playground/{service} | xargs kill -9
\\* temporarily: pgctl-fuser playground/{service} | xargs kill -9
\\* permanently: http://pgctl.readthedocs.org/en/latest/user/quickstart.html#writing-playground-services

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of --force, show this message, but instead of explaining the workarounds, say "Killing forcefully at user's request." or something similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the link to the wiki actually. It's where I explain how to fix this problem "for real".

''',
0,
norm=norm.pgctl,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please file this under DescribeSlowShutdown.it_succeeds_with_force().
It should be exactly the same as the it_fails_by_default test, but showing the difference if we use --force.

I don't believe you should allow pgctl to use kill -9 without throwing some kind of warning.
More specifics below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I will treat "--force" as a permission from users that we can do something with force; but pgctl should still exist with error (i.e., return code = 1).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

erm I think it's a success if you do what the user asked.
But showing a warning message is still appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bukzor++ if the command does not encounter errors, it should probably exit 0.

@yalinhuang
Copy link
Contributor Author

The CircleCI test failure seems flaky :( I invoked a build on my local branch, and the tests pass while the coverage fails.

@bukzor
Copy link
Contributor

bukzor commented Feb 25, 2017

@smallredbean: Great work! It's ready to ship as far as I can see.

As long as you show that the test suite can pass, that's good enough for me.
There's a known flakiness with pytest-xdist (multiprocessing) and coverage, but I thought @0xngold turned off parallelism during CI.

Ship It! 🚢

@yalinhuang
Copy link
Contributor Author

One question arises...

While the current "--force" option only kills runaway processes (processes running away from their s6 supervisors, which have died), should we consider killing slow-responding processes? These processes take longer than expected to react to pgctl stop but they are still under a s6 supervisor. That is, after pgctl stop yelling "ERROR: service 'foo' failed to stop after bar seconds", pgctl status shows foo in ready state.

I'd vote for cleaning them up.

call(
('kill', '-9',) + tuple([str(pid) for pid in pids]),
stderr=open(os.devnull, 'w'),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an os.kill which usually beats subprocess

@bukzor
Copy link
Contributor

bukzor commented Mar 8, 2017

shipit

@0xngold 0xngold merged commit 8d9bdd8 into Yelp:master Mar 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants