New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue150 force termination upon pgctl stop/restart #181
Issue150 force termination upon pgctl stop/restart #181
Conversation
pgctl/cli.py
Outdated
@@ -517,6 +520,10 @@ def parser(): | |||
'--json', action='store_true', default=False, | |||
help='output in JSON (only supported by some commands)', | |||
) | |||
parser.add_argument( | |||
'--force', action='store_true', default=False, | |||
help='forcefully terminate zombie processes that prevent services from starting/stopping', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wouldn't call these zombie processes, "zombie processes" is a specific thing in unix terminology (processes which have exited but not been reaped)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "running-away-from-supervisor processes"? Or suggestions I can get some help please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's some messageing already in pgctl calling them "run-away processes", I think.
pgctl/service.py
Outdated
"""attempt to show the user a better message on failure, and handle the race condition""" | ||
def handle_race(path): | ||
if force_release: | ||
terminate_runaway_processes(path) | ||
show_runaway_processes(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be better to do this from within show-runaway-processes, after listing them to the user.
Maybe?
tests/unit/functions.py
Outdated
def it_kills_processes_holding_the_lock(self, tmpdir, mock_subprocess_call): | ||
lockfile = tmpdir.ensure('lock') | ||
lock = lockfile.open() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd instead set the lock fd as inheritable, Popen a 'sleep infinity' (which holds the lock forever), then assert that the sleep died after calling terminate-runaway-processes.
tests/spec/dirty_tests.py
Outdated
@@ -50,7 +50,7 @@ class DirtyTest(object): | |||
\\S+ +\\d+ +\\d+ +\\d+ +\\d+ +\\d+ +\\S+ +\\S+ +\\S+ +\\S+ +{cmd} | |||
|
|||
There are two ways you can fix this: | |||
\\* temporarily: lsof -t playground/{service} | xargs kill -9 | |||
\\* temporarily: pgctl-fuser playground/{service} | xargs kill -9 | |||
\\* permanently: http://pgctl.readthedocs.org/en/latest/user/quickstart.html#writing-playground-services | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of --force, show this message, but instead of explaining the workarounds, say "Killing forcefully at user's request." or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the link to the wiki actually. It's where I explain how to fix this problem "for real".
''', | ||
0, | ||
norm=norm.pgctl, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file this under DescribeSlowShutdown.it_succeeds_with_force()
.
It should be exactly the same as the it_fails_by_default
test, but showing the difference if we use --force.
I don't believe you should allow pgctl to use kill -9 without throwing some kind of warning.
More specifics below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I will treat "--force" as a permission from users that we can do something with force; but pgctl should still exist with error (i.e., return code = 1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
erm I think it's a success if you do what the user asked.
But showing a warning message is still appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bukzor++ if the command does not encounter errors, it should probably exit 0.
The CircleCI test failure seems flaky :( I invoked a build on my local branch, and the tests pass while the coverage fails. |
@smallredbean: Great work! It's ready to ship as far as I can see. As long as you show that the test suite can pass, that's good enough for me. Ship It! 🚢 |
One question arises... While the current "--force" option only kills runaway processes (processes running away from their s6 supervisors, which have died), should we consider killing slow-responding processes? These processes take longer than expected to react to I'd vote for cleaning them up. |
pgctl/functions.py
Outdated
call( | ||
('kill', '-9',) + tuple([str(pid) for pid in pids]), | ||
stderr=open(os.devnull, 'w'), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an os.kill
which usually beats subprocess
shipit |
No description provided.