New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Terminate proccess's when experiencing a fatal error in ductape runner #323
Conversation
tested by running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ian!
self.service = SimpleEchoService(self.test_context) | ||
|
||
@cluster(num_nodes=1) | ||
def timeout_test(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably want to test with multiple tests in flight - some scheduled in parallel, some still yet to schedule (maybe just run for all systests?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well i run in parallel for the unit test as well, but yes ran in systest with some tests yet to schedule etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #323 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in #323 you're saying that you've tested with test_runner_operations
, which is a single test method - maybe worth testing with simply systests
folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah for sure ill give it a run, ran it with repeat to simulate a bunch of test being run but yeah lets get the complete coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, thanks Ian! Just please run it with all of ducktape systests, maybe also play with test sizes to put the fatal test in the middle of the run or parallel with other tests etc
Also, this PR cannot be built, please don't merge without fixing that 😄 |
@@ -210,6 +210,9 @@ def run_all_tests(self): | |||
self._log(logging.ERROR, err_str) | |||
|
|||
# All processes are on the same machine, so treat communication failure as a fatal error | |||
for proc in self._client_procs.values(): | |||
proc.terminate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also compare this to https://github.com/confluentinc/ducktape/blob/master/ducktape/tests/runner.py#L124 which uses os.kill
vs terminate()
- what's the difference and pros/cons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would probably be good to have a unified cleanup_child_processes
method or smth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you read the docs for terminate:
Terminate the process. On Unix this is done using the SIGTERM signal; on Windows TerminateProcess()
which seems to be more platform agnostic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about modifying that other line too then? Can be a separate PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah might be best to touch it in another pr with more testing.
confluentinc#323) * update test runner * update docstring * readd newline * add a simple test to run against * fix formating (cherry picked from commit a214102)
confluentinc#323) * update test runner * update docstring * readd newline * add a simple test to run against * fix formating
closes #322
Simply terminates process when an exception is caught in the run cycle