Terminate processes when experiencing a fatal error in ducktape runner #323

Merged
5 commits merged on Jun 16, 2022

Conversation

imcdo
Member

@imcdo imcdo commented Jun 8, 2022

Closes #322.
Simply terminates the client processes when an exception is caught in the run cycle.

@imcdo imcdo requested a review from a team June 8, 2022 18:04
@CLAassistant
CLAassistant commented Jun 8, 2022

CLA assistant check
All committers have signed the CLA.

@imcdo
Member Author

imcdo commented Jun 8, 2022

Tested by running ducktape --max-parallel 10000 --repeat 6 --test-runner-timeout 1 systests/cluster/test_runner_operations.py against a Vagrant cluster, in addition to the pytest tests.

Member

@stan-is-hate stan-is-hate left a comment

Thanks Ian!

tests/runner/check_runner.py
        self.service = SimpleEchoService(self.test_context)

    @cluster(num_nodes=1)
    def timeout_test(self):
Member

I would probably want to test with multiple tests in flight - some scheduled in parallel, some still yet to schedule (maybe just run for all systests?)

Member Author

@imcdo imcdo Jun 8, 2022

Well, I run in parallel for the unit test as well, but yes, I ran it in systest with some tests yet to be scheduled, etc.

Member

In #323 you say that you tested with test_runner_operations, which is a single test method. Maybe it's worth testing with the whole systests folder?

Member Author

Yeah, for sure, I'll give it a run. I ran it with --repeat to simulate a bunch of tests being run, but yeah, let's get complete coverage.

Member

@stan-is-hate stan-is-hate left a comment

Approved, thanks Ian! Just please run it with all of the ducktape systests. Maybe also play with test sizes to put the fatal test in the middle of the run, or in parallel with other tests, etc.

@stan-is-hate
Member

Also, this PR cannot be built, please don't merge without fixing that 😄

@@ -210,6 +210,9 @@ def run_all_tests(self):
    self._log(logging.ERROR, err_str)

    # All processes are on the same machine, so treat communication failure as a fatal error
    for proc in self._client_procs.values():
        proc.terminate()
Member

Also, compare this to https://github.com/confluentinc/ducktape/blob/master/ducktape/tests/runner.py#L124, which uses os.kill rather than terminate(). What's the difference, and what are the pros and cons?

Member

It would probably be good to have a unified cleanup_child_processes method or something similar.
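
A minimal sketch of what such a unified helper might look like, assuming the children are started multiprocessing.Process objects held in a dict, as self._client_procs appears to be in the diff above. The name cleanup_child_processes comes from the comment; the grace_seconds parameter, the terminate-then-kill escalation, and the returned exit codes are assumptions for illustration, not ducktape's actual API.

```python
import time


def cleanup_child_processes(procs, grace_seconds=5.0):
    """Stop every child in ``procs`` (a dict of key -> multiprocessing.Process).

    Hypothetical helper. Assumes every process in the dict has been started,
    because Process.join() refuses to join a process that was never started.
    Returns a dict of key -> exit code and empties ``procs``.
    """
    # First ask every live child to stop (SIGTERM on Unix).
    for proc in procs.values():
        if proc.is_alive():
            proc.terminate()

    # Give all children a shared grace period, then force-kill stragglers.
    deadline = time.monotonic() + grace_seconds
    for proc in procs.values():
        proc.join(timeout=max(0.0, deadline - time.monotonic()))
        if proc.is_alive():
            proc.kill()  # SIGKILL on Unix; available since Python 3.7
            proc.join()

    exit_codes = {key: proc.exitcode for key, proc in procs.items()}
    procs.clear()
    return exit_codes
```

With a helper along these lines, both the fatal-error path and the signal handler could call the same function, so the two code paths would not drift apart.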

Member Author

If you read the docs for terminate():

Terminate the process. On Unix this is done using the SIGTERM signal; on Windows TerminateProcess() is used.

which seems to be more platform-agnostic.
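
To make the comparison concrete, here is a small Unix-only sketch that stops one child with Process.terminate() and another with os.kill(). On Unix, an exitcode of -N means the child was ended by signal N, so the two calls have the same effect when os.kill() is given SIGTERM. The helper names and the signal passed to os.kill() are assumptions for illustration; the sketch does not claim to reproduce what the linked line in runner.py sends.

```python
import multiprocessing
import os
import signal
import time


def _sleep_forever():
    while True:
        time.sleep(1)


def _start_child():
    # Request "fork" explicitly so the sketch does not depend on the
    # platform's default start method. This makes it Unix-only.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=_sleep_forever)
    proc.start()
    return proc


def stop_with_terminate():
    """Stop a child via Process.terminate() and return its exit code."""
    proc = _start_child()
    proc.terminate()
    proc.join(timeout=10)
    return proc.exitcode


def stop_with_os_kill(sig=signal.SIGTERM):
    """Stop a child by sending ``sig`` to its pid and return its exit code."""
    proc = _start_child()
    os.kill(proc.pid, sig)
    proc.join(timeout=10)
    return proc.exitcode
```

In this sketch, terminate() leaves the choice of mechanism to the platform, which is the portability point made above, while os.kill() lets the caller pick the signal (for example signal.SIGKILL).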

Member

What about modifying that other line too then? Can be a separate PR though.

Member Author

@imcdo imcdo Jun 16, 2022

Yeah, it might be best to touch it in another PR with more testing.

@imcdo imcdo merged commit a214102 into confluentinc:0.7.x Jun 16, 2022
andrewhsu pushed a commit to andrewhsu/ducktape that referenced this pull request Mar 22, 2023
confluentinc#323)

* update test runner

* update docstring

* readd newline

* add a simple test to run against

* fix formating

(cherry picked from commit a214102)
gousteris pushed a commit to gousteris/ducktape that referenced this pull request Aug 30, 2023
confluentinc#323)

* update test runner

* update docstring

* readd newline

* add a simple test to run against

* fix formating