Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminate proccess's when experiencing a fatal error in ductape runner #323

Merged
merged 5 commits into from
Jun 16, 2022
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 3 additions & 0 deletions ducktape/tests/runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,9 @@ def run_all_tests(self):
self._log(logging.ERROR, err_str)

# All processes are on the same machine, so treat communication failure as a fatal error
for proc in self._client_procs.values():
proc.terminate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also compare this to https://github.com/confluentinc/ducktape/blob/master/ducktape/tests/runner.py#L124 which uses os.kill vs terminate() - what's the difference and pros/cons?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would probably be good to have a unified cleanup_child_processes method or smth

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you read the docs for terminate:

Terminate the process. On Unix this is done using the SIGTERM signal; on Windows TerminateProcess()

which seems to be more platform agnostic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about modifying that other line too then? Can be a separate PR though.

Copy link
Member Author

@imcdo imcdo Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah might be best to touch it in another pr with more testing.

self._client_procs = {}
raise
except KeyboardInterrupt:
# If SIGINT is received, stop triggering new tests, and let the currently running tests finish
Expand Down
54 changes: 54 additions & 0 deletions systests/cluster/test_runner_operations.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Copyright 2022 Confluent Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.cluster.cluster_spec import ClusterSpec, WINDOWS, LINUX, NodeSpec
from ducktape.services.service import Service
from ducktape.tests.test import Test
from ducktape.errors import TimeoutError
from ducktape.mark.resource import cluster
import time




class SimpleEchoService(Service):
"""Simple service that allocates one node for performing tests of RemoteAccount functionality"""
logs = {
"my_log": {
"path": "/tmp/log",
"collect_default": True
},
}

def __init__(self, context):
super(SimpleEchoService, self).__init__(context, num_nodes=1)
self.count = 0

def echo(self):
self.nodes[0].account.ssh("echo {} >> /tmp/log".format(self.count))
self.count += 1

class SimpleRunnerTest(Test):
def setup(self):
self.service = SimpleEchoService(self.test_context)

@cluster(num_nodes=1)
def timeout_test(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably want to test with multiple tests in flight - some scheduled in parallel, some still yet to schedule (maybe just run for all systests?)

Copy link
Member Author

@imcdo imcdo Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well i run in parallel for the unit test as well, but yes ran in systest with some tests yet to schedule etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in #323 you're saying that you've tested with test_runner_operations, which is a single test method - maybe worth testing with simply systests folder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah for sure ill give it a run, ran it with repeat to simulate a bunch of test being run but yeah lets get the complete coverage.

"""
a simple longer running test to test special run flags agaisnt.
"""
self.service.start()

while self.service.count < 100000000:
self.service.echo()
time.sleep(.2)
18 changes: 17 additions & 1 deletion tests/runner/check_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
except ImportError:
from mock import patch, MagicMock # noqa: F401

import pytest
from ducktape.tests.runner_client import RunnerClient
from ducktape.tests.test import TestContext
from ducktape.tests.runner import TestRunner
Expand All @@ -29,7 +30,7 @@
from .resources.test_thingy import ClusterTestThingy, TestThingy
from .resources.test_failing_tests import FailingTest
from ducktape.tests.reporter import JUnitReporter

from ducktape.errors import TimeoutError

from mock import Mock
import os
Expand Down Expand Up @@ -227,3 +228,18 @@ def check_run_failure_with_bad_cluster_allocation(self):
assert results.num_failed == 1
assert results.num_passed == 1
assert results.num_ignored == 0

def check_runner_timeout(self):
"""Check process cleanup and error handling in a parallel runner client run."""
mock_cluster = LocalhostCluster(num_nodes=1000)
session_context = tests.ducktape_mock.session_context(max_parallel=1000, test_runner_timeout=1 )

test_methods = [TestThingy.test_delayed, TestThingy.test_failure]
ctx_list = self._do_expand(test_file=TEST_THINGY_FILE, test_class=TestThingy, test_methods=test_methods,
cluster=mock_cluster, session_context=session_context)
runner = TestRunner(mock_cluster, session_context, Mock(), ctx_list, 1)

with pytest.raises(TimeoutError):
stan-is-hate marked this conversation as resolved.
Show resolved Hide resolved
results = runner.run_all_tests()

assert not runner._client_procs
4 changes: 4 additions & 0 deletions tests/runner/resources/test_thingy.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from time import time
from ducktape.cluster.cluster_spec import ClusterSpec
from ducktape.tests.test import Test
from ducktape.mark import ignore, parametrize
Expand All @@ -30,6 +31,9 @@ def min_cluster_spec(self):

def test_pi(self):
return {"data": 3.14159}

def test_delayed(self):
time.sleep(1)

@ignore
def test_ignore1(self):
Expand Down