
add default cancel_callback to handle common signals #862

Merged

Conversation

@jladdjr (Contributor) commented Oct 5, 2021

Fixes #861

--

If you invoke runner as a Python library, you have the option of specifying a cancel_callback (example). When calling runner via the CLI, however, there isn't an option to pass in a cancel_callback, so it is left undefined.

For the CLI, it seems like it would be reasonable to default to using a cancel_callback with a signal handler that would catch SIGINT and SIGTERM. This would allow runner to recognize an incoming signal as a request to gracefully shut down the running job, including ensuring that a container spun up as part of process isolation was stopped.

This PR updates runner to use a default cancel_callback when none is provided. The default cancel_callback catches both SIGINT and SIGTERM and gives the runner process (and other spawned processes, such as those started by a container runtime) a chance to exit gracefully.
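
For context, the rough shape of such a default cancel_callback looks like the sketch below. This is an illustration based on the description above, not necessarily the exact code in the patch (the helper landed as signal_handler in ansible_runner.utils):

# Sketch only: register SIGTERM/SIGINT handlers and hand runner a callable
# it can poll as the cancel_callback.
import signal
import threading

def signal_handler():
    # Signal handlers can only be registered from the main thread.
    if threading.current_thread() is not threading.main_thread():
        return None

    signal_event = threading.Event()

    def handler(signum, frame):
        # Remember that a shutdown was requested; runner polls for this.
        signal_event.set()

    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)

    # cancel_callback: returns True once SIGTERM or SIGINT has been received.
    return signal_event.is_set

Runner polls the cancel_callback periodically while the job runs; once it returns True, runner can shut the job down gracefully, including stopping any container started for process isolation.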

Steps to reproduce:

  1. Modify demo/project/test.yml so that it runs a pause task w/ minutes: 90
  2. ansible-runner run --process-isolation --process-isolation-executable podman -p test.yml demo
  3. pkill ansible-runner
  4. Verify container continues running even after ansible-runner process exits.

@jladdjr force-pushed the add_default_cancel_callback_for_cli branch from 1ae7d69 to f273b43 on October 5, 2021 22:03
@jladdjr force-pushed the add_default_cancel_callback_for_cli branch from f273b43 to 50a676e on October 5, 2021 22:05
@AlanCoding (Member)

👍 This appears to me to be doing the right thing.

There's a lot of testing work still to be done, of course, but it's important to show people the WIP to set expectations. I could see this being the final patch to the code itself.

@AlanCoding (Member)

I wrote a test here expecting to reproduce the original bug

https://github.com/ansible/ansible-runner/compare/devel...AlanCoding:container_kill_test?expand=1

However, this test was not able to replicate the issue. When the ansible-runner process is killed, the container dies with it. That is without this patch.

Co-authored-by: Alan Rominger <arominge@redhat.com>
@jladdjr (Contributor, Author) commented Oct 6, 2021

@AlanCoding - I made a few tweaks to the test that were needed for it to run for me:

  • added --process-isolation flags so that when we run ansible-runner start we ensure that podman is used
  • instead of the podman pdd (private data directory), used the sleep pdd (the only place I saw the sleep.yml playbook in the repo)

With those changes I found that the test passed on devel runner (w/out my patch). Suspecting that this might be a matter of timing, I added a five second sleep after we call ansible-runner start. Sure enough, this resulted in the test failing 80% of the time (the same results we saw when running a test at the controller level).

I think that even with the five second sleep, the test still represents a very real situation that users could find themselves in. If anything, it seems likely that a user would cancel a job once it's well underway, more so than cancelling it an instant after it launched.

@AlanCoding I went ahead and added your changeset into this PR. We can continue polishing it as needed, but it seems like a reliable reproducer of the issue we're trying to address here.

ansible_runner/interface.py (review thread, resolved)
@jladdjr marked this pull request as ready for review October 7, 2021 02:37
@jladdjr requested a review from a team as a code owner October 7, 2021 02:37
@jladdjr (Contributor, Author) commented Oct 7, 2021

@samdoran @Shrews - Wanted to check in with you to see if you have any feedback / concerns about getting this in. This PR helps address a scenario we're seeing with awx where receptor sends an ansible-runner process a SIGTERM, which results in the container started by runner being orphaned. The net effect is that the awx job is listed as cancelled despite a container continuing to execute the associated job. We would make use of a cancel_callback, except that runner is being invoked via CLI by receptor. Since that's not an option, we thought we could introduce signal-handling behavior as a sane default implementation of the cancel_callback. Let us know what you think.

Jim Ladd added 2 commits October 6, 2021 20:34
@samdoran (Contributor) left a comment

We should also add unit/functional tests to ensure the cancel callback is set properly and that the handler is called as expected.
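
For reference, a sketch of the kind of unit test being asked for here, using pytest-mock (illustrative only; the tests that actually landed are shown further down in this thread):

import signal

from ansible_runner.utils import signal_handler

def test_signal_handler_registers_signals(mocker):
    # Patch signal.signal as seen by ansible_runner.utils so no real
    # handlers are installed during the test.
    mock_signal = mocker.patch('ansible_runner.utils.signal.signal')

    callback = signal_handler()

    # Before any signal arrives, the callback reports "not cancelled".
    assert callback() is False

    # Both SIGTERM and SIGINT should have been registered.
    registered = [c[0][0] for c in mock_signal.call_args_list]
    assert signal.SIGTERM in registered
    assert signal.SIGINT in registered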

ansible_runner/interface.py (review thread, resolved)
Comment on lines 103 to 104
logger.warning("Unable to set cancel_callback for signal handling; not in main thread")
cancel_callback = None
@samdoran (Contributor):

This could potentially be somewhat noisy. Anytime this is run in a thread using the default cancel callback, this warning will be emitted. Maybe use a Sentinel or some other check to make sure this is emitted only once.

@jladdjr (Contributor, Author):

Makes sense. I didn't want to silently disable the default, but I can see how this could get to be a bit much. I'll see if there's something I can do here.

@jladdjr (Contributor, Author) commented Oct 7, 2021:

I'm wrestling with this one a bit. I tried writing this up so that we had a flag indicating whether or not the warning had already been issued. But then I wondered: where would that flag need to be set so that separate runs were retrieving the same flag? In the scenario you outlined, the warning is generated anytime we're in a new thread. That seems to suggest that when we're in a thread, we need to set a flag so that future threads know not to send a warning. It seems like this would require sending a message via a queue, but I'm not sure. Again, I'm just not exactly sure how we would do this.

Do you have thoughts on how to make this work? How strongly do you feel about having this warning? It seems like it might be fine to leave it out. Just adding the signal handling behavior by itself is a big improvement, even if we can't always set a handler (and even if we don't notify the user that they're missing out).
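
For what it's worth, threads within a single process share module state, so a module-level flag would be enough to warn once per process, though it would not carry across separate runner processes. A sketch (the flag name is hypothetical):

import logging
import threading

logger = logging.getLogger(__name__)

_warned_about_threads = False  # hypothetical module-level sentinel

def signal_handler():
    global _warned_about_threads
    if threading.current_thread() is not threading.main_thread():
        if not _warned_about_threads:
            logger.warning("Unable to set cancel_callback for signal handling; not in main thread")
            _warned_about_threads = True
        return None
    # ... register SIGTERM/SIGINT handlers as before ...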

@jladdjr (Contributor, Author):

(Pending feedback)

ansible_runner/interface.py (review thread, resolved)
ansible_runner/interface.py (review thread, resolved)
test/integration/containerized/test_cli_containerized.py (review thread, resolved)
ansible_runner/utils/__init__.py (review thread, resolved)
@AlanCoding (Member)

I think I've about finalized my independent testing script:

rm -rf /tmp/ansible-runner
cd /tmp
git clone https://github.com/jladdjr/ansible-runner.git
# git clone https://github.com/alancoding/ansible-runner.git
cd ansible-runner
git checkout add_default_cancel_callback_for_cli
# git checkout container_kill_test

python3 -m venv env
source env/bin/activate

pip install -e .
pip install ansible-core
pip install mock

for i in `seq 10`; do py.test test/integration -k test_cli_kill_cleanup -n0 > /dev/null; echo $?; done
deactivate

I ran 2 variations of this: one running the test from the branch I linked above, and one running the tests as part of this PR.

Running my branch (with the test but without the fix) produces about 1 or 2 failures over the 10 runs. These failures are failures to tear down the container. Running with this branch consistently gives 0 failures. That's the confirmation I was looking for to say that this does what was intended.

Of course this is all with podman (what we care about for AWX).

@jladdjr (Contributor, Author) commented Oct 7, 2021

@samdoran I think I've gone through most of the feedback at this point. Still waiting on some unit test results locally and from zuul atm.

It doesn't look like I have the perms needed to mark discussions as resolved, so I'll go through and drop a ✅ in places where I believe things have been addressed.

@jladdjr (Contributor, Author) commented Oct 7, 2021

I only see two items of feedback left here:

@samdoran any thoughts on the warning flag? ^

@jladdjr (Contributor, Author) commented Oct 7, 2021

@samdoran - @thenets and I started testing the newest implementation using the threading / signal libraries and are seeing flakiness. The unit test that we added here has gone from passing 100% of the time to passing 55% of the time. I'm not exactly sure where the flakiness is coming from, but I was wondering how strongly you feel about this approach compared to the original one using SignalHandler?

For the purposes of testing, I'm going to revert to the SignalHandler approach. The threading / signal approach that I tried out today can be seen at 303794c.

@jladdjr force-pushed the add_default_cancel_callback_for_cli branch 2 times, most recently from b920ce4 to 303794c on October 7, 2021 23:54
@jladdjr (Contributor, Author) commented Oct 8, 2021

@samdoran - took a break and came back to make a fresh pass at the tests. Very carefully re-ran the tests for the new version of cancel_callback using the threading / signal libraries and can confirm that I got a clean set of results this time, both from the runner unit tests and from awx testing. Here are ten back-to-back runs of the new unit test added in this PR:

[screenshot: ten consecutive passing runs of the new unit test]

So, I think this is looking good. I reviewed everything and think the PR should be good to go again. The only remaining item to revisit (apart from a general review of how I implemented your suggested signal handler) is:

@@ -34,7 +34,8 @@
 from ansible_runner.utils import (
     dump_artifacts,
     check_isolation_executable_installed,
-    santize_json_response
+    santize_json_response,
+    signal_handler
Contributor:

Trailing commas make future diffs easier to read.

Suggested change:
-    signal_handler
+    signal_handler,

@jladdjr (Contributor, Author) commented Oct 12, 2021

> I'm not able to get them to pass at all on macOS with Docker. This may not be an issue with the test but an issue with ansible-runner start on macOS, since running the same command run by the test also does nothing.

@samdoran - can you provide more details on what went wrong here?

@jladdjr force-pushed the add_default_cancel_callback_for_cli branch from e7cab39 to 0d7ee0e on October 12, 2021 04:29
@jladdjr (Contributor, Author) commented Oct 12, 2021

The ansible-runner-build-container-image-stable-2.9 zuul check is failing:

podman push zuul-jobs.buildset-registry:5000/quay.io/ansible/ansible-runner:stable-2.9-devel
...
Copying blob sha256:25d101049a9c1ac3507bfd2306116336482ca8964a9f29d99ba19ee9e3f1fa42
time="2021-10-12T05:07:54Z" level=warning msg="failed, retrying in 1s ... (1/3). Error: writing blob: Patch \"https://zuul-jobs.buildset-registry:5000/v2/quay.io/ansible/ansible-runner/blobs/uploads/d425b3662e234b168166ecea81645aeb\": write tcp 162.253.43.20:59688->162.253.42.209:5000: write: connection timed out"
Getting image source signatures
Error: trying to reuse blob sha256:74ddd0ec08fa43d09f32636ba91a0a3053b02cb4627c35051aff89f853606b59 at destination: pinging container registry zuul-jobs.buildset-registry:5000: Get "https://zuul-jobs.buildset-registry:5000/v2/": dial tcp 162.253.42.209:5000: i/o timeout

Looks like it was unable to reach the container registry while trying to push an image.

@jladdjr force-pushed the add_default_cancel_callback_for_cli branch from 92fc156 to 34d0e76 on October 12, 2021 08:08
@jladdjr (Contributor, Author) commented Oct 12, 2021

@samdoran @Shrews - all feedback addressed, let me know if there's anything else you think should be tweaked.

@shanemcd (Member)

recheck

Comment on lines +77 to +79
# clean up settings artifacts generated by test
if os.path.exists(env_dir):
    shutil.rmtree(env_dir)
Contributor:

If this is necessary, then the test is doing something wrong. At least add a FIXME note here that this can be removed once #847 is merged.

@samdoran (Contributor) left a comment

The current integration test fails on macOS due to an unrelated issue with the daemon library (just a guess). I would rather not merge tests that do not pass locally.

I think we also need unit tests of signal_handler() and functional tests to make sure the signals are handled as expected.

If we decide to keep this integration test, it needs a skipif so it will be skipped on platforms where the tests do not work.
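
A sketch of the kind of skip being suggested (marker placement and reason text are illustrative):

import sys

import pytest

@pytest.mark.skipif(
    sys.platform == 'darwin',
    reason='ansible-runner start + subprocess does not work reliably on macOS',
)
def test_cli_kill_cleanup(cli, runtime, test_data_dir):
    ...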

@jladdjr (Contributor, Author) commented Oct 12, 2021

> The current integration test fails on macOS due to an unrelated issue with the daemon library (just a guess). I would rather not merge tests that do not pass locally.

Can you provide some more details on the failure?

  • How are you invoking the test? I needed to run with pytest -n 0 to ensure serial execution; I believe this is how zuul invokes serial tests.
  • What specific pytest failure are you seeing?
  • You mention a daemon library, so I'm assuming this fails for docker? Are you seeing failures for both container runtimes?
  • Are you running the latest version of docker / podman? (This made a significant difference for me.)
  • When the failure happens, what is the current state of any running containers? (I typically run a watch <container runtime> ps while the test is running so I can observe what is happening.)

The two main points where I've seen an assertion in the test fail are:

  1. Test fails because it is not seeing a container spun up when runner is first invoked.
  2. Container is successfully started but fails to exit when we send a signal to the ansible-runner process.

@AlanCoding I believe that you are running with a docker for mac setup. Can you confirm if the tests fail for you when the fix is in place?

@jladdjr (Contributor, Author) commented Oct 12, 2021

@samdoran has mentioned that the macOS failures are likely unrelated to this PR. Given that this fix addresses a significant error we're seeing in awx (cancel a job, and the job continues running in the EE container), I recommend we get this in as soon as we can and not block on existing / unrelated failures.

@samdoran (Contributor)

Running tox -e py test/integration/containerized/test_cli_containerized.py -- -n0 results in the following failure:

FAILED test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker] - Exception: Timeout waiting for confirm ansible-runner started container
> tox -e py test/integration/containerized/test_cli_containerized.py -- -n0
/Users/sdoran/Source/ansible-runner/.tox/py/lib/python3.10/site-packages/setuptools/dist.py:487: UserWarning: Normalizing '2.0.0.0alpha4.dev249' to '2.0.0.0a4.dev249'
  warnings.warn(tmpl.format(**locals()))
py develop-inst-noop: /Users/sdoran/Source/ansible-runner
py installed: ansible-core==2.11.5,-e git+ssh://git@github.com/ansible/ansible-runner.git@34d0e7639cec4e77a6645058fa67a67f84b8e9a8#egg=ansible_runner,attrs==21.2.0,cffi==1.14.6,cryptography==35.0.0,docutils==0.17.1,execnet==1.9.0,flake8==3.9.2,iniconfig==1.1.1,Jinja2==3.0.2,lockfile==0.12.2,MarkupSafe==2.0.1,mccabe==0.6.1,packaging==21.0,pathspec==0.9.0,pexpect==4.8.0,pluggy==1.0.0,ptyprocess==0.7.0,py==1.10.0,pycodestyle==2.7.0,pycparser==2.20,pyflakes==2.3.1,pyparsing==2.4.7,pytest==6.2.5,pytest-forked==1.3.0,pytest-mock==3.6.1,pytest-repeat==0.9.1,pytest-timeout==1.4.2,pytest-xdist==2.4.0,python-daemon==2.3.0,PyYAML==5.4.1,resolvelib==0.5.4,six==1.16.0,toml==0.10.2,yamllint==1.26.3
py run-test-pre: PYTHONHASHSEED='1417195590'
py run-test: commands[0] | pytest -m 'not serial' test/integration/containerized/test_cli_containerized.py -n0
====================================================================================== test session starts =======================================================================================
platform darwin -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /Users/sdoran/Source/ansible-runner/.tox/py/bin/python
cachedir: .tox/py/.pytest_cache
rootdir: /Users/sdoran/Source/ansible-runner, configfile: pytest.ini
plugins: mock-3.6.1, xdist-2.4.0, repeat-0.9.1, timeout-1.4.2, forked-1.3.0
collected 5 items / 5 deselected

===================================================================================== 5 deselected in 0.02s ======================================================================================
ERROR: InvocationError for command /Users/sdoran/Source/ansible-runner/.tox/py/bin/pytest -m 'not serial' test/integration/containerized/test_cli_containerized.py -n0 (exited with code 5)
____________________________________________________________________________________________ summary _____________________________________________________________________________________________
ERROR:   py: commands failed
sdoran@mbp-2018 ~/S/ansible-runner (PR/862) [1]> vim tox.ini
sdoran@mbp-2018 ~/S/ansible-runner (PR/862)> tox -e py test/integration/containerized/test_cli_containerized.py -- -n0
/Users/sdoran/Source/ansible-runner/.tox/py/lib/python3.10/site-packages/setuptools/dist.py:487: UserWarning: Normalizing '2.0.0.0alpha4.dev249' to '2.0.0.0a4.dev249'
  warnings.warn(tmpl.format(**locals()))
py develop-inst-noop: /Users/sdoran/Source/ansible-runner
py installed: ansible-core==2.11.5,-e git+ssh://git@github.com/ansible/ansible-runner.git@34d0e7639cec4e77a6645058fa67a67f84b8e9a8#egg=ansible_runner,attrs==21.2.0,cffi==1.14.6,cryptography==35.0.0,docutils==0.17.1,execnet==1.9.0,flake8==3.9.2,iniconfig==1.1.1,Jinja2==3.0.2,lockfile==0.12.2,MarkupSafe==2.0.1,mccabe==0.6.1,packaging==21.0,pathspec==0.9.0,pexpect==4.8.0,pluggy==1.0.0,ptyprocess==0.7.0,py==1.10.0,pycodestyle==2.7.0,pycparser==2.20,pyflakes==2.3.1,pyparsing==2.4.7,pytest==6.2.5,pytest-forked==1.3.0,pytest-mock==3.6.1,pytest-repeat==0.9.1,pytest-timeout==1.4.2,pytest-xdist==2.4.0,python-daemon==2.3.0,PyYAML==5.4.1,resolvelib==0.5.4,six==1.16.0,toml==0.10.2,yamllint==1.26.3
py run-test-pre: PYTHONHASHSEED='999930756'
py run-test: commands[0] | pytest -n 0 -m serial test/integration/containerized/test_cli_containerized.py -n0
====================================================================================== test session starts =======================================================================================
platform darwin -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /Users/sdoran/Source/ansible-runner/.tox/py/bin/python
cachedir: .tox/py/.pytest_cache
rootdir: /Users/sdoran/Source/ansible-runner, configfile: pytest.ini
plugins: mock-3.6.1, xdist-2.4.0, repeat-0.9.1, timeout-1.4.2, forked-1.3.0
collected 5 items

test/integration/containerized/test_cli_containerized.py::test_module_run SKIPPED (podman container runtime(s) not available)                                                              [ 20%]
test/integration/containerized/test_cli_containerized.py::test_playbook_run SKIPPED (podman container runtime(s) not available)                                                            [ 40%]
test/integration/containerized/test_cli_containerized.py::test_provide_env_var SKIPPED (podman container runtime(s) not available)                                                         [ 60%]
test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[podman] SKIPPED (podman is unavailable)                                                                    [ 80%]
test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker] FAILED                                                                                             [100%]

============================================================================================ FAILURES ============================================================================================
_________________________________________________________________________________ test_cli_kill_cleanup[docker] __________________________________________________________________________________

cli = <function cli.<locals>.run at 0x106f85480>, runtime = 'docker', test_data_dir = '/Users/sdoran/Source/ansible-runner/test/data'

    @pytest.mark.serial
    @pytest.mark.parametrize('runtime', ['podman', 'docker'])
    def test_cli_kill_cleanup(cli, runtime, test_data_dir):
        if shutil.which(runtime) is None:
            pytest.skip(f'{runtime} is unavailable')

        unique_string = str(uuid4()).replace('-', '')
        ident = f'kill_test_{unique_string}'
        pdd = os.path.join(test_data_dir, 'sleep')
        cli_args = ['start', pdd, '-p', 'sleep.yml', '--ident', ident,
                    '--process-isolation', '--process-isolation-executable', runtime]
        cli(cli_args)

        def container_is_running():
            r = cli([runtime, 'ps', '-f', f'name=ansible_runner_{ident}', '--format={{.Names}}'], bare=True)
            return ident in r.stdout

        timeout = 10
>       for _ in iterate_timeout(timeout, 'confirm ansible-runner started container', interval=1):

_          = 9
cli        = <function cli.<locals>.run at 0x106f85480>
cli_args   = ['start', '/Users/sdoran/Source/ansible-runner/test/data/sleep', '-p', 'sleep.yml', '--ident', 'kill_test_a1dbe8595a5a485c8a0c77d665eddb78', ...]
container_is_running = <function test_cli_kill_cleanup.<locals>.container_is_running at 0x106f85510>
ident      = 'kill_test_a1dbe8595a5a485c8a0c77d665eddb78'
pdd        = '/Users/sdoran/Source/ansible-runner/test/data/sleep'
runtime    = 'docker'
test_data_dir = '/Users/sdoran/Source/ansible-runner/test/data'
timeout    = 10
unique_string = 'a1dbe8595a5a485c8a0c77d665eddb78'

test/integration/containerized/test_cli_containerized.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

max_seconds = 10, purpose = 'confirm ansible-runner started container', interval = 1

    def iterate_timeout(max_seconds, purpose, interval=2):
        start = time.time()
        count = 0
        while (time.time() < start + max_seconds):
            count += 1
            yield count
            time.sleep(interval)
>       raise Exception("Timeout waiting for %s" % purpose)
E       Exception: Timeout waiting for confirm ansible-runner started container

count      = 9
interval   = 1
max_seconds = 10
purpose    = 'confirm ansible-runner started container'
start      = 1634055424.229028

test/utils/common.py:11: Exception
======================================================================================== warnings summary ========================================================================================
test/integration/containerized/test_cli_containerized.py::test_module_run
  /Users/sdoran/Source/ansible-runner/test/integration/conftest.py:41: UserWarning: podman not available
    warnings.warn(UserWarning(f"{runtime} not available"))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
====================================================================================== slowest 10 durations ======================================================================================
11.11s call     test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker]

(9 durations < 1s hidden.  Use -vv to show these durations.)
==================================================================================== short test summary info =====================================================================================
SKIPPED [3] test/integration/containerized/test_cli_containerized.py:17: podman container runtime(s) not available
SKIPPED [1] test/integration/containerized/test_cli_containerized.py:43: podman is unavailable
FAILED test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker] - Exception: Timeout waiting for confirm ansible-runner started container
============================================================================ 1 failed, 4 skipped, 1 warning in 11.88s ============================================================================
ERROR: InvocationError for command /Users/sdoran/Source/ansible-runner/.tox/py/bin/pytest -n 0 -m serial test/integration/containerized/test_cli_containerized.py -n0 (exited with code 1)
____________________________________________________________________________________________ summary _____________________________________________________________________________________________
ERROR:   py: commands failed

@AlanCoding (Member)

I'm on Fedora 34 with both podman and docker installed; checking out the current branch, I'm able to run the docker test.

$ py.test test/ -n0 test/integration/containerized/test_cli_containerized.py -k test_cli_kill_cleanup[docker]
==================================================== test session starts ====================================================
platform linux -- Python 3.9.6, pytest-6.2.5, py-1.9.0, pluggy-0.13.1 -- /home/alancoding/repos/awx/env/bin/python3
cachedir: .pytest_cache
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
ansible: 2.13.0.dev0
rootdir: /home/alancoding/repos/ansible-runner, configfile: pytest.ini
plugins: forked-1.3.0, github-0.3.0, ordering-0.6, xdist-2.2.0, mock-3.6.1, cov-2.12.1, ansible-2.2.4, benchmark-3.2.3
collecting ... collected 0 github issues
collected 2041 items / 2039 deselected / 2 selected                                                                         

test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker] PASSED                        [ 50%]
test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker] PASSED                        [ 50%]

=================================================== slowest 10 durations ====================================================
6.90s call     test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker]
5.76s call     test/integration/containerized/test_cli_containerized.py::test_cli_kill_cleanup[docker]

(4 durations < 1s hidden.  Use -vv to show these durations.)
============================================ 2 passed, 2039 deselected in 13.23s ============================================

Failing while waiting for the container to start sounds like either a timing issue or something more basic not working.

@samdoran (Contributor)

Yup, it works fine on Linux when run serially.

@Shrews (Contributor) commented Oct 12, 2021

FYI, any test that uses ansible-runner start, which forks the main process using the python-daemon library, and then tries to use a subprocess (e.g., subprocess.Popen()) from within the new forked process is just not going to work on a Mac. It seems to start the container OK because it defaults to using pexpect.spawn(), but anything else, like killing the running container (which uses subprocess.Popen()), is likely not to work. For that reason, we should skip such tests on macOS.
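
To illustrate the pattern being described, a sketch (start_detached is a hypothetical stand-in for what ansible-runner start does via python-daemon):

import subprocess

import daemon  # python-daemon

def start_detached(cmd):
    # `ansible-runner start` detaches the process roughly like this.
    with daemon.DaemonContext():
        # The container launch itself goes through pexpect.spawn() and seems
        # fine; it is a later subprocess.Popen() from inside this detached,
        # forked process (e.g., to kill the container) that reportedly
        # misbehaves on macOS.
        subprocess.Popen(cmd).wait()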

@jladdjr (Contributor, Author) commented Oct 12, 2021

> is just not going to work on a Mac.

@Shrews do you know what it is about OS X that isn't supporting the use of the libraries / calls you mentioned?

@samdoran added the gate label Oct 12, 2021
@ansible-zuul (bot) left a comment

LGTM!

mocker.patch('ansible_runner.utils.threading.main_thread', return_value='thread0')
mocker.patch('ansible_runner.utils.threading.current_thread', return_value='thread1')

assert signal_handler() is None
@jladdjr (Contributor, Author):

👍


assert signal_handler()() is False
assert mock_signal.call_args_list[0][0][0] == signal.SIGTERM
assert mock_signal.call_args_list[1][0][0] == signal.SIGINT
@jladdjr (Contributor, Author):

👍


signal_handler()

with pytest.raises(AttributeError, match='Raised intentionally'):
@jladdjr (Contributor, Author):

👍

@ansible-zuul (bot) merged commit 6d06aad into ansible:devel Oct 12, 2021
Successfully merging this pull request may close these issues:

Add default cancel_callback for ansible-runner CLI invocation (#861)

5 participants