-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce ansible-runner worker cleanup
as cleanup of last-resort for remote execution nodes
#846
Conversation
8fa836c
to
afce151
Compare
ansible-runner worker cleanup
as cleanup of last-resort for remote execution nodesansible-runner worker cleanup
as cleanup of last-resort for remote execution nodes
cc0f232
to
57f64a9
Compare
|
||
def prune_images(runtime='podman'): | ||
"""Run the prune images command and return changed status""" | ||
stdout = run_command([runtime, 'image', 'prune', '-f']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fosterseth in AWX we did something a little different, and I'm not totally sure why. It kind of manually deleted dangling images instead of using the prune command.
https://github.com/ansible/awx/blob/7c9626b0e7c0b7835c8ff42970c6f1740f3b10fe/awx/main/tasks.py#L397
This might have been to fix some podman-specific bug, but I lack the knowledge to reproduce that bug. We could add it here using the same commands as you used. Wouldn't hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prune here is probably better -- in awx it looks like we wanted to get proper stdout for the images being removed, so it first grabs a list of images, then removes them. Maybe I wasn't able to get it work properly with prune at the time
'FROM {}'.format(default_container_image), | ||
'RUN echo {} > /tmp/for_test.txt'.format(special_string) | ||
])) | ||
image_name = 'quay.io/fortest/hasfile:latest' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this unique enough so that multiple versions of this test running simultaneously on the same host will not encounter errors when the image is later removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this were a fixture used by multiple tests I would have randomized the name. But why would the same test run in parallel on the same host? Yes, tests will run in parallel but I imagined it would never run the same test more than once. I still don't have a problem adding randomization because it won't hurt anything, I just wanted to explain myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the use of @pytest.mark.parametrize
on this test, it's possible this test could be running in parallel (once with podman and once with docker). We aren't currently using podman for these tests, only docker, but the plan is to eventually enable podman as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thing is multiple instances of the same tests with different Python versions running at the same time.
ansible_runner/cleanup.py
Outdated
def delete_associated_folders(dir): | ||
"""Where dir is the private_data_dir for a completed job, this deletes related tmp folders it used""" | ||
for ident in project_idents(dir): | ||
registry_auth_pattern = f'/tmp/{registry_auth_prefix}{ident}_*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is hardcoding /tmp
here problematic? What if the system is using a different tmp directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> from tempfile import gettempdir
>>> gettempdir()
'/tmp'
Just drop that in instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there isn't a config option in runner for specifying the temp dir, then yes, tempfile.gettempdir()
is a better option than hard coding /tmp
.
ansible_runner/cleanup.py
Outdated
__all__ = ['add_cleanup_args', 'run_cleanup'] | ||
|
||
|
||
GRACE_PERIOD_DEFAULT = 60 # minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a better place for storing configuration values rather than setting a global in just this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's ansible_runner.defaults
, which is fine with me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with that. Keep it all caps since it is a global (the other values in that file should be all caps too, but that's for another PR).
@@ -27,19 +27,21 @@ | |||
from six import string_types, PY2, PY3, text_type, binary_type | |||
|
|||
|
|||
def _cleanup_folder(folder): | |||
def cleanup_folder(folder): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs tests, especially if it's going to be public.
a61bca6
to
6eaf4f7
Compare
@samdoran thank you so much for the review comments. I hope that most of these will be resolved, but there's a question or two that still seems unresolved, like what pattern we want for invoking a subprocess. |
@@ -27,19 +27,21 @@ | |||
from six import string_types, PY2, PY3, text_type, binary_type | |||
|
|||
|
|||
def _cleanup_folder(folder): | |||
def cleanup_folder(folder): | |||
"""Deletes folder, returns True or False based on whether a change happened.""" | |||
try: | |||
shutil.rmtree(folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For systems that we care about, it seems that rmtree()
(by default) will NOT follow symlinks. Which I like because security.
https://docs.python.org/3/library/shutil.html#shutil.rmtree.avoids_symlink_attacks
Co-authored-by: Sam Doran <sdoran@redhat.com>
Co-authored-by: Sam Doran <sdoran@redhat.com>
52e9136
to
50401dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of Sam's comments here have been addressed. If not, those can be addressed in followup PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No blockers, just some suggestions.
from ansible_runner import run | ||
from ansible_runner import output | ||
from ansible_runner import cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: combine multiple imports.
from ansible_runner import run | |
from ansible_runner import output | |
from ansible_runner import cleanup | |
from ansible_runner import ( | |
cleanup, | |
output, | |
run, | |
) |
) | ||
|
||
|
||
def run_command(cmd): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I was talking about run_command()
and run_command_async()
but those do a bit more than run commands since they also instantiate a Runner
object.
We should add a generic and simple public function for running commands at a later date.
@@ -1,2 +1,6 @@ | |||
default_process_isolation_executable = 'podman' | |||
default_container_image = 'quay.io/ansible/ansible-runner:devel' | |||
registry_auth_prefix = 'ansible_runner_registry_' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this should be all caps as well since it's a global.
registry_auth_prefix = 'ansible_runner_registry_' | |
REGISTRY_AUTH_PREFIX = 'ansible_runner_registry_' |
When running `ansible-runner worker`, if no `--private-data-dir` is given, | ||
it will extract the contents to a temporary directory which is deleted at the end of execution. | ||
You can use the `--delete` flag in conjunction with `--private-data-dir` to assure that | ||
the provided directory is deleted at the end of execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For RST, double backticks are needed for inline code formatting.
When running `ansible-runner worker`, if no `--private-data-dir` is given, | |
it will extract the contents to a temporary directory which is deleted at the end of execution. | |
You can use the `--delete` flag in conjunction with `--private-data-dir` to assure that | |
the provided directory is deleted at the end of execution. | |
When running ``ansible-runner worker``, if no ``--private-data-dir`` is given, | |
it will extract the contents to a temporary directory which is deleted at the end of execution. | |
You can use the ``--delete`` flag in conjunction with ``--private-data-dir`` to assure that | |
the provided directory is deleted at the end of execution. |
|
||
The following command offers out-of-band cleanup. | ||
|
||
$ ansible-runner worker cleanup --file-pattern=/tmp/foo_* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably need to quote this to avoid shell expansion.
$ ansible-runner worker cleanup --file-pattern=/tmp/foo_* | |
$ ansible-runner worker cleanup --file-pattern='/tmp/foo_*' |
`ansible-runner worker --private_data_dir=/tmp/foo_3`, for example. | ||
NOTE: see the `--grace-period` option, which sets the time window. | ||
|
||
This command also takes a `--remove-images` option to run the podman or docker `rmi` command. | ||
There is otherwise no automatic cleanup of images used by a run, | ||
even if `container_auth_data` is used to pull from a private container registry. | ||
To be sure that layers are deleted as well, the `--image-prune` flag is necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`ansible-runner worker --private_data_dir=/tmp/foo_3`, for example. | |
NOTE: see the `--grace-period` option, which sets the time window. | |
This command also takes a `--remove-images` option to run the podman or docker `rmi` command. | |
There is otherwise no automatic cleanup of images used by a run, | |
even if `container_auth_data` is used to pull from a private container registry. | |
To be sure that layers are deleted as well, the `--image-prune` flag is necessary. | |
``ansible-runner worker --private_data_dir=/tmp/foo_3``, for example. | |
NOTE: see the ``--grace-period`` option, which sets the time window. | |
This command also takes a ``--remove-images`` option to run the podman or docker ``rmi`` command. | |
There is otherwise no automatic cleanup of images used by a run, | |
even if ``container_auth_data`` is used to pull from a private container registry. | |
To be sure that layers are deleted as well, the ``--image-prune`` flag is necessary. |
old_dir = str(tmp_path / 'modtime_old_xyz') | ||
new_dir = str(tmp_path / 'modtime_new_abc') | ||
os.mkdir(old_dir) | ||
time.sleep(1) | ||
os.mkdir(new_dir) | ||
ct = cleanup_dirs(pattern=str(tmp_path / 'modtime_*_*'), grace_period=1. / 60.) | ||
assert ct == 1 | ||
assert not os.path.exists(old_dir) | ||
assert os.path.exists(new_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could keep these as Path
objects.
old_dir = str(tmp_path / 'modtime_old_xyz') | |
new_dir = str(tmp_path / 'modtime_new_abc') | |
os.mkdir(old_dir) | |
time.sleep(1) | |
os.mkdir(new_dir) | |
ct = cleanup_dirs(pattern=str(tmp_path / 'modtime_*_*'), grace_period=1. / 60.) | |
assert ct == 1 | |
assert not os.path.exists(old_dir) | |
assert os.path.exists(new_dir) | |
old_dir = tmp_path / 'modtime_old_xyz' | |
new_dir = tmp_path / 'modtime_new_abc' | |
old_dir.mkdir() | |
time.sleep(1) | |
new_dir.mkdir() | |
ct = cleanup_dirs(pattern=str(tmp_path / 'modtime_*_*'), grace_period=1. / 60.) | |
assert ct == 1 | |
assert not old_dir.exists() | |
assert new_dir.exists() |
Side note: using time.sleep()
in tests is usually asking for instability. It's better to just set a/c/mtimes explicitly rather than slow down the tests unnecessary.
def test_cleanup_folder(tmp_path): | ||
folder_path = tmp_path / 'a_folder' | ||
folder_path.mkdir() | ||
assert folder_path.exists() # sanity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assertion and the same on line 32 are testing pathlib
or the state of the test environment, not our function, and are unnecessary.
As of creating the draft PR, I might have all the test cases I brainstormed taken care of. I had some content for manually running demos:
https://gist.github.com/AlanCoding/99de3b85031cae3b7f2c7c8f81560359#file-cleanup_demo-md
I don't want to merge this until AWX changes have been finished (in basic terms) and tested locally. I do want to make sure runner CI is running and that I don't get any surprises.
EDIT: I am now pretty happy with the testing in conjunction with AWX.