Clean up docker containers by default #3629

Merged
adamnovak merged 6 commits into master from issues/3505-auto-remove-docker-containers on Jun 14, 2021

Conversation

Contributor

@w-gao w-gao commented May 24, 2021

Make sure Toil removes the docker containers after they are finished running, so we don't have a bunch of "Exited" Docker containers.

Closes #3505.

Changelog Entry

To be copied to the draft changelog by merger:

  • Clean up docker containers by default. Set remove=False in apiDockerCall() to leave the containers untouched and running.

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/cliOptions.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the GitHub "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

Member

@adamnovak adamnovak left a comment

I have some small complaints, but mostly I'm now suspicious of the whole deferParam system. I think we need to at least re-document it so it makes sense along with the remove flag. As is, it's not clear what all the combinations are supposed to do.

A much simpler fix for #3505 might be possible: I don't think the documented default deferParam behavior of RM has been working, which would account for all the left-behind containers from toil-vg.

Do we have any unit tests for the cleanup here? If not, the first step should really be to add some. Then we can be certain of what the combinations of parameters are meant to do, because they will be tested.
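A rough sketch of what a table-driven test for those combinations could look like, using a mock in place of a real container so it runs without Docker. Only RM is named in this thread; the FORGO and STOP constants, the `deferred_cleanup` helper, and the behavior assigned to each action are assumptions for illustration, not Toil's actual implementation.

```python
from unittest.mock import MagicMock

# Stand-in constants. Only RM is named in this thread; FORGO and STOP,
# and what each action does, are assumptions for illustration.
FORGO, STOP, RM = 0, 1, 2


def deferred_cleanup(container, defer_param=None):
    """Hypothetical end-of-job hook: apply the deferred action to a container."""
    # An unset deferParam falls back to RM, the documented default.
    action = RM if defer_param is None else defer_param
    if action == FORGO:
        return  # leave the container alone
    if action == STOP:
        container.stop()  # graceful stop; the container is left in place
        return
    container.kill()
    container.remove()


def observed_calls(defer_param):
    """Run the hook against a mock container and report which methods ran."""
    container = MagicMock()
    deferred_cleanup(container, defer_param)
    return {
        'stop': container.stop.called,
        'kill': container.kill.called,
        'remove': container.remove.called,
    }


# One expectation per combination, so a silent regression shows up as a failure.
expected = {
    None: {'stop': False, 'kill': True, 'remove': True},
    RM: {'stop': False, 'kill': True, 'remove': True},
    STOP: {'stop': True, 'kill': False, 'remove': False},
    FORGO: {'stop': False, 'kill': False, 'remove': False},
}
results = {param: observed_calls(param) for param in expected}
```

Writing the expectations down as a table forces a decision about what each combination is meant to do, which is the point of adding the tests first.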

 with open(streamfile, 'w') as f:
-    f.write(line)
+    for line in container.logs(stdout=stdout, stderr=stderr, stream=True):
+        f.write(line.decode() if isinstance(line, bytes) else line)

This is going to raise an error in Toil if a container happens to be sending non-UTF8 data; we often stream binary files in and out of Docker containers in toil-vg. We ought to just pass the bytes along here in binary mode.
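A minimal sketch of the failure mode and the binary-mode alternative, assuming the log stream yields `bytes` chunks. The helper name `write_chunks` and the sample chunks are hypothetical and are not taken from the Toil code.

```python
import os
import tempfile
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> None:
    """Write raw log chunks to path without decoding them."""
    with open(path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)


# Hypothetical stream contents: the second chunk is not valid UTF-8.
chunks = [b'header\n', b'\xff\xfe\x00binary']

# Decoding such a chunk as UTF-8 raises, which is the error described above.
try:
    chunks[1].decode()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Binary mode passes the same bytes through unchanged.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'stream.out')
    write_chunks(chunks, out_path)
    with open(out_path, 'rb') as f:
        round_trip = f.read()
```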

src/toil/lib/docker.py (outdated)
@@ -356,27 +362,29 @@ def dockerKill(container_name, gentleKill=False, timeout=365 * 24 * 60 * 60):
         raise create_api_error_from_http_exception(e)


-def dockerStop(container_name):
+def dockerStop(container_name: str, remove: bool = True) -> None:
     """
     Gracefully kills a container. Equivalent to "docker stop":

This might not be the right way to explain this anymore, since now by default we do more than "docker stop".

Maybe drop the link to the Docker docs and say that this gracefully kills and removes a container?

     :param container_name: Name of the container being stopped.
     :param client: The docker API client object to call.
+    :param remove: If True, remove the container after it exits. (default: True).

Maybe this should be a keep flag everywhere instead of a default-true remove flag? A flag that defaults to true and has to be set with =False takes up more brain space than the other way around. And when we document it like this, by saying what happens when it is true, anyone who actually uses it (by setting it to false) has to invert the sense of the documentation to work out what the flag accomplishes.
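For illustration, a small sketch of the suggested shape, with a mock standing in for a container object. The function name `stop_container` and the `keep` parameter are hypothetical; the PR as reviewed uses `remove`.

```python
from unittest.mock import MagicMock


def stop_container(container, keep: bool = False) -> None:
    """Gracefully stop a container and, unless keep is set, remove it."""
    container.stop()
    if not keep:
        container.remove()


# Default call: the container is stopped and removed.
default_case = MagicMock()
stop_container(default_case)

# Opting out reads in the positive: keep=True means "leave it around".
kept_case = MagicMock()
stop_container(kept_case, keep=True)
```

With this spelling the docstring describes the non-default case directly, so a caller who passes the flag does not have to negate the documentation.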

+def dockerKill(container_name: str,
+               gentleKill: bool = False,
+               remove: bool = True,
+               timeout: int = 365 * 24 * 60 * 60) -> None:
     """
     Immediately kills a container. Equivalent to "docker kill":

This might not be the best way to describe this method if it now has a different default behavior than docker kill.

@adamnovak
Member

@w-gao I've added a bunch of complaints here, but I don't think it would make sense to go through and address all of them. I think we want to do one of:

  1. Not actually change the old remove flag at all and instead test the RM deferParam and make sure it works as documented to clean up containers after the job when not explicitly set.
  2. Refactor remove to actually work through the deferred cleanup system like you have done, give up on apiDockerCall looking quite like the Python Docker API, and then start revising its API to make more sense (maybe deprecating deferParam in favor of remove to control cleanup).

@w-gao
Contributor Author

w-gao commented May 25, 2021

Thanks @adamnovak for the review!

I'm leaning towards fixing the default deferParam behavior for cleanup (your first suggestion), because I do want apiDockerCall() to look like the Python Docker API, and I think it would make sense to not change the old remove flag so users can control the underlying client.containers.run() call if needed.

As for unit tests, we actually have some that test the different combinations of options, although I don't think they are run on CI. I'll look into the tests and see if the default deferParam behavior of RM is being tested.

@adamnovak
Member

It does look like the dockerTest.py file might not be getting run on CI; maybe we can add it to the Gitlab tests next to other lib tests?

Member

@adamnovak adamnovak left a comment

This is definitely simpler now, and I think it should work. @w-gao Are you still adding tests?

@w-gao
Contributor Author

w-gao commented Jun 3, 2021

Yeah @adamnovak, I'm trying to make sure all combinations work as intended, so this is not quite ready yet.

@w-gao w-gao marked this pull request as draft June 7, 2021 17:46
@w-gao w-gao force-pushed the issues/3505-auto-remove-docker-containers branch from 882a489 to e14e9fd Compare June 9, 2021 00:01
@w-gao
Contributor Author

w-gao commented Jun 9, 2021

@adamnovak This should be ready now! I changed a few things since your last review, so this may need a re-review.

For some reason the file permission check doesn't pass on the CI, but works fine locally. I'll take a deeper look into that but that's probably a different issue.

@w-gao w-gao marked this pull request as ready for review June 9, 2021 00:09
@adamnovak adamnovak self-requested a review June 14, 2021 16:30
Member

@adamnovak adamnovak left a comment

I'm happy with this.

@adamnovak adamnovak merged commit 941228d into master Jun 14, 2021
@adamnovak
Member

I think the file permission test fails because we actually are running as root on CI, whereas the test expects us to be running as someone other than root, who it expects to see owning the relevant files.
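One possible way to make that expectation explicit, sketched with the standard library only. The test class and its check are hypothetical stand-ins and are not the actual Toil test; the idea is just to skip an ownership assertion when the effective user is root.

```python
import os
import tempfile
import unittest


def running_as_root() -> bool:
    """True when the effective user is root (POSIX only)."""
    return hasattr(os, 'geteuid') and os.geteuid() == 0


class OwnershipTest(unittest.TestCase):
    # Hypothetical stand-in for the file permission check discussed above.
    @unittest.skipIf(not hasattr(os, 'getuid') or running_as_root(),
                     'ownership check assumes a non-root POSIX user')
    def test_new_file_is_owned_by_current_user(self):
        with tempfile.NamedTemporaryFile() as f:
            self.assertEqual(os.stat(f.name).st_uid, os.getuid())
```

A skipped test is reported as skipped rather than failed, so a CI run as root stays green while the reason remains visible in the test output.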

Successfully merging this pull request may close these issues.

Exited Docker containers are not cleaned up
2 participants