Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix podman rm -fa with dependencies #18507

Merged
merged 14 commits into from
Jun 12, 2023

Commits on Jun 1, 2023

  1. Add an API for removing a container and dependencies

    This is the initial stage of implementation. The current API
    functions but does not report the additional containers and pods
    removed. This is necessary to properly display results to the
    user after `podman rm --all`.
    
    The existing remove-dependencies code has been removed in favor
    of this more native solution.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    e8d7456 View commit details
    Browse the repository at this point in the history
  2. Make RemoveContainer return containers and pods removed

    This allows for accurate reporting of dependency removal, but the
    work is still incomplete: pods can be removed, but do not report
    the containers they removed as part of said removal. Will add
    this in a subsequent commit.
    
    Major note: I made ignoring no-such-container errors automatic
    once it has been determined that a container did exist in the
    first place. I can't think of any case where this would not be a
    TOCTOU - IE, no reason not to ignore them. The `--ignore` option
    to `podman rm` should still retain meaning as it will ignore
    errors from containers that didn't exist in the first place.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    bc1a31c View commit details
    Browse the repository at this point in the history
  3. Pods now return what containers were removed with them

    This probably should have been in the API since the beginning,
    but it's not too late to start now.
    
    The extra information is returned (both via the REST API, and to
    the CLI handler for `podman rm`) but is not yet printed - it
    feels like adding it to the output could be a breaking change?
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    8cb5d39 View commit details
    Browse the repository at this point in the history
  4. Fix a deadlock when removing pods

    The infra container would try to remove the pod, despite the pod
    already being in the process of being removed - oops. Add a check
    to ensure we don't try and remove the pod when called by the
    `podman pod rm` command.
    
    Also, wire up noLockPod - it wasn't previously wired in, which is
    concerning, and could be related?
    
    Finally, make a few minor fixes to un-break lint.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    ef1a22c View commit details
    Browse the repository at this point in the history
  5. Revert "ginkgo-v2 cleanup workaround for containers#18180"

    This reverts commit c4b9f4b.
    
    This was a temporary workaround until a fix for containers#18180 landed.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    bafb3d6 View commit details
    Browse the repository at this point in the history
  6. Add a test for removing dependencies with rm -fa

    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    b75ff3a View commit details
    Browse the repository at this point in the history
  7. Revert "test/e2e: fix "podman run ipcns ipcmk container test""

    This reverts commit 9bd833b.
    
    With the fix for `podman rm -fa` merged, we no longer require
    this patch.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    4e6efbb View commit details
    Browse the repository at this point in the history
  8. The removeContainer function now accepts a struct

    We had something like 6 different boolean options (removing a
    container turns out to be rather complicated, because there are a
    million-odd things that want to do it), and the function
    signature was getting unreasonably large. Change to a struct to
    clean things up.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 1, 2023
    Configuration menu
    Copy the full SHA
    2c9f181 View commit details
    Browse the repository at this point in the history

Commits on Jun 7, 2023

  1. Change Inherit to use a pointer to a container

    This fixes a lint issue, but I'm keeping it in its own commit so
    it can be reverted independently if necessary; I don't know what
    side effects this may have. I don't *think* there are any
    issues, but I'm not sure why it wasn't a pointer in the first
    place, so there may have been a reason.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 7, 2023
    Configuration menu
    Copy the full SHA
    398e48a View commit details
    Browse the repository at this point in the history
  2. Discard errors when a pod is already removed

    This was causing some CI flakes. I'm pretty sure that the pods
    being removed already isn't a bug, but just the result of another
    container in the pod removing it first - so no reason not to
    ignore the errors.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 7, 2023
    Configuration menu
    Copy the full SHA
    0e47465 View commit details
    Browse the repository at this point in the history
  3. Fix a race removing multiple containers in the same pod

    If the first container to get the pod lock is the infra container
    it's going to want to remove the entire pod, which will also
    remove every other container in the pod. Subsequent containers
    will get the pod lock and try to access the pod, only to realize
    it no longer exists - and that, actually, the container being
    removed also no longer exists.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 7, 2023
    Configuration menu
    Copy the full SHA
    a750cd9 View commit details
    Browse the repository at this point in the history
  4. Fix an expected error message from pod removal

    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 7, 2023
    Configuration menu
    Copy the full SHA
    3100824 View commit details
    Browse the repository at this point in the history
  5. Ensure our mutexes handle recursive locking properly

    We use shared-memory pthread mutexes to handle mutual exclusion
    in Libpod. It turns out that these have configurable options for
    how to handle a recursive lock (IE, a thread trying to lock a
    lock that the same thread had previously locked). The mutex can
    either deadlock, or allow the duplicate lock without deadlocking.
    Default behavior is, helpfully, unspecified, so if not explicitly
    set there is no clear indication of which of these behaviors will
    be seen. Unfortunately, today is the first I learned of this, so
    our initial implementation did *not* explicitly set our preferred
    behavior.
    
    This turns out to be a major problem with a language like Golang,
    where multiple goroutines can (and often do) use the same OS
    thread. So we can have two goroutines trying to stop the same
    container, and if the no-deadlock mutex behavior is in use, both
    threads will successfully acquire the lock because the C library,
    not knowing about Go's lightweight threads, sees the same PID
    trying to lock a mutex twice, and allows it without question.
    
    It appears that, at least on Fedora/RHEL/Debian libc, the default
    (unspecified) behavior of the locks is the non-deadlocking
    version - so, effectively, our locks have been of questionable
    utility within the same Podman process for the last four years.
    This is somewhat concerning.
    
    What's even more concerning is that the Golang-native sync.Mutex
    that was also in use did nothing to prevent the duplicate locking
    (I don't know if I like the implications of this).
    
    Anyways, this resolves the major issue of our locks not working
    correctly by explicitly setting the correct pthread mutex
    behavior.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 7, 2023
    Configuration menu
    Copy the full SHA
    f1ecdca View commit details
    Browse the repository at this point in the history

Commits on Jun 8, 2023

  1. Ignore spurious warnings when killing containers

    There are certain messages logged by OCI runtimes when killing a
    container that has already stopped that we really do not care
    about when stopping a container. Due to our architecture, there
    are inherent races around stopping containers, and so we cannot
    guarantee that *we* are the people to kill it - but that doesn't
    matter because Podman only cares that the container has stopped,
    not who delivered the fatal signal.
    
    Unfortunately, the OCI runtimes don't understand this, and log
    various warning messages when the `kill` command is invoked on a
    container that was already dead. These cause our tests to fail,
    as we now check for clean STDERR when running Podman. To work
    around this, capture STDERR for the OCI runtime in a buffer only
    for stopping containers, and go through and discard any of the
    warnings we identified as spurious.
    
    Signed-off-by: Matthew Heon <matthew.heon@pm.me>
    mheon committed Jun 8, 2023
    Configuration menu
    Copy the full SHA
    2ebc900 View commit details
    Browse the repository at this point in the history