race condition when using --rm? #5483

Closed
bundi78 opened this issue Mar 12, 2020 · 8 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

bundi78 commented Mar 12, 2020

/kind bug

Description

When running a podman container, an error is thrown when exiting.

Steps to reproduce the issue:

  1. podman run -t --rm hello-world

Describe the results you received:

ERRO[0000] Error forwarding signal 23 to container fa034e6e2b78ba590b4dbc917e2d5e69473314e6b3237ff6f79c6f976c1e5750: container has already been removed

or

ERRO[0000] Error forwarding signal 23 to container 1600116d8922dea4f86c4fc40c062cee26f2f8357af25e5b3cbbd96b20defe03: can only kill running containers. 1600116d8922dea4f86c4fc40c062cee26f2f8357af25e5b3cbbd96b20defe03 is in state stopped: container state improper

Describe the results you expected:

A clean shutdown, as was the case a few days ago.

Additional information you deem important (e.g. issue happens only occasionally):

This happens on two machines running the latest Arch Linux (last updated 2020/03/12). Shutting down without removing the container seems to be OK, and the stopped container can then be removed manually without issue.

Output of podman version:

Version: 1.8.1
RemoteAPI Version: 1
Go Version: go1.14
Git Commit: 444a19c
Built: Wed Mar 11 22:49:18 2020
OS/Arch: linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: 444a19c
  go version: go1.14
  podman version: 1.8.1
host:
  BuildahVersion: 1.14.2
  CgroupVersion: v1
  Conmon:
    package: Unknown
    path: /usr/bin/conmon
    version: 'conmon version 2.0.11, commit: ff9d97a08d7a4b58267ac03719786e4e7258cecf'
  Distribution:
    distribution: arch
    version: unknown
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 1000000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 1000000
      size: 65536
  MemFree: 8784392192
  MemTotal: 16747130880
  OCIRuntime:
    name: runc
    package: Unknown
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  SwapFree: 39280701440
  SwapTotal: 39280701440
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: dev1
  kernel: 5.5.8-1-ck
  os: linux
  rootless: true
  slirp4netns:
    Executable: /bin/slirp4netns
    Package: Unknown
    Version: |-
      slirp4netns version 0.4.3
      commit: 2244b9b6461afeccad1678fac3d6e478c28b4ad6
  uptime: 1h 30m 3.46s (Approximately 0.04 days)
registries:
  localhost:
    Blocked: false
    Insecure: true
    Location: localhost
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: localhost
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/developer/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /bin/fuse-overlayfs
      Package: Unknown
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 0.7.7
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  GraphRoot: /home/developer/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 1
  RunRoot: /run/user/1000/containers
  VolumePath: /home/developer/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

pacman -Qi podman
Name : podman
Version : 1.8.1-1
Description : Tool and library for running OCI-based containers in pods
Architecture : x86_64
URL : https://github.com/containers/libpod
Licenses : Apache
Groups : None
Provides : None
Depends On : cni-plugins conmon device-mapper iptables libseccomp runc skopeo btrfs-progs slirp4netns libsystemd
Optional Deps : podman-docker: for Docker-compatible CLI
Required By : None
Optional For : xilinx-ise-utils
Conflicts With : None
Replaces : None
Installed Size : 100,63 MiB
Packager : Morten Linderud foxboron@archlinux.org
Build Date : Mi 11 Mär 2020 22:49:18 CET
Install Date : Do 12 Mär 2020 18:10:55 CET
Install Reason : Explicitly installed
Install Script : No
Validated By : Signature

openshift-ci-robot added the kind/bug label Mar 12, 2020
mheon (Member) commented Mar 12, 2020

Uh-oh.

@edsantiago @baude @vrothberg It spread to arch

mheon (Member) commented Mar 12, 2020

For context, this is likely related to #5034

We've been seeing it elsewhere (Fedora rawhide) but haven't had the chance to properly investigate. There are theories that it is related to building with the latest version of the Golang compiler

baude (Member) commented Mar 12, 2020

this is due to the 1.14 compiler. if compiled with arch's 1.13, works fine. investigating continues.

mheon (Member) commented Mar 12, 2020

It appears to be the result of a deliberate change in Go 1.14. From the release notes:

 Goroutines are now asynchronously preemptible. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is supported on all platforms except windows/arm, darwin/arm, js/wasm, and plan9/*.

A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases.

It looks like the Go runtime is generating Signal 23 to tell a goroutine to yield (or something similar - definitely some form of IPC), which we are dutifully catching as part of --sig-proxy and forwarding into the container. This is:

  • Flooding the container with spurious signals
  • Generating errors during stop/removal as we can't forward signals any more

Recommended workaround while we sort this out: disable sig-proxy with --sig-proxy=false. Most people likely don't use it (and if you do, I recommend investigating alternatives like deliberately invoking podman kill, as it's kind of a kludge).

edsantiago (Collaborator) commented

Thank you for tracking it down!

baude (Member) commented Mar 13, 2020

I'm not convinced yet, as I observed differences between Go 1.14 and Go HEAD. I want a chance to chase this a little further. I will try to have some more information by weekend's end.

rhatdan (Member) commented Mar 13, 2020

Can we just tell the sig-proxy to not send Signal 23?

mheon (Member) commented Mar 13, 2020

I think we're going to have to, which kind of sucks - I'm fully expecting a bug in a few months with the title "Podman with --sig-proxy does not forward Signal 23". I really wish this change from Go had been more clearly communicated and had some way to disable.

baude added a commit to baude/podman that referenced this issue Mar 13, 2020
Due to a change in golang-1.14 and its changes to make go funcs with tight loops preemptive, signals are now getting "through" that never were before.

From the golang-1.14 announce:

Goroutines are now asynchronously preemptible. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is supported on all platforms except windows/arm, darwin/arm, js/wasm, and plan9/*.

A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases. This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. Those programs will have to handle those errors in some way, most likely looping to try the system call again. For more information about this see man 7 signal for Linux systems or similar documentation for other systems.

Fixes containers#5483

Signed-off-by: Brent Baude <bbaude@redhat.com>
snj33v pushed a commit to snj33v/libpod that referenced this issue May 31, 2020
mheon pushed a commit to mheon/libpod that referenced this issue Sep 23, 2020

<MH: Fixed build after cherry-pick>

Signed-off-by: Matthew Heon <mheon@redhat.com>
mheon pushed a commit to mheon/libpod that referenced this issue Sep 23, 2020

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1875289

<MH: Fixed build after cherry-pick>

Signed-off-by: Matthew Heon <mheon@redhat.com>
github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023