user podman service leaks "pause" process, breaks API after idle timeout #7180

Closed
martinpitt opened this issue Aug 1, 2020 · 7 comments · Fixed by #7192
Labels: kind/bug, locked - please file new issue/PR

Comments

@martinpitt
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Stopping the user's podman.service, or letting it time out, often leaves behind a "podman pause" process.

Steps to reproduce the issue:

  1. Log in as a user and start from a clean slate: there are no running containers, and the service is not running:

     systemctl --user stop podman.service; systemctl --user start podman.socket
     # make sure there are no podman processes
     ps ux|grep podman
    
  2. Make an API request:

     $ curl --silent --show-error --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock http://d/libpod/_ping
     OK
    
  3. Check processes. podman.service is running:

$ systemctl --user status podman
● podman.service - Podman API Service
     Loaded: loaded (/usr/lib/systemd/user/podman.service; disabled; vendor preset: disabled)
     Active: active (running) since Sat 2020-08-01 03:11:59 EDT; 1s ago
TriggeredBy: ● podman.socket
       Docs: man:podman-api(1)
   Main PID: 59799 (podman)
      Tasks: 14 (limit: 1286)
     Memory: 64.0M
        CPU: 145ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/podman.service
             ├─59799 /usr/bin/podman system service
             ├─59805 /usr/bin/podman system service
             └─59809 /usr/bin/podman

There is also a zombie process (but that's not the main issue of this bug report):

$ ps ux | grep podman
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
admin      59799  1.2  4.4 1206952 50508 ?       Ssl  03:11   0:00 /usr/bin/podman system service
admin      59805  1.2  4.6 1060976 51756 ?       Sl   03:11   0:00 /usr/bin/podman system service
admin      59808  0.0  0.0      0     0 ?        Zs   03:11   0:00 [podman] <defunct>
admin      59809  0.2  2.4  54832 28048 ?        S    03:11   0:00 /usr/bin/podman

  4. Wait a few seconds until podman.service times out, and check with systemctl --user status podman.

Describe the results you expected:

After that, I expect no podman processes to be running anymore.

Describe the results you received:

$ ps ux|grep podman
admin      59809  0.0  2.4  54832 28048 ?        S    03:11   0:00 /usr/bin/podman

$ cat /proc/59809/comm 
podman pause
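To check for a leftover pause process the same way as above (by reading /proc/PID/comm), for example between test runs, a small POSIX shell helper could be used. This is a sketch under stated assumptions, not part of podman: the function name `find_pause_pids` and its optional root-directory argument (defaulting to /proc, so it can be pointed at a fake tree) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes whose command name
# (/proc/PID/comm) is exactly "podman pause".
# The optional first argument overrides the /proc root (an assumption,
# handy for testing against a fake directory tree).
find_pause_pids() {
    proc_root="${1:-/proc}"
    for comm_file in "$proc_root"/[0-9]*/comm; do
        # Skip an unexpanded glob and processes that exited meanwhile.
        [ -r "$comm_file" ] || continue
        # $(...) strips the trailing newline that comm files end with.
        if [ "$(cat "$comm_file" 2>/dev/null)" = "podman pause" ]; then
            pid_dir="${comm_file%/comm}"
            echo "${pid_dir##*/}"
        fi
    done
}
```

For example, `[ -z "$(find_pause_pids)" ] || echo "podman pause still running"` after stopping podman.service flags the leak described in this report.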

Additional information you deem important (e.g. issue happens only occasionally):

It's not 100% reliable, sometimes the pause process does get cleaned up.

This is podman 2.0.2, as 2.0.3's API is broken (issue #7078). Both 2.0.2 and 2.0.3 also have a few bugs in their systemd units (most importantly, KillMode=process). However, I am already using the latest unit from master, i.e. with the default KillMode. This "pause" process seems to get detached from the unit somehow, even though it originally appears inside it.

Output of podman version:

Version:      2.0.2
API Version:  1
Go Version:   go1.14.3
Built:        Wed Dec 31 19:00:00 1969
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.18-1.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.18, commit: 6e8799f576f11f902cd8a8d8b45b2b2caf636a85'
  cpus: 1
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: file
  hostname: m1.cockpit.lan
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.7.9-200.fc32.x86_64
  linkmode: dynamic
  memFree: 360267776
  memTotal: 1151725568
  ociRuntime:
    name: crun
    package: crun-0.14.1-1.fc32.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.14.1
      commit: 598ea5e192ca12d4f6378217d3ab1415efeddefa
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.4-1.fc32.x86_64
    version: |-
      slirp4netns version 1.1.4
      commit: b66ffa8e262507e37fca689822d23430f3357fe8
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 2
  swapFree: 1284456448
  swapTotal: 1291841536
  uptime: 1h 49m 22.44s (Approximately 0.04 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/admin/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.2-1.fc32.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /home/admin/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 3
  runRoot: /run/user/1000/containers
  volumePath: /home/admin/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Wed Dec 31 19:00:00 1969
  GitCommit: ""
  GoVersion: go1.14.3
  OsArch: linux/amd64
  Version: 2.0.2

Package info (e.g. output of rpm -q podman or apt list podman):

podman-2.0.2-1.fc32.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci-robot added the kind/bug label Aug 1, 2020
martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Aug 1, 2020
Working with user podman API leaves behind a `podman pause` process
which is not attached to/stopped by podman.service.

Avoid that leaking into the next test while the data directories get
restored.

See containers/podman#7180
@martinpitt
Contributor Author

This is actually quite serious, as it completely breaks the API on timeout:

$ systemctl --user stop podman.socket podman.service; pkill -e podman
podman pause killed (pid 7600)
$ systemctl --user start podman.socket
$ curl --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock http://d/libpod/_ping
OK

Now let podman.service time out, and watch journalctl --user -f -u podman.service:

Aug 01 04:35:06 m1.cockpit.lan systemd[5900]: Started Podman API Service.
Aug 01 04:35:16 m1.cockpit.lan systemd[5900]: podman.service: Succeeded.

Now further API requests are broken: running the same curl command from above hangs forever, and the journal shows a never-ending start/stop cycle:

Aug 01 04:36:46 m1.cockpit.lan systemd[5900]: Started Podman API Service.
Aug 01 04:36:56 m1.cockpit.lan systemd[5900]: podman.service: Succeeded.
Aug 01 04:36:56 m1.cockpit.lan systemd[5900]: Started Podman API Service.
Aug 01 04:37:06 m1.cockpit.lan systemd[5900]: podman.service: Succeeded.
Aug 01 04:37:06 m1.cockpit.lan systemd[5900]: Started Podman API Service.
[...]
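Since the symptom here is a curl request that hangs forever, a test harness can bound the wait instead of blocking. A minimal sketch, assuming curl's `--max-time` and `--unix-socket` options and the socket path used above; the function name `podman_api_alive` and the 5-second default are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the podman API answers "OK" to
# /libpod/_ping within a bounded time, instead of hanging forever.
# Usage: podman_api_alive [SOCKET] [SECONDS]
podman_api_alive() {
    sock="${1:-$XDG_RUNTIME_DIR/podman/podman.sock}"
    limit="${2:-5}"
    # --max-time aborts the whole transfer after $limit seconds.
    reply=$(curl --silent --max-time "$limit" \
        --unix-socket "$sock" http://d/libpod/_ping) || return 1
    [ "$reply" = "OK" ]
}
```

Running `podman_api_alive || echo "API not responding"` after the idle timeout distinguishes the hang described here from a healthy service without stalling the caller.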

@martinpitt changed the title user podman service leaks "pause" process to user podman service leaks "pause" process, breaks API after idle timeout Aug 1, 2020
martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Aug 1, 2020
Working with user podman API leaves behind a `podman pause` process
which is not attached to/stopped by podman.service. This completely
breaks the API: containers/podman#7180

Closes cockpit-project#473
@marusak
Contributor

marusak commented Aug 1, 2020

The behaviour is rather similar to #6660, so maybe they are related?

marusak pushed a commit to cockpit-project/cockpit-podman that referenced this issue Aug 1, 2020
Working with user podman API leaves behind a `podman pause` process
which is not attached to/stopped by podman.service. This completely
breaks the API: containers/podman#7180

Closes #473
@mheon
Member

mheon commented Aug 1, 2020

The pause process remaining behind is intentional - see discussion in #7133

Are you certain that the timeout issue is related to the pause process?

@martinpitt
Contributor Author

@mheon: Yes, as soon as I kill it, I can connect again.

@mheon
Member

mheon commented Aug 2, 2020

Alright, that's serious.

@baude @jwhonce @giuseppe Looks like the pause process is somehow interfering with podman system service.

@martinpitt
Contributor Author

Confirmed with the current 2.0.4 from Fedora 32 updates-testing.

giuseppe added a commit to giuseppe/libpod that referenced this issue Aug 3, 2020
when there is a pause process running, let the "system service" podman
instance join immediately the existing namespaces.

Closes: containers#7180
Closes: containers#6660

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member

giuseppe commented Aug 3, 2020

This "pause" process seems to get detached from the unit somehow, even though it originally appears inside.

That is by design: we want it to be shared among all the podman processes for a user.

Could you give #7192 a try?

vrothberg pushed a commit to vrothberg/libpod that referenced this issue Aug 11, 2020
mheon pushed a commit to mheon/libpod that referenced this issue Aug 17, 2020
marusak added a commit to marusak/cockpit-podman that referenced this issue Nov 2, 2020
containers/podman#7180 is fixed and fixed
version is present in all of our images.
marusak added a commit to cockpit-project/cockpit-podman that referenced this issue Nov 2, 2020
@github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023

5 participants