Failed to receive console file descriptor Communication error on send (Rootless, above 4.7.2+) #22274

Closed
dezza opened this issue Apr 5, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.


dezza commented Apr 5, 2024

Issue Description

I've been updating my centos-stream-9 dev server as usual, and around podman 4.8.2 I started seeing these cryptic errors:

Error: crun: sd-bus call: Interactive authentication required.: Permission denied: OCI permission denied

This happens both when building and when running containers.

Initially I shrugged it off as a temporary centos-stream-9 issue, so I downgraded to 4.7.2 and a matching crun version (2.1.8?). I think this issue is related: containers/conmon#475

Now that podman 5.0 is out, I tried to upgrade the packages again, but the issue returned. I've tried every in-between version available to centos-stream-9, and only 4.7.2 or earlier works.

podman run --log-level=debug -it --name netshoot --rm localhost/netshoot

INFO[0000] Failed to add conmon to systemd sandbox cgroup
DEBU[0000] ExitCode msg: "container create failed (no logs from conmon): conmon bytes \"\": readobjectstart: expect { or n, but found \x00, error found in #0 byte of ...||..., bigger context ...||..."
Error: container create failed (no logs from conmon): conmon bytes "": readObjectStart: expect { or n, but found , error found in #0 byte of ...||..., bigger context ...||...

The strace output of interest is:

setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)

So I raised these limits manually in /etc/limits.d/ct.conf:

ct - nproc 4194304
ct - nofile 1048576
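
To confirm that the raised limits actually reach the login session, something like the following could be used. This is only a sketch: `limit_at_least` and `report_limits` are hypothetical helper names, and the thresholds are simply the 4096*1024 and 1024*1024 values from the setrlimit() calls above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a limit value (as printed by `ulimit`)
# is at least the wanted number. "unlimited" always satisfies the check.
limit_at_least() {
    local current="$1" wanted="$2"
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$wanted" ]
}

# Compare the hard limits of the current shell against the values
# requested in the setrlimit() calls seen in strace.
report_limits() {
    local nproc nofile
    nproc="$(ulimit -Hu)"    # hard limit on user processes
    nofile="$(ulimit -Hn)"   # hard limit on open file descriptors
    if limit_at_least "$nproc" 4194304; then
        echo "nproc ok ($nproc)"
    else
        echo "nproc too low ($nproc)"
    fi
    if limit_at_least "$nofile" 1048576; then
        echo "nofile ok ($nofile)"
    else
        echo "nofile too low ($nofile)"
    fi
}
```

Running `report_limits` in a fresh login shell of the affected user shows whether the new limits were picked up; a session that was already open before the change keeps its old limits.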

There's another issue left:

[pid 281728<podman>] epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1588592641, u64=9175308412348530689}}) = -1 EPERM (Operation not permitted)

journalctl contains:

Apr 22 01:05:25 cos conmon[299753]: conmon 4df08ad692c3d3112036 <error>: Failed to receive console file descriptor Communication error on send

Removing -it gives a different error (quite likely unrelated, but see #3024):

DEBU[0000] running conmon: /usr/bin/conmon               args="[--api-version 1 -c cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e -u cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e -r /usr/bin/crun -b /home/ct/.local/share/containers/storage/overlay-containers/cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e/userdata -p /run/user/1001/containers/overlay-containers/cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e/userdata/pidfile -n netshoot --exit-dir /run/user/1001/libpod/tmp/exits --full-attach -s -l journald --log-level debug --syslog --conmon-pidfile /run/user/1001/containers/overlay-containers/cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/ct/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1001/containers --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1001/libpod/tmp --exit-command-arg --network-config-dir --exit-command-arg  --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /home/ct/.local/share/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg boltdb --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e]"
INFO[0000] Running conmon under slice user.slice and unitName libpod-conmon-cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e.scope
INFO[0000] Failed to add conmon to systemd sandbox cgroup: dial unix /run/user/1001: connect: connection refused
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied

and journalctl contains:

Apr 22 01:19:14 cos conmon[303223]: conmon 01ea7bed14b9b244c3ec <nwarn>: runtime stderr: sd-bus call: Interactive authentication required.: Permission denied
Apr 22 01:19:14 cos conmon[303223]: conmon 01ea7bed14b9b244c3ec <error>: Failed to create container: exit status 1

podman build . -t localhost/netshoot

DEBU[0000] Running ["/usr/bin/crun" "--systemd-cgroup" "create" "--bundle" "/var/tmp/buildah3749706510" "--pid-file" "/var/tmp/buildah3749706510/pid" "--no-new-keyring" "buildah-buildah3749706510"]
DEBU[0000] "/var/tmp/buildah3749706510/mnt/buildah-bind-target-10" is apparently not really mounted, skipping
DEBU[0000] "/var/tmp/buildah3749706510/mnt/rootfs" is apparently not really mounted, skipping
DEBU[0000] "/var/tmp/buildah3749706510/mnt" is apparently not really mounted, skipping
error running container: from /usr/bin/crun creating container for [/bin/sh -c apt-get update && apt-get install -y   curl   wget]: sd-bus call: Interactive authentication required.: Permission denied
: exit status 1
ERRO[0000] did not get container create message from subprocess: EOF
DEBU[0000] Error building at step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[apt-get update && apt-get install -y   curl   wget] Flags:[] Attrs:map[] Message:RUN apt-get update && apt-get install -y   curl   wget Heredocs:[] Original:RUN apt-get update && apt-get install -y   curl   wget}: while running runtime: exit status 1
Error: building at STEP "RUN apt-get update && apt-get install -y   curl   wget": while running runtime: exit status 1
DEBU[0000] Shutting down engines

However, buildah bud -t localhost/netshoot . works, so it's definitely related to podman or one of its dependencies, such as conmon.
It also works when adding --cgroups=disabled.

I installed a fresh centos-stream-9 VM and could not reproduce the issue there.

podman info from the working fresh centos-stream-9 install:

host:
  arch: amd64
  buildahVersion: 1.33.5
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 5c6ed42ed821d0a028d0006e6f9c8a69ae0806ab'
  cpuUtilization:
    idlePercent: 99.72
    systemPercent: 0.12
    userPercent: 0.16
  cpus: 1
  databaseBackend: sqlite
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 2048
  hostname: 2a00.8200.a12f.0000.be24.11ff.feb5.e06d.static6.ewii.dk
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.0-432.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 387522560
  memTotal: 2063220736
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.10.3-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 2194640896
  swapTotal: 2197811200
  uptime: 23h 7m 15.00s (Approximately 0.96 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/dza/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/dza/.local/share/containers/storage
  graphRootAllocated: 34082914304
  graphRootUsed: 2721624064
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 40
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/dza/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.4-dev
  Built: 1710930166
  BuiltTime: Wed Mar 20 11:22:46 2024
  GitCommit: ""
  GoVersion: go1.21.7 (Red Hat 1.21.7-1.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4-dev

podman --log-level=debug run -it --name netshoot --rm localhost/netshoot
podman build . -t localhost/netshoot

Works as expected on that fresh install.

What can the issue be? How can I debug this further? I don't use custom configuration such as containers.conf or similar, so both machines should behave the same.
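
One avenue that could be checked, as a sketch only: the "dial unix /run/user/1001: connect: connection refused" and "sd-bus call: Interactive authentication required" messages both involve the user session bus, so it may be worth verifying that the affected account has one. Nothing in this report confirms that as the cause. `session_bus_status` is a hypothetical helper name; the socket path assumes the usual systemd layout of `$XDG_RUNTIME_DIR/bus`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the user-session environment that the
# systemd cgroup manager depends on. Both values are passed in explicitly
# so the function can be exercised without touching the real session.
session_bus_status() {
    local runtime_dir="$1" bus_address="$2"
    if [ -z "$runtime_dir" ]; then
        echo "XDG_RUNTIME_DIR unset"
    elif [ ! -S "$runtime_dir/bus" ]; then
        # Assumption: systemd places the user bus socket at <runtime dir>/bus
        echo "no bus socket at $runtime_dir/bus"
    elif [ -z "$bus_address" ]; then
        echo "DBUS_SESSION_BUS_ADDRESS unset"
    else
        echo "ok"
    fi
}

# Typical use from the affected account:
#   session_bus_status "$XDG_RUNTIME_DIR" "$DBUS_SESSION_BUS_ADDRESS"
#   loginctl show-user "$(id -un)" --property=State --property=Linger
#   systemctl --user is-system-running
```

Running the same three commands on the working VM and on the broken box gives a direct comparison of the two sessions.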

Steps to reproduce the issue

It's either periodic or tied to a certain state, so it can't be easily reproduced. It seems to come from updating through several podman versions while running containers.

Describe the results you received

Error: crun: sd-bus call: Interactive authentication required.: Permission denied: OCI permission denied

or
with --log-level=debug

INFO[0000] Failed to add conmon to systemd sandbox cgroup
DEBU[0000] ExitCode msg: "container create failed (no logs from conmon): conmon bytes \"\": readobjectstart: expect { or n, but found \x00, error found in #0 byte of ...||..., bigger context ...||..."
Error: container create failed (no logs from conmon): conmon bytes "": readObjectStart: expect { or n, but found , error found in #0 byte of ...||..., bigger context ...||...

or without -it

DEBU[0000] running conmon: /usr/bin/conmon               args="[--api-version 1 -c cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e -u cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e -r /usr/bin/crun -b /home/ct/.local/share/containers/storage/overlay-containers/cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e/userdata -p /run/user/1001/containers/overlay-containers/cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e/userdata/pidfile -n netshoot --exit-dir /run/user/1001/libpod/tmp/exits --full-attach -s -l journald --log-level debug --syslog --conmon-pidfile /run/user/1001/containers/overlay-containers/cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/ct/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1001/containers --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1001/libpod/tmp --exit-command-arg --network-config-dir --exit-command-arg  --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /home/ct/.local/share/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg boltdb --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e]"
INFO[0000] Running conmon under slice user.slice and unitName libpod-conmon-cea94c48d17e65646efa1febe9bc6869fe1431bf7e81855adcd2229a23a3645e.scope
INFO[0000] Failed to add conmon to systemd sandbox cgroup: dial unix /run/user/1001: connect: connection refused
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied

Describe the results you expected

Starting the container without errors

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.5
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 5c6ed42ed821d0a028d0006e6f9c8a69ae0806ab'
  cpuUtilization:
    idlePercent: 91.03
    systemPercent: 5.4
    userPercent: 3.57
  cpus: 2
  databaseBackend: boltdb
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 1901
  hostname: cos
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.14.0-432.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 152358912
  memTotal: 3837390848
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.10.3-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 3181195264
  swapTotal: 4293914624
  uptime: 28h 9m 1.00s (Approximately 1.17 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/ct/.config/containers/storage.conf
  containerStore:
    number: 11
    paused: 0
    running: 9
    stopped: 2
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/ct/.local/share/containers/storage
  graphRootAllocated: 101980831744
  graphRootUsed: 65694781440
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 96
  runRoot: /run/user/1001/containers
  transientStore: false
  volumePath: /home/ct/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.4-dev
  Built: 1710930166
  BuiltTime: Wed Mar 20 11:22:46 2024
  GitCommit: ""
  GoVersion: go1.21.7 (Red Hat 1.21.7-1.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4-dev

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Both machines have equal package versions

podman-5.0.0-1.el9
conmon-2.1.10-1.el9
crun-1.14.4-1.el9
systemd-252-32.el9
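
Even with equal packages, the two podman info dumps above are not identical: databaseBackend is sqlite on the fresh VM and boltdb on the affected box, and the pasta and remoteSocket entries differ too. Whether any of that matters is unknown. A rough way to list such differences from two saved dumps, with `info_value` and `compare_info` as hypothetical helper names and a deliberately naive "first matching key" lookup:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first "key: value" line
# in a saved `podman info` dump, ignoring leading indentation.
info_value() {
    local file="$1" key="$2"
    awk -v k="$key" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, k ": ") == 1 { print substr(line, length(k) + 3); exit }
    ' "$file"
}

# Print every requested key whose value differs between the two dumps.
compare_info() {
    local a="$1" b="$2" key va vb
    shift 2
    for key in "$@"; do
        va="$(info_value "$a" "$key")"
        vb="$(info_value "$b" "$key")"
        [ "$va" = "$vb" ] || echo "$key: $va != $vb"
    done
}

# Example, after running `podman info > working.yaml` and
# `podman info > broken.yaml` on the respective machines:
#   compare_info working.yaml broken.yaml databaseBackend cgroupManager exists
```

Because the lookup takes the first line that starts with the key, nested keys that repeat (such as version or package) only report their first occurrence; a plain `diff working.yaml broken.yaml` covers the rest.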

Additional information

Also tried podman system prune, podman system reset, podman system migrate and moving entire .local/share/containers to a temp directory - still same issue.

Downgrading back to podman-4.7.2-2.el9 also works. ALL later versions are affected.

@dezza dezza added the kind/bug Categorizes issue or PR as related to a bug. label Apr 5, 2024
@dezza dezza changed the title possible conmon issue possible conmon issue (expect { or n) Apr 5, 2024
@dezza dezza changed the title possible conmon issue (expect { or n) possible conmon issue "Failed to add conmon to systemd sandbox cgroup", expect { or n) Apr 8, 2024

dezza commented Apr 20, 2024

Likely related: #22441

@dezza dezza changed the title possible conmon issue "Failed to add conmon to systemd sandbox cgroup", expect { or n) RLIMIT_NPROC|RLIMIT_NOFILE "Failed to add conmon to systemd sandbox cgroup", "failed to write to /proc/self/oom_score_adj" Apr 21, 2024
@dezza dezza changed the title RLIMIT_NPROC|RLIMIT_NOFILE "Failed to add conmon to systemd sandbox cgroup", "failed to write to /proc/self/oom_score_adj" RLIMIT_NPROC|RLIMIT_NOFILE "Failed to receive console file descriptor Communication error on send" "Failed to add conmon to systemd sandbox cgroup", "failed to write to /proc/self/oom_score_adj" Apr 21, 2024
@dezza dezza changed the title RLIMIT_NPROC|RLIMIT_NOFILE "Failed to receive console file descriptor Communication error on send" "Failed to add conmon to systemd sandbox cgroup", "failed to write to /proc/self/oom_score_adj" Failed to receive console file descriptor Communication error on send Apr 21, 2024
@dezza dezza changed the title Failed to receive console file descriptor Communication error on send Failed to receive console file descriptor Communication error on send (Rootless containers only) Apr 23, 2024

dezza commented Apr 23, 2024

Since the error appears in versions above 4.7.2 and 4.8.2 is affected, I believe this commit could have introduced it:
b7cfcea

The error does not happen with a rootful container, so it seems to be isolated to rootless.

What do you think, @giuseppe?

It's odd that this only happens on some machines, but it makes it impossible to update beyond 4.7.2.

@dezza dezza changed the title Failed to receive console file descriptor Communication error on send (Rootless containers only) Failed to receive console file descriptor Communication error on send (Rootless, above 4.7.2+) Apr 23, 2024
@Romain-Geissler-1A
Contributor

Hi,

Just quickly lurking into this. This may be fixed by #19927 (which I think is present in podman 4.9).

So maybe the "fix" is to migrate to podman 4.9 or 5.0 if possible on your side. Podman 4.9 was just released for RHEL 8 and 9.

@dezza dezza closed this as completed May 6, 2024

dezza commented May 6, 2024

> Hi,
>
> Just quickly lurking into this. This may be fixed by #19927 (which I think is present in podman 4.9).
>
> So maybe the "fix" is to migrate to podman 4.9 or 5.0 if possible on your side. Podman 4.9 was just released for RHEL 8 and 9.

I replaced that box with a fresh deploy yesterday, as there was no traction on this issue.

No version above 4.7.2 worked; I tried them all several times.

@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 5, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Aug 5, 2024