
podman errors after unclean shutdown #9824

Closed
martinetd opened this issue Mar 26, 2021 · 8 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@martinetd

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Had an abrupt shutdown with containers running yesterday (rootless); now trying to do things like listing containers fails quite verbosely:

$ ./bin/podman ps
ERRO[0000] error joining network namespace for container 4f9ac2d7b85e5b0c4dc6556294b9eb2465cc0988b8570879c49bdae1d8cd1e9f: error retrieving network namespace at /run/user/1000/netns/cni-d4b508d5-8f6e-2c92-c84b-9af57040ac2b: failed to Statfs "/run/user/1000/netns/cni-d4b508d5-8f6e-2c92-c84b-9af57040ac2b": no such file or directory 
ERRO[0000] error joining network namespace for container 8f0266ac850cb8aec65ceb21ad6f885b088541a84373bbbeecebc4ca20faae2f: error retrieving network namespace at /run/user/1000/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa: failed to Statfs "/run/user/1000/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa": no such file or directory 
ERRO[0000] error joining network namespace for container bb9813e24b53ac52682485f67925929fa5d5f02d97f545947c996540e427bb72: error retrieving network namespace at /run/user/1000/netns/cni-b8917bba-e000-eb64-6dc5-9ab5ec6f5ff8: failed to Statfs "/run/user/1000/netns/cni-b8917bba-e000-eb64-6dc5-9ab5ec6f5ff8": no such file or directory 
ERRO[0000] error joining network namespace for container d2e5ea5f2962b2c38f675a60cd26c9d4a93faafc956a9ea5e3348e0a8a22bce4: error retrieving network namespace at /run/user/1000/netns/cni-7e07b6de-7311-2197-05fe-b2141e758d60: failed to Statfs "/run/user/1000/netns/cni-7e07b6de-7311-2197-05fe-b2141e758d60": no such file or directory 
ERRO[0000] error joining network namespace for container df0c0ef53188652b38c7407bb0d36204c7ebb41ef8e61d2a99ed451738f24fc2: error retrieving network namespace at /run/user/1000/netns/cni-b0a30e07-b641-1c72-1d6c-6c6aed628889: failed to Statfs "/run/user/1000/netns/cni-b0a30e07-b641-1c72-1d6c-6c6aed628889": no such file or directory 
Error: error joining network namespace of container 4f9ac2d7b85e5b0c4dc6556294b9eb2465cc0988b8570879c49bdae1d8cd1e9f: error retrieving network namespace at /run/user/1000/netns/cni-d4b508d5-8f6e-2c92-c84b-9af57040ac2b: failed to Statfs "/run/user/1000/netns/cni-d4b508d5-8f6e-2c92-c84b-9af57040ac2b": no such file or directory

Steps to reproduce the issue:

  1. Run containers with podman run, I guess. I had a few pods running too.

  2. Unplug the PC, reboot.

  3. podman ps and podman kill -a both fail.

Describe the results you received:

Plenty of errors and the expected output is not shown, even if I run new containers (starting new containers works fine).

Describe the results you expected:

Perhaps plenty of errors once, with automatic cleanup of the no-longer-existing containers, followed by normal output; then back to normal with no errors afterwards.

Output of podman version:

Version:      3.1.0-dev
API Version:  3.1.0-dev
Go Version:   go1.15.9
Git Commit:   9e23e0b3e3b219cbdc42fac4f843d6d2ec97421b
Built:        Fri Mar 26 11:25:35 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

ERRO[0000] error joining network namespace for container 4f9ac2d7b85e5b0c4dc6556294b9eb2465cc0988b8570879c49bdae1d8cd1e9f: error retrieving network namespace at /run/user/1000/netns/cni-d4b508d5-8f6e-2c92-c84b-9af57040ac2b: failed to Statfs "/run/user/1000/netns/cni-d4b508d5-8f6e-2c92-c84b-9af57040ac2b": no such file or directory 
ERRO[0000] error joining network namespace for container 8f0266ac850cb8aec65ceb21ad6f885b088541a84373bbbeecebc4ca20faae2f: error retrieving network namespace at /run/user/1000/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa: failed to Statfs "/run/user/1000/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa": no such file or directory 
ERRO[0000] error joining network namespace for container bb9813e24b53ac52682485f67925929fa5d5f02d97f545947c996540e427bb72: error retrieving network namespace at /run/user/1000/netns/cni-b8917bba-e000-eb64-6dc5-9ab5ec6f5ff8: failed to Statfs "/run/user/1000/netns/cni-b8917bba-e000-eb64-6dc5-9ab5ec6f5ff8": no such file or directory 
ERRO[0000] error joining network namespace for container d2e5ea5f2962b2c38f675a60cd26c9d4a93faafc956a9ea5e3348e0a8a22bce4: error retrieving network namespace at /run/user/1000/netns/cni-7e07b6de-7311-2197-05fe-b2141e758d60: failed to Statfs "/run/user/1000/netns/cni-7e07b6de-7311-2197-05fe-b2141e758d60": no such file or directory 
ERRO[0000] error joining network namespace for container df0c0ef53188652b38c7407bb0d36204c7ebb41ef8e61d2a99ed451738f24fc2: error retrieving network namespace at /run/user/1000/netns/cni-b0a30e07-b641-1c72-1d6c-6c6aed628889: failed to Statfs "/run/user/1000/netns/cni-b0a30e07-b641-1c72-1d6c-6c6aed628889": no such file or directory 
host:
  arch: amd64
  buildahVersion: 1.19.8
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 8
  distribution:
    distribution: debian
    version: unknown
  eventLogger: journald
  hostname: xyz
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.10.0-4-amd64
  linkmode: dynamic
  memFree: 962813952
  memTotal: 16556802048
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.4.0
  swapFree: 27965931520
  swapTotal: 28276834304
  uptime: 26h 52m 4.93s (Approximately 1.08 days)
registries:
  localhost:80:
    Blocked: false
    Insecure: true
    Location: localhost:80
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: localhost:80
  localhost:5000:
    Blocked: false
    Insecure: true
    Location: localhost:5000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: localhost:5000
store:
  configFile: /home/xxx/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 8
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.10.2
        fuse-overlayfs: version 1.4
        FUSE library version 3.10.2
        using FUSE kernel interface version 7.31
  graphRoot: /home/xxx/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 95
  runRoot: /run/user/1000/containers
  volumePath: /home/xxx/.local/share/containers/storage/volumes
version:
  APIVersion: 3.1.0-dev
  Built: 1616725535
  BuiltTime: Fri Mar 26 11:25:35 2021
  GitCommit: 9e23e0b3e3b219cbdc42fac4f843d6d2ec97421b
  GoVersion: go1.15.9
  OsArch: linux/amd64
  Version: 3.1.0-dev

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

@openshift-ci-robot added the kind/bug label Mar 26, 2021
@martinetd (Author)

As that got too annoying, I manually cleaned up my container storage: a podman rm <container id> --force for each container ID appearing in the "for container 4f9ac2d7b85e5b..." messages.

One of them was a pod according to the error from podman rm (as expected), so I used podman pod rm for that one and it worked.
Some had no relation to pods at all (I checked ~/.local/share/containers/storage/overlay-containers/containers.json to get container names out of the IDs before deleting them), so this issue isn't pod-related.

Also, some containers refused to be removed because they were supposedly running, so I had to pass --force -- I don't think they were still running after a hard reset...

$ podman rm registry
ERRO[0000] error joining network namespace for container 8f0266ac850cb8aec65ceb21ad6f885b088541a84373bbbeecebc4ca20faae2f: error retrieving network namespace at /run/user/xxx/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa: failed to Statfs "/run/user/xxx/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa": no such file or directory 
Error: cannot remove container 8f0266ac850cb8aec65ceb21ad6f885b088541a84373bbbeecebc4ca20faae2f as it is running - running or paused containers cannot be removed without force: container state improper

$ podman rm --force registry
ERRO[0000] error joining network namespace for container 8f0266ac850cb8aec65ceb21ad6f885b088541a84373bbbeecebc4ca20faae2f: error retrieving network namespace at /run/user/xxx/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa: failed to Statfs "/run/user/xxx/netns/cni-39219b0e-380e-3beb-f45d-4c90d03ef6aa": no such file or directory 
ERRO[0000] Storage for container 8f0266ac850cb8aec65ceb21ad6f885b088541a84373bbbeecebc4ca20faae2f has been removed 
Error: error freeing lock for container 8f0266ac850cb8aec65ceb21ad6f885b088541a84373bbbeecebc4ca20faae2f: no such file or directory

Now that everything's been cleaned up, commands work as normal again.
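
For anyone landing here in the same state, a rough sketch of that cleanup (assumptions: a rootless setup, every listed container is safe to discard, and containers.json uses id/names fields; adjust before use):

$ # map IDs to names first if you want to be selective (field names assumed)
$ jq -r '.[] | "\(.id)  \(.names[0])"' ~/.local/share/containers/storage/overlay-containers/containers.json
$ # force-remove every remaining container, then any leftover pods
$ podman ps -aq | xargs -r -n1 podman rm --force
$ podman pod ps -q | xargs -r -n1 podman pod rm --force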

@vrothberg (Member)

Thanks for reaching out and providing all the data, @martinetd.

@mheon PTAL. Are the containers in the wrong state?

@mheon (Member) commented Mar 31, 2021

From his later description, yes. It looks like Podman failed to detect a restart and clean the state (usually, we get the opposite of this, with systemd or others cleaning out the state even when a restart did not occur). Best guess: Podman is using /tmp as our temporary files directory, it is not mounted as tmpfs, and the tmpfiles.d script we have either does not cover the directory in /tmp or was not installed due to a too-old Podman.
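
A quick way to check both parts of that guess (the findmnt call is standard; the tmpfiles.d drop-in name and location are assumptions and vary by distribution/package):

$ findmnt -n -o FSTYPE /tmp                  # prints "tmpfs" if /tmp is a tmpfs
$ ls /usr/lib/tmpfiles.d/ | grep -i podman   # look for a Podman-shipped tmpfiles.d snippet, if any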

@mheon (Member) commented Mar 31, 2021

@martinetd If your /tmp is, indeed, not a tmpfs, you can verify which temporary files directory we are using by running podman info --log-level=debug and looking for a line like:

DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp  
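
For example, filtering just that line (debug output goes to stderr, hence the redirect; the exact path will differ per user):

$ podman info --log-level=debug 2>&1 | grep 'Using tmp dir'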

@martinetd (Author)

@mheon thanks - I have the same tmp dir (/run/user/../libpod/tmp), which is a tmpfs.

So if I understand what you said, podman is supposed to detect that this directory is empty immediately after a restart and clean things up in the home dir/persistent data for further commands?

Would it be possible that the first command I ran after the reboot populated that directory without checking if it was empty, and subsequent commands then couldn't follow up with the cleanup? Unfortunately I don't remember what that command would have been and can't think of a way to check now... Probably an interactive podman run -it... but not 100% sure.

@mheon (Member) commented Mar 31, 2021

Shouldn't be possible for us to populate without checking. All Podman commands begin by performing a check of the directory (really, of a single file in said directory, named alive). If it does not exist, we perform a refresh of the Podman state to account for the reboot and then create it. This process is locked to prevent races, and the alive file is only created after the rest of the process has run. The only theory I have is that the alive file can be created even if the refresh ran but did not complete successfully (this is deliberate, so we don't lock Podman into a permanently unable-to-run state if an error occurs during the refresh) - perhaps something happened that prevented us from writing changes to the DB, and the refresh failed?
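
To check whether that marker exists and when it was written (the path is an assumption based on the tmp dir shown earlier; the file name comes from the description above):

$ stat /run/user/1000/libpod/tmp/alive    # mtime shows when it was created, i.e. roughly when the post-boot refresh ran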

@martinetd (Author)

Hm; that alive file is there alright now; looking at its mtime it was likely created just after the reboot a few days ago, so I'll go with your guess that the DB update somehow failed.

I'm sure I'll fail to notice the AC isn't plugged in correctly again, and I'll try to remember to take more traces on the first podman command I run after that next time. I don't think there's much we can do until then.
Thanks for the insights.

@martinetd (Author)

Just had another unexpected shutdown, but this time podman cleaned up properly on the first execution, with podman ps. It's a shame I don't remember what command I ran first the previous time, at least :/ The running containers/pods should have been pretty similar, so it's very curious.

Anyway, I don't think this will be possible to reproduce in a timely manner -- it could also have been fixed by an update in the past ten days. I'll close for now and re-open if it ever happens again; hopefully I'll think of taking traces when it does!

@github-actions bot added the locked - please file new issue/PR label Sep 22, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023