podman rm -a shows "No such file or directory" #2900

Closed

jtudelag opened this issue Apr 11, 2019 · 18 comments · Fixed by #3073
Labels
kind/bug Categorizes issue or PR as related to a bug.
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@jtudelag

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When running podman rm -a, the error message "No such file or directory" is shown.

Steps to reproduce the issue:

  1. podman rm -a

Describe the results you received:

$ podman ps                                                                                                                                                                                                                                                                                                                                                                                  
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES                                                                                                                                                                                                                                                                                                                                  
$ podman ps -a                                                                                                                                                                                                                                                                                                                                                                               
CONTAINER ID  IMAGE                              COMMAND               CREATED            STATUS                          PORTS                 NAMES                                                                                                                                                                                                                                       
d28ccf5e7ab7  localhost/jorget/pbench-fio:v1.0   /bin/bash             2 minutes ago      Exited (0) 2 minutes ago                              elated_nobel                                                                                                                                                                                                                                
a2a771643f10  localhost/jorget/pbench-fio:v1.0   /opt/pbench-fio-t...  3 minutes ago      Created                                               objective_pare                                                                                                                                                                                                                              
84140756a6fb  localhost/jorget/pbench-fio:v1.0   /bin/bash             4 minutes ago      Exited (127) 3 minutes ago                            youthful_mcclintock                                                                                                                                                                                                                         
92bec0ca4591  localhost/jorget/pbench-fio:v1.0   /bin/bash             4 minutes ago      Exited (0) 4 minutes ago                              elegant_chatelet
b62b463b8865  localhost/jorget/pbench-fio:v1.0   /bin/bash             21 minutes ago     Exited (0) 7 minutes ago        0.0.0.0:2323->22/tcp  recursing_rhodes
7690f5398d3d  localhost/jorget/pbench-fio:v1.0   /bin/bash             22 minutes ago     Exited (137) 21 minutes ago     0.0.0.0:2323->22/tcp  vigilant_mendel
faa9013e2340  localhost/jorget/pbench-fio:v1.0   /usr/sbin/init        About an hour ago  Exited (137) About an hour ago  0.0.0.0:2323->22/tcp  nostalgic_antonelli
e30a57fa2823  localhost/jorget/pbench-fio:v1.0   /usr/sbin/init        About an hour ago  Exited (137) About an hour ago  0.0.0.0:2323->22/tcp  jovial_pike
97e2404083fd  localhost/jorget/pbench-fio:v1.0   /usr/sbin/init        About an hour ago  Created                                               inspiring_kepler
903b36684896  localhost/jorget/pbench-fio:v1.0   /usr/sbin/init        About an hour ago  Exited (137) About an hour ago  0.0.0.0:2323->22/tcp  fervent_meitner
715e64988c56  localhost/jorget/pbench-fio:v1.0   /usr/sbin/init        About an hour ago  Exited (137) About an hour ago  0.0.0.0:2323->22/tcp  clever_fermi
5e3f3ee7ee94  localhost/jorget/pbench-fio:v1.0   /usr/sbin/init        About an hour ago  Exited (137) About an hour ago  0.0.0.0:2323->22/tcp  crazy_jennings
ac45d9e6a55e  docker.io/library/elasticsearch:5  /docker-entrypoin...  8 weeks ago        Created                                               blissful_golick
b2767dbf6c6b  docker.io/library/elasticsearch:5  /docker-entrypoin...  4 months ago       Created                                               romantic_mayer
7336ba5ca27a  docker.io/pihole/pihole:latest     /s6-init              4 months ago       Created                                               elegant_knuth
$ podman --log-level=debug rm -a
INFO[0000] running as rootless
DEBU[0000] Initializing boltdb state at /home/jtudelag/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Overriding run root "/run/user/1000" with "/run/user/1000/run" from database
DEBU[0000] Using graph driver vfs
DEBU[0000] Using graph root /home/jtudelag/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1000/run
DEBU[0000] Using static dir /home/jtudelag/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp
DEBU[0000] Using volume path /home/jtudelag/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] Not configuring container store
INFO[0000] running as rootless
WARN[0000] The configuration is using `runtime_path`, which is deprecated and will be removed in future.  Please use `runtimes` and `runtime`
WARN[0000] If you are using both `runtime_path` and `runtime`, the configuration from `runtime_path` is used
DEBU[0000] Initializing boltdb state at /home/jtudelag/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Overriding run root "/run/user/1000" with "/run/user/1000/run" from database
DEBU[0000] Using graph driver vfs
DEBU[0000] Using graph root /home/jtudelag/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1000/run
DEBU[0000] Using static dir /home/jtudelag/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp
DEBU[0000] Using volume path /home/jtudelag/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "vfs"
DEBU[0000] Setting maximum workers to 16
DEBU[0000] Cleaning up container 5e3f3ee7ee945b3d628639c1515d52abf84bbbec22855776857cb9f2765e5e33
DEBU[0000] Network is already cleaned up, skipping...
DEBU[0000] Storage is already unmounted, skipping...
DEBU[0000] Storage is already unmounted, skipping...
5e3f3ee7ee945b3d628639c1515d52abf84bbbec22855776857cb9f2765e5e33
INFO[0000] running as rootless
WARN[0000] The configuration is using `runtime_path`, which is deprecated and will be removed in future.  Please use `runtimes` and `runtime`
WARN[0000] If you are using both `runtime_path` and `runtime`, the configuration from `runtime_path` is used
DEBU[0000] Initializing boltdb state at /home/jtudelag/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Overriding run root "/run/user/1000" with "/run/user/1000/run" from database
DEBU[0000] Using graph driver vfs
DEBU[0000] Using graph root /home/jtudelag/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1000/run
DEBU[0000] Using static dir /home/jtudelag/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp
DEBU[0000] Using volume path /home/jtudelag/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "vfs"
DEBU[0000] Setting maximum workers to 16
DEBU[0000] Cleaning up container 715e64988c56b837da5af52f3f7dbbfac936dbb4404c28ae094bddd7d60a8667
DEBU[0000] Network is already cleaned up, skipping...
DEBU[0000] Storage is already unmounted, skipping...
DEBU[0000] Storage is already unmounted, skipping...
715e64988c56b837da5af52f3f7dbbfac936dbb4404c28ae094bddd7d60a8667
INFO[0000] running as rootless
WARN[0000] The configuration is using `runtime_path`, which is deprecated and will be removed in future.  Please use `runtimes` and `runtime`
WARN[0000] If you are using both `runtime_path` and `runtime`, the configuration from `runtime_path` is used
DEBU[0000] Initializing boltdb state at /home/jtudelag/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Overriding run root "/run/user/1000" with "/run/user/1000/run" from database
DEBU[0000] Using graph driver vfs
DEBU[0000] Using graph root /home/jtudelag/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1000/run
DEBU[0000] Using static dir /home/jtudelag/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp
DEBU[0000] Using volume path /home/jtudelag/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "vfs"
DEBU[0000] Setting maximum workers to 16
DEBU[0000] Cleaning up container 7336ba5ca27acf31ec13286ac499e7d716c8d3637d54e3e65c749a2568ada7ac
DEBU[0000] Network is already cleaned up, skipping...
DEBU[0000] Storage is already unmounted, skipping...
DEBU[0000] Storage is already unmounted, skipping...
ERRO[0000] no such file or directory

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version
Version:            1.2.0
RemoteAPI Version:  1
Go Version:         go1.11.5
OS/Arch:            linux/amd64

Output of podman info --debug:

$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.11.5
  podman version: 1.2.0
host:
  BuildahVersion: 1.7.2
  Conmon:
    package: podman-1.2.0-2.git3bd528e.fc29.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.12.0-dev, commit: d88bb0e63cb70f9787a8e410716924f380af361f'
  Distribution:
    distribution: fedora
    version: "29"
  MemFree: 3411947520
  MemTotal: 20434337792
  OCIRuntime:
    package: runc-1.0.0-68.dev.git6635b4f.fc29.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc6+dev
      commit: ef9132178ccc3d2775d4fb51f1e431f30cac1398-dirty
      spec: 1.0.1-dev
  SwapFree: 10287050752
  SwapTotal: 10292817920
  arch: amd64
  cpus: 4
  hostname: jtudelag-t460s
  kernel: 5.0.5-200.fc29.x86_64
  os: linux
  rootless: true
  uptime: 42h 43m 35.56s (Approximately 1.75 days)
insecure registries:
  registries:
  - localhost:5000
store:
  ConfigFile: /home/jtudelag/.config/containers/storage.conf
  ContainerStore:
    number: 6
  GraphDriverName: vfs
  GraphOptions: null
  GraphRoot: /home/jtudelag/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 47
  RunRoot: /run/user/1000/run
  VolumePath: /home/jtudelag/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 11, 2019
mheon (Member) commented Apr 11, 2019

I've been seeing this one occasionally. The container is definitely removed, so this seems to be a cleanup error - the container is gone, but some resources might not be fully freed. It's an ENOENT, definitely.

The curious thing is that we're not actually printing where the error happened. All of these cleanup errors should prefix what went wrong (the "No such file or directory" part) with the component that failed to clean up after itself - storage, networking, etc.

Oh, wait... they only do that if there's more than one error...

Alright, first step here is to get a patch in to improve debug information on cleanup error.
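
For illustration, here is a minimal Go sketch (not libpod's actual code) of the pattern described above: a component prefix is only attached when more than one cleanup error occurs, so a lone failure surfaces as a bare "no such file or directory".

package main

import (
	"errors"
	"fmt"
)

// cleanupContainer mimics the described behaviour: each failing component's
// error is collected, but the component name is only added when more than
// one error occurred.
func cleanupContainer(componentErrs map[string]error) error {
	type namedErr struct {
		component string
		err       error
	}
	var errs []namedErr
	for component, err := range componentErrs {
		if err != nil {
			errs = append(errs, namedErr{component, err})
		}
	}

	switch len(errs) {
	case 0:
		return nil
	case 1:
		// Context-free: the caller cannot tell which component failed.
		return errs[0].err
	default:
		msg := "cleanup failed:"
		for _, e := range errs {
			msg += fmt.Sprintf(" %s: %v;", e.component, e.err)
		}
		return errors.New(msg)
	}
}

func main() {
	// A single failing component prints just "no such file or directory",
	// exactly like the debug log above.
	err := cleanupContainer(map[string]error{
		"storage": nil,
		"network": errors.New("no such file or directory"),
	})
	fmt.Println(err)
}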

mheon (Member) commented Apr 12, 2019

I'm going to try to deliberately trigger this and see if I can get proper debug output now. I suspect it's related to freeing the lock.

hardys commented Apr 17, 2019

I've been seeing something similar, where podman pod exists somepod returns zero, then immediately afterwards podman pod rm somepod fails with Error: no such file or directory

I'd assumed it was some sort of corruption, and we've been working around it either by deleting the podman DB and reinstalling, or by installing boltbrowser (go get github.com/br0xen/boltbrowser) and using it to manually clean up the DB.

mheon (Member) commented Apr 17, 2019 via email

hardys commented Apr 17, 2019

@celebdor do you have any details of the boltbrowser hacking you had to do which may be of interest to @mheon ?

@abitrolly (Contributor)

I am blocked by the same issue. When will the new version with improved debug output be released?

@abitrolly (Contributor)

It is very strange.

➜  ~ podman rm d89
Error: no such file or directory
➜  ~ podman rm c74
Error: no such file or directory
➜  ~ podman rm 7d4
Error: no such file or directory
➜  ~ podman rm d89 c74 7d4
➜  ~ podman ps -a         

mheon (Member) commented May 5, 2019

1.3 should be releasing on Monday, and hopefully available in the repos later this week. It will include this patch.

For reference, after some further debugging, I'm fairly certain this is coming out of the lock allocation code, and it's not actually a "no such file or directory" - just a somewhat questionable reuse of the error code to indicate that a lock has already been released (the error code is my fault - I promise it made some degree of sense at the time).

The lock error itself is harmless, though it may indicate that a reboot occurred without Podman detecting it, which can lead to somewhat undesirable effects (for example, container states being desynced).
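
As a hypothetical sketch of that reuse (the names here are invented, not libpod's real lock API), the "no such file or directory" can come straight from returning ENOENT for a lock that has already been released:

package main

import (
	"fmt"
	"syscall"
)

// lockManager is a stand-in for the lock allocator described above.
type lockManager struct {
	allocated map[uint32]bool
}

func (m *lockManager) freeLock(id uint32) error {
	if !m.allocated[id] {
		// Reusing ENOENT to mean "lock already released" is what surfaces
		// to the user as the literal "no such file or directory".
		return syscall.ENOENT
	}
	delete(m.allocated, id)
	return nil
}

func main() {
	m := &lockManager{allocated: map[uint32]bool{}}
	if err := m.freeLock(42); err != nil {
		fmt.Println("Error:", err) // Error: no such file or directory
	}
}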

@abitrolly (Contributor)

This command removes one container and shows the error.

➜  ~ podman rm -a
Error: no such file or directory

@abitrolly (Contributor)

@mheon the error is preventing rm -a from removing the rest of the containers. I can't say it's harmless.

mheon (Member) commented May 5, 2019

That's a separate bug in rm -a, then - we should be trying each container independently.
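
A minimal sketch of that behaviour (assumed function names, not podman's real API): remove each container independently, collect any failures, and report them at the end instead of aborting on the first error.

package main

import (
	"errors"
	"fmt"
)

// removeAll tries every container even if some removals fail, so one bad
// container cannot block the rest.
func removeAll(ids []string, remove func(string) error) []error {
	var failures []error
	for _, id := range ids {
		if err := remove(id); err != nil {
			failures = append(failures, fmt.Errorf("removing container %s: %v", id, err))
			continue
		}
		fmt.Println(id)
	}
	return failures
}

func main() {
	// Simulate one container whose lock has already been freed.
	remove := func(id string) error {
		if id == "d89" {
			return errors.New("no such file or directory")
		}
		return nil
	}
	for _, err := range removeAll([]string{"d89", "c74", "7d4"}, remove) {
		fmt.Println("Error:", err)
	}
}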

celebdor commented May 6, 2019

I just went and deleted all the references to the unremovable container with boltbrowser:

go get -v github.com/br0xen/boltbrowser
sudo ~/go/bin/boltbrowser /var/lib/containers/storage/libpod/bolt_state.db

mheon (Member) commented May 6, 2019

@celebdor An unremovable container is very concerning - if it happens again, would you mind opening an issue here? Containers that manage to wedge themselves to the point that they can't be removed would be a top-priority bug for us. The original issue here (No such file or directory) is almost certainly separate/harmless (the container is still gone after rm runs).

mheon (Member) commented May 6, 2019

Think I have a clue as to what the original no such file or directory issue is.

mheon (Member) commented May 6, 2019

Confirmed - I've identified the root cause of the issue.

Patching now.

mheon added a commit to mheon/libpod that referenced this issue May 6, 2019
After a reboot, when we refresh Podman's state, we retrieved the
lock from the fresh SHM instance, but we did not mark it as
allocated to prevent it being handed out to other containers and
pods.

Provide a method for marking locks as in-use, and use it when we
refresh Podman state after a reboot.

Fixes containers#2900

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
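
A rough sketch of the fix described in that commit message (hypothetical types and method names, not libpod's real SHM lock code): after a reboot the lock state starts fresh, so locks retrieved during the refresh must also be marked as allocated, otherwise freeing them later reports ENOENT.

package main

import (
	"fmt"
	"syscall"
)

type shmLockManager struct {
	inUse map[uint32]bool
}

// RetrieveLock hands back an existing lock by ID without touching allocation
// state - the pre-fix behaviour during a post-reboot refresh.
func (m *shmLockManager) RetrieveLock(id uint32) uint32 { return id }

// AllocateAndRetrieveLock also marks the lock as in use, so the fresh SHM
// instance knows it is taken - the behaviour added by the fix.
func (m *shmLockManager) AllocateAndRetrieveLock(id uint32) uint32 {
	m.inUse[id] = true
	return id
}

func (m *shmLockManager) FreeLock(id uint32) error {
	if !m.inUse[id] {
		return syscall.ENOENT // the "no such file or directory" seen in this issue
	}
	delete(m.inUse, id)
	return nil
}

func main() {
	m := &shmLockManager{inUse: map[uint32]bool{}}

	// Before the fix: the refresh only retrieves, so freeing later fails.
	fmt.Println("without marking:", m.FreeLock(m.RetrieveLock(7)))

	// After the fix: the refresh marks the lock in use, so freeing succeeds.
	fmt.Println("with marking:", m.FreeLock(m.AllocateAndRetrieveLock(7)))
}
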
mheon (Member) commented May 6, 2019

#3073 to fix

mheon added a commit to mheon/libpod that referenced this issue May 6, 2019

mheon added a commit to mheon/libpod that referenced this issue May 6, 2019
hardys commented May 8, 2019

@mheon just to clarify, the issues we're seeing involve either pods or containers that cannot be removed (e.g. the container/pod is not gone after rm runs, hence the DB hacking).

An example which I just ran into now, immediately after a reboot:

$ sudo podman pod exists ironic-pod ; echo $?
0
$ sudo podman pod rm ironic-pod -f
Error: no such file or directory
$ sudo podman pod ps
POD ID         NAME         STATUS    CREATED        # OF CONTAINERS   INFRA ID
d8989fe173da   ironic-pod   Created   18 hours ago   1                 fee311ec44f8

This is with podman-1.2-2.git3bd528e.el7.x86_64, so I'm wondering whether this is likely to be fixed by the #3073 PR, or if there's another issue to flush out.

Happy to build/test the latest locally, but I wanted to confirm that's a worthwhile effort before proceeding, thanks!

mheon (Member) commented May 8, 2019

Our solution here is a combination of #3073 and #3082

#3073 fixed the root cause of the errors, so we won't have issues with them anymore.

#3082 resolves the issues that caused podman pod rm to fail on what should have been non-fatal errors, preventing this from happening again.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023