Impossible to recreate a container with the same name as an already-removed container #2240

Closed
4383 opened this issue Jan 30, 2019 · 58 comments


4383 commented Jan 30, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

A script launches the following command to start a container with the --rm flag, so the container will be destroyed at exit. But when I try to recreate the container manually with the same podman command, podman fails to create it and displays the following error:

$ podman run --rm --name nova_cellv2_discover_hosts -it --label config_id=tripleo_step5 --label container_name=nova_cellv2_discover_hosts --label managed_by=paunch --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro --volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/docker-config-scripts/:/docker-config-scripts/ 192.168.122.1:5000/fedora-binary-nova-compute:ospsprint 
error creating container storage: the container name "nova_cellv2_discover_hosts" is already in use by "5efe2260d1aaadf63e8ce70d0aca100472bb0e0ee90884e95c785821a37d694c". You have to remove that container to be able to reuse that name.: that name is already in use

When I look for an existing volume or anything similar, I don't find any results:

$ sudo podman ps -a |grep 5efe                                                                                       
$ # no results found
$ sudo podman volume list
$ # no results found and no volumes exists

Looks similar to #1359

Steps to reproduce the issue:

  1. Run the same podman run --rm command twice (see the sketch below).
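A minimal sketch of that reproducer with a generic image (the image and container name here are illustrative, not the ones from the report):

$ podman run --rm --name testname fedora echo hello
$ podman run --rm --name testname fedora echo hello
$ # the second run intermittently fails with "name ... is already in use"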

Describe the results you received:

error creating container storage: the container name "nova_cellv2_discover_hosts" is already in use by "5efe2260d1aaadf63e8ce70d0aca100472bb0e0ee90884e95c785821a37d694c". You have to remove that container to be able to reuse that name.: that name is already in use

Describe the results you expected:

I expect the container to be created.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version 1.0.0

Output of podman info:

$ sudo podman info
host:
  BuildahVersion: 1.6-dev
  Conmon:
    package: podman-1.0.0-1.git82e8011.module+el8+2696+e59f0461.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.14.0-dev, commit: 52154d748ee9623ac65d34514ec22063d2633ac2-dirty'
  Distribution:
    distribution: '"rhel"'
    version: "8.0"
  MemFree: 382480384
  MemTotal: 16645574656
  OCIRuntime:
    package: runc-1.0.0-54.rc5.dev.git2abd837.module+el8+2650+e6b3d617.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.0'
  SwapFree: 796397568
  SwapTotal: 1073737728
  arch: amd64
  cpus: 16
  hostname: herve.localdomain
  kernel: 4.18.0-60.el8.x86_64
  os: linux
  rootless: false
  uptime: 48h 20m 18.38s (Approximately 2.00 days)
insecure registries:
  registries:
  - 192.168.122.1:5000
  - 192.168.24.2:8787
registries:
  registries:
  - registry.redhat.io
  - quay.io
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 90
  GraphDriverName: overlay
  GraphOptions: null
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
  ImageStore:
    number: 28
  RunRoot: /var/run/containers/storage

Additional environment details (AWS, VirtualBox, physical, etc.):
KVM

openshift-ci-robot added the kind/bug label Jan 30, 2019

rhatdan commented Jan 30, 2019

Does the original container actually exist?


mheon commented Jan 30, 2019

Try a podman pod ps and see if there are any pods with that name/ID.


rhatdan commented Jan 30, 2019

There could be a race condition here, where one container is exiting and running podman cleanup while another container is launching.


4383 commented Jan 30, 2019

I have already tried podman ps and the container doesn't appear in the list...

@TomSweeneyRedHat

Just to be sure, did you try podman ps -a to show all the containers?


4383 commented Jan 30, 2019

Yep, see the bug description.


4383 commented Jan 30, 2019

I have tried the following commands to find an existing container with the same name, and no results were found:

$ sudo podman ps                                                                             
$ # no results found
$ sudo podman ps -a                                                                             
$ # no results found
$ sudo podman volume list
$ # no results found and no volumes exists


mheon commented Jan 30, 2019

Volumes don't share names with pods and containers, so podman volume list doesn't really help.

Can you try podman pod ps? Pods do share names with containers
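For example (a sketch using the name from this report):

$ sudo podman pod ps
$ sudo podman ps -a | grep nova_cellv2_discover_hosts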


4383 commented Jan 30, 2019

@mheon I'm sorry, but I'm not sure I can reproduce this issue every time (it's a bit random), and I have already reset my env... If I face it again I will append my traceback and command outputs here, especially podman pod ps.


mheon commented Jan 30, 2019

If you do manage to reproduce it again and the pod check comes back negative, please also attach /var/lib/containers/storage/libpod/bolt_state.db.


mheon commented Jan 30, 2019

There's a small chance we have some sort of state corruption going on, but I would think we would have hit this before if so.


rhatdan commented Jan 31, 2019

I attempted something like this to see if we could be suffering from a race condition:

for i in {1..100}; do podman run --name dan --rm fedora echo hello; done

But nothing failed.


4383 commented Jan 31, 2019

If I reproduce it again I will post all the information to this issue.

@mbaldessari

FWIW when I was looking at this with Herve I did try a 'podman pod ps' and it returned empty. Hopefully we can collect /var/lib/containers/storage/libpod/bolt_state.db next time we hit this


bogdando commented Feb 1, 2019

You can try the better reproducer version from #1656 (just drop the Docker-ish part of it).


4383 commented Feb 1, 2019

Well... I have successfully reproduced the problem...

podman pod ps is always empty.
I have extracted the bolt_state.db as @mbaldessari suggested; it is attached to this comment:
bolt_state.db.zip


rhatdan commented Feb 1, 2019

I was able to get into this state also, by calling podman rm -f on a running container.


mheon commented Feb 4, 2019

The containers seem to be remaining in c/storage, preventing us from creating new containers with the same names.

Part of the problem seems to be c/storage not being durable enough under stress - it seems to start failing to delete containers a lot sooner than the rest of Podman. When it does, we still get rid of as much of the container as we can (rather than leave a half-configured container around), but the lingering c/storage container conflicts with new containers with the same name. We should look into why c/storage is failing here.

I'm not sure if we have a good option for deleting the lingering storage containers... For all we know, they're valid buildah or CRI-O containers (we'll know they don't belong to CRI-O soon enough, but there are no plans to put buildah on libpod), so we can't safely delete them.
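One hedged way to confirm that analysis, assuming Buildah is installed (it talks to the same c/storage library):

$ sudo buildah containers --all
$ # if the conflicting name or ID shows up here but not in `podman ps -a`,
$ # it is a lingering c/storage container rather than a libpod one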


4383 commented Feb 5, 2019

@mheon interesting analysis


rhatdan commented Feb 5, 2019

I think we should add something to rm, a --removestorage flag, which would ignore the error from libpod saying the container does not exist and remove the storage.


mheon commented Feb 5, 2019

@rhatdan Should we just recommend they use buildah rm to get rid of it? I'd almost prefer that to adding potentially confusing options to podman rm

@TomSweeneyRedHat

We call Buildah for building stuff; could we call its rm too in error cases, as a final hammer, rather than telling the user to?


mheon commented Feb 6, 2019

The problem is, we've already called into c/storage in this case, and it's failed - I don't know if hammering it more by calling it again through Buildah would help...


mheon commented Feb 6, 2019

(We really ought to just drill into why c/storage is failing in these cases - it seems like making it more stable would be beneficial for all our tools)


rhatdan commented Feb 6, 2019

Are we sure that an error happened, or was this a race condition?


mheon commented Feb 6, 2019

@rhatdan Do you still have your reproducer? I'm expecting that we're getting errors out of c/storage, and we'd be printing them in that case


rhatdan commented Feb 6, 2019

I am not crazy about requiring buildah to be installed to get us out of a state where the container image was accidentally left around.

If I do a podman rm --force foobar, the user would expect the container to be removed and then be able to do
podman run --name foobar.

We can add documentation to podman rm --force foobar indicating that this will remove not only Podman containers named foobar but could also remove containers created by other tools.
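In other words, the documented flow would look roughly like this (a sketch of the proposed behavior, not something that had shipped at the time of this comment):

$ sudo podman rm --force foobar
$ # would also clear a leftover c/storage container holding the name,
$ # even one created by another tool
$ sudo podman run --name foobar fedora echo hello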


rhatdan commented Feb 23, 2019

I have merged in a fix for podman rm --force that will remove a container that libpod does not know about.

This will get you out of this situation.


mheon commented Apr 17, 2019

The issue here seems to be that some Podman command, between the container being started and the container being removed, is run in a container without /var/run from the host mounted (or otherwise missing the /var/run/libpod/alive file we use to check whether the system has restarted, plus whatever c/storage uses for the same thing). This causes us to lose track of container status - whether it's been mounted, how many times, etc. When we attempt to remove the container, it's still mounted, but c/storage doesn't know this (it lost the mount counter, I believe?), so we get a failure because it's still in use.
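A quick way to check for that condition (a sketch; the sentinel path is the one named above):

$ ls -l /var/run/libpod/alive
$ # if a podman command runs in a mount namespace where this file (or the host's
$ # /var/run in general) is missing, podman assumes a reboot and refreshes state,
$ # and the c/storage mount counter is lost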


mheon commented Apr 17, 2019

I can partially work around this on the Podman side by making our refresh code smarter (but slower) and actually querying c/storage and runc to see what the container is doing at the moment.

However, I can't fix c/storage losing the mount counter because /var/run was changed, so I can't directly fix this on the Podman side.


mheon commented Apr 17, 2019

Also, unfortunately, buildah rm no longer works on containers without a buildah.conf.

This means we no longer have a way of working directly with c/storage containers that get orphaned.


rhatdan commented Apr 17, 2019

We should fix it, so that it also removes container images when told to --force.


mheon commented Apr 17, 2019

I don't see how that will help here?


rhatdan commented Apr 17, 2019

It would give you a way to clean up.


mheon commented Apr 17, 2019

I don't think that helps us? The issue here is that we don't know the container is mounted, so it's not unmounted, so attempting to remove storage doesn't work.

mheon reopened this Apr 23, 2019

baude commented May 29, 2019

@mheon what's the latest on this?


mheon commented May 29, 2019

This probably overlaps with the work we were talking about to show c/storage containers in podman ps with a flag, and allow removal with podman rm


mheon commented Aug 2, 2019

We've dealt with these via podman rm --storage on upstream (though there are plans to add a podman ps --storage to show all containers in c/storage as well)
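Hedged usage sketch of that flag, assuming a Podman build that includes it:

$ sudo podman rm --storage nova_cellv2_discover_hosts
$ # the name is then free for a new `podman run --name nova_cellv2_discover_hosts`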

mheon closed this as completed Aug 2, 2019

grigorig commented Nov 8, 2019

This is not fixed and podman rm --storage doesn't work for me either.
E.g.

$ podman run --pod foo --name foo-postgres -d postgres:9.6
Error: error creating container storage: the container name "mf-postgres" is already in use by "04dfc7232d5bda23990c441c825ec56d138c1ff87f34082134c23bb8fd887324". You have to remove that container to be able to reuse that name.: that name is already in use

$ podman rm -f --storage foo-postgres
foo-postgres
Error: error removing storage for container "foo-postgres": unlinkat /home/greg/.local/share/containers/storage/overlay/c4b778bbff10d826fe1c837b0147e8e51b9e539eda76a0dc34a9633dedcedad9/merged: device or resource busy


mheon commented Nov 8, 2019

Something is likely mounted at that directory (specifically, it seems like fuse-overlayfs failed to cleanly unmount). You might want to try unmounting it in a podman unshare shell, then removing once that's done.
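For the rootless case above, that would look roughly like this (the overlay path is taken from the error message):

$ podman unshare umount /home/greg/.local/share/containers/storage/overlay/c4b778bbff10d826fe1c837b0147e8e51b9e539eda76a0dc34a9633dedcedad9/merged
$ podman rm -f --storage foo-postgres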


grigorig commented Nov 8, 2019

Why should I care about these things as a regular user? Removing a container should work without any crazy workarounds and hacking around bugs, especially because podman wants to be a drop-in replacement for Docker.


mheon commented Nov 8, 2019

We are aware of this issue, and I understand that it sucks - this is definitely something that Podman should be handling automatically. There's an issue somewhere in containers/storage where containers can be registered as successfully unmounted despite the unmount failing, so our tools don't know they have to unmount on trying to remove. Thus far, this has been a very rare occurrence, so hopefully you won't have to worry about this again. If you can consistently reproduce, though, we'd love to have your help tracking this one down - it's very difficult to figure out what's going wrong when we can't manage to reproduce the issue ourselves.
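If it does reproduce, a quick check for a lingering mount is something like this (a sketch; adjust the path for rootless vs. root storage):

$ mount | grep containers/storage/overlay
$ # a surviving "merged" mount for a container that was already removed points at
$ # the failed-unmount case described above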


cryobry commented Nov 19, 2019

Is the issue I'm having related to this? I am trying to create a container with the same name as one that was already removed, and it's failing:

Error: error creating container storage: the container name "mc_guacgui" is already in use by "18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4". You have to remove that container to be able to reuse that name.: that name is already in use
[bryan@fedora-laptop]~/containers/guac_mc$ podman rm -f 18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4
Error: Failed to evict container: "": Failed to find container "18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4" in state: no container with name or ID 18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4 found: no such container

How can I fix this?


mheon commented Nov 19, 2019

Try podman rm --force --storage 18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4 and see if that works.


cryobry commented Nov 19, 2019

Thanks, that got me past that point but now it looks like it resulted in some sort of permissions problem:

Error: creating file '/home/bryan/.local/share/containers/storage/overlay/87b107153a17c9044c38656eed59f8273f85c01fad9051f599f798d6005ae057/merged/run/secrets': Permission denied: OCI runtime permission denied error


cryobry commented Nov 19, 2019

Looks like it may be related to this: https://discussion.fedoraproject.org/t/toolbox-broken-again-crun-update-in-31-20191112-0/11369/19

Two bugs in one day, yay! I'm not really sure what I'm supposed to do here. I can reboot to clear the OCI error until I create a container of the same name as one that has already been deleted. When I work around that with rm --force --storage, it triggers another OCI error.


mheon commented Nov 19, 2019

This seems like it could be a crun issue - @giuseppe

Regardless, this one is (probably) not Podman.


shlao commented Feb 29, 2020

I hit this problem again:
After a reboot, I ran some scripts to create a pod.

+(./04_setup_ironic.sh:128): sudo podman run -d --net host --privileged --name httpd --pod ironic-pod -v /opt/dev-scripts/ironic:/shared --entrypoint /bin/runhttpd quay.io/metal3-io/ironic:master
Error: error creating container storage: the container name "httpd" is already in use by "0bbbfaecbbb46a0ad51b786dd8a7e439868a15d35091c6e24953362a36d0db18". You have to remove that container to be able to reuse that name.: that name is already in use

# podman ps
# podman pod ps
POD ID         NAME         STATUS    CREATED         # OF CONTAINERS   INFRA ID
5840254ebc5c   ironic-pod   Created   2 minutes ago   1                 cd0aa1806e0b
# podman ps -a
CONTAINER ID  IMAGE                 COMMAND  CREATED        STATUS   PORTS  NAMES
cd0aa1806e0b  k8s.gcr.io/pause:3.1           2 minutes ago  Created         5840254ebc5c-infra
# uname -a
Linux aa 3.10.0-1126.el7.x86_64 #1 SMP Mon Feb 3 15:30:44 EST 2020 x86_64 x86_64 x86_64 GNU/Linux

So I deleted some files to make it work again:
# rm -rf /var/lib/containers/storage/libpod/bolt_state.db
# rm -rf /var/lib/containers/storage/


cryobry commented Feb 29, 2020

@shlao which version of podman are you running? I found that podman >= 1.7.0 fixed this issue for me. F31 is already at 1.8.0 but it looks like you are using CentOS 7.


ksingh7 commented May 4, 2020

[root@rgw-5 ~]# /usr/bin/podman stop ceph-osd-189
Error: no container with name or ID ceph-osd-189 found: no such container
[root@rgw-5 ~]#
[root@rgw-5 ~]# podman ps -a | grep -i ceph-osd-189
[root@rgw-5 ~]#
[root@rgw-5 ~]# podman version
Version:            1.6.4
RemoteAPI Version:  1
Go Version:         go1.13.4
OS/Arch:            linux/amd64
[root@rgw-5 ~]#

[root@rgw-5 ~]# /usr/share/ceph-osd-run.sh 189
Error: error creating container storage: the container name "ceph-osd-189" is already in use by "30b07795d6c1e9d62e5cd82848e231c9e9803e5bcfdaf15a9af166caab36a673". You have to remove that container to be able to reuse that name.: that name is already in use
[root@rgw-5 ~]#
[root@rgw-5 ~]#
[root@rgw-5 ~]# podman ps -a | grep -i ceph-osd-189
[root@rgw-5 ~]#
[root@rgw-5 ~]#

root@rgw-5 ~]# cat /usr/share/ceph-osd-run.sh
#!/bin/bash
# Please do not change this file directly since it is managed by Ansible and will be overwritten


########
# MAIN #
########

/usr/bin/podman run \
  --rm \
  --net=host \
  --privileged=true \
  --pid=host \
  --ipc=host \
  --cpus=4 \
  -v /dev:/dev \
  -v /etc/localtime:/etc/localtime:ro \
  -v /var/lib/ceph:/var/lib/ceph:z \
  -v /etc/ceph:/etc/ceph:z \
  -v /var/run/ceph:/var/run/ceph:z \
  -v /var/run/udev/:/var/run/udev/ \
  -v /var/log/ceph:/var/log/ceph:z \
  -e OSD_BLUESTORE=1 -e OSD_FILESTORE=0 -e OSD_DMCRYPT=0 \
  -e CLUSTER=ceph \
  -v /run/lvm/:/run/lvm/ \
  -e CEPH_DAEMON=OSD_CEPH_VOLUME_ACTIVATE \
  -e CONTAINER_IMAGE=registry.redhat.io/rhceph/rhceph-4-rhel8:latest \
  -e OSD_ID="$1" \
  --name=ceph-osd-"$1" \
   \
  registry.redhat.io/rhceph/rhceph-4-rhel8:latest
[root@rgw-5 ~]#


mheon commented May 4, 2020

Patches for this were landed in 1.7.0, and should be in RHEL 8.2.1 (which will include a 1.9.x release of Podman).


diwilli commented Jun 11, 2020

@mheon Do you know when / if this will land in RHEL/CentOS 7.9?

I'm running ceph via cephadm and podman and I'm having to restart the host after every container gets upgraded because of this issue.


mheon commented Jun 11, 2020

There are no plans for further Podman releases on Cent/RHEL 7 - I believe 1.6.4 in 7.8 will be the last.

github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023