v0.6.x: "API error (500): Could not kill running container" "container 8d67c83751e6 PID 19180 is zombie and can not be killed" #856

Closed
1 of 5 tasks
rfay opened this issue Nov 13, 2023 · 23 comments · Fixed by #884
Labels
bug Something isn't working

Comments

@rfay
Contributor

rfay commented Nov 13, 2023

Description

With DDEV, I never saw this with previous versions of Colima, but I've now seen it myself, and others have reported it too. On ddev stop:

API error (500): Could not kill running container 8d67c83751e67ee98571cb7c2b64794c30419ef9c9496791d34876dbd040dda3, cannot remove - container 8d67c83751e6 PID 19180 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes
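The error text recommends the --init option (docker run --init), which runs a small init process as PID 1 inside the container. To make "reaps processes" concrete, here is a minimal, POSIX-only Python sketch of what reaping means: a parent collecting the exit status of finished children so they do not linger as zombies. It illustrates the concept only; it is not the init binary Docker injects, and the helper name reap_exited_children is hypothetical.

```python
import os
import time


def reap_exited_children() -> list[tuple[int, int]]:
    """Collect every already-exited child; return (pid, exit_code) pairs.

    An exited child stays a zombie until its parent calls wait*(); an init
    process running as PID 1 does this in a loop for the whole container.
    """
    reaped = []
    while True:
        try:
            # WNOHANG: return immediately instead of blocking.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped


if __name__ == "__main__":
    child = os.fork()
    if child == 0:
        os._exit(3)  # the child exits at once and is a zombie...
    time.sleep(0.2)  # ...until the parent reaps it below
    print(reap_exited_children())
```

Requires Python 3.9+ for os.waitstatus_to_exitcode.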

Version

Colima Version: 0.6.1
Lima Version: 0.18.0
Qemu Version:

Operating System

  • macOS Intel <= 12 (Monterey)
  • macOS Intel >= 13 (Ventura)
  • macOS M1 <= 12 (Monterey)
  • macOS M1 >= 13 (Ventura)
  • Linux

Output of colima status

INFO[0000] colima is running using QEMU
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: sshfs
INFO[0000] socket: unix:///Users/rfay/.colima/default/docker.sock

Reproduction Steps

It doesn't happen every time, but it does happen periodically on ddev stop, which is mostly equivalent to docker stop followed by docker rm.
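A rough Python model of that sequence, assuming the standard docker CLI is on PATH: run docker stop, then docker rm, and report whether the removal failed with the zombie error. DDEV appears to drive its containers through docker compose (see the com.docker.compose.* labels in the inspect output later in this thread), so this is only an approximation. The function names and the matched substring (copied from the error above) are illustrative, and the command runner is a parameter so the logic can be exercised without Docker.

```python
import subprocess
from typing import Callable, Sequence

# Substring copied from the error message reported in this issue.
ZOMBIE_MARKER = "is zombie and can not be killed"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_cli(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout/stderr as text."""
    return subprocess.run(list(args), capture_output=True, text=True)


def stop_then_remove(container: str, run: Runner = run_cli) -> bool:
    """Approximate a stop for one container: docker stop, then docker rm.

    Returns True only if the removal step failed with the zombie error.
    """
    run(["docker", "stop", container])
    removed = run(["docker", "rm", container])
    output = (removed.stdout or "") + (removed.stderr or "")
    return removed.returncode != 0 and ZOMBIE_MARKER in output
```

For example, stop_then_remove("ddev-router") returns True only when docker rm fails with the zombie message.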

Expected behaviour

I had never seen this before.

Additional context

No response

@rpkoller

I ran into the exact same issue yesterday with Colima 0.6.0 and Lima 0.18.0 on an M1 Pro running Sonoma (14.1.1). I have a default profile using vz/virtiofs, and I created an additional test profile using qemu/sshfs (I wanted to see whether some hangs on project start are exclusive to vz or happen on qemu as well). On one of my DDEV project stops on the qemu/sshfs profile, I ran into the exact same error @rfay reported. The odd detail: docker ps -a still showed the "zombie" container after the project was stopped, but after restarting the project, that zombie container was gone.

@abiosoft abiosoft added the bug Something isn't working label Nov 13, 2023
@abiosoft
Owner

Thanks for reporting this.

Does it freeze, or does it only display the message before stopping the container?

@rfay
Contributor Author

rfay commented Nov 13, 2023

It does not freeze, and running ddev stop again completes cleanly. I didn't check whether it actually removed the container; I'll check that next time.

@rpkoller

No freezing on my end either, and yes, the message was displayed. Even though it was called a zombie, it didn't behave like one (except that the container was still present and visible in docker ps -a). Here is the terminal output from yesterday, starting with the last few lines of ddev stop:

2023-11-13T01:42:16.117 Paused Mutagen sync session 'tests' 
 Container ddev-tests-web  Stopped 
 Container ddev-tests-db  Stopped 
 Container ddev-tests-web  Stopped 
 Container ddev-tests-db  Stopped 
 Container ddev-tests-db  Removed 
 Container ddev-tests-web  Removed 
Failed to stop project tests: 
API error (500): Could not kill running container 104d003aba7d2319b995d504276301a76125c6242a82e053c46b67bdc8daa950, cannot remove - container 104d003aba7d PID 20554 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes 
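The error message carries both the full container ID and the PID that the daemon considers a zombie, which is handy for matching it against docker ps -a (here, 104d003aba7d is the ddev-router row shown next) or against a ps listing inside the VM. A small Python sketch that extracts them; the regular expression is written against the message text quoted in this thread, and the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ZombieError(NamedTuple):
    container_id: str  # full 64-character container ID
    short_id: str      # 12-character short ID, as shown by docker ps
    pid: int           # PID the daemon reports as a zombie


# Pattern based on the error text reported in this issue.
_PATTERN = re.compile(
    r"Could not kill running container (?P<full>[0-9a-f]{64})"
    r".*?container (?P<short>[0-9a-f]{12}) PID (?P<pid>\d+) is zombie"
)


def parse_zombie_error(message: str) -> Optional[ZombieError]:
    """Return the IDs and PID from a zombie error, or None if absent."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return ZombieError(match["full"], match["short"], int(match["pid"]))
```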

After the error occurred, I checked:

$> docker ps -a
CONTAINER ID   IMAGE                               COMMAND                  CREATED          STATUS                            PORTS     NAMES
104d003aba7d   ddev/ddev-traefik-router:v1.22.4    "/entrypoint.sh --co…"   11 minutes ago   Exited (137) About a minute ago             ddev-router
188539ad3944   ddev/ddev-ssh-agent:v1.22.4-built   "/entry.sh ssh-agent"    12 minutes ago   Up 12 minutes (healthy)                     ddev-ssh-agent

I then started the same project again, and after it started successfully I ran another:

$> docker ps -a
CONTAINER ID   IMAGE                                                 COMMAND                  CREATED              STATUS                        PORTS                                                                                                          NAMES
ddc7e1101c11   ddev/ddev-traefik-router:v1.22.4                      "/entrypoint.sh --co…"   About a minute ago   Up About a minute (healthy)   127.0.0.1:80->80/tcp, 127.0.0.1:443->443/tcp, 127.0.0.1:8025-8026->8025-8026/tcp, 127.0.0.1:10999->10999/tcp   ddev-router
971ef26ae07f   ddev/ddev-dbserver-mariadb-10.5:v1.22.4-tests-built   "/docker-entrypoint.…"   2 minutes ago        Up 2 minutes (healthy)        127.0.0.1:32789->3306/tcp                                                                                      ddev-tests-db
929e4f3722ec   ddev/ddev-webserver:20231108_new_lagoon-tests-built   "/pre-start.sh"          2 minutes ago        Up 2 minutes (healthy)        8025/tcp, 127.0.0.1:32788->80/tcp, 127.0.0.1:32787->443/tcp                                                    ddev-tests-web
188539ad3944   ddev/ddev-ssh-agent:v1.22.4-built                     "/entry.sh ssh-agent"    25 minutes ago       Up 25 minutes (healthy)                                                                                                                      ddev-ssh-agent

The exited container had been removed, and a new ddev-traefik-router container was running instead.
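About the Exited (137) status in the first docker ps -a output: exit codes above 128 conventionally mean "terminated by signal (code - 128)", so 137 is 128 + 9, i.e. SIGKILL. That is consistent with docker stop sending the container's stop signal first and SIGKILL once the stop timeout expires. A small Python helper to decode such codes (the name describe_exit_code is illustrative):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode an exit code using the shell's 128 + signal-number convention."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a known signal number; report the raw status
    return f"exited with status {code}"
```

For example, describe_exit_code(137) returns "killed by SIGKILL".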

@rpkoller

One additional note: the error just happened for the first time on vz and virtiofs for me, when I wanted to shut down my computer for the day. Until now I had run into it only on qemu and sshfs.

@rfay
Contributor Author

rfay commented Nov 14, 2023

We're still seeing this in DDEV automated tests, https://github.com/ddev/ddev/actions/runs/6866895347/job/18674123089#step:13:358

That's colima v0.6.2 on amd64, qemu, sshfs, macOS 12.

Changing tests now to use macOS 13 (macos-latest is macos-12 for no known reason).

@rfay
Contributor Author

rfay commented Nov 15, 2023

Getting the same failure now using macos-13 in github actions, https://github.com/ddev/ddev/actions/runs/6870719728/job/18686158079?pr=5540#step:13:842

API error (500): Could not kill running container 220f5d75b765fe9913392032eeb50f410f94e6f45e3c8a5223cb6ccb754ad5bd, cannot remove - container 220f5d75b765 PID 53183 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes

It hasn't happened to me on my own machine lately, so I haven't been able to check whether there are leftover dead containers when this happens.

@abiosoft abiosoft changed the title 0.6.1: "API error (500): Could not kill running container" "container 8d67c83751e6 PID 19180 is zombie and can not be killed" v0.6.x: "API error (500): Could not kill running container" "container 8d67c83751e6 PID 19180 is zombie and can not be killed" Nov 15, 2023
@abiosoft
Owner

@rfay, is this the reason for the GitHub Actions failure? If so, is there any way I can run or reproduce it locally?

Or is there any other docker run command that behaves similarly and could help with troubleshooting?

@rfay
Contributor Author

rfay commented Nov 15, 2023

This happens sporadically but quite often. Just run ddev start and ddev stop a few times. I imagine we could make it happen with a simple script.
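A minimal Python sketch of such a script, assuming ddev is on PATH and the script runs from inside a DDEV project directory. It is not the script linked later in this thread, just an illustration: it alternates ddev start and ddev stop and counts how many stops hit the zombie error. The names are hypothetical, and the runner is a parameter so the loop can be tested without DDEV.

```python
import subprocess
from typing import Callable, Sequence

# Substring copied from the error message reported in this issue.
ZOMBIE_MARKER = "is zombie and can not be killed"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_cli(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout/stderr as text."""
    return subprocess.run(list(args), capture_output=True, text=True)


def count_zombie_failures(iterations: int, run: Runner = run_cli) -> int:
    """Cycle ddev start/stop and count stops that report the zombie error."""
    failures = 0
    for i in range(iterations):
        run(["ddev", "start"])
        stopped = run(["ddev", "stop"])
        # Check both streams, since it is not certain where the error is printed.
        output = (stopped.stdout or "") + (stopped.stderr or "")
        if ZOMBIE_MARKER in output:
            failures += 1
            print(f"iteration {i}: zombie error on ddev stop")
    return failures
```

Calling count_zombie_failures(20) would run 20 start/stop cycles and return the number of failed stops.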

@rfay
Contributor Author

rfay commented Nov 15, 2023

I'm working on a script to demonstrate this.

$ colima status
INFO[0000] colima is running using QEMU
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: sshfs
INFO[0000] socket: unix:///Users/rfay/.colima/default/docker.sock

Here's the docker inspect output for the container that failed, ddev-router (the nginx-proxy router), after this error:

API error (500): Could not kill running container 20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6, cannot remove - container 20e8242d350c PID 107081 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes

[
    {
        "Id": "20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6",
        "Created": "2023-11-15T23:24:11.345228767Z",
        "Path": "/app/docker-entrypoint.sh",
        "Args": [
            "forego",
            "start",
            "-r"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2023-11-15T23:24:11.631724808Z",
            "FinishedAt": "2023-11-15T23:24:39.642157676Z",
            "Health": {
                "Status": "unhealthy",
                "FailingStreak": 1,
                "Log": [
                    {
                        "Start": "2023-11-15T23:24:12.63236404Z",
                        "End": "2023-11-15T23:24:12.682823578Z",
                        "ExitCode": 3,
                        "Output": "nginx config valid:OK  ddev nginx config not yet generated "
                    },
                    {
                        "Start": "2023-11-15T23:24:13.692351875Z",
                        "End": "2023-11-15T23:24:13.861309276Z",
                        "ExitCode": 0,
                        "Output": "nginx config valid:OK  ddev nginx config:generated nginx healthcheck endpoint:OK ddev-nginx-proxy-router is healthy with 2 upstreams"
                    },
                    {
                        "Start": "2023-11-15T23:24:14.874705149Z",
                        "End": "2023-11-15T23:24:39.601288709Z",
                        "ExitCode": 137,
                        "Output": "container was previously healthy, so sleeping 59 seconds before continuing healthcheck...  "
                    }
                ]
            }
        },
        "Image": "sha256:c53031efe59a92c70a107e9476af8e8655ac48735d18113e946d94f78bf56992",
        "ResolvConfPath": "/var/lib/docker/containers/20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6/hostname",
        "HostsPath": "/var/lib/docker/containers/20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6/hosts",
        "LogPath": "/var/lib/docker/containers/20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6/20e8242d350cefd4cac14709e9561966ae9c372c97e23274b459f97ca16553b6-json.log",
        "Name": "/ddev-router",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/var/run/docker.sock:/tmp/docker.sock:ro"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "f42c88d70c5ca2aac3b1a41cbb1605f2e271d5595e89e2b757970834584866c5",
            "PortBindings": {
                "443/tcp": [
                    {
                        "HostIp": "127.0.0.1",
                        "HostPort": "443"
                    }
                ],
                "80/tcp": [
                    {
                        "HostIp": "127.0.0.1",
                        "HostPort": "80"
                    }
                ],
                "8025/tcp": [
                    {
                        "HostIp": "127.0.0.1",
                        "HostPort": "8025"
                    }
                ],
                "8026/tcp": [
                    {
                        "HostIp": "127.0.0.1",
                        "HostPort": "8026"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "ConsoleSize": [
                0,
                0
            ],
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "private",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": [],
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "Mounts": [
                {
                    "Type": "volume",
                    "Source": "ddev-global-cache",
                    "Target": "/mnt/ddev-global-cache",
                    "VolumeOptions": {}
                }
            ],
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/be4693922f9df310453380381d4e8e52561a88ffa4354dc3048cf90fb18576db-init/diff:/var/lib/docker/overlay2/0732e7471247c370e0f4fc0152ec8a23f9aba09f24102c62d56da5363187f7f4/diff:/var/lib/docker/overlay2/45bc8a5522b036103d11ba51c777c52663a00711bea7b34ba8cc3dfa3c7751cf/diff:/var/lib/docker/overlay2/17ccb77cf7cea56aa5da439961a01b4b6752481b552a7509beff65cf786bd731/diff:/var/lib/docker/overlay2/5ce7d30c0968aa034b011600ee03a034c7e0d0d34562dd72811cae01404bc541/diff:/var/lib/docker/overlay2/ce5d87ebb1a5f00784757aaa61d3fd6b60549ef079528cc2f8a69978b436e93c/diff:/var/lib/docker/overlay2/54d9d5d7234740183c969d802b6218c3240a1e148c543c38ed52fb4e6e7afbe0/diff:/var/lib/docker/overlay2/385b7e58b30963f176af71077044827e1a023fbcad9d2719281cc5b3c23a6b44/diff:/var/lib/docker/overlay2/a0cd07d1b279aaf7346d8943f4829e56d8809cc02e66397383b95973722a1772/diff:/var/lib/docker/overlay2/b08fda14c637d63af05f492496b60079f000206400010a19eaf9eb9c2cfe16c5/diff:/var/lib/docker/overlay2/1f0fee41220b87d2d945296cb13840abdbd9e728c4a52b7cd811e410ffacba62/diff:/var/lib/docker/overlay2/8ddf4bce9be084f816ec5b76a2c049d4fafb2084ae3198b257a32b85cf24a61c/diff:/var/lib/docker/overlay2/e594a2dff17fad317eeaee1d9930518027e26e85851a9784e0c523f044b48a47/diff:/var/lib/docker/overlay2/bd380b4f36cc7cce1fe65354d86e4f999089bdad91e7ada52a2da7c8e10512c2/diff:/var/lib/docker/overlay2/fedbcb1a6c90afdc7bfe82b0e04e937273a77600cf749830379ff517e6db7dcd/diff:/var/lib/docker/overlay2/fa30bfac08be210872400ec1f23e209d3364be3c6f83ec8a0fa321773d1961e0/diff:/var/lib/docker/overlay2/5079ee5fe5d4ec64e2badd0ce566c869fe042a0fb3eec082764947786fd6e078/diff:/var/lib/docker/overlay2/b1c6770224d20ce17a1ee2fb806924f436ef8c93b36d190dd6883e3b0b95c953/diff",
                "MergedDir": "/var/lib/docker/overlay2/be4693922f9df310453380381d4e8e52561a88ffa4354dc3048cf90fb18576db/merged",
                "UpperDir": "/var/lib/docker/overlay2/be4693922f9df310453380381d4e8e52561a88ffa4354dc3048cf90fb18576db/diff",
                "WorkDir": "/var/lib/docker/overlay2/be4693922f9df310453380381d4e8e52561a88ffa4354dc3048cf90fb18576db/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/tmp/docker.sock",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "volume",
                "Name": "ddev-global-cache",
                "Source": "/var/lib/docker/volumes/ddev-global-cache/_data",
                "Destination": "/mnt/ddev-global-cache",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "20e8242d350c",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "443/tcp": {},
                "80/tcp": {},
                "8025/tcp": {},
                "8026/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "DISABLE_HTTP2=false",
                "TRAEFIK_MONITOR_PORT=10999",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "NGINX_VERSION=1.20.1",
                "NJS_VERSION=0.5.3",
                "PKG_RELEASE=1~buster",
                "MKCERT_VERSION=v1.4.6",
                "DEBIAN_FRONTEND=noninteractive",
                "DOCKER_GEN_VERSION=0.7.7",
                "DOCKER_HOST=unix:///tmp/docker.sock"
            ],
            "Cmd": [
                "forego",
                "start",
                "-r"
            ],
            "Healthcheck": {
                "Test": [
                    "CMD-SHELL",
                    "/app/healthcheck.sh"
                ],
                "Interval": 1000000000,
                "Timeout": 120000000000,
                "StartPeriod": 120000000000,
                "Retries": 120
            },
            "Image": "ddev/ddev-nginx-proxy-router:v1.22.4",
            "Volumes": null,
            "WorkingDir": "/app",
            "Entrypoint": [
                "/app/docker-entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "build-info": "ddev/ddev-nginx-proxy-router:v1.22.4 commit=1f8f1b0 built Wed Oct 25 16:50:31 UTC 2023 by runner on fv-az395-589",
                "com.docker.compose.config-hash": "e93ccf9f802016deb68e5c4c032d34d3f881d7ba78cc43f957314f2fa8b78160",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.depends_on": "",
                "com.docker.compose.image": "sha256:c53031efe59a92c70a107e9476af8e8655ac48735d18113e946d94f78bf56992",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "ddev-router",
                "com.docker.compose.project.config_files": "/Users/rfay/.ddev/.router-compose-full.yaml",
                "com.docker.compose.project.working_dir": "/Users/rfay/.ddev",
                "com.docker.compose.service": "ddev-router",
                "com.docker.compose.version": "2.23.0",
                "maintainer": "DDEV <randy@randyfay.com>"
            },
            "StopSignal": "SIGQUIT"
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "f862e5be3b5345398d7edb3a5238a5f8878e2e549603c4dc6c332397717b2d1b",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/f862e5be3b53",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "ddev_default": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "ddev-router",
                        "ddev-router",
                        "20e8242d350c"
                    ],
                    "NetworkID": "f42c88d70c5ca2aac3b1a41cbb1605f2e271d5595e89e2b757970834584866c5",
                    "EndpointID": "",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]
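Most of that output is unrelated to the failure. A Python sketch that keeps only the stop- and health-related fields from docker inspect JSON (for example, saved from docker inspect ddev-router); field names are taken from the output above, and the function names are hypothetical. Docker reports healthcheck durations in nanoseconds, so the Interval of 1000000000 is 1 s and the Timeout of 120000000000 is 120 s. In this output the StopSignal is SIGQUIT, the final ExitCode is 137, and the last healthcheck run (the one sleeping 59 seconds) also ended with 137.

```python
import json
from datetime import timedelta


def ns(value: int) -> timedelta:
    """Convert a Docker healthcheck duration (nanoseconds) to a timedelta."""
    return timedelta(microseconds=value // 1000)


def summarize_inspect(raw: str) -> dict:
    """Pull the stop/health-related fields out of `docker inspect` JSON text."""
    container = json.loads(raw)[0]  # docker inspect prints a JSON array
    state = container["State"]
    config = container["Config"]
    health = state.get("Health") or {}
    check = config.get("Healthcheck") or {}
    return {
        "name": container["Name"].lstrip("/"),
        "status": state["Status"],
        "exit_code": state["ExitCode"],
        "stop_signal": config.get("StopSignal"),
        "health_status": health.get("Status"),
        "health_exit_codes": [entry["ExitCode"] for entry in health.get("Log", [])],
        "check_interval": ns(check.get("Interval", 0)),
        "check_timeout": ns(check.get("Timeout", 0)),
    }
```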

@rfay
Contributor Author

rfay commented Nov 15, 2023

Here's a script that demonstrates the problem, at least with qemu (mac m1).

https://gist.github.com/rfay/60a1e8d9112d178f6a0e86df027926aa

@rfay
Contributor Author

rfay commented Nov 16, 2023

I note that this failure hasn't been observed with any other Docker provider: not previous Colima versions, OrbStack, Docker Desktop (macOS or Windows), docker-ce, etc. I only have speculation about what it could be. I looked in the moby/moby issue queue to see if there was anything that might have been fixed after Docker 24.0.5, but I'm not finding it. And we used 24.0.5 plenty when it was current.

@graham73may

graham73may commented Nov 17, 2023

Also experiencing this on Intel-based macOS Sonoma.

colima version 0.6.2
git commit: 22d7e5fbc86d5b8e3b27065a762800bc7960a0ff

runtime: docker
arch: x86_64
client: v20.10.17
server: v24.0.5
 ITEM             VALUE
 DDEV version     v1.22.4
 architecture     amd64
 db               ddev/ddev-dbserver-mariadb-10.4:v1.22.4
 ddev-ssh-agent   ddev/ddev-ssh-agent:v1.22.4
 docker           24.0.5
 docker-compose   v2.21.0
 docker-platform  colima
 mutagen          0.17.2
 os               darwin
 router           ddev/ddev-traefik-router:v1.22.4
 web              ddev/ddev-webserver:v1.22.4
API error (500): Could not kill running container b9de7a544eb5c1610229036bd16f7d83f0ed7e41320213ea958debbb8d20d4df, cannot remove - container b9de7a544eb5 PID 47202 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes

The containers are not removed when the error occurs; running the command a second time does remove them.


@rfay
Contributor Author

rfay commented Nov 18, 2023

I updated the script above to capture the pids before the failure point.

Partial output:

rfay       13991   13948  0 00:40 ?        00:00:00 sleep 59
root       13992       1  0 00:40 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 488912cc3368791ecac063c85952c69d546118d9bf152d96abff33bc4083d4c5 -address /run/containerd/containerd.sock
rfay       14012   13992  4 00:40 ?        00:00:00 traefik traefik --configFile=/mnt/ddev-global-cache/traefik/static_config.yaml
rfay       14086   13992  0 00:40 ?        00:00:00 /bin/bash /healthcheck.sh
rfay       14096   14086  0 00:40 ?        00:00:00 sleep 59
rfay       14147    2056  0 00:40 ?        00:00:00 /bin/bash --login
rfay       14151   14147  0 00:40 ?        00:00:00 ps -ef
 Container ddev-junk-db  Stopped
 Container ddev-junk-web  Stopped
 Container ddev-junk-web  Stopped
 Container ddev-junk-db  Stopped
 Container ddev-junk-db  Removed
 Container ddev-junk-web  Removed
 Network ddev-junk_default  Removed
Failed to stop project junk:
API error (500): Could not kill running container 488912cc3368791ecac063c85952c69d546118d9bf152d96abff33bc4083d4c5, cannot remove - container 488912cc3368 PID 14012 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes

We see that the PID it's complaining about is 14012, which is traefik inside the ddev-router container (the one that is failing). It does have a parent PID, 13992, which is containerd-shim-runc-v2. AFAICT, then, it's not a zombie.

I also see that several other processes at this time (in other containers, which end up being stopped successfully) are set up in exactly the same way:

root        9745       1  0 00:39 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 2727d5ec02c5670520cd6bacd2a4048c070e840690bcd1b1853a915824a6188f -address /run/containerd/containerd.sock
rfay        9773    9745  0 00:39 ?        00:00:00 /bin/bash /pre-start.sh
root        9774    1304  0 00:39 ?        00:00:00 /usr/bin/docker-proxy -proto tcp -host-ip 127.0.0.1 -host-port 32777 -container-ip 172.21.0.3 -container-port 3306
root        9801       1  0 00:39 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 7b674dc2da0d594c15bc64c7f5f31a8a20745d84a6aac265896902c15a119bd7 -address /run/containerd/containerd.sock
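The PPID reasoning above can be checked mechanically against a ps -ef snapshot. The Python sketch below parses ps -ef output (columns UID PID PPID C STIME TTY TIME CMD) and walks a process's parent chain. Keep in mind that a zombie still has a parent until it is reaped, and that ps -ef does not print a state column; procps ps on Linux appends <defunct> to zombie entries, and ps -eo pid,ppid,stat,args shows the state directly (Z for zombie). Since this snapshot was taken before the failure point, it shows the process tree, not the state at the moment the removal failed. All names are hypothetical.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    uid: str
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef(text: str) -> dict[int, Proc]:
    """Parse `ps -ef` lines (UID PID PPID C STIME TTY TIME CMD) into a PID map."""
    procs: dict[int, Proc] = {}
    for line in text.splitlines():
        fields = line.split(None, 7)  # CMD may itself contain spaces
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header, blank, or non-ps line (e.g. ddev output)
        uid, pid, ppid, _c, _stime, _tty, _time, cmd = fields
        procs[int(pid)] = Proc(uid, int(pid), int(ppid), cmd)
    return procs


def ancestry(procs: dict[int, Proc], pid: int) -> list[Proc]:
    """Return the process and its known ancestors, nearest first."""
    chain: list[Proc] = []
    seen: set[int] = set()
    while pid in procs and pid not in seen:  # stop at unknown PIDs or cycles
        seen.add(pid)
        chain.append(procs[pid])
        pid = procs[pid].ppid
    return chain


def looks_defunct(proc: Proc) -> bool:
    """procps ps marks zombie processes with a trailing <defunct>."""
    return proc.cmd.endswith("<defunct>")
```

With the partial output above, ancestry(parse_ps_ef(snapshot), 14012) yields the traefik process followed by its containerd-shim-runc-v2 parent, PID 13992; PID 1 is not in that excerpt, so the chain stops there.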

@rfay
Contributor Author

rfay commented Nov 18, 2023

I don't think we would have expected otherwise, but v0.6.3 doesn't change the behavior and the script is easily able to demonstrate the problem.

@rfay
Contributor Author

rfay commented Nov 18, 2023

I note that Colima is using the Ubuntu 23.10 docker packages instead of the ones from the Docker repository, https://docs.docker.com/engine/install/ubuntu/. I wonder if that could make any difference?

@abiosoft
Owner

> I don't think we would have expected otherwise, but v0.6.3 doesn't change the behavior and the script is easily able to demonstrate the problem.

I had the script running for about an hour and could not see the error :( But I've encountered it a few times in other places without being able to pinpoint or reproduce it.

> I note that Colima is using the Ubuntu 23.10 docker packages instead of the ones from the Docker repository, https://docs.docker.com/engine/install/ubuntu/. I wonder if that could make any difference?

I'll give it a try and see.

@rfay
Contributor Author

rfay commented Nov 18, 2023

I did try installing docker from their repo, but on colima restart everything falls apart as it tries to reinstall docker.io.

@abiosoft
Owner

abiosoft commented Nov 18, 2023

I can confirm that the official docker packages behave better.
They resolved #863, and I think they would resolve this issue as well.

I'll push out a fix ASAP.

Thanks for the suggestion @rfay.

@abiosoft
Owner

Can you try with the current development version?

brew install --HEAD colima

@rfay
Contributor Author

rfay commented Nov 18, 2023

I ran colima HEAD through 110 iterations of the breakit.sh script without failure. It was a new colima profile, which makes the test perhaps a little questionable, but it sure looked good.

Running the full DDEV test suite now in ddev/ddev#5549; it wasn't able to pass with v0.6.*, so fingers crossed!

@abiosoft
Owner

Looks like the test suite passed 🎉

@rfay
Contributor Author

rfay commented Nov 18, 2023

Yay, a full pass of the whole DDEV test suite using colima HEAD; it hadn't passed since v0.6.0. Thanks! I'm a bit baffled why the normal Ubuntu build would have trouble, but docker-ce keeps up better and is pretty well maintained. https://github.com/ddev/ddev/actions/runs/6914246048?pr=5549
