[Bug]: podman stop results in panic: runtime error: invalid memory address or nil pointer dereference #17069

Closed
jukefr opened this issue Jan 11, 2023 · 13 comments · Fixed by #17077
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@jukefr

jukefr commented Jan 11, 2023

Issue Description

I'm trying to use Traefik with Podman (more specifically podman-compose). I'm not sure what exactly is causing the issue, or whether it is even related to Traefik.

Here is what the compose file looks like:
version: "3.3"

services:

  # IMPORTANT
  # Run commands with keep-id to make volume permissions correct and all truly rootless
  # podman-compose --podman-run-args="--userns=keep-id" [...]
  #
  # Forward traffic to right port with
  # iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 1024
  # iptables -A OUTPUT -t nat -p tcp --dport 80 -j REDIRECT --to-port 1024

  ###########################################################################
  # PROXY
  ###########################################################################

  traefik:
    user: "1000:1001"
    image: "docker.io/library/traefik"
    labels:
      - "io.containers.autoupdate=registry"
    restart: always
    command:
      #- "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:1024" # HTTP
      - "--entrypoints.ssh.address=:1025" # GIT SSH
    ports:
      - "1024:1024"
      - "1025:1025"
      - "1026:8080"
    volumes:
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - "/run/user/1000/podman/podman.sock:/var/run/docker.sock:ro"

    # NOTE
    # Sometimes when shutting down the rootlessport process will hang for some reason
    # sudo lsof -i -P -n | grep $port
    # sudo kill $process_number

  # whoami:
  #   user: "1000:1001"
  #   image: "docker.io/traefik/whoami"
  #   labels:
  #     - "io.containers.autoupdate=registry"
  #     - "traefik.enable=true"
  #     - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
  #     - "traefik.http.routers.whoami.entrypoints=web"
  #     - "traefik.http.services.whoami-juke.loadbalancer.server.port=1024"
  #   command:
  #     - "--port=1024"
  #   restart: always

  ###########################################################################
  # NEXTCLOUD
  ###########################################################################
  # user
  # password
  # database
  # cloud.localhost

  nextcloud_database:
    user: "1000:1001"
    image: "docker.io/library/postgres:alpine"
    labels:
      - "io.containers.autoupdate=registry"
    restart: always
    volumes:
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - ./resources/postgres_alpine_passwd:/etc/passwd:ro
      - ./volumes/nextcloud_database:/var/lib/postgresql/data:Z
    environment:
      - POSTGRES_DB=database
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password

  nextcloud_server:
    user: "1000:1001"
    depends_on: 
      - traefik
      - nextcloud_database
    image: "docker.io/library/nextcloud"
    labels:
      - "io.containers.autoupdate=registry"
      - "traefik.enable=true"
      - "traefik.http.routers.nextcloud_server.rule=Host(`cloud.localhost`)"
      - "traefik.http.routers.nextcloud_server.entrypoints=web"
      - "traefik.http.services.nextcloud_server-juke.loadbalancer.server.port=1024"
    restart: always
    volumes:
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - ./resources/nextcloud_server_passwd:/etc/passwd:ro
      - ./resources/nextcloud_server_ports.conf:/etc/apache2/ports.conf:ro
      - ./volumes/nextcloud_server:/var/www/html:Z
    hostname: cloud.localhost
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=database
      - POSTGRES_USER=user
      - POSTGRES_HOST=nextcloud_database
      - NEXTCLOUD_TRUSTED_DOMAINS=cloud.localhost
    [...]

Everything seems to work fine when I

podman-compose --podman-run-args="--userns=keep-id" up -d

However when I

podman-compose --podman-run-args="--userns=keep-id" down -v

I get the following error

['podman', '--version', '']
using podman version: 4.3.1
** excluding:  set()
podman stop -t 10 juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman stop -t 10 juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman stop -t 10 juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman stop -t 10 juke_element_server_1
juke_element_server_1
exit code: 0
podman stop -t 10 juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman stop -t 10 juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman stop -t 10 juke_traefik_1
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5564a2c4495b]

goroutine 50 [running]:
os.(*File).Name(...)
	os/file.go:56
github.com/containers/podman/v4/pkg/errorhandling.CloseQuiet(0xc0002bc9a0?)
	github.com/containers/podman/v4/pkg/errorhandling/errorhandling.go:74 +0x5b
github.com/containers/podman/v4/libpod.(*Runtime).setupRootlessPortMappingViaRLK(0xc00027e380, 0xc0003d3b00, {0xc00060f700, 0x3f}, 0xc000604f01?)
	github.com/containers/podman/v4/libpod/networking_slirp4netns.go:581 +0x105d
github.com/containers/podman/v4/libpod.(*Container).setupRootlessNetwork(0xc0003d3b00)
	github.com/containers/podman/v4/libpod/container_internal_linux.go:414 +0x13c
github.com/containers/podman/v4/libpod.(*Container).handleRestartPolicy(0xc0003d3b00, {0x5564a3cc7510, 0xc0000460d0})
	github.com/containers/podman/v4/libpod/container_internal.go:296 +0x445
github.com/containers/podman/v4/libpod.(*Container).Cleanup(0xc0003d3b00, {0x5564a3cc7510, 0xc0000460d0})
	github.com/containers/podman/v4/libpod/container_api.go:726 +0x3dd
github.com/containers/podman/v4/pkg/domain/infra/abi.(*ContainerEngine).ContainerStop.func1(0xc0003d3b00)
	github.com/containers/podman/v4/pkg/domain/infra/abi/containers.go:248 +0x24e
github.com/containers/podman/v4/pkg/parallel/ctr.ContainerOp.func1()
	github.com/containers/podman/v4/pkg/parallel/ctr/ctr.go:28 +0x22
github.com/containers/podman/v4/pkg/parallel.Enqueue.func1()
	github.com/containers/podman/v4/pkg/parallel/parallel.go:67 +0x1ac
created by github.com/containers/podman/v4/pkg/parallel.Enqueue
	github.com/containers/podman/v4/pkg/parallel/parallel.go:56 +0xbe
exit code: 2
podman rm juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman rm juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman rm juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman rm juke_element_server_1
juke_element_server_1
exit code: 0
podman rm juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman rm juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman rm juke_traefik_1
ERRO[0000] Unable to clean up network for container 56582b9b79c1eaf581d92fde3ae67d1e02e2ab8b894cd83c708f6c0820ad4bfc: "unmounting network namespace for container 56582b9b79c1eaf581d92fde3ae67d1e02e2ab8b894cd83c708f6c0820ad4bfc: failed to remove ns path /run/user/1000/netns/netns-8528016b-18be-bf00-27e5-7c4f38f8fe90: remove /run/user/1000/netns/netns-8528016b-18be-bf00-27e5-7c4f38f8fe90: device or resource busy" 
juke_traefik_1
exit code: 0
['podman', 'volume', 'inspect', '--all']

Sometimes, however, it will look like this:

[user@demovm juke]$ podman-compose --podman-run-args="--userns=keep-id" down -v
['podman', '--version', '']
using podman version: 4.3.1
** excluding:  set()
podman stop -t 10 juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman stop -t 10 juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman stop -t 10 juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman stop -t 10 juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman stop -t 10 juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman stop -t 10 juke_traefik_1
ERRO[0001] Unable to clean up network for container 9520a833f7e2230bfa4eafe785d0320efb80f593bdeaf8e62ab78424dbd7e8b3: "unmounting network namespace for container 9520a833f7e2230bfa4eafe785d0320efb80f593bdeaf8e62ab78424dbd7e8b3: failed to remove ns path /run/user/1000/netns/netns-ac16b2fe-694e-4255-bbc6-5810f553715c: remove /run/user/1000/netns/netns-ac16b2fe-694e-4255-bbc6-5810f553715c: device or resource busy" 
juke_traefik_1
exit code: 0
podman rm juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman rm juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman rm juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman rm juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman rm juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman rm juke_traefik_1
juke_traefik_1
exit code: 0
['podman', 'volume', 'inspect', '--all']

At this point, if I run lsof, I see a process that I can kill:

 ~  sudo lsof -i -P -n | grep 1024
rootlessp 36533  user   10u  IPv6 137699      0t0  TCP *:1024 (LISTEN)
 ~  sudo kill 36533

But doing so still apparently leaves the system thinking that IP addresses are allocated when they shouldn't be, because trying to spin up the services again with

podman-compose --podman-run-args="--userns=keep-id" up -d

results in the following

['podman', '--version', '']
using podman version: 4.3.1
** excluding:  set()
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_traefik_1 -d --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/Documents/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=traefik -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /run/user/1000/podman/podman.sock:/var/run/docker.sock:ro --net juke_default --network-alias traefik -p 1024:1024 -p 1025:1025 -p 1026:8080 -u 1000:1001 --restart always docker.io/library/traefik --api.insecure=true --providers.docker=true --providers.docker.exposedbydefault=false --entrypoints.web.address=:1024 --entrypoints.ssh.address=:1025
9e257843121f1294bfbeec12548cf198c93f8cb032f491d1aec79d7843117b7d
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_nextcloud_database_1 -d --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/Documents/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/Documents/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/Documents/juke/volumes/nextcloud_database:/var/lib/postgresql/data:Z --net juke_default --network-alias nextcloud_database -u 1000:1001 --restart always docker.io/library/postgres:alpine
f0b6ec0f62e2d492a404ffc90f5ce0ff5a31aaa7fb2e139e79caf4399f177e0d
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_gitea_database_1 -d --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/Documents/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/Documents/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/Documents/juke/volumes/gitea_database:/var/lib/postgresql/data:Z --net juke_default --network-alias gitea_database -u 1000:1001 --restart always docker.io/library/postgres:alpine
72354b4dcf75071a511f109e98ea954954e83622a9881a14a6aa29224caf0810
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_element_server_1 -d --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.element_server.rule=Host(`chat.localhost`) --label traefik.http.routers.element_server.entrypoints=web --label traefik.http.services.element_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/Documents/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=element_server -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/Documents/juke/resources/element_server_passwd:/etc/passwd:ro -v /home/user/Documents/juke/resources/element_server_nginx.conf:/etc/nginx/conf.d/default.conf:ro -v /home/user/Documents/juke/resources/element_server_config.json:/app/config.json:ro --net juke_default --network-alias element_server -u 1000:1001 --restart always docker.io/vectorim/element-web
7e074a3a9ae23003fdfa6642a48e05507585f6b1fcf45f3e92c91e543ced7e45
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_nextcloud_server_1 -d --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.nextcloud_server.rule=Host(`cloud.localhost`) --label traefik.http.routers.nextcloud_server.entrypoints=web --label traefik.http.services.nextcloud_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/Documents/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_server -e POSTGRES_PASSWORD=password -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_HOST=nextcloud_database -e NEXTCLOUD_TRUSTED_DOMAINS=cloud.localhost -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/Documents/juke/resources/nextcloud_server_passwd:/etc/passwd:ro -v /home/user/Documents/juke/resources/nextcloud_server_ports.conf:/etc/apache2/ports.conf:ro -v /home/user/Documents/juke/volumes/nextcloud_server:/var/www/html:Z --net juke_default --network-alias nextcloud_server -u 1000:1001 --hostname cloud.localhost --restart always docker.io/library/nextcloud
687ad8936a6aa6427bcdccd1d02c05acb2a0733229b564ebadcbe1cb0ee0bc5f
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_gitea_server_1 -d --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.gitea_server.rule=Host(`code.localhost`) --label traefik.http.routers.gitea_server.entrypoints=web --label traefik.http.services.gitea_server-juke.loadbalancer.server.port=1024 --label traefik.tcp.routers.gitea_server_ssh.rule=HostSNI(`*`) --label traefik.tcp.routers.gitea_server_ssh.entrypoints=ssh --label traefik.tcp.services.girea_server_ssh-juke.loadbalancer.server.port=1025 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/Documents/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_server -e HTTP_PORT=1024 -e DEFAULT_BRANCH=main -e RUN_MODE=prod -e DISABLE_SSH=false -e START_SSH_SERVER=true -e SSH_PORT=1025 -e SSH_LISTEN_PORT=1025 -e ROOT_URL=http://code.localhost -e GITEA__database__DB_TYPE=postgres -e GITEA__database__HOST=gitea_database:5432 -e GITEA__database__NAME=database -e GITEA__database__USER=user -e GITEA__database__PASSWD=password -e GITEA__service__DISABLE_REGISTRATION=true -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/Documents/juke/resources/gitea_server_passwd:/etc/passwd:ro -v /home/user/Documents/juke/volumes/gitea_server:/data:Z --net juke_default --network-alias gitea_server -u 1000:1001 --restart always docker.io/gitea/gitea:latest-rootless
ERRO[0000] IPAM error: failed to get ips for container ID 49493f4c0bc56fe1bc48ab5cdc113ca5d56a5fb369835d144e0a413b041e5024 on network juke_default 
[ERROR netavark::network::bridge] failed to parse ipam options: no static ips provided
ERRO[0000] IPAM error: failed to find ip for subnet 10.89.0.0/24 on network juke_default 
ERRO[0000] Unable to clean up network for container 49493f4c0bc56fe1bc48ab5cdc113ca5d56a5fb369835d144e0a413b041e5024: "tearing down network namespace configuration for container 49493f4c0bc56fe1bc48ab5cdc113ca5d56a5fb369835d144e0a413b041e5024: netavark: failed to delete container veth eth0: Netlink error: No such device (os error 19)" 
Error: IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
exit code: 126
podman start juke_gitea_server_1
ERRO[0000] IPAM error: failed to get ips for container ID 49493f4c0bc56fe1bc48ab5cdc113ca5d56a5fb369835d144e0a413b041e5024 on network juke_default 
[ERROR netavark::network::bridge] failed to parse ipam options: no static ips provided
ERRO[0000] IPAM error: failed to find ip for subnet 10.89.0.0/24 on network juke_default 
ERRO[0000] Unable to clean up network for container 49493f4c0bc56fe1bc48ab5cdc113ca5d56a5fb369835d144e0a413b041e5024: "tearing down network namespace configuration for container 49493f4c0bc56fe1bc48ab5cdc113ca5d56a5fb369835d144e0a413b041e5024: netavark: failed to delete container veth eth0: Netlink error: No such device (os error 19)" 
Error: unable to start container "49493f4c0bc56fe1bc48ab5cdc113ca5d56a5fb369835d144e0a413b041e5024": IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
exit code: 125
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_uptime_kuma_server_1 -d --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.uptime_kuma_server.rule=Host(`status.localhost`) --label traefik.http.routers.uptime_kuma_server.entrypoints=web --label traefik.http.services.uptime_kuma_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/Documents/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=uptime_kuma_server -e PUID=1000 -e PGID=1001 -e PORT=1024 -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/Documents/juke/resources/uptime_kuma_server_passwd:/etc/passwd:ro -v /home/user/Documents/juke/volumes/uptime_kuma_server:/app/data:Z --net juke_default --network-alias uptime_kuma_server -u 1000:1001 --restart always --entrypoint ["node", "/app/server/server.js"] docker.io/louislam/uptime-kuma
ERRO[0000] IPAM error: failed to get ips for container ID 1c7c25f5e7c6816dc644b93402010e5c85bbed88d38e74b97f5f347e125a2ab6 on network juke_default 
[ERROR netavark::network::bridge] failed to parse ipam options: no static ips provided
ERRO[0000] IPAM error: failed to find ip for subnet 10.89.0.0/24 on network juke_default 
ERRO[0000] Unable to clean up network for container 1c7c25f5e7c6816dc644b93402010e5c85bbed88d38e74b97f5f347e125a2ab6: "tearing down network namespace configuration for container 1c7c25f5e7c6816dc644b93402010e5c85bbed88d38e74b97f5f347e125a2ab6: netavark: failed to delete container veth eth0: Netlink error: No such device (os error 19)" 
Error: IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
exit code: 126
podman start juke_uptime_kuma_server_1
ERRO[0000] IPAM error: failed to get ips for container ID 1c7c25f5e7c6816dc644b93402010e5c85bbed88d38e74b97f5f347e125a2ab6 on network juke_default 
[ERROR netavark::network::bridge] failed to parse ipam options: no static ips provided
ERRO[0000] IPAM error: failed to find ip for subnet 10.89.0.0/24 on network juke_default 
ERRO[0000] Unable to clean up network for container 1c7c25f5e7c6816dc644b93402010e5c85bbed88d38e74b97f5f347e125a2ab6: "tearing down network namespace configuration for container 1c7c25f5e7c6816dc644b93402010e5c85bbed88d38e74b97f5f347e125a2ab6: netavark: failed to delete container veth eth0: Netlink error: No such device (os error 19)" 
Error: unable to start container "1c7c25f5e7c6816dc644b93402010e5c85bbed88d38e74b97f5f347e125a2ab6": IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
exit code: 125

It says that it's failing to find any free IPs.

Steps to reproduce the issue


    0  clean arch install inside of a VM (endeavouros in this specific case)
    1  yay -S podman podman-compose cni-plugins aardvark-dns wget
    3  systemctl --user enable --now podman
    4  wget "https://github.com/containers/podman/files/10389570/files.zip"
    5  unzip files.zip 
    6  mv files juke 
    7  cd juke
    9  ./create_dirs.sh 
   10  podman-compose --podman-run-args="--userns=keep-id" up

Press CTRL-C; the error happens.

Alternatively, start the services in daemon mode (up -d) and destroy them and their volumes in a separate step (down -v); the same error happens.

Describe the results you received

Stopping is not clean: it leaves hung processes, and IP addresses stay unavailable. The only way I found to fix it properly is to reboot the entire host.

Describe the results you expected

No hung processes that make it impossible to restart the containers because no more IPs are available, and no need to reboot the host to get things working again.

podman info output

podman version and podman info output:
~  podman version
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.3
Git Commit:   814b7b003cc630bf6ab188274706c383f9fb9915-dirty
Built:        Sun Nov 20 23:32:45 2022
OS/Arch:      linux/amd64
 ~  podman info
host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.5-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: c9f7f19eb82d5b8151fc3ba7fbbccf03fdcd0325'
  cpuUtilization:
    idlePercent: 90.28
    systemPercent: 1.51
    userPercent: 8.2
  cpus: 8
  distribution:
    distribution: endeavouros
    version: unknown
  eventLogger: journald
  hostname: user-standardpcq35ich92009
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.4-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 4336590848
  memTotal: 8333340672
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.7.2-1
    path: /usr/bin/crun
    version: |-
      crun version 1.7.2
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 0h 20m 17.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 31523282944
  graphRootUsed: 12035514368
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 6
  runRoot: /run/user/1000/containers
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1668983565
  BuiltTime: Sun Nov 20 23:32:45 2022
  GitCommit: 814b7b003cc630bf6ab188274706c383f9fb9915-dirty
  GoVersion: go1.19.3
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Happens both locally and inside a fresh VM.

Additional information

Ask if anything is unclear.
None of the info in here is sensitive; the passwords and such are mostly placeholders, so no need to worry.

@jukefr jukefr added the kind/bug Categorizes issue or PR as related to a bug. label Jan 11, 2023
@giuseppe
Member

when I try the same command I get:

Error: --userns and --pod cannot be set together
exit code: 125

I am using:

$ podman --version
podman version 4.3.1

@Luap99
Member

Luap99 commented Jan 11, 2023

Just looking at the stack trace I am surprised a podman stop causes a restart of the container. The restart policy is set to always but AFAIK an explicit podman stop command always stops and does not attempt to restart the container.
cc @mheon

@jukefr
Author

jukefr commented Jan 11, 2023

when I try the same command I get:

Error: --userns and --pod cannot be set together
exit code: 125

I am using:

$ podman --version
podman version 4.3.1

Not sure about that; maybe you have a config file somewhere overriding the arguments being used? I spun up a fresh VM, simply installed the packages, ran the commands, and was able to reproduce.

podman-compose --version
['podman', '--version', '']
using podman version: 4.3.1
podman-composer version  1.0.3
podman --version 
podman version 4.3.1
exit code: 0

If that helps

I just spun up a new fresh VM from scratch again, and this is the literal bash history from the moment the OS was installed to the reproduction of the issue:

    1  clear
    2  yay -S podman podman-compose cni-plugins aardvark-dns
    3  systemctl --user enable --now podman
    4  yay -S wget
    5  wget "https://github.com/containers/podman/files/10389570/files.zip"
    6  unzip files.zip 
    7  mv files juke 
    8  cd juke
    9  ls
   10  ./create_dirs.sh 
   11  podman-compose --podman-run-args="--userns=keep-id" up

Press CTRL-C; the error happens.

I can provide this VM image if you want


Just looking at the stack trace I am surprised a podman stop causes a restart of the container. The restart policy is set to always but AFAIK an explicit podman stop command always stops and does not attempt to restart the container. cc @mheon

Not sure what you mean by restart; from what I am seeing in the logs it's only trying to stop with a delay of 10 seconds, but I'm not all that sure what I'm looking at. In a second step I am manually bringing the containers back up, and the network seems to be in a bad state, which makes it fail.

@jukefr
Author

jukefr commented Jan 11, 2023

Speaking of the fresh VM I mention in my latest reply, there's another variant of this issue I forgot to mention.

  1. I just rebooted the VM.
  2. Starting the services goes fine:
[user@demovm juke]$ podman-compose --podman-run-args="--userns=keep-id" up -d
['podman', '--version', '']
using podman version: 4.3.1
** excluding:  set()
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_traefik_1 -d --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=traefik -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /run/user/1000/podman/podman.sock:/var/run/docker.sock:ro --net juke_default --network-alias traefik -p 1024:1024 -p 1025:1025 -p 1026:8080 -u 1000:1001 --restart always docker.io/library/traefik --api.insecure=true --providers.docker=true --providers.docker.exposedbydefault=false --entrypoints.web.address=:1024 --entrypoints.ssh.address=:1025
9520a833f7e2230bfa4eafe785d0320efb80f593bdeaf8e62ab78424dbd7e8b3
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_nextcloud_database_1 -d --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/nextcloud_database:/var/lib/postgresql/data:Z --net juke_default --network-alias nextcloud_database -u 1000:1001 --restart always docker.io/library/postgres:alpine
37883aa11e7700e5b38232f4dd6660d1c0133d7fca0bc2a5f4394c40ed96e924
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_gitea_database_1 -d --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_database:/var/lib/postgresql/data:Z --net juke_default --network-alias gitea_database -u 1000:1001 --restart always docker.io/library/postgres:alpine
caa368d7fa8a3d5c6f806210f95c86b4efa6d893f288e23fdedc9c883350af6d
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_nextcloud_server_1 -d --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.nextcloud_server.rule=Host(`cloud.localhost`) --label traefik.http.routers.nextcloud_server.entrypoints=web --label traefik.http.services.nextcloud_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_server -e POSTGRES_PASSWORD=password -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_HOST=nextcloud_database -e NEXTCLOUD_TRUSTED_DOMAINS=cloud.localhost -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/nextcloud_server_passwd:/etc/passwd:ro -v /home/user/juke/resources/nextcloud_server_ports.conf:/etc/apache2/ports.conf:ro -v /home/user/juke/volumes/nextcloud_server:/var/www/html:Z --net juke_default --network-alias nextcloud_server -u 1000:1001 --hostname cloud.localhost --restart always docker.io/library/nextcloud
5e43479147278511fcf530f5962a20fbad4cd05ae13249d341f78a8e86a8b876
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_gitea_server_1 -d --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.gitea_server.rule=Host(`code.localhost`) --label traefik.http.routers.gitea_server.entrypoints=web --label traefik.http.services.gitea_server-juke.loadbalancer.server.port=1024 --label traefik.tcp.routers.gitea_server_ssh.rule=HostSNI(`*`) --label traefik.tcp.routers.gitea_server_ssh.entrypoints=ssh --label traefik.tcp.services.girea_server_ssh-juke.loadbalancer.server.port=1025 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_server -e HTTP_PORT=1024 -e DEFAULT_BRANCH=main -e RUN_MODE=prod -e DISABLE_SSH=false -e START_SSH_SERVER=true -e SSH_PORT=1025 -e SSH_LISTEN_PORT=1025 -e ROOT_URL=http://code.localhost -e GITEA__database__DB_TYPE=postgres -e GITEA__database__HOST=gitea_database:5432 -e GITEA__database__NAME=database -e GITEA__database__USER=user -e GITEA__database__PASSWD=password -e GITEA__service__DISABLE_REGISTRATION=true -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/gitea_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_server:/data:Z --net juke_default --network-alias gitea_server -u 1000:1001 --restart always docker.io/gitea/gitea:latest-rootless
07ccf42e939421663f64e2ecbe31598d6a884c26e8faad9a02959a4a09eaae33
exit code: 0
['podman', 'network', 'exists', 'juke_default']
podman run --userns=keep-id --name=juke_uptime_kuma_server_1 -d --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.uptime_kuma_server.rule=Host(`status.localhost`) --label traefik.http.routers.uptime_kuma_server.entrypoints=web --label traefik.http.services.uptime_kuma_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=uptime_kuma_server -e PUID=1000 -e PGID=1001 -e PORT=1024 -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/uptime_kuma_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/uptime_kuma_server:/app/data:Z --net juke_default --network-alias uptime_kuma_server -u 1000:1001 --restart always --entrypoint ["node", "/app/server/server.js"] docker.io/louislam/uptime-kuma
725f56c4f78a94c7a39547567e7e4bd87475bf018080e9c04137e1c8c98c0333
exit code: 0
  3. Trying to stop them:
[user@demovm juke]$ podman-compose --podman-run-args="--userns=keep-id" down -v
['podman', '--version', '']
using podman version: 4.3.1
** excluding:  set()
podman stop -t 10 juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman stop -t 10 juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman stop -t 10 juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman stop -t 10 juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman stop -t 10 juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman stop -t 10 juke_traefik_1
ERRO[0001] Unable to clean up network for container 9520a833f7e2230bfa4eafe785d0320efb80f593bdeaf8e62ab78424dbd7e8b3: "unmounting network namespace for container 9520a833f7e2230bfa4eafe785d0320efb80f593bdeaf8e62ab78424dbd7e8b3: failed to remove ns path /run/user/1000/netns/netns-ac16b2fe-694e-4255-bbc6-5810f553715c: remove /run/user/1000/netns/netns-ac16b2fe-694e-4255-bbc6-5810f553715c: device or resource busy" 
juke_traefik_1
exit code: 0
podman rm juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman rm juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman rm juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman rm juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman rm juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman rm juke_traefik_1
juke_traefik_1
exit code: 0
['podman', 'volume', 'inspect', '--all']

No clue what causes one issue or the other (the one in the first post, with the panic) to happen, or why there seem to be two different possible outcomes.

updated my original post to include the other error message

@mheon
Member

mheon commented Jan 11, 2023

@Luap99 Yeah, this doesn't look like a restart policy restart. An explicit stop by the user (either podman stop or podman kill) disables restart policy and prevents container restart. Could be something podman-compose is doing, I suppose?

@Luap99
Member

Luap99 commented Jan 11, 2023

@Luap99 Yeah, this doesn't look like a restart policy restart. An explicit stop by the user (either podman stop or podman kill) disables restart policy and prevents container restart. Could be something podman-compose is doing, I suppose?

No, I read this as: stop was called, but it then decided it had to restart the container due to its restart policy.

goroutine 50 [running]:
os.(*File).Name(...)
	os/file.go:56
github.com/containers/podman/v4/pkg/errorhandling.CloseQuiet(0xc0002bc9a0?)
	github.com/containers/podman/v4/pkg/errorhandling/errorhandling.go:74 +0x5b
github.com/containers/podman/v4/libpod.(*Runtime).setupRootlessPortMappingViaRLK(0xc00027e380, 0xc0003d3b00, {0xc00060f700, 0x3f}, 0xc000604f01?)
	github.com/containers/podman/v4/libpod/networking_slirp4netns.go:581 +0x105d
github.com/containers/podman/v4/libpod.(*Container).setupRootlessNetwork(0xc0003d3b00)
	github.com/containers/podman/v4/libpod/container_internal_linux.go:414 +0x13c
github.com/containers/podman/v4/libpod.(*Container).handleRestartPolicy(0xc0003d3b00, {0x5564a3cc7510, 0xc0000460d0})
	github.com/containers/podman/v4/libpod/container_internal.go:296 +0x445
github.com/containers/podman/v4/libpod.(*Container).Cleanup(0xc0003d3b00, {0x5564a3cc7510, 0xc0000460d0})
	github.com/containers/podman/v4/libpod/container_api.go:726 +0x3dd
github.com/containers/podman/v4/pkg/domain/infra/abi.(*ContainerEngine).ContainerStop.func1(0xc0003d3b00)
	github.com/containers/podman/v4/pkg/domain/infra/abi/containers.go:248 +0x24e
github.com/containers/podman/v4/pkg/parallel/ctr.ContainerOp.func1()
	github.com/containers/podman/v4/pkg/parallel/ctr/ctr.go:28 +0x22
github.com/containers/podman/v4/pkg/parallel.Enqueue.func1()
	github.com/containers/podman/v4/pkg/parallel/parallel.go:67 +0x1ac
created by github.com/containers/podman/v4/pkg/parallel.Enqueue
	github.com/containers/podman/v4/pkg/parallel/parallel.go:56 +0xbe

The flow is ContainerStop() -> Cleanup() -> handleRestartPolicy(), and then handleRestartPolicy() tried to set up the network, so it thinks it has to start the container again.
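
To make that flow concrete, here is a minimal, hypothetical sketch of the path (illustrative names only, not the actual libpod code): a cleanup routine that consults the restart policy and a stopped-by-user flag, and only tears down when the stop was explicit.

package main

import "fmt"

// ctr is an illustrative stand-in for a container; the fields are hypothetical
// and do not match libpod's actual data structures.
type ctr struct {
	restartPolicy string // e.g. "always"
	stoppedByUser bool   // recorded when the user runs an explicit `podman stop`
}

// handleRestartPolicy decides whether the container should be brought back up.
func (c *ctr) handleRestartPolicy() bool {
	// Restart only if the policy asks for it AND the stop was not explicit.
	return c.restartPolicy == "always" && !c.stoppedByUser
}

// cleanup mimics the Cleanup() step from the trace: when a restart is wanted,
// the network setup runs again -- those are the frames that panic above.
func (c *ctr) cleanup() {
	if c.handleRestartPolicy() {
		fmt.Println("restart policy fired: setting up rootless network again")
		return
	}
	fmt.Println("container stays stopped: tearing down network only")
}

func main() {
	c := &ctr{restartPolicy: "always"}

	// If cleanup runs before the explicit stop has been recorded, the restart
	// path is taken even though the user asked for a stop.
	c.cleanup()

	// Once the flag is recorded first, the same cleanup stays on teardown.
	c.stoppedByUser = true
	c.cleanup()
}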

@mheon
Member

mheon commented Jan 11, 2023

Damn, you're right. That's definitely not correct. Stop explicitly sets a bool in the database to prevent restart action from taking place.

@mheon
Member

mheon commented Jan 11, 2023

Aha, got it, we're calling cleanup before we set StoppedByUser.
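
In code terms, a hedged sketch of that ordering (hypothetical names, not the actual stop()/cleanup() implementation):

package main

import "fmt"

// container is an illustrative type; the real libpod code is more involved.
type container struct{ stoppedByUser bool }

func (c *container) cleanup() {
	// Stand-in for handleRestartPolicy(): with restart=always, the container is
	// restarted unless the explicit stop has already been recorded.
	if !c.stoppedByUser {
		fmt.Println("cleanup: restart policy fires, container comes back up")
		return
	}
	fmt.Println("cleanup: explicit stop recorded, container stays down")
}

// Buggy order: cleanup consults the flag before it has been written.
func (c *container) stopBuggy() {
	c.cleanup()
	c.stoppedByUser = true // too late
}

// Order described in the fix: record the explicit stop first, then clean up.
func (c *container) stopFixed() {
	c.stoppedByUser = true
	c.cleanup()
}

func main() {
	(&container{}).stopBuggy() // prints the "restart policy fires" line
	(&container{}).stopFixed() // prints the "stays down" line
}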

@jukefr
Author

jukefr commented Jan 11, 2023

Okay, still on the VM, after a fresh reboot again:

If I start the services with

podman-compose --podman-run-args="--userns=keep-id" up

and force a graceful shutdown with a single CTRL-C, it seems to go through (sometimes it will error, sometimes it seems to go through; I have no clue why):

^CTraceback (most recent call last):
  File "/usr/bin/podman-compose", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1775, in main
    podman_compose.run()
  File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1024, in run
    cmd(self, args)
  File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1248, in wrapped
    return func(*args, **kw)
  File "/usr/lib/python3.10/site-packages/podman_compose.py", line 1442, in compose_up
    thread.join(timeout=1.0)
  File "/usr/lib/python3.10/threading.py", line 1100, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/usr/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt
2023-01-11T15:06:17Z [SERVER] INFO: Shutdown requested
2023-01-11T15:06:17Z [SERVER] INFO: Called signal: SIGINT
2023-01-11T15:06:17Z [SERVER] INFO: Stopping all monitors

[user@demovm juke]$ 2023/01/11 16:06:17 ...eful/manager_unix.go:147:handleSignals() [W] [63bed03e-4] PID 2. Received SIGINT. Shutting down...
2023/01/11 16:06:17 cmd/web.go:271:listen() [I] [63bed03e-6] HTTP Listener: 0.0.0.0:1024 Closed
2023/01/11 16:06:17 ...eful/server_hooks.go:47:doShutdown() [I] [63bed03e-6] PID: 2 Listener ([::]:1024) closed.
2023/01/11 16:06:17 .../graceful/manager.go:206:doHammerTime() [W] Setting Hammer condition
2023-01-11 16:06:17.434 CET [1] LOG:  received fast shutdown request
2023-01-11 16:06:17.442 CET [1] LOG:  aborting any active transactions
2023-01-11 16:06:17.444 CET [1] LOG:  background worker "logical replication launcher" (PID 14) exited with exit code 1
2023-01-11 16:06:17.444 CET [9] LOG:  shutting down
[Wed Jan 11 16:06:17.449373 2023] [mpm_prefork:notice] [pid 1] AH00169: caught SIGTERM, shutting down
2023-01-11 16:06:17.450 CET [9] LOG:  checkpoint starting: shutdown immediate
2023-01-11 16:06:17.513 CET [9] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.015 s, sync=0.006 s, total=0.070 s; sync files=2, longest=0.004 s, average=0.003 s; distance=0 kB, estimate=0 kB
2023-01-11 16:06:17.516 CET [1] LOG:  database system is shut down
time="2023-01-11T16:06:17+01:00" level=error msg="accept tcp [::]:8080: use of closed network connection" entryPointName=traefik
time="2023-01-11T16:06:17+01:00" level=error msg="accept tcp [::]:1024: use of closed network connection" entryPointName=web
time="2023-01-11T16:06:17+01:00" level=error msg="accept tcp [::]:1025: use of closed network connection" entryPointName=ssh
time="2023-01-11T16:06:17+01:00" level=error msg="close tcp [::]:8080: use of closed network connection" entryPointName=traefik
time="2023-01-11T16:06:17+01:00" level=error msg="close tcp [::]:1025: use of closed network connection" entryPointName=ssh
time="2023-01-11T16:06:17+01:00" level=error msg="close tcp [::]:1024: use of closed network connection" entryPointName=web
2023-01-11 16:06:17.734 CET [1] LOG:  received fast shutdown request
2023-01-11 16:06:17.745 CET [1] LOG:  aborting any active transactions
2023-01-11 16:06:17.746 CET [1] LOG:  background worker "logical replication launcher" (PID 14) exited with exit code 1
2023-01-11 16:06:17.746 CET [9] LOG:  shutting down
2023-01-11 16:06:17.755 CET [9] LOG:  checkpoint starting: shutdown immediate
2023-01-11 16:06:17.830 CET [9] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.018 s, sync=0.010 s, total=0.084 s; sync files=2, longest=0.006 s, average=0.005 s; distance=0 kB, estimate=0 kB
2023-01-11 16:06:17.833 CET [1] LOG:  database system is shut down
2023/01/11 16:06:18 .../graceful/manager.go:225:doTerminate() [W] Terminating
2023/01/11 16:06:18 ...eful/manager_unix.go:158:handleSignals() [W] PID: 2. Background context for manager closed - context canceled - Shutting down...
2023/01/11 16:06:18 cmd/web.go:140:runWeb() [I] PID: 2 Gitea Web Finished
2023-01-11T15:06:19Z [DB] INFO: Closing the database
2023-01-11T15:06:21Z [DB] INFO: SQLite closed
2023-01-11T15:06:21Z [CLOUDFLARED] INFO: Stop cloudflared
2023-01-11T15:06:21Z [SERVER] INFO: Graceful shutdown successful!

However, when I check with lsof:

[user@demovm juke]$ sudo lsof -i -P -n | grep 1024
rootlessp 3127  user   10u  IPv6  27132      0t0  TCP *:1024 (LISTEN)

The rootlessport process still seems to be there.

mheon added a commit to mheon/libpod that referenced this issue Jan 11, 2023
The StoppedByUser variable indicates that the container was
requested to stop by a user. It's used to prevent restart policy
from firing (so that a restart=always container won't restart if
the user does a `podman stop`. The problem is we were setting it
*very* late in the stop() function. Originally, this was fine,
but after the changes to add the new Stopping state, the logic
that triggered restart policy was firing before StoppedByUser was
even set - so the container would still restart.

Setting it earlier shouldn't hurt anything and guarantees that
checks will see that the container was stopped manually.

Fixes containers#17069

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
@mheon
Member

mheon commented Jan 11, 2023

#17077 should fix the incorrect firing of the restart policy. It won't fix the panic, though, but the panic shouldn't happen if the restart policy isn't restarting the container.
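
As for the panic itself: the trace ends in os.(*File).Name() called from errorhandling.CloseQuiet, which is consistent with a nil *os.File being handed to a close helper that logs the file name on failure. A minimal sketch of that failure mode (hypothetical helper, not the libpod source):

package main

import (
	"log"
	"os"
)

// closeQuiet is a hypothetical stand-in for a helper that closes a file and
// logs any close error together with the file name.
func closeQuiet(f *os.File) {
	if err := f.Close(); err != nil {
		// (*os.File).Close on a nil receiver returns os.ErrInvalid without
		// panicking, so this branch is reached; f.Name() then dereferences the
		// nil receiver and panics.
		log.Printf("unable to close file %s: %v", f.Name(), err)
	}
}

func main() {
	var f *os.File // e.g. a file that was never opened because an earlier step failed
	closeQuiet(f)  // panic: runtime error: invalid memory address or nil pointer dereference
}

An early nil check in such a helper (if f == nil { return }) would avoid the crash even when an earlier setup step never opened the file.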

@jukefr
Author

jukefr commented Jan 11, 2023

Thank you very much for your responsiveness on this!
In the meantime, does that mean I can try simply removing the restart policy on the services and see if I am still able to reproduce? Would that help?

Edit: removing the restart policy from all the services in the compose file does in fact seem to make the errors go away, and the port frees up whenever I try, in daemon or foreground mode with a graceful shutdown. Nice catch!
Yeah, it seems completely gone without the restart policy, so I don't have to dev inside a VM and reboot every other save 😹 I'll just comment the restart policies out until the fix is merged. Thank you :)

mheon added a commit to mheon/libpod that referenced this issue Jan 11, 2023
@jukefr
Author

jukefr commented Jan 12, 2023

Okay, so after trying some more things:

Running all of the commands that podman-compose runs, but manually, with a restart policy, always seems to work (as in, no error).

Here are the commands I used:
podman run --userns=keep-id --name=juke_traefik_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=traefik -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /run/user/1000/podman/podman.sock:/var/run/docker.sock:ro --net juke_default --network-alias traefik -p 1024:1024 -p 1025:1025 -p 1026:8080 -u 1000:1001 --restart always docker.io/library/traefik --api.insecure=true --providers.docker=true --providers.docker.exposedbydefault=false --entrypoints.web.address=:1024 --entrypoints.ssh.address=:1025 &\
podman run --userns=keep-id --name=juke_nextcloud_database_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/nextcloud_database:/var/lib/postgresql/data:Z --net juke_default --network-alias nextcloud_database -u 1000:1001 --restart always docker.io/library/postgres:alpine &\
podman run --userns=keep-id --name=juke_gitea_database_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_database:/var/lib/postgresql/data:Z --net juke_default --network-alias gitea_database -u 1000:1001 --restart always docker.io/library/postgres:alpine &\
podman run --userns=keep-id --name=juke_nextcloud_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.nextcloud_server.rule="Host(\`cloud.localhost\`)" --label traefik.http.routers.nextcloud_server.entrypoints=web --label traefik.http.services.nextcloud_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_server -e POSTGRES_PASSWORD=password -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_HOST=nextcloud_database -e NEXTCLOUD_TRUSTED_DOMAINS=cloud.localhost -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/nextcloud_server_passwd:/etc/passwd:ro -v /home/user/juke/resources/nextcloud_server_ports.conf:/etc/apache2/ports.conf:ro -v /home/user/juke/volumes/nextcloud_server:/var/www/html:Z --net juke_default --network-alias nextcloud_server -u 1000:1001 --restart always --hostname cloud.localhost docker.io/library/nextcloud &\
podman run --userns=keep-id --name=juke_gitea_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.gitea_server.rule="Host(\`code.localhost\`)" --label traefik.http.routers.gitea_server.entrypoints=web --label traefik.http.services.gitea_server-juke.loadbalancer.server.port=1024 --label traefik.tcp.routers.gitea_server_ssh.rule="HostSNI(\`*\`)" --label traefik.tcp.routers.gitea_server_ssh.entrypoints=ssh --label traefik.tcp.services.girea_server_ssh-juke.loadbalancer.server.port=1025 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_server -e HTTP_PORT=1024 -e DEFAULT_BRANCH=main -e RUN_MODE=prod -e DISABLE_SSH=false -e START_SSH_SERVER=true -e SSH_PORT=1025 -e SSH_LISTEN_PORT=1025 -e ROOT_URL=http://code.localhost -e GITEA__database__DB_TYPE=postgres -e GITEA__database__HOST=gitea_database:5432 -e GITEA__database__NAME=database -e GITEA__database__USER=user -e GITEA__database__PASSWD=password -e GITEA__service__DISABLE_REGISTRATION=true -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/gitea_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_server:/data:Z --net juke_default --network-alias gitea_server -u 1000:1001 --restart always docker.io/gitea/gitea:latest-rootless &\
podman run --userns=keep-id --name=juke_uptime_kuma_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.uptime_kuma_server.rule="Host(\`status.localhost\`)" --label traefik.http.routers.uptime_kuma_server.entrypoints=web --label traefik.http.services.uptime_kuma_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=uptime_kuma_server -e PUID=1000 -e PGID=1001 -e PORT=1024 -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/uptime_kuma_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/uptime_kuma_server:/app/data:Z --net juke_default --network-alias uptime_kuma_server -u 1000:1001 --restart always --entrypoint '["node", "/app/server/server.js"]' docker.io/louislam/uptime-kuma &

podman stop -t 10 juke_uptime_kuma_server_1
podman stop -t 10 juke_gitea_server_1
podman stop -t 10 juke_nextcloud_server_1
podman stop -t 10 juke_gitea_database_1
podman stop -t 10 juke_nextcloud_database_1
podman stop -t 10 juke_traefik_1

podman rm juke_uptime_kuma_server_1
podman rm juke_gitea_server_1
podman rm juke_nextcloud_server_1
podman rm juke_gitea_database_1
podman rm juke_nextcloud_database_1
podman rm juke_traefik_1

However, as soon as I do a

podman-compose --podman-run-args="--userns=keep-id" up
CTRL-C
podman-compose --podman-run-args="--userns=keep-id" down -v
['podman', '--version', '']
using podman version: 4.3.1
** excluding:  set()
podman stop -t 10 juke_uptime_kuma_server_1
juke_uptime_kuma_server_1
exit code: 0
podman stop -t 10 juke_gitea_server_1
juke_gitea_server_1
exit code: 0
podman stop -t 10 juke_nextcloud_server_1
juke_nextcloud_server_1
exit code: 0
podman stop -t 10 juke_gitea_database_1
juke_gitea_database_1
exit code: 0
podman stop -t 10 juke_nextcloud_database_1
juke_nextcloud_database_1
exit code: 0
podman stop -t 10 juke_traefik_1
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x55c1faa0695b]

goroutine 13 [running]:
os.(*File).Name(...)
	os/file.go:56
github.com/containers/podman/v4/pkg/errorhandling.CloseQuiet(0xc00028a420?)
	github.com/containers/podman/v4/pkg/errorhandling/errorhandling.go:74 +0x5b
github.com/containers/podman/v4/libpod.(*Runtime).setupRootlessPortMappingViaRLK(0xc00026a540, 0xc000441200, {0xc00021df00, 0x3f}, 0xc0003e1601?)
	github.com/containers/podman/v4/libpod/networking_slirp4netns.go:581 +0x105d
github.com/containers/podman/v4/libpod.(*Container).setupRootlessNetwork(0xc000441200)
	github.com/containers/podman/v4/libpod/container_internal_linux.go:414 +0x13c
github.com/containers/podman/v4/libpod.(*Container).handleRestartPolicy(0xc000441200, {0x55c1fba89510, 0xc000130020})
	github.com/containers/podman/v4/libpod/container_internal.go:296 +0x445
github.com/containers/podman/v4/libpod.(*Container).Cleanup(0xc000441200, {0x55c1fba89510, 0xc000130020})
	github.com/containers/podman/v4/libpod/container_api.go:726 +0x3dd
github.com/containers/podman/v4/pkg/domain/infra/abi.(*ContainerEngine).ContainerStop.func1(0xc000441200)
	github.com/containers/podman/v4/pkg/domain/infra/abi/containers.go:248 +0x24e
github.com/containers/podman/v4/pkg/parallel/ctr.ContainerOp.func1()
	github.com/containers/podman/v4/pkg/parallel/ctr/ctr.go:28 +0x22
github.com/containers/podman/v4/pkg/parallel.Enqueue.func1()
	github.com/containers/podman/v4/pkg/parallel/parallel.go:67 +0x1ac
created by github.com/containers/podman/v4/pkg/parallel.Enqueue
	github.com/containers/podman/v4/pkg/parallel/parallel.go:56 +0xbe
exit code: 2

So this might actually be caused by podman-compose, or I am very unlucky and it only happens when using the podman-compose commands (the error does not occur 100% of the time in the first place).

Should this maybe be moved to the podman-compose repo?

@mheon
Member

mheon commented Jan 12, 2023

That probably explains why I can't reproduce. Might be related to how podman-compose is calling Podman - maybe they're leaking extra file descriptors in, which is messing with rootlessport config?

mheon added a commit to mheon/libpod that referenced this issue Jan 12, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 4, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023