Fedora 31: "podman start" hangs #4621

Closed
MrErr opened this issue Dec 3, 2019 · 9 comments · Fixed by #4624

MrErr commented Dec 3, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Steps to reproduce the issue:

  1. podman create \
       --name squid-ip-pub \
       --ip 10.88.13.24 \
       --dns 10.88.13.4 \
       --expose 3128 \
       --hostname squid-ip-pub \
       --mount type=volume,src=squid-ip-config,dst=/etc/config \
       --mount type=volume,src=squid-ip-pub-config,dst=/etc/squid \
       --restart always \
       vimagick/squid

  2. podman --log-level=debug start --attach squid-ip-pub

Describe the results you received:
The output from the above two commands is:

6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844
DEBU[0000] using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] cached value indicated that metacopy is being used
DEBU[0000] cached value indicated that native-diff is not being used
WARN[0000] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled
DEBU[0000] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=true
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/bin/runc"
DEBU[0000] using runtime "/usr/bin/crun"
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
INFO[0000] Found CNI network podnet1 (type=bridge) at /etc/cni/net.d/podnet1.conflist
DEBU[0000] Not attaching to stdin
DEBU[0000] Made network namespace at /var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0 for container 6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844
INFO[0000] Got pod network &{Name:squid-ip-pub Namespace:squid-ip-pub ID:6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844 NetNS:/var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0 Networks:[podman] RuntimeConfig:map[podman:{IP:10.88.13.24 PortMappings:[] Bandwidth: IpRanges:[]}]}
INFO[0000] About to add CNI network cni-loopback (type=loopback)
INFO[0000] Got pod network &{Name:squid-ip-pub Namespace:squid-ip-pub ID:6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844 NetNS:/var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0 Networks:[podman] RuntimeConfig:map[podman:{IP:10.88.13.24 PortMappings:[] Bandwidth: IpRanges:[]}]}
INFO[0000] About to add CNI network podman (type=bridge)
DEBU[0000] overlay: mount_data=nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/JGO6DESSVVX5NJZUVCVQVLGGEW:/var/lib/containers/storage/overlay/l/57Q4RCONHFROQV5V4GHVEONIRH,upperdir=/var/lib/containers/storage/overlay/6f9470335241b21b46e0de3704e469f9e388aea997935d3671987d80fe340aef/diff,workdir=/var/lib/containers/storage/overlay/6f9470335241b21b46e0de3704e469f9e388aea997935d3671987d80fe340aef/work
DEBU[0000] mounted container "6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844" at "/var/lib/containers/storage/overlay/6f9470335241b21b46e0de3704e469f9e388aea997935d3671987d80fe340aef/merged"
DEBU[0000] [0] CNI result: Interfaces:[{Name:cni-podman0 Mac:0a:69:bc:8e:a1:08 Sandbox:} {Name:veth3ed1af4b Mac:ae:a8:ca:a6:cb:41 Sandbox:} {Name:eth0 Mac:06:6e:a2:1c:82:3f Sandbox:/var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0}], IP:[{Version:4 Interface:0xc000619ac8 Address:{IP:10.88.13.24 Mask:ffff0000} Gateway:10.88.0.1}], Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:}], DNS:{Nameservers:[] Domain: Search:[] Options:[]}

Describe the results you expected:
I expected the container to start running and not hang.

Additional information you deem important (e.g. issue happens only occasionally):
This happens reliably.

Output of podman version
Version: 1.6.2
RemoteAPI Version: 1
Go Version: go1.13.1
OS/Arch: linux/amd64

Output of podman info --debug:
debug:
  compiler: gc
  git commit: ""
  go version: go1.13.1
  podman version: 1.6.2
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.2-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
  Distribution:
    distribution: fedora
    version: "31"
  MemFree: 18905640960
  MemTotal: 33428758528
  OCIRuntime:
    name: crun
    package: crun-0.10.6-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 8589930496
  SwapTotal: 8589930496
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: xroads
  kernel: 5.3.12-300.fc31.x86_64
  os: linux
  rootless: false
  uptime: 2h 53m 51.6s (Approximately 0.08 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 45
  GraphDriverName: overlay
  GraphOptions:
    overlay.mountopt: nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 86
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):
podman-1.6.2-2.fc31.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
This is a Fedora 31 machine (physical).

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 3, 2019
rhatdan (Member) commented Dec 3, 2019

@mheon @giuseppe Any ideas?

mheon (Member) commented Dec 3, 2019

Running with or without root?

mheon (Member) commented Dec 3, 2019

Can't repro as root, Fedora 30, v1.6.2.

Looking at the code, it looks like prepare() is running successfully - the container is mounted, the network namespace is created - but init() is not successful (we'd see a "Created container %s in OCI runtime" message if it was). This is probably a hang creating the container in conmon. Probably a crun issue, then?
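
For readers following along, here is a minimal, illustrative-only sketch of the start path described above (the function names are stand-ins, not libpod's actual API): prepare() clearly succeeds in the debug log, while the "Created container %s in OCI runtime" message that would follow a successful init() never appears.

package main

import "log"

// prepare stands in for libpod's prepare() step; in the debug log above it
// clearly succeeds (the overlay mount and network namespace are both logged).
func prepare() error {
	log.Println("mounted container, created network namespace")
	return nil
}

// initContainer stands in for init(); in libpod, success here is followed by
// a "Created container %s in OCI runtime" debug message. That message never
// shows up in the log above, so the hang is in (or on the way into) this step.
func initContainer() error {
	return nil
}

func main() {
	if err := prepare(); err != nil {
		log.Fatal(err)
	}
	if err := initContainer(); err != nil {
		log.Fatal(err)
	}
	log.Println("Created container in OCI runtime")
}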

mheon (Member) commented Dec 3, 2019

Interestingly, we're not printing the args we're calling Conmon with... We might not be getting there?

mheon (Member) commented Dec 3, 2019

Aha, I think we're stuck somewhere in mounting...

Can you try a podman system renumber?

mheon (Member) commented Dec 3, 2019

(For reference, I think we may be running into a deadlock between the container lock and the volume lock - if the volume and container were created in a pre-1.6.x version, the volume was not originally assigned a lock number, so is now defaulting to 0)
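
To make the suspected failure mode concrete, here is a minimal sketch (hypothetical names, not libpod's actual lock manager) of what happens when a container and a volume resolve to the same lock index: an operation that takes the container lock and then the volume lock ends up blocking on a lock it already holds.

package main

import (
	"fmt"
	"sync"
)

// lockTable mimics a pool of locks addressed by index, like libpod's
// SHM lock allocation (names here are made up for illustration).
var lockTable [2]sync.Mutex

func main() {
	containerLockIndex := 0 // assigned when the container was created
	volumeLockIndex := 0    // pre-1.6.x volume: never assigned, so it defaults to 0

	// Container operation: take the container lock first...
	lockTable[containerLockIndex].Lock()

	// ...then the lock of the volume it mounts. With both indices equal to 0
	// this would block forever; TryLock (Go 1.18+) lets the sketch report it
	// instead of hanging.
	if !lockTable[volumeLockIndex].TryLock() {
		fmt.Println("volume lock is the same lock as the container lock: deadlock")
		return
	}
	fmt.Println("distinct locks, no deadlock")
}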

mheon added a commit to mheon/libpod that referenced this issue Dec 3, 2019
After a restart, pods and containers both run a refresh()
function to prepare to run after a reboot. Until now, volumes
have not had a similar function, because they had no per-boot
setup to perform.

Unfortunately, this was not noticed when in-memory locking was
introduced to volumes. The refresh() routine is, among other
things, responsible for ensuring that locks are reserved after a
reboot, ensuring they cannot be taken by a freshly-created
container, pod, or volume. If this reservation is not done, we
can end up with two objects using the same lock, potentially
needing to lock each other for some operations - classic recipe
for deadlocks.

Add a refresh() function to volumes to perform lock reservation
and ensure it is called as part of overall refresh().

Fixes containers#4605
Fixes containers#4621

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
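
Conceptually, the fix amounts to giving volumes the same post-reboot lock reservation step that containers and pods already had. A rough sketch with made-up types (not the actual libpod implementation), assuming a simple index-based lock manager:

package main

import (
	"errors"
	"fmt"
)

// lockManager hands out lock indices and tracks which are already in use.
type lockManager struct {
	inUse map[int]bool
	next  int
}

// reserve re-claims an index that an existing object owned before the reboot.
func (m *lockManager) reserve(index int) error {
	if m.inUse[index] {
		return errors.New("lock index already reserved")
	}
	m.inUse[index] = true
	return nil
}

// allocate hands out the lowest free index to a newly created object.
func (m *lockManager) allocate() int {
	for m.inUse[m.next] {
		m.next++
	}
	m.inUse[m.next] = true
	return m.next
}

// volume is a minimal stand-in for the object being refreshed.
type volume struct{ lockIndex int }

// refresh re-reserves the volume's lock after a reboot, mirroring what
// containers and pods already did before this fix.
func (v *volume) refresh(m *lockManager) error {
	return m.reserve(v.lockIndex)
}

func main() {
	m := &lockManager{inUse: map[int]bool{}}
	vol := &volume{lockIndex: 0}
	if err := vol.refresh(m); err != nil {
		fmt.Println("refresh failed:", err)
		return
	}
	// A container created after the reboot now gets a different index
	// instead of silently sharing lock 0 with the volume.
	fmt.Println("volume lock:", vol.lockIndex, "new container lock:", m.allocate())
}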
mheon (Member) commented Dec 3, 2019

Probably caused by a reboot. #4605 has the same root cause. Fix is in #4624.

MrErr (Author) commented Dec 3, 2019

> Running with or without root?

Running with root.

MrErr (Author) commented Dec 3, 2019

> Aha, I think we're stuck somewhere in mounting...
>
> Can you try a podman system renumber?

[root@xroads irlanet]# podman system renumber
Error: Error shutting down container storage: A layer is mounted: layer is in use by a container

mheon added a commit to mheon/libpod that referenced this issue Dec 10, 2019
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023