podman segfaults while podman ps --sync #3411

Closed
ikke-t opened this issue Jun 23, 2019 · 9 comments · Fixed by #3412
Labels: kind/bug, locked - please file new issue/PR

Comments

ikke-t commented Jun 23, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I'm again in a state where podman can't start a container due to a conflicting ghost container name (#3329). I tried to get rid of the ghost container by upgrading from podman-1.4.2-1.fc29 to podman-1.4.3-0.30.dev.git7c4e444.fc31, but that version segfaults when running podman ps --sync.

Steps to reproduce the issue:

  1. In some weird way, end up having "ghost containers":
$ sudo /usr/bin/podman run --name nodered \
>   --rm -p 1880:1880/tcp -v "/var/lib/containers/exported_volumes/node-red:/data:Z" --hostname=nodered.ikenet --memory=512M -e FLOWS=flows_nodered.ikenet.json \
>   nodered/node-red-docker
Error: error creating container storage: the container name "nodered" is already in use by "e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2". You have to remove that container to be able to reuse that name.: that name is already in use
[ikke@ohuska ~]$ sudo podman rm nodered
Error: no container with name or ID nodered found: no such container
[ikke@ohuska ~]$ sudo podman rm -f nodered
Error: no container with name or ID nodered found: no such container
[ikke@ohuska ~]$ sudo podman rm e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2
Error: no container with name or ID e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2 found: no such container
  2. Update to the latest podman from Koji: https://koji.fedoraproject.org/koji/buildinfo?buildID=1293952
  3. Run podman ps --sync

Describe the results you received:

Segfault. See attached rapsa.txt for segfault backtrace.
rapsa.txt

Describe the results you expected:

Sync the container list to be able to delete the ghost container.

Additional information you deem important (e.g. issue happens only occasionally):

Continuously

Output of podman version:

Version:       1.0.2-dev
Go Version:    go1.11.5
OS/Arch:       linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.11.5
  podman version: 1.0.2-dev
host:
  BuildahVersion: 1.6-dev
  Conmon:
    package: podman-1.0.0-2.git921f98f.module+el8+2785+ff8a053f.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.14.0-dev, commit: be8255a19cda8a598d76dfa49e16e337769d4528-dirty'
  Distribution:
    distribution: '"rhel"'
    version: "8.0"
  MemFree: 7093895168
  MemTotal: 16281849856
  OCIRuntime:
    package: runc-1.0.0-55.rc5.dev.git2abd837.module+el8.0.0+3049+59fd2bba.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.0'
  SwapFree: 8292134912
  SwapTotal: 8292134912
  arch: amd64
  cpus: 8
  hostname: gr8.localdomain
  kernel: 4.18.0-80.4.2.el8_0.x86_64
  os: linux
  rootless: true
  uptime: 101h 52m 44.26s (Approximately 4.21 days)
insecure registries:
  registries: []
registries:
  registries:
  - registry.redhat.io
  - quay.io
  - docker.io
store:
  ConfigFile: /home/ikke/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /home/ikke/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
  ImageStore:
    number: 0
  RunRoot: /run/user/1017

Additional environment details (AWS, VirtualBox, physical, etc.):

openshift-ci-robot added the kind/bug label Jun 23, 2019

mheon commented Jun 23, 2019

It seems like the dereference of ociRuntime in exitFilePath() is the cause, but I'm unclear as to how we got into a situation where that's nil. If we don't have a valid OCI runtime, we should be bailing much earlier. I'm going to lay initial blame at the feet of my multi-runtime changes, but I'm still not clear how it happened.
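For illustration only (hypothetical names, not the actual libpod code): a minimal Go sketch of the failure mode described above. A method reached through a nil ociRuntime pointer panics as soon as it touches a field of the receiver, which surfaces as the segfault in the attached backtrace.

package main

import "fmt"

type ociRuntime struct {
	exitsDir string
}

// ExitFilePath reads a field through the receiver; if r is nil this panics
// with a nil pointer dereference, analogous to exitFilePath() being called
// on a container whose OCI runtime was never populated.
func (r *ociRuntime) ExitFilePath(ctrID string) string {
	return fmt.Sprintf("%s/%s", r.exitsDir, ctrID)
}

type container struct {
	id         string
	ociRuntime *ociRuntime // stays nil if never set
}

func main() {
	c := &container{id: "e7a08b5bf6d7"}
	// panics: invalid memory address or nil pointer dereference
	fmt.Println(c.ociRuntime.ExitFilePath(c.id))
}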

Your podman info doesn't really match what you've described - is this RHEL8 plus an F31 Podman package?


ikke-t commented Jun 23, 2019

This is Fedora 29 with the latest podman from Koji.


mheon commented Jun 23, 2019

Ack. Can you cause the segfault any other way? IE, does ps without --sync still segfault? My once-over suggests that it probably should.


mheon commented Jun 23, 2019

Also, for reference, we're adding new functionality to deal with these 'ghost containers' (containers existing only in storage, not Podman) - podman rm now has a --storage flag to force removal of such containers, and I'm working on adding support to ps to list container IDs/names that exist but aren't known by Podman.


ikke-t commented Jun 23, 2019

I found one possible trigger for the problem. For some reason the NFS share had stopped serving and I had to restart NFS on the server. Some containers mount data directly from there, so perhaps that initiated the problems. However, this container does not use NFS. Also, when I run ps --sync now that NFS is up again, it still segfaults.

Answering your question: "podman ps" without --sync does not crash.


ikke-t commented Jun 23, 2019

Great, that --storage managed to remove the ghost!

$ sudo podman rm --storage e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2
e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2


mheon commented Jun 23, 2019

Confirmed that the segfault reproduces locally, and only happens with --sync

mheon self-assigned this Jun 23, 2019
mheon added a commit to mheon/libpod that referenced this issue Jun 23, 2019
We weren't properly populating the container's OCI Runtime in
Batch(), causing segfaults on attempting to access it. Add a test
to make sure we actually catch cases like this in the future.

Fixes containers#3411

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
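For illustration only (hypothetical names, not the actual libpod patch): a minimal Go sketch of the shape of fix the commit message describes, copying the OCI runtime pointer over when a Batch()-style helper builds its working copy of the container.

package main

import "fmt"

type ociRuntime struct{ name string }

type container struct {
	id         string
	ociRuntime *ociRuntime
}

// batchCopy mimics a Batch()-style helper that builds a working copy of a
// container for a locked batch of operations.
func batchCopy(c *container) *container {
	newCtr := &container{id: c.id}
	// The shape of the fix: propagate the runtime pointer to the copy.
	// Without this line, newCtr.ociRuntime stays nil and any later
	// dereference (such as an exit-file path lookup) crashes.
	newCtr.ociRuntime = c.ociRuntime
	return newCtr
}

func main() {
	orig := &container{id: "e7a08b5bf6d7", ociRuntime: &ociRuntime{name: "runc"}}
	batched := batchCopy(orig)
	fmt.Println(batched.ociRuntime.name) // safe: the runtime is populated
}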

mheon commented Jun 23, 2019

#3412 to fix


ikke-t commented Jun 23, 2019

Wow, you are fast! I also got my containers back up in the meanwhile; thanks for the quick fix. I wonder if there could be a --purge or similar option to get rid of all the ghosts in one go after they are triggered by some failure. But that's not related to this segfault, so it's another discussion.

mheon added a commit to mheon/libpod that referenced this issue Jun 24, 2019

QazerLab pushed a commit to QazerLab/libpod that referenced this issue Jun 27, 2019
github-actions bot added the locked - please file new issue/PR label Sep 24, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023