podman segfaults while podman ps --sync #3411

Closed
ikke-t opened this issue Jun 23, 2019 · 9 comments · Fixed by #3412
Labels: kind/bug, locked - please file new issue/PR

Comments

ikke-t commented Jun 23, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I'm again in a state where podman can't start a container due to a conflicting ghost container name (#3329). I tried to get rid of the ghost container by upgrading from podman-1.4.2-1.fc29 to podman-1.4.3-0.30.dev.git7c4e444.fc31, but that version segfaults when running podman ps --sync.

Steps to reproduce the issue:

  1. In some weird way, end up having "ghost containers":
$ sudo /usr/bin/podman run --name nodered \
>   --rm -p 1880:1880/tcp -v "/var/lib/containers/exported_volumes/node-red:/data:Z" --hostname=nodered.ikenet --memory=512M -e FLOWS=flows_nodered.ikenet.json \
>   nodered/node-red-docker
Error: error creating container storage: the container name "nodered" is already in use by "e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2". You have to remove that container to be able to reuse that name.: that name is already in use
[ikke@ohuska ~]$ sudo podman rm nodered
Error: no container with name or ID nodered found: no such container
[ikke@ohuska ~]$ sudo podman rm -f nodered
Error: no container with name or ID nodered found: no such container
[ikke@ohuska ~]$ sudo podman rm e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2
Error: no container with name or ID e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2 found: no such container
  2. Update to the latest podman from Koji: https://koji.fedoraproject.org/koji/buildinfo?buildID=1293952
  3. Run podman ps --sync

Describe the results you received:

Segfault. See attached rapsa.txt for segfault backtrace.
rapsa.txt

Describe the results you expected:

Sync the container list to be able to delete the ghost container.

Additional information you deem important (e.g. issue happens only occasionally):

Continuously

Output of podman version:

Version:       1.0.2-dev
Go Version:    go1.11.5
OS/Arch:       linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.11.5
  podman version: 1.0.2-dev
host:
  BuildahVersion: 1.6-dev
  Conmon:
    package: podman-1.0.0-2.git921f98f.module+el8+2785+ff8a053f.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.14.0-dev, commit: be8255a19cda8a598d76dfa49e16e337769d4528-dirty'
  Distribution:
    distribution: '"rhel"'
    version: "8.0"
  MemFree: 7093895168
  MemTotal: 16281849856
  OCIRuntime:
    package: runc-1.0.0-55.rc5.dev.git2abd837.module+el8.0.0+3049+59fd2bba.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.0'
  SwapFree: 8292134912
  SwapTotal: 8292134912
  arch: amd64
  cpus: 8
  hostname: gr8.localdomain
  kernel: 4.18.0-80.4.2.el8_0.x86_64
  os: linux
  rootless: true
  uptime: 101h 52m 44.26s (Approximately 4.21 days)
insecure registries:
  registries: []
registries:
  registries:
  - registry.redhat.io
  - quay.io
  - docker.io
store:
  ConfigFile: /home/ikke/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /home/ikke/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
  ImageStore:
    number: 0
  RunRoot: /run/user/1017

Additional environment details (AWS, VirtualBox, physical, etc.):

openshift-ci-robot added the kind/bug label Jun 23, 2019

mheon commented Jun 23, 2019

It seems like the dereference of ociRuntime in exitFilePath() is the cause, but I'm unclear as to how we got into a situation where that's nil. If we don't have a valid OCI runtime, we should be bailing much earlier. I'm going to lay initial blame at the feet of my multi-runtime changes, but I'm still not clear how it happened.
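For illustration only (hypothetical names, not the actual libpod code): a minimal Go sketch of the failure mode described above. A method reached through a nil ociRuntime pointer panics as soon as it touches a field of the receiver, which surfaces as the segfault in the attached backtrace.

package main

import "fmt"

type ociRuntime struct {
	exitsDir string
}

// ExitFilePath reads a field through the receiver; if r is nil this panics
// with a nil pointer dereference, analogous to exitFilePath() being called
// on a container whose OCI runtime was never populated.
func (r *ociRuntime) ExitFilePath(ctrID string) string {
	return fmt.Sprintf("%s/%s", r.exitsDir, ctrID)
}

type container struct {
	id         string
	ociRuntime *ociRuntime // stays nil if never set
}

func main() {
	c := &container{id: "e7a08b5bf6d7"}
	// panics: invalid memory address or nil pointer dereference
	fmt.Println(c.ociRuntime.ExitFilePath(c.id))
}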

Your podman info doesn't really match what you've described - is this RHEL8 plus an F31 Podman package?


ikke-t commented Jun 23, 2019

This is Fedora 29 with the latest podman from Koji.


mheon commented Jun 23, 2019

Ack. Can you cause the segfault any other way? IE, does ps without --sync still segfault? My once-over suggests that it probably should.


mheon commented Jun 23, 2019

Also, for reference, we're adding new functionality to deal with these 'ghost containers' (containers existing only in storage, not Podman) - podman rm now has a --storage flag to force removal of such containers, and I'm working on adding support to ps to list container IDs/names that exist but aren't known by Podman.


ikke-t commented Jun 23, 2019

I found one possible trigger for the problem. For some reason the NFS share had stopped serving and I had to restart NFS on the server. Some containers mount data directly from there, so perhaps that initiated the problems. However, this container does not use NFS. Also, when I run ps --sync now that NFS is up again, it still segfaults.

Answering your question: "podman ps" without --sync does not crash.


ikke-t commented Jun 23, 2019

Great, that --storage managed to remove the ghost!

$ sudo podman rm --storage e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2
e7a08b5bf6d78528d2f87eb7741914ee7c293c335a7cc9ba543a10ff5b00bbc2


mheon commented Jun 23, 2019

Confirmed that the segfault reproduces locally, and only happens with --sync

mheon self-assigned this Jun 23, 2019
mheon added a commit to mheon/libpod that referenced this issue Jun 23, 2019
We weren't properly populating the container's OCI Runtime in
Batch(), causing segfaults on attempting to access it. Add a test
to make sure we actually catch cases like this in the future.

Fixes containers#3411

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
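For illustration only (hypothetical names, not the actual libpod patch): a minimal Go sketch of the shape of fix the commit message describes, copying the OCI runtime pointer over when a Batch()-style helper builds its working copy of the container.

package main

import "fmt"

type ociRuntime struct{ name string }

type container struct {
	id         string
	ociRuntime *ociRuntime
}

// batchCopy mimics a Batch()-style helper that builds a working copy of a
// container for a locked batch of operations.
func batchCopy(c *container) *container {
	newCtr := &container{id: c.id}
	// The shape of the fix: propagate the runtime pointer to the copy.
	// Without this line, newCtr.ociRuntime stays nil and any later
	// dereference (such as an exit-file path lookup) crashes.
	newCtr.ociRuntime = c.ociRuntime
	return newCtr
}

func main() {
	orig := &container{id: "e7a08b5bf6d7", ociRuntime: &ociRuntime{name: "runc"}}
	batched := batchCopy(orig)
	fmt.Println(batched.ociRuntime.name) // safe: the runtime is populated
}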

mheon commented Jun 23, 2019

#3412 to fix


ikke-t commented Jun 23, 2019

Wow, you are fast! I also got my containers back up in the meanwhile; thanks for the quick fix. I wonder if there could be a --purge or similar option to get rid of all the ghosts in one go after they are triggered by some failure. But that's not related to this segfault, so it's another discussion.

mheon added a commit to mheon/libpod that referenced this issue Jun 24, 2019

QazerLab pushed a commit to QazerLab/libpod that referenced this issue Jun 27, 2019
github-actions bot added the locked - please file new issue/PR label Sep 24, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023