crc start issues after forceful shutdown #325

Closed
cfergeau opened this issue Jul 18, 2019 · 12 comments

Comments

@cfergeau
Contributor

I've seen this on hyperkit: when the VM is forcefully shut down (for example, if one interrupts the 3-minute wait for the cluster to come up), the next start fails with:

command : sudo podman run  --ip 10.88.0.8 --name dnsmasq -v /var/srv/dnsmasq.conf:/etc/dnsmasq.conf -p 53:53/udp --privileged -d quay.io/crcont/dnsmasq:latest
err     : exit status 125
output  : must provide image ID and image name to use an image: invalid argument

The dnsmasq local image is indeed in an odd state

$ sudo podman image inspect quay.io/crcont/dnsmasq:latest
error parsing image data "851bb0e5bf751cba2d649612a47651890a86eafe629308e1b3273c16b71b047e": readlink /var/lib/containers/storage/overlay/l/5BCUDYQEZFQFL56HPBCM376SLM: no such file or directory

The podman version on our image is old:

$ podman version
Version:       1.0.2-dev
Go Version:    go1.11.5
OS/Arch:       linux/amd64

This may or may not be related to containers/podman#3345 (comment)

@gbraad
Contributor

gbraad commented Jul 18, 2019

@giuseppe any thoughts on this? We use an RHCOS image

@giuseppe

@gbraad that version of podman is really old, and a lot of changes/improvements have gone in since. The issue you've linked is related to rootless containers, even if the error message looks similar.

It looks like the storage is corrupted (in this case, symlinks are missing) because of the forced shutdown. I'd suggest removing that image and pulling it again.
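The remove-and-pull suggestion above could be sketched as follows, to be run inside the VM. This is only an illustration: the `PODMAN` and `IMAGE` variables and the `repull_image` function are hypothetical names introduced here, not part of CRC, and the pull step assumes the registry is reachable.

```shell
# Hypothetical override hooks; by default podman is run through sudo,
# as in the failing command from the report.
PODMAN="${PODMAN:-sudo podman}"
IMAGE="${IMAGE:-quay.io/crcont/dnsmasq:latest}"

repull_image() {
    # Drop the damaged local copy; --force also removes containers using it.
    $PODMAN rmi --force "$IMAGE" || return 1
    # Fetch a fresh copy (assumes network access to the registry).
    $PODMAN pull "$IMAGE"
}
```

Calling `repull_image` and then retrying `crc start` would be the intended usage.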

@cfergeau
Contributor Author

@giuseppe this is the version of podman shipped in the RH CoreOS image that OpenShift uses (and it is also the default version of podman in RHEL 8.0).
Removing the image does help, but this happens close to 100% of the time when the VM is forcefully shut down (i.e. an unclean shutdown). Is this expected, or is this something that newer podman versions are likely to be more robust against?

@gbraad
Contributor

gbraad commented Jul 21, 2019

Alternatively, we would have to stop the container BEFORE stopping the VM, but that sounds like a workaround for an issue that can happen outside of CRC.
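A minimal sketch of that workaround, run inside the VM before it is stopped. The container name `dnsmasq` comes from the failing `podman run` command in the report; the `PODMAN` variable, the `stop_dnsmasq` function and the 10-second grace period are assumptions made for illustration. It cannot help when the VM is killed without any chance to run commands first.

```shell
# Hypothetical override hook; defaults to running podman through sudo.
PODMAN="${PODMAN:-sudo podman}"

stop_dnsmasq() {
    # Ask the container to exit, waiting up to 10 seconds before it is killed.
    $PODMAN stop -t 10 dnsmasq
}
```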

@gbraad
Contributor

gbraad commented Jul 21, 2019

Re-pulling the image is not an option, as we need to ensure we can start from a disconnected state (there is no guarantee that we can pull an image from a remote registry on the internet, like quay). Alternatively, we can export the image and place the archive inside the VM, so we can always re-import it. But again, this sounds like a workaround for an issue with the podman version delivered with RHCOS(?).
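The export and re-import idea could look roughly like this, using `podman save` and `podman load`. The archive path, the variable names and the two functions are hypothetical; where CRC would actually keep such an archive is not decided in this thread.

```shell
# Hypothetical override hooks and paths, introduced for illustration only.
PODMAN="${PODMAN:-sudo podman}"
IMAGE="${IMAGE:-quay.io/crcont/dnsmasq:latest}"
ARCHIVE="${ARCHIVE:-/var/srv/dnsmasq-image.tar}"

save_image() {
    # Done once, while the local image is still intact (e.g. at image build time).
    $PODMAN save --output "$ARCHIVE" "$IMAGE"
}

restore_image() {
    # Drop the damaged copy if present; ignore the error if it is already gone.
    $PODMAN rmi --force "$IMAGE" || true
    # Re-import from the local archive, which needs no network access.
    $PODMAN load --input "$ARCHIVE"
}
```

With this approach, `restore_image` would replace the pull step when the VM starts in a disconnected environment.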

@ashcrow Are newer versions of podman considered or available for use with RHCOS?

@ashcrow

ashcrow commented Jul 22, 2019

@gbraad yes, newer versions are pulled in when requested. @lsm5 works with us when a new version is required to be pulled in.

@lsm5

lsm5 commented Jul 22, 2019

soon, it'll be @jnovy

@cfergeau
Contributor Author

I can still reproduce with an image based off OpenShift 4.1.14 (podman-1.0.2-1.dev.git96ccc2e.el8.x86_64). @ashcrow, @jnovy, any plans to update podman to a newer version?

@jnovy

jnovy commented Sep 19, 2019

@cfergeau @ashcrow @lsm5 I will provide you with an internal scratch build of podman-1.4.2-5.el8. If that looks good, I will push to have it included in rhaos-4.1. Sounds like a plan?

@ashcrow

ashcrow commented Sep 19, 2019

Works for me! Thanks @jnovy.

@cfergeau
Contributor Author

I tested a 4.2.0-rc.2 image and could not reproduce the issue, so it's probably fixed there by the upgrade to a newer podman version.

@cfergeau
Contributor Author

Closing this; we can reopen if the issue recurs.
