kpod fails to find the runtime and is not obvious about it #119

Closed
4 tasks
anarcat opened this issue Dec 11, 2017 · 5 comments

@anarcat
Contributor

anarcat commented Dec 11, 2017

In my first attempt at bootstrapping a Docker-like dev environment, I stumbled upon a weird issue: I couldn't start containers with kpod run. The symptom was as follows:

# kpod run -i --tty projectatomic/atomicapp  /bin/sh
ERRO[0000] Error removing container 4dd00aed2a714682abba1967cc4225878671ebc881f24726ae3bca55c5beacca from runc after creation failed 
error reading container (probably exited) json message: EOF

Now I must say first: not very helpful. It's a confusing error message: we are led to believe the problem is "removing container" when we're just trying to start one... (Obviously, it's a cleanup routine gone wrong, but still.) The real error message is actually buried inside conmon, and shows up in journalctl -f:

déc 11 14:04:58 curie conmon[21323]: conmon <ninfo>: addr{sun_family=AF_UNIX, sun_path=/tmp/conmon-term.3VLNAZ} 
déc 11 14:04:58 curie conmon[21323]: conmon <ninfo>: about to waitpid: 21324 
déc 11 14:04:58 curie conmon[21323]: conmon <error>: Failed to create container: exit status 127 

And well... that's still not very helpful. A few straces later, I found out that I could start containers by hand by firing up the runc command myself, so the problem wasn't there. I also tried running conmon by hand, and noticed it would return with an exit status of zero (0) even though the runc command fails, which also seems wrong.
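To illustrate the exit status point, here is a minimal shell sketch of a wrapper that hands back the child's exit status instead of swallowing it. `run_runtime` is a hypothetical helper made up for this example; it is not how conmon is implemented, and `false` merely stands in for a failing runc invocation.

```shell
#!/bin/sh
# run_runtime is a hypothetical helper, not part of conmon or kpod.
# It runs the given command and returns that command's own exit status.
run_runtime() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        # Report the failure on stderr so the caller sees the real cause.
        echo "runtime '$1' failed with exit status $rc" >&2
    fi
    return "$rc"
}

# Simulate a failing runtime: `false` always exits with status 1.
wrapper_status=0
run_runtime false || wrapper_status=$?
echo "wrapper exit status: $wrapper_status"
```

With this shape, a caller can tell a failed runtime from a successful one by checking the wrapper's status alone.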

Old UNIX geeks may have already figured out the issue at this point, thanks to the famous "exit status 127" which is the shell telling us it can't find the binary. And indeed, kpod hardcodes the runtime path in config.go:

https://github.com/projectatomic/libpod/blob/92818fdfb79a142566abb2876c67b23b8944b836/libkpod/config.go#L279
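The "exit status 127" convention mentioned above is easy to reproduce in isolation. The sketch below asks a shell to run a path that is assumed not to exist (the path is made up for the example) and captures the resulting status.

```shell
#!/bin/sh
# Ask a shell to run a binary at a path assumed to be absent.
# The path below is hypothetical and chosen so that it should not exist.
status=0
sh -c '/nonexistent/path/to/runc --version' 2>/dev/null || status=$?

# POSIX shells report "command not found" as exit status 127.
echo "exit status: $status"
```

A file that exists but is not executable would instead yield 126, which makes 127 a fairly reliable hint that the path itself is wrong.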

In containers/buildah#355 I filed an issue about how buildah doesn't correctly install runc (while building it). Since the build system installs everything in /usr/local, I fixed the Ubuntu/Debian install instructions to install runc there as well (in containers/buildah#354), which means that runc is in /usr/local/bin/runc. I would also argue that /usr/sbin/runc would be a better default for that command, which is (currently) obviously designed to be run as root.

But the exact path to the runtime shouldn't matter: conmon should look into the PATH and not rely on an absolute path to find the runtime. I have tried passing the runtime to kpod with --runtime=/usr/local/bin/runc and that works!

# kpod --runtime=/usr/local/bin/runc run  -i --tty projectatomic/atomicapp /bin/sh
sh-4.2# 
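As a rough sketch of what PATH-based resolution could look like, the shell function below tries a PATH lookup first and then falls back to a few common directories. `find_runtime` and the fallback directory list are assumptions made for illustration; this is not the logic kpod or conmon actually use. The demo looks up `sh` rather than `runc` so that it works on machines without runc installed.

```shell
#!/bin/sh
# find_runtime is a hypothetical helper, not kpod's or conmon's actual logic.
# It prints the first usable location for the named binary.
find_runtime() {
    name=$1
    # command -v prints the resolved path for an executable found in $PATH
    # (for builtins or functions it would print just the name).
    if path=$(command -v "$name" 2>/dev/null); then
        echo "$path"
        return 0
    fi
    # Fallback directories, assumed here for illustration only.
    for dir in /usr/local/bin /usr/local/sbin /usr/bin /usr/sbin; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    echo "runtime '$name' not found in PATH or fallback directories" >&2
    return 127
}

# Demo with a binary that exists on any POSIX system.
found=$(find_runtime sh)
echo "found: $found"

# Demo with a name assumed not to exist anywhere.
missing_status=0
find_runtime definitely-not-a-real-runtime-xyz 2>/dev/null || missing_status=$?
echo "missing lookup status: $missing_status"
```

The result could then be fed to the existing flag, for example `kpod --runtime="$(find_runtime runc)" run ...`, which keeps the explicit override while removing the dependency on one hardcoded location.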

So, in summary, I would suggest the following:

  • better diagnostics when failing to start runtime (from conmon? kpod? both?)
  • proper failure code from conmon if runc fails to start
  • search for runc binary instead of absolute path (in config.go)
  • proper $PATH resolution for the runc binary (in conmon?)

I think that covers it - thanks!

@baude
Member

baude commented Dec 11, 2017

@anarcat with --log-level=debug do you get better error messages by chance?

@anarcat
Contributor Author

anarcat commented Dec 11, 2017

nope. here's the build log with debug:

paste_1000268.txt

@anarcat
Contributor Author

anarcat commented Dec 11, 2017

I filed #1215 on conmon's side to fix the last step in the task list.

@rhatdan
Member

rhatdan commented Dec 11, 2017

I opened a PR against conmon in cri-o project, where we copied this from. We should fix it there for now.
cri-o/cri-o#1216

@baude
Member

baude commented Feb 5, 2018

#278 should fix this... reopen if needed

@baude baude closed this as completed Feb 5, 2018
baude pushed a commit to baude/podman that referenced this issue Aug 31, 2019
dhcp: clean duplicated error message
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023