podman run hangs with error: container creation timeout #12339
Looks like something is going wrong with the btrfs driver. Could you switch to the overlay driver and see if this works?
My read on this is that it looks like a conmon or crun issue - conmon is not reporting back to Podman that the container was successfully created within our (very generous - I think it's 900 seconds?) timeout. @giuseppe Any way to get more verbose debug logs out of crun?
My suggestion is to see what processes are still running after a couple of minutes. Any reason for using the btrfs backend instead of overlay?
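A quick way to check that (just a sketch; it greps for the usual process names) is:

```bash
# List any podman/conmon/runc/crun processes still around a few minutes after the run
ps -eo pid,ppid,etime,cmd | grep -E 'podman|conmon|runc|crun' | grep -v grep
```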
after switching to overlayfs the hang issue still persists:
after the timeout, I can see conmon is still running but crun has exited:
Are you sure you are on overlay? What does podman info show? You might need to podman system reset your storage.
as indicated in the log:
already done this, the issue did not go away...
This one has me stumped.
yeah, tried runc with overlayfs
So both runc and crun hang?
correct
Perhaps kernel related, but I have no idea.
tried 5.10.81, still no luck...
@mheon @Luap99 @vrothberg Thoughts?
We need more data to debug.
I am experiencing what sounds like an identical issue (rootless only) and I was able to track down additional details that may help:
Adding the PODMAN_USERNS=auto env variable forces the relevant sections to be created by the code in https://github.com/containers/podman/blob/main/libpod/container_internal_linux.go#L649. The change in behavior (podman used to work before the last upgrade) seems to have originated with 221b1ad, though I am not 100% sure.
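For context, a minimal run that exercises that configuration (the image tag is the one from this report, everything else is assumed) would be:

```bash
# Rootless run with automatic user-namespace mapping, as described above
PODMAN_USERNS=auto podman run --rm alpine:3.14.3 true
```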
The way I figured out why runc was unhappy was by creating the script below and passing it as the --runtime parameter to "podman run", so I could see how runc was being invoked and then run it via the CLI with additional debug flags, without conmon in the middle. `#!/bin/bash LOGFILE="/tmp/runc.log-$$-$(date +%Y%m%dT%H%M%S)"`
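The script is truncated above; a minimal reconstruction of such a wrapper (only the LOGFILE line is from the original, the rest - including the runc path - is assumed) might look like:

```bash
#!/bin/bash
# Log every invocation so the exact runc arguments can be replayed by hand,
# then hand off to the real runtime.
LOGFILE="/tmp/runc.log-$$-$(date +%Y%m%dT%H%M%S)"
echo "runc $*" >> "$LOGFILE"
exec /usr/bin/runc "$@"
```

It would then be passed to podman as, e.g., `podman run --runtime /usr/local/bin/runc-wrapper ...` so that conmon invokes the wrapper instead of runc directly.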
I think I might be having this issue too; here's an strace -F. Since this only happens with rootless, I figured it's a permissions issue, and the strace seems to point to either ...
@giuseppe Thoughts?
Could you show me the output of ...? Also, can you attach strace to each conmon, podman, and runc/crun process?
crun attempts to configure the cgroups from the root (rootless cannot add/rm controllers in the root cgroup), but it handles such failures |
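One way to collect those traces (a sketch; pgrep -o just picks the oldest matching process, and the process names are assumptions):

```bash
# Follow forks (-f), timestamp each call (-tt), one log file per process;
# stop the traces later with: sudo pkill strace
for p in podman conmon crun; do
  sudo strace -f -tt -p "$(pgrep -o "$p")" -o "/tmp/strace.$p.log" &
done
```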
Not very eventful traces, but here we go: Runc and conmon both have very short traces. |
After spending most of the day debugging and learning the podman, conmon, and crun code, here is what I observed. NOTE: I actually walked through the steps below under multiple debuggers (delve for podman, gdbserver for conmon and crun), but I am trying to keep the explanation to the key points. My software versions, on Debian testing, are:
The flow is as follows:
Reading the code and https://github.com/opencontainers/runtime-spec/blob/main/runtime.md, it seems another invocation of crun with the "start" command is what is expected to let the container entry point actually execute, but I am unable to find the code path that is supposed to trigger that command. Not sure when I will get the next opportunity to debug this further, but any tips to help me investigate deeper are most welcome. Thank you.
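For reference, this is the two-step lifecycle the runtime spec describes; a manual sketch (bundle path and container name are hypothetical) looks like:

```bash
# "create" sets the container up and parks its init process on exec.fifo;
# only a later "start" lets the entry point actually run.
crun create --bundle /path/to/bundle demo-ctr
crun state demo-ctr    # should report "created"
crun start demo-ctr    # the step that apparently never happens in this hang
crun state demo-ctr    # should now report "running"
```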
I think @vmpn is onto something; here's crun on my system:
Note fd number 4 in the second argument there, which is exec.fifo:
If something else is meant to have that open, there's no evidence of that:
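One way to double-check who has the fifo open (the path below is illustrative; take the real one from the crun command line above):

```bash
# Which processes hold exec.fifo open right now?
FIFO='/run/user/1000/containers/overlay-containers/<container-id>/userdata/exec.fifo'
sudo lsof "$FIFO"
# Or inspect conmon's fd table directly:
ls -l "/proc/$(pgrep -o conmon)/fd" | grep exec.fifo
```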
That is used to stop the container creation until the "start" command is issued. When you are in that state, can you try running ...?
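As an aside, the blocking behaviour is just ordinary FIFO semantics, which a tiny standalone sketch shows:

```bash
# Opening one end of a fifo blocks until the other end is opened; crun's
# exec.fifo uses the same mechanism to pause the container between "create" and "start".
mkfifo /tmp/demo.fifo
( echo go > /tmp/demo.fifo && echo "writer unblocked" ) &   # parked, like the created container
sleep 2
cat /tmp/demo.fifo   # the "start" side: opening for read releases the writer
```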
podman.txt (added .txt for github)
strace: |
Thanks for trying that out. It is not expected to work when doing it manually. It shows that crun is just waiting for the exec.fifo to be opened.
Could you attach a debugger to podman and conmon to see where they are stuck?
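A sketch of one way to do that (assumes delve is installed for the Go side; plain gdb for conmon):

```bash
# Goroutine stacks from podman (Go): inside dlv, run "goroutines" or "bt"
sudo dlv attach "$(pgrep -o podman)"

# C-level backtrace from conmon, printed non-interactively
sudo gdb -p "$(pgrep -o conmon)" -batch -ex 'thread apply all bt'
```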
Here's podman:
Attempting to attach to conmon is proving difficult, gdb keeps hitting me with this doozy:
All I can tell you so far is:
which being entry 30 in the backtrace may prove misleading. Every other entry is a bare address :F |
Sorry about that, got it to work by attaching as root:
I might have to distrohop away from Kinoite/Silverblue because of this and an unrelated flatpak issue (both of which are pretty critical to getting any work done on these distros) - is there any other information I can give before risking being unable to reproduce? Core dumps, anything like that?
I jumped to the latest Fedora and it's still happening there, so it turns out I am still able to do more debugging if needed; I just don't know anything about glib, so it's hard to interpret that stacktrace.
I can't spot anything strange in the stacktrace. It is just glibc waiting for events. Your issue, though, doesn't look like a race condition if you can reproduce it so easily. Could you try disabling SELinux? Anything useful from ...?
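For what it's worth, a quick way to test the SELinux angle (temporary and reversible; ausearch assumes auditd is running):

```bash
getenforce                     # current mode
sudo setenforce 0              # permissive until the next boot
# ...re-run the failing "podman run" here, then look for denials:
sudo ausearch -m AVC -ts recent
sudo dmesg | grep -i denied    # fallback if auditd is not running
```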
A friendly reminder that this issue had no activity for 30 days. |
No longer having this issue on the latest Fedora packages. |
Ok, I am going to close; reopen if this issue is still happening with Podman 4.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
podman run --rm -it --network=host alpine:3.14.3
hangs and exits after a while with the error: Error: container creation timeout: internal libpod error
From the debug log, it seems the container has been successfully created with crun/runc, but podman failed to connect to the container somehow.
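A debug-level run that reproduces this (the same command as above, just with verbose logging) is:

```bash
# Same command as above, with verbose logging so the failing step is visible
podman --log-level=debug run --rm -it --network=host alpine:3.14.3
```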
Additional information you deem important (e.g. issue happens only occasionally):
On Arch Linux, kernel 5.15.2; tried both rootless and root, and tried both runc and crun.
Output of podman version:
Output of podman info --debug:
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes
Output of crun --version:
Output of runc --version:
Log from systemd: