Rootless podman won't start when exposing ports #2942
Comments
@giuseppe PTAL - looks like slirp might be the culprit here? |
could you try slirp4netns 0.3.0 final? |
I just tried with a rebuilt package for 0.3.0 final and it does not make a difference. |
I've tried on a fresh CentOS 7 Digital Ocean droplet and it works fine for me. I have manually installed slirp4netns, runc and podman at the latest git versions. |
Thanks for testing, @giuseppe! Part of your first line (…) So if you add the following to the Vagrantfile:
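As a rough sketch of that change (the provider block and the CPU_COUNT variable are assumptions reconstructed from the test script further down, not the exact snippet):
config.vm.provider "virtualbox" do |vb|
  # Hypothetical: let an environment variable control the vCPU count of the box.
  vb.cpus = (ENV["CPU_COUNT"] || "4").to_i
end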
Switching between 4 and 1 vCPUs, it works (multiple CPUs) or hangs (only 1 vCPU). So there still seems to be an issue, though it only appears when things are executed on a single CPU. I would say this might still be an issue, as on a busy system things might get stalled as well, since not everything gets executed in the right order. |
OK, I did a test script:
$ cat testing_script.sh
#!/bin/bash
echo "Iteration / cpu count / reported cpu count / result"
for cpu_count in 1 2 3 4; do
for i in `seq 1 10`; do
CPU_COUNT=$cpu_count vagrant up > /dev/null
echo -n "${i} / ${cpu_count} / "
vagrant ssh -c 'echo -n "$(grep processor /proc/cpuinfo | wc -l) / "; sudo systemctl start podman-test.service; sleep 10; curl -s 127.0.0.1:8080 -o /dev/null && echo success || echo failed' 2>&1 | grep -v Shared
vagrant halt > /dev/null
done
done
This got me the following output:
With my very small test sample I would conclude that it only works reliably with 4 vCPUs. And there are actually 4 processes communicating with each other: podman, conmon, runc & slirp4netns |
Plain slirp4netns without Podman works? |
How would I test that? I would assume so, as slirp4netns is started, but it is not configured to listen on port 8080. That configuration happens, as far as I understand, over the control socket, which is what podman seems to be waiting for but never receives with only one CPU. |
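One rough way to exercise slirp4netns on its own would be the following sketch; it is an assumption based on the slirp4netns 0.3.0 man page (default 10.0.2.0/24 addresses, API socket with add_hostfwd) and needs socat plus a reasonably recent unshare(1):
# Create a throwaway user+network namespace; its PID is what slirp4netns attaches to.
unshare --user --map-root-user --net sleep infinity &
child=$!
# Attach slirp4netns to that namespace and expose its API socket.
slirp4netns --configure --api-socket /tmp/slirp4netns.sock "$child" tap0 &
sleep 1
# Ask slirp4netns over the API socket to forward host port 8080 into the namespace.
# Nothing listens on guest port 80 here; the point is only whether the control
# socket answers the request instead of hanging.
printf '{"execute": "add_hostfwd", "arguments": {"proto": "tcp", "host_addr": "0.0.0.0", "host_port": 8080, "guest_addr": "10.0.2.100", "guest_port": 80}}' | socat - UNIX-CONNECT:/tmp/slirp4netns.sock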
So this works on a machine with one vCPU. Also note that rootless containers without an exposed port also work on hosts with only one vCPU. The problem begins as soon as I have fewer than 4 vCPUs AND try to expose a port. |
I wonder if the version of Go you are using makes any difference. We are using some goroutines, but nothing that should block if there are not enough cores. |
EPEL comes with Go 1.11.5, which is what is used when rebuilding the podman package. |
There is a case that I could finally reproduce; I've opened a PR: #3162 |
enable polling also when using inotify. It is generally useful to have it, as under high load inotify can lose notifications. It also solves a race condition where the file is created while the watcher is being configured; it would then wait until the timeout and fail.
Closes: containers#2942
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
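The actual change is in the Go code path that waits for files to appear; purely as an illustration of the idea (not the real implementation), combining inotify with a polling fallback looks roughly like this in shell, assuming inotify-tools is installed:
# Illustration only: wait for a file via inotify, but also poll for it, so a
# creation that races with watcher setup (or a lost event under load) is still seen.
wait_for_file() {
    local file=$1 timeout=${2:-10} elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Poll first: catches the case where the file already exists.
        [ -e "$file" ] && return 0
        # Then block for at most one second on an inotify create event.
        inotifywait -qq -t 1 -e create "$(dirname "$file")" 2>/dev/null
        elapsed=$((elapsed + 1))
    done
    [ -e "$file" ]
}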
/kind bug
Description
I am trying to run a rootless podman container exposing ports on CentOS 7. This works fine on Fedora 29, but with the same package versions it does not work on CentOS 7.
When trying to start such a rootless container, it even blocks any other podman invocation as this user.
I am not fully sure whether this is actually a bug in podman, runc, or slirp4netns, though I assumed opening it here is the best place to get started.
For what I am trying to achieve (running rootless containers through podman on CentOS 7, in the end managed by systemd), I backported the current versions of podman, runc and slirp4netns that I have on Fedora 29 to CentOS 7 (package rebuilds). On Fedora 29 what I am trying to do works fine.
Additionally, I have the new shadow utils from @vbatts from https://copr.fedorainfracloud.org/coprs/vbatts/shadow-utils-newxidmap/
These are the packages:
When I try to do the following, things just hang:
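A command of roughly this shape is what hangs (the image and port mapping here are illustrative, not the exact command):
podman run --rm -d -p 8080:80 nginx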
The container is never started and podman just hangs there.
When trying to investigate with podman, it just hangs as well:
Investigating further shows that it hangs waiting on a futex:
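For example, attaching strace to the hanging process (the PID selection here is just one way to find it) shows it blocked in a futex() call that never returns:
strace -f -p "$(pgrep -u "$USER" -n podman)"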
You can also observe this through ps (https://access.redhat.com/solutions/237383):
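An invocation of that kind, showing the kernel wait channel (WCHAN) each process sleeps in (exact columns are a matter of taste):
ps -o pid,stat,wchan:32,cmd -u "$USER" | grep -E 'podman|conmon|runc|slirp4netns'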
Or on the podman process itself:
runc seems to wait on slirp4netns:
While slirp4netns seems to poll the interface:
Starting the container without exposing the port works fine:
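For example, the same illustrative command without the -p option starts as expected:
podman run --rm -d nginx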
What is missing to have port binding in rootless mode on EL7?
To more easily reproduce the environment one can use the following Vagrantfile: