Static IP address is already allocated, but the container is already deleted #21194
Seems this is a regression:

Is /tmp on a tmpfs and thus empty after the reboot?

/tmp is mounted under an LVM partition. Path is set correctly to /run/user/...

Your workaround mentions

As I understand after performing

The ipam location should be

A friendly reminder that this issue had no activity for 30 days.
Got the same problem. IPs are never removed from ipam.db nor from aardvark-dns when a container or network is stopped, removed, or pruned.

Versions

```shell
$ podman --version
podman version 4.9.3
$ podman info --format "{{.Host.NetworkBackendInfo.Path}}"
/usr/lib/podman/netavark
$ podman info --format "{{.Host.NetworkBackendInfo.Version}}"
netavark 1.10.3
```

Reproducer

```shell
# Create networks
podman network create --label io.podman.test=test_multinetworks --driver bridge --subnet '10.11.11.0/24' test_multi_net11
podman network create --label io.podman.test=test_multinetworks --driver bridge --subnet '10.22.22.0/24' test_multi_net22

# Run busybox in each network
podman run --name=test_run11 --rm -d --label io.podman.test=test_multinetworks --net test_multi_net11 busybox sh -c 'while : ; do printf -- "test_run11: %s\n" "$(ip r s)"; sleep 1; done'
podman run --name=test_run22 --rm -d --label io.podman.test=test_multinetworks --net test_multi_net22 busybox sh -c 'while : ; do printf -- "test_run22: %s\n" "$(ip r s)"; sleep 1; done'

# Get the IP addresses (they always increment)
podman inspect test_run11 --format '{{range .NetworkSettings.Networks}}{{.NetworkID}} {{.IPAddress }}{{end}}'
podman inspect test_run22 --format '{{range .NetworkSettings.Networks}}{{.NetworkID}} {{.IPAddress }}{{end}}'

# Run busybox in the two networks using static IP addresses
podman run --name=test_run --rm -d --label io.podman.test=test_multinetworks --net test_multi_net11:ip=10.11.11.11 --net test_multi_net22:ip=10.22.22.22 --requires=test_run11,test_run22 busybox sh -c 'while : ; do printf -- "test_run: %s\n" "$(ip r s)"; sleep 1; done'
```

It fails with the same error as #15708.

Checks

```shell
# Remove all containers and networks
podman stop test_run11 test_run22
podman network rm test_multi_net11 test_multi_net22

# DNS configuration is still there
ls -1 /run/user/1000/containers/networks/aardvark-dns/test_*
/run/user/1000/containers/networks/aardvark-dns/test_multi_net11
/run/user/1000/containers/networks/aardvark-dns/test_multi_net22

# Show DNS entries
cat /run/user/1000/containers/networks/aardvark-dns/test_multi_net11
10.11.11.1
9e530d00dde7af14cd63da57c05103de1b8a065e11c8c4022e49f231332e28c0 10.11.11.2 test_run11,9e530d00dde7
0d18bdf86dfc3ff50f52d671edad50b3fb555b89294dd6c98d3f369e9c60c400 10.11.11.3 test_run11,0d18bdf86dfc
daed369acb93e9c5b224689d3b96718e766e6aee3b6b9b1ea4f64c1cfe46423c 10.11.11.4 test_run11,daed369acb93
940b91fd5912d46fdb629b6c0343df4aeefcc3472209dd6168c870fd1c82d3f3 10.11.11.5 test_run11,940b91fd5912
...

# All entries are still in ipam
boltbrowser -ro /run/user/1000/containers/networks/ipam.db
...
```

It's like if
@fccagou I don't have much time right now to look into this. If you only run the commands listed there, then I don't see how it would fail. I tried it real quick and it seems to work fine for me.
Thx @Luap99, you are right, everything is ok with this reproducer... I'm new to podman, and the interactions between all the components are not yet clear in my mind. I'll keep looking!
Got this test reproducing the problem:

```shell
# Create the network
podman network --log-level debug create --driver bridge --subnet 10.11.11.0/24 mynet

# Create the container
podman create --log-level debug --name=mytest --network mynet:ip=10.11.11.10 docker.io/library/busybox sleep 10

# Start the container: the first time is ok
podman start --log-level debug mytest

# Restarting the container fails
podman start --log-level debug mytest
...
many debug logs
...
Error: unable to start container "ef9691b9fbbacf0caf385b5c4bbbe9ead8fa7c8bf945cf88e7a20efe71e7ff65": IPAM error: requested ip address 10.11.11.10 is already allocated to container ID ef9691b9fbbacf0caf385b5c4bbbe9ead8fa7c8bf945cf88e7a20efe71e7ff65 time="2024-02-25T17:54:39+01:00" level=debug msg="Shutting down engines"
```
Here is the script I wrote to produce the full logs:

```shell
#!/usr/bin/bash
set -o errexit
#set -o pipefail
set -o nounset

mybanner () {
    printf -- "\n\n================================================================================================================\n"
}

mybanner
podman network --log-level debug create --driver bridge --subnet 10.11.11.0/24 mynet
mybanner
podman create --log-level debug --name=mytest --network mynet:ip=10.11.11.10 docker.io/library/busybox sleep 10
mybanner
podman start --log-level debug mytest
sleep 15
mybanner
podman start --log-level debug mytest

# clean all
podman rm mytest
podman network rm mynet
kill $(pidof aardvark-dns)
rm -rf /run/user/"$(id -u)"/containers/networks/*
```
Hum... OK, I can't start a container with a static IP twice; I have to recreate it. Is this my lack of knowledge (and I'm off topic here), or is this a bug?
That should work. This looks like the podman container cleanup process is not working for you; check the journal for any logs.
It's ok with stop; netavark tears down: simple-reproducer-with-inspect-stop.log

```shell
#!/usr/bin/bash
set -o errexit
#set -o pipefail
set -o nounset

mybanner () {
    printf -- "\n\n================================================================================================================\n"
}

mybanner
podman network --log-level debug create --driver bridge --subnet 10.11.11.0/24 mynet
mybanner
podman create --log-level debug --name=mytest --network mynet:ip=10.11.11.10 docker.io/library/busybox sleep 10
mybanner
podman start --log-level debug mytest
mybanner
podman inspect mytest
mybanner
sleep 2
mybanner
podman stop --log-level debug mytest
mybanner
podman start --log-level debug mytest || /bin/true
mybanner
sleep 15
# mybanner
# podman rm mytest
#
# mybanner
# podman create --log-level debug --name=mytest --network mynet:ip=10.11.11.10 docker.io/library/busybox sleep 10
mybanner
podman rm mytest
podman network rm mynet
kill $(pidof aardvark-dns)
rm -rf /run/user/"$(id -u)"/containers/networks/*
```
Please check your journal logs; podman container cleanup is not working at all for you, it seems.
Got errors in the journal that look interesting. Full logs:

```
...
févr. 26 19:14:55 l1nux podman[1203637]: 2024-02-26 19:14:55.155650342 +0100 CET m=+0.052614522 container died 97e5b67b8ff3ea386d6cb2ced12b539069d1edd1d063197478c3768d21aa2c2f (image=docker.io/library/busybox:latest, name=mytest)
févr. 26 19:14:55 l1nux /usr/bin/podman[1203637]: time="2024-02-26T19:14:55+01:00" level=debug msg="Cleaning up container 97e5b67b8ff3ea386d6cb2ced12b539069d1edd1d063197478c3768d21aa2c2f"
févr. 26 19:14:55 l1nux /usr/bin/podman[1203637]: time="2024-02-26T19:14:55+01:00" level=debug msg="Tearing down network namespace at /run/user/1000/netns/netns-c6ed13b8-9615-bac9-d576-130ce0370e3e for container 97e5b67b8ff3ea386d6cb2ced12b539069d1edd1d063197478c3768d21aa2c2f"
févr. 26 19:14:55 l1nux /usr/bin/podman[1203637]: time="2024-02-26T19:14:55+01:00" level=error msg="Unable to clean up network for container 97e5b67b8ff3ea386d6cb2ced12b539069d1edd1d063197478c3768d21aa2c2f: \"getting rootless network namespace: failed to Statfs \\\"/run/user/1000/netns/rootless-netns-8efa473fa70e9f64eba2\\\": no such file or directory\""
...
```
For every subsequent test, the same error occurs, referencing an unknown file. The existing one is:

```shell
$ ls -al /run/user/1000/netns/rootless-*
trw------- 1 fccagou fccagou 0 26 févr. 19:03 rootless-netns-7571ed7d42a2d3c62564

$ journalctl --since 19:14:44 | grep rootless-netns-8efa473fa70e9f64eba2
févr. 26 19:14:55 l1nux /usr/bin/podman[1203637]: time="2024-02-26T19:14:55+01:00" level=error msg="Unable to clean up network for container 97e5b67b8ff3ea386d6cb2ced12b539069d1edd1d063197478c3768d21aa2c2f: \"getting rootless network namespace: failed to Statfs \\\"/run/user/1000/netns/rootless-netns-8efa473fa70e9f64eba2\\\": no such file or directory\""
févr. 26 19:29:57 l1nux /usr/bin/podman[1206528]: time="2024-02-26T19:29:57+01:00" level=error msg="Unable to clean up network for container 026f57916f70f908dafed111e439e49003b227cb02db61be907f70cc1930582e: \"getting rootless network namespace: failed to Statfs \\\"/run/user/1000/netns/rootless-netns-8efa473fa70e9f64eba2\\\": no such file or directory\""
févr. 26 19:32:27 l1nux /usr/bin/podman[1207214]: time="2024-02-26T19:32:27+01:00" level=error msg="Unable to clean up network for container 35fa09c7f38809ef50d7e0461a18e85502a6e7693b1d1a01d073493021ad8673: \"getting rootless network namespace: failed to Statfs \\\"/run/user/1000/netns/rootless-netns-8efa473fa70e9f64eba2\\\": no such file or directory\""
févr. 26 19:37:12 l1nux /usr/bin/podman[1208364]: time="2024-02-26T19:37:12+01:00" level=error msg="Unable to clean up network for container 397b6bb4de4f0f8fb33e13a8a41c7906e707b0f8ca8ad5733181435bc2c19c16: \"getting rootless network namespace: failed to Statfs \\\"/run/user/1000/netns/rootless-netns-8efa473fa70e9f64eba2\\\": no such file or directory\""
```
After a reboot, the error still occurs, looking for the file rootless-netns-8efa473fa70e9f64eba2.
Got it in:

```go
358-    hash := sha256.Sum256([]byte(r.config.Engine.StaticDir))
359:    netnsName := fmt.Sprintf("%s-%x", rootlessNetNsName, hash[:10])
```

Ok, one step beyond to try to leave the madness...

After some code reading, I think the problem appears because my storage directory is a link:

```shell
$ (id="$(echo -n /home/fccagou/.local/share/containers/storage/libpod | sha256sum )"; echo ${id:0:20})
8efa473fa70e9f64eba2

$ (id="$(echo -n /srv/data/fccagou/containers/storage/libpod | sha256sum )"; echo ${id:0:20})
7571ed7d42a2d3c62564
```

I don't know the source of the error yet, but the value of

Too much inception...
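The two quoted lines can be exercised in isolation. The sketch below mirrors them (the `rootless-netns` prefix is taken from the error messages earlier in the thread, standing in for the `rootlessNetNsName` constant) and shows why a symlinked storage path and its resolved target yield different netns file names:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// netnsName mirrors the two quoted podman lines: the rootless netns file
// name is a prefix plus the first 10 bytes of the SHA-256 of StaticDir,
// hex-encoded (10 bytes -> 20 hex characters).
func netnsName(staticDir string) string {
	hash := sha256.Sum256([]byte(staticDir))
	return fmt.Sprintf("%s-%x", "rootless-netns", hash[:10])
}

func main() {
	// The same storage directory reached through a symlink and through its
	// real path hashes to two different names, so the name computed at
	// setup never matches the one computed at teardown.
	fmt.Println(netnsName("/home/fccagou/.local/share/containers/storage/libpod"))
	fmt.Println(netnsName("/srv/data/fccagou/containers/storage/libpod"))
}
```

Per the `sha256sum` one-liners above, the first call yields the `...8efa473fa70e9f64eba2` name from the journal errors and the second the `...7571ed7d42a2d3c62564` file that actually exists.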
Yes, this seems to be the problem for you. I don't think, however, that it is the same problem @khain0 is seeing.
As described in containers#21194, even if it's not the root cause of the issue, an error occurs when ~/.local/share/containers/storage is a link. As said in the containers#21194 comments, the rootless-netns code is rewritten in podman 5, so this patch will not be added to the main project; I keep it to fix my usage. The function GetRootlessNetNs is called twice, in setUpNetwork and in teardownNetworkBackend. The function uses config.Engine.StaticDir to calculate a hash to create the filename for setting up the netns. In setUpNetwork, the value of config.Engine.StaticDir is the real path, while in teardownNetworkBackend the value is the link name. In teardownNetworkBackend, the netns file can't be found, so the network is not removed. When a container is restarted or created with a static IP, it fails with an "already allocated" error. The patch replaces the hash with a constant string.
OK. I made a patch for my personal use, but no pull request; it's available there. Thanks for your time.
I assume this is fixed in 5.0; however, the runroot/tmpdir for podman still must be on a tmpfs so that the ipam db is deleted on reboot.
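Whether a given runroot actually sits on a tmpfs can be checked programmatically. A minimal Linux-only sketch, using the `TMPFS_MAGIC` constant from linux/magic.h (the path in `main` is an example, not necessarily your runroot):

```go
package main

import (
	"fmt"
	"syscall"
)

// tmpfsMagic is TMPFS_MAGIC from linux/magic.h.
const tmpfsMagic = 0x01021994

// isTmpfs reports whether the filesystem holding path is a tmpfs,
// by comparing the statfs filesystem type against the tmpfs magic.
func isTmpfs(path string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, err
	}
	return st.Type == tmpfsMagic, nil
}

func main() {
	// Example: check a typical rootless runroot location.
	ok, err := isTmpfs("/run/user/1000")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println("tmpfs:", ok)
}
```

If this reports false for your runroot, the ipam db survives reboots and stale static-IP allocations can accumulate, as described above.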
Issue Description
After rebooting a server, a rootless container spawned by a systemd service fails with: IP address is already allocated.
Steps to reproduce the issue
0. Prepare the user that will run the container via a systemd service

a) Enable linger for the user:

```shell
loginctl enable-linger user_name
```

b) Delegate a user id range for child namespaces. Add the line below to both files:

```shell
vi -o /etc/subuid /etc/subgid
```

```
user_name:362144:65536
```
Create an nginx container:

```shell
/usr/bin/podman run \
  --rm \
  --replace \
  -d \
  --name nginx \
  -h nginx \
  -p 443:443 \
  -p 80:80 \
  --network=wind:ip=10.89.0.12 \
  -v /home/windserviceuser/nginx/wind:/var/www/wind:Z \
  -v /home/windserviceuser/nginx/conf:/etc/nginx/conf.d:Z \
  -v /home/windserviceuser/nginx/certs:/etc/ssl/certs:Z \
  nginx
```
Generate systemd files:

```shell
podman generate systemd --name windserver --new --files
```

Create the folder and move the unit file into the user's home directory:

```shell
mkdir -p ~/.config/systemd/user/
mv /home/windserviceuser/container-windserver.service ~/.config/systemd/user/
```
SSH to the server as the user that will run the container (switching users will not work; you need to establish an SSH session as the user that runs the container via the systemd service).
Start the systemd service:

```shell
systemctl --user enable --now container-windserver.service
```

Reboot the server.
Describe the results you received
```shell
$ podman ps -a
CONTAINER ID  IMAGE                           COMMAND               CREATED  STATUS  PORTS                                     NAMES
f35da1d2c2ae  docker.io/library/nginx:latest  nginx -g daemon o...  Created          0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp  nginx

$ podman start nginx
Error: IPAM error: requested ip address 10.89.0.12 is already allocated to container ID 1e967bae12d2cf693e9e0ed3fa3593c694db671f994444a2d8bf6d0b942cee13
```
Describe the results you expected
After the server is rebooted, the container should be started by the systemd user service.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
Cannot use latest upstream release on RHEL production server.
Additional information
The issue occurs randomly; not every reboot leaves the IP address already allocated.
Workaround after the issue is encountered:

```shell
rm /tmp/containers-user-1041/containers/networks/ipam.db
ps aux | grep aardvark
kill -15 173122
/usr/libexec/podman/aardvark-dns --config /tmp/containers-user-1041/containers/networks/aardvark-dns -p 53 run
systemctl --user restart container-windserver.service
```