dnsmasq: failed to create inotify: Too many open files #11825

Closed
edsantiago opened this issue Oct 1, 2021 · 10 comments · Fixed by #12557
Labels: flakes, In Progress, locked - please file new issue/PR, remote, rootless

Comments

@edsantiago
Collaborator

[Very possibly not a podman bug]

Failure seen in f34 gating tests, rootless, remote:

not ok 253 podman network connect/disconnect with port forwarding
# (from function `die' in file ./helpers.bash, line 448,
#  from function `run_podman' in file ./helpers.bash, line 221,
#  in test file ./500-networking.bats, line 419)
#   `run_podman network connect $netname2 $cid' failed with status 56
# $ podman-remote rm --all --force
# $ podman-remote ps --all --external --format {{.ID}} {{.Names}}
# $ podman-remote images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20210610 9f9ec7f2fdef
# $ podman-remote network create testnet-vkwfjR0iF5
# /home/testuser/.config/cni/net.d/testnet-vkwfjR0iF5.conflist
# $ podman-remote network create testnet2-r1Pk2IBwSy
# /home/testuser/.config/cni/net.d/testnet2-r1Pk2IBwSy.conflist
# $ podman-remote run -d --network testnet-vkwfjR0iF5 quay.io/libpod/testimage:20210610 top
# d797adabc97d62a5a8e35072f15f0aaf68d6e73c56a61e0485b5092fdfdaea47
# $ podman-remote run -d -p 5436:80 --network testnet-vkwfjR0iF5 -v /tmp/podman_bats.FQTGpZ/hello.txt:/var/www/index.txt:Z -w /var/www quay.io/libpod/testimage:20210610 /bin/busybox-extras httpd -f -p 80
# 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e
# $ podman-remote inspect 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e --format {{(index .NetworkSettings.Networks "testnet-vkwfjR0iF5").IPAddress}}
# 10.89.0.3
# $ podman-remote inspect 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e --format {{(index .NetworkSettings.Networks "testnet-vkwfjR0iF5").MacAddress}}
# fe:49:35:79:32:82
# $ podman-remote network disconnect testnet-vkwfjR0iF5 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e
# $ podman-remote network connect testnet-vkwfjR0iF5 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e
# $ podman-remote inspect 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e --format {{(index .NetworkSettings.Networks "testnet-vkwfjR0iF5").IPAddress}}
# 10.89.0.4
# $ podman-remote inspect 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e --format {{(index .NetworkSettings.Networks "testnet-vkwfjR0iF5").MacAddress}}
# 2e:f5:32:c7:e9:ba
# $ podman-remote network connect testnet2-r1Pk2IBwSy 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e
# Error: error configuring network namespace for container 9be6ff480a33b24994ce3f73d308398226b880e67abaabcf6e1cd35ab7d8e19e: error adding pod tender_proskuriakova_tender_proskuriakova to CNI network "testnet2-r1Pk2IBwSy": dnsname error: dnsmasq failed with "\ndnsmasq: failed to create inotify: Too many open files\n": exit status 5
# [ rc=125 (** EXPECTED 0 **) ]
@edsantiago added the flakes, rootless, and remote labels Oct 1, 2021
@Luap99
Member

Luap99 commented Oct 1, 2021

What is running on the system? Is it possible that we exceed the inotify limit? I think the default is that only 128 rootless processes can use inotify. Can you check the running processes? Maybe the cleanup is failing and we are leaking dnsmasq processes.

@edsantiago
Collaborator Author

AFAIK this is a system spun up entirely for the purpose of podman gating tests. I have no visibility into these systems, though.

@Luap99
Member

Luap99 commented Oct 1, 2021

Well, it is hard to tell what is wrong. I found this cool script to list all processes that use inotify in a nice format. Not only dnsmasq but also conmon uses inotify.
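
For readers without access to the linked script, here is a minimal sketch of the same idea. It is not the script referenced above: the function name `list_inotify_users` and its optional `proc_root` argument are made up for illustration. It relies on the fact that each inotify instance appears in `/proc/<pid>/fd` as a symlink to `anon_inode:inotify`, and it can only see processes whose fd directory the calling user is allowed to read.

```shell
# Hypothetical helper: print "<instances> <pid> <command>" for every process
# that holds at least one inotify instance, busiest first.
# proc_root is a parameter only so the function can be pointed at a fixture.
list_inotify_users() {
    local proc_root="${1:-/proc}"
    local pid_dir fd count comm
    for pid_dir in "$proc_root"/[0-9]*; do
        [ -d "$pid_dir/fd" ] || continue
        count=0
        for fd in "$pid_dir"/fd/*; do
            # readlink fails on fds we may not inspect; those are skipped
            if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
                count=$((count + 1))
            fi
        done
        [ "$count" -gt 0 ] || continue
        comm="$(cat "$pid_dir/comm" 2>/dev/null || echo '?')"
        printf '%s %s %s\n' "$count" "${pid_dir##*/}" "$comm"
    done | sort -rn
}
```

Calling `list_inotify_users` with no argument scans the live `/proc`. Summing the first column gives a rough count to compare against the per-user instance limit, and leaked dnsmasq or conmon processes would show up by name.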

@edsantiago
Collaborator Author

Happened again (bodhi):

not ok 253 podman network connect/disconnect with port forwarding
...
# $ podman-remote network connect testnet2-cG4nBJO7El fe9a2eb726bee27d542378ff117dbabcc87bce56af67d7340275e41c29698726
# Error: error configuring network namespace for container fe9a2eb726bee27d542378ff117dbabcc87bce56af67d7340275e41c29698726: error adding pod unruffled_hamilton_unruffled_hamilton to CNI network "testnet2-cG4nBJO7El": dnsname error: dnsmasq failed with "\ndnsmasq: failed to create inotify: Too many open files\n": exit status 5

podman-remote rootless again, but this time f33 instead of f34.

@edsantiago
Collaborator Author

This is not a flake. It is failing consistently in bodhi (link is to the third test run; first and second failed the same way). Unfortunately I did not package and ship that script (I don't feel comfortable shipping it in an rpm). It always fails in podman-remote rootless, never (so far) in any of the other three. How can we resolve this?

@Luap99
Member

Luap99 commented Oct 21, 2021

@edsantiago Bats always executes the tests in the same order, right?
Could you move the network test ordering around and see if it still fails at the same test?
Could you also add a simple ps auxww so we can at least see what processes are running?

@edsantiago
Collaborator Author

Yes, it's always the same order (by filename). It's very easy to change the order, just rename 500-networking.bats to something lower or higher. Adding the ps auxww is also trivial, you just need to add it immediately before the network connect (adding it later would be a NOP, because it would never execute on failure). I am OOTO today (recharge day) so I can't submit a PR until Monday.

Keep in mind, though, this will have a long lag time. The failure only happens in bodhi, so we will only see it in 3.4.2 or 4.0 or some time far in the future. IMHO that makes it impossible to debug.
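
The ps auxww suggestion could be wired in along these lines. This is only a sketch: `debug_cmd` is a hypothetical helper, not something from helpers.bash and not the debugging code that was eventually committed. It runs a diagnostic command and prefixes its output with `# ` so that it blends in with the rest of the test log.

```shell
# Hypothetical helper: run a diagnostic command and emit its output in the
# same "# ..." style as the surrounding test log.
debug_cmd() {
    printf '# $ %s\n' "$*"
    "$@" 2>&1 | sed 's/^/# /'
}
```

Under this assumption, `debug_cmd ps auxww` would go immediately before the `run_podman network connect $netname2 $cid` line, since anything placed after the failing command never runs.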

edsantiago added a commit to edsantiago/libpod that referenced this issue Nov 1, 2021
Volume test: add a sequence of stat()s to confirm that volumes
are mounted as a different device than root.

Network test: add debugging code for containers#11825 (dnsmasq inotify
failure in bodhi only).

Signed-off-by: Ed Santiago <santiago@redhat.com>
@tmds
Contributor

tmds commented Nov 3, 2021

I'm also hitting an inotify limit when using rootless podman on Fedora 34.

I think the default is that only 128 rootless processes can use inotify.

Is this limit configurable?

I tried echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p but it doesn't seem to make a difference.

@Luap99
Member

Luap99 commented Nov 3, 2021

You also have to set fs.inotify.max_user_instances; the default is 128.
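
Some background on why raising max_user_watches alone makes no difference: the two limits fail in different ways. Per the inotify man pages, creating an inotify instance fails with EMFILE ("Too many open files") when the per-user instance limit is reached, whereas adding a watch beyond max_user_watches fails with ENOSPC. The sketch below prints both limits. The function name `show_inotify_limits`, the drop-in file name, and the value 256 are illustrative assumptions, not settings recommended in this thread.

```shell
# Hypothetical helper: print the per-user inotify limits.
# sysctl_root is a parameter only so the function can be tested on a fixture.
show_inotify_limits() {
    local sysctl_root="${1:-/proc/sys/fs/inotify}"
    local name
    for name in max_user_instances max_user_watches; do
        printf 'fs.inotify.%s = %s\n' "$name" "$(cat "$sysctl_root/$name")"
    done
}

# To raise the instance limit persistently (needs root; 256 and the file
# name 99-inotify.conf are arbitrary example choices):
#   echo 'fs.inotify.max_user_instances = 256' | sudo tee /etc/sysctl.d/99-inotify.conf
#   sudo sysctl --system
```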

@edsantiago
Collaborator Author

Still happening:

not ok 255 podman network connect/disconnect with port forwarding
# (from function `die' in file ./helpers.bash, line 448,
#  from function `run_podman' in file ./helpers.bash, line 221,
#  in test file ./500-networking.bats, line 434)
#   `run_podman network connect $netname2 $cid' failed with status 56
# $ podman-remote rm --all --force
# $ podman-remote ps --all --external --format {{.ID}} {{.Names}}
# $ podman-remote images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20210610 9f9ec7f2fdef
# $ podman-remote network create testnet-rBwBb5wFx0
# /home/testuser/.config/cni/net.d/testnet-rBwBb5wFx0.conflist
# $ podman-remote network create testnet2-rCOhOcewfb
# /home/testuser/.config/cni/net.d/testnet2-rCOhOcewfb.conflist
# $ podman-remote run -d --network testnet-rBwBb5wFx0 quay.io/libpod/testimage:20210610 top
# 4e21ea7c543a4c10d13b678b424e8bc713a15b55556ee6d0a7d7273f1aa1e3ef
# $ podman-remote run -d -p 5306:80 --network testnet-rBwBb5wFx0 -v /tmp/podman_bats.Usnxln/hello.txt:/var/www/index.txt:Z -w /var/www quay.io/libpod/testimage:20210610 /bin/busybox-extras httpd -f -p 80
# b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920
# $ podman-remote inspect b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920 --format {{(index .NetworkSettings.Networks "testnet-rBwBb5wFx0").IPAddress}}
# 10.89.0.3
# $ podman-remote inspect b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920 --format {{(index .NetworkSettings.Networks "testnet-rBwBb5wFx0").MacAddress}}
# 96:5a:b1:90:fb:88
# $ podman-remote network disconnect testnet-rBwBb5wFx0 b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920
# $ podman-remote network connect testnet-rBwBb5wFx0 b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920
# $ podman-remote inspect b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920 --format {{(index .NetworkSettings.Networks "testnet-rBwBb5wFx0").IPAddress}}
# 10.89.0.4
# $ podman-remote inspect b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920 --format {{(index .NetworkSettings.Networks "testnet-rBwBb5wFx0").MacAddress}}
# 7e:7c:16:41:07:0c
# $ podman-remote network disconnect testnet-rBwBb5wFx0 4e21ea7c543a4c10d13b678b424e8bc713a15b55556ee6d0a7d7273f1aa1e3ef
# $ podman-remote network connect testnet-rBwBb5wFx0 4e21ea7c543a4c10d13b678b424e8bc713a15b55556ee6d0a7d7273f1aa1e3ef
# $ podman-remote network connect testnet2-rCOhOcewfb b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920
# Error: error configuring network namespace for container b739c81c13632068868709a2ec476b718b9059f89c52891524c2387b37fec920: error adding pod clever_wescoff_clever_wescoff to CNI network "testnet2-rCOhOcewfb": dnsname error: dnsmasq failed with "\ndnsmasq: failed to create inotify: Too many open files\n": exit status 5
# [ rc=125 (** EXPECTED 0 **) ]

vrothberg added a commit to vrothberg/libpod that referenced this issue Dec 9, 2021
Issue containers#11825 suggests that *rootless* Podman can run into situations
where too many inotify fds are open.  Indeed, rootless Podman has a
slightly higher usage of inotify watchers than the root counterpart
when using slirp4netns

Make sure to not only close all watchers but to also remove the files
from being watched.  Otherwise, the fds only get closed
when the files are removed.

[NO NEW TESTS NEEDED] since we don't have a way to test it.

Fixes: containers#11825
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
@vrothberg self-assigned this Dec 9, 2021
@vrothberg added the In Progress label Dec 9, 2021
@github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023