sysctl: fs.inotify.max_user_watches is too low and hard-capped #637

Closed

nrvnrvn opened this issue Sep 29, 2020 · 15 comments

nrvnrvn commented Sep 29, 2020

I'm observing messages like this in the system journal:

systemd[1]: rpm-ostreed.service: Failed to add control inotify watch descriptor for control group /system.slice/rpm-ostreed.service: No space left on device
$ rpm-ostree status
State: idle
Deployments:
● ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200907.3.0 (2020-09-23T08:16:31Z)
                    Commit: b53de8b03134c5e6b683b5ea471888e9e1b193781794f01b9ed5865b57f35d57
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
$ systemctl cat rpm-ostreed
# /usr/lib/systemd/system/rpm-ostreed.service
[Unit]
Description=rpm-ostree System Management Daemon
Documentation=man:rpm-ostree(1)
ConditionPathExists=/ostree

[Service]
Type=dbus
BusName=org.projectatomic.rpmostree1
# To use the read-only sysroot bits
MountFlags=slave
NotifyAccess=main
ExecStart=/usr/bin/rpm-ostree start-daemon
ExecReload=/usr/bin/rpm-ostree reload
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  1.0M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda4       9.5G  1.9G  7.7G  20% /sysroot
tmpfs            16G     0   16G   0% /tmp
/dev/vdb        180G   42G  139G  24% /var/srv
/dev/vda1       364M   88M  254M  26% /boot
/dev/vda2       127M  8.5M  119M   7% /boot/efi
tmpfs           3.2G     0  3.2G   0% /run/user/1000
travier (Member) commented Sep 30, 2020

You might want to check inode counts with df -i

travier (Member) commented Sep 30, 2020

Maybe we should also consider raising the inotify limits. We currently have:

$ sysctl fs/inotify/max_user_watches
fs.inotify.max_user_watches = 8192

On my workstation, I have:

$ sysctl fs/inotify/max_user_watches
fs.inotify.max_user_watches = 524288

Temporary workaround:

echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.d/40-max-user-watches.conf

dustymabe (Member) commented:

@nrvnrvn - do @travier's suggestions help?

nrvnrvn (Author) commented Oct 8, 2020

@dustymabe @travier
Yes, it helps.

After provisioning CoreOS with the following bits, I no longer see this message:

    - path: /etc/sysctl.d/90-inotify.conf
      contents:
        inline: |
          fs.inotify.max_user_watches = 524288
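For reference, that fragment sits under storage.files; a full FCC would look roughly like this (the variant/version header and the 0644 mode here are illustrative additions):

variant: fcos
version: 1.1.0
storage:
  files:
    - path: /etc/sysctl.d/90-inotify.conf
      mode: 0644
      contents:
        inline: |
          fs.inotify.max_user_watches = 524288

fcct then turns that into the Ignition config that writes the drop-in on first boot.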

Can this sysctl value be made default for CoreOS?

lucab (Contributor) commented Oct 8, 2020

@nrvnrvn how much RAM do you have on that machine (i.e. free -m)? My take is that fs.inotify.max_user_watches should instead be computed at runtime by the kernel, based on the amount of memory on the specific system, which is generally a better approach than us hardcoding a number for everybody.
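To make that concrete, a rough sketch of such a runtime computation (purely illustrative: the 1% memory share and the ~160-byte per-watch cost are assumptions, not the kernel's actual accounting):

total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo $(( total_kib * 1024 / 100 / 160 ))

On a 32 GiB machine that works out to roughly two million watches instead of a fixed 8192.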

nrvnrvn (Author) commented Oct 11, 2020

@lucab this is how much RAM I have on that machine:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          32107        6236        5947           0       19923       25303
Swap:             0           0           0

lucab (Contributor) commented Oct 12, 2020

@nrvnrvn for context, what is the output of sudo sysctl fs.inotify.max_user_watches on that same machine without the workaround configuration?

nrvnrvn (Author) commented Oct 14, 2020

@lucab 8192

travier (Member) commented Oct 15, 2020

Sorry, I forgot that I had a custom /etc/sysctl.d/40-max-user-watches.conf on my system setting the watches to a higher value.

lucab (Contributor) commented Oct 15, 2020

@nrvnrvn @travier thanks both for the feedback.

So I have a running theory at this point (just a guess, I still need to investigate more). The kernel-side sizing logic seems to be somehow hard-capped on x86_64, possibly because of a memory-split CONFIG_HIGHMEM assumption. This results in an absurdly small limit (8192 entries) even on large systems (32 GiB of RAM).

Doing the math in the reverse direction, 8192 entries of 160 bytes each, taken as 4% of lowmem, should mean a total lowmem of:

8192 * 160 / 4 * 100 == 32768000

That smells fishy because 1) it's a tiny number compared to the machine specs and 2) it doesn't seem to vary across machines with different amounts of RAM.
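Spelling that out (bytes, then MiB):

$ echo $(( 8192 * 160 * 100 / 4 ))
32768000
$ echo $(( 32768000 / 1024 / 1024 ))
31

About 31 MiB of implied lowmem, regardless of how much RAM is actually installed.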

As observed on a few different VMs:

# free -m && sysctl fs.inotify.max_user_watches
              total        used        free      shared  buff/cache   available
Mem:           2984         110        2665           8         209        2720
Swap:             0           0           0
fs.inotify.max_user_watches = 8192
# free -m && sysctl fs.inotify.max_user_watches
              total        used        free      shared  buff/cache   available
Mem:          11974         120       11644           8         209       11599
Swap:             0           0           0
fs.inotify.max_user_watches = 8192
# free -m && sysctl fs.inotify.max_user_watches
              total        used        free      shared  buff/cache   available
Mem:          16006         114       15682           8         209       15622
Swap:             0           0           0
fs.inotify.max_user_watches = 8192
# free -m && sysctl fs.inotify.max_user_watches
              total        used        free      shared  buff/cache   available
Mem:         491806      325628       15191          53      150986      162797
Swap:             0           0           0
fs.inotify.max_user_watches = 8192

This does not seem to be specific to FCOS; I'm also seeing it on RHCOS. I've forwarded it to the Red Hat Bugzilla at https://bugzilla.redhat.com/show_bug.cgi?id=1888617.

@lucab lucab changed the title rpm-ostreed.service: Failed to add control inotify watch descriptor sysctl: max_user_watches is too low and static Oct 15, 2020
@lucab lucab changed the title sysctl: max_user_watches is too low and static sysctl: fs.inotify.max_user_watches is too low and static Oct 15, 2020
@lucab lucab changed the title sysctl: fs.inotify.max_user_watches is too low and static sysctl: fs.inotify.max_user_watches is too low and hard-capped Oct 15, 2020
lucab (Contributor) commented Nov 9, 2020

There is an in-progress patch for this on linux-fsdevel; the first revision is https://patchwork.kernel.org/project/linux-fsdevel/patch/20201026204418.23197-1-longman@redhat.com/.

lucab (Contributor) commented Jan 27, 2021

The kernel-side patch for this landed in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=92890123749bafc317bbfacbe0a62ce08d78efb7 and should be part of 5.11.
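Once a machine is on a 5.11+ kernel, fs.inotify.max_user_watches should scale with the amount of installed memory rather than sit at the fixed 8192; the exact value will vary per machine, but a quick check is:

uname -r
sysctl fs.inotify.max_user_watches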

@lucab lucab added the status/pending-upstream-release label and removed the needs/investigation label Jan 27, 2021
@travier travier added the status/pending-stable-release and status/pending-next-release labels and removed the status/pending-upstream-release and status/pending-stable-release labels Apr 7, 2021
@dustymabe dustymabe added the status/pending-testing-release label Apr 7, 2021
dustymabe (Member) commented:

Kernel 5.11 landed in the next stream last week.

The fix for this went into next stream release 34.20210328.1.0. Please try out the new release and report issues.
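For anyone who wants to verify on a machine already following the next stream: rpm-ostree upgrade only stages the new deployment, so reboot before re-checking the sysctl:

sudo rpm-ostree upgrade
sudo systemctl reboot
sysctl fs.inotify.max_user_watches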

@dustymabe dustymabe removed the status/pending-next-release label Apr 8, 2021
dustymabe (Member) commented:

The fix for this went into testing stream release 34.20210427.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added the status/pending-stable-release label and removed the status/pending-testing-release label May 14, 2021
dustymabe (Member) commented:

The fix for this went into stable stream release 34.20210427.3.0.

@dustymabe dustymabe removed the status/pending-stable-release label May 18, 2021