Ran out of file descriptors (MakePipe failures) on cvmfs_server snapshot run from cron on EL8 #3606

Open
DrDaveD opened this issue May 28, 2024 · 1 comment
DrDaveD commented May 28, 2024

We recently upgraded the backup Nebraska stratum 1 from EL7 to EL8, and some snapshots failed with errors like this:

terminate called after throwing an instance of 'ECvmfsException'
  what():  PANIC: /builddir/build/BUILD/cvmfs-2.11.3/cvmfs/util/pipe.h : 214
MakePipe failed with errno 24
/usr/bin/cvmfs_server: line 7557: 3396963 Aborted                 (core dumped) $user_shell "$(__swissknife_cmd dbg) pull -m $name -u $stratum0 -w $stratum1 -r ${upstream} -x ${spool_dir}/tmp -k $public_key -n $num_workers -t $timeout -a $retries $with_history $with_reflog $initial_snapshot_flag $timestamp_threshold $log_level"

Errno 24 is EMFILE ("Too many open files"). Inspecting a running cvmfs_swissknife pull process showed a soft limit on open files (Max open files) of only 1024, even though the nofile setting in /etc/security/limits.d is 65536. Interestingly, the hard limit was 262144, so this appears to be a new default for cron that ignores the nofile setting. These snapshots were started by a root cron entry that runs cvmfs_server snapshot -ais.

Perhaps cvmfs_server should check whether the hard nofile limit is higher than the soft limit and, if so, raise the soft limit to the hard limit.
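A minimal bash sketch of that idea, assuming it would run near the start of cvmfs_server before any cvmfs_swissknife process is launched. The function name raise_nofile_soft_limit is hypothetical and not part of cvmfs_server; it relies only on the ulimit builtin, where -S and -H select the soft and hard limits and -n selects the open-file limit.

```shell
#!/bin/bash
# Hypothetical helper (not part of cvmfs_server): raise the soft
# open-file limit (nofile) to the hard limit when the soft one is lower.
raise_nofile_soft_limit() {
  local soft hard
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # With an "unlimited" value there is nothing numeric to compare,
  # so leave the limits as they are.
  if [ "$soft" = "unlimited" ] || [ "$hard" = "unlimited" ]; then
    return 0
  fi
  if [ "$soft" -lt "$hard" ]; then
    # Only the soft limit changes; an unprivileged process may raise
    # its soft limit up to its hard limit.
    ulimit -Sn "$hard"
  fi
}

raise_nofile_soft_limit
echo "nofile soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
```

Because ulimit changes the limits of the current shell, any cvmfs_swissknife process started from that shell afterward would inherit the raised soft limit, whatever cron or PAM configured.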

DrDaveD added this to the 2.12 milestone May 28, 2024
DrDaveD commented May 28, 2024

Interestingly, the difference between EL7 and EL8 appears only in the /usr/sbin/CROND processes, not in the /usr/sbin/crond processes.

Both our EL7 and EL8 stratum 1s have configuration that raises the open-file limit:

[root@hcc-cvmfs ~]# cat /etc/security/limits.d/99-nofile.conf 
# Managed by Puppet
*               -       nofile          65536

After the upgrade, however, it no longer seems to take effect for cron processes.

EL7:

[root@hcc-cvmfs ~]# grep open /proc/$(pgrep -f /usr/sbin/crond)/limits
Max open files            1024                 4096                 files
[root@hcc-cvmfs ~]# grep open /proc/$(pgrep -f /usr/sbin/CROND | head -1)/limits
Max open files            65536                65536                files

EL8:

[root@hcc-cvmfs2 ~]# grep open /proc/$(pgrep -f /usr/sbin/crond)/limits
Max open files            1024                 262144               files
[root@hcc-cvmfs2 ~]# grep open /proc/$(pgrep -f /usr/sbin/CROND | head -1)/limits
Max open files            1024                 262144               files
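The same comparison can be run over every crond and CROND process at once. This bash sketch assumes the procps pgrep used in the commands above and reads the Max open files line of /proc/<pid>/limits, whose fourth and fifth whitespace-separated fields are the soft and hard limits. The helper name nofile_limits is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "<soft> <hard>" from the "Max open files"
# line of /proc/<pid>/limits for the given PID.
nofile_limits() {
  awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Report the open-file limits of the cron daemon (crond) and of the
# per-job children (CROND). pgrep -f matches the full command line and
# is case-sensitive, so the two patterns select different processes.
for pattern in /usr/sbin/crond /usr/sbin/CROND; do
  for pid in $(pgrep -f "$pattern"); do
    echo "$pattern pid=$pid nofile(soft hard)=$(nofile_limits "$pid")"
  done
done
```

On the EL8 host above, this would be expected to show 1024 262144 for both patterns, matching the manual grep output.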

Another workaround is a systemd drop-in override for crond:

[root@hcc-cvmfs2 ~]# cat /etc/systemd/system/crond.service.d/override.conf 
[Service]
LimitNOFILE=65536:262144
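A sketch of installing that drop-in from a script, assuming systemd and the crond.service unit name used above. The function name write_crond_nofile_override is hypothetical; it takes the drop-in directory as an argument so it can be tried against a scratch directory first. LimitNOFILE=soft:hard sets the soft and hard limits separately, as in the override above.

```shell
#!/bin/bash
# Hypothetical helper: write a systemd drop-in that sets crond's
# open-file limits as soft:hard. The target directory is an argument,
# normally /etc/systemd/system/crond.service.d.
write_crond_nofile_override() {
  local dir="$1" soft="${2:-65536}" hard="${3:-262144}"
  mkdir -p "$dir" || return 1
  # Write the [Service] section with a single LimitNOFILE directive.
  printf '[Service]\nLimitNOFILE=%s:%s\n' "$soft" "$hard" > "$dir/override.conf"
}
```

As root, write_crond_nofile_override /etc/systemd/system/crond.service.d followed by systemctl daemon-reload and systemctl restart crond should start crond with the new limits, and the CROND processes it forks would inherit them.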
