munge writes thousands of errors like
2021-02-01 14:39:14 +0100 Info: Suspended new connections while processing backlog
into /var/log/munge.log. The "/" partition fills up, leading to various errors.
When munged accepts a new connection, accept() errors for EMFILE,
ENFILE, ENOBUFS, and ENOMEM cause the accept() loop to block while
the backlog is processed. However, the resulting log message does
not differentiate between these errors.
Add the strerror() string to the log message for these error cases.
Issue #94
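As a rough, standalone illustration of the change described in the commit message above, the sketch below builds the same log line with the strerror() text appended. The helper name format_backlog_msg and the use of snprintf() are assumptions made for this example; munged itself goes through its own log_msg() function.

```c
#include <errno.h>    /* callers pass a saved errno value, e.g. EMFILE */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of munge): writes the log line into buf,
 * appending the strerror() text for the given errno value.
 * Returns the result of snprintf(). */
int format_backlog_msg (char *buf, size_t len, int saved_errno)
{
    return snprintf (buf, len,
        "Suspended new connections while processing backlog: %s",
        strerror (saved_errno));
}
```

A caller would copy errno into a local variable right after accept() fails and pass that copy in, so that a later library call cannot overwrite the value before it is logged.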
You've hit an error path for accept(). It could be one of:
EMFILE: The per-process limit on the number of open file descriptors has been reached.
ENFILE: The system-wide limit on the total number of open files has been reached.
ENOBUFS, ENOMEM: Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory.
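The four errno values listed above can be grouped in one switch, which is roughly how an accept() loop would separate resource exhaustion from other failures. This is only a sketch: is_resource_exhaustion is a hypothetical name and is not taken from the munge source.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper (not part of munge): returns true when an accept()
 * failure indicates resource exhaustion, i.e. one of the errno values
 * listed above. */
bool is_resource_exhaustion (int err)
{
    switch (err) {
        case EMFILE:    /* per-process open file descriptor limit reached */
        case ENFILE:    /* system-wide open file limit reached */
        case ENOBUFS:   /* no socket buffer space available */
        case ENOMEM:    /* not enough free memory */
            return true;
        default:
            return false;
    }
}
```

Other accept() failures, such as EINTR, fall through to the default case and would be handled separately.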
To find out which of the above errors is occurring, you could change the log message statement to:
    log_msg (LOG_INFO, "Suspended new connections while processing backlog: %s", strerror (errno));
and then recompile, install, restart service, etc.
I've pushed this commit to my dev branch if you would rather use that.
You could try increasing the per-process and system-wide open file descriptor limits.
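For the per-process limit, one way to experiment is to raise the soft RLIMIT_NOFILE limit up to the hard limit with getrlimit() and setrlimit(). The sketch below assumes a POSIX system; raise_nofile_soft_limit is a hypothetical helper written for this example, not something munge provides.

```c
#include <sys/resource.h>

/* Hypothetical helper (not part of munge): raises the soft limit on open
 * file descriptors for the calling process to its current hard limit.
 * On success, returns 0 and stores the new soft limit in *new_soft.
 * On failure, returns -1 with errno set by getrlimit() or setrlimit(). */
int raise_nofile_soft_limit (rlim_t *new_soft)
{
    struct rlimit rl;

    if (getrlimit (RLIMIT_NOFILE, &rl) < 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;          /* soft limit may go up to hard limit */
    if (setrlimit (RLIMIT_NOFILE, &rl) < 0)
        return -1;
    *new_soft = rl.rlim_cur;
    return 0;
}
```

This only affects the calling process and its children, so for munged the limit would have to be raised in whatever starts the daemon. The system-wide limit is a separate kernel setting (on Linux, the fs.file-max sysctl) and cannot be changed this way.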
It would be useful to know how frequently this is occurring. Thousands of times per second? hour? day? week?
Are there occasional bursts of this happening, or is this seemingly a steady-state? Is the system out of memory when this occurs?
How many worker threads are you running?
It might also be useful to know your distro version and kernel version.
Re: "/" partition is filling up. That is precisely why logs should be written to a separate partition.
Not yet sure how to replicate the issue, but the source is:
https://github.com/dun/munge/blob/master/src/munged/job.c
case ENOMEM:
    log_msg (LOG_INFO, "Suspended new connections while processing backlog");
    work_wait (w);