excessive logging of: "Suspended new connections while processing backlog" #94

Closed
gabuniag opened this issue Feb 3, 2021 · 1 comment

gabuniag commented Feb 3, 2021

munged writes thousands of messages like

    2021-02-01 14:39:14 +0100 Info: Suspended new connections while processing backlog

into /var/log/munge.log. The "/" partition fills up, leading to various errors.

Not yet sure how to replicate the issue, but the source is:
https://github.com/dun/munge/blob/master/src/munged/job.c
    case ENOMEM:
        log_msg (LOG_INFO,
            "Suspended new connections while processing backlog");
        work_wait (w);
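For context, a minimal standalone sketch of this pattern (not munged's actual code; wait_for_backlog() here is a hypothetical stand-in for munged's work_wait()) showing an accept() loop that suspends on resource-exhaustion errors:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical stand-in for munged's work_wait(): block until worker
     * threads have drained the backlog of already-accepted connections. */
    static void wait_for_backlog (void)
    {
        sleep (1);  /* placeholder; real code would wait on a condition variable */
    }

    /* Accept connections on listening socket [ld], suspending new
     * connections when fds, socket buffers, or memory run out. */
    static void accept_loop (int ld)
    {
        for (;;) {
            int sd = accept (ld, NULL, NULL);
            if (sd >= 0) {
                /* ... hand [sd] off to a worker thread here ... */
                close (sd);
                continue;
            }
            switch (errno) {
                case ECONNABORTED:
                case EINTR:
                    continue;           /* transient; retry immediately */
                case EMFILE:            /* per-process fd limit */
                case ENFILE:            /* system-wide fd limit */
                case ENOBUFS:
                case ENOMEM:            /* socket buffers / memory exhausted */
                    fprintf (stderr, "Suspended new connections "
                             "while processing backlog: %s\n", strerror (errno));
                    wait_for_backlog ();
                    continue;
                default:
                    perror ("accept");  /* unexpected error; bail out */
                    return;
            }
        }
    }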

dun added a commit that referenced this issue Feb 4, 2021
When munged accepts a new connection, accept() errors for EMFILE,
ENFILE, ENOBUFS, and ENOMEM cause the accept() loop to block while
the backlog is processed.  However, the resulting log message does
not differentiate between these errors.

Add the strerror() string to the log message for these error cases.

Issue #94

dun commented Feb 4, 2021

You've hit an error path for accept(). It could be one of:

EMFILE The per-process limit on the number of open file descriptors has been reached.

ENFILE The system-wide limit on the total number of open files has been reached.

ENOBUFS, ENOMEM Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory.

To find out which of the above errors is occurring, you could change the log message statement to:

    log_msg (LOG_INFO,
        "Suspended new connections while processing backlog: %s",
        strerror (errno));

and then recompile, install, restart the service, etc.
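One caveat worth noting with a change like this (a general C idiom, not something specific to munged's log_msg()): if any other code runs between the failing accept() and the logging call, save errno first so an intervening library call cannot clobber it:

    int err = errno;    /* capture immediately after accept() fails */
    log_msg (LOG_INFO,
        "Suspended new connections while processing backlog: %s",
        strerror (err));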

I've pushed this commit to my dev branch if you would rather use that.

You could try increasing the per-process and system-wide open file descriptor limits.
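For reference, a minimal sketch of inspecting and raising the per-process limit programmatically (the soft limit can be raised up to the hard limit without extra privilege; raising the hard limit requires root). The system-wide counterpart on Linux is the fs.file-max sysctl:

    #include <stdio.h>
    #include <sys/resource.h>

    int main (void)
    {
        struct rlimit rl;

        if (getrlimit (RLIMIT_NOFILE, &rl) != 0) {
            perror ("getrlimit");
            return 1;
        }
        printf ("RLIMIT_NOFILE: soft=%llu hard=%llu\n",
                (unsigned long long) rl.rlim_cur,
                (unsigned long long) rl.rlim_max);

        rl.rlim_cur = rl.rlim_max;      /* raise soft limit to the hard limit */
        if (setrlimit (RLIMIT_NOFILE, &rl) != 0) {
            perror ("setrlimit");
            return 1;
        }
        return 0;
    }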

It would be useful to know how frequently this is occurring. Thousands of times per second? hour? day? week?

Does this happen in occasional bursts, or is it a steady state? Is the system out of memory when it occurs?

How many worker threads are you running?

It might also be useful to know your distro version and kernel version.

Re: "/" partition is filling up. That is precisely why logs should be written to a separate partition.

@dun dun added this to the 0.5.15 milestone Mar 10, 2021
@dun dun closed this as completed in ae78dcc Mar 10, 2021