Skip to content

Commit

Permalink
Revisit "Logging stopped due to error" msg
Browse files Browse the repository at this point in the history
The sharness test suite triggered several "Logging stopped due to
error" messages in /var/log/messages on FreeBSD, NetBSD, and OpenBSD.
These messages were not observed on any of the Linux test systems.

In log.c:_log_aux(), the formatted log message buffer is written to the
logfile via fprintf().  If this call fails, the aforementioned message
is logged via syslog() and further writes to the logfile are disabled.

The sharness test suite repeatedly starts and stops munged by sending
a SIGTERM (typically via "munged --stop").  On BSD, this SIGTERM
is occasionally interrupting fprintf() and causing it to fail with
errno=EINTR or with errno left unset.

Isolated tests of sending multiple SIGTERMs to a process calling
fprintf() in a loop showed 4KB of output being dropped when fprintf()
returns an error.  Afterwards, writes to the output stream resumed;
consequently, disabling writes to the logfile (at least for transient
errors like EINTR) is unwarranted.

This commit makes the following changes:
1) The syslog() message includes the errno string from strerror()
   describing the error, and errno is initialized beforehand since
   it is not reliably set on an fprintf() error.
2) The syslog() message notes that messages may have been dropped.
3) The syslog() priority is reduced to LOG_ERR.
4) The "got_fprintf_error" flag is added to prevent repeated messages
   to syslog() while fprintf() is failing; this flag is cleared on
   a successful return from fprintf().
5) Further writes to the logfile are no longer disabled on error.

The syslog() message does not include the name of the logfile.
This is because the logging subsystem is given a FILE * and does not
have a record of the filename attached to the stream.  Furthermore,
a connection to the system logger is not opened beforehand with
openlog() when logging to file, so the log message may not include
the pid (via the syslog option LOG_PID); however, note that FreeBSD
enables this option by default.  The lack of a filename and/or pid
can make it difficult to identify the logfile affected if multiple
munged processes are running concurrently.

Tested:
- Debian 10.2
- FreeBSD 12.1-RELEASE-p1, 12.0-RELEASE-p12, 11.3-RELEASE-p5
- NetBSD 8.1, 7.2
- OpenBSD 6.6, 6.5

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
  • Loading branch information
dun committed Nov 22, 2019
1 parent 5646cf1 commit 6176b42
Showing 1 changed file with 13 additions and 4 deletions.
17 changes: 13 additions & 4 deletions src/libcommon/log.c
Expand Up @@ -64,6 +64,7 @@ struct log_ctx {
FILE *fp;
int got_init;
int got_syslog;
int got_fprintf_error;
int priority;
int options;
char id [LOG_IDENTITY_MAXLEN];
Expand All @@ -74,7 +75,7 @@ struct log_ctx {
* Static Variables
*****************************************************************************/

static struct log_ctx log_ctx = { NULL, 0, 0, 0, 0, { '\0' } };
static struct log_ctx log_ctx = { NULL, 0, 0, 0, 0, 0, { '\0' } };


/*****************************************************************************
Expand Down Expand Up @@ -370,15 +371,23 @@ _log_aux (int errnum, int priority, char *msgbuf, int msgbuflen,
msgbuf[0] = '\0';
}
}
/* Log this!
/* Log message.
*/
if (log_ctx.got_syslog && sbuf) {
syslog (priority, "%s", sbuf);
}
if (log_ctx.fp && (priority <= log_ctx.priority)) {
errno = 0;
if (fprintf (log_ctx.fp, "%s", buf) == EOF) {
syslog (LOG_CRIT, "Logging stopped due to error");
log_ctx.fp = NULL;
if (!log_ctx.got_fprintf_error) {
syslog (LOG_ERR,
"Failed logfile write: %s: messages may have been dropped",
(errno != 0) ? strerror (errno) : "Unspecified error");
log_ctx.got_fprintf_error = 1;
}
}
else if (log_ctx.got_fprintf_error) {
log_ctx.got_fprintf_error = 0;
}
}
return;
Expand Down

0 comments on commit 6176b42

Please sign in to comment.