Try to avoid getting stuck when draining the journal #36254
Conversation
When draining the journal, if the output message channel is full, stop reading. This keeps us from blocking indefinitely when a client hangs up on us and the channel is full. Doing this requires that we make multiple calls to drainJournal(), even in non-follow mode.

In non-follow mode, keep trying to read until we run out of entries or hit the "until" cutoff, which, if not specified, we set to "now". If we don't do that, and we start reading but can't keep up with the journal, we'll never catch up, even after we get to journal messages which were logged long after we started looking for the then-present logs.

The journal API opens the journal files when the journal handle is opened by sd_journal_open(), and uses an inotify descriptor to notice when files have been added or removed, but it doesn't set up that inotify descriptor until the first time that sd_journal_get_fd() is called (either by a client, or as part of sd_journal_wait()). We hadn't been doing that until our initial read-through of entries was done, meaning that we've been missing file deletion events that occurred during our first pass at reading the journal. Make that window shorter.

Periodically call sd_journal_process() when reading the journal, to let it skip over and close handles to journal files which have been deleted since we started reading the journal, for cases where our keeping them open contributes to a shortage of disk space.

Treat SD_JOURNAL_INVALIDATE like SD_JOURNAL_APPEND, since sd_journal_process() prefers to return INVALIDATE over APPEND when both are applicable.

Clean up a deferred function call in the journal reading logic.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
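The "stop reading when the output channel is full" behavior described above can be sketched in plain Go with a non-blocking send. This is a hedged illustration, not the PR's actual code: the function name trySend and the string message type are hypothetical stand-ins for the logger's own message plumbing.

```go
package main

import "fmt"

// trySend attempts a non-blocking send on msgs. It returns false when the
// channel is full, signalling the caller to stop draining for now and retry
// later, rather than blocking forever on a client that has hung up.
func trySend(msgs chan<- string, m string) bool {
	select {
	case msgs <- m:
		return true
	default: // channel full: give up instead of blocking
		return false
	}
}

func main() {
	msgs := make(chan string, 1)
	fmt.Println(trySend(msgs, "a")) // true: buffer had room
	fmt.Println(trySend(msgs, "b")) // false: buffer full, reader stops
}
```

The select-with-default idiom is what makes the send non-blocking; a plain `msgs <- m` would block indefinitely once the buffer fills.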
ping @cpuguy83 PTAL
Wondering if we can add integration tests for the journald logger
duration := 10 * time.Millisecond
timer := time.NewTimer(duration)
drainCatchup:
for C.sd_journal_next(j) > 0 {
Shouldn't there be a difference in handling between EOF (return value 0) and error cases (negative values) here?
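The distinction the reviewer is asking about can be sketched cgo-free. Here drain and the injected next callback are hypothetical stand-ins for the real loop around C.sd_journal_next(j), whose contract is: positive means an entry was read, zero means EOF, negative means an error.

```go
package main

import "fmt"

// drain reads entries until next reports EOF (0) or an error (< 0).
// next stands in for C.sd_journal_next(j): > 0 means an entry was read.
func drain(next func() int) (entries int, err error) {
	for {
		rc := next()
		switch {
		case rc > 0:
			entries++
		case rc == 0: // EOF: no more entries for now, not a failure
			return entries, nil
		default: // negative: a real error, handled differently from EOF
			return entries, fmt.Errorf("sd_journal_next: %d", rc)
		}
	}
}

func main() {
	codes := []int{1, 1, 0} // two entries, then EOF
	i := 0
	n, err := drain(func() int { rc := codes[i]; i++; return rc })
	fmt.Println(n, err) // 2 <nil>
}
```

A bare `for next() > 0` collapses both cases into "stop", which is exactly the ambiguity the comment points out.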
following = true
}
} else {
// In case we stopped reading because the output channel was
For my clarification: Did you mean the "output buffer"?
👍 on adding an integration test to cover the leak.
// indication that there are no more entries for us to read, or
// until we cross the "until" threshold, or until we get told
// to close the reader, whichever comes first.
duration := 10 * time.Millisecond
Is this an arbitrary frequency to drain? If this is purely to prevent the client from indefinite blocking scenarios, then I'm thinking that we can increase the duration.
following = true
}
} else {
// In case we stopped reading because the output channel was
Shouldn't we be checking the done bool returned by drainJournal in L517 to determine if we are done, rather than ignoring it?
// We know which entry was read last, so try to go to that
// location.
rc := C.sd_journal_seek_cursor(j, oldCursor)
if rc != 0 {
Nit: Would be nice to stick to one style through this code: either rc < 0 (preferable) or rc != 0.
drain:
// Give the journal handle an opportunity to close any open descriptors
// for files that have been removed.
C.sd_journal_process(j)
Err check?
// for a while. Letting the journal library process them will close any that
// are already deleted, so that we'll skip over them and allow space that would
// have been reclaimed by deleting these files to actually be reclaimed.
if sent > 0 && sent%1024 == 0 {
nit: Since this is garbage collection logic, it'll be easier to read if we can extract this out into a function.
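The extraction the reviewer suggests could look roughly like this. The function name maybeReleaseDeletedFiles is hypothetical, and the injected process callback stands in for the real C.sd_journal_process(j) call.

```go
package main

import "fmt"

// gcInterval matches the 1024-entry cadence in the condition under review.
const gcInterval = 1024

// maybeReleaseDeletedFiles calls process once every gcInterval entries so the
// journal library can close descriptors for files deleted since we started
// reading, letting the kernel actually reclaim their disk space.
func maybeReleaseDeletedFiles(sent int, process func()) {
	if sent > 0 && sent%gcInterval == 0 {
		process()
	}
}

func main() {
	calls := 0
	for sent := 0; sent <= 2048; sent++ {
		maybeReleaseDeletedFiles(sent, func() { calls++ })
	}
	fmt.Println(calls) // 2: triggered at sent == 1024 and sent == 2048
}
```

Pulling the condition into a named function also gives the `sent > 0` guard (which skips the pointless call at zero entries) a place to be documented.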
ping @nalind could you address the review comments?
@nalind CI build fails:
And there are reviews above which need to be addressed.
addresses #27343 and possibly docker/for-linux#575
Just found a more review-able version of this PR here: projectatomic#298. Looking...
closing in favour of #38859
- What I did
Try to fix dockerd-keeps-deleted-journal-files-open-forever problems by keeping the reading goroutine from blocking indefinitely.
- How I did it
When draining the journal, if the output message channel is full, stop reading, at least for now. This keeps us from blocking indefinitely when a client hangs up on us and the channel is full. Doing this requires that we make multiple calls to drainJournal(), even in non-follow mode.

In non-follow mode, keep trying to read until we run out of entries or hit the "until" cutoff, which, if not specified, we set to "now". If we don't do that, and we start reading but can't keep up with the journal, we'll never catch up, even after we get to journal messages which were logged long after we started looking for the then-present logs.

The journal API opens the journal files when the journal handle is opened by sd_journal_open(), and uses an inotify descriptor to notice when files have been added or removed, but it doesn't set up that inotify descriptor until the first time that sd_journal_get_fd() is called (either by a client, or as part of sd_journal_wait()). We hadn't been doing that until our initial read-through of entries was done, meaning that we've been missing file deletion events that occurred during our first pass at reading the journal. Make that window shorter.

Periodically call sd_journal_process() when reading the journal, to let it skip over and close handles to journal files which have been deleted since we started reading the journal, for cases where our keeping them open contributes to a shortage of disk space.

Treat SD_JOURNAL_INVALIDATE like SD_JOURNAL_APPEND, since sd_journal_process() prefers to return INVALIDATE over APPEND when both are applicable. This obsoletes #36064, because sd_journal_process()'s handling of inotify events should make closing and reopening the handle in response to an INVALIDATE status unnecessary.

Clean up a deferred function call in the journal reading logic.
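The INVALIDATE-like-APPEND treatment boils down to a status check. This is a hedged sketch, not the PR's code: the constants mirror systemd's SD_JOURNAL_NOP / SD_JOURNAL_APPEND / SD_JOURNAL_INVALIDATE wake-up statuses, and shouldRead is a hypothetical name.

```go
package main

import "fmt"

// Stand-ins for systemd's wake-up statuses: SD_JOURNAL_NOP (nothing
// happened), SD_JOURNAL_APPEND (entries were appended), and
// SD_JOURNAL_INVALIDATE (files were added or removed).
const (
	statusNop        = 0
	statusAppend     = 1
	statusInvalidate = 2
)

// shouldRead treats INVALIDATE like APPEND: sd_journal_process() prefers to
// report INVALIDATE when both apply, and in either case new entries may be
// waiting, so the reader should attempt another drain pass.
func shouldRead(status int) bool {
	return status == statusAppend || status == statusInvalidate
}

func main() {
	fmt.Println(shouldRead(statusAppend), shouldRead(statusInvalidate), shouldRead(statusNop))
	// true true false
}
```

Treating only APPEND as "new data" would miss entries whenever a file rotation caused the status to come back as INVALIDATE instead.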
- How to verify it
One way to force a reasonably high turnover rate in the journal is to set RateLimitBurst=0 and SystemMaxFileSize=10M in /etc/systemd/journald.conf, and use the busybox image running seq with a large number (say, a million or ten million) as its argument.

Running docker logs -f should follow the container's output until the container exits or the daemon closes the logging handle internally. docker logs should follow along until the container exits or it hits the end of the log. After the container exits, either command should run to completion and not hang.

If either command is paused at the client (using ctrl-s, waiting several seconds, then ctrl-q to unpause) while running, possibly with an intervening journalctl --vacuum-size=1 pruning down the size of the journal, despite any gaps in the output, the ordering between messages received by the client should be preserved. If instead of unpausing the client, the client is stopped with ctrl-c, open descriptors that dockerd was using for reading the journal should be closed.

At pretty much any time, we shouldn't be holding open descriptors to deleted journal files for more than a fraction of a second.
- Description for the changelog
journald log reader: avoid blocking indefinitely / holding open deleted files
- A picture of a cute animal (not mandatory but encouraged)