Try to avoid getting stuck when draining the journal #36254
Conversation
When draining the journal, if the output message channel is full, stop reading. This keeps us from blocking indefinitely when a client hangs up on us and the channel is full. Doing this requires that we make multiple calls to drainJournal(), even in non-follow mode.

In non-follow mode, keep trying to read until we run out of entries or hit the "until" cutoff, which, if not specified, we set to "now". If we don't do that, and we start reading but can't keep up with the journal, we'll never catch up, even after we get to journal messages which were logged long after we started looking for the then-present logs.

The journal API opens the journal files when the journal handle is opened by sd_journal_open(), and uses an inotify descriptor to notice when files have been added or removed, but it doesn't set up that inotify descriptor until the first time that sd_journal_get_fd() is called (either by a client, or as part of sd_journal_wait()). We hadn't been doing that until our initial read-through of entries was done, meaning that we've been missing file deletion events that occurred during our first pass at reading the journal. Make that window shorter.

Periodically call sd_journal_process() when reading the journal, to let it skip over and close handles to journal files which have been deleted since we started reading the journal, for cases where our keeping them open contributes to a shortage of disk space.

Treat SD_JOURNAL_INVALIDATE like SD_JOURNAL_APPEND, since sd_journal_process() prefers to return INVALIDATE over APPEND when both are applicable.

Clean up a deferred function call in the journal reading logic.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
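The "stop reading when the output channel is full" behavior described above can be sketched in plain Go with a non-blocking send. This is a hedged illustration, not the PR's actual code: the function name trySend and the string message type are hypothetical stand-ins for the logger's own message plumbing.

```go
package main

import "fmt"

// trySend attempts a non-blocking send on msgs. It returns false when the
// channel is full, signalling the caller to stop draining for now and retry
// later, rather than blocking forever on a client that has hung up.
func trySend(msgs chan<- string, m string) bool {
	select {
	case msgs <- m:
		return true
	default: // channel full: give up instead of blocking
		return false
	}
}

func main() {
	msgs := make(chan string, 1)
	fmt.Println(trySend(msgs, "a")) // true: buffer had room
	fmt.Println(trySend(msgs, "b")) // false: buffer full, reader stops
}
```

The select-with-default idiom is what makes the send non-blocking; a plain `msgs <- m` would block indefinitely once the buffer fills.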
ping @cpuguy83 PTAL
Wondering if we can add integration tests for the journald logger
duration := 10 * time.Millisecond
timer := time.NewTimer(duration)
drainCatchup:
for C.sd_journal_next(j) > 0 {
Shouldn't there be a difference in handling between EOF (return value 0) and error cases (negative values) here?
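The distinction the reviewer is asking about can be sketched cgo-free. Here drain and the injected next callback are hypothetical stand-ins for the real loop around C.sd_journal_next(j), whose contract is: positive means an entry was read, zero means EOF, negative means an error.

```go
package main

import "fmt"

// drain reads entries until next reports EOF (0) or an error (< 0).
// next stands in for C.sd_journal_next(j): > 0 means an entry was read.
func drain(next func() int) (entries int, err error) {
	for {
		rc := next()
		switch {
		case rc > 0:
			entries++
		case rc == 0: // EOF: no more entries for now, not a failure
			return entries, nil
		default: // negative: a real error, handled differently from EOF
			return entries, fmt.Errorf("sd_journal_next: %d", rc)
		}
	}
}

func main() {
	codes := []int{1, 1, 0} // two entries, then EOF
	i := 0
	n, err := drain(func() int { rc := codes[i]; i++; return rc })
	fmt.Println(n, err) // 2 <nil>
}
```

A bare `for next() > 0` collapses both cases into "stop", which is exactly the ambiguity the comment points out.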
following = true
}
} else {
// In case we stopped reading because the output channel was
For my clarification: Did you mean the "output buffer"?
👍 on adding an integration test to cover the leak.
// indication that there are no more entries for us to read, or
// until we cross the "until" threshold, or until we get told
// to close the reader, whichever comes first.
duration := 10 * time.Millisecond
Is this an arbitrary frequency to drain? If this is purely to prevent the client from indefinite blocking scenarios, then I'm thinking that we can increase the duration.
following = true
}
} else {
// In case we stopped reading because the output channel was
Shouldn't we be checking the done bool returned by drainJournal in L517 to determine if we are done, rather than ignoring it?
// We know which entry was read last, so try to go to that
// location.
rc := C.sd_journal_seek_cursor(j, oldCursor)
if rc != 0 {
Nit: Would be nice to stick to one style through this code: either rc < 0 (preferable) or rc != 0.
drain:
// Give the journal handle an opportunity to close any open descriptors
// for files that have been removed.
C.sd_journal_process(j)
Err check?
// for a while. Letting the journal library process them will close any that
// are already deleted, so that we'll skip over them and allow space that would
// have been reclaimed by deleting these files to actually be reclaimed.
if sent > 0 && sent%1024 == 0 {
nit: Since this is garbage collection logic, it'll be easier to read if we can extract this out into a function.
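The extraction the reviewer suggests could look roughly like this. The function name maybeReleaseDeletedFiles is hypothetical, and the injected process callback stands in for the real C.sd_journal_process(j) call.

```go
package main

import "fmt"

// gcInterval matches the 1024-entry cadence in the condition under review.
const gcInterval = 1024

// maybeReleaseDeletedFiles calls process once every gcInterval entries so the
// journal library can close descriptors for files deleted since we started
// reading, letting the kernel actually reclaim their disk space.
func maybeReleaseDeletedFiles(sent int, process func()) {
	if sent > 0 && sent%gcInterval == 0 {
		process()
	}
}

func main() {
	calls := 0
	for sent := 0; sent <= 2048; sent++ {
		maybeReleaseDeletedFiles(sent, func() { calls++ })
	}
	fmt.Println(calls) // 2: triggered at sent == 1024 and sent == 2048
}
```

Pulling the condition into a named function also gives the `sent > 0` guard (which skips the pointless call at zero entries) a place to be documented.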
ping @nalind could you address the review comments?
@nalind CI build fails:
And there are reviews above which need to be addressed.
addresses #27343 and possibly docker/for-linux#575
Just found a more review-able version of this PR here: projectatomic#298. Looking...
closing in favour of #38859
- What I did
Try to fix dockerd-keeps-deleted-journal-files-open-forever problems by keeping the reading goroutine from blocking indefinitely.
- How I did it
When draining the journal, if the output message channel is full, stop reading, at least for now. This keeps us from blocking indefinitely when a client hangs up on us and the channel is full. Doing this requires that we make multiple calls to drainJournal(), even in non-follow mode.

In non-follow mode, keep trying to read until we run out of entries or hit the "until" cutoff, which, if not specified, we set to "now". If we don't do that, and we start reading but can't keep up with the journal, we'll never catch up, even after we get to journal messages which were logged long after we started looking for the then-present logs.

The journal API opens the journal files when the journal handle is opened by sd_journal_open(), and uses an inotify descriptor to notice when files have been added or removed, but it doesn't set up that inotify descriptor until the first time that sd_journal_get_fd() is called (either by a client, or as part of sd_journal_wait()). We hadn't been doing that until our initial read-through of entries was done, meaning that we've been missing file deletion events that occurred during our first pass at reading the journal. Make that window shorter.

Periodically call sd_journal_process() when reading the journal, to let it skip over and close handles to journal files which have been deleted since we started reading the journal, for cases where our keeping them open contributes to a shortage of disk space.

Treat SD_JOURNAL_INVALIDATE like SD_JOURNAL_APPEND, since sd_journal_process() prefers to return INVALIDATE over APPEND when both are applicable. This obsoletes #36064, because sd_journal_process()'s handling of inotify events should make closing and reopening the handle in response to an INVALIDATE status unnecessary.

Clean up a deferred function call in the journal reading logic.
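The INVALIDATE-like-APPEND treatment boils down to a status check. This is a hedged sketch, not the PR's code: the constants mirror systemd's SD_JOURNAL_NOP / SD_JOURNAL_APPEND / SD_JOURNAL_INVALIDATE wake-up statuses, and shouldRead is a hypothetical name.

```go
package main

import "fmt"

// Stand-ins for systemd's wake-up statuses: SD_JOURNAL_NOP (nothing
// happened), SD_JOURNAL_APPEND (entries were appended), and
// SD_JOURNAL_INVALIDATE (files were added or removed).
const (
	statusNop        = 0
	statusAppend     = 1
	statusInvalidate = 2
)

// shouldRead treats INVALIDATE like APPEND: sd_journal_process() prefers to
// report INVALIDATE when both apply, and in either case new entries may be
// waiting, so the reader should attempt another drain pass.
func shouldRead(status int) bool {
	return status == statusAppend || status == statusInvalidate
}

func main() {
	fmt.Println(shouldRead(statusAppend), shouldRead(statusInvalidate), shouldRead(statusNop))
	// true true false
}
```

Treating only APPEND as "new data" would miss entries whenever a file rotation caused the status to come back as INVALIDATE instead.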
- How to verify it
One way to force a reasonably high turnover rate in the journal is to set RateLimitBurst=0 and SystemMaxFileSize=10M in /etc/systemd/journald.conf, and use the busybox image running seq with a large number (say, a million or ten million) as its argument.

Running docker logs -f should follow the container's output until the container exits or the daemon closes the logging handle internally. docker logs should follow along until the container exits or it hits the end of the log. After the container exits, either command should run to completion and not hang.

If either command is paused at the client (using ctrl-s, waiting several seconds, then ctrl-q to unpause) while running, possibly with an intervening journalctl --vacuum-size=1 pruning down the size of the journal, despite any gaps in the output, the ordering between messages received by the client should be preserved. If instead of unpausing the client, the client is stopped with ctrl-c, open descriptors that dockerd was using for reading the journal should be closed.

At pretty much any time, we shouldn't be holding open descriptors to deleted journal files for more than a fraction of a second.
- Description for the changelog
journald log reader: avoid blocking indefinitely / holding open deleted files
- A picture of a cute animal (not mandatory but encouraged)