Fixes for reading journald logs #38859
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##             master   #38859   +/-   ##
=========================================
  Coverage          ?   37.35%
=========================================
  Files             ?      609
  Lines             ?    45209
  Branches          ?        0
=========================================
  Hits              ?    16890
  Misses            ?    26028
  Partials          ?     2291
=========================================
```
also @anusha-ragunathan PTAL 🤗
When `--tail=0` is used, there is an error:
$ docker logs -f test
DEBU[2019-06-14T16:59:24.370312365Z] Calling GET /_ping
DEBU[2019-06-14T16:59:24.370767564Z] Calling GET /v1.39/containers/test/logs?follow=1&stderr=1&stdout=1&tail=all
DEBU[2019-06-14T16:59:24.370917064Z] begin logs container=test method="(*Daemon).ContainerLogs" module=daemon
DEBU[2019-06-14T16:59:24.371271163Z] Calling GET /v1.39/containers/test/json
ERRO[2019-06-14T16:59:24.371765463Z] Error streaming logs: error skipping to next journal entryUnknown error -1 container=test method="(*Daemon).ContainerLogs" module=daemon
DEBU[2019-06-14T16:59:24.371812863Z] end logs (<nil>) container=test method="(*Daemon).ContainerLogs" module=daemon
error from daemon in stream: Error grabbing logs: error skipping to next journal entryUnknown error -1
(force-pushed from 9fd3138 to ff31507)
@cpuguy83 this is actually OK, looks like I made a stupid typo in the last commit; fixed now. I have also removed that unused …
(force-pushed from 2c59582 to 7965469)
Interesting failure on RS5 🤔 https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS5-Process/2796/console
kicking CI
I've seen it on all PRs recently; not sure what is going on in there, except that it's broken globally.
I'm commenting on this PR in the hope it might be relevant. We are running several AWS ECS instances and have noticed that dockerd is holding almost two million file descriptors to deleted -json.log files. We are currently running 18.06.1-ce. Is any of the work done in this PR likely to impact the other log drivers? Maybe improve this situation? All the other issues I found referencing this type of FD leak have fixes in versions far older than what we are running.
@deinspanjer No, these changes will not affect the json log driver. Please open an issue and describe your reproduction steps so we can debug and determine if there's still a problem. Thanks!
@kolyshkin You are right. Basically the default value for …
So there's still something to fix in this PR? @kolyshkin, were you working on that?
Clean up a deferred function call in the journal reading logic. Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Minor code simplification. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case the Tail=N parameter is requested, we need to show N lines. It does not make sense to walk backwards one by one if we can do it at once. Now, if Since=T is also provided, make sure we haven't jumped too far (before T), and if we did, move forward.

The primary motivation for this was to make the code simpler.

This also fixes a tiny bug in the "since" implementation. Before this commit:

> $ docker logs -t --tail=6000 --since="2019-03-10T03:54:25.00" $ID | head
> 2019-03-10T03:54:24.999821000Z 95981

After:

> $ docker logs -t --tail=6000 --since="2019-03-10T03:54:25.00" $ID | head
> 2019-03-10T03:54:25.000013000Z 95982

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
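A minimal sketch of this jump-back-and-adjust logic, using the libsystemd C API via cgo; `seekLastN` and its parameters are illustrative names, not the PR's actual code:

```go
package main

/*
#cgo pkg-config: libsystemd
#include <systemd/sd-journal.h>
*/
import "C"

import "fmt"

// seekLastN positions j at the first of the last n entries; then, if
// sinceUsec is non-zero, moves forward past any entries older than it.
func seekLastN(j *C.sd_journal, n uint64, sinceUsec uint64) error {
	// Jump to the very end of the journal...
	if rc := C.sd_journal_seek_tail(j); rc < 0 {
		return fmt.Errorf("seek_tail: error %d", rc)
	}
	// ...and step back n entries in a single call, instead of walking
	// backwards one entry at a time.
	if rc := C.sd_journal_previous_skip(j, C.uint64_t(n)); rc < 0 {
		return fmt.Errorf("previous_skip: error %d", rc)
	}
	for sinceUsec > 0 {
		var ts C.uint64_t
		if rc := C.sd_journal_get_realtime_usec(j, &ts); rc < 0 {
			return fmt.Errorf("get_realtime_usec: error %d", rc)
		}
		if uint64(ts) >= sinceUsec {
			break // we did not jump too far back; done
		}
		// We jumped to before --since; move forward.
		if rc := C.sd_journal_next(j); rc <= 0 {
			break // no more entries
		}
	}
	return nil
}

func main() {
	var j *C.sd_journal
	if rc := C.sd_journal_open(&j, C.SD_JOURNAL_LOCAL_ONLY); rc < 0 {
		panic("sd_journal_open failed")
	}
	defer C.sd_journal_close(j)
	if err := seekLastN(j, 10, 0); err != nil {
		panic(err)
	}
	fmt.Println("positioned at the first of the last 10 entries")
}
```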
In case the LogConsumer is gone, the code that sends the message can get stuck forever. Wrap the code in a select-case, as all other loggers do.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
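The wrap-in-select pattern looks roughly like this; `logWatcher` here is a minimal stand-in for moby's `logger.LogWatcher` (which provides a `Msg` channel and `WatchConsumerGone()`):

```go
package main

import (
	"fmt"
	"time"
)

// logWatcher is a minimal stand-in for moby's logger.LogWatcher.
type logWatcher struct {
	Msg          chan string
	consumerGone chan struct{}
}

func (w *logWatcher) WatchConsumerGone() <-chan struct{} { return w.consumerGone }

// sendMessage blocks until the message is consumed, but gives up as soon
// as the consumer is gone instead of getting stuck forever.
func sendMessage(w *logWatcher, msg string) bool {
	select {
	case w.Msg <- msg:
		return true
	case <-w.WatchConsumerGone():
		return false
	}
}

func main() {
	w := &logWatcher{Msg: make(chan string), consumerGone: make(chan struct{})}
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(w.consumerGone) // nobody is reading w.Msg anymore
	}()
	fmt.Println("delivered:", sendMessage(w, "hello")) // prints "delivered: false"
}
```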
1. The journald client library initializes inotify watch(es) during the first call to sd_journal_get_fd(), so it makes sense to open it earlier in order not to lose any journal file rotation events.
2. It only makes sense to call this if we're going to use it later on -- so add a check for config.Follow.
3. Remove the redundant call to sd_journal_get_fd().

NOTE that any subsequent calls to sd_journal_get_fd() return the same file descriptor, so there's no real need to save it for later use in wait_for_data_cancelable().

Based on an earlier patch by Nalin Dahyabhai <nalin@redhat.com>.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
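In sketch form, the ordering described in points 1-3 looks something like this (libsystemd via cgo; the `follow` flag and `openJournal` helper are illustrative, not the PR's actual code):

```go
package main

/*
#cgo pkg-config: libsystemd
#include <systemd/sd-journal.h>
*/
import "C"

import "errors"

// openJournal opens the journal and, only when we are going to follow it,
// primes the inotify watches right away so that no journal file rotation
// events are lost before the follow loop starts.
func openJournal(follow bool) (*C.sd_journal, error) {
	var j *C.sd_journal
	if C.sd_journal_open(&j, C.SD_JOURNAL_LOCAL_ONLY) < 0 {
		return nil, errors.New("error opening journal")
	}
	if follow {
		// The first sd_journal_get_fd() call sets up the inotify watches;
		// later calls just return the same descriptor, so there is no
		// need to save the return value for reuse.
		if C.sd_journal_get_fd(j) < 0 {
			C.sd_journal_close(j)
			return nil, errors.New("error getting journal fd")
		}
	}
	return j, nil
}

func main() {
	j, err := openJournal(true)
	if err != nil {
		panic(err)
	}
	C.sd_journal_close(j)
}
```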
TL;DR: simplify the code, fix --follow hanging indefinitely.

Do the following to simplify the followJournal() code:

1. Use Go-native select instead of C-native polling.
2. Use Watch{Producer,Consumer}Gone(), eliminating the need for the journald.closed variable and an extra goroutine.
3. Use sd_journal_wait(). In the words of its own man page:

> A synchronous alternative for using sd_journal_get_fd(), sd_journal_get_events(), sd_journal_get_timeout() and sd_journal_process() is sd_journal_wait().

Unfortunately, the logic is still not as simple as it could be; the reason being, once the container has exited, journald might still be writing some logs from its internal buffers onto journal file(s), and there is no way to tell when it has finished, so we are not guaranteed to read all of it back.

This bug can be reproduced with something like:

> $ ID=$(docker run -d busybox seq 1 150000); docker logs --follow $ID
> ...
> 128123
> $

(The last expected output line should be `150000`.)

To avoid exiting from followJournal() early, add the following logic: once the container is gone, keep trying to drain the journal until there's no new data for at least the `waitTimeout` time period.

Should fix docker/for-linux#575.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
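A self-contained sketch of the drain-until-quiet idea; `read` and `producerGone` are illustrative stand-ins for the PR's internals, not its actual API:

```go
package main

import (
	"fmt"
	"time"
)

// drainUntilQuiet keeps calling read() and only returns once the producer
// is gone AND no new data has shown up for waitTimeout -- mirroring the
// "journald may still be flushing its buffers" logic described above.
func drainUntilQuiet(read func() (string, bool), producerGone <-chan struct{}, waitTimeout time.Duration) {
	deadline := time.Now().Add(waitTimeout)
	for {
		if msg, ok := read(); ok {
			fmt.Println(msg)
			deadline = time.Now().Add(waitTimeout) // got data: reset the timer
			continue
		}
		select {
		case <-producerGone:
			if time.Now().After(deadline) {
				return // producer gone and quiet for waitTimeout: done
			}
		default:
			// Producer still alive: keep waiting for more data.
		}
		time.Sleep(time.Millisecond) // stand-in for sd_journal_wait(3)
	}
}

func main() {
	msgs := make(chan string, 3)
	for _, m := range []string{"1", "2", "3"} {
		msgs <- m
	}
	gone := make(chan struct{})
	close(gone) // the container has already exited
	read := func() (string, bool) {
		select {
		case m := <-msgs:
			return m, true
		default:
			return "", false
		}
	}
	drainUntilQuiet(read, gone, 50*time.Millisecond)
}
```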
If we take a long time to process log messages, and during that time journal file rotation occurs, the journald client library will keep those rotated files open until sd_journal_process() is called.

By periodically calling sd_journal_process() during the processing loop, we shrink the window of time during which a client instance has open file descriptors for rotated (deleted) journal files.

This code is modelled after that of journalctl [1]; the above explanation, as well as the value of 1024, is taken from there.

[v2: fix CErr() argument]

[1] https://github.com/systemd/systemd/blob/dc16327c48d/src/journal/journalctl.c#L2676

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
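A sketch of the periodic call, modelled on the journalctl code linked above (the 1024 figure comes from there; the loop body is otherwise illustrative):

```go
package main

/*
#cgo pkg-config: libsystemd
#include <systemd/sd-journal.h>
*/
import "C"

// drain iterates over all available journal entries, nudging the client
// library every 1024 entries so it can process queued inotify events and
// close rotated (deleted) journal files it would otherwise keep open.
func drain(j *C.sd_journal) {
	for i := 0; C.sd_journal_next(j) > 0; i++ {
		// ... read and forward the current entry here ...
		if i%1024 == 1023 {
			C.sd_journal_process(j)
		}
	}
}

func main() {
	var j *C.sd_journal
	if C.sd_journal_open(&j, C.SD_JOURNAL_LOCAL_ONLY) < 0 {
		panic("sd_journal_open failed")
	}
	defer C.sd_journal_close(j)
	drain(j)
}
```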
At first glance, `docker logs --tail 0` does not make sense, as it is supposed to produce no output; but `tail -n 0` from GNU coreutils works like that, and there is even a test case for it (`TestLogsTail` in integration-cli/docker_cli_logs_test.go).

Now, something like `docker logs --follow --tail 0` makes total sense, so let's make it work.

(NOTE: if --tail is not used, config.Tail is set to -1.)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
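For illustration (hypothetical session; `$ID` is any running container):

```
$ docker logs --tail 0 $ID           # prints nothing, like `tail -n 0`
$ docker logs --follow --tail 0 $ID  # skips existing logs, streams new ones
```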
1. Use "in-place" variables for if statements to limit their scope to the respectful `if` block. 2. Report the error returned from sd_journal_* by using CErr(). 3. Use errors.New() instead of fmt.Errorf(). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@thaJeztah No, I fixed it as per #38859 (comment). Just checked, this one is all good and ready to fly. @cpuguy83 PTAL
Rebased to current master.
SGTM
not an expert on this, but I read the code changes and commit messages, and they look solid.
@cpuguy83 PTAL
LGTM
closes #39305
closes #36254
fixes #27343
fixes docker/for-linux#575
This series is an attempt to straighten out journald log reading (i.e. using `docker logs` for a container with the journald log driver). In particular, it should fix at least two issues (see the "closes/fixes" list above), including `docker logs --follow` not exiting once a container is gone.

In addition, it should improve code readability and fix some minor bugs, like:

- `docker logs --tail=0` not working properly (which seems to be useless until one adds `--follow`);
- `docker logs --since` being off by one record with respect to timestamps.

Finally, this PR reduces the code size a bit.
Testing notes

Note that the current CI does not test the journald log driver in any way (which might be the reason why it's in such an unfortunate state). Therefore, I have tested the results manually, and also by running all `TestLogs*` tests from `integration-cli/` (the result: the number of test case failures decreased substantially).

A potential followup to this PR would be to move the above-mentioned tests to integration, and make sure CI runs them with log drivers other than the default (`json-file`).

Known bugs
NOTE: these are not newly introduced bugs; they were there before and are not fixed by this PR.

1. `--follow` is impossible to implement entirely correctly due to the async nature of journald log writing; therefore it is possible for it to end before all the container logs have been shown. The likelihood of this depends on the system load (mostly I/O) as well as the value of `waitTimeout` in `followJournal()`.
2. Because of the above, `--follow` does not exit immediately; there is a waiting period which amounts to about a second.
3. Sometimes reading container logs right after the container has finished (such as in `docker wait $ID && docker logs $ID`) does not produce any output; the reason is the same async nature of journald writing.
4. In some corner cases the trailing `\n` is lost from the logs (see the repro sketch below). In the second run of that repro (the one with `seq 1 32768`), the trailing `\n` is lost. The local log driver has the same bug for `seq 1 16384` or more, and I guess it's not super critical.

This work is partially based on prior PR #36254 by @nalind.
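A rough reconstruction of the repro for item 4 (the exact original commands are not preserved; `od -c` is just one way to inspect trailing bytes, and the daemon is assumed to use the journald log driver):

```
$ ID=$(docker run -d busybox seq 1 16384); docker wait $ID; docker logs $ID | tail -c 8 | od -c
$ ID=$(docker run -d busybox seq 1 32768); docker wait $ID; docker logs $ID | tail -c 8 | od -c
```

In the second run, the final byte should turn out not to be `\n`.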
A picture of a cute animal
image source: https://www.youtube.com/watch?v=K9e3Ime75C0