
race condition in logs -f with journald driver #10323

Closed
vrothberg opened this issue May 12, 2021 · 4 comments · Fixed by #10431
Assignees: vrothberg
Labels: In Progress (this issue is actively being worked by the assignee; please do not work on it at this time), locked - please file new issue/PR (assists humans wanting to comment on an old issue or PR with locked comments)

Comments

@vrothberg
Member

Reproducer from #10222 (comment):

$ while :;do ./bin/podman run --log-driver=journald -d --name foo quay.io/libpod/testimage:20210427 sh -c 'echo hi;sleep 2;echo bye';./bin/podman logs -f foo;./bin/podman rm foo;done
caf4dce7ce3ac5d08da339a8c0ecd4ec97df498cd472fbff61182352e71cf831
hi     <---- there is no subsequent bye!
caf4dce7ce3ac5d08da339a8c0ecd4ec97df498cd472fbff61182352e71cf831
08f58203f738027300d651fed861f1673eb8a1a953b869c788d905664b709938
hi
bye

I know where the error is and will drop a // FIXME in libpod/container_logs_linux.go (and a link to the code once #10222 is merged).

@vrothberg
Member Author

@rhatdan FYI

@vrothberg
Member Author

As pointed out in https://github.com/containers/podman/pull/10222/files#diff-20cc30e1cdf302ef7404e5923eada3912c68c8b8943c0a7a0a834b29236eba69R92, using the Follow API is racy. To get it right, we have to implement a custom follow function that forwards everything from stdout and stderr UNTIL we read on the journal that the container died (i.e., we get the died event).

I looked at the journal, and the died event is always printed after the logs are written. The problem at the moment is that we have too many things running concurrently. Having one goroutine read the log and filter out what's necessary until it reads the died event seems like one way to avoid that race.
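
For illustration, a minimal sketch of that single-goroutine loop, using go-systemd's sdjournal bindings. The journal field names (CONTAINER_ID_FULL for conmon's log lines, PODMAN_ID/PODMAN_EVENT for podman's events) mirror what podman and conmon write, but treat them and the loop itself as assumptions for illustration, not the actual libpod code:

package main

import (
	"fmt"
	"os"

	"github.com/coreos/go-systemd/v22/sdjournal"
)

// followLogs streams a container's log lines from the journal and
// returns once it reads the container's "died" event. Because the logs
// and the events are read from a single journal cursor in one
// goroutine, the died event cannot overtake unread log lines.
func followLogs(containerID string) error {
	j, err := sdjournal.NewJournal()
	if err != nil {
		return err
	}
	defer j.Close()

	// Match the container's log entries OR its lifecycle events
	// (field names are assumptions, see above).
	if err := j.AddMatch("CONTAINER_ID_FULL=" + containerID); err != nil {
		return err
	}
	if err := j.AddDisjunction(); err != nil {
		return err
	}
	if err := j.AddMatch("PODMAN_ID=" + containerID); err != nil {
		return err
	}

	for {
		n, err := j.Next()
		if err != nil {
			return err
		}
		if n == 0 {
			// Caught up with the journal; block until it grows.
			j.Wait(sdjournal.IndefiniteWait)
			continue
		}
		entry, err := j.GetEntry()
		if err != nil {
			return err
		}
		if event, ok := entry.Fields["PODMAN_EVENT"]; ok {
			if event == "died" {
				return nil // all logs were written before this event
			}
			continue // other lifecycle events are not log lines
		}
		if msg, ok := entry.Fields["MESSAGE"]; ok {
			fmt.Println(msg)
		}
	}
}

func main() {
	if err := followLogs(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}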

@mheon
Member

mheon commented May 12, 2021 via email

@vrothberg
Member Author

vrothberg commented May 13, 2021 via email

@vrothberg vrothberg self-assigned this May 21, 2021
@vrothberg vrothberg added the In Progress label May 21, 2021
vrothberg added a commit to vrothberg/libpod that referenced this issue May 21, 2021
Fix a race in the journald driver.  Following the logs implies
streaming until the container is dead.  Streaming happened in one
goroutine; waiting for the container to exit/die and signaling that
event happened in another goroutine.

Having two goroutines running simultaneously is pretty much the core
of the race condition.  When the streaming goroutine received the
signal that the container had exited, it may not yet have read and
written all of the container's logs.

Fix this race by reading both the logs and the events of the
container, and stop streaming once the died/exited event has been
read.  The died event is guaranteed to come after all logs in the
journal, which guarantees not only consistency but also deterministic
behavior.

Note that the journald log driver now requires the journald event
backend to be set.

Fixes: containers#10323
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
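
For reference, the journald event backend mentioned in the commit message is selected via the events_logger option in containers.conf; a minimal sketch (the file location varies, e.g. /etc/containers/containers.conf system-wide or ~/.config/containers/containers.conf per user):

[engine]
# Write podman events to the journal so that `podman logs -f` can read
# the container's died event from the same journal it reads logs from.
events_logger = "journald"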
vrothberg added further commits to vrothberg/libpod that referenced this issue on May 21, May 25, and May 26, 2021, and rugk pushed a commit to rugk/podman that referenced this issue Jul 9, 2021, all with the same commit message as above.
@github-actions github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023