fix: live log view stalls on busy containers with rotated logs #4776
Merged
A container streaming from a point well past its start (because its log backlog was rotated away) feeds skipOrphanedLines lines that all look like orphaned group continuations: simple, level-less, and spaced under maxGroupTimeDelta. At a sustained high rate peek() never reports a gap, so the skip loop buffers every line forever and the UI shows 'Container has no logs yet' until a timing gap finally flushes it.

Bound the orphan run to maxOrphanLines (1000). Past that it clearly isn't a leftover fragment, so emit the buffered lines as singles and resume normal processing.

Verified against a container flooding ~5k lines/sec with a rotated backlog: live logs went from 0 messages in 40s to streaming immediately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fix looks correct. A few notes:
No bugs, security issues, or blockers found.
Problem
Opening a busy container in the live log view can show "Container has no logs yet" for a long time even though the container is actively logging, and then the logs suddenly load. This reproduces on a container flooding ~5k lines/sec whose backlog has rotated (json-file `max-size`).
Root cause
The live stream replays from the container's `StartedAt`. `skipOrphanedLines()` runs first to drain leading continuation lines left over from a group split at a prior fetch boundary. It short-circuits only when the first line is within 5s of `StartedAt`. When the backlog has rotated, the oldest available line is far past the start, so it does not short-circuit.

It then treats every simple, level-less line spaced under `maxGroupTimeDelta` (50ms) as an orphan and buffers it. At a sustained high rate, `peek()` always has a line within its 50ms timeout, so it never reports a gap and the loop buffers forever, emitting nothing. It only flushes when a timing gap finally appears, which is the "eventually loads" behavior.

Verified with curl against the real container: 0 messages in 40s on master.
Fix
Bound the orphan run with `maxOrphanLines` (1000). A genuine leftover fragment is the tail of one group, which is small; a run this long is real content, not orphans. Past the cap, emit the buffered lines as singles and resume normal processing.

After the fix, the same flooding container goes from 0 messages in 40s to streaming immediately (315k messages in 5s).
Tests
`TestEventGenerator_doesNotStallOnSustainedLevellessStream`: a sustained level-less stream (first line far from `StartedAt`) must emit within 5s. It times out on master and passes with the cap. The full `internal/container` suite passes with `-race`.

🤖 Generated with Claude Code