
fix: live log view stalls on busy containers with rotated logs #4776

Merged
amir20 merged 1 commit into master from fix/log-grouping-orphan-stall
Jun 2, 2026


@amir20
Owner

@amir20 amir20 commented Jun 2, 2026

Problem

Opening a busy container in the live log view can show "Container has no logs yet" for a long time even though the container is actively logging, then it suddenly loads. Reproduces on a container flooding ~5k lines/sec whose backlog has rotated (json-file max-size).

Root cause

The live stream replays from the container's StartedAt. skipOrphanedLines() runs first to drain leading continuation lines left over from a group split at a prior fetch boundary. It short-circuits only when the first line is within 5s of StartedAt. When the backlog has rotated, the oldest available line is far past the start, so it does not short-circuit.

It then treats every simple, level-less line spaced under maxGroupTimeDelta (50ms) as an orphan and buffers it. At a sustained high rate peek() always has a line within its 50ms timeout, so it never reports a gap and the loop buffers forever, emitting nothing. It only flushes when a timing gap finally appears, which is the "eventually loads."

Verified with curl against the real container: 0 messages in 40s on master.

Fix

Bound the orphan run with maxOrphanLines (1000). A genuine leftover fragment is the tail of one group, which is small; a run this long is real content, not orphans. Past the cap, emit the buffered lines as singles and resume normal processing.
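The cap can be sketched roughly as follows. This is an illustration under assumptions, not the merged change: `entry`, `gapBefore`, `leadingOrphans`, and `splitAtCap` are hypothetical names, and the path for a genuine short fragment (no cap hit) is deliberately not modelled. Only maxOrphanLines = 1000 and the "emit as singles, then resume" behaviour come from the description above.

```go
package main

const maxOrphanLines = 1000 // cap value from the description above

// entry is a hypothetical, simplified log line.
type entry struct {
	text      string
	level     string // empty means level-less
	gapBefore bool   // true when a timing gap precedes this line
}

// leadingOrphans returns how many leading lines were buffered as orphan
// candidates and whether the scan stopped because it reached the limit.
func leadingOrphans(lines []entry, limit int) (n int, capped bool) {
	for n < len(lines) {
		if n >= limit {
			return n, true // too long to be the tail of one group
		}
		l := lines[n]
		if l.level != "" || (n > 0 && l.gapBefore) {
			return n, false // a level or a gap ends the run normally
		}
		n++
	}
	return n, false
}

// splitAtCap applies the cap. When it is hit, the buffered lines come back as
// singles to emit and rest is what normal processing resumes with. When it is
// not hit, singles is nil; that path is outside this sketch.
func splitAtCap(lines []entry) (singles, rest []entry, capped bool) {
	n, capped := leadingOrphans(lines, maxOrphanLines)
	if capped {
		return lines[:n], lines[n:], true
	}
	return nil, lines[n:], false
}
```

The design point is that the bound depends only on a line count, so it holds no matter how fast the container logs.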

After the fix, the same flooding container goes from 0 messages in 40s to streaming immediately (315k messages in 5s).

Tests

TestEventGenerator_doesNotStallOnSustainedLevellessStream: a sustained level-less stream (first line far from StartedAt) must emit within 5s. Times out on master, passes with the cap. Full internal/container suite passes with -race.

🤖 Generated with Claude Code

A container streaming from a point well past its start (because its log
backlog was rotated away) feeds skipOrphanedLines lines that all look
like orphaned group continuations: simple, level-less, and spaced under
maxGroupTimeDelta. At a sustained high rate peek() never reports a gap,
so the skip loop buffers every line forever and the UI shows 'Container
has no logs yet' until a timing gap finally flushes it.

Bound the orphan run to maxOrphanLines (1000). Past that it clearly
isn't a leftover fragment, so emit the buffered lines as singles and
resume normal processing. Verified against a container flooding ~5k
lines/sec with a rotated backlog: live logs went from 0 messages in 40s
to streaming immediately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@claude

claude Bot commented Jun 2, 2026

fix looks correct. a few notes:

  • emitAsSingles return value ignored at the new cap check (event_generator.go:163), consistent with how it's ignored at the existing peek == nil path, so context cancellation is handled the same way in both places
  • steadyLevellessReader.Read() creates a time.After timer per call; fine since go 1.23+ unreferenced timers are gc-able even before they fire, and the project is on 1.25+
  • the 5s test timeout is conservative (cap triggers in ~1s at 1ms/line delay), which is good for flake resistance

no bugs, security issues, or blockers found.

@amir20 amir20 merged commit fd0a485 into master Jun 2, 2026
12 checks passed
@amir20 amir20 deleted the fix/log-grouping-orphan-stall branch June 2, 2026 15:05
