Skip to content

syncwriter writes sink errors to raw stderr via println, corrupting go test/test2json output #225

@EhabY

Description

@EhabY

Summary

internal/syncwriter reports sink write errors by calling the builtin println, which writes unsynchronized bytes to fd 2 (stderr). When slog runs inside a test binary, those raw writes are merged into the stream that go test/test2json parses. Under concurrency they interleave inside another test's --- PASS:/--- FAIL: line, so test2json never records that test's terminal action and tools like gotestsum report the (passing) test as unknown/failed. This causes false-positive CI flakes where the blamed test is an innocent bystander.

We hit this in coder/coder: a passing TestRetryWithInterval was reported as a failure purely due to this corruption.

Where (v3.0.0)

internal/syncwriter/syncwriter.go:

func New(w io.Writer) *Writer {
    return &Writer{
        w: w,
        errorf: func(f string, v ...interface{}) {
            println(fmt.Sprintf(f, v...)) // raw, unsynchronized write to fd 2
        },
    }
}

func (w *Writer) Write(name string, p []byte) {
    ...
    if err != nil {
        w.errorf("%v: failed to write entry: %+v", name, err)
    }
}

Trigger: any sink whose underlying writer can fail, e.g. a sloghuman.Sink over an io.Pipe whose reader has gone away (io.ErrClosedPipe). In coder this happens with coder port-forward -v (and similar) when background goroutines keep logging after a test closes the pipe.

Evidence (deterministic)

1. slog emits the error into the captured test stream. A sloghuman.Sink over a closed io.Pipe, then one logger.Warn, run under go test -json:

{"Action":"output","Output":"sloghuman: failed to write entry: io: read/write on closed pipe\n"}

2. println writes the text and the newline as separate syscalls, so under concurrency the framework's --- PASS: write lands between them. Real CI output from coder/coder:

sloghuman: failed to write entry: io: read/write on closed pipe--- PASS: TestRetryWithInterval (0.00s)

3. Feeding that exact byte pattern to go tool test2json drops the pass action:

  • clean --- PASS: TestVictim (0.00s) -> pass TestVictim
  • sloghuman: ...closed pipe--- PASS: TestVictim (0.00s) -> no terminal action (parsed as plain output), so gotestsum reports TestVictim as unknown.

Impact

Any project using slog in tests can have unrelated, passing tests randomly reported as failed/unknown whenever a sink's writer is closed (pipes, network writers, broken stdio, etc.). It is very hard to diagnose because the failing test is not the one that logged.

Proposed fix

Don't write sink diagnostics to raw stderr in a way that can corrupt captured output:

  • Minimal: in Writer.Write, suppress benign "destination closed" errors (io.ErrClosedPipe, os.ErrClosed, syscall.EPIPE), mirroring the existing Sync() suppression of EINVAL/ENOTTY/EBADF.
  • Better: make the error handler configurable (default no-op, or route through the logger itself) instead of an unconditional builtin println to fd 2.

Happy to send a PR.


Filed by Coder Agents on behalf of @EhabY.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions