Skip to content

image: deflake TestCopyCtx_MidStreamCancel#41

Merged
deveshctl merged 1 commit into
mainfrom
fix/copyctx-test-race
Jun 2, 2026
Merged

image: deflake TestCopyCtx_MidStreamCancel#41
deveshctl merged 1 commit into
mainfrom
fix/copyctx-test-race

Conversation

@deveshctl
Copy link
Copy Markdown
Owner

What

Fix the intermittent failure of TestCopyCtx_MidStreamCancel that surfaced on main after #39 and #40 merged in sequence.

Why

The test's blockingReader returned io.EOF on its second Read once the test called close(br.release). But copyCtx returns nil on EOF (line 26-27 of image/copyctx.go) before its next top-of-loop ctx.Err() check. So this scheduling order produced the failure:

  1. Copy goroutine writes the first chunk.
  2. Copy goroutine loops, passes ctx.Err() (still nil), enters second src.Read.
  3. Test sees dst.Len() == len(first), calls cancel(), then close(br.release).
  4. Reader returns (0, io.EOF).
  5. copyCtx sees rerr == io.EOF → returns (written, nil).

Result: error = <nil>, want context.Canceled. Both #39 and #40's PR runs passed in isolation; the failure only showed on main's run for #40 because of CI scheduling.

Fix

Test-only. Give blockingReader a context, and have its second Read block on <-ctx.Done() then return ctx.Err() instead of io.EOF. Now whichever side observes the cancel first yields context.Canceled to copyCtx:

  • copyCtx's top-of-loop ctx.Err() returns it directly, or
  • the reader's <-ctx.Done() unblocks and returns ctx.Err(), which copyCtx propagates via the rerr branch (line 29-30).

Either way, cerr == context.Canceled deterministically. No change to production copyCtx.

Verification

  • go build ./... and go vet ./... clean locally (Windows dev-machine constraint precludes go test).
  • CI build-and-test job will verify on Linux.

The test's blockingReader returned io.EOF on its second Read after
close(br.release). But copyCtx returns nil on EOF (line 26-27) before
its next top-of-loop ctx.Err() check, so a scheduling order where the
copy goroutine entered the second Read *before* the test called
cancel() would observe EOF first and return nil — exactly the
intermittent failure seen on main once #39 and #40 landed.

Make the reader's second Read block on <-ctx.Done() and return
ctx.Err() instead. Now whichever side observes the cancel first
yields context.Canceled to copyCtx: either the top-of-loop
ctx.Err() check, or the rerr branch propagating the reader's
ctx.Err(). Test-only change; production copyCtx is untouched.
@deveshctl deveshctl merged commit ffeb9b7 into main Jun 2, 2026
7 checks passed
@deveshctl deveshctl deleted the fix/copyctx-test-race branch June 2, 2026 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant