fix(#773): wait for WatchTask goroutines in Runner.Close() by morgo · Pull Request #776 · block/spirit

morgo · 2026-05-01T21:20:15Z

Summary

Fixes flaky test: TestResumeFromCheckpointCleanupOnFailure #773
status.WatchTask spawns the periodic status and checkpoint dumpers as fire-and-forget goroutines that exit only when their parent ctx is done. Runner.Close() did not wait for them: a late DumpCheckpoint INSERT could land after Close() returned, racing with test-side mutations of the checkpoint table.
In TestResumeFromCheckpointCleanupOnFailure this manifests as the checkpoint UPDATE being out-raced by a background INSERT — the new row has a higher auto-increment id and a real binlog name, so the resumed migration's ORDER BY id DESC LIMIT 1 picks the valid row and sets usedResumeFromCheckpoint=true, failing require.False(t, m2.usedResumeFromCheckpoint).
Have WatchTask return a wait function. Both callers (pkg/migration, pkg/move) store it and invoke it from Close() (after cancelling their ctx) before tearing down DB connections. Close() also now invokes the runner's cancelFunc explicitly so the background goroutines are guaranteed to observe ctx.Done() even on paths that bypass Run's deferred cancel.
This is the broader of the two fixes morgo proposed in the issue (option 2). It eliminates this whole class of "late background-goroutine writes after Close" races, not just the specific symptom in flaky test: TestResumeFromCheckpointCleanupOnFailure #773.

Test plan

go test -run TestResumeFromCheckpointCleanupOnFailure -count=5 ./pkg/migration/
go test -run TestResumeFrom -count=2 ./pkg/migration/ (all resume tests, ~130s)
go test -short ./pkg/migration/... (no regressions; only pre-existing privilege-test failures unrelated to this change)
go build ./... && go vet ./...

🤖 Generated with Claude Code

status.WatchTask spawns the periodic status and checkpoint dumpers as fire-and-forget goroutines that exit only when their parent ctx is done. Runner.Close() did not wait for those goroutines: a late DumpCheckpoint INSERT could land after Close() returned, racing with test-side mutations of the checkpoint table. In TestResumeFromCheckpointCleanupOnFailure this manifested as the checkpoint UPDATE being "out-raced" by a background INSERT — the new row had a higher auto-increment id and a real binlog name, so the resumed migration's `ORDER BY id DESC LIMIT 1` picked the valid row and set usedResumeFromCheckpoint=true. Have WatchTask return a wait function. Both callers (pkg/migration, pkg/move) store it and invoke it from Close() (after cancelling their ctx) before tearing down DB connections. Close() also now invokes the runner's cancelFunc explicitly so the background goroutines are guaranteed to observe ctx.Done() even on paths that bypass Run's deferred cancel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cleaner than wg.Add(N) + per-goroutine defer wg.Done(). Available since Go 1.25; this module is on 1.26. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

morgo and others added 3 commits May 1, 2026 15:16

refactor: use sync.WaitGroup.Go in WatchTask

6763c81

Cleaner than wg.Add(N) + per-goroutine defer wg.Done(). Available since Go 1.25; this module is on 1.26. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'main' into fix-issue-773-cleanup-checkpoint-race

19de0e2

aparajon approved these changes May 1, 2026

View reviewed changes

morgo merged commit 1e6930d into block:main May 1, 2026
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(#773): wait for WatchTask goroutines in Runner.Close()#776

fix(#773): wait for WatchTask goroutines in Runner.Close()#776
morgo merged 3 commits into
block:mainfrom
morgo:fix-issue-773-cleanup-checkpoint-race

morgo commented May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

morgo commented May 1, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants