fix(#773): wait for WatchTask goroutines in Runner.Close() #776
Merged
status.WatchTask spawns the periodic status and checkpoint dumpers as fire-and-forget goroutines that exit only when their parent ctx is done. Runner.Close() did not wait for those goroutines, so a late DumpCheckpoint INSERT could land after Close() returned, racing with test-side mutations of the checkpoint table.

In TestResumeFromCheckpointCleanupOnFailure this manifested as the checkpoint UPDATE being out-raced by a background INSERT: the new row had a higher auto-increment id and a real binlog name, so the resumed migration's `ORDER BY id DESC LIMIT 1` picked the valid row and set usedResumeFromCheckpoint=true.

Have WatchTask return a wait function. Both callers (pkg/migration, pkg/move) store it and invoke it from Close() (after cancelling their ctx) before tearing down DB connections. Close() also now invokes the runner's cancelFunc explicitly so the background goroutines are guaranteed to observe ctx.Done() even on paths that bypass Run's deferred cancel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
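A minimal sketch of the pattern this commit message describes, not the project's actual code. `watchTask`, `runner`, `newRunner` and `writesAfterClose` are hypothetical names, and an atomic counter stands in for the checkpoint INSERTs. It shows the ordering the fix relies on: cancel the context, wait for the background loops, and only then tear down.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// watchTask is a hypothetical stand-in for status.WatchTask. It starts two
// periodic loops (think: status dumper and checkpoint dumper) and returns a
// function that blocks until both have exited. The loops stop only when ctx
// is cancelled.
func watchTask(ctx context.Context, interval time.Duration, dump func()) (wait func()) {
	var wg sync.WaitGroup
	loop := func() {
		defer wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				dump()
			}
		}
	}
	wg.Add(2) // registered before the goroutines start
	go loop()
	go loop()
	return wg.Wait
}

// runner is a hypothetical, stripped-down Runner.
type runner struct {
	cancel context.CancelFunc
	wait   func()
	writes atomic.Int64 // stands in for checkpoint INSERTs
}

func newRunner() *runner {
	ctx, cancel := context.WithCancel(context.Background())
	r := &runner{cancel: cancel}
	r.wait = watchTask(ctx, time.Millisecond, func() { r.writes.Add(1) })
	return r
}

// Close cancels first so the loops observe ctx.Done(), then waits for them.
// Only after wait returns would DB connections be released.
func (r *runner) Close() {
	r.cancel()
	r.wait()
	// DB teardown would go here.
}

// writesAfterClose closes r, lets some time pass, and reports how many
// writes landed after Close returned.
func writesAfterClose(r *runner) int64 {
	r.Close()
	before := r.writes.Load()
	time.Sleep(10 * time.Millisecond)
	return r.writes.Load() - before
}

func main() {
	r := newRunner()
	time.Sleep(10 * time.Millisecond) // let the loops tick a few times
	fmt.Println("writes after Close:", writesAfterClose(r)) // prints 0
}
```

Dropping the `r.wait()` call from `Close` brings back the window in which a dump can still run after `Close` has returned.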
Cleaner than wg.Add(N) + per-goroutine defer wg.Done(). Available since Go 1.25; this module is on 1.26.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
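The commit title is not shown here, but the description matches `sync.WaitGroup.Go`, which was added in Go 1.25; treat that reading as an assumption. The sketch below spells out the bookkeeping that method wraps. `goTracked` and `runAll` are hypothetical helpers, written by hand so the example also builds on older toolchains; on Go 1.25 or later, `wg.Go(f)` could replace the `goTracked(&wg, f)` call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// goTracked is a hypothetical helper that mirrors what sync.WaitGroup.Go
// does: register the goroutine with Add(1) before it starts, and call
// Done() when f returns.
func goTracked(wg *sync.WaitGroup, f func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		f()
	}()
}

// runAll starts n tasks and returns how many had completed once Wait
// returned.
func runAll(n int) int64 {
	var wg sync.WaitGroup
	var done atomic.Int64
	for i := 0; i < n; i++ {
		goTracked(&wg, func() { done.Add(1) })
	}
	wg.Wait()
	return done.Load()
}

func main() {
	fmt.Println(runAll(2)) // prints 2
}
```

Keeping the Add and Done calls inside a single helper means the count passed to Add cannot drift from the number of goroutines actually started.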
aparajon approved these changes on May 1, 2026
Summary
`status.WatchTask` spawns the periodic status and checkpoint dumpers as fire-and-forget goroutines that exit only when their parent ctx is done. `Runner.Close()` did not wait for them: a late `DumpCheckpoint` INSERT could land after `Close()` returned, racing with test-side mutations of the checkpoint table.

In `TestResumeFromCheckpointCleanupOnFailure` this manifests as the checkpoint UPDATE being out-raced by a background INSERT: the new row has a higher auto-increment id and a real binlog name, so the resumed migration's `ORDER BY id DESC LIMIT 1` picks the valid row and sets `usedResumeFromCheckpoint=true`, failing `require.False(t, m2.usedResumeFromCheckpoint)`.

Have `WatchTask` return a wait function. Both callers (`pkg/migration`, `pkg/move`) store it and invoke it from `Close()` (after cancelling their ctx) before tearing down DB connections. `Close()` also now invokes the runner's `cancelFunc` explicitly so the background goroutines are guaranteed to observe `ctx.Done()` even on paths that bypass `Run`'s deferred cancel.

Test plan
- `go test -run TestResumeFromCheckpointCleanupOnFailure -count=5 ./pkg/migration/`
- `go test -run TestResumeFrom -count=2 ./pkg/migration/` (all resume tests, ~130s)
- `go test -short ./pkg/migration/...` (no regressions; only pre-existing privilege-test failures unrelated to this change)
- `go build ./... && go vet ./...`

🤖 Generated with Claude Code
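To make the failure mode from the summary concrete, here is a small in-memory model of the checkpoint table. Everything in it is an assumption for illustration: the `checkpoint` and `table` types, the `resumeSees` helper, the binlog names, and the `"invalid"` marker the simulated test writes. It only shows why a late INSERT defeats an earlier UPDATE when the reader selects the row with the highest id.

```go
package main

import (
	"fmt"
	"sort"
)

// checkpoint is a hypothetical, simplified row of the checkpoint table.
type checkpoint struct {
	id         int // auto-increment primary key
	binlogName string
}

// table models the checkpoint table together with its auto-increment counter.
type table struct {
	rows   []checkpoint
	nextID int
}

// insert appends a row with the next auto-increment id.
func (t *table) insert(binlogName string) {
	t.nextID++
	t.rows = append(t.rows, checkpoint{id: t.nextID, binlogName: binlogName})
}

// updateAll mimics a test-side UPDATE that rewrites every existing row.
func (t *table) updateAll(binlogName string) {
	for i := range t.rows {
		t.rows[i].binlogName = binlogName
	}
}

// latest mimics `ORDER BY id DESC LIMIT 1`. It assumes at least one row.
func (t *table) latest() checkpoint {
	sorted := append([]checkpoint(nil), t.rows...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].id > sorted[j].id })
	return sorted[0]
}

// resumeSees returns the binlog name a resumed migration would read.
// lateInsert simulates a background dumper writing after Close() returned.
func resumeSees(lateInsert bool) string {
	t := &table{}
	t.insert("binlog.000001") // checkpoint written during the first run
	t.updateAll("invalid")    // test corrupts the checkpoint on purpose
	if lateInsert {
		t.insert("binlog.000002") // stray write from a still-running goroutine
	}
	return t.latest().binlogName
}

func main() {
	fmt.Println(resumeSees(false)) // prints invalid
	fmt.Println(resumeSees(true))  // prints binlog.000002
}
```

With the late insert, the reader never sees the corrupted row, which is how the resumed migration ended up using a checkpoint the test expected it to reject.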