test(#630): use WithTestThrottler for race-window tests #816

Merged
morgo merged 3 commits into block:main from morgo:fix-issue-630-test-throttler-races on May 3, 2026

Conversation

@morgo
Collaborator

@morgo morgo commented May 3, 2026

Summary

  • Fixes #630 (flaky test: TestResumeFromCheckpointE2EWithManualSentinel). The first migration in TestResumeFromCheckpointE2EWithManualSentinel was sometimes finishing CopyRows before the test's cancel() arrived, so the second migration had no checkpoint to resume from and didn't reach WaitingOnSentinelTable within the 30s Eventually window. Adding WithTestThrottler() keeps the first migration predictably slow, so the cancel lands while it is still in CopyRows.
  • Audited the rest of pkg/migration/*_test.go for the same hazard: a goroutine waits for status >= CopyRows then runs a fixed-iteration DML loop. Five tests had this pattern without the throttler — they were relying on table size + CI hardware speed for the race window. Added the throttler to all of them so timing is bounded by BlockWait (1s/chunk) instead of row count:

Tests deliberately skipped: TestMigrationCancelledFromTableModification (DDL detection only requires < CutOver, much wider window), runCutoverAtomicityTest (throttler would defeat the cutover-race intent), and the WithDeferCutOver/sentinel-blocked tests (sentinel itself bounds timing).

Test plan

  • CI green across MySQL 8.0 / 8.0.45 / 8.4 / 9.7 / Aurora variants
  • Run the touched tests in a tight loop to confirm no regressions:
    • go test ./pkg/migration -run 'TestResumeFromCheckpointE2EWithManualSentinel|TestEnumReorder|TestSetReorder|TestAlterPKIntToBigIntWithDML|TestAlterPKIntToBigIntWithDMLAndAdditionalColumnChange|TestCheckConstraintWithDML' -count=20

🤖 Generated with Claude Code

morgo and others added 3 commits May 3, 2026 07:57
Tests that rely on the migration staying in CopyRows long enough for a
concurrent goroutine to do its work were depending on table size for
timing. On fast hardware (or when SeedRows produced fewer rows than
intended, as in block#630) the migration could race past CopyRows before the
goroutine finished, leading to "Condition never satisfied" timeouts or
silently degraded test coverage.

WithTestThrottler installs a Mock throttler whose BlockWait sleeps 1s
per chunk, making the CopyRows window predictable regardless of CI
load. Apply it to the five tests that were missing it but follow the
same race-window pattern as TestResumeFromCheckpointE2E (which has
already used the throttler successfully):

- TestResumeFromCheckpointE2EWithManualSentinel — fixes block#630
- testEnumReorder / testSetReorder
- testAlterPKIntToBigIntWithDML
- testAlterPKIntToBigIntWithDMLAndAdditionalColumnChange
- TestCheckConstraintWithDML

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@morgo morgo merged commit cd7d687 into block:main May 3, 2026
12 of 13 checks passed

Development

Successfully merging this pull request may close these issues.

flaky test: TestResumeFromCheckpointE2EWithManualSentinel

2 participants