
Fix flaky migration integration tests #696

Merged: morgo merged 2 commits into main from aparajon/fix-flaky-cutover-test on Apr 22, 2026

@aparajon (Collaborator) commented Apr 20, 2026

What's this?

Fixes two intermittently failing integration tests in pkg/migration that have been blocking CI on unrelated PRs.

Changes

TestCutoverAtomicityWithConcurrentWrites

  • Wrapped the existing checksum comparison in assert.Eventually (5s timeout, 250ms poll) to tolerate brief replication lag after the partial cutover
  • The original single-shot comparison could catch a transient mismatch when residual replication events hadn't fully settled in CI Docker containers
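
The difference between the single-shot comparison and the polled one can be sketched roughly as follows. This is a stdlib-only illustration, not the test code from this PR: `eventually` is a hypothetical stand-in that loosely mirrors the timeout-and-tick behaviour of testify's `assert.Eventually`, and `laggingReplica` is an invented fake that simulates checksums matching only after a few polls. The 5s and 250ms values are the ones quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// eventually is a hypothetical stand-in for testify's assert.Eventually.
// It polls condition every tick until it returns true or waitFor elapses.
func eventually(condition func() bool, waitFor, tick time.Duration) bool {
	deadline := time.Now().Add(waitFor)
	for {
		if condition() {
			return true
		}
		// Give up if the next poll would land after the deadline.
		if time.Now().Add(tick).After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// laggingReplica is an invented fake: its checksums "match" only after
// pollsUntilSettled comparisons, standing in for residual replication
// events that have not been applied yet.
type laggingReplica struct {
	pollsUntilSettled int
	polls             int
}

// checksumsMatch simulates one source-vs-target checksum comparison.
func (r *laggingReplica) checksumsMatch() bool {
	r.polls++
	return r.polls > r.pollsUntilSettled
}

// settlesWithin reports whether a replica that needs pollsUntilSettled
// polls converges within the given budget (both in milliseconds).
func settlesWithin(pollsUntilSettled, waitForMs, tickMs int) bool {
	r := &laggingReplica{pollsUntilSettled: pollsUntilSettled}
	return eventually(
		r.checksumsMatch,
		time.Duration(waitForMs)*time.Millisecond,
		time.Duration(tickMs)*time.Millisecond,
	)
}

func main() {
	// A single-shot comparison sees the transient mismatch and fails.
	oneShot := (&laggingReplica{pollsUntilSettled: 3}).checksumsMatch()
	fmt.Println("single-shot match:", oneShot)

	// Polling with the PR's values (5s timeout, 250ms tick) tolerates the lag.
	fmt.Println("eventual match:", settlesWithin(3, 5000, 250))
}
```

A genuine divergence still fails under this pattern, because the condition never becomes true before the timeout; only mismatches that resolve within the window are tolerated.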

TestResumeFromCheckpointE2EWithManualSentinel

  • Reduced test data from ~400k rows (4x 100k) to ~100k rows (2x 50k)
  • The resumed migration was timing out at 30s with only ~70% of rows copied on slow CI runners
  • 100k rows is still plenty for the first migration to be mid-copy when checkpointed

Verification

Both tests passed in 5 of 5 local runs against Docker MySQL with replication.


Most of this was written by Claude Code — I just provided direction.

Replace the one-shot manual CRC32 checksum comparison with
assert.Eventually and CHECKSUM TABLE. This handles the case where
residual replication events haven't fully settled by the time the
test compares tables, which was causing intermittent failures in CI
due to variable Docker container I/O latency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Armand Parajon <armand@squareup.com>
@aparajon changed the title from "Fix flaky TestCutoverAtomicityWithConcurrentWrites" to "Fix flaky test TestCutoverAtomicityWithConcurrentWrites" Apr 20, 2026
@aparajon force-pushed the aparajon/fix-flaky-cutover-test branch 2 times, most recently from 968964c to aabab82, April 20, 2026 22:14
@aparajon changed the title from "Fix flaky test TestCutoverAtomicityWithConcurrentWrites" to "Fix flaky migration integration tests" Apr 20, 2026
@aparajon marked this pull request as ready for review April 20, 2026 22:44
Wrap the checksum comparison in assert.Eventually (5s timeout, 250ms
poll) to tolerate brief replication lag after the partial cutover.
The original single-shot comparison could catch a transient mismatch
when residual replication events hadn't fully settled in CI Docker
containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Armand Parajon <armand@squareup.com>
@aparajon force-pushed the aparajon/fix-flaky-cutover-test branch from aabab82 to 9ab22fe, April 20, 2026 23:10
@morgo self-requested a review April 22, 2026 13:09
@morgo (Collaborator) left a comment

Let's try it. Always hard to know with flaky tests.

@morgo merged commit cb8ce33 into main Apr 22, 2026
15 checks passed