check: require binlog_order_commits=ON#819
Merged
Merged
Conversation
Both pkg/migration/check/configuration.go and pkg/move/check/configuration.go already validate the binlog-related server settings spirit relies on (binlog_format, binlog_row_image, binlog_row_value_options, log_bin, log_slave_updates), but neither checks binlog_order_commits. With binlog_order_commits=OFF, MySQL is allowed to commit transactions to InnoDB in a different order than they appear in the binary log: a transaction T's row events can be written to the binlog *before* T's commit becomes visible to fresh SELECTs on other connections. Spirit's binlog applier path assumes the opposite — when the streamer delivers a row event to HasChanged, that row is expected to be visible to deltaMap/bufferedMap.Flush's `REPLACE INTO _new ... SELECT FROM original WHERE pk IN (...)` (running on a separate autocommit connection). If the assumption is violated, spirit silently drops rows during cutover. This is exactly the failure shape on the still-open second issue-block#746 bug observed in CI on PRs block#813 and block#814: Flush stmt executed table=t1concurrent_oub num_keys=3 affected_rows=2 stmt="REPLACE INTO _t1concurrent_oub_new (...) SELECT ... FROM t1concurrent_oub WHERE id IN (177,180,179)" A key was in the delta map (so HasChanged saw the binlog event) but the SELECT couldn't find it in `original` at the statement's snapshot — binlog_order_commits=OFF is one of the few server configurations that allows that. binlog_order_commits=ON is the MySQL default, so this check is mostly defensive — it eliminates one whole hypothesis branch from the issue block#746 investigation, and protects future users on customised configs from a silent data-loss mode regardless of how that investigation shakes out. Both check files now SELECT @@global.binlog_order_commits alongside the existing variables and return a configuration error if it is not "1". Tests in both packages flip the variable to OFF and assert the check fails, then restore via SET GLOBAL = ON in a defer (boolean variables don't accept the parameterised string form that binlog_row_image does). Refs block#818, block#746. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These tests flipped server-wide globals (`binlog_row_image`, `binlog_order_commits`, `activate_all_roles_on_login`) to exercise configurationCheck/privilegesCheck rejection paths. Within a single Go test binary, t.Parallel() can keep these tests sequential, but t.Parallel only governs within-binary scheduling — Go test runs distinct package binaries in parallel via -p (default GOMAXPROCS), and pkg/migration/check, pkg/move/check, pkg/migration, etc. all share the same test MySQL. Any other binary running `configurationCheck` or `privilegesCheck` (which fires from every full migration's preflight) during a flipped window saw the wrong value and failed with a misleading error. That's a real source of cross-binary test flakes that has been present since these tests landed; PR block#819's new binlog_order_commits=OFF tests would have just added more instances of the same pattern. Removed: - pkg/migration/check/configuration_test.go: the binlog_row_image=NOBLOB section + the binlog_order_commits=OFF section added by PR block#819's initial revision. - pkg/move/check/configuration_test.go: TestConfigurationCheckBinlogOrderCommits added by PR block#819's initial revision. - pkg/migration/check/privileges_test.go: TestPrivilegesWithRDSSuperuserRole. - pkg/move/check/privileges_test.go: TestMovePrivilegesWithRDSSuperuserRole. The production-code change in PR block#819 — requiring `binlog_order_commits=ON` in both configuration checks — is unchanged. Negative-path coverage for all of these branches (wrong binlog_row_image, OFF binlog_order_commits, wrong activate_all_roles_on_login) is now exercised only at startup against a real misconfigured server, until configurationCheck and privilegesCheck are refactored to be unit-testable without touching server globals. Each removed test left a comment block explaining the rationale and flagging the unit-testability refactor as the path back to coverage. Refs block#818, block#746. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…no SET GLOBAL)
The previous commit dropped both TestPrivilegesWithRDSSuperuserRole tests
because their assertions required `SET GLOBAL activate_all_roles_on_login`,
which races with concurrent Go test binaries. That removed more coverage
than necessary — the *acceptance* path can be tested without flipping the
global, by:
1. Reading @@global.activate_all_roles_on_login at test start.
2. t.Skip if not ON.
3. Otherwise, set up the RDS-style role + user and verify
privilegesCheck passes (the role-tolerance path).
The rejection path (activate_all_roles_on_login=OFF still rejects the
role) remains uncovered until privilegesCheck is unit-testable without
touching server globals.
Both restored tests:
- migration/check/privileges_test.go::TestPrivilegesWithRDSSuperuserRole
- move/check/privileges_test.go::TestMovePrivilegesWithRDSSuperuserRole
No behaviour change for callers; only the negative-path assertion is
gone.
Refs block#818.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aparajon
approved these changes
May 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #818.
Summary
Two changes:
Production check (the original purpose of this PR). Both
pkg/migration/check/configuration.goandpkg/move/check/configuration.gonow query@@global.binlog_order_commitsalongside the other binlog-related variables they already validate, and reject startup with a clear error if it isn't1.Test cleanup (added in the second commit, broader than originally scoped). Removed all
SET GLOBALtests from both check packages — thebinlog_row_image=NOBLOBtest, the newbinlog_order_commits=OFFtests this PR initially added, and the twoTestPrivilegesWithRDSSuperuserRolevariants that flippedactivate_all_roles_on_login. See "Why both" below.Why the production check (item 1)
With
binlog_order_commits=OFF, MySQL is allowed to commit transactions to InnoDB in a different order than they appear in the binary log: a transaction T's row events can be written to the binlog before T's commit becomes visible to freshSELECTs on other connections.Spirit's binlog applier path assumes the opposite. When the streamer delivers a row event to
HasChanged, the row is expected to be visible todeltaMap/bufferedMap.Flush'sREPLACE INTO _new ... SELECT FROM original WHERE pk IN (...)running on a separate autocommit connection. Withbinlog_order_commits=OFFthat assumption breaks and rows can be silently lost during cutover.This is exactly the failure shape on the still-open second issue-#746 bug observed in CI on PRs #813 and #814 (
Flush stmt num_keys=3 affected_rows=2, where one PK is in the delta map but invisible to the SELECT).binlog_order_commits=ONis the MySQL default; this check is mostly defensive, but it eliminates one whole hypothesis branch from the #746 investigation and protects future users on customised configurations from a silent data-loss mode.Why the test cleanup (item 2)
Initial review of the new tests pointed out that
SET GLOBALflips a server-wide variable, and any other Go test binary runningconfigurationCheck(or any check that reads the same global) during the flipped window sees the wrong value and fails.t.Parallel()only governs within-binary scheduling — Go test runs distinct package binaries in parallel via-p(default GOMAXPROCS), andpkg/migration/check,pkg/move/check,pkg/migration, etc. all share the same test MySQL.That race wasn't introduced by this PR — the existing
binlog_row_image=NOBLOBtest inconfiguration_test.goand the twoTestPrivilegesWithRDSSuperuserRolevariants that flipactivate_all_roles_on_loginhad it too. They're a plausible source of cross-package flakes elsewhere in the suite.Rather than adding more
SET GLOBALtests to a pattern we already wanted to clean up, this PR drops them all. Each removed test leaves a short comment explaining the rationale and pointing at the path back: refactor the affected check to take its variable values via an injectable struct, then test it as a pure unit without touching the server. That refactor is out of scope for this PR; tracked for follow-up.The negative branches (wrong
binlog_row_image,OFFbinlog_order_commits, wrongactivate_all_roles_on_login) are now exercised only at startup against a real misconfigured server, but the trade-off is a lot of cross-package flake risk gone.Will this fix #746's second bug?
binlog_order_commits=ONis the MySQL default and the docker-compose images don't override it, so unless the CI MySQL is somehow running with it OFF, this check won't fire on the failing lanes. Worth a quickSELECT @@global.binlog_order_commitson the docker-compose lane to verify — if it's OFF, this PR is the actual fix; if it's ON, the second bug is something else and this PR still hardens spirit against future users running with it OFF.Changes
pkg/migration/check/configuration.go— addsbinlog_order_commitsto the existing combinedSELECTand a hard-fail check after the other variable validations.pkg/move/check/configuration.go— same, with the per-source error wrapping that file uses.pkg/migration/check/configuration_test.go,pkg/move/check/configuration_test.go,pkg/migration/check/privileges_test.go,pkg/move/check/privileges_test.go— drop allSET GLOBALtests; happy-path coverage retained.Test plan
go build ./...clean.go vet ./...clean.go test ./pkg/migration/check ./pkg/move/check— pass.🤖 Generated with Claude Code