Test kill capability directly instead of parsing privilege labels#658
Test kill capability directly instead of parsing privilege labels#658aparajon wants to merge 1 commit into
Conversation
53c367a to
63a6dee
Compare
Spirit's privilege preflight check string-matches SHOW GRANTS for CONNECTION_ADMIN. This fails on managed MySQL services (e.g. RDS MySQL 8.4) where the capability is granted via roles without exposing the label in SHOW GRANTS output. Replace the CONNECTION_ADMIN string check with a KILL 0 probe: MySQL returns error 1094 (Unknown thread id) when the user has the privilege, or error 1095 (Access denied) when they don't. This tests the actual capability regardless of how it was granted. Also activate granted roles (SET ROLE ALL) before SHOW GRANTS so role-inherited privileges like REPLICATION CLIENT are visible.
63a6dee to
248c823
Compare
There was a problem hiding this comment.
Pull request overview
Updates Spirit’s MySQL privilege preflight to avoid relying solely on string-matching SHOW GRANTS for CONNECTION_ADMIN, improving compatibility with managed MySQL services where effective privileges can be inherited via roles and/or not shown in grants output.
Changes:
- Add
showGrantsWithRoles()to runSET ROLE ALL(session-scoped) beforeSHOW GRANTS, improving role-aware grant parsing. - Add
canKillConnections()fallback probe that verifies kill capability by creating a disposable “victim” connection from another user and attemptingKILL. - Add
TestCanKillConnectionscoverage for the kill capability probe behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/migration/check/privileges.go | Adds role-aware grant retrieval and a direct kill-capability probe fallback for force-kill preflight checks. |
| pkg/migration/check/privileges_test.go | Extends privilege tests and adds a new test validating the kill-capability probe. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| if _, err := db.ExecContext(ctx, fmt.Sprintf("CREATE USER %s", probeUser)); err != nil { | ||
| return fmt.Errorf("cannot verify kill capability (CREATE USER failed): %w", err) | ||
| } | ||
| defer func() { | ||
| _, _ = db.ExecContext(context.Background(), fmt.Sprintf("DROP USER IF EXISTS %s", probeUser)) | ||
| }() | ||
|
|
||
| // Connect as the probe user to create the victim connection. | ||
| victimCfg := gmysql.NewConfig() | ||
| victimCfg.User = probeUser | ||
| victimCfg.Net = "tcp" | ||
| victimCfg.Addr = host | ||
| victimDB, err := sql.Open("mysql", victimCfg.FormatDSN()) |
There was a problem hiding this comment.
The probe user is created without any authentication (CREATE USER _spirit_kill_probe) and then Spirit tries to connect with an empty password. Some MySQL configurations/policies disallow passwordless accounts, which would make the capability probe fail even when the caller has kill capability. Consider creating the probe user with a strong random password (and using it in the DSN) to make the probe work reliably across environments.
| errStr := killErr.Error() | ||
| if strings.Contains(errStr, "1094") || strings.Contains(errStr, "Unknown thread id") { |
There was a problem hiding this comment.
canKillConnections() determines whether the error is ER_NO_SUCH_THREAD by string-matching the error text ("1094" / "Unknown thread id"). This is brittle across drivers/locales and can misclassify errors. Prefer checking the concrete MySQL error type (e.g., *mysql.MySQLError) and comparing the numeric error code (1094/1095).
| errStr := killErr.Error() | |
| if strings.Contains(errStr, "1094") || strings.Contains(errStr, "Unknown thread id") { | |
| var mysqlErr *gmysql.MySQLError | |
| if errors.As(killErr, &mysqlErr) && mysqlErr.Number == 1094 { |
| config, err := mysql.ParseDSN(testutils.DSN()) | ||
| require.NoError(t, err) | ||
| config.User = "root" | ||
| db, err := sql.Open("mysql", fmt.Sprintf("%s:%s@tcp(%s)/%s", config.User, config.Passwd, config.Addr, config.DBName)) | ||
| require.NoError(t, err) |
There was a problem hiding this comment.
TestCanKillConnections rebuilds DSNs with fmt.Sprintf instead of using the parsed mysql.Config’s FormatDSN(). This drops any parameters present in testutils.DSN() (e.g., TLS, timeouts, params), which can make the test flaky if the test DSN changes. Prefer updating the mysql.Config fields (User/Passwd/DBName) and calling FormatDSN().
| unprivDB, err := sql.Open("mysql", fmt.Sprintf("testkillprobe:@tcp(%s)/", host)) | ||
| require.NoError(t, err) |
There was a problem hiding this comment.
In TestCanKillConnections the unprivileged connection DSN is hard-coded as testkillprobe:@tcp(%s)/, which again bypasses mysql.Config parsing/escaping and omits any required DSN params (notably TLS). Building this DSN via mysql.Config (and using the same host/params as testutils.DSN()) will make the test more robust across environments.
| const probeUser = "_spirit_kill_probe" | ||
|
|
||
| // Create a temporary user to own the victim connection. | ||
| if _, err := db.ExecContext(ctx, fmt.Sprintf("DROP USER IF EXISTS %s", probeUser)); err != nil { | ||
| return fmt.Errorf("cannot verify kill capability (DROP USER failed): %w", err) | ||
| } | ||
| if _, err := db.ExecContext(ctx, fmt.Sprintf("CREATE USER %s", probeUser)); err != nil { | ||
| return fmt.Errorf("cannot verify kill capability (CREATE USER failed): %w", err) | ||
| } | ||
| defer func() { | ||
| _, _ = db.ExecContext(context.Background(), fmt.Sprintf("DROP USER IF EXISTS %s", probeUser)) | ||
| }() |
There was a problem hiding this comment.
canKillConnections() uses a fixed global username ("_spirit_kill_probe") and unconditionally runs DROP USER IF EXISTS / CREATE USER. This can (a) clobber a legitimate existing user with that name, and (b) race if multiple Spirit instances (or tests) run concurrently against the same server. Use a per-run unique probe username (random suffix / connection id) and only drop the specific user you created; avoid deleting a pre-existing account that you didn't create.
|
Implemented in #659 instead |
Problem
Spirit's preflight privilege check string-matches
SHOW GRANTSforCONNECTION_ADMIN. This fails on managed MySQL services where the capability is granted via roles without exposing the label inSHOW GRANTSoutput.On RDS MySQL 8.4,
CONNECTION_ADMINwas removed fromrds_superuser_role. The user can kill connections (the capability is inherited through the role), but:GRANT CONNECTION_ADMIN ON *.*fails — the RDS admin user lacksGRANT OPTIONfor itrds_superuser_rolegives the kill capability, butSHOW GRANTSonly shows the role name, not expanded privilegesSET ROLE ALL+SHOW GRANTSstill doesn't surfaceCONNECTION_ADMINbecause it's not a standard grant within the role on 8.4Result: Spirit rejects users who can actually
KILLconnections because the stringCONNECTION_ADMINnever appears in their grants.Fix
Two-layer approach: grant parsing (with role activation) as a fast path, with a direct capability probe as fallback.
1.
showGrantsWithRoles()— role-aware grant parsingRuns
SET ROLE ALLon a pinned connection beforeSHOW GRANTS, so role-inherited privileges (likeREPLICATION CLIENT) are visible. This handles most managed service cases where privileges are assigned via roles that aren't set asDEFAULT ROLE.CONNECTION_ADMINis still parsed from grants as a first-pass check. When visible (normal MySQL, or afterSET ROLE ALL), the probe is skipped entirely.2.
canKillConnections()— direct capability probe (fallback)When
CONNECTION_ADMINisn't visible in grants (e.g. RDS 8.4), we test the actual capability by spawning a victim connection:_spirit_kill_probeMySQL userKILL <victim_id>from the caller's connectionThis works because MySQL only checks kill privileges when the target thread belongs to a different user — same-user kills always succeed, and non-existent thread IDs skip the privilege check entirely (returning
ER_NO_SUCH_THREADregardless of privilege level). By targeting a real connection from a different user, we get a definitive answer:ER_KILL_DENIED_ERROR(1095) → caller lacks the privilegeER_NO_SUCH_THREAD(1094) → caller has the privilegeThe victim connection is purpose-built for this test — no risk to active workloads.
Additional changes
PROCESSis still string-matched from grants rather than capability-tested. Unlike kill capability,PROCESSis always directly granted (never hidden behind roles), and there's no cheap side-effect-free probe for it.TestCanKillConnections— new test covering root (has privilege), unprivileged user with CREATE USER but no kill (lacks privilege), and user after grantingCONNECTION_ADMINTested
rds_superuser_role— deployment succeeds with force-kill enabled