Test kill capability directly instead of parsing privilege labels #658

Closed
aparajon wants to merge 1 commit into block:main from aparajon:rds-superuser-privilege-check

Conversation

Collaborator

aparajon commented Mar 12, 2026

Problem

Spirit's preflight privilege check string-matches SHOW GRANTS for CONNECTION_ADMIN. This fails on managed MySQL services where the capability is granted via roles without exposing the label in SHOW GRANTS output.

On RDS MySQL 8.4, CONNECTION_ADMIN was removed from rds_superuser_role. The user can kill connections (the capability is inherited through the role), but:

  1. GRANT CONNECTION_ADMIN ON *.* fails — the RDS admin user lacks GRANT OPTION for it
  2. Granting rds_superuser_role gives the kill capability, but SHOW GRANTS only shows the role name, not expanded privileges
  3. SET ROLE ALL + SHOW GRANTS still doesn't surface CONNECTION_ADMIN because it's not a standard grant within the role on 8.4

Result: Spirit rejects users who can actually KILL connections because the string CONNECTION_ADMIN never appears in their grants.
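
To make the failure mode concrete, here is a minimal sketch of label matching against SHOW GRANTS rows. The function name grantsContain and both sample grant rows are illustrative assumptions; they are not Spirit's actual code or verbatim RDS output.

```go
package main

import "strings"

// Illustrative SHOW GRANTS rows (assumed shapes, not captured from a server).
var (
	directGrant   = []string{"GRANT CONNECTION_ADMIN ON *.* TO `app`@`%`"}
	roleOnlyGrant = []string{"GRANT `rds_superuser_role`@`%` TO `app`@`%`"}
)

// grantsContain reports whether any grant row mentions the privilege label.
// The name is hypothetical; it mirrors the string-matching approach only.
func grantsContain(grants []string, label string) bool {
	want := strings.ToUpper(label)
	for _, g := range grants {
		if strings.Contains(strings.ToUpper(g), want) {
			return true
		}
	}
	return false
}
```

With these sample rows, grantsContain(directGrant, "CONNECTION_ADMIN") is true, while grantsContain(roleOnlyGrant, "CONNECTION_ADMIN") is false even though the role may carry the capability.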

Fix

Two-layer approach: grant parsing (with role activation) as a fast path, with a direct capability probe as fallback.

1. showGrantsWithRoles() — role-aware grant parsing

Runs SET ROLE ALL on a pinned connection before SHOW GRANTS, so role-inherited privileges (like REPLICATION CLIENT) are visible. This handles most managed service cases where privileges are assigned via roles that aren't set as DEFAULT ROLE.

CONNECTION_ADMIN is still parsed from grants as a first-pass check. When visible (normal MySQL, or after SET ROLE ALL), the probe is skipped entirely.
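
The two-layer decision can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: checkKillCapability, errNoKillPrivilege, and the probe callback signature are hypothetical names chosen for the example, and the grant rows are assumed to have been fetched after SET ROLE ALL.

```go
package main

import (
	"errors"
	"strings"
)

// errNoKillPrivilege is a hypothetical sentinel error for this sketch.
var errNoKillPrivilege = errors.New("user cannot kill connections")

// checkKillCapability tries the cheap grant check first and only runs the
// probe when CONNECTION_ADMIN is not visible in the grant rows.
func checkKillCapability(grants []string, probe func() (bool, error)) error {
	for _, g := range grants {
		if strings.Contains(g, "CONNECTION_ADMIN") {
			return nil // fast path: label visible, probe skipped
		}
	}
	// Fallback: test the actual capability.
	ok, err := probe()
	if err != nil {
		return err
	}
	if !ok {
		return errNoKillPrivilege
	}
	return nil
}
```

Passing the probe as a function keeps the side-effecting part (creating a user and a victim connection) out of the fast path and makes the decision logic testable without a server.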

2. canKillConnections() — direct capability probe (fallback)

When CONNECTION_ADMIN isn't visible in grants (e.g. RDS 8.4), we test the actual capability by spawning a victim connection:

  1. Create a temporary _spirit_kill_probe MySQL user
  2. Connect as that user to create a victim connection
  3. Attempt KILL <victim_id> from the caller's connection
  4. Clean up the probe user

This works because MySQL checks kill privileges only when the target thread belongs to a different user. Same-user kills always succeed, and non-existent thread IDs skip the privilege check entirely, returning ER_NO_SUCH_THREAD regardless of privilege level. Targeting a real connection owned by a different user therefore gives a definitive answer:

  • ER_KILL_DENIED_ERROR (1095) → caller lacks the privilege
  • Success or ER_NO_SUCH_THREAD (1094) → caller has the privilege

The victim connection is purpose-built for this test — no risk to active workloads.
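
The error mapping in the list above could be expressed along these lines. This is a sketch under stated assumptions: serverError is a local stand-in for the driver's error type (only a numeric Number and a Message are assumed), and killAllowed is a hypothetical helper name.

```go
package main

import "errors"

// MySQL server error numbers named in the description above.
const (
	erNoSuchThread    = 1094
	erKillDeniedError = 1095
)

// serverError is a stand-in for the MySQL driver's error type so that the
// sketch compiles without third-party packages.
type serverError struct {
	Number  uint16
	Message string
}

func (e *serverError) Error() string { return e.Message }

// killAllowed maps the result of KILL <victim_id> to a capability answer.
// Any error other than 1094 or 1095 is returned to the caller unchanged.
func killAllowed(killErr error) (bool, error) {
	if killErr == nil {
		return true, nil // the victim was killed
	}
	var se *serverError
	if errors.As(killErr, &se) {
		switch se.Number {
		case erNoSuchThread:
			return true, nil // treated as "has privilege", per the list above
		case erKillDeniedError:
			return false, nil // caller lacks the privilege
		}
	}
	return false, killErr
}
```

Comparing the numeric code rather than the message text keeps the classification independent of how the error string is formatted.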

Additional changes

  • PROCESS is still string-matched from grants rather than capability-tested. Unlike kill capability, PROCESS is always directly granted (never hidden behind roles), and there's no cheap side-effect-free probe for it.
  • TestCanKillConnections — new test covering root (has privilege), unprivileged user with CREATE USER but no kill (lacks privilege), and user after granting CONNECTION_ADMIN

Tested

  • Deployed to AWS App Runner + RDS MySQL 8.4 with a user that has kill capability via rds_superuser_role — deployment succeeds with force-kill enabled

aparajon force-pushed the rds-superuser-privilege-check branch from 53c367a to 63a6dee on March 12, 2026 18:00
Spirit's privilege preflight check string-matches SHOW GRANTS for
CONNECTION_ADMIN. This fails on managed MySQL services (e.g. RDS
MySQL 8.4) where the capability is granted via roles without
exposing the label in SHOW GRANTS output.

Replace the CONNECTION_ADMIN string check with a KILL 0 probe:
MySQL returns error 1094 (Unknown thread id) when the user has the
privilege, or error 1095 (Access denied) when they don't. This
tests the actual capability regardless of how it was granted.

Also activate granted roles (SET ROLE ALL) before SHOW GRANTS so
role-inherited privileges like REPLICATION CLIENT are visible.
Contributor

Copilot AI left a comment

Pull request overview

Updates Spirit’s MySQL privilege preflight to avoid relying solely on string-matching SHOW GRANTS for CONNECTION_ADMIN, improving compatibility with managed MySQL services where effective privileges can be inherited via roles and/or not shown in grants output.

Changes:

  • Add showGrantsWithRoles() to run SET ROLE ALL (session-scoped) before SHOW GRANTS, improving role-aware grant parsing.
  • Add canKillConnections() fallback probe that verifies kill capability by creating a disposable “victim” connection from another user and attempting KILL.
  • Add TestCanKillConnections coverage for the kill capability probe behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File | Description
pkg/migration/check/privileges.go | Adds role-aware grant retrieval and a direct kill-capability probe fallback for force-kill preflight checks.
pkg/migration/check/privileges_test.go | Extends privilege tests and adds a new test validating the kill-capability probe.

Comment on lines +165 to +177
if _, err := db.ExecContext(ctx, fmt.Sprintf("CREATE USER %s", probeUser)); err != nil {
	return fmt.Errorf("cannot verify kill capability (CREATE USER failed): %w", err)
}
defer func() {
	_, _ = db.ExecContext(context.Background(), fmt.Sprintf("DROP USER IF EXISTS %s", probeUser))
}()

// Connect as the probe user to create the victim connection.
victimCfg := gmysql.NewConfig()
victimCfg.User = probeUser
victimCfg.Net = "tcp"
victimCfg.Addr = host
victimDB, err := sql.Open("mysql", victimCfg.FormatDSN())

Copilot AI Mar 12, 2026

The probe user is created without any authentication (CREATE USER _spirit_kill_probe) and then Spirit tries to connect with an empty password. Some MySQL configurations/policies disallow passwordless accounts, which would make the capability probe fail even when the caller has kill capability. Consider creating the probe user with a strong random password (and using it in the DSN) to make the probe work reliably across environments.
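
One way to follow this suggestion is sketched below. The helper names newProbePassword and createProbeUserSQL are hypothetical, and the fixed "Aa1!" suffix is only an assumption about what common password policies ask for (mixed case, a digit, punctuation); whether it satisfies a given server depends on that server's configuration.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newProbePassword returns 16 random bytes as hex plus a fixed suffix that
// adds an upper-case letter, a digit, and a punctuation character.
func newProbePassword() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf) + "Aa1!", nil
}

// createProbeUserSQL builds the CREATE USER statement. The password from
// newProbePassword contains no quote or backslash characters, so embedding
// it in a single-quoted literal is safe for this sketch only.
func createProbeUserSQL(user, password string) string {
	return fmt.Sprintf("CREATE USER %s IDENTIFIED BY '%s'", user, password)
}
```

The same password would then be set on the victim connection's config (for example its Passwd field) before calling FormatDSN().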

Comment on lines +202 to +203
errStr := killErr.Error()
if strings.Contains(errStr, "1094") || strings.Contains(errStr, "Unknown thread id") {

Copilot AI Mar 12, 2026

canKillConnections() determines whether the error is ER_NO_SUCH_THREAD by string-matching the error text ("1094" / "Unknown thread id"). This is brittle across drivers/locales and can misclassify errors. Prefer checking the concrete MySQL error type (e.g., *mysql.MySQLError) and comparing the numeric error code (1094/1095).

Suggested change
- errStr := killErr.Error()
- if strings.Contains(errStr, "1094") || strings.Contains(errStr, "Unknown thread id") {
+ var mysqlErr *gmysql.MySQLError
+ if errors.As(killErr, &mysqlErr) && mysqlErr.Number == 1094 {

Comment on lines +151 to +155
config, err := mysql.ParseDSN(testutils.DSN())
require.NoError(t, err)
config.User = "root"
db, err := sql.Open("mysql", fmt.Sprintf("%s:%s@tcp(%s)/%s", config.User, config.Passwd, config.Addr, config.DBName))
require.NoError(t, err)

Copilot AI Mar 12, 2026

TestCanKillConnections rebuilds DSNs with fmt.Sprintf instead of using the parsed mysql.Config’s FormatDSN(). This drops any parameters present in testutils.DSN() (e.g., TLS, timeouts, params), which can make the test flaky if the test DSN changes. Prefer updating the mysql.Config fields (User/Passwd/DBName) and calling FormatDSN().

Comment on lines +178 to +179
unprivDB, err := sql.Open("mysql", fmt.Sprintf("testkillprobe:@tcp(%s)/", host))
require.NoError(t, err)

Copilot AI Mar 12, 2026

In TestCanKillConnections the unprivileged connection DSN is hard-coded as testkillprobe:@tcp(%s)/, which again bypasses mysql.Config parsing/escaping and omits any required DSN params (notably TLS). Building this DSN via mysql.Config (and using the same host/params as testutils.DSN()) will make the test more robust across environments.

Comment on lines +159 to +170
const probeUser = "_spirit_kill_probe"

// Create a temporary user to own the victim connection.
if _, err := db.ExecContext(ctx, fmt.Sprintf("DROP USER IF EXISTS %s", probeUser)); err != nil {
	return fmt.Errorf("cannot verify kill capability (DROP USER failed): %w", err)
}
if _, err := db.ExecContext(ctx, fmt.Sprintf("CREATE USER %s", probeUser)); err != nil {
	return fmt.Errorf("cannot verify kill capability (CREATE USER failed): %w", err)
}
defer func() {
	_, _ = db.ExecContext(context.Background(), fmt.Sprintf("DROP USER IF EXISTS %s", probeUser))
}()

Copilot AI Mar 12, 2026

canKillConnections() uses a fixed global username ("_spirit_kill_probe") and unconditionally runs DROP USER IF EXISTS / CREATE USER. This can (a) clobber a legitimate existing user with that name, and (b) race if multiple Spirit instances (or tests) run concurrently against the same server. Use a per-run unique probe username (random suffix / connection id) and only drop the specific user you created; avoid deleting a pre-existing account that you didn't create.
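
A per-run username along the lines suggested here might look like the following sketch. newProbeUser and the prefix constant are hypothetical names; the 8-character random suffix is an arbitrary choice that keeps the name within MySQL's 32-character limit for user names.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// probeUserPrefix is the fixed part of the probe username (19 characters).
const probeUserPrefix = "_spirit_kill_probe_"

// newProbeUser returns a username with a random hex suffix so concurrent
// runs do not collide and a pre-existing account is never reused.
func newProbeUser() (string, error) {
	buf := make([]byte, 4) // 4 random bytes become 8 hex characters
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return probeUserPrefix + hex.EncodeToString(buf), nil
}
```

Because the name is unique to the run, the initial DROP USER IF EXISTS becomes unnecessary: the probe can issue a plain CREATE USER and defer a DROP USER for exactly the name it created.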

Collaborator

morgo commented Mar 16, 2026

Implemented in #659 instead

morgo closed this Mar 16, 2026