fix: explicitly RELEASE_LOCK before closing metadata lock connection #754

Merged
morgo merged 5 commits into block:main from morgo:fix-flaky-resume-mdl-race
May 1, 2026
@morgo morgo commented May 1, 2026

Summary

Fixes #761: MetadataLock.Close() was relying on connection close (COM_QUIT) alone to release GET_LOCK, which leaves a small window where MySQL has not yet finished tearing down the session. A fresh connection arriving in that window sees the lock as still held by another connection.

This PR calls RELEASE_LOCK on the same session before closing the connection. RELEASE_LOCK is synchronous within the session, so by the time MetadataLock.Close() returns, MySQL guarantees the named locks are no longer held — eliminating the race.

The query uses SELECT RELEASE_LOCK(?) (rather than DO RELEASE_LOCK(?)) so it's familiar to readers and handled correctly by routing layers / proxies that may not understand DO.

Relationship to #742

#742 (the flaky TestResumeFromCheckpointStrictTooOld test) was originally reported with the "lock is held by another connection" symptom, which is exactly the race this PR eliminates. However, after this fix lands, the test still has a separate failure mode: m2 sometimes succeeds at acquiring the lock and reading the checkpoint but does not detect the checkpoint as too old, so it resumes when it should error. That second mode is unrelated to lock release and will be addressed in a follow-up PR. #742 stays open until that is done.

Test plan

  • CI passes on the resume-test matrix.
  • Stress-loop TestMetadataLock / TestMetadataLockContextCancel — close-then-reacquire should now be deterministic.
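
A close-then-reacquire stress loop of the kind listed in the test plan could look roughly like the sketch below. It is an assumption about the shape of such a loop, not the project's actual test: `stressReacquire`, `fakeLock`, and `fakeHandle` are hypothetical, and the in-memory fake releases synchronously on Close, which is the behaviour this PR aims for. Against a real server, `acquire` would open a fresh connection and call GET_LOCK.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// stressReacquire acquires the lock, closes it, and immediately repeats.
// It returns an error naming the first iteration at which acquire or
// Close fails.
func stressReacquire(iterations int, acquire func() (io.Closer, error)) error {
	for i := 0; i < iterations; i++ {
		lock, err := acquire()
		if err != nil {
			return fmt.Errorf("iteration %d: acquire: %w", i, err)
		}
		if err := lock.Close(); err != nil {
			return fmt.Errorf("iteration %d: close: %w", i, err)
		}
	}
	return nil
}

// fakeLock is an in-memory stand-in for a single named lock.
type fakeLock struct {
	mu   sync.Mutex
	held bool
}

// fakeHandle represents one holder of the fake lock.
type fakeHandle struct{ l *fakeLock }

// acquire fails if the lock is already held, mirroring the symptom
// reported in the issue.
func (l *fakeLock) acquire() (io.Closer, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held {
		return nil, errors.New("lock is held by another connection")
	}
	l.held = true
	return fakeHandle{l}, nil
}

// Close releases the fake lock before returning.
func (h fakeHandle) Close() error {
	h.l.mu.Lock()
	defer h.l.mu.Unlock()
	h.l.held = false
	return nil
}

func main() {
	l := &fakeLock{}
	fmt.Println(stressReacquire(1000, l.acquire))
}
```

With a synchronous release, every iteration should succeed. A release that lags behind Close would surface as an acquire error on some iteration, which is the flakiness the test plan is meant to catch.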

🤖 Generated with Claude Code

morgo and others added 2 commits May 1, 2026 07:54
Relying on connection close (COM_QUIT) alone to release GET_LOCK leaves
a small window where MySQL has not yet finished tearing down the
session. A rapid reacquire on a new connection can see the lock as
still held, which manifests as flaky resume tests where m1 closes and
m2 immediately tries to acquire the same lock.

Calling RELEASE_LOCK on the same session before closing the connection
makes release synchronous and removes the race.

Fixes block#761.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DO is a less familiar MySQL statement and may not be handled correctly
by all proxies / routing layers. SELECT works everywhere and matches
the GET_LOCK call pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@morgo morgo force-pushed the fix-flaky-resume-mdl-race branch from 00bba5e to fe4b3db Compare May 1, 2026 13:54
@morgo morgo enabled auto-merge May 1, 2026 14:18
@morgo morgo merged commit e2970da into block:main May 1, 2026
12 checks passed
Development

Successfully merging this pull request may close these issues.

Metadata lock not released synchronously on close — concurrent reacquire can fail
