kvserver: read lease under mutex when switching lease type #124223

Merged


kvoli (Collaborator) commented May 15, 2024

A race could occur when a replica queue and post-lease application both attempted to switch the lease type. The queue would then skip processing the replica because the lease type had already changed. As a result, the lease queue might not have resolved lease preference violations promptly.

When possibly switching the lease type, read the lease under the same mutex used for requesting the lease.
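
The general shape of this fix can be sketched as follows. This is a minimal illustration of reading state and acting on it inside one critical section, not the actual kvserver code: `replica`, `leaseType`, `maybeSwitchLeaseType`, and `runConcurrentSwitches` are hypothetical names invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// leaseType and replica are hypothetical stand-ins for illustration only;
// they are not CockroachDB's types.
type leaseType int

const (
	expirationLease leaseType = iota
	epochLease
)

type replica struct {
	mu       sync.Mutex
	current  leaseType // lease type currently held (guarded by mu)
	desired  leaseType // lease type the replica should hold (guarded by mu)
	requests int       // number of lease requests issued (guarded by mu)
}

// maybeSwitchLeaseType reads the lease and issues the request inside one
// critical section, so two concurrent callers cannot both observe the stale
// lease type and both act on it.
func (r *replica) maybeSwitchLeaseType() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.current == r.desired {
		return false // no-op: another caller already switched the type
	}
	r.current = r.desired
	r.requests++
	return true
}

// runConcurrentSwitches starts n callers that all try to switch at once and
// returns how many lease requests were issued in total.
func runConcurrentSwitches(n int) int {
	r := &replica{current: expirationLease, desired: epochLease}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.maybeSwitchLeaseType()
		}()
	}
	wg.Wait()
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.requests
}

func main() {
	fmt.Println(runConcurrentSwitches(8))
}
```

Because the check and the request share one lock acquisition, exactly one of the concurrent callers issues the request and the rest see the updated lease type and return early.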

Resolves: #123998
Release note: None

blathers-crl bot commented May 15, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

kvoli self-assigned this May 15, 2024
kvoli added the backport-24.1.x label (flags PRs that need to be backported to 24.1) May 15, 2024

kvoli force-pushed the 240513.consistent-lease-status-for-switch branch from cab4b67 to d08fe81 May 15, 2024 17:19
kvoli marked this pull request as ready for review May 15, 2024 19:51
kvoli requested a review from a team as a code owner May 15, 2024 19:51
kvoli requested a review from nvanbenschoten May 15, 2024 19:51
kvoli (Collaborator, Author) commented May 16, 2024

This passes 10/10 tests for lease-preferences/manual-violating-transfer. Previously, it would fail 1/3 of the time.

nvanbenschoten (Member) left a comment

:lgtm_strong:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replica_range_lease.go line 1621 at r1 (raw file):

		defer r.mu.Unlock()

		st := r.leaseStatusAtRLocked(ctx, r.store.Clock().NowAsClockTimestamp())

We should pull the r.store.Clock().NowAsClockTimestamp() outside of the replica mutex.
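
A rough sketch of what this suggestion looks like in isolation, presumably to keep the critical section as short as possible (the comment itself does not state the reason). The `replica` type, `leaseValid`, `leaseValidAtLocked`, and `checkAt` below are hypothetical stand-ins, not the kvserver API:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// replica is a hypothetical stand-in, not CockroachDB's Replica.
type replica struct {
	mu         sync.Mutex
	expiration time.Time // guarded by mu
}

// leaseValidAtLocked reports whether the lease is still valid at now.
// The caller must hold r.mu.
func (r *replica) leaseValidAtLocked(now time.Time) bool {
	return now.Before(r.expiration)
}

// leaseValid reads the clock before acquiring the mutex, so the clock read
// does not happen while the replica mutex is held.
func (r *replica) leaseValid(clock func() time.Time) bool {
	now := clock() // clock read happens outside the critical section
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.leaseValidAtLocked(now)
}

// checkAt is a small helper that builds a replica whose lease expires at
// expirationSec and evaluates it at nowSec (both Unix seconds).
func checkAt(expirationSec, nowSec int64) bool {
	r := &replica{expiration: time.Unix(expirationSec, 0)}
	return r.leaseValid(func() time.Time { return time.Unix(nowSec, 0) })
}

func main() {
	fmt.Println(checkAt(100, 50))
}
```

Passing the timestamp into the locked helper, rather than reading the clock inside it, also makes the check easy to exercise with a fixed clock.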


pkg/kv/kvserver/replica_range_lease.go line 1627 at r1 (raw file):

		if !r.hasCorrectLeaseTypeRLocked(st.Lease) {
			return r.requestLeaseLocked(ctx, st, nil /* limiter */)

ignorable nit: consider structuring this to separate the no-op cases from the switch lease case:

if !st.OwnedBy(r.store.StoreID()) {
    return nil
}
if r.hasCorrectLeaseTypeRLocked(st.Lease) {
    return nil
}
return r.requestLeaseLocked(ctx, st, nil /* limiter */)

kvoli force-pushed the 240513.consistent-lease-status-for-switch branch from d08fe81 to 574f667 May 16, 2024 17:18
kvoli (Collaborator, Author) left a comment

TYFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_range_lease.go line 1621 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should pull the r.store.Clock().NowAsClockTimestamp() outside of the replica mutex.

Done.


pkg/kv/kvserver/replica_range_lease.go line 1627 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

ignorable nit: consider structuring this to separate the no-op cases from the switch lease case:

if !st.OwnedBy(r.store.StoreID()) {
    return nil
}
if r.hasCorrectLeaseTypeRLocked(st.Lease) {
    return nil
}
return r.requestLeaseLocked(ctx, st, nil /* limiter */)

This is easier to follow, updated.

kvoli (Collaborator, Author) commented May 17, 2024

bors r=nvanbenschoten

craig bot merged commit 3732a92 into cockroachdb:master May 17, 2024
22 checks passed
Labels: backport-24.1.x (flags PRs that need to be backported to 24.1)
Linked issue: roachtest: lease-preferences/manual-violating-transfer failed
3 participants