Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: shallow copy BatchRequest on mutate in tryBumpBatchTimestamp #124634

Merged
merged 1 commit into from
Jun 12, 2024

Conversation

lyang24
Copy link
Contributor

@lyang24 lyang24 commented May 23, 2024

This avoids a data race on tryBumpBatchTimestamp, which was fallout from the new logging introduced in ba13697.

Fixes: #124553

Release note: None

Copy link

blathers-crl bot commented May 23, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels May 23, 2024
@blathers-crl blathers-crl bot requested review from kvoli and stevendanna May 23, 2024 22:19
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link

blathers-crl bot commented May 23, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@lyang24 lyang24 marked this pull request as ready for review May 23, 2024 22:22
@lyang24 lyang24 requested a review from a team as a code owner May 23, 2024 22:22
txn := ba.Txn.Clone()
txn.BumpReadTimestamp(ts)
readTs := ba.Txn.ReadTimestamp
ba = ba.ShallowCopy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't currently have the desired effect, as we're not doing anything with the shallow copy. We'll need to return it if we don't want to mutate the provided reference.

Copy link
Contributor Author

@lyang24 lyang24 May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for guidance, I am trying to understand this pattern - the data race happens a pointer level, we return a new shallow copy pointer on mutate to prevent race on the same pointer down the line. But this approach cannot prevent the race of the value in memory right?

@lyang24 lyang24 force-pushed the shallow_copy_ba branch 2 times, most recently from 9dfcdaf to a15b420 Compare May 25, 2024 05:00
@nvanbenschoten nvanbenschoten added the backport-24.1.x Flags PRs that need to be backported to 24.1. label May 28, 2024
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @lyang24, and @stevendanna)


pkg/kv/kvserver/replica_batch_updates.go line 245 at r1 (raw file):

Previously, lyang24 (Lanqing Yang) wrote…

Thanks for guidance, I am trying to understand this pattern - the data race happens a pointer level, we return a new shallow copy pointer on mutate to prevent race on the same pointer down the line. But this approach cannot prevent the race of the value in memory right?

I don't fully understand the question. There's aliasing on the heap memory, so we treat it as copy-on-write to avoid mutating the memory that others are reading.


pkg/kv/kvserver/replica_batch_updates.go line 233 at r2 (raw file):

	if ba.Txn == nil {
		log.VEventf(ctx, 2, "bumping batch timestamp to %s from %s", ts, ba.Timestamp)
		ba.Timestamp = ts

We'll need a ba = ba.ShallowCopy() here as well.


pkg/kv/kvserver/replica_batch_updates.go line 247 at r2 (raw file):

	shallowCopy := ba.ShallowCopy()
	shallowCopy.Txn = txn
	shallowCopy.Timestamp = readTs // Refresh just updated ReadTimestamp

We should just be able to do:

ba = ba.ShallowCopy()
ba.Txn = ba.Txn.Clone()
ba.Txn.BumpReadTimestamp(ts)
ba.Timestamp = ba.Txn.ReadTimestamp // Refresh just updated ReadTimestamp
return ba, true

pkg/kv/kvserver/replica_read.go line 493 at r2 (raw file):

		latchesHeld := g != nil
		var serverSideRetryOk bool
		ba, serverSideRetryOk = canDoServersideRetry(ctx, pErr, ba, g, hlc.Timestamp{})

I don't think we want to call canDoServersideRetry if latchesHeld == false.


pkg/kv/kvserver/replica_send.go line 813 at r2 (raw file):

	// re-acquire them if the retry is allowed.
	var serverSideRetryOk bool
	if ba, serverSideRetryOk = canDoServersideRetry(ctx, pErr, ba, nil /* g */, hlc.Timestamp{} /* deadline */); !serverSideRetryOk {

nit: var ok bool is probably sufficient here. It's pretty clear what the value's meaning is without the longer serverSideRetryOk. Same thing elsewhere in this PR.

Copy link
Contributor Author

@lyang24 lyang24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @nvanbenschoten, and @stevendanna)


pkg/kv/kvserver/replica_batch_updates.go line 245 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't fully understand the question. There's aliasing on the heap memory, so we treat it as copy-on-write to avoid mutating the memory that others are reading.

oh my original thinking is this technique works great for read write data race (that is the case we care solving) but for future references this approach will probably not work for write write data race no?


pkg/kv/kvserver/replica_batch_updates.go line 233 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We'll need a ba = ba.ShallowCopy() here as well.

good call thanks


pkg/kv/kvserver/replica_batch_updates.go line 247 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should just be able to do:

ba = ba.ShallowCopy()
ba.Txn = ba.Txn.Clone()
ba.Txn.BumpReadTimestamp(ts)
ba.Timestamp = ba.Txn.ReadTimestamp // Refresh just updated ReadTimestamp
return ba, true

done thanks


pkg/kv/kvserver/replica_read.go line 493 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think we want to call canDoServersideRetry if latchesHeld == false.

great call


pkg/kv/kvserver/replica_send.go line 813 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: var ok bool is probably sufficient here. It's pretty clear what the value's meaning is without the longer serverSideRetryOk. Same thing elsewhere in this PR.

thanks will apply the same thinking in future


pkg/kv/kvserver/replica_write.go line 400 at r2 (raw file):

	minCommitTS := r.MinTxnCommitTS(ctx, ba.Txn.ID, ba.Txn.Key)
	if ba.Timestamp.Less(minCommitTS) {
		ba.Txn.WriteTimestamp = minCommitTS

I am reviewing it again, we might need a shallow copy here?

Copy link
Contributor Author

@lyang24 lyang24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @nvanbenschoten, and @stevendanna)


pkg/kv/kvserver/replica_read.go line 496 at r3 (raw file):

			ba, ok = canDoServersideRetry(ctx, pErr, ba, g, hlc.Timestamp{})
		}
		if !latchesHeld || !ok {

we might just get aways with !ok as the condition here, up to the style preference

This avoids a data race on tryBumpBatchTimestamp, which was fallout from the
new logging introduced in ba13697.

Fixes: cockroachdb#124553

Release note: None
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lyang24!

bors r+

@craig craig bot merged commit ddf5ef9 into cockroachdb:master Jun 12, 2024
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-24.1.x Flags PRs that need to be backported to 24.1. O-community Originated from the community X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ccl/streamingccl/streamingest: TestAlterTenantUpdateExistingCutoverTime failed
3 participants