
kv: Refresh requests should use WaitPolicy_Error #108190

Merged
merged 4 commits into cockroachdb:master from mira-104728 on Aug 26, 2023

Conversation

miraradeva
Contributor

See individual commits.

@miraradeva miraradeva requested a review from a team as a code owner August 4, 2023 17:43
@cockroach-teamcity
Member

This change is Reviewable

Contributor Author

@miraradeva miraradeva left a comment

Two more things to do:

  1. Make sure both changes (and especially the refresh one) are migration safe.
  2. Add a test for a multi-version cluster.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@miraradeva miraradeva changed the title from "Mira 104728" to "kv: Refresh requests should use WaitPolicy_Error" on Aug 4, 2023
@miraradeva miraradeva force-pushed the mira-104728 branch 2 times, most recently from 8f11997 to afcfd64 on August 9, 2023 16:55
Contributor Author

@miraradeva miraradeva left a comment

Addressed the two TODOs above. @nvanbenschoten this should be ready for review.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

Member

@nvanbenschoten nvanbenschoten left a comment

Flushing comments for the first commit. Will continue to review the second.

Reviewed 20 of 20 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 169 at r1 (raw file):

			switch state.kind {
			case waitFor, waitForDistinguished:
				if req.WaitPolicy == lock.WaitPolicy_Error {

(discussed on a call) Now that this logic more closely parallels the normal path, do you think we can get rid of this branch entirely and instead treat wait-policy handling like we treat the other reasons to perform a push? Something like:

...
deadlockPush := true
waitPolicyPush := req.WaitPolicy == lock.WaitPolicy_Error

...

if waitPolicyPush {
    delay = 0
}

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 179 at r1 (raw file):

						err = w.pushLockTxn(ctx, req, state)
					} else {
						err = newWriteIntentErr(req, state, reasonWaitPolicy)

(discussed on a call) can we remove this case and the corresponding case in pushNoWait? Are they just optimizations?


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 502 at r1 (raw file):

	if !w.st.Version.IsActive(ctx, clusterversion.V23_2_RemoveLockTableWaiterTouchPush) &&
		req.WaitPolicy == lock.WaitPolicy_Error {

nit: to avoid the atomic load in the common case, swap this condition around.
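For illustration, the swapped condition would read as follows (same two operands, just reordered so the cheap WaitPolicy comparison short-circuits before the version check and its atomic load):

	if req.WaitPolicy == lock.WaitPolicy_Error &&
		!w.st.Version.IsActive(ctx, clusterversion.V23_2_RemoveLockTableWaiterTouchPush) {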


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/lock_timeout line 108 at r1 (raw file):

[4] sequence reqTimeout1: lock wait-queue event: wait for (distinguished) txn 00000001 holding lock @ key ‹"k"› (queuedWriters: 0, queuedReaders: 1)
[4] sequence reqTimeout1: pushing after 0s for: liveness detection = true, deadlock detection = true, timeout enforcement = true, priority enforcement = false
[4] sequence reqTimeout1: pushing txn 00000001 to check if abandoned

(discussed on a call) can we revive this logic for the mixed-version case by creating a lock_timeout_v23_1 file and a wait_policy_error_v23_1 file and then hooking the framework up to run under the older cluster version for these files?


pkg/kv/kvserver/txnwait/queue_test.go line 126 at r1 (raw file):

				shouldPush := ShouldPushImmediately(&req, pusheeStatus, wp)
				if waitPolicyError || forcePush {
					require.Equal(t, true, shouldPush)

nit: require.True
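i.e., something like:

				require.True(t, shouldPush)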


pkg/kv/kvserver/txnwait/queue_test.go line 147 at r1 (raw file):

	}
	for _, test := range testCases {
		t.Run("", func(t *testing.T) {

Did we lose the subtest for each testCase? Ideally, we would re-introduce them and then give each subtest a descriptive name.

Member

@nvanbenschoten nvanbenschoten left a comment

Reviewed 11 of 11 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)


pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go line 535 at r2 (raw file):

			msg.Printf(" due to a conflict: %s on key %s", err.FailureReason(), err.Key)
		} else if wiErr, ok := refreshErr.GetDetail().(*kvpb.WriteIntentError); ok {
			msg.Printf(" due to a conflict: intent on key %s", wiErr.Intents)

WriteIntentError.Intents is a slice of roachpb.Intent structs. Does this format correctly?


pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go line 579 at r2 (raw file):

	refreshSpanBa := &kvpb.BatchRequest{}
	refreshSpanBa.Txn = refreshToTxn
	// Ensure that refresh requests do not block on conflicts in the lock table.

Could you add a sentence or two about what these requests do instead of blocking, and how this is handled by the txnSpanRefresher?
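For example, the comment could be expanded along these lines (a sketch based on the commit message; the final wording may differ):

	// Ensure that refresh requests do not block on conflicts in the lock table.
	// If a refresh conflicts with a transaction it cannot push, it returns an
	// error instead of waiting in the lock wait-queue; the txnSpanRefresher
	// then turns the failed refresh into a retryable error for the client.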


pkg/kv/kvclient/priority_refresh_test.go line 26 at r2 (raw file):

)

func TestRefreshWithHighPriority(t *testing.T) {

Could we give this test a comment that explains what it's testing? We could also reference #104728.


pkg/kv/kvclient/priority_refresh_test.go line 47 at r2 (raw file):

	// 3 seconds is the closed timestamp target duration. Writing to a timestamp
	// below that will be treated as a read-write conflict.
	time.Sleep(3 * time.Second)

Instead of waiting out the full three seconds, can we drop the closed timestamp target to something shorter to speed up the test?


pkg/kv/kvserver/batcheval/cmd_refresh_range_test.go line 304 at r2 (raw file):

				},
				Timestamp:  ts3,
				WaitPolicy: lock.WaitPolicy_Error,

Do we want to test with and without this while we're in the mixed version state?


pkg/kv/kvserver/batcheval/cmd_refresh_test.go line 94 at r2 (raw file):

				},
				Timestamp:  ts3,
				WaitPolicy: lock.WaitPolicy_Error,

Do we want to test with and without this while we're in the mixed version state?


pkg/kv/kvserver/concurrency/concurrency_manager_test.go line 1031 at r2 (raw file):

// Its logic mirrors that in Replica.collectSpans.
func (c *cluster) collectSpans(
	t *testing.T, txn *roachpb.Transaction, ts hlc.Timestamp, reqs []kvpb.Request, wp lock.WaitPolicy,

nit: let's order wp before reqs to keep the header-related fields together in the signature.
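That is, the signature would become something like:

func (c *cluster) collectSpans(
	t *testing.T, txn *roachpb.Transaction, ts hlc.Timestamp, wp lock.WaitPolicy, reqs []kvpb.Request,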


pkg/kv/kvserver/concurrency/datadriven_util_test.go line 217 at r2 (raw file):

	case "refresh":
		var r kvpb.RefreshRequest
		r.Sequence = maybeGetSeq()

RefreshRequests don't receive sequence numbers, so we can remove this.

Contributor Author

@miraradeva miraradeva left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go line 535 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

WriteIntentError.Intents is a slice of roachpb.Intent structs. Does this format correctly?

I'll use the error's own print function instead.
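For example (a sketch; assuming the WriteIntentError formats itself sensibly via %s, the exact call may differ in the final code):

			msg.Printf(" due to a conflict: %s", wiErr)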


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 179 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

(discussed on a call) can we remove this case and the corresponding case in pushNoWait? Are they just optimizations?

Yes, I think they're just optimizations. I removed them and I see a few more pushes in the data-driven tests but everything else seems correct.


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/lock_timeout line 108 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

(discussed on a call) can we revive this logic for the mixed-version case by creating a lock_timeout_v23_1 file and a wait_policy_error_v23_1 file and then hooking the framework up to run under the older cluster version for these files?

This worked out well for verifying that TOUCH_PUSH is used for v23_1, which is the only version-gated change and the main thing we want to test in these v23_1 files. But I couldn't simply copy lock_timeout and wait_policy_error from master and add the v23_1 suffix, because removing the two WriteIntentError optimizations caused some extra pushes, as expected. I don't think we want to version-gate the removal of these optimizations, so it makes sense that there would be a small change in the expected output. Let me know if you have any suggestions to make this neater.


pkg/kv/kvserver/txnwait/queue_test.go line 147 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did we lose the subtest for each testCase? Ideally, we would re-introduce them and then give each subtest a descriptive name.

My goal was to replace t.Run("", func(t *testing.T) with the following to be able to test both forcePush and waitPolicyError variants less verbosely. I didn't think I lost any tests but maybe I'm missing something? I see 48 * 4 test cases run.

testutils.RunTrueAndFalse(t, "forcePush", func(t *testing.T, forcePush bool) {
  testutils.RunTrueAndFalse(t, "waitPolicyError", func(t *testing.T, waitPolicyError bool) {
    // ... test body ...
  })
})

Member

@nvanbenschoten nvanbenschoten left a comment

Reviewed 18 of 18 files at r3, 14 of 14 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/lock_timeout line 108 at r1 (raw file):

Let me know if you have any suggestions to make this neater.

My only suggestion would be to pull this behavior change out into a separate commit. This allows us to refer to it directly (if it causes any issues) and makes it clearer where these changes are coming from.


pkg/kv/kvclient/kvcoord/priority_refresh_test.go line 37 at r4 (raw file):

//
// This can potentially repeat indefinitely as the high-priority txn retries,
// causing a livelock.

Could we add an explanation of why this livelock is no longer possible?
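For example, something along these lines (paraphrased from the commit message; exact wording up to you):

// The livelock is no longer possible because refresh requests now go through
// the lock table with WaitPolicy_Error: a high-priority refresh pushes the
// timestamp of the lower-priority writer and succeeds, instead of repeatedly
// failing on the writer's intent.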


pkg/kv/kvclient/kvcoord/priority_refresh_test.go line 43 at r4 (raw file):

	st := cluster.MakeTestingClusterSettings()
	closedts.TargetDuration.Override(context.Background(), &st.SV, 0)

I fear this may cause the test to be flaky because it impacts all transactions in the system. For the sake of keeping the test targeted, perhaps we should imitate the closed timestamp by performing a conflicting read (kvDB.Get(ctx, keyB)) before txn2's Put to keyB. That should have the same effect.
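A minimal sketch of that approach (variable names assumed from the surrounding test):

	// Imitate the closed timestamp: reading keyB here forces txn2's later Put
	// to keyB to be pushed to a higher timestamp, creating the read-write
	// conflict the test relies on.
	_, err = kvDB.Get(ctx, keyB)
	require.NoError(t, err)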


pkg/kv/kvserver/concurrency/concurrency_manager_test.go line 103 at r3 (raw file):

	datadriven.Walk(t, datapathutils.TestDataPath(t, "concurrency_manager"), func(t *testing.T, path string) {
		c := newCluster()
		if strings.Contains(path, "_v23_1") {

nit: strings.HasSuffix is slightly better.
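i.e.:

		if strings.HasSuffix(path, "_v23_1") {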


pkg/kv/kvserver/concurrency/lock_table_waiter_test.go line 710 at r3 (raw file):

				expBlockingPush := !timeoutBeforePush
				sawBlockingPush := false
				sawNonBlockingPush := false

Do we want to assert on the WaitPolicy instead?

@miraradeva miraradeva requested a review from a team as a code owner August 17, 2023 13:49
@miraradeva miraradeva requested review from rachitgsrivastava and DarrylWong and removed request for a team August 17, 2023 13:49
Contributor Author

@miraradeva miraradeva left a comment

I made changes based on your comments, and I'll start a Slack thread with the other transaction people to figure out how to resolve the conflicting change in #105717.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @nvanbenschoten, and @rachitgsrivastava)


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/lock_timeout line 108 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let me know if you have any suggestions to make this neater.

My only suggestion would be to pull this behavior change out into a separate commit. This allows us to refer to it directly (if it causes any issues) and makes it clearer where these changes are coming from.

Great! I think that worked out well.


pkg/kv/kvserver/concurrency/lock_table_waiter_test.go line 710 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we want to assert on the WaitPolicy instead?

At this point in the test we know we expect a non-blocking push, so I added:

require.True(t, sawNonBlockingPush)

@miraradeva miraradeva force-pushed the mira-104728 branch 2 times, most recently from 56490d7 to c3ade96 on August 17, 2023 16:35
Collaborator

@aadityasondhi aadityasondhi left a comment

Just doing a drive-by review of the things related to plumbing conflicting transaction meta; it all looks good to me!

Reviewed 5 of 17 files at r11, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @nvanbenschoten, and @rachitgsrivastava)

Member

@nvanbenschoten nvanbenschoten left a comment

Reviewed 38 of 38 files at r9, 27 of 27 files at r10, 17 of 17 files at r11, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @miraradeva, and @rachitgsrivastava)


-- commits line 7 at r9:
"claimant"


pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go line 520 at r11 (raw file):

}

func newRetryErrorOnFailedPreemptiveRefresh(txn *roachpb.Transaction, err *kvpb.Error) *kvpb.Error {

nit: s/err/pErr/


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 250 at r9 (raw file):

				log.VEventf(ctx, 3, "pushing after %s for: "+
					"liveness detection = %t, deadlock detection = %t, "+
					"timeout enforcement = %t, priority enforcement = %t",

Should we update this log line with the new waitPolicyPush reason? Doing so will impact all datadriven tests, so consider doing it in a separate commit. --rewrite will be your friend.
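For example, the new clause could look something like this (a sketch; the exact phrasing will end up in the data-driven test output, so pick whatever reads best):

				log.VEventf(ctx, 3, "pushing after %s for: "+
					"liveness detection = %t, deadlock detection = %t, "+
					"timeout enforcement = %t, priority enforcement = %t, "+
					"wait policy error = %t",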


pkg/kv/kvclient/kvcoord/priority_refresh_test.go line 43 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I fear this may cause the test to be flaky because it impacts all transactions in the system. For the sake of keeping the test targetted, perhaps we should imitate the closed timestamp by performing a conflicting read (kvDB.Get(ctx, keyB)) before txn2's Put to keyB. That should have the same effect.

Thanks for updating. Should we remove this line now? Also, we'll want to update the "in this test, the closed timestamp target" text up above as well.


pkg/kv/kvclient/kvcoord/priority_refresh_test.go line 65 at r11 (raw file):

	_, err = txn2.Get(ctx, keyA)
	require.NoError(t, err)
	_, err = kvDB.Get(ctx, keyB)

nit: let's give this line a comment.


pkg/kv/kvserver/concurrency/lock_table_waiter_test.go line 710 at r3 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

At this point in the test we know we expect a non-blocking push, so I added:

require.True(t, sawNonBlockingPush)

Should we be checking h.WaitPolicy in ir.pushTxn though?
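e.g., something like this inside the test's pushTxn callback (a sketch; the exact expected policy depends on where the assertion lands):

				require.Equal(t, lock.WaitPolicy_Error, h.WaitPolicy)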

Contributor Author

@miraradeva miraradeva left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @DarrylWong, @nvanbenschoten, and @rachitgsrivastava)


pkg/kv/kvclient/kvcoord/priority_refresh_test.go line 43 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for updating. Should we remove this line now? Also, we'll want to update the "in this test, the closed timestamp target" text up above as well.

Sorry, I missed updating this.


pkg/kv/kvserver/concurrency/lock_table_waiter_test.go line 710 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we be checking h.WaitPolicy in ir.pushTxn though?

Yes, but to make it work in this commit I had to move the part that propagates the wait policy in the push header over from another commit. It does belong here, though: without it, push requests use WaitPolicy_Block, and after removing the optimization they would block.

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm:, but the CI failure of TestCleanupIntentsDuringBackupPerformanceRegression is interesting. The test is timing based, so it is subject to flaking in extreme cases, but in practice, we rarely see it flake, especially not by this much. Does the test get slower due to this patch?

Reviewed 33 of 33 files at r12, 22 of 22 files at r13, 12 of 12 files at r14, 24 of 24 files at r15, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong and @rachitgsrivastava)

@dt
Member

dt commented Aug 22, 2023

CI failure of TestCleanupIntentsDuringBackupPerformanceRegression is interesting

Heads up that as of #108907, that test is gone. If it really did catch anything interesting in this case, perhaps we should bring back some form of it.

@miraradeva
Contributor Author

Thank you @nvanbenschoten and @dt for spotting this and for the context. I stressed the test in different scenarios (1000 runs):

  1. From my branch: fails reliably within 100-200 runs.
  2. From master (before the test was removed): passes in ~2 minutes.
  3. From my branch up to and including the WriteIntentError commit: failed on run 220.
  4. From my branch up to and including the touch push and wait policy commit: failed on run 260 (but passes in ~20 min if I remove the WriteIntentError optimization changes).
  5. From my branch with all commits except the WriteIntentError optimization changes: passes in ~20 minutes.

So it definitely looks related to this PR, and most likely the WriteIntentError optimization. I'll look for the root cause some more and then probably bring this test back in some form in kvserver.

@miraradeva
Contributor Author

Actually, point 2 above was wrong: it took 2 minutes because the test had already been removed from master, so nothing actually ran. Without any of my changes, 1000 runs of the test still take ~16-17 minutes. So it seems like removing the WriteIntentError optimization slows the test down significantly, to the point where it fails, but the rest of my change slows it down a little as well.

Contributor Author

@miraradeva miraradeva left a comment

Alright, so the failing test did catch a real bug in my PR: upon a successful push in the WaitOn function, the function returned nil instead of continuing to wait for further state changes. In the backup test in particular, this resulted in unnecessary sequencing and pushing of requests instead of resolving the abandoned intents in batch (which is done in ResolveDeferredIntents in the doneWaiting state).
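To illustrate the shape of the bug, here is a toy sketch (purely illustrative; the names and structure here are made up and much simpler than the real lockTableWaiter.WaitOn):

package main

import "fmt"

type stateKind int

const (
	waitFor stateKind = iota
	doneWaiting
)

// waitOn is a hypothetical stand-in for the waiter loop. The bug was an early
// "return nil" after a successful push; the fix is to keep consuming state
// changes until doneWaiting, so deferred intents are resolved in one batch.
func waitOn(states <-chan stateKind, push, resolveDeferred func() error) error {
	for state := range states {
		switch state {
		case waitFor:
			if err := push(); err != nil {
				return err
			}
			// Buggy version: return nil here, which stops waiting too early.
			// Fixed version: fall through and wait for the next state change.
		case doneWaiting:
			return resolveDeferred()
		}
	}
	return nil
}

func main() {
	states := make(chan stateKind, 2)
	states <- waitFor
	states <- doneWaiting
	close(states)
	err := waitOn(states,
		func() error { fmt.Println("pushed conflicting txn"); return nil },
		func() error { fmt.Println("resolved deferred intents in one batch"); return nil },
	)
	fmt.Println("waitOn:", err)
}

In the toy, returning nil right after the push never reaches doneWaiting, which corresponds to the extra rounds of sequencing described above.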

In this revision, I fixed the bug and added one more clear-intents data-driven test that covers requests with wait policy error. @nvanbenschoten let me know if you think more of the clear-intent tests would be helpful for wait policy error requests.

In a separate PR, I will also revive the removed backup test and try to convert it to a non-time based one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @nvanbenschoten, and @rachitgsrivastava)


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents_wait_policy_error line 84 at r19 (raw file):

[3] sequence req1: pushing timestamp of txn 00000002 above 10.000000000,1
[3] sequence req1: resolving intent ‹"a"› for txn 00000002 with ABORTED status
[3] sequence req1: lock wait-queue event: done waiting

Before fixing the bug, this same cleanup of intents was done in two rounds of sequencing instead of just one; it was not the case that every intent was cleaned up separately. Instead, first the intent on key "a" was resolved as part of the push, then the function returned, and in another round of sequencing, conflictsWithLockHolders added the rest of the intents to the toResolve slice and they were resolved in the next iteration of WaitOn, which went straight to doneWaiting.

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm_strong: thanks for tracking down the cause of the failure. I agree with the plan to merge this and then revive the backup test without a timing dependency. We also discussed adding a second variant of it where the other txns are all still pending instead of being aborted.

Reviewed 52 of 52 files at r16, 23 of 23 files at r17, 12 of 12 files at r18, 25 of 25 files at r19, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong and @rachitgsrivastava)

@miraradeva
Contributor Author

bors r=nvanbenschoten

@craig
Contributor

craig bot commented Aug 25, 2023

Build failed (retrying...):

@arulajmani
Collaborator

bors r-

@miraradeva you might need to rebase this on top of #109454. Sorry for causing the trouble.

@craig
Contributor

craig bot commented Aug 25, 2023

Canceled.

The lock table waiter included an optimization to return a
WriteIntentError immediately, and not attempt a push, if a lock in the
lock table is currently not held but is claimed by a transaction. This
helps prevent a potentially pointless push because we know the claimant
transaction is active. This optimization was used in two scenarios:
(1) for requests with WaitPolicy_Error, and (2) after the lock deadline
has expired.

This commit removes the optimization in an effort to simplify the lock
table waiter code. We might observe an increase in push requests as a
result but hopefully not significant, given the specific scenarios in
which this optimization was enabled.

Informs: cockroachdb#104728

Release note: None

Previously, a request with WaitPolicy_Error could only push a
transaction using PUSH_TOUCH to check that the transaction is not
abandoned. However, sometimes it is useful for a WaitPolicy_Error
request to be able to push the timestamp of a transaction with a lower
priority (see cockroachdb#104728).

As part of implementing the above, this change also simplifies the
handling of WaitPolicy_Error by removing the use of PUSH_TOUCH in the
lock table waiter, and using a regular (PUSH_TIMESTAMP or PUSH_ABORT)
push instead. The wait policy is passed as part of the push request and
handled at the server/pushee side to ensure that if the push fails the
request does not wait in the txn wait queue.

Fixes: cockroachdb#104728
Epic: CRDB-28698

Release note: None

Previously, refresh requests did not declare any locks and did not go
through the lock table; they either evaluated successfully or returned
an error (because they encountered a committed value or an intent).
However, in some cases it is convenient to have high-priority refresh
requests push lower-priority transactions instead of failing due to the
encountered intent. See cockroachdb#104728 for one such scenario that can result
in a livelock.

This change lets refresh requests declare isolated keys and go through
the lock table. If a high-priority refresh encounters a lower-priority
contending transaction, it will push its timestamp and evaluate
successfully. If pushing with priority is not possible, the refresh
request will return an error without blocking on the contending
transaction; to ensure that, refresh requests use WaitPolicy_Error.

Fixes: cockroachdb#104728

Epic: CRDB-28698

Release note (bug fix): Prevents a potential livelock between a
high-priority transactional read and a normal-priority write. The read
pushes the timestamp of the write, but if the read gets pushed as well,
it may repeatedly fail to refresh because it keeps encountering the
intent of the write. See cockroachdb#104728 for more details.

This patch adds a phrase to the push log message to denote if the push
has wait policy error or not. As a result, the text in many data-driven
tests changes.

Informs: cockroachdb#104728

Release note: None
@miraradeva
Contributor Author

bors r=nvanbenschoten

@craig
Contributor

craig bot commented Aug 26, 2023

Build succeeded:

@craig craig bot merged commit 58204d3 into cockroachdb:master Aug 26, 2023
7 of 8 checks passed
miraradeva added a commit to miraradeva/cockroach that referenced this pull request Aug 29, 2023
The TestCleanupIntentsDuringBackupPerformanceRegression was recently
removed mostly because it was a flaky time-based unit test. Right
before being removed it caught a bug in cockroachdb#108190, so this patch revives
the test in a non-timed version. Instead, it checks the number of
intent resolution batches and push requests, and ensures they are of
the right order of magnitude.

Informs: cockroachdb#108226

Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this pull request Aug 31, 2023
The TestCleanupIntentsDuringBackupPerformanceRegression was recently
removed mostly because it was a flaky time-based unit test. Right
before being removed it caught a bug in cockroachdb#108190, so this patch revives
the test in a non-timed version. Instead, it checks the number of
intent resolution batches and push requests, and ensures they are of
the right order of magnitude.

Informs: cockroachdb#108226

Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this pull request Sep 18, 2023
The TestCleanupIntentsDuringBackupPerformanceRegression was recently
removed mostly because it was a flaky time-based unit test. Right
before being removed it caught a bug in cockroachdb#108190, so this patch revives
the test in a non-timed version. Instead, it checks the number of
intent resolution batches and push requests, and ensures they are of
the right order of magnitude.

Informs: cockroachdb#108226

Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this pull request Sep 21, 2023
The TestCleanupIntentsDuringBackupPerformanceRegression was recently
removed mostly because it was a flaky time-based unit test. Right
before being removed it caught a bug in cockroachdb#108190, so this patch revives
the test in a non-timed version. Instead, it checks the number of
intent resolution batches and push requests, and ensures they are of
the right order of magnitude. It also extends the test to include
intents from both aborted and pending transactions.

Informs: cockroachdb#108226

Release note: None
craig bot pushed a commit that referenced this pull request Sep 22, 2023
109689: backupccl: Revive TestCleanupIntentsDuringBackupPerformanceRegression r=nvanbenschoten a=miraradeva

The TestCleanupIntentsDuringBackupPerformanceRegression was recently removed mostly because it was a flaky time-based unit test. Right before being removed it caught a bug in #108190, so this patch revives the test in a non-timed version. Instead, it checks the number of intent resolution batches and push requests, and ensures they are of the right order of magnitude.

Informs: #108226

Release note: None

111106: changefeedccl: Support pubsub emulator in new pubsub sink r=miretskiy a=miretskiy

The new implementation of pubsub sink uses a more
restrictive (PublisherClient) interface than the old one (Client).  As a result, it has lost support for the official pubsub emulation client
https://pkg.go.dev/cloud.google.com/go/pubsub#hdr-Emulator

This PR improves testability of pubsub sink by supporting pubsub emulator.

Informs https://github.com/cockroachlabs/support/issues/2512
Epic: None

Release note: None

111113: roachprod: enhance/clarify terminology r=herkolategan,renatolabs a=knz

Fixes #111103.
Epic: CRDB-29380

We want to distinguish the concepts of "tenant", "virtual cluster", and "SQL/HTTP instance" throughout the team. For this it is useful when the tools and their source code also reflect these concepts.

This commit achieves that.

Release note: None


Co-authored-by: Mira Radeva <mira@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>