Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: extend resolverQueue to support waiter transactions #76719

Merged
merged 4 commits into from Feb 26, 2022

Conversation

Azhng
Copy link
Contributor

@Azhng Azhng commented Feb 17, 2022

Reviewer note: only last two commits are relevant


First commit

sql: add waiter txn id to extended contention event

Previously, contention event only include the txnID of the transaction
that held the lock (a.k.a. blocking transaction). The txnID of the
waiter transaction is missing from the contention event.

This commit includes the txnID of the waiter transaction into contention
events.

Release note: None


Second commit

sql: extend resolverQueue to resolve waiter txn id

Previously, the resovlerQueue used in the contention event store only
resolved the txnID of the blocking transaction.
This commit, the resolverQueue would also resolve the txnID of the
waiting transaction.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Azhng Azhng force-pushed the ctn-waiter-txn-id branch 4 times, most recently from f2d2629 to 5130e56 Compare February 17, 2022 04:32
@Azhng Azhng marked this pull request as ready for review February 17, 2022 05:57
@Azhng Azhng requested review from a team as code owners February 17, 2022 05:57
@Azhng Azhng marked this pull request as draft February 17, 2022 05:57
@Azhng Azhng force-pushed the ctn-waiter-txn-id branch 2 times, most recently from 271a67a to 6cf0c75 Compare February 17, 2022 14:47
Copy link
Contributor

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like where this is going.

Reviewed 2 of 5 files at r3, 1 of 7 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)


pkg/sql/contention/resolver.go, line 107 at r4 (raw file):

		// remainingRetries stores a mapping of each contention event to its
		// remaining number of retries attempts. The key in the map is the
		// transaction ID of the blocking transaction.

this comment is no longer true


pkg/sql/contention/resolver.go, line 172 at r4 (raw file):

		//  self-throttling once that QPS value exceed certain value.

		remoteReq, localReq := makeRPCRequestsFromBatch(currentBatch)

Now I understand that the remoteReq is to resolve blocking txn ids, and the localReq is for waiting txn ids. But it took me a while. Maybe consider renaming these variables?


pkg/sql/contention/resolver.go, line 188 at r4 (raw file):

		for _, event := range currentBatch {
			needToRetryDueToBlockingTxnID, initialRetryBudget1 :=

I wonder if a name rather than a numeric suffix would help? Could get too long, though.


pkg/sql/contention/resolver.go, line 189 at r4 (raw file):

		for _, event := range currentBatch {
			needToRetryDueToBlockingTxnID, initialRetryBudget1 :=
				maybeUpdateBlockingTxnFingerprintID(

drop the word blocking


pkg/sql/contention/resolver.go, line 233 at r4 (raw file):

	}

	if uuid.Nil.Equal(txnID) {

comment here?


pkg/sql/contention/resolver.go, line 242 at r4 (raw file):

	if _, ok := inProgressTxnIDs[txnID]; ok {
		return true /* needToRetry */, retryBudgetInfinite

nit: retryBudgetForTxnInProgress


pkg/sql/contention/resolver.go, line 335 at r4 (raw file):

func makeRPCRequestsFromBatch(
	batch []contentionpb.ExtendedContentionEvent,
) (remoteReq, localReq *serverpb.TxnIDResolutionRequest) {

consider the same renaming here


pkg/sql/contentionpb/contention.go, line 90 at r4 (raw file):

}

func hashUUID(u uuid.UUID, fnv *util.FNV64) {

nit: since this code is a little clever, it's probably worth a direct unit test that it works? (You can craft UUID bytes to show things like first half same, second half different -> different hash, etc.)


pkg/sql/contentionpb/contention.go, line 92 at r4 (raw file):

func hashUUID(u uuid.UUID, fnv *util.FNV64) {
	b := u.GetBytes()
	// Hash the blocking txn id.

maybe say more about what's happening here, why 2 calls to DecodeUint64Descending


pkg/sql/contention/resolver_test.go, line 415 at r4 (raw file):

		f.setTxnIDEntry(tc.coordinatorNodeID, tc.TxnID, tc.TxnFingerprintID)
		// TODO(azhng): wip: this is kind of a hack heh?
		f.setTxnIDEntry("local", tc.TxnID, tc.TxnFingerprintID)

delete this? Also add some more test cases: want to populate remote & local coordinators with different data? Also test the retry behavior.

@Azhng Azhng force-pushed the ctn-waiter-txn-id branch 5 times, most recently from 34a5a87 to 2e3e487 Compare February 17, 2022 22:45
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)


pkg/sql/contention/resolver.go, line 107 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

this comment is no longer true

Done.


pkg/sql/contention/resolver.go, line 172 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Now I understand that the remoteReq is to resolve blocking txn ids, and the localReq is for waiting txn ids. But it took me a while. Maybe consider renaming these variables?

Done.


pkg/sql/contention/resolver.go, line 188 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I wonder if a name rather than a numeric suffix would help? Could get too long, though.

Done. renamed


pkg/sql/contention/resolver.go, line 189 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

drop the word blocking

Done.


pkg/sql/contention/resolver.go, line 233 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

comment here?

Done.


pkg/sql/contention/resolver.go, line 242 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: retryBudgetForTxnInProgress

Done.


pkg/sql/contention/resolver.go, line 335 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

consider the same renaming here

Done.


pkg/sql/contentionpb/contention.go, line 90 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: since this code is a little clever, it's probably worth a direct unit test that it works? (You can craft UUID bytes to show things like first half same, second half different -> different hash, etc.)

Done.


pkg/sql/contentionpb/contention.go, line 92 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

maybe say more about what's happening here, why 2 calls to DecodeUint64Descending

Done.


pkg/sql/contention/resolver_test.go, line 415 at r4 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

delete this? Also add some more test cases: want to populate remote & local coordinators with different data? Also test the retry behavior.

Done. Extended test harness to support have different waiting/blocking txn metadata.

@Azhng Azhng marked this pull request as ready for review February 18, 2022 00:20
@@ -274,12 +272,12 @@ func (s *eventStore) forEachEvent(
}

func (s *eventStore) getEventByBlockingTxnID(
txnID uuid.UUID,
hash uint64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function needs renaming (by Blocking + Waiting + Timestamp hash ID)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

@Azhng Azhng force-pushed the ctn-waiter-txn-id branch 2 times, most recently from f0c5aa8 to c82d6a7 Compare February 24, 2022 19:23
Copy link
Contributor

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebasing these stacked PRs makes finding just the new code confusing -- maybe something to strategize around in the future? But I think I found it here! Just a few nits, but otherwise :lgtm:

Reviewed 1 of 5 files at r10, 5 of 8 files at r11, 1 of 5 files at r14, 1 of 8 files at r15.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Azhng, @matthewtodd, and @THardy98)


pkg/sql/contention/resolver.go, line 172 at r15 (raw file):

		//  self-throttling once that QPS value exceed certain value.

		blockTxnIDsRPCReq, waitingTxnIDsRPCReq := makeRPCRequestsFromBatch(currentBatch)

nit: maybe block -> blocking


pkg/sql/contention/resolver.go, line 172 at r15 (raw file):

		//  self-throttling once that QPS value exceed certain value.

		blockTxnIDsRPCReq, waitingTxnIDsRPCReq := makeRPCRequestsFromBatch(currentBatch)

nit: maybe remove the RPC from these names? It's redundant with Req IMO.


pkg/sql/contention/resolver.go, line 239 at r15 (raw file):

	}

	// Sometimes DistSQL engine is used by in weird ways. It is possible for a

nit: "is used by in"


pkg/sql/contention/resolver.go, line 343 at r15 (raw file):

func makeRPCRequestsFromBatch(
	batch []contentionpb.ExtendedContentionEvent,
) (blockingTxnIDRPCReq, waitingTxnIDRPCReq *serverpb.TxnIDResolutionRequest) {

nit: ditto re: the redundancy of RPC here.

Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @matthewtodd and @THardy98)


pkg/sql/contention/resolver.go, line 172 at r15 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: maybe block -> blocking

Done.


pkg/sql/contention/resolver.go, line 172 at r15 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: maybe remove the RPC from these names? It's redundant with Req IMO.

Done.


pkg/sql/contention/resolver.go, line 239 at r15 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: "is used by in"

Done.


pkg/sql/contention/resolver.go, line 343 at r15 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: ditto re: the redundancy of RPC here.

Done.

This commit introduces contention event resolver. The new resolver
implements a queue-like interface, where unresolved contention events
will be queued into the resolver. The resolution is performed by
invoking the `resolve()` method on the resolver queue, and the result of
the resolution can be retrieved by calling `dequeue()` method.

Release note: None
This commit introduces contention event store, an in-memory bounded
FIFO store that stores historical contention events and map collected
contention events to their corresponding transaction fingerprint IDs.

Release note (sql change): A new cluster setting
`sql.contention.event_store.capacity` is added . This cluster setting
can be used to control the in-memory capacity of contention event store.
When this setting is set to zero, the contention event store is
disabled.
Previously, contention event only include the txnID of the transaction
that held the lock (a.k.a. blocking transaction). The txnID of the
waiter transaction is missing from the contention event.

This commit includes the txnID of the waiter transaction into contention
events.

Release note: None
Previously, the resovlerQueue used in the contention event store only
resolved the txnID of the blocking transaction.
This commit, the resolverQueue would also resolve the txnID of the
waiting transaction.

Release note: None
@Azhng
Copy link
Contributor Author

Azhng commented Feb 26, 2022

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 26, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants