kv: resolve conflicting intents immediately for latency-sensitive requests #103265

Merged

Conversation

nvanbenschoten
Member

Fixes #50390.
Related to #60585.
Related to #103126.

Closes #64500, which was an earlier attempt to solve this issue using a similar approach. This commit addresses the comments on that PR, which focused on the pagination of intent resolution when bypassing the request batcher. We still don't try to solve the pagination problem; instead, we limit the cases where intent resolution bypasses the request batcher to those where pagination is not necessary.

This commit adds a new sendImmediately option to the IntentResolver ResolveOptions, which instructs the IntentResolver to send the provided intent resolution requests immediately, instead of adding them to a batch and waiting up to 10ms (defaultIntentResolutionBatchWait) for that batch to fill up with other intent resolution requests. This can be used to avoid any batching-induced latency and should be used only by foreground traffic that is willing to trade off some throughput for lower latency.

The commit then sets this flag for single-key intent resolution requests initiated by the lockTableWaiter. Unlike most calls into the IntentResolver, which are performed by background tasks that are more than happy to trade increased latency for increased throughput, the lockTableWaiter calls into the IntentResolver on behalf of a foreground operation. It wants intent resolution to complete as soon as possible, so it is willing to trade reduced throughput for reduced latency.
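
To make the mechanism concrete, here is a minimal sketch of the control flow the two paragraphs above describe, using simplified stand-in types. The real `ResolveOptions` and `IntentResolver` live in `pkg/kv/kvserver/intentresolver`, and their actual fields and signatures differ; this is illustrative only.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-in for intentresolver.ResolveOptions; only the fields
// relevant to this discussion are shown.
type ResolveOptions struct {
	Poison          bool
	sendImmediately bool // skip the RequestBatcher and send right away
}

const defaultIntentResolutionBatchWait = 10 * time.Millisecond

// resolveIntent sketches the branch this PR introduces: foreground callers
// set sendImmediately and bypass batching; background callers keep batching.
func resolveIntent(opts ResolveOptions) {
	if opts.sendImmediately {
		fmt.Println("sending ResolveIntent immediately (foreground, latency-sensitive)")
		return
	}
	fmt.Printf("enqueued in RequestBatcher; may wait up to %s for the batch to fill\n",
		defaultIntentResolutionBatchWait)
}

func main() {
	// lockTableWaiter-style call, on behalf of a waiting foreground request.
	resolveIntent(ResolveOptions{Poison: true, sendImmediately: true})
	// Background task (e.g. intent GC), happy to trade latency for throughput.
	resolveIntent(ResolveOptions{sendImmediately: false})
}
```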

I tested this out by writing 10,000 different intents in a normal-priority transaction and then scanning over the table using a high-priority transaction. The test was performed on a 3-node roachprod cluster to demonstrate the effect with realistic network and disk latencies.

```sql
-- session 1
CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 10000);

-- session 2
BEGIN PRIORITY HIGH; SELECT count(*) FROM keys;
```

Without this change, the scan took 70.078s. With this change, it took 15.958s. This ~78% speed-up checks out: each encountered intent is resolved serially (see #103126), so the per-intent latency drops from about 7ms (70.078s / 10,000 intents) to about 1.6ms (15.958s / 10,000 intents). The roughly 5ms improvement agrees with defaultIntentResolutionBatchIdle, which delays each resolution request that passes through a RequestBatcher. With this change, these resolve intent requests are issued immediately, so that delay is never incurred.
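
A quick back-of-the-envelope check of those numbers, using only the measurements quoted above:

```go
package main

import "fmt"

func main() {
	const numIntents = 10000.0
	before := 70.078 / numIntents * 1000 // ms per intent without the change
	after := 15.958 / numIntents * 1000  // ms per intent with the change
	fmt.Printf("before: %.1fms/intent, after: %.1fms/intent, delta: %.1fms\n",
		before, after, before-after)
	// Output: before: 7.0ms/intent, after: 1.6ms/intent, delta: 5.4ms
	// The ~5ms delta is the batching-induced delay that is no longer incurred.
}
```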

While this is a significant improvement and will be important for Read Committed transactions after #102014, this is still not quite enough to resolve #103126. For that, we need to batch the resolve intent requests themselves using a form of deferred intent resolution like we added in #49218 (for finalized transactions).

A similar improvement is seen for scans that encounter many abandoned intents from many different transactions. This scenario bypasses the deferred intent resolution path added in #49218, because the intents are each written by different transactions. The speedup for this scenario was presented in #64500.

Release note (performance improvement): SQL statements that must clean up intents from many different previously abandoned transactions now do so moderately more efficiently.

@nvanbenschoten nvanbenschoten requested a review from a team as a code owner May 14, 2023 17:51
@cockroach-teamcity
Member

This change is Reviewable

@erikgrinaker
Contributor

Drive-by: are these requests now subject to admission control, or will that require #97108? Is there a risk that this more aggressive foreground resolution can overload clusters? We often reach for high-priority scans to resolve intents during escalations, and would want to avoid those causing more trouble.

@nvanbenschoten
Member Author

That's a good question. These requests are not yet subject to admission control. However, I don't see much of a risk of this change overloading clusters, because we only bypass the request batcher for single, point intent resolution. That means that the kinds of bulk multi-intent resolution we hit during high-priority scans over intent-ridden keyspaces won't behave any differently. Another way to look at this is that we only bypass the request batcher (and its not entirely intentional throttling effects) for forms of intent resolution that aren't amplifying the user request — we only do so when there's a 1:1 relationship between a user request and the intent resolution request.
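
A hypothetical shape for that gate, with names assumed purely for illustration (this is not the actual code): the bypass applies only to single, point intent resolution, so bulk resolution stays on the batched path.

```go
package main

import "fmt"

// bypassBatcher is a hypothetical predicate: bypass the batcher only when a
// single point intent is being resolved, never for ranged or bulk
// multi-intent resolution.
func bypassBatcher(sendImmediately bool, numIntents int, ranged bool) bool {
	return sendImmediately && numIntents == 1 && !ranged
}

func main() {
	fmt.Println(bypassBatcher(true, 1, false))   // true: 1:1 with the user request
	fmt.Println(bypassBatcher(true, 500, false)) // false: would amplify the request
}
```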

Collaborator

@arulajmani arulajmani left a comment


Thanks for reviving this! Do you think it's worth opening an issue to expand the cases we bypass the batcher to include deferred intent resolution by foreground requests as well?

:lgtm:

Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/intentresolver/intent_resolver.go line 937 at r3 (raw file):

```go
// ResolveIntent synchronously resolves an intent according to opts.
func (ir *IntentResolver) ResolveIntent(
```

Consider adding a comment explaining that this method should only be called by foreground requests as it sets the sendImmediately flag; it's true now, so mostly for future modifiers of this code.


pkg/kv/kvserver/intentresolver/intent_resolver.go line 995 at r3 (raw file):

```go
		if err := ir.db.Run(ctx, b); err != nil {
			return b.MustPErr()
		}
```

nit: consider early returning here; that'll let you get rid of the else statement and remove a level of nesting.
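
A generic illustration of the suggestion (not the PR's actual diff):

```go
package sketch

// Before: the success path is nested under an else.
func runNested(run func() error, pErr func() error) error {
	if err := run(); err != nil {
		return pErr()
	} else {
		// success handling, one level deeper
		return nil
	}
}

// After: early return on the error path removes the else and a level of nesting.
func runEarly(run func() error, pErr func() error) error {
	if err := run(); err != nil {
		return pErr()
	}
	// success handling at the top level
	return nil
}
```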


pkg/kv/kvserver/concurrency/lock_table_waiter_test.go line 379 at r2 (raw file):

```go
						require.Equal(t, roachpb.ABORTED, intent.Status)
						require.Zero(t, intent.ClockWhilePending)
						require.Equal(t, true, opts.Poison)
```

nit: here, and elsewhere, let's use require.True instead?

This commit batches allocations of `kvpb.Request` objects (both
`kvpb.ResolveIntentRequest` and `kvpb.ResolveIntentRangeRequest`) in calls to
`IntentResolver.resolveIntents`. This is a performance optimization for callers
who wish to resolve many intents. It also cleans up the code, separating the
logic to construct request objects from the logic to send them.

Epic: None
Release note: None
This commit adds assertions around the lock-table waiter's use of
`intentresolver.ResolveOptions`. It asserts that the `Poison` flag is
set to true and that the `MinTimestamp` is left empty.

Epic: None
Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/intentResolverAlloc branch from 41f1abb to 3fc29ad Compare May 18, 2023 17:46
Member Author

@nvanbenschoten nvanbenschoten left a comment


TFTR!

Do you think it's worth opening an issue to expand the cases we bypass the batcher to include deferred intent resolution by foreground requests as well?

Since we're not planning to address this any time soon, I decided to add a TODO to the code instead of opening an issue.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)


pkg/kv/kvserver/intentresolver/intent_resolver.go line 937 at r3 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Consider adding a comment explaining that this method should only be called by foreground requests as it sets the sendImmediately flag; it's true now, so mostly for future modifiers of this code.

Done.


pkg/kv/kvserver/intentresolver/intent_resolver.go line 995 at r3 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: consider early returning here; that'll let you get rid of the else statement and remove a level of nesting.

Done.


pkg/kv/kvserver/concurrency/lock_table_waiter_test.go line 379 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: here, and elsewhere, let's use require.True instead?

Done.

@nvanbenschoten
Member Author

bors r+

@craig craig bot merged commit fe71b7c into cockroachdb:master May 18, 2023
2 checks passed
@craig
Contributor

craig bot commented May 18, 2023

Build succeeded:

@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/intentResolverAlloc branch May 19, 2023 21:39
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jun 13, 2023
Fixes cockroachdb#103126.

This commit extends the infrastructure introduced in cockroachdb#49218 for transaction
timestamp pushes. It avoids redundant txn pushes of PENDING transactions and
batches the resolution of PENDING intents. This breaks the O(num_intents) work
performed by high-priority scans (e.g. backups) over intent-heavy keyspaces into
something closer to O(num_ranges) work.

The commit accomplishes its goals by adding a second per-Range LRU cache of
transactions that are PENDING and are known to have been pushed to higher
timestamps. We use this cache for two purposes:

1. when we are a non-locking read and we see a lock at a conflicting timestamp
   that is held by a txn already pushed above our read timestamp, we neither wait
   out the kv.lock_table.coordinator_liveness_push_delay (50 ms) nor push the
   transaction's record (an RPC to the leaseholder of the pushee's txn record range).
2. we use the existence of a transaction in the cache as an indication that
   it may have written multiple intents, so we begin deferring intent resolution
   to enable batching.

Together, these two changes make us much more effective at pushing transactions
with a large number of intents. The following example (from cockroachdb#103126) demonstrates
this:
```sql
-- SETUP: run in a 3-node GCP roachprod cluster

--- session 1 - write 100k intents
CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 100000);

--- session 2 - push intents with high-priority txn without uncertainty interval
BEGIN PRIORITY HIGH AS OF SYSTEM TIME '-1ms';
SELECT count(*) FROM keys;

--- BEFORE this PR and before cockroachdb#103265 (i.e. v23.1.2): takes ~7.1ms per intent
Time: 714.441s total

--- BEFORE this PR: takes ~1.5ms per intent
Time: 151.880s total

--- AFTER this PR: takes ~24μs per intent
Time: 2.405s
```

The change does have an unfortunate limitation. Deferred intent resolution
is only currently enabled for non-locking readers without uncertainty
intervals. Readers with uncertainty intervals must contend with the
possibility of pushing a conflicting intent up into their uncertainty
interval and causing more work for themselves, which is avoided with care
by the lockTableWaiter but difficult to coordinate through the
txnStatusCache. This limitation is acceptable because the most important
case here is optimizing the Export requests issued by backup.

This limitation also hints at the long-term plan for this interaction,
which is that non-locking readers can ignore known pending intents without
the need to even resolve those intents (see cockroachdb#94730). This will require a
request-scoped cache of pending, pushed transactions, which does not have
the same problems with uncertainty intervals.

Release note (performance improvement): Backups no longer perform work
proportional to the number of pending intents that they encounter, so they are
over 100x faster when encountering long-running, bulk writing transactions.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jun 19, 2023
craig bot pushed a commit that referenced this pull request Jun 20, 2023
104784: kv/concurrency: batch intent resolution of pushed intents from same txn r=arulajmani a=nvanbenschoten

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 13, 2023
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 13, 2023