Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: read committed transactions should not wait on PENDING intents #102014

Closed
nvanbenschoten opened this issue Apr 21, 2023 · 1 comment · Fixed by #103267
Closed

kv: read committed transactions should not wait on PENDING intents #102014

nvanbenschoten opened this issue Apr 21, 2023 · 1 comment · Fixed by #103267
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Apr 21, 2023

Sibling issue of #94729. When a read committed transaction encounters an intent (with any isolation level), or when a transaction (with any isolation level) encounters a read committed intent, the push should not block.

From the Read Committed RFC:

Instead of waiting on these conflicting intents until the intent's transaction completes, the non-locking readers will perform a "status check" RPC in the form of a PushTxn(PUSH_TIMESTAMP) to the transaction record of encountered intent to determine its visibility. If the intent's transaction has already been committed or aborted, the reader must initiate and wait for intent resolution and then observe the post-resolution state. However, if the intent's transaction is in progress, the reader can push the minimum commit timestamp of the intent holder to prevent it from committing at or before the reader's read timestamp. The reader can then proceed to ignore the intent and continue with its scan.

Jira issue: CRDB-27217

Epic CRDB-26539

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team A-read-committed Related to the introduction of Read Committed labels Apr 21, 2023
@nvanbenschoten nvanbenschoten self-assigned this Apr 21, 2023
@nvanbenschoten
Copy link
Member Author

Related prototype commit: nvanbenschoten@9190180.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 14, 2023
…uests

Fixes cockroachdb#50390.
Related to cockroachdb#60585.
Related to cockroachdb#103126.

Closes cockroachdb#64500, which was an earlier attempt to solve this issue using a similar
approach. This commit addresses the comments on that PR, which focused on the
pagination of intent resolution when bypassing the request batcher. We still
don't try to solve this issue, and instead limit the cases where intent
resolution bypasses the request batcher to those where pagination is not
necessary.

This commit adds a new `sendImmediately` option to the `IntentResolver`
`ResolveOptions`, which instructs the `IntentResolver` to send the provided
intent resolution requests immediately, instead of adding them to a batch and
waiting up to 10ms (defaultIntentResolutionBatchWait) for that batch to fill up
with other intent resolution requests. This can be used to avoid any
batching-induced latency and should be used only by foreground traffic that is
willing to trade off some throughput for lower latency.

The commit then sets this flag for single-key intent resolution requests initiated
by the `lockTableWaiter`. Unlike most calls into the `IntentResolver`, which are
performed by background tasks that are more than happy to trade increased
latency for increased throughput, the `lockTableWaiter` calls into the
`IntentResolver` on behalf of a foreground operation. It wants intent resolution
to complete as soon as possible, so it is willing to trade reduced throughput
for reduced latency.

I tested this out by writing 10,000 different intents in a normal-priority transaction
and then scanning over the table using a high-priority transaction. The test was
performed on a 3-node roachprod cluster to demonstrate the effect with realistic
network and disk latencies.
```
-- session 1
CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 10000);

-- session 2
BEGIN PRIORITY HIGH; SELECT count(*) FROM keys;
```

Without this change, the scan took 70.078s. With this change, the scan took
15.958s. This 78% speed up checks out. Each encountered intent is resolved
serially (see cockroachdb#103126), so the per-intent latency drops from 7ms to 1.6ms. This
improvement by about 5ms agrees with the `defaultIntentResolutionBatchIdle`,
which delays each resolution request that passes through a RequestBatcher. With
this change, these resolve intent requests are issued immediately and this delay
is not experienced.

While this is a significant improvement and will be important for Read Committed
transactions after cockroachdb#102014, this is still not quite enough to resolve cockroachdb#103126.
For that, we need to batch the resolve intent requests themselves using a form
of deferred intent resolution like we added in cockroachdb#49218 (for finalized transactions).

A similar improvement is seen for scans that encounter many abandoned intents
from many different transactions. This scenario bypasses the deferred intent
resolution path added in cockroachdb#49218, because the intents are each written by
different transactions. The speedup for this scenario was presented in cockroachdb#64500.

Release note (performance improvement): SQL statements that must clean up
intents from many different previously abandoned transactions now do so
moderately more efficiently.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 18, 2023
…uests

Fixes cockroachdb#50390.
Related to cockroachdb#60585.
Related to cockroachdb#103126.

Closes cockroachdb#64500, which was an earlier attempt to solve this issue using a similar
approach. This commit addresses the comments on that PR, which focused on the
pagination of intent resolution when bypassing the request batcher. We still
don't try to solve this issue, and instead limit the cases where intent
resolution bypasses the request batcher to those where pagination is not
necessary.

This commit adds a new `sendImmediately` option to the `IntentResolver`
`ResolveOptions`, which instructs the `IntentResolver` to send the provided
intent resolution requests immediately, instead of adding them to a batch and
waiting up to 10ms (defaultIntentResolutionBatchWait) for that batch to fill up
with other intent resolution requests. This can be used to avoid any
batching-induced latency and should be used only by foreground traffic that is
willing to trade off some throughput for lower latency.

The commit then sets this flag for single-key intent resolution requests initiated
by the `lockTableWaiter`. Unlike most calls into the `IntentResolver`, which are
performed by background tasks that are more than happy to trade increased
latency for increased throughput, the `lockTableWaiter` calls into the
`IntentResolver` on behalf of a foreground operation. It wants intent resolution
to complete as soon as possible, so it is willing to trade reduced throughput
for reduced latency.

I tested this out by writing 10,000 different intents in a normal-priority transaction
and then scanning over the table using a high-priority transaction. The test was
performed on a 3-node roachprod cluster to demonstrate the effect with realistic
network and disk latencies.
```
-- session 1
CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 10000);

-- session 2
BEGIN PRIORITY HIGH; SELECT count(*) FROM keys;
```

Without this change, the scan took 70.078s. With this change, the scan took
15.958s. This 78% speed up checks out. Each encountered intent is resolved
serially (see cockroachdb#103126), so the per-intent latency drops from 7ms to 1.6ms. This
improvement by about 5ms agrees with the `defaultIntentResolutionBatchIdle`,
which delays each resolution request that passes through a RequestBatcher. With
this change, these resolve intent requests are issued immediately and this delay
is not experienced.

While this is a significant improvement and will be important for Read Committed
transactions after cockroachdb#102014, this is still not quite enough to resolve cockroachdb#103126.
For that, we need to batch the resolve intent requests themselves using a form
of deferred intent resolution like we added in cockroachdb#49218 (for finalized transactions).

A similar improvement is seen for scans that encounter many abandoned intents
from many different transactions. This scenario bypasses the deferred intent
resolution path added in cockroachdb#49218, because the intents are each written by
different transactions. The speedup for this scenario was presented in cockroachdb#64500.

Release note (performance improvement): SQL statements that must clean up
intents from many different previously abandoned transactions now do so
moderately more efficiently.
craig bot pushed a commit that referenced this issue May 18, 2023
103265: kv: resolve conflicting intents immediately for latency-sensitive requests r=nvanbenschoten a=nvanbenschoten

Fixes #50390.
Related to #60585.
Related to #103126.

Closes #64500, which was an earlier attempt to solve this issue using a similar approach. This commit addresses the comments on that PR, which focused on the pagination of intent resolution when bypassing the request batcher. We still don't try to solve this issue, and instead limit the cases where intent resolution bypasses the request batcher to those where pagination is not necessary.

This commit adds a new `sendImmediately` option to the `IntentResolver` `ResolveOptions`, which instructs the `IntentResolver` to send the provided intent resolution requests immediately, instead of adding them to a batch and waiting up to 10ms (`defaultIntentResolutionBatchWait`) for that batch to fill up with other intent resolution requests. This can be used to avoid any batching-induced latency and should be used only by foreground traffic that is willing to trade off some throughput for lower latency.

The commit then sets this flag for single-key intent resolution requests initiated by the `lockTableWaiter`. Unlike most calls into the `IntentResolver`, which are performed by background tasks that are more than happy to trade increased latency for increased throughput, the `lockTableWaiter` calls into the `IntentResolver` on behalf of a foreground operation. It wants intent resolution to complete as soon as possible, so it is willing to trade reduced throughput for reduced latency.

I tested this out by writing 10,000 different intents in a normal-priority transaction and then scanning over the table using a high-priority transaction. The test was performed on a 3-node roachprod cluster to demonstrate the effect with realistic network and disk latencies.
```sql
-- session 1
CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 10000);

-- session 2
BEGIN PRIORITY HIGH; SELECT count(*) FROM keys;
```

Without this change, the scan took **70.078s**. With this change, the scan took **15.958s**. This **78%** speed-up checks out. Each encountered intent is resolved serially (see #103126), so the **per-intent latency** drops from **7ms** to **1.6ms.** This improvement by about 5ms agrees with the `defaultIntentResolutionBatchIdle`, which delays each resolution request that passes through a RequestBatcher. With this change, these resolve intent requests are issued immediately and this delay is not experienced.

While this is a significant improvement and will be important for Read Committed transactions after #102014, this is still not quite enough to resolve #103126. For that, we need to batch the resolve intent requests themselves using a form of deferred intent resolution like we added in #49218 (for finalized transactions).

A similar improvement is seen for scans that encounter many abandoned intents from many different transactions. This scenario bypasses the deferred intent resolution path added in #49218, because the intents are each written by different transactions. The speedup for this scenario was presented in #64500.

Release note (performance improvement): SQL statements that must clean up intents from many different previously abandoned transactions now do so moderately more efficiently.


103362: sql: validate primary / secondary region localities at end of txn r=fqazi a=fqazi

Previously, if a database was restored with skip_localities, there was no way to modify this database to set the primary region since validation would kick in too early during the statement. This meant fixing the regions in a restored database was impossible if the primary region was no longer valid. To address this, this patch, delays locality validation till the end of the transaction.

Fixes: #103290

Release note (bug fix): SET PRIMARY REGION and SET SECONDARY REGION did not validate transactionally, which could prevent cleaning up removed regions after a restore.

103373: concurrency: small refactors in preparation for reservation removal r=nvanbenschoten a=arulajmani

See individual commits for details. 

Informs: #103361

103538: go.mod: bump Pebble to 6f2788660198, rework shared storage wrapper r=RaduBerinde a=RaduBerinde

6f278866 shared: improve interface for more efficient reading
9eb2c407 db: log events to testing.T in unit tests
f32e7dc6 db: add reserved Pebblev4 sstable format
5a6b91b8 objstorage: improve test and add read ahead test
2bc4319e objstorage: remove genericFileReadable
8143ffb9 objstorage: fix readaheadState initialization
06d08888 db: add Compact.Duration metric
e7213de0 db: add Uptime metric
e9005aed db: don't delete files during ingest application
222b43ec internal/arenaskl: fix Skiplist doc typo

Release note: None
Epic: none

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 18, 2023
Fixes cockroachdb#102014.

This commit updates write-read conflict rules to allow non-blocking
behavior for weak isolation transactions. Specifically, PUSH_TIMESTAMP
requests are now allowed to succeed when encountering PENDING pushees
if any of the following conditions is true:
- the pushee is a weak isolation transaction
- OR the pusher is a weak isolation transaction with an equal priority
  as the pushee
- OR the pusher has a greater priority than the pushee (previous behavior)

The rationale for this behavior is that weak isolation transactions can
tolerate write skew without a refresh or retry. Because of this, they
face no consequence from being pushed and also expect others to be
pushable (non-blocking).

Longer-term, we would like all write-read conflicts to become non-blocking. For
now, these rules ensure that all write-read conflicts between weak isolation
transactions are non-blocking, and also that write-read conflicts between a weak
isolation transaction and a strong isolation (serializable) transaction has
reasonable, tunable behavior.

Release note: None
craig bot pushed a commit that referenced this issue May 19, 2023
103267: kv: non-blocking write-read conflicts for weak isolation txns r=arulajmani a=nvanbenschoten

Fixes #102014.

This commit updates write-read conflict rules to allow non-blocking behavior for weak isolation transactions. Specifically, PUSH_TIMESTAMP requests are now allowed to succeed when encountering PENDING pushees if any of the following conditions is true:
- the pushee is a weak isolation transaction
- OR the pusher is a weak isolation transaction with an equal priority as the pushee
- OR the pusher has a greater priority than the pushee (previous behavior)

The rationale for this behavior is that weak isolation transactions can tolerate write skew without a refresh or retry. Because of this, they face no consequence from being pushed and also expect others to be pushable (non-blocking).

Longer-term, we would like all write-read conflicts to become non-blocking. For now, these rules ensure that all write-read conflicts between weak isolation transactions are non-blocking, and also that write-read conflicts between a weak isolation transaction and a strong isolation (serializable) transaction has reasonable, tunable behavior.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 5a53d25 May 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant