
db: investigate elevated batch commit tail latencies #2646

Closed · jbowens opened this issue Jun 16, 2023 · 3 comments

jbowens commented Jun 16, 2023

The following graphs are from one particular instance of this, but we've seen it across several test clusters including the 23.1 test cluster.

[Screenshot 2023-06-16 at 4 27 01 PM]
[Screenshot 2023-06-16 at 4 27 34 PM]

Internal discussion: https://cockroachlabs.slack.com/archives/C057ULDSKC0/p1686947338066129?thread_ts=1686924156.296339&cid=C057ULDSKC0


jbowens commented Jun 16, 2023

(Potentially) related to:

Also, ingests can be disruptive to batch commit tail latencies, although they are definitely not the only factor: we've seen elevated batch commit tail latencies in the absence of ingests.


jbowens commented Jun 21, 2023

As far as I can tell, there are two primary sources of tail latency:

  1. There's contention on d.commit.mu. Some batch commits block on acquiring d.commit.mu for 90ms:
slow batch commit: ‹(TotDur: 93.940495ms, SemaWait: 206ns, WALQueueWait: 0s, MemtblStall: 0s, L0Stall: 0s, WALRot: 0s, CommitWait: 489.115µs, MemtblApply: 10.924µs, Publish: 489.572µs, CommitMu: 93.436399ms, WALWrite: 1.094µs, DBMu: 124ns, PendingEnqueue: 308ns, RoomForWrite: 376ns, MemtblRot: 0s, MemtblPrep: 219ns)›

This seems to be driven by slow DB.makeRoomForWrite calls, which tend to be dominated by the memtable rotation:

slow make room for write: ‹(TotDur: 191.177358ms, SemaWait: 336ns, WALQueueWait: 0s, MemtblStall: 0s, L0Stall: 0s, WALRot: 607.996µs, CommitWait: 48ns, MemtblApply: 4.215µs, Publish: 797ns, CommitMu: 88ns, WALWrite: 2.49µs, DBMu: 91ns, PendingEnqueue: 227ns, RoomForWrite: 191.165895ms, MemtblRot: 190.55501ms, MemtblPrep: 347ns)›

I believe we're seeing contention on the block cache mutexes while reserving space for the memtables: #1997. These long memtable rotations keep DB.commit.mu locked throughout.

  2. Once a batch has been applied to the memtable, we need to wait (a) for the fsync to complete, if applicable, and (b) for the sequence number to be published:
slow batch commit: ‹(TotDur: 80.033189ms, SemaWait: 233ns, WALQueueWait: 0s, MemtblStall: 0s, L0Stall: 0s, WALRot: 0s, CommitWait: 80.027419ms, MemtblApply: 2.887µs, Publish: 80.027626ms, CommitMu: 343ns, WALWrite: 441ns, DBMu: 32ns, PendingEnqueue: 126ns, RoomForWrite: 284ns, MemtblRot: 0s, MemtblPrep: 170ns)›

I tried to check whether this might be due solely to the fsync by setting kv.raft_log.disable_synchronization_unsafe = true, and found that some batches still wait tens of milliseconds:

I230621 21:34:35.452617 1048 3@pebble/db.go:881 ⋮ [n2,s2,pebble] 5426  slow batch commit: ‹(TotDur: 57.603998ms, SemaWait: 172ns, WALQueueWait: 0s, MemtblStall: 0s, L0Stall: 0s, WALRot: 0s, CommitWait: 57.596621ms, MemtblApply: 5.669µs, Publish: 57.596777ms, CommitMu: 72ns, WALWrite: 196ns, DBMu: 38ns, PendingEnqueue: 119ns, RoomForWrite: 183ns, MemtblRot: 0s, MemtblPrep: 99ns)›

I wonder if there's contention on the visible sequence number in the publish loop:

		// We're responsible for publishing the sequence number for batch t, but
		// another concurrent goroutine might sneak in and publish the sequence
		// number for a subsequent batch. That's ok as all we're guaranteeing is
		// that the sequence number ratchets up.
		for {
			curSeqNum := p.env.visibleSeqNum.Load()
			newSeqNum := t.SeqNum() + uint64(t.Count())
			if newSeqNum <= curSeqNum {
				// t's sequence number has already been published.
				break
			}
			if p.env.visibleSeqNum.CompareAndSwap(curSeqNum, newSeqNum) {
				// We successfully published t's sequence number.
				break
			}
		}
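
To get a feel for how much that CAS ratchet can thrash under concurrency, here's a self-contained toy that mimics the loop above. This is an illustrative sketch, not Pebble's code; the goroutine count and per-batch key count are arbitrary, and the `casRetries` counter is added purely to surface lost races:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	var (
		visibleSeqNum atomic.Uint64 // mirrors p.env.visibleSeqNum in the excerpt
		nextSeqNum    atomic.Uint64 // hands out [seqNum, seqNum+count) ranges
		casRetries    atomic.Int64  // CAS attempts that lost a race
	)
	const goroutines = 16
	const publishes = 1 << 16
	const count = 8 // keys per batch

	var wg sync.WaitGroup
	start := time.Now()
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < publishes; i++ {
				seqNum := nextSeqNum.Add(count) - count // this batch's first seqnum
				// Ratchet visibleSeqNum up to seqNum+count, as in the excerpt:
				// another goroutine may publish past us, which is fine because
				// the value only needs to ratchet upward.
				for {
					cur := visibleSeqNum.Load()
					newSeqNum := seqNum + count
					if newSeqNum <= cur {
						break // already covered by a later batch's publish
					}
					if visibleSeqNum.CompareAndSwap(cur, newSeqNum) {
						break
					}
					casRetries.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	fmt.Printf("elapsed=%s casRetries=%d\n", time.Since(start), casRetries.Load())
}
```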

@jbowens jbowens self-assigned this Jun 22, 2023
@jbowens jbowens moved this from Incoming to In Progress (this milestone) in (Deprecated) Storage Jun 22, 2023

jbowens commented Jun 26, 2023

I played with a few tweaks to understand and reduce the tail latencies, including avoiding acquiring DB.mu in order to obtain a reference to the mutable memtable: b77606f...a891d5d
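
The general shape of that tweak, sketched below with hypothetical names (this is not the actual diff in b77606f...a891d5d), is to publish the mutable memtable through an atomic pointer so the commit path can load it without taking DB.mu:

```go
package sketch

import (
	"sync"
	"sync/atomic"
)

// memTable stands in for Pebble's memtable type.
type memTable struct{ /* ... */ }

// DB holds only the fields relevant to the sketch.
type DB struct {
	mu sync.Mutex // still guards rotation and the rest of the DB state

	// mutable is the current mutable memtable, published atomically so
	// the commit path can read it lock-free.
	mutable atomic.Pointer[memTable]
}

// mutableMemtable returns the current mutable memtable without DB.mu.
func (d *DB) mutableMemtable() *memTable {
	return d.mutable.Load()
}

// rotateMemtable installs a replacement; callers hold d.mu, so there is a
// single writer. Concurrent commits observe either the old or the new
// memtable, never a torn value.
func (d *DB) rotateMemtable(next *memTable) {
	d.mutable.Store(next)
}
```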

It can be difficult to see past the noise, especially in the presence of write stalls (which can spike log commit latency dramatically) or around process start. The graph below uses a MIN aggregator to see past the spikes. It shows that these changes do help at the very tail, but the effect is more muted at p99.

[Screenshot 2023-06-26 at 11 58 06 AM: log commit latency, MIN aggregator]

For comparison, the WAL latency:
[Screenshot 2023-06-26 at 12 03 16 PM: WAL latency]

jbowens added a commit to jbowens/pebble that referenced this issue Jul 25, 2023
We've observed large allocations like the 64MB memtable allocation take 10ms+.
This can contribute to batch commit tail latencies by slowing down WAL/memtable
rotation during which the entire commit pipeline is stalled. This commit adapts
the memtable lifecycle to keep the most recent obsolete memtable around for the
next memtable allocation.

```
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                   │   old.txt    │               new.txt               │
                   │    sec/op    │   sec/op     vs base                │
RotateMemtables-24   105.56µ ± 1%   88.11µ ± 2%  -16.53% (p=0.000 n=25)

                   │   old.txt    │            new.txt             │
                   │     B/op     │     B/op      vs base          │
RotateMemtables-24   124.3Ki ± 0%   124.3Ki ± 0%  ~ (p=0.418 n=25)

                   │  old.txt   │              new.txt              │
                   │ allocs/op  │ allocs/op   vs base               │
RotateMemtables-24   114.0 ± 0%   115.0 ± 0%  +0.88% (p=0.000 n=25)
```

Informs cockroachdb#2646.
jbowens added a commit to jbowens/pebble that referenced this issue Jul 27, 2023
We've observed large allocations like the 64MB memtable allocation take 10ms+.
This can contribute to batch commit tail latencies by slowing down WAL/memtable
rotation during which the entire commit pipeline is stalled. This commit adapts
the memtable lifecycle to keep the most recent obsolete memtable around for use
as the next memtable.

This reduces the commit latency hiccup during a memtable rotation, and it also
reduces block cache mutex contention (cockroachdb#1997) by reducing the number of times we
must reserve memory from the block cache.

```
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                   │   old.txt   │               new.txt               │
                   │   sec/op    │   sec/op     vs base                │
RotateMemtables-24   120.7µ ± 2%   102.8µ ± 4%  -14.85% (p=0.000 n=25)

                   │   old.txt    │               new.txt               │
                   │     B/op     │     B/op      vs base               │
RotateMemtables-24   124.3Ki ± 0%   124.0Ki ± 0%  -0.27% (p=0.000 n=25)

                   │  old.txt   │              new.txt              │
                   │ allocs/op  │ allocs/op   vs base               │
RotateMemtables-24   114.0 ± 0%   111.0 ± 0%  -2.63% (p=0.000 n=25)
```

Informs cockroachdb#2646.
jbowens added a commit that referenced this issue Jul 28, 2023
We've observed large allocations like the 64MB memtable allocation take 10ms+.
This can add latency to the WAL/memtable rotation critical section during which
the entire commit pipeline is stalled, contributing to batch commit tail
latencies. This commit adapts the memtable lifecycle to keep the most recent
obsolete memtable around for use as the next mutable memtable.

This reduces the commit latency hiccup during a memtable rotation, and it also
reduces block cache mutex contention (#1997) by reducing the number of times we
must reserve memory from the block cache.

```
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                   │   old.txt   │               new.txt               │
                   │   sec/op    │   sec/op     vs base                │
RotateMemtables-24   120.7µ ± 2%   102.8µ ± 4%  -14.85% (p=0.000 n=25)

                   │   old.txt    │               new.txt               │
                   │     B/op     │     B/op      vs base               │
RotateMemtables-24   124.3Ki ± 0%   124.0Ki ± 0%  -0.27% (p=0.000 n=25)

                   │  old.txt   │              new.txt              │
                   │ allocs/op  │ allocs/op   vs base               │
RotateMemtables-24   114.0 ± 0%   111.0 ± 0%  -2.63% (p=0.000 n=25)
```

Informs #2646.
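
The recycling idea from the commit message can be sketched in a few lines. This is illustrative only: `memtableRecycler` and its methods are hypothetical names, and the real change also interacts with block cache reservations, which this sketch omits. Keeping at most one spare bounds steady-state overhead to a single extra memtable while removing the large allocation from the rotation critical section:

```go
package sketch

import "sync"

const memtableSize = 64 << 20 // 64MB, matching the allocation in the commit message

// memtableRecycler keeps the arena of the most recently retired memtable so
// the next rotation can reuse it instead of allocating a fresh one.
type memtableRecycler struct {
	mu    sync.Mutex
	spare []byte
}

// get returns an arena for a new memtable, preferring the recycled spare.
func (r *memtableRecycler) get() []byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	if buf := r.spare; buf != nil {
		r.spare = nil
		return buf // reuse: no fresh 64MB allocation on the commit path
	}
	return make([]byte, memtableSize)
}

// put stashes a retired memtable's arena; keeping at most one spare bounds
// the steady-state overhead to one extra memtable of memory.
func (r *memtableRecycler) put(buf []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.spare = buf
}
```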
@jbowens jbowens closed this as completed Aug 14, 2023
(Deprecated) Storage automation moved this from In Progress (this milestone) to Done Aug 14, 2023