record: ensure block.written is accessed atomically #1719

Closed · 1 commit

Conversation

nicktrav (Contributor)

Currently, a block's `written` field is intended to be used atomically.
There are existing call sites where the field is read from or written to
non-atomically, which can result in a data race (as observed in
cockroachdb/cockroach#81511).

Make reads and writes of `block.written` atomic.

This results in up to an 8% reduction in latency and a corresponding
increase in throughput, depending on the size of the record, due to a
change to use a stack variable rather than dereferencing a field on the
struct. However, note that this particular benchmark does not touch the
filesystem (it uses `io.Discard`); in practice the savings would be
dwarfed by the filesystem operations (sync / write).
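The access pattern the commit message describes can be sketched as follows. This is a minimal, hypothetical illustration: the `block` type, buffer size, and method names here are simplified stand-ins, not Pebble's actual `record` package code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// block is a toy, simplified version of a WAL block. The written field
// tracks how many bytes of buf are populated; a flush goroutine may read
// it concurrently with writes, so all access goes through sync/atomic.
type block struct {
	written int32
	buf     [32]byte
}

// append copies p into the block and publishes the new length with an
// atomic store, so a concurrent reader never observes a torn value.
func (b *block) append(p []byte) {
	// Load once into a stack variable rather than re-reading the struct
	// field, mirroring the optimization described above.
	written := atomic.LoadInt32(&b.written)
	n := copy(b.buf[written:], p)
	atomic.StoreInt32(&b.written, written+int32(n))
}

// length returns the published length via an atomic load.
func (b *block) length() int32 {
	return atomic.LoadInt32(&b.written)
}

func main() {
	var b block
	b.append([]byte("hello"))
	b.append([]byte("world"))
	fmt.Println(b.length()) // 10
}
```

Running this under `go run -race` with concurrent readers of `length` would report no race, whereas plain field reads and writes would.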

```
name                       old time/op    new time/op    delta
RecordWrite/size=8-16        42.9ns ± 5%    40.7ns ± 2%  -5.24%  (p=0.000 n=10+10)
RecordWrite/size=16-16       42.7ns ± 4%    40.7ns ± 1%  -4.89%  (p=0.000 n=10+10)
RecordWrite/size=32-16       44.0ns ± 6%    40.4ns ± 1%  -8.24%  (p=0.000 n=10+10)
RecordWrite/size=64-16       44.1ns ± 1%    41.6ns ± 0%  -5.58%  (p=0.000 n=10+10)
RecordWrite/size=256-16      67.7ns ± 4%    65.2ns ± 1%  -3.71%  (p=0.000 n=10+9)
RecordWrite/size=1028-16      124ns ± 2%     119ns ± 1%  -4.04%  (p=0.000 n=10+10)
RecordWrite/size=4096-16      306ns ± 5%     296ns ± 1%  -3.25%  (p=0.000 n=10+9)
RecordWrite/size=65536-16    4.54µs ± 1%    4.52µs ± 2%    ~     (p=0.247 n=10+10)

name                       old speed      new speed      delta
RecordWrite/size=8-16       186MB/s ± 5%   197MB/s ± 2%  +5.49%  (p=0.000 n=10+10)
RecordWrite/size=16-16      374MB/s ± 3%   394MB/s ± 1%  +5.12%  (p=0.000 n=10+10)
RecordWrite/size=32-16      727MB/s ± 5%   792MB/s ± 1%  +8.91%  (p=0.000 n=10+10)
RecordWrite/size=64-16     1.45GB/s ± 1%  1.54GB/s ± 0%  +5.91%  (p=0.000 n=10+10)
RecordWrite/size=256-16    3.78GB/s ± 4%  3.92GB/s ± 1%  +3.82%  (p=0.000 n=10+9)
RecordWrite/size=1028-16   8.31GB/s ± 2%  8.66GB/s ± 1%  +4.21%  (p=0.000 n=10+10)
RecordWrite/size=4096-16   13.4GB/s ± 5%  13.8GB/s ± 1%  +3.30%  (p=0.000 n=10+9)
RecordWrite/size=65536-16  14.4GB/s ± 1%  14.5GB/s ± 2%    ~     (p=0.247 n=10+10)

name                       old alloc/op   new alloc/op   delta
RecordWrite/size=8-16         0.00B          0.00B         ~     (all equal)
RecordWrite/size=16-16        0.00B          0.00B         ~     (all equal)
RecordWrite/size=32-16        0.00B          0.00B         ~     (all equal)
RecordWrite/size=64-16        0.00B          0.00B         ~     (all equal)
RecordWrite/size=256-16       0.00B          0.00B         ~     (all equal)
RecordWrite/size=1028-16      0.00B          0.00B         ~     (all equal)
RecordWrite/size=4096-16      0.00B          0.00B         ~     (all equal)
RecordWrite/size=65536-16     2.00B ± 0%     2.00B ± 0%    ~     (all equal)

name                       old allocs/op  new allocs/op  delta
RecordWrite/size=8-16          0.00           0.00         ~     (all equal)
RecordWrite/size=16-16         0.00           0.00         ~     (all equal)
RecordWrite/size=32-16         0.00           0.00         ~     (all equal)
RecordWrite/size=64-16         0.00           0.00         ~     (all equal)
RecordWrite/size=256-16        0.00           0.00         ~     (all equal)
RecordWrite/size=1028-16       0.00           0.00         ~     (all equal)
RecordWrite/size=4096-16       0.00           0.00         ~     (all equal)
RecordWrite/size=65536-16      0.00           0.00         ~     (all equal)
```

Fixes #1692.

@cockroach-teamcity (Member)

This change is Reviewable

@nicktrav nicktrav marked this pull request as ready for review May 23, 2022 23:27
@nicktrav (Contributor, Author)

This is just one approach to solving this issue: honoring the contract that the struct field comment implies regarding atomicity.

It looks like there are also some comments around using the commit pipeline mutex for external synchronization (for example), though evidently something is slipping through the cracks, as we're seeing the race in Cockroach.

For context, in cockroachdb/cockroach#81133, parallel writes are made to each engine's WAL as part of liveness checking, which can race if the context is cancelled and the log is closed.

Seems like this code in Pebble has been this way since inception, and we're only tickling the issue now due to the nature of the concurrent access introduced in the Cockroach PR.

@jbowens (Collaborator) left a comment

There are some gaps in the comments that make it difficult to tease out how this is intended to work. I believe its synchronization is designed for a concurrent single-consumer, single-producer scheme. Each log writer has a single, long-lived flushing goroutine (the consumer). The commit pipeline (`d.commit.mu`) ensures at most a single writer is adding pending data to be flushed at a time (the producer).

Since we're guaranteed to have at most one writer appending data to be flushed, methods that are only used in the writing code path do not need to use atomic loads: there's no potential for another goroutine to write to `b.written` concurrently, because only the goroutine holding `d.commit.mu` is permitted to write to `b.written`. However, the flushing goroutine needs to be able to access `b.written` concurrently with writes, so it uses atomic loads, and the writing goroutine must use atomic stores.
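The single-producer/single-consumer scheme described above can be sketched like this. The type and names here are illustrative stand-ins (not Pebble's actual code): `commitMu` plays the role of `d.commit.mu`, serializing producers so at most one goroutine stores to `written` at a time, while the flusher reads it concurrently via atomic loads.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// logBuffer is a toy single-producer/single-consumer log buffer.
type logBuffer struct {
	commitMu sync.Mutex // serializes producers (stands in for d.commit.mu)
	written  int32      // bytes published; read concurrently by the flusher
	buf      [1024]byte
}

// add is the producer side: it appends p while holding the commit mutex,
// then publishes the new length with an atomic store so the bytes become
// visible to the flusher.
func (l *logBuffer) add(p []byte) {
	l.commitMu.Lock()
	defer l.commitMu.Unlock()
	w := atomic.LoadInt32(&l.written)
	n := copy(l.buf[w:], p)
	atomic.StoreInt32(&l.written, w+int32(n))
}

// flush is the consumer side: it may run concurrently with add, so it
// must use an atomic load to observe written. It returns the new
// flushed offset.
func (l *logBuffer) flush(flushed int32) int32 {
	w := atomic.LoadInt32(&l.written)
	_ = l.buf[flushed:w] // the unflushed bytes would be written out here
	return w
}

func main() {
	var l logBuffer
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the single producer
		defer wg.Done()
		for i := 0; i < 100; i++ {
			l.add([]byte("ab"))
		}
	}()
	// The consumer polls concurrently; the atomic load keeps this race-free.
	var flushed int32
	for atomic.LoadInt32(&l.written) < 200 {
		flushed = l.flush(flushed)
	}
	wg.Wait()
	fmt.Println(l.flush(flushed)) // 200
}
```

Note how the producer's plain read of `written` would also be safe under this scheme (it is the only writer), which is exactly why only some call sites in the real code need atomics.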

I think the data race is caused by the omission of a commit-pipeline lock in `DB.Close`. Closing the WAL writer normally requires holding the commit pipeline mutex (e.g., `makeRoomForWrite` always holds `d.commit.mu`).

I think there's still a problem though. The `(*pebble.DB)` contract is that calling any method concurrently with `Close` is prohibited:

```go
// Close closes the DB.
//
// It is not safe to close a DB until all outstanding iterators are closed
// or to call Close concurrently with any other DB method. It is not valid
// to call any of a DB's methods after the DB has been closed.
```

If we were to make the adjustment to lock the commit pipeline at the beginning of `DB.Close`, the sync write batch application would panic when it acquires the mutex:

```go
func (d *DB) Apply(batch *Batch, opts *WriteOptions) error {
	if err := d.closed.Load(); err != nil {
		panic(err)
	}
```

It seems like we could change this into an ordinary error return, but I'm not sure whether that solves the problem. What happens in cockroachdb/cockroach#81133 if the liveness write returns an error? Can we avoid the race between engine close and the liveness write? cc @erikgrinaker

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @sumeerbhola)

@jbowens (Collaborator) left a comment

> If we were to make the adjustment to lock the commit pipeline at the beginning of `DB.Close`, the sync write batch application will panic when it acquires the mutex.

This is not quite right. There's already the risk of a panic if Close() completes its atomic store before the liveness goroutine enters Apply.

It's hard to reason about what a concurrent liveness write would do, because the internal state of the database is ~undefined during `Close`. I think we should find a way to avoid the batch application racing with the closing of the database above Pebble, and then adjust `Close` to make concurrent operations a little easier to catch, e.g., changing the first line of `Close()` to swap `d.closed`'s value to `ErrClosed`, increasing the chances that a concurrent operation observes the `ErrClosed`.
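The swap-early suggestion can be sketched like this. The `db` type below is a toy stand-in, not Pebble's actual implementation, and it assumes Go 1.17+ for `atomic.Value.Swap`; the point is only that publishing the closed error first makes a racing operation far more likely to fail fast with an ordinary error instead of racing with teardown.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var errClosed = errors.New("pebble: closed")

// db is a hypothetical stand-in for pebble.DB.
type db struct {
	closed atomic.Value // holds an error once closed; empty otherwise
}

// apply returns an ordinary error (rather than panicking) if the db is
// closed, the adjustment discussed above.
func (d *db) apply() error {
	if err, _ := d.closed.Load().(error); err != nil {
		return err
	}
	// ... batch application would happen here ...
	return nil
}

// close publishes the closed state *before* tearing anything down; Swap
// also lets a double-close observe the prior error.
func (d *db) close() error {
	if prev, _ := d.closed.Swap(errClosed).(error); prev != nil {
		return prev
	}
	// ... release resources ...
	return nil
}

func main() {
	d := &db{}
	fmt.Println(d.apply()) // <nil>
	d.close()
	fmt.Println(d.apply()) // pebble: closed
}
```

This narrows but does not eliminate the window, which is why the thread concludes the race should also be prevented above Pebble.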

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @sumeerbhola)

@nicktrav (Contributor, Author) left a comment

> I believe its synchronization is designed for a concurrent single-consumer, single-producer scheme

This was my read too. The commentary was a little confusing, though.

While this patch fixes this particular issue, we're still getting a race (when used in Cockroach) between the following two call sites when we close the writer:

Read:

```go
b := w.block
```

Write:

```go
w.block = nextBlock
```

Rather than sprinkling mutexes and atomics around the place in Pebble to cover use-cases above Pebble, I wonder if there's a way we can continue to honor the existing contract and adapt what we're doing above Pebble:

> I think we should find a way to avoid the batch application racing with the closing of the database above Pebble,

Basically +1 on this part.

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @sumeerbhola)

@nicktrav (Contributor, Author) left a comment

I'm going to close this out as a signal that the code shouldn't be reviewed, but we can continue the conversation here.

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @sumeerbhola)

@nicktrav nicktrav closed this May 24, 2022
@nicktrav nicktrav deleted the nickt.record-atomic branch May 24, 2022 14:54
@petermattis (Collaborator)

`LogWriter.{Write,SyncRecord,Close}` need external synchronization. Inside Pebble this synchronization is provided by `commitPipeline.mu`. It looks like we're not holding `commitPipeline.mu` correctly in the `DB.Close()` path.

@erikgrinaker (Contributor) left a comment

> Rather than sprinkling mutexes and atomics around the place in Pebble to cover use-cases above Pebble, I wonder if there's a way we can continue to honor the existing contract and adapt what we're doing above Pebble:
>
> > I think we should find a way to avoid the batch application racing with the closing of the database above Pebble,
>
> Basically +1 on this part.

Ok, I'll add some synchronization in CRDB.

Reviewable status: 0 of 1 files reviewed, all discussions resolved

@erikgrinaker (Contributor) left a comment

cockroachdb/cockroach#81813

Reviewable status: 0 of 1 files reviewed, all discussions resolved

Linked issue: record: non atomic access of atomic field