New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: add disk-full roachtest #32674

Merged
merged 3 commits into from Nov 30, 2018

Conversation

Projects
None yet
4 participants
@petermattis
Copy link
Contributor

petermattis commented Nov 28, 2018

RocksDB was missing some error checks on the code paths involved in FlushWAL. The result was that writes that failed when the disk is full were erroneously being considered success.

Fixes #32631
Fixes #25173

Release note (performance improvement): Re-enable usage of RocksDB FlushWAL which is a minor performance improvement for synchronous RocksDB write operations.

@petermattis petermattis requested a review from cockroachdb/core-prs as a code owner Nov 28, 2018

@cockroach-teamcity

This comment has been minimized.

Copy link
Member

cockroach-teamcity commented Nov 28, 2018

This change is Reviewable

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Nov 28, 2018

Unfortunately, this doesn't reproduce the Raft log corruption. The disk becomes full, cockroach crashes, we clear the disk full condition and restart cockroach. Everything is fine.

@petermattis petermattis force-pushed the petermattis:pmattis/disk-full branch 2 times, most recently from 3f2c6e9 to 538c6fa Nov 28, 2018

@petermattis petermattis changed the title [DO NOT MERGE] roachtest: add disk_full roachtest [DO NOT MERGE] roachtest: add disk-full roachtest Nov 28, 2018

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Nov 28, 2018

The test now reliably reproduces the Raft log corruption. Time to figure out what is going on with FlushWAL.

@petermattis petermattis force-pushed the petermattis:pmattis/disk-full branch from 538c6fa to 0434c98 Nov 29, 2018

@petermattis petermattis changed the title [DO NOT MERGE] roachtest: add disk-full roachtest roachtest: add disk-full roachtest Nov 29, 2018

@petermattis petermattis force-pushed the petermattis:pmattis/disk-full branch from 0434c98 to 5167ae8 Nov 29, 2018

@petermattis petermattis requested a review from cockroachdb/build-prs as a code owner Nov 29, 2018

@petermattis petermattis force-pushed the petermattis:pmattis/disk-full branch 2 times, most recently from d1e779e to b400134 Nov 29, 2018

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Nov 29, 2018

This is ready for a look. Note that I'm claiming this fixes #25173 as the occurrences we've seen there all seem to be after we started using FlushWAL.

@petermattis petermattis requested review from bdarnell and benesch Nov 29, 2018

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Nov 29, 2018

Hold off on the review. I'm running the test in a loop and just hit another crash. Really hoping this is user error.

@petermattis petermattis force-pushed the petermattis:pmattis/disk-full branch from b400134 to 886e7e0 Nov 30, 2018

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Nov 30, 2018

After updating the RocksDB cherry-picks this looks good to go (for realz this time). I'm still running the disk-full roachtest in a loop, but it has now passed 12 times in a row while previously it would fail about 1 of 3 runs.

@petermattis petermattis force-pushed the petermattis:pmattis/disk-full branch from 886e7e0 to a018f72 Nov 30, 2018

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Nov 30, 2018

I'm still running the disk-full roachtest in a loop.

Still running in a loop, though now I'm up over 50 successful runs. I'm very confident that the swallowing of the error in DB::FlushWAL was the problem.

petermattis added some commits Nov 28, 2018

roachtest: add disk-full roachtest
See #32631

Release note: None
Revert "Revert "libroach: use FlushWAL instead of SyncWAL""
This reverts commit a485597.

Release note (performance improvement): Re-enable usage of RocksDB
FlushWAL which is a minor performance improvement for synchronous
RocksDB write operations.

@petermattis petermattis force-pushed the petermattis:pmattis/disk-full branch from a018f72 to cd71838 Nov 30, 2018

@petermattis

This comment has been minimized.

Copy link
Contributor

petermattis commented Nov 30, 2018

bors r=bdarnell

craig bot pushed a commit that referenced this pull request Nov 30, 2018

Merge #32674
32674: roachtest: add disk-full roachtest r=bdarnell a=petermattis

RocksDB was missing some error checks on the code paths involved in `FlushWAL`. The result was that writes that failed when the disk is full were erroneously being considered success. 

Fixes #32631
Fixes #25173

Release note (performance improvement): Re-enable usage of RocksDB FlushWAL which is a minor performance improvement for synchronous RocksDB write operations.

Co-authored-by: Peter Mattis <petermattis@gmail.com>
@tbg

tbg approved these changes Nov 30, 2018

Copy link
Member

tbg left a comment

LGTM

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/disk_full.go, line 61 at r3 (raw file):

				// command to exit with an error. The "|| true" is used to ignore that
				// error.
				c.Run(ctx, c.Node(n), "./cockroach debug ballast {store-dir}/ballast --size=100% || true")

nit: this seems a little dirty, can't you do something like ./cockroach debug ballast --size=-10mb? Though I guess that can also fail with an error.

@craig

This comment has been minimized.

Copy link

craig bot commented Nov 30, 2018

Build succeeded

@craig craig bot merged commit cd71838 into cockroachdb:master Nov 30, 2018

3 checks passed

GitHub CI (Cockroach) TeamCity build finished
Details
bors Build succeeded
Details
license/cla Contributor License Agreement is signed.
Details
@petermattis
Copy link
Contributor

petermattis left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/disk_full.go, line 61 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: this seems a little dirty, can't you do something like ./cockroach debug ballast --size=-10mb? Though I guess that can also fail with an error.

I originally used --size=99%, but frequently saw the ballast command failing due to running out of disk space. The || true is dirty, but foolproof.

@petermattis petermattis deleted the petermattis:pmattis/disk-full branch Nov 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment