os/bluestore: add bluestore_prefer_wal_size option #13217

Merged

merged 2 commits into ceph:master from liewegas:wip-bluestore-prefer-wal-size on Mar 8, 2017

Conversation

@liewegas
Member

liewegas commented Feb 1, 2017

Add option to prefer a WAL write if the write is below a size threshold,
even if we could avoid it. This lets you trade some write-amp (by
journaling data to rocksdb) for latency in cases where the WAL device is
much faster than the main device.

This affects:

  • writes to new extent locations below min_alloc_size
  • writes to unallocated space below min_alloc_size
  • "big" writes above min_alloc_size that are below the prefer_wal_size
    threshold.

Note that it's applied to individual blobs, not the entirety of the write,
so if you have a larger write torn into two pieces/blobs that are each below
the threshold, then both will go through the WAL.
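
A minimal sketch of that per-blob decision, with invented names (this is not the actual BlueStore write-path code; it only illustrates the threshold behavior described above):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch only; names are invented, not actual BlueStore code.
// The check is applied per blob, so a large write torn into sub-threshold
// blobs sends every piece through the WAL.
struct WalPolicy {
  uint64_t min_alloc_size;    // e.g. 64 KiB on HDD
  uint64_t prefer_wal_size;   // bluestore_prefer_wal_size[_hdd,_ssd]

  bool use_wal(uint64_t blob_len) const {
    if (blob_len < min_alloc_size)
      return true;                       // "small" write path
    return blob_len <= prefer_wal_size;  // "big" write, but still preferred
  }
};
```

For example, with min_alloc_size = 64 KiB and prefer_wal_size = 128 KiB, a 192 KiB write torn into two 96 KiB blobs routes both blobs through the WAL, since each piece is below the threshold.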

Also, do not wake the kv sync thread just to retire WAL events; do that lazily or on flush() (~50 -> ~100 IOPS).
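
A rough sketch of that lazy-retire idea, with invented names, assuming completed WAL events are queued and only cleaned up in batch when the kv sync thread runs anyway or flush() forces it:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Illustrative only, not BlueStore code: completed WAL events are queued
// without waking the kv sync thread; they are retired in one batch on the
// next natural commit or on flush().
struct LazyWalRetire {
  std::deque<uint64_t> done_seqs;  // completed WAL sequence numbers
  int kv_wakeups = 0;

  void wal_event_done(uint64_t seq) {
    done_seqs.push_back(seq);      // note: no wakeup here
  }
  void flush() {                   // or a naturally occurring kv commit
    ++kv_wakeups;                  // one wakeup retires the whole batch
    done_seqs.clear();
  }
};
```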

@liewegas liewegas requested a review from ifed01 Feb 1, 2017

@ifed01

ifed01 approved these changes Feb 1, 2017

@ifed01
Contributor

ifed01 commented Feb 1, 2017

needs rebase

@varadakari
Contributor

varadakari commented Feb 2, 2017

some getattr failures in the make check.

@liewegas liewegas added needs-qa and removed needs-qa labels Feb 2, 2017

@liewegas
Member

liewegas commented Mar 5, 2017

Rebased, and updated to avoid spurious kv thread wakes+commits just to retire WAL records. Now 2x as fast as before at qd=1.

@liewegas
Member

liewegas commented Mar 6, 2017

Now 4x faster (400 IOPS) with the WAL aio dispatch batching!

liewegas added some commits Feb 2, 2017

os/bluestore: add bluestore_prefer_wal_size[_hdd,_ssd] options
Add option to prefer a WAL write if the write is below a size threshold,
even if we could avoid it.  This lets you trade some write-amp (by
journaling data to rocksdb) for latency in cases where the WAL device is
much faster than the main device.

This affects:

 - writes to new extent locations below min_alloc_size
 - writes to unallocated space below min_alloc_size
 - "big" writes above min_alloc_size that are below the prefer_wal_size
   threshold.

Note that it's applied to individual blobs, not the entirety of the write,
so if you have a larger write torn into two pieces/blobs that are each below
the threshold, then both will go through the WAL.

Set different defaults for HDD and SSD, since this makes more sense for HDD
where seeks are expensive.

Add some test cases to exercise the option.

Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore: drop unused OpSequencer::wait_for_wal_on_seq()
Signed-off-by: Sage Weil <sage@redhat.com>
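
The HDD/SSD-specific options named in the first commit could be set via a ceph.conf fragment like the following hypothetical sketch (the option names follow the commit title; the values are illustrative only, not the PR's actual defaults):

```ini
[osd]
# Prefer journaling a write through the WAL device when the blob is small
# enough, even when the WAL could be avoided; 0 disables the preference.
# Values below are made-up examples, not the defaults from this PR.
bluestore_prefer_wal_size_hdd = 65536
bluestore_prefer_wal_size_ssd = 0
```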
@liewegas
Member

liewegas commented Mar 7, 2017

Okay, there are more fundamental problems with the wal writes:

1. we don't prevent multiple in-flight wal writes to the same block (the block layer is unordered)
2. we don't prevent a block with an in-flight wal write from being deallocated (except in the most trivial cases).

I also have a bunch of cleanups to flush() and friends, but I'll wrap those into a larger bugfix. In the meantime, making this change minimal to address just the prefer_wal_size case!
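
The first problem can be made concrete with a small sketch (invented names, not BlueStore code): track the extents of in-flight WAL writes and refuse to dispatch an overlapping write until the earlier one retires:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Illustrative only: since the block layer is unordered, two in-flight WAL
// writes to overlapping blocks can complete in either order. A guard like
// this would serialize them by refusing the second until the first retires.
struct InflightWalGuard {
  std::set<std::pair<uint64_t, uint64_t>> extents;  // (offset, length)

  // Returns false if [off, off+len) overlaps an in-flight WAL write.
  bool try_start(uint64_t off, uint64_t len) {
    for (const auto& e : extents) {
      if (off < e.first + e.second && e.first < off + len)
        return false;  // caller must wait for the earlier write to retire
    }
    extents.emplace(off, len);
    return true;
  }
  void retire(uint64_t off, uint64_t len) { extents.erase({off, len}); }
};
```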

@liewegas liewegas added the needs-qa label Mar 7, 2017

@liewegas
Member

liewegas commented Mar 8, 2017

...aaand dropping the no-wakeup commit since it can also easily deadlock. I'll include it with the flush cleanup later.

@liewegas liewegas merged commit 92bacb6 into ceph:master Mar 8, 2017

2 of 3 checks passed

default: Build finished.
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified

@liewegas liewegas deleted the liewegas:wip-bluestore-prefer-wal-size branch Mar 8, 2017
