Merge pull request #7767 from athanatos/wip-sam-journal-throttle-4
osd: filestore: restructure journal and op queue throttling

Reviewed-by: Samuel Just <sjust@redhat.com>
athanatos committed Mar 1, 2016
2 parents efd7625 + 554b643 commit c1e41af
Showing 17 changed files with 991 additions and 113 deletions.
94 changes: 91 additions & 3 deletions doc/dev/osd_internals/osd_throttles.rst
@@ -1,5 +1,93 @@
===============
OSD Internals
===============
=============
OSD Throttles
=============

There are three significant throttles in the filestore: wbthrottle,
op_queue_throttle, and a throttle based on journal usage.

WBThrottle
----------
The WBThrottle is defined in src/os/filestore/WBThrottle.[h,cc] and
included in FileStore as FileStore::wbthrottle. The intention is to
bound the amount of outstanding IO we need to do to flush the journal.
At the same time, we don't necessarily want to do it inline in case we
might be able to combine several IOs on the same object close together
in time. Thus, in FileStore::_write, we queue the fd for asynchronous
flushing and block in FileStore::_do_op if we have exceeded any hard
limits until the background flusher catches up.

The relevant config options are filestore_wbthrottle*. There are
different defaults for btrfs and xfs. Each set has hard and soft
limits on bytes (total dirty bytes), ios (total dirty ios), and
inodes (total dirty fds). The WBThrottle will begin flushing
when any of these hits the soft limit and will block in throttle()
while any has exceeded the hard limit.

Tighter soft limits will cause writeback to happen more quickly,
but may cause the OSD to miss opportunities for write coalescing.
Tighter hard limits may cause a reduction in latency variance by
reducing time spent flushing the journal, but may reduce writeback
parallelism.
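
As a rough illustration of the soft and hard limits described above,
the following C++ sketch models the three dirty counters and the two
decisions made against them. The names Limit, Limits, DirtyState,
should_flush and must_block are hypothetical and are not the
WBThrottle interface. The exact boundary comparisons (>= for soft,
> for hard) are an assumption read off the wording above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; not the WBThrottle API.
struct Limit {
  uint64_t soft;  // background flushing starts here
  uint64_t hard;  // callers block above this
};

struct Limits {
  Limit bytes, ios, inodes;
};

struct DirtyState {
  uint64_t bytes;   // total dirty bytes
  uint64_t ios;     // total dirty ios
  uint64_t inodes;  // total dirty fds
};

// A soft limit above its hard limit would make no sense.
bool limits_sane(const Limits &l) {
  return l.bytes.soft <= l.bytes.hard &&
         l.ios.soft <= l.ios.hard &&
         l.inodes.soft <= l.inodes.hard;
}

// True once any counter has reached its soft limit.
bool should_flush(const DirtyState &d, const Limits &l) {
  assert(limits_sane(l));
  return d.bytes >= l.bytes.soft ||
         d.ios >= l.ios.soft ||
         d.inodes >= l.inodes.soft;
}

// True while any counter is above its hard limit.
bool must_block(const DirtyState &d, const Limits &l) {
  assert(limits_sane(l));
  return d.bytes > l.bytes.hard ||
         d.ios > l.ios.hard ||
         d.inodes > l.inodes.hard;
}
```

In this model a single counter is enough to trigger either decision,
which is why tightening one soft limit alone can change writeback
behavior.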

op_queue_throttle
-----------------
The op queue throttle is intended to bound the amount of queued but
uncompleted work in the filestore by delaying threads calling
queue_transactions more and more based on how many ops and bytes are
currently queued. The throttle is taken in queue_transactions and
released when the op is applied to the filesystem. This period
includes time spent in the journal queue, time spent writing to the
journal, time spent in the actual op queue, time spent waiting for the
wbthrottle to open up (thus, the wbthrottle can push back indirectly
on the queue_transactions caller), and time spent actually applying
the op to the filesystem. A BackoffThrottle is used to gradually
delay the queueing thread after each throttle becomes more than
filestore_queue_low_threshhold full (a ratio of
filestore_queue_max_(bytes|ops)). The throttles will block once the
max value is reached (filestore_queue_max_(bytes|ops)).

The significant config options are:
filestore_queue_low_threshhold
filestore_queue_high_threshhold
filestore_expected_throughput_ops
filestore_expected_throughput_bytes
filestore_queue_high_delay_multiple
filestore_queue_max_delay_multiple

While each throttle is at less than low_threshhold of the max,
no delay happens. Between low_threshhold and high_threshhold, the
throttle will inject a delay per unit (per op or per byte) ramping
linearly from 0 at low_threshhold to
high_delay_multiple/expected_throughput at high_threshhold. From
high_threshhold to 1 (the max), the delay will ramp linearly from
high_delay_multiple/expected_throughput to
max_delay_multiple/expected_throughput.
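
The two-segment ramp can be sketched as a small standalone C++
function. This is a minimal sketch, not the BackoffThrottle code
itself: RampParams and delay_per_count are hypothetical names, and the
function assumes low_threshhold < high_threshhold < 1 and a positive
expected throughput.

```cpp
#include <cassert>

// Hypothetical parameter bundle; fields mirror the config options above.
struct RampParams {
  double low_threshhold;       // fraction of max where delay starts
  double high_threshhold;      // fraction of max where the second ramp starts
  double expected_throughput;  // ops (or bytes) per second
  double high_delay_multiple;
  double max_delay_multiple;
};

// Delay in seconds per op (or per byte) when the throttle is r full,
// where r = current / max and 0 <= r <= 1.
double delay_per_count(const RampParams &p, double r) {
  assert(p.expected_throughput > 0);
  assert(0 <= p.low_threshhold);
  assert(p.low_threshhold < p.high_threshhold);
  assert(p.high_threshhold < 1);

  const double high_delay = p.high_delay_multiple / p.expected_throughput;
  const double max_delay = p.max_delay_multiple / p.expected_throughput;

  if (r < p.low_threshhold) {
    return 0.0;  // below low: no delay
  }
  if (r < p.high_threshhold) {
    // first ramp: 0 at low, high_delay at high
    return (r - p.low_threshhold) /
           (p.high_threshhold - p.low_threshhold) * high_delay;
  }
  // second ramp: high_delay at high, max_delay at 1
  return high_delay + (r - p.high_threshhold) /
                      (1 - p.high_threshhold) * (max_delay - high_delay);
}
```

With illustrative values (not Ceph defaults) of low 0.25, high 0.75,
expected throughput 100, high multiple 2 and max multiple 10, a
throttle that is half full yields a delay of about 0.01 seconds per
unit, and a full throttle yields about 0.1 seconds per unit.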

filestore_queue_high_delay_multiple and
filestore_queue_max_delay_multiple probably do not need to be
changed.

Setting these properly should help to smooth out op latencies by
mostly avoiding the hard limit.

See FileStore::throttle_ops and FileStore::throttle_bytes.

journal usage throttle
----------------------
See src/os/filestore/JournalThrottle.h/cc

The intention of the journal usage throttle is to gradually slow
down queue_transactions callers as the journal fills up in order
to smooth out hiccups during filestore syncs. JournalThrottle
wraps a BackoffThrottle and tracks journaled but not flushed
journal entries so that the throttle can be released when the
journal is flushed. The configs work very similarly to the
op_queue_throttle.
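
The bookkeeping of journaled but not yet flushed entries might look
roughly like the following C++ sketch. JournalUsageTracker and its
methods are hypothetical and are not the JournalThrottle interface.
The sketch assumes entries are journaled in increasing sequence order
and that a flush covers every entry up to a given sequence number.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical tracker for illustration; not the JournalThrottle API.
class JournalUsageTracker {
  // (sequence number, bytes held), in ascending sequence order
  std::deque<std::pair<uint64_t, uint64_t>> pending;
  uint64_t in_use = 0;

public:
  // Record that entry seq, occupying bytes, has been journaled.
  void journaled(uint64_t seq, uint64_t bytes) {
    assert(pending.empty() || pending.back().first < seq);
    pending.emplace_back(seq, bytes);
    in_use += bytes;
  }

  // The journal has been flushed up to and including seq.
  // Returns the number of bytes that can now be released.
  uint64_t flushed(uint64_t seq) {
    uint64_t released = 0;
    while (!pending.empty() && pending.front().first <= seq) {
      released += pending.front().second;
      pending.pop_front();
    }
    in_use -= released;
    return released;
  }

  uint64_t bytes_in_use() const { return in_use; }
};
```

In a real throttle the value returned by flushed() would presumably be
handed back to the wrapped backoff throttle so that delayed callers
can make progress.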

The significant config options are:
journal_throttle_low_threshhold
journal_throttle_high_threshhold
filestore_expected_throughput_ops
filestore_expected_throughput_bytes
journal_throttle_high_multiple
journal_throttle_max_multiple

.. literalinclude:: osd_throttles.txt
6 changes: 3 additions & 3 deletions doc/dev/osd_internals/osd_throttles.txt
@@ -1,11 +1,11 @@
Messenger throttle (number and size)
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
FileStore op_queue throttle (number and size)
FileStore op_queue throttle (number and size, includes a soft throttle based on filestore_expected_throughput_(ops|bytes))
|--------------------------------------------------------|
WBThrottle
|---------------------------------------------------------------------------------------------------------|
Journal (size)
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
Journal (size, includes a soft throttle based on filestore_expected_throughput_bytes)
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|----------------------------------------------------------------------------------------------------> flushed ----------------> synced
|
Op: Read Header --DispatchQ--> OSD::_dispatch --OpWQ--> PG::do_request --journalq--> Journal --FileStore::OpWQ--> Apply Thread --Finisher--> op_applied -------------------------------------------------------------> Complete
1 change: 1 addition & 0 deletions src/CMakeLists.txt
@@ -638,6 +638,7 @@ set(libos_srcs
  os/filestore/DBObjectMap.cc
  os/filestore/FileJournal.cc
  os/filestore/FileStore.cc
  os/filestore/JournalThrottle.cc
  os/filestore/GenericFileStoreBackend.cc
  os/filestore/JournalingObjectStore.cc
  os/filestore/HashIndex.cc
180 changes: 180 additions & 0 deletions src/common/Throttle.cc
@@ -2,6 +2,7 @@
// vim: ts=8 sw=2 smarttab

#include <errno.h>
#include <thread>

#include "common/Throttle.h"
#include "common/dout.h"
@@ -237,6 +238,185 @@ int64_t Throttle::put(int64_t c)
  return count.read();
}

bool BackoffThrottle::set_params(
  double _low_threshhold,
  double _high_threshhold,
  double _expected_throughput,
  double _high_multiple,
  double _max_multiple,
  uint64_t _throttle_max,
  ostream *errstream)
{
  bool valid = true;
  if (_low_threshhold > _high_threshhold) {
    valid = false;
    if (errstream) {
      *errstream << "low_threshhold (" << _low_threshhold
                 << ") > high_threshhold (" << _high_threshhold
                 << ")" << std::endl;
    }
  }

  if (_high_multiple > _max_multiple) {
    valid = false;
    if (errstream) {
      *errstream << "_high_multiple (" << _high_multiple
                 << ") > _max_multiple (" << _max_multiple
                 << ")" << std::endl;
    }
  }

  if (_low_threshhold > 1 || _low_threshhold < 0) {
    valid = false;
    if (errstream) {
      *errstream << "invalid low_threshhold (" << _low_threshhold << ")"
                 << std::endl;
    }
  }

  if (_high_threshhold > 1 || _high_threshhold < 0) {
    valid = false;
    if (errstream) {
      *errstream << "invalid high_threshhold (" << _high_threshhold << ")"
                 << std::endl;
    }
  }

  if (_max_multiple < 0) {
    valid = false;
    if (errstream) {
      *errstream << "invalid _max_multiple ("
                 << _max_multiple << ")"
                 << std::endl;
    }
  }

  if (_high_multiple < 0) {
    valid = false;
    if (errstream) {
      *errstream << "invalid _high_multiple ("
                 << _high_multiple << ")"
                 << std::endl;
    }
  }

  if (_expected_throughput < 0) {
    valid = false;
    if (errstream) {
      *errstream << "invalid _expected_throughput("
                 << _expected_throughput << ")"
                 << std::endl;
    }
  }

  if (!valid)
    return false;

  locker l(lock);
  low_threshhold = _low_threshhold;
  high_threshhold = _high_threshhold;
  high_delay_per_count = _high_multiple / _expected_throughput;
  max_delay_per_count = _max_multiple / _expected_throughput;
  max = _throttle_max;

  if (high_threshhold - low_threshhold > 0) {
    s0 = high_delay_per_count / (high_threshhold - low_threshhold);
  } else {
    low_threshhold = high_threshhold;
    s0 = 0;
  }

  if (1 - high_threshhold > 0) {
    s1 = (max_delay_per_count - high_delay_per_count)
      / (1 - high_threshhold);
  } else {
    high_threshhold = 1;
    s1 = 0;
  }

  _kick_waiters();
  return true;
}

std::chrono::duration<double> BackoffThrottle::_get_delay(uint64_t c) const
{
  if (max == 0)
    return std::chrono::duration<double>(0);

  double r = ((double)current) / ((double)max);
  if (r < low_threshhold) {
    return std::chrono::duration<double>(0);
  } else if (r < high_threshhold) {
    return c * std::chrono::duration<double>(
      (r - low_threshhold) * s0);
  } else {
    return c * std::chrono::duration<double>(
      high_delay_per_count + ((r - high_threshhold) * s1));
  }
}

std::chrono::duration<double> BackoffThrottle::get(uint64_t c)
{
  locker l(lock);
  auto delay = _get_delay(c);

  // fast path
  if (delay == std::chrono::duration<double>(0) &&
      waiters.empty() &&
      ((max == 0) || (current == 0) || ((current + c) <= max))) {
    current += c;
    return std::chrono::duration<double>(0);
  }

  auto ticket = _push_waiter();

  while (waiters.begin() != ticket) {
    (*ticket)->wait(l);
  }

  auto start = std::chrono::system_clock::now();
  delay = _get_delay(c);
  while (((start + delay) > std::chrono::system_clock::now()) ||
         !((max == 0) || (current == 0) || ((current + c) <= max))) {
    assert(ticket == waiters.begin());
    (*ticket)->wait_until(l, start + delay);
    delay = _get_delay(c);
  }
  waiters.pop_front();
  _kick_waiters();

  current += c;
  return std::chrono::system_clock::now() - start;
}

uint64_t BackoffThrottle::put(uint64_t c)
{
  locker l(lock);
  assert(current >= c);
  current -= c;
  _kick_waiters();
  return current;
}

uint64_t BackoffThrottle::take(uint64_t c)
{
  locker l(lock);
  current += c;
  return current;
}

uint64_t BackoffThrottle::get_current()
{
  locker l(lock);
  return current;
}

uint64_t BackoffThrottle::get_max()
{
  locker l(lock);
  return max;
}

SimpleThrottle::SimpleThrottle(uint64_t max, bool ignore_enoent)
  : m_lock("SimpleThrottle"),
    m_max(max),