common/OpHistory: move insert/cleanup into separate thread #20540

branch-predictor · 2018-02-22T14:35:31Z

A cluster that is flooded with incoming ops (and has the optracker enabled) is bottlenecked by OpHistory::insert. Reduce that by:

- pushing incoming ops into a separate queue that is processed by a separate thread.
- using std::atomic_bool for the shutdown flag so ops_history_lock doesn't need to be taken as often.
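A minimal sketch of the two points above, not the code from this PR: op threads append to a queue under a short lock, a single service thread swaps the whole queue out and processes it outside the lock, and the shutdown flag is an atomic so checking it needs no lock. The class and member names are hypothetical, a plain `std::mutex` stands in for the queue lock, and a `(timestamp, string)` pair stands in for `(utime_t, TrackedOpRef)`.

```cpp
#include <atomic>
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the op history; every name here is illustrative.
class HistoryQueueSketch {
public:
  using Entry = std::pair<double, std::string>;  // (timestamp, op description)

  // Producer side, called by op threads: one short critical section.
  void insert(double now, std::string op) {
    if (shutdown_.load(std::memory_order_acquire))
      return;  // the flag is atomic, so no lock is taken just to check it
    std::lock_guard<std::mutex> guard(queue_lock_);
    external_queue_.emplace_back(now, std::move(op));
  }

  // Consumer side, called only by the service thread.
  // Returns the number of ops moved into the history.
  std::size_t drain() {
    std::list<Entry> batch;
    {
      std::lock_guard<std::mutex> guard(queue_lock_);
      batch.swap(external_queue_);  // O(1) and preserves FIFO order
    }
    for (Entry& entry : batch)
      history_.push_back(std::move(entry));  // processed without the queue lock
    return batch.size();
  }

  void shut_down() { shutdown_.store(true, std::memory_order_release); }
  const std::vector<Entry>& history() const { return history_; }

private:
  std::atomic<bool> shutdown_{false};
  std::mutex queue_lock_;
  std::list<Entry> external_queue_;
  std::vector<Entry> history_;  // touched only by the service thread
};
```

Swapping the list rather than popping entries one by one keeps the time producers spend waiting on the lock independent of the batch size.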

My initial testing has shown that this noticeably reduces the optracker's impact on cluster performance:

Using a separate thread ("threaded optracker") didn't improve things by much, and neither did replacing the OpHistorySvc thread mutex with a spinlock ("threaded optracker + spin"). Removing ops_history_lock from the processing path (either by removing it entirely or by replacing the shutdown bool flag with an atomic) did the trick. The optracker perf impact is still there, albeit much smaller.
Note that I intentionally used a spin loop with scaling sleep, as condition variables/signaling turned out to be too slow for this purpose and actually made things much worse. A side effect of the scaling sleep is that it reduces the CPU time consumed by the OpHistorySvc thread, since it processes data in batches. This may add some latency before data shows up in OpHistory, up to around 128ms, but data is still guaranteed to go in in FIFO order.
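The scaling sleep can be sketched roughly as follows. Only the 128 ms cap comes from the description above; the 1 ms starting value, the doubling factor, and all names are assumptions made for illustration, and the actual PR code may differ.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <thread>

// Assumed constants: only the 128 ms cap is taken from the PR description.
constexpr std::chrono::microseconds kMinSleep{1000};
constexpr std::chrono::microseconds kMaxSleep{128000};

// Back off while the queue stays empty; snap back as soon as work shows up.
inline std::chrono::microseconds next_sleep(std::chrono::microseconds current,
                                            bool found_work) {
  if (found_work)
    return kMinSleep;
  return std::min<std::chrono::microseconds>(current * 2, kMaxSleep);
}

// Polling loop: `drain` is any callable returning the number of ops processed.
template <typename DrainFn>
void service_loop(const std::atomic<bool>& stop, DrainFn drain) {
  std::chrono::microseconds sleep_time = kMinSleep;
  while (!stop.load(std::memory_order_acquire)) {
    const bool found_work = drain() > 0;
    if (!found_work)
      std::this_thread::sleep_for(sleep_time);  // no condition variable involved
    sleep_time = next_sleep(sleep_time, found_work);
  }
}
```

With these assumed values an idle thread settles at one wakeup every 128 ms, which is where the worst-case latency mentioned above would come from.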

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

liewegas · 2018-02-22T14:48:54Z

src/common/TrackedOp.h

+      _break_thread(false) { }
+
+  void BreakThread();
+  void InsertOp(utime_t& now, TrackedOpRef op);


lower_case not CamelCaps for these please

Oops, forgot.

liewegas · 2018-02-22T14:49:52Z

src/common/TrackedOp.cc

+    return;
+
+  opsvc.InsertOp(now, op);
+}


would it make sense to put this function in the header so the bool check gets inlined? that'll let us skip one additional stack frame/function call

I'll check. Maybe it'll also make sense to inline OpServiceThread::insert_op().

liewegas

awesome to see this mitigates most of the current optracker overhead! just a few style nits

gregsfortytwo

Yep, good design but a few more nits. :)

gregsfortytwo · 2018-02-23T01:44:39Z

src/common/TrackedOp.h

  ~OpHistory() {
    assert(arrived.empty());
    assert(duration.empty());
    assert(slow_op.empty());
  }
-  void insert(utime_t now, TrackedOpRef op);
+  void insert(utime_t& now, TrackedOpRef op);
+  void _insert_delayed(utime_t& now, TrackedOpRef op);


Please make any pass-by-reference values const. (It's part of our style guide and assures users that the function won't modify their value.)
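As a small illustration of the style point, assuming a hypothetical `TimestampSketch` type in place of `utime_t`: a const reference still avoids the copy, but it also accepts temporaries and tells the caller the argument will not be modified.

```cpp
#include <cstdint>

// Hypothetical stand-in for utime_t.
struct TimestampSketch {
  std::int64_t sec;
  std::int64_t nsec;
};

// With `TimestampSketch& now` the callee could change the caller's value and
// a temporary could not be passed. `const TimestampSketch&` rules out both.
inline std::int64_t to_nanoseconds(const TimestampSketch& now) {
  return now.sec * 1000000000LL + now.nsec;
}
```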

gregsfortytwo · 2018-02-23T01:47:19Z

src/common/TrackedOp.cc

+  opsvc.InsertOp(now, op);
+}
+
+void OpHistory::_insert_delayed(utime_t& now, TrackedOpRef op)


Can we come up with a better name than _insert_delayed(), given that it's now finished being delayed? I think I'd prefer even just _insert() if nothing better suggests itself.

gregsfortytwo · 2018-02-23T01:50:37Z

src/common/TrackedOp.cc

+
+void OpHistoryServiceThread::InsertOp(utime_t& now, TrackedOpRef op) {
+  queue_spinlock.lock();
+  _external_queue.emplace_back(now, op);


This allocates, right? Is using a mutex really not okay here?

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

gregsfortytwo · 2018-02-23T22:49:26Z

gregsfortytwo wrote

This allocates, right? Is using a mutex really not okay here?

branch-predictor wrote

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

Yeah, I get that, but there are reasons people tell you not to use spinlocks when memory allocation might happen. So I wonder if we can do some trick to allocate the memory and then put it on the end of the list. I'm probably worrying about it too much given the simple case presented here, though.
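One way the "allocate first, then append" trick could look, as a sketch rather than anything proposed in this PR: build a one-element `std::list` outside the lock and `splice` it onto the shared list while holding the spinlock. Splicing only relinks nodes, so no allocation happens inside the critical section (constructing the empty temporary list is allocation-free in common standard library implementations, though that is not guaranteed everywhere). `SpinLockSketch` and `SplicingQueue` are hypothetical names.

```cpp
#include <atomic>
#include <list>
#include <mutex>
#include <string>
#include <utility>

// Minimal spinlock for illustration; works with std::lock_guard.
class SpinLockSketch {
public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // busy-wait until the current holder clears the flag
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

class SplicingQueue {
public:
  using Entry = std::pair<double, std::string>;  // (timestamp, op description)

  void insert(double now, std::string op) {
    std::list<Entry> node;
    node.emplace_back(now, std::move(op));  // node allocated here, lock not held
    std::lock_guard<SpinLockSketch> guard(lock_);
    queue_.splice(queue_.end(), node);      // relinks the node, no allocation
  }

  // Hands the whole queue to the caller in FIFO order.
  std::list<Entry> take_all() {
    std::list<Entry> batch;
    std::lock_guard<SpinLockSketch> guard(lock_);
    batch.swap(queue_);
    return batch;
  }

private:
  SpinLockSketch lock_;
  std::list<Entry> queue_;
};
```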

(Or, probably out of scope here, but did you consider giving a separate queue to each OSD op thread and having the OpTracker one pick off the front of each? That would greatly reduce the sharing...I notice you emphasize the preserved ordering but the OpTracker worker could handle that by walking forward with timestamp comparisons, and I don't think it's that big a deal anyway.)
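The per-thread-queue idea could be sketched like this, under the assumption that each queue is already in timestamp order because a single op thread filled it. The worker repeatedly takes the front entry with the smallest timestamp. Locking between the op threads and the worker is deliberately left out, and the names are hypothetical.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

using TimedOp = std::pair<double, std::string>;  // (timestamp, op description)

// Merge per-thread FIFO queues into one sequence ordered by timestamp by
// "walking forward with timestamp comparisons" across the queue fronts.
inline std::vector<TimedOp> merge_by_timestamp(
    std::vector<std::deque<TimedOp>>& queues) {
  std::vector<TimedOp> merged;
  for (;;) {
    std::deque<TimedOp>* best = nullptr;
    for (std::deque<TimedOp>& q : queues) {
      if (q.empty())
        continue;
      if (best == nullptr || q.front().first < best->front().first)
        best = &q;  // oldest front entry seen so far
    }
    if (best == nullptr)
      break;  // every queue is drained
    merged.push_back(std::move(best->front()));
    best->pop_front();
  }
  return merged;
}
```

Each op thread would then contend only with the worker for its own queue, instead of with every other op thread for a shared one.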

branch-predictor · 2018-02-26T09:14:20Z

gregsfortytwo wrote

Yeah, I get that, but there are reasons people tell you not to use spinlocks when memory allocation might happen. So I wonder if we can do some trick to allocate the memory and then put it on the end of the list. I'm probably worrying about it too much given the simple case presented here, though.

That's what I'm thinking too. I mean, sure, your concerns are perfectly valid, but it's not like I'm allocating megabytes or even kilobytes of data. It's just a few hundred bytes at most, which should be small enough not to get slowed down badly by the memory allocator.

gregsfortytwo wrote

(Or, probably out of scope here, but did you consider giving a separate queue to each OSD op thread and having the OpTracker one pick off the front of each? That would greatly reduce the sharing...I notice you emphasize the preserved ordering but the OpTracker worker could handle that by walking forward with timestamp comparisons, and I don't think it's that big a deal anyway.)

Yeah, I've been thinking about it too, but at this point I think it's too early for that. Let's see how far this one takes us, and then let's optimize further. I already see a few possibilities to optimize it without complicating matters much.

gregsfortytwo · 2018-02-26T22:52:05Z

Be nice to squash a few of those down, but looks good.
@liewegas, looks like he got your named issues, can you approve?

gregsfortytwo · 2018-02-26T22:53:28Z

Eek, wrong button.

tchaikov · 2018-02-28T01:29:42Z

http://pulpito.ceph.com/kchai-2018-02-27_10:33:49-rados-wip-kefu-testing-2018-02-27-1348-distro-basic-mira/

branch-predictor · 2018-02-28T08:08:29Z

@liewegas @gregsfortytwo does this qualify for backport to luminous?

gregsfortytwo · 2018-02-28T21:51:38Z

I don't have strong feelings either way. It probably qualifies but it's enough of a change I wouldn't do it immediately or casually, I guess?

liewegas · 2018-02-28T22:15:07Z

Yeah, let's give it some time in master to make sure there isn't fallout before backporting

branch-predictor added 2 commits February 20, 2018 14:47

common/TrackedOp: use emplace_back

8841540

Replace push_back with explicit constructor with emplace_back for minor perf increase. Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
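To illustrate where the small gain in this commit comes from, here is a sketch with a hypothetical `CountedOp` type that counts move constructions. `push_back` with an explicitly constructed temporary builds the object and then moves it into the list node, while `emplace_back` forwards the arguments and constructs the element in place.

```cpp
#include <list>

// Hypothetical element type that counts how often it is move-constructed.
struct CountedOp {
  static int moves;
  int id;
  explicit CountedOp(int i) : id(i) {}
  CountedOp(const CountedOp& other) : id(other.id) {}
  CountedOp(CountedOp&& other) noexcept : id(other.id) { ++moves; }
};
int CountedOp::moves = 0;

// Temporary is constructed first, then moved into the list node.
inline int moves_for_push_back() {
  std::list<CountedOp> queue;
  CountedOp::moves = 0;
  queue.push_back(CountedOp(1));
  return CountedOp::moves;
}

// Element is constructed directly inside the list node.
inline int moves_for_emplace_back() {
  std::list<CountedOp> queue;
  CountedOp::moves = 0;
  queue.emplace_back(1);
  return CountedOp::moves;
}
```

For a pair holding a reference-counted op handle, the extra move is cheap, which fits the commit's description of the increase as minor.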

common/TrackedOp: get duration just once when inserting

046f635

No need to do this twice. Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

liewegas reviewed Feb 22, 2018

liewegas requested changes Feb 22, 2018

liewegas added common performance labels Feb 22, 2018

gregsfortytwo requested changes Feb 23, 2018

branch-predictor force-pushed the bp-optracker-cleanup branch from 4a368ab to 313a35c Compare February 23, 2018 13:08

branch-predictor force-pushed the bp-optracker-cleanup branch from abbbcd2 to 52d8f09 Compare February 26, 2018 09:22

branch-predictor added 4 commits February 26, 2018 15:27

mon/OSDMonitor: remove op_tracker

6b75ee5

It's unused anyway. Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

mon/Monitor: add missing shutdown of OpTracker

b4ba5e7

Now that it has its own processing thread, it must be shut down explicitly or it'll sigsegv randomly. Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

common/OpTracker: reorder fields

136642d

Reorder smaller fields around so they're aligning naturally, regaining a few bytes of storage. Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
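The effect of such a reordering can be shown with two hypothetical structs (these are not the OpTracker fields). When small members are interleaved with larger ones, the compiler inserts padding so each member is naturally aligned. Grouping members by decreasing size removes most of that padding.

```cpp
#include <cstddef>
#include <cstdint>

// Small fields interleaved with large ones: padding after `a` and after `c`.
struct FieldsAsWritten {
  bool a;
  std::uint64_t b;
  bool c;
  std::uint32_t d;
};

// Same fields, largest first: the two bools share the tail of the struct.
struct FieldsReordered {
  std::uint64_t b;
  std::uint32_t d;
  bool a;
  bool c;
};

// Exact sizes are ABI-dependent; on a typical 64-bit ABI these are 24 and 16.
constexpr std::size_t kSizeAsWritten = sizeof(FieldsAsWritten);
constexpr std::size_t kSizeReordered = sizeof(FieldsReordered);
```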

branch-predictor force-pushed the bp-optracker-cleanup branch from 52d8f09 to 136642d Compare February 26, 2018 14:27

gregsfortytwo approved these changes Feb 26, 2018

gregsfortytwo added the needs-qa label Feb 26, 2018

gregsfortytwo closed this Feb 26, 2018

gregsfortytwo reopened this Feb 26, 2018

liewegas approved these changes Feb 26, 2018

tchaikov added the wip-kefu-testing label Feb 27, 2018

tchaikov merged commit 1b8ad4a into ceph:master Feb 28, 2018

rzarzynski mentioned this pull request Mar 5, 2018

common: optimize OpTracker #20702

Closed

branch-predictor deleted the bp-optracker-cleanup branch May 27, 2019 09:06
